# Pandas

Pandas is a Python library written for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Some of its features include:
- **DataFrame object for data manipulation.**
- Tools for reading and writing data between in-memory data structures and different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, fancy indexing, and subsetting of large data sets.
- Time-series functionality: date range generation, frequency conversion, moving window statistics, and date shifting and lagging.
- Performance-critical parts written in Cython and C.
- And many more.
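Two of the features above, data alignment and missing-data handling, fit in a few lines. The following minimal sketch uses made-up values. Arithmetic between two Series matches entries by label, and any label without a partner produces `NaN`.

```python
import pandas as pd

# Two Series whose labels only partially overlap (illustrative values)
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels, not on positions: "x" has no partner in b, so it becomes NaN
total = a + b
print(total)                # x NaN, y 12.0, z 23.0

# Missing values can be counted, then filled or dropped explicitly
print(total.isna().sum())   # 1
print(total.fillna(0.0))
print(total.dropna())
```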

Memorizing all of pandas' functionality is neither feasible nor practical, so it is a good idea to keep the [documentation](https://pandas.pydata.org/docs/reference/index.html#api) at hand.

## The DataFrame object

**DataFrame** is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet or an SQL table. It is generally the most commonly used pandas object: DataFrames are used to organize and manipulate data, and most of pandas' functionality is accessed through them. Let's look at some simple examples of how to create a DataFrame.

<img src="indexing.png" alt="drawing" width="500"/>
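The following minimal sketch complements the figure by contrasting label-based selection (`loc`) with position-based selection (`iloc`). It uses a small illustrative DataFrame whose values are borrowed from the flights table below. The index labels `r0` to `r2` are made up for the example.

```python
import pandas as pd

# Small DataFrame with a custom string index (illustrative values)
df = pd.DataFrame(
    {"id": ["BHL731", "EWG6YJ", "ASL78E"], "alt": [1000, 100, 7700]},
    index=["r0", "r1", "r2"],
)

# Label-based: rows by index label, columns by name; label slices INCLUDE the end label
print(df.loc["r0":"r1", "alt"])

# Position-based: integer positions; the end is excluded, as in ordinary Python slicing
print(df.iloc[0:2, 1])

# Row labels and column labels are both Index objects
print(df.index.tolist(), df.columns.tolist())
```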

```python
import pandas as pd
```

```python
# An empty DataFrame: no rows and no columns yet
df = pd.DataFrame()
```
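Besides starting from an empty DataFrame, you usually build one from existing data. The following minimal sketch shows two common constructors: a dict of columns and a list of row records. The values are borrowed from the flights table. The sketch also shows that each column keeps its own dtype.

```python
import pandas as pd

# From a dict: keys become column names, each list becomes one column
df = pd.DataFrame({
    "id": ["BHL731", "EWG6YJ"],
    "alt": [1000, 100],
    "lat": [60.783333, 51.280833],
})

# From a list of records: one dict per row, columns taken from the keys
rows = [{"id": "BHL731", "type": "S92"}, {"id": "EWG6YJ", "type": "A320"}]
df2 = pd.DataFrame(rows)

# Columns of different types coexist in one table
print(df.dtypes)
print(df.shape)   # (number of rows, number of columns)
print(df2)
```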

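```python
from IPython.display import display  # available by default inside Jupyter
# parse_dates converts the 'timestamp' column to datetime64
df = pd.read_csv('flights.dat', sep=";", parse_dates=['timestamp'])
# Lift the column-width limit so that no cell contents are truncated
with pd.option_context('display.max_colwidth', None):
    display(df)
```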

| Unnamed: 0 | id | origin | dest | type | lat | lon | alt | timestamp |
|---|---|---|---|---|---|---|---|---|
| 0 | BHL731 | ZZZZ | ZZZZ | S92 | 60.783333 | 3.433333 | 1000 | 2024-01-13 07:20:00 |
| 1 | BHL731 | ZZZZ | ZZZZ | S92 | 60.716667 | 3.566667 | 1000 | 2024-01-13 07:00:00 |
| 2 | BHL741 | ZZZZ | ZZZZ | S92 | 60.716667 | 3.566667 | 1000 | 2024-01-13 07:00:00 |
| 3 | BHL358 | ZZZZ | ENWV | S92 | 56.316667 | 3.350000 | 1000 | 2024-01-13 11:10:00 |
| 4 | EWG6YJ | EDDL | LROP | A320 | 51.280833 | 6.757222 | 100 | 2024-01-13 09:51:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1266394 | ASL78E | LYBE | LBSF | AT72 | 42.598611 | 23.648611 | 7700 | 2024-01-13 13:16:18 |
| 1266395 | ASL78E | LYBE | LBSF | AT72 | 42.681667 | 23.658056 | 5600 | 2024-01-13 13:17:31 |
| 1266396 | ASL78E | LYBE | LBSF | AT72 | 42.683611 | 23.620556 | 5000 | 2024-01-13 13:17:53 |
| 1266397 | ASL78E | LYBE | LBSF | AT72 | 42.691667 | 23.470833 | 2500 | 2024-01-13 13:19:37 |
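You can try the `read_csv` call without the real file. The following minimal sketch assumes that `flights.dat` has a header row with an empty first field, which is what usually produces an `Unnamed: 0` column. The excerpt copies three rows from the output and is not the actual file. It also shows `index_col=0`, one possible way to turn that leftover column into the index.

```python
import io
import pandas as pd

# Hypothetical excerpt mimicking flights.dat: ';'-separated, empty first header field
sample = """;id;origin;dest;type;lat;lon;alt;timestamp
0;BHL731;ZZZZ;ZZZZ;S92;60.783333;3.433333;1000;2024-01-13 07:20:00
1;BHL731;ZZZZ;ZZZZ;S92;60.716667;3.566667;1000;2024-01-13 07:00:00
4;EWG6YJ;EDDL;LROP;A320;51.280833;6.757222;100;2024-01-13 09:51:00
"""

# parse_dates alone is enough to get a datetime64 column
df = pd.read_csv(io.StringIO(sample), sep=";", parse_dates=["timestamp"])
print(df.columns[0])            # the empty header field becomes 'Unnamed: 0'
print(df["timestamp"].dtype)

# index_col=0 uses the first column as the index instead of keeping it as data
df_idx = pd.read_csv(io.StringIO(sample), sep=";", parse_dates=["timestamp"], index_col=0)
print(df_idx.columns.tolist())
```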


### Tasks for today
- data cleaning
- filtering
- number of flights
- flight duration
- get all origin-destination (OD) pairs
- for a given flight, extract the whole trajectory
- **Homework: Find the closest distance between any two flights!**
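The following minimal sketch covers the first six tasks. It runs on the nine rows visible in the output above, not on the full file, and it makes these assumptions:

- `id` (apparently a callsign) identifies a single flight, although a callsign reused later in the day would be merged.
- `ZZZZ` marks an unknown or unlisted airport.
- Flight duration is approximated by the span between the first and last observed positions.

`trajectory` is a hypothetical helper name. The homework is not covered.

```python
import pandas as pd

# The nine rows visible in the output above (the full file is much larger)
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4, 1266394, 1266395, 1266396, 1266397],
    "id": ["BHL731", "BHL731", "BHL741", "BHL358", "EWG6YJ",
           "ASL78E", "ASL78E", "ASL78E", "ASL78E"],
    "origin": ["ZZZZ", "ZZZZ", "ZZZZ", "ZZZZ", "EDDL", "LYBE", "LYBE", "LYBE", "LYBE"],
    "dest": ["ZZZZ", "ZZZZ", "ZZZZ", "ENWV", "LROP", "LBSF", "LBSF", "LBSF", "LBSF"],
    "type": ["S92", "S92", "S92", "S92", "A320", "AT72", "AT72", "AT72", "AT72"],
    "lat": [60.783333, 60.716667, 60.716667, 56.316667, 51.280833,
            42.598611, 42.681667, 42.683611, 42.691667],
    "lon": [3.433333, 3.566667, 3.566667, 3.350000, 6.757222,
            23.648611, 23.658056, 23.620556, 23.470833],
    "alt": [1000, 1000, 1000, 1000, 100, 7700, 5600, 5000, 2500],
    "timestamp": pd.to_datetime([
        "2024-01-13 07:20:00", "2024-01-13 07:00:00", "2024-01-13 07:00:00",
        "2024-01-13 11:10:00", "2024-01-13 09:51:00", "2024-01-13 13:16:18",
        "2024-01-13 13:17:31", "2024-01-13 13:17:53", "2024-01-13 13:19:37",
    ]),
})

# Data cleaning: drop the leftover index column and any exact duplicate rows
clean = df.drop(columns=["Unnamed: 0"]).drop_duplicates()

# Filtering: combine boolean masks with & (assumes 'ZZZZ' means "unknown airport")
known = clean[(clean["origin"] != "ZZZZ") & (clean["dest"] != "ZZZZ")]

# Number of flights: distinct ids (assumes one id == one flight)
n_flights = clean["id"].nunique()

# Flight duration: last minus first observed timestamp per id (an approximation)
ts = clean.groupby("id")["timestamp"]
duration = ts.max() - ts.min()

# OD pairs: distinct (origin, dest) combinations, with the number of flights on each
od_counts = clean.groupby(["origin", "dest"])["id"].nunique()

def trajectory(frame, flight_id):
    """Hypothetical helper: all positions of one flight, in time order."""
    rows = frame.loc[frame["id"] == flight_id, ["timestamp", "lat", "lon", "alt"]]
    return rows.sort_values("timestamp")

traj = trajectory(clean, "BHL731")

print(known)
print(n_flights)
print(duration)
print(od_counts)
print(traj)
```

On the full dataset, the same calls apply unchanged to the `df` returned by `read_csv`.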
