# Analyze NYC-Flight data

**Description:** The dataset contains information about all flights that departed from the NYC airports (EWR, JFK and LGA) in 2013, 336,776 flights in total.

**What to do with this data:** Extract the following information from the data:
* Departure delays
* Best airport in terms of on-time departure %
* Aircraft speed analysis
* On-time arrival % analysis
* Maximum number of flights headed to a particular destination


**Set working directory path**

In [1]:
working_directory_path="D://PythonLearn/"
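
As an alternative to building the path by string concatenation, the standard-library `pathlib` module can join the folder and file name. This is only a sketch: the folder name is taken from the cell above, and `working_directory` and `csv_path` are hypothetical variable names that the notebook does not use.

```python
from pathlib import Path

# Folder name copied from the cell above; replace it with the folder
# that actually holds flight_data.csv on your machine.
working_directory = Path("D:/PythonLearn")

# The / operator joins path components, so no trailing slash is needed.
csv_path = working_directory / "flight_data.csv"
print(csv_path.name)
```

`pd.read_csv` accepts a `Path` object directly, so `csv_path` could be passed to it in place of the concatenated string.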

**Import the numpy and pandas packages**

In [2]:
import numpy as np
import pandas as pd

**Read flight data from file 'flight_data.csv'**

In [3]:
flight_data = pd.read_csv(working_directory_path+'flight_data.csv')
flight_data

Unnamed: 0,year,month,day,dep_time,sched_dep_time,dep_delay,arr_time,sched_arr_time,arr_delay,carrier,flight,tailnum,origin,dest,air_time,distance,hour,minute,time_hour
0,2013,1,1,517.0,515,2.0,830.0,819,11.0,UA,1545,N14228,EWR,IAH,227.0,1400,5,15,01-01-2013 05:00
1,2013,1,1,533.0,529,4.0,850.0,830,20.0,UA,1714,N24211,LGA,IAH,227.0,1416,5,29,01-01-2013 05:00
2,2013,1,1,542.0,540,2.0,923.0,850,33.0,AA,1141,N619AA,JFK,MIA,160.0,1089,5,40,01-01-2013 05:00
3,2013,1,1,544.0,545,-1.0,1004.0,1022,-18.0,B6,725,N804JB,JFK,BQN,183.0,1576,5,45,01-01-2013 05:00
4,2013,1,1,554.0,600,-6.0,812.0,837,-25.0,DL,461,N668DN,LGA,ATL,116.0,762,6,0,01-01-2013 06:00
5,2013,1,1,554.0,558,-4.0,740.0,728,12.0,UA,1696,N39463,EWR,ORD,150.0,719,5,58,01-01-2013 05:00
6,2013,1,1,555.0,600,-5.0,913.0,854,19.0,B6,507,N516JB,EWR,FLL,158.0,1065,6,0,01-01-2013 06:00
7,2013,1,1,557.0,600,-3.0,709.0,723,-14.0,EV,5708,N829AS,LGA,IAD,53.0,229,6,0,01-01-2013 06:00
8,2013,1,1,557.0,600,-3.0,838.0,846,-8.0,B6,79,N593JB,JFK,MCO,140.0,944,6,0,01-01-2013 06:00
9,2013,1,1,558.0,600,-2.0,753.0,745,8.0,AA,301,N3ALAA,LGA,ORD,138.0,733,6,0,01-01-2013 06:00
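
The `Unnamed: 0` column in the output above suggests that the CSV file was saved together with its row index (an assumption; the file itself is not shown here). If that is the case, passing `index_col=0` to `pd.read_csv` reuses that column as the index instead of keeping it as data. The sketch below uses two rows and a subset of columns copied from the output above as an in-memory stand-in for the real file.

```python
import io
import pandas as pd

# In-memory stand-in for flight_data.csv: two rows and a few columns
# copied from the output above. The empty first header field is what
# pandas reports as "Unnamed: 0" when the file is read without index_col.
sample_csv = io.StringIO(
    ",year,month,day,dep_time,dep_delay,origin,dest\n"
    "0,2013,1,1,517.0,2.0,EWR,IAH\n"
    "1,2013,1,1,533.0,4.0,LGA,IAH\n"
)

# index_col=0 tells pandas to use the first column as the row index.
sample = pd.read_csv(sample_csv, index_col=0)

# head() shows only the first rows, which is handy for a large table.
print(sample.head())
print(sample.shape)
```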


**Extract the flights that departed late**

In [4]:
print(type(flight_data))

<class 'pandas.core.frame.DataFrame'>


In [5]:
delayed_flight = flight_data[flight_data.dep_delay > 0]
print("Total number of flights delayed = " + str(len(delayed_flight)))
delayed_flight

Total number of flights delayed = 128432


Unnamed: 0,year,month,day,dep_time,sched_dep_time,dep_delay,arr_time,sched_arr_time,arr_delay,carrier,flight,tailnum,origin,dest,air_time,distance,hour,minute,time_hour
0,2013,1,1,517.0,515,2.0,830.0,819,11.0,UA,1545,N14228,EWR,IAH,227.0,1400,5,15,01-01-2013 05:00
1,2013,1,1,533.0,529,4.0,850.0,830,20.0,UA,1714,N24211,LGA,IAH,227.0,1416,5,29,01-01-2013 05:00
2,2013,1,1,542.0,540,2.0,923.0,850,33.0,AA,1141,N619AA,JFK,MIA,160.0,1089,5,40,01-01-2013 05:00
19,2013,1,1,601.0,600,1.0,844.0,850,-6.0,B6,343,N644JB,EWR,PBI,147.0,1023,6,0,01-01-2013 06:00
25,2013,1,1,608.0,600,8.0,807.0,735,32.0,MQ,3768,N9EAMQ,EWR,ORD,139.0,719,6,0,01-01-2013 06:00
26,2013,1,1,611.0,600,11.0,945.0,931,14.0,UA,303,N532UA,JFK,SFO,366.0,2586,6,0,01-01-2013 06:00
27,2013,1,1,613.0,610,3.0,925.0,921,4.0,B6,135,N635JB,JFK,RSW,175.0,1074,6,10,01-01-2013 06:00
31,2013,1,1,623.0,610,13.0,920.0,915,5.0,AA,1837,N3EMAA,LGA,MIA,153.0,1096,6,10,01-01-2013 06:00
41,2013,1,1,632.0,608,24.0,740.0,728,12.0,EV,4144,N13553,EWR,IAD,52.0,212,6,8,01-01-2013 06:00
47,2013,1,1,644.0,636,8.0,931.0,940,-9.0,UA,1701,N75435,EWR,FLL,151.0,1065,6,36,01-01-2013 06:00
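
The filter `dep_delay > 0` keeps only strictly positive delays. The `dep_delay` column is stored as floats, which may indicate missing values (for example flights with no recorded departure); that is an assumption, not something the output above confirms. The sketch below uses a small made-up table, not the real data, to show how a boolean mask treats each case: a missing value compares as `False` in both `> 0` and `<= 0`, so it lands in neither group.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from flight_data.csv.
toy = pd.DataFrame({
    "origin": ["EWR", "LGA", "JFK", "JFK", "LGA"],
    "dep_delay": [2.0, -6.0, 0.0, np.nan, 13.0],
})

late = toy[toy["dep_delay"] > 0]         # departed after the scheduled time
on_time = toy[toy["dep_delay"] <= 0]     # departed on or before schedule
unknown = toy[toy["dep_delay"].isna()]   # no departure delay recorded

print(len(late), len(on_time), len(unknown))
```

Checking `isna()` separately makes it explicit how many flights are left out of both the late and the on-time counts.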


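**Best airport in terms of on-time departure %**

The best airport is the one with the highest on-time departure percentage, where on-time departure % = (number of flights that departed from the airport on or before the scheduled time / total number of flights from the airport) x 100.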

In [9]:
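# Count the delayed departures (dep_delay > 0) for each origin airport.
airport_EWR = delayed_flight[delayed_flight.origin == 'EWR']
airport_JFK = delayed_flight[delayed_flight.origin == 'JFK']
airport_LGA = delayed_flight[delayed_flight.origin == 'LGA']
airport_count = pd.Series([len(airport_EWR), len(airport_JFK), len(airport_LGA)], index=['EWR', 'JFK', 'LGA'])
print(airport_count)
# Note: idxmin() picks the airport with the fewest delayed flights in absolute terms.
# The counts are not divided by each airport's total number of flights.
airport_count.idxmin()


EWR    52711
JFK    42031
LGA    33690
dtype: int64


'LGA'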
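
The on-time departure percentage described in this section divides by each airport's total number of flights, so a raw count of delayed flights is not enough on its own. One possible way to compute the percentage is a grouped mean of a boolean column. The sketch below is an illustration on made-up rows, not the notebook's method or data; `toy`, `departed_on_time`, `on_time_pct` and `best` are hypothetical names.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from flight_data.csv.
toy = pd.DataFrame({
    "origin": ["EWR", "EWR", "EWR", "EWR", "JFK", "JFK",
               "LGA", "LGA", "LGA", "LGA"],
    "dep_delay": [2.0, -1.0, 5.0, 11.0, -3.0, 8.0,
                  -6.0, 0.0, -2.0, 13.0],
})

# True where the flight left on or before its scheduled time.
departed_on_time = toy["dep_delay"] <= 0

# The mean of a boolean Series is the fraction of True values,
# so grouping by origin gives the on-time share per airport.
on_time_pct = departed_on_time.groupby(toy["origin"]).mean() * 100

print(on_time_pct)

# The best airport is the one with the highest percentage.
best = on_time_pct.idxmax()
print(best)
```

With this approach, rows whose `dep_delay` is missing count as not on time, because the comparison yields `False` for them. Whether that is the desired treatment depends on how such flights should be handled.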