# N01: DATA ENGINEERING: ETL

## Notebook Description

This notebook marks the beginning of the `Airlines Delays Analysis` project and focuses on the ETL processes in Data Engineering: gathering data from API endpoints, transforming it, and uploading it into a database. The data will subsequently be processed and analyzed in the `Data Analytics` notebooks (4, 5, and 6).

For this project, a custom API service has been developed, available at: https://api-datalab.coderslab.com/api/. The service provides four key endpoints:

1. **airport**: Contains data about airports.
2. **weather**: Offers weather information recorded at an airport on a specific day.
3. **aircraft**: Contains data about aircraft.
4. **flight**: Provides daily departures from a specific airport.

To extract data from the API endpoints, a dedicated `token` has been generated. If you wish to download the data yourself, please contact the author of this repository via direct message.

Additionally, a file named `airports.csv` (located in the `data/` directory) is included to assist with fetching information requiring the `airportId` parameter.

Finally, for ease of use, a `readme.txt` file is included in the `data/` folder to help navigate column names in the downloaded data.

###
## Notebook Configuration

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Importing libraries</p>

In [1]:
import requests
import pandas as pd
import time
from datetime import datetime
import os
from project_dir import DIR_PATH

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>API connection parameters</p>

In [2]:
api_key = ("contact author to obtain the token")
authorization = {'authorization': api_key}
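To keep the token out of the notebook itself, it could be read from an environment variable instead. The sketch below is only an illustration: the variable name `AIRLINES_API_TOKEN` and the helper `build_auth_header` are hypothetical and not part of the original project.

```python
import os


def build_auth_header(env_var="AIRLINES_API_TOKEN"):
    """Return the request header for the API, reading the token from the environment.

    The environment variable name is an assumption made for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return {"authorization": token}
```

With the variable set, `authorization = build_auth_header()` would stand in for the two assignments above.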


<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Loading the 'airports.csv' file, whose airport IDs are needed to query the API endpoints</p>

In [3]:
airports = pd.read_csv(f"{DIR_PATH}{os.path.sep}data{os.path.sep}airports.csv")
airports.head()

Unnamed: 0,origin_airport_id
0,10874
1,11233
2,13360
3,15008
4,11638


<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Converting the airport IDs from 'airports.csv' into a list</p>

In [4]:
airports = pd.read_csv(f"{DIR_PATH}{os.path.sep}data{os.path.sep}airports.csv", header = 0)
airports_list = list(airports.origin_airport_id)

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Iterating through 'airports_list' to find the airports for which the API returns HTTP status code 200, then collecting only those airports in a new list, 'true_airports'</p>

In [5]:
%%time

url_airport = "https://api-datalab.coderslab.com/api/airport/{airportID}"
airports_list = list(airports.origin_airport_id)
true_airports = []

for airportID in airports_list:
    url = url_airport.replace("{airportID}", str(airportID))
    response = requests.get(url, headers = authorization)
    
    if response.status_code == 200:
        true_airports.append(airportID)

CPU times: user 16.8 s, sys: 1.83 s, total: 18.6 s
Wall time: 1min 18s
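The check above and the download in the next cell request every airport twice. As a rough sketch, both steps could be merged into one pass that keeps the JSON body whenever the status code is 200. The helper `fetch_airports` and its `get` parameter are hypothetical; `get` is expected to behave like `requests.get`, and is passed in so the function can be tried without network access.

```python
URL_AIRPORT = "https://api-datalab.coderslab.com/api/airport/{airportID}"


def fetch_airports(airport_ids, headers, get, timeout=30):
    """Request each airport once and keep the payload of every 200 response.

    `get` is any callable with the calling convention of `requests.get`
    (an assumption of this sketch, so that a stub can be used for testing).
    """
    found = {}
    for airport_id in airport_ids:
        url = URL_AIRPORT.format(airportID=airport_id)
        response = get(url, headers=headers, timeout=timeout)
        if response.status_code == 200:
            found[airport_id] = response.json()
    return found
```

In the notebook this might be called as `airport_df = fetch_airports(airports_list, authorization, requests.get)`, with `true_airports = list(airport_df)` if the list of valid IDs is still needed.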


<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Creating an 'airport_df' dictionary</p>

In [6]:
%%time

airport_df = {}

for airportID in true_airports:
    url = url_airport.replace("{airportID}", str(airportID))
    response = requests.get(url, headers = authorization)
    
    if response.status_code == 200:
        airport_data = response.json()
        airport_df[airportID] = airport_data
    
airport_df

CPU times: user 4.52 s, sys: 336 ms, total: 4.86 s
Wall time: 20.9 s


{11638: {'ORIGIN_AIRPORT_ID': 11638,
  'DISPLAY_AIRPORT_NAME': 'Fresno Air Terminal',
  'ORIGIN_CITY_NAME': 'Fresno, CA',
  'NAME': 'FRESNO YOSEMITE INTERNATIONAL, CA US'},
 13342: {'ORIGIN_AIRPORT_ID': 13342,
  'DISPLAY_AIRPORT_NAME': 'General Mitchell Field',
  'ORIGIN_CITY_NAME': 'Milwaukee, WI',
  'NAME': 'MILWAUKEE MITCHELL AIRPORT, WI US'},
 13244: {'ORIGIN_AIRPORT_ID': 13244,
  'DISPLAY_AIRPORT_NAME': 'Memphis International',
  'ORIGIN_CITY_NAME': 'Memphis, TN',
  'NAME': 'MEMPHIS INTERNATIONAL AIRPORT, TN US'},
 15096: {'ORIGIN_AIRPORT_ID': 15096,
  'DISPLAY_AIRPORT_NAME': 'Syracuse Hancock International',
  'ORIGIN_CITY_NAME': 'Syracuse, NY',
  'NAME': 'SYRACUSE HANCOCK INTERNATIONAL AIRPORT, NY US'},
 10397: {'ORIGIN_AIRPORT_ID': 10397,
  'DISPLAY_AIRPORT_NAME': 'Atlanta Municipal',
  'ORIGIN_CITY_NAME': 'Atlanta, GA',
  'NAME': 'ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPORT, GA US'},
 10529: {'ORIGIN_AIRPORT_ID': 10529,
  'DISPLAY_AIRPORT_NAME': 'Bradley International',
 

###
## Extracting `Airport` Data From an API Endpoint

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Testing connection</p>

In [7]:
url_airport = "https://api-datalab.coderslab.com/api/airport/{airportID}"
response = requests.get(url_airport, headers = authorization)
response.status_code

400

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Generating the DataFrame 'airport_df' using data from the 'airport_df' dictionary</p>

In [8]:
airport_df = pd.DataFrame.from_records(airport_df)
airport_df = airport_df.transpose() 
airport_df

Unnamed: 0,ORIGIN_AIRPORT_ID,DISPLAY_AIRPORT_NAME,ORIGIN_CITY_NAME,NAME
10140,10140,Albuquerque International Sunport,"Albuquerque, NM","ALBUQUERQUE INTERNATIONAL AIRPORT, NM US"
10257,10257,Albany International,"Albany, NY","ALBANY INTERNATIONAL AIRPORT, NY US"
10299,10299,Anchorage International,"Anchorage, AK","ANCHORAGE TED STEVENS INTERNATIONAL AIRPORT, A..."
10397,10397,Atlanta Municipal,"Atlanta, GA",ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO...
10423,10423,Austin - Bergstrom International,"Austin, TX","AUSTIN BERGSTROM INTERNATIONAL AIRPORT, TX US"
...,...,...,...,...
15304,15304,Tampa International,"Tampa, FL","TAMPA INTERNATIONAL AIRPORT, FL US"
15370,15370,Tulsa International,"Tulsa, OK","OKLAHOMA CITY WILL ROGERS WORLD AIRPORT, OK US"
15376,15376,Tucson International,"Tucson, AZ","PHOENIX AIRPORT, AZ US"
15412,15412,McGhee Tyson,"Knoxville, TN","KNOXVILLE AIRPORT, TN US"


<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Saving DataFrame 'airport_df' to 'airport_list.csv'</p>

In [9]:
airport_df.to_csv(f"{DIR_PATH}{os.path.sep}data{os.path.sep}raw{os.path.sep}airport_list.csv", index=False)
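The output paths in this notebook are assembled with `os.path.sep` inside f-strings. A possible alternative, shown here only as a sketch, is `pathlib`, which also makes it easy to create the `data/raw` folder if it is missing. The helper name `raw_data_path` is hypothetical.

```python
from pathlib import Path


def raw_data_path(base_dir, filename):
    """Build <base_dir>/data/raw/<filename>, creating the folder if needed."""
    raw_dir = Path(base_dir) / "data" / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)
    return raw_dir / filename
```

The save above would then read `airport_df.to_csv(raw_data_path(DIR_PATH, "airport_list.csv"), index=False)`.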

###
## Extracting `Weather` Data From an API Endpoint

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Testing connection</p>

In [10]:
url_weather = "https://api-datalab.coderslab.com/api/airportWeather?date={value}"
response = requests.get(url_weather, headers = authorization)
response.status_code

200

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Given the volume of data, the weather is downloaded month by month. First, a list of months covering January 2019 to March 2020 is created:</p>

In [11]:
start_date = datetime(2019, 1, 1)
num_months = 15

dates = []

for i in range(num_months):
    dates.append(start_date.strftime("%Y-%m"))
    if start_date.month == 12:
        start_date = start_date.replace(year=start_date.year + 1, month=1)
    else:
        start_date = start_date.replace(month=start_date.month + 1)

dates

['2019-01',
 '2019-02',
 '2019-03',
 '2019-04',
 '2019-05',
 '2019-06',
 '2019-07',
 '2019-08',
 '2019-09',
 '2019-10',
 '2019-11',
 '2019-12',
 '2020-01',
 '2020-02',
 '2020-03']
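The same list of month labels can also be produced with plain integer arithmetic, which avoids the special case for December. This is a minimal sketch; the function name `month_labels` is hypothetical.

```python
from datetime import date


def month_labels(start_year, start_month, count):
    """Return `count` consecutive 'YYYY-MM' labels starting at the given month."""
    labels = []
    for offset in range(count):
        # Count months from zero so that divmod splits them into whole years and a month index.
        extra_years, month_index = divmod(start_month - 1 + offset, 12)
        labels.append(date(start_year + extra_years, month_index + 1, 1).strftime("%Y-%m"))
    return labels
```

Calling `month_labels(2019, 1, 15)` should give the same fifteen values as the cell above.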

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Next, the data for each month is requested from the API endpoint in JSON format, with the current month printed to monitor progress. Finally, the monthly results are combined into the 'airport_weather_df' DataFrame</p>

In [12]:
%%time

weather_frames = []
for date in dates:
    url = url_weather.replace("{value}", date)
    response = requests.get(url, headers = authorization)
    print(date)  # progress indicator
    weather_frames.append(pd.json_normalize(response.json()))

# Concatenate once at the end instead of growing the DataFrame on every iteration
airport_weather_df = pd.concat(weather_frames, ignore_index=True)

2019-01
2019-02
2019-03
2019-04
2019-05
2019-06
2019-07
2019-08
2019-09
2019-10
2019-11
2019-12
2020-01
2020-02
2020-03
CPU times: user 1.25 s, sys: 127 ms, total: 1.38 s
Wall time: 14 s


In [13]:
airport_weather_df

Unnamed: 0,WT18,STATION,NAME,DATE,AWND,PRCP,SNOW,SNWD,TAVG,TMAX,...,PGTM,WT10,WESD,SN32,SX32,PSUN,TSUN,TOBS,WT07,WT11
0,,USW00013874,ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO...,2019-01-01,4.70,0.14,0.0,0.0,64.0,66.0,...,,,,,,,,,,
1,,USW00013874,ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO...,2019-01-02,4.92,0.57,0.0,0.0,56.0,59.0,...,,,,,,,,,,
2,,USW00013874,ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO...,2019-01-03,5.37,0.15,0.0,0.0,52.0,55.0,...,,,,,,,,,,
3,,USW00013874,ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO...,2019-01-04,12.08,1.44,0.0,0.0,56.0,66.0,...,,,,,,,,,,
4,,USW00013874,ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO...,2019-01-05,13.42,0.00,0.0,0.0,49.0,59.0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
46221,,USW00014762,"PITTSBURGH ALLEGHENY CO AIRPORT, PA US",2020-03-27,3.58,0.21,,,,59.0,...,146.0,,,,,,,,,
46222,,USW00014762,"PITTSBURGH ALLEGHENY CO AIRPORT, PA US",2020-03-28,6.93,1.29,,,,77.0,...,1535.0,,,,,,,,,
46223,,USW00014762,"PITTSBURGH ALLEGHENY CO AIRPORT, PA US",2020-03-29,16.55,0.02,,,,78.0,...,1408.0,,,,,,,,,
46224,,USW00014762,"PITTSBURGH ALLEGHENY CO AIRPORT, PA US",2020-03-30,13.42,0.00,,,,57.0,...,817.0,,,,,,,,,


<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Saving DataFrame 'airport_weather_df' to 'airport_weather.csv'</p>

In [14]:
airport_weather_df.to_csv(f"{DIR_PATH}{os.path.sep}data{os.path.sep}raw{os.path.sep}airport_weather.csv", index=False)

###
## Extracting `Aircraft` Data From an API Endpoint

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Testing connection</p>

In [15]:
url_aircraft = "https://api-datalab.coderslab.com/api/Aircraft"
response = requests.get(url_aircraft, headers = authorization)
response.status_code

200

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Extracting the data and saving it into the 'aircraft_df' DataFrame</p>

In [16]:
# The endpoint returns all records in one response, so a single normalization is enough
aircraft_df = pd.json_normalize(response.json())

aircraft_df

Unnamed: 0,MANUFACTURE_YEAR,TAIL_NUM,NUMBER_OF_SEATS
0,1944,N54514,0.0
1,1945,N1651M,0.0
2,1953,N100CE,0.0
3,1953,N141FL,0.0
4,1953,N151FL,0.0
...,...,...,...
7378,2019,N14011,337.0
7379,2019,N16008,337.0
7380,2019,N16009,337.0
7381,2019,N2250U,276.0


<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Saving DataFrame 'aircraft_df' to 'aircraft.csv'</p>

In [17]:
aircraft_df.to_csv(f"{DIR_PATH}{os.path.sep}data{os.path.sep}raw{os.path.sep}aircraft.csv", index=False)

###
## Extracting `Flight` Data From an API Endpoint

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Testing connection</p>

In [18]:
url_flight = "https://api-datalab.coderslab.com/api/flight?airportId={value_1}&date={value_2}"
response = requests.get(url_flight, headers = authorization)
response.status_code

400
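The request above sends the URL template with its placeholders unfilled, which is presumably why the API answers with 400. As an illustrative sketch, the query string can be built from actual values with the standard library; the helper `flight_url` is hypothetical, and the parameter names are taken from the template.

```python
from urllib.parse import urlencode

BASE_FLIGHT_URL = "https://api-datalab.coderslab.com/api/flight"


def flight_url(airport_id, month):
    """Build the flight endpoint URL for one airport and one 'YYYY-MM' month."""
    query = urlencode({"airportId": airport_id, "date": month})
    return f"{BASE_FLIGHT_URL}?{query}"
```

A connection test with real values could then use, for example, `requests.get(flight_url(10874, "2019-01"), headers=authorization)`.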

<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Loading the list of airports, necessary for correct data extraction from the API endpoint</p>

In [19]:
airports = pd.read_csv(f"{DIR_PATH}{os.path.sep}data{os.path.sep}airports.csv", header = 0)
airports_list = list(airports.origin_airport_id)
airports_list

[10874,
 11233,
 13360,
 15008,
 11638,
 14150,
 15323,
 14814,
 12007,
 11337,
 13342,
 15070,
 13244,
 12280,
 15096,
 11641,
 13832,
 10268,
 10397,
 15041,
 10529,
 12119,
 11537,
 11092,
 10581,
 13829,
 15389,
 10140,
 12389,
 11648,
 15023,
 11982,
 10967,
 11525,
 10792,
 14259,
 11637,
 10466,
 10599,
 10208,
 15841,
 14831,
 12898,
 13241,
 13367,
 11481,
 14108,
 13931,
 13873,
 10157,
 10245,
 11146,
 13277,
 11292,
 11109,
 13459,
 11775,
 16218,
 14698,
 14252,
 13256,
 13139,
 12250,
 11259,
 11468,
 14952,
 12402,
 14574,
 11996,
 11977,
 11867,
 11203,
 11995,
 15016,
 10747,
 14905,
 12012,
 14783,
 14730,
 10431,
 10434,
 16869,
 10408,
 12264,
 11618,
 15304,
 13577,
 12954,
 11624,
 13541,
 13422,
 14057,
 13232,
 10800,
 14689,
 12391,
 10868,
 14711,
 10257,
 11067,
 10562,
 11695,
 13796,
 14109,
 13970,
 14193,
 11076,
 14092,
 11122,
 11288,
 11308,
 10754,
 12884,
 15376,
 14588,
 11884,
 12915,
 13851,
 14843,
 11603,
 14457,
 12206,
 11252,
 11905,
 15412,


<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Extracting data from the 'flight' API endpoint, one request per airport and month. Given the large volume of data, two safety measures were added to the code: a pause of 60/500 = 0.12 seconds after each call, which keeps the request rate below roughly 500 API calls per minute, and a 'try/except' block that skips responses that cannot be parsed as JSON</p>

In [45]:
%%time

url_flight = "https://api-datalab.coderslab.com/api/flight?airportId={value_1}&date={value_2}"
limit = 500              # maximum number of API calls per minute
interval = 60 / limit    # pause after each call, in seconds (0.12 s)

flight_frames = []

for airportID in airports_list:
    for date in dates:
        url = url_flight.replace("{value_1}", str(airportID)).replace("{value_2}", date)
        response = requests.get(url, headers=authorization)
        time.sleep(interval)

        print(date)  # progress indicator

        try:
            flight_frames.append(pd.json_normalize(response.json()))
        except ValueError as e:
            # JSONDecodeError is a subclass of ValueError; raised when the body is not valid JSON
            print(f"Skipping record - JSONDecodeError: {e}")
            continue

# Concatenate once at the end instead of growing the DataFrame on every iteration
flight_df = pd.concat(flight_frames, ignore_index=True)

2019-01
2019-02
2019-03
2019-04
2019-05
2019-06
2019-07
2019-08
2019-09
2019-10
2019-11
2019-12
2020-01
2020-02
2020-03
...
2019-09
Skipping record - JSONDecodeError: Expecting value: line 1 column 1 (char 0)
2019-10
...

In [46]:
flight_df

Unnamed: 0,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,OP_UNIQUE_CARRIER,TAIL_NUM,OP_CARRIER_FL_NUM,ORIGIN_AIRPORT_ID,DEST_AIRPORT_ID,CRS_DEP_TIME,DEP_TIME,...,CRS_ELAPSED_TIME,ACTUAL_ELAPSED_TIME,DISTANCE,DISTANCE_GROUP,YEAR,CARRIER_DELAY,WEATHER_DELAY,NAS_DELAY,SECURITY_DELAY,LATE_AIRCRAFT_DELAY
0,1,1,2,9E,N931XJ,3290,10874,10397,600,557.0,...,129.0,100.0,528,3,2019,,,,,
1,1,1,2,OH,N723PS,5495,10874,11057,704,723.0,...,115.0,82.0,394,2,2019,,,,,
2,1,1,2,OH,N525EA,5416,10874,11057,1944,1942.0,...,101.0,96.0,394,2,2019,,,,,
3,1,1,2,OH,N706PS,5426,10874,11057,1521,1518.0,...,103.0,93.0,394,2,2019,,,,,
4,1,1,2,OH,N262PS,5440,10874,14100,756,800.0,...,93.0,74.0,335,2,2019,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9251875,3,30,1,MQ,N240NN,3535,12511,11298,620,612.0,...,83.0,93.0,327,2,2020,,,,,
9251876,3,30,1,MQ,N673AE,3744,12511,13930,1410,1400.0,...,106.0,79.0,484,2,2020,,,,,
9251877,3,31,2,MQ,,3979,12511,13930,540,,...,106.0,,484,2,2020,,,,,
9251878,3,31,2,MQ,,4160,12511,11298,1616,,...,87.0,,327,2,2020,,,,,
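The download log reports several airport and month combinations that were skipped because the response body was not valid JSON. The notebook does not say whether these failures are transient; if they are, retrying with an increasing pause might recover some of them. The sketch below illustrates the idea under that assumption. The helper `fetch_with_retry` and its parameters are hypothetical.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds or `attempts` calls have failed.

    A ValueError (which includes JSON decoding errors) triggers a retry after
    base_delay, 2 * base_delay, 4 * base_delay, ... seconds. The last failure
    is re-raised so the caller can decide to skip the record.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ValueError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Inside the download loop, the parsing step could be wrapped as `fetch_with_retry(lambda: requests.get(url, headers=authorization).json())`, keeping the existing `except ValueError` branch for records that still fail.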


<p style='background-color: #FFFFE0; margin-top:20px; padding:5px 15px; font-weight: 500'>Saving DataFrame 'flight_df' to 'flight.csv'</p>

In [48]:
flight_df.to_csv(f"{DIR_PATH}{os.path.sep}data{os.path.sep}raw{os.path.sep}flight.csv", index=False)