### Getting flights data for the cities
Gans wants to know when flights arrive in each city of interest so that it can predict e-scooter usage. To obtain flight arrival times for the day of interest, I use the AeroDataBox API ([API Documentation](https://rapidapi.com/aedbx-aedbx/api/aerodatabox/details)). This code generates two SQL tables:

1. **Airport ICAO codes**: The function *icao_airport_codes(latitudes, longitudes)* takes city coordinates and finds airports within 50 km of each city. Importantly, it stores the airport ICAO codes that are later used to obtain the flights data.  
2. **Flights data**: The function *get_arrival_data(icao_list)* takes airport ICAO codes and produces the flight data columns listed below. The data are fetched for the next day so that e-scooter traffic can be forecast. 
- *"arrival_airport_icao"* - Arrival airport's ICAO code
- *"departure_airport_icao"* - Departure airport's ICAO code
- *"departure_airport_name"* - Departure airport's name
- *"scheduled_arrival_time"* - Scheduled arrival time
- *"flight_number"* - Flight number
- *"data_retrieved_at"* - Timestamp of the data retrieval

#### Import libraries and cities df

In [1]:
import pandas as pd
import requests
import sqlalchemy

from datetime import datetime, timedelta
from pytz import timezone

import warnings
warnings.filterwarnings('ignore')

# Load the API key and the database password
from keys import AeroDatabox, MySQL_bootcamp

In [2]:
df_cities = pd.read_csv("data/df_sql_city_id.csv")
df_cities

Unnamed: 0,City,country_2c,latitude,longitude,is_capital,Country,Elevation (in m),Population,city_id,Elevation
0,Berlin,DE,52.5167,13.3833,True,Germany,34.0,3576873,1,34
1,Hamburg,DE,53.55,10.0,False,Germany,23.0,1945532,2,23
2,Munich,DE,48.1372,11.5755,False,Germany,520.0,1512491,3,520
3,Cologne,DE,50.9422,6.9578,False,Germany,37.0,1073096,4,37
4,Paris,FR,48.8566,2.3522,True,France,35.0,2102650,5,35
5,Nice,FR,43.7034,7.2663,False,France,10.0,348085,6,10
6,Rome,IT,41.8931,12.4828,True,Italy,21.0,2860009,7,21
7,Milan,IT,45.4669,9.19,False,Italy,120.0,1371498,8,120
8,Warsaw,PL,52.2167,21.0333,True,Poland,100.0,1863056,9,100
9,Barcelona,ES,41.3825,2.1769,False,Spain,12.0,1620343,10,12


#### Get airport icao codes for selected cities

In [3]:
# Create a function to get airports near each city
def icao_airport_codes(latitudes, longitudes):
  assert len(latitudes) == len(longitudes)
  url = "https://aerodatabox.p.rapidapi.com/airports/search/location"
  headers = {
    "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com",
    "X-RapidAPI-Key": AeroDatabox
  }
  list_for_df = []
  for lat, lon in zip(latitudes, longitudes):
    # Up to 5 airports with flight info within 50 km of the city
    querystring = {"lat":lat,"lon":lon,"radiusKm":"50","limit":"5","withFlightInfoOnly":"true"}
    response = requests.get(url, headers=headers, params=querystring)
    response.raise_for_status()  # stop early on a failed request
    list_for_df.append(pd.json_normalize(response.json()['items']))
  return pd.concat(list_for_df, ignore_index=True)

df_airports = icao_airport_codes(df_cities["latitude"], df_cities["longitude"])
df_airports

Unnamed: 0,icao,iata,name,shortName,municipalityName,countryCode,location.lat,location.lon
0,EDDB,BER,Berlin Brandenburg,Brandenburg,Berlin,DE,52.35139,13.493889
1,EDDH,HAM,Hamburg,Hamburg,Hamburg,DE,53.6304,9.988229
2,EDDM,MUC,Munich,Munich,Munich,DE,48.3538,11.7861
3,EDDK,CGN,Cologne Bonn,Bonn,Cologne,DE,50.8659,7.142739
4,EDDL,DUS,Duesseldorf Düsseldorf,Düsseldorf,Duesseldorf,DE,51.2895,6.766779
5,LFPB,LBG,Paris -Le Bourget,-Le Bourget,Paris,FR,48.9694,2.44139
6,LFPO,ORY,Paris -Orly,-Orly,Paris,FR,48.7253,2.35944
7,LFPG,CDG,Paris Charles de Gaulle,Charles de Gaulle,Paris,FR,49.0128,2.549999
8,LFMN,NCE,Nice -Côte d'Azur,-Côte d'Azur,Nice,FR,43.6584,7.215869
9,LIRA,CIA,Roma Ciampino–G. B. Pastine,Ciampino–G. B. Pastine,Roma,IT,41.7994,12.5949


Restrict the dataframe to the columns needed for the merge

In [4]:
# .copy() makes the later .loc/.iloc assignments modify an independent dataframe
df_airports = df_airports[["icao", "municipalityName"]].copy()

#### Merge airports dataframe with cities df
First, rename the municipalities so that they match the city names in df_cities

In [5]:
df_airports.loc[df_airports["municipalityName"]=="Newcastle upon Tyne", "municipalityName"]="Newcastle"
df_airports.loc[df_airports["municipalityName"]=="Seville", "municipalityName"]="Sevilla"
df_airports.iloc[27, df_airports.columns.get_loc('municipalityName')] = "The Hague"
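
Renaming by value rather than by row position stays correct if the API returns airports in a different order. A small sketch of that idea with `Series.replace` and a mapping; the two name pairs are taken from the cell above, while `df_demo` and its rows are made up for illustration.

```python
import pandas as pd

# Illustrative frame; the notebook's real df_airports has more rows.
df_demo = pd.DataFrame({
    "icao": ["EGNT", "LEZL", "EDDB"],
    "municipalityName": ["Newcastle upon Tyne", "Seville", "Berlin"],
})

# Map API spellings to the spellings used in df_cities.
name_fixes = {
    "Newcastle upon Tyne": "Newcastle",
    "Seville": "Sevilla",
}

df_demo["municipalityName"] = df_demo["municipalityName"].replace(name_fixes)
print(df_demo)
```

Names that are not in the mapping are left unchanged.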

Merge the two dataframes and keep only the needed columns

In [6]:
df_airports_merged = df_cities.merge(df_airports, how="left", left_on="City", right_on="municipalityName")
df_airports_merged = df_airports_merged[["city_id", "icao"]]
df_airports_merged

Unnamed: 0,city_id,icao
0,1,EDDB
1,2,EDDH
2,3,EDDM
3,4,EDDK
4,5,LFPB
5,5,LFPO
6,5,LFPG
7,6,LFMN
8,7,LIRF
9,8,LIML
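
A left merge keeps every city, so a city whose name does not match any `municipalityName` ends up with a missing `icao`. One way to spot such rows is the `indicator` option of `DataFrame.merge`. The sketch below uses two made-up rows; the `cities` and `airports` frames are stand-ins for `df_cities` and `df_airports`.

```python
import pandas as pd

# Stand-in data: "Rome" vs "Roma" will not match.
cities = pd.DataFrame({"city_id": [1, 2], "City": ["Berlin", "Rome"]})
airports = pd.DataFrame({
    "icao": ["EDDB", "LIRA"],
    "municipalityName": ["Berlin", "Roma"],
})

merged = cities.merge(
    airports,
    how="left",
    left_on="City",
    right_on="municipalityName",
    indicator=True,  # adds a _merge column: "both" or "left_only"
)

# Cities that found no airport and may need a rename first.
unmatched = merged.loc[merged["_merge"] == "left_only", "City"].tolist()
print(unmatched)  # ['Rome']
```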


#### Create airports table in SQL DB

In [7]:
# Build the SQLAlchemy connection string
schema = "gans_cities"
host = "127.0.0.1"
user = "root"
password = MySQL_bootcamp
port = 3306

connection_string = f'mysql+pymysql://{user}:{password}@{host}:{port}/{schema}'
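
If the password contains characters such as `@` or `/`, a plain f-string produces a URL that cannot be parsed correctly. A hedged sketch of one way around this is to percent-encode the password with `urllib.parse.quote_plus`; `build_connection_string` and the sample password are hypothetical.

```python
from urllib.parse import quote_plus

def build_connection_string(user, password, host, port, schema):
    """Build a MySQL/PyMySQL URL, percent-encoding the password."""
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{schema}"

# Made-up password containing characters that are special in URLs.
demo_url = build_connection_string("root", "p@ss/word", "127.0.0.1", 3306, "gans_cities")
print(demo_url)
# mysql+pymysql://root:p%40ss%2Fword@127.0.0.1:3306/gans_cities
```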

In [8]:

df_airports_merged.to_sql('airports',
                  if_exists='append',
                  con=connection_string,
                  index=False)

27

#### Save the airports df

In [9]:
df_airports_merged.to_csv("data/df_airports_merged.csv", sep=',', index=False, encoding='utf-8')

#### Get flights data

In [10]:
# Create a function to get arrival info at the selected airport
def get_arrival_data(icao_list):
  api_key = AeroDatabox

  berlin_timezone = timezone('Europe/Berlin')
  today = datetime.now(berlin_timezone).date()
  tomorrow = (today + timedelta(days=1))

  flight_items = []

  for icao in icao_list:
    # The API accepts time windows of at most 12 hours, so two calls cover a full day:
    # one for the morning and one for the afternoon.
    times = [["00:00","11:59"],
             ["12:00","23:59"]]

    for time in times:
      url = f"https://aerodatabox.p.rapidapi.com/flights/airports/icao/{icao}/{tomorrow}T{time[0]}/{tomorrow}T{time[1]}"

      querystring = {"withLeg":"true",
                    "direction":"Arrival",
                    "withCancelled":"false",
                    "withCodeshared":"true",
                    "withCargo":"false",
                    "withPrivate":"false"}

      headers = {
          'x-rapidapi-host': "aerodatabox.p.rapidapi.com",
          'x-rapidapi-key': api_key
          }

      response = requests.get(url, headers=headers, params=querystring)

      flights_json = response.json()

      retrieval_time = datetime.now(berlin_timezone).strftime("%Y-%m-%d %H:%M:%S")

      for item in flights_json["arrivals"]:
        flight_item = {
            "arrival_airport_icao": icao,
            "departure_airport_icao": item["departure"]["airport"].get("icao", None),
            "departure_airport_name": item["departure"]["airport"].get("name", None),
            "scheduled_arrival_time": item["arrival"]["scheduledTime"].get("local", None),
            "flight_number": item.get("number", None),
            "data_retrieved_at": retrieval_time
        }

        flight_items.append(flight_item)

  flights_df = pd.DataFrame(flight_items)
  # Drop the trailing 6-character UTC offset (e.g. "+01:00") to keep the local time
  flights_df["scheduled_arrival_time"] = flights_df["scheduled_arrival_time"].str[:-6]
  flights_df["scheduled_arrival_time"] = pd.to_datetime(flights_df["scheduled_arrival_time"])
  flights_df["data_retrieved_at"] = pd.to_datetime(flights_df["data_retrieved_at"])

  return flights_df
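
The two 12-hour windows used in the request URL can be checked without calling the API. The sketch below builds the same start and end strings for a fixed date; `arrival_windows` is a hypothetical helper that mirrors the `times` list in the function above.

```python
from datetime import date, timedelta

def arrival_windows(day):
    """Return two (start, end) strings that together cover `day`."""
    halves = [("00:00", "11:59"), ("12:00", "23:59")]
    # str(date) is ISO formatted, e.g. "2024-02-16".
    return [(f"{day}T{start}", f"{day}T{end}") for start, end in halves]

# Fixed date instead of datetime.now() so the result is reproducible.
tomorrow = date(2024, 2, 15) + timedelta(days=1)
windows = arrival_windows(tomorrow)
for start, end in windows:
    print(start, end)
# 2024-02-16T00:00 2024-02-16T11:59
# 2024-02-16T12:00 2024-02-16T23:59
```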

#### Get tomorrow's flight arrival data for London Heathrow airport

In [11]:
df_airport_london = df_airports.loc[df_airports["icao"]=="EGLL", "icao"]
# Specify the airport code ("EGLL") as a list
airport_codes = list(df_airport_london.values)
flights_df = get_arrival_data(airport_codes)
flights_df

Unnamed: 0,arrival_airport_icao,departure_airport_icao,departure_airport_name,scheduled_arrival_time,flight_number,data_retrieved_at
0,EGLL,FACT,Cape Town,2024-02-16 04:45:00,AA 7109,2024-02-15 13:58:49
1,EGLL,DNMM,Lagos,2024-02-16 04:50:00,AY 5904,2024-02-15 13:58:49
2,EGLL,FACT,Cape Town,2024-02-16 04:45:00,AY 5948,2024-02-15 13:58:49
3,EGLL,FACT,Cape Town,2024-02-16 04:45:00,BA 58,2024-02-15 13:58:49
4,EGLL,DNMM,Lagos,2024-02-16 04:50:00,BA 74,2024-02-15 13:58:49
...,...,...,...,...,...,...
2306,EGLL,LPPT,Lisbon,2024-02-16 22:40:00,TP 1366,2024-02-15 13:58:50
2307,EGLL,LPPT,Lisbon,2024-02-16 22:40:00,UA 6864,2024-02-15 13:58:50
2308,EGLL,LTFM,Istanbul,2024-02-16 22:15:00,UA 6916,2024-02-15 13:58:50
2309,EGLL,OTHH,Doha,2024-02-16 22:10:00,WB 1402,2024-02-15 13:58:50


In [12]:
flights_df["scheduled_arrival_time"].min()

Timestamp('2024-02-16 04:45:00')

In [13]:
flights_df["scheduled_arrival_time"].max()

Timestamp('2024-02-16 22:55:00')
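
These timestamps carry no timezone because `get_arrival_data` trims the last six characters of each local time before parsing. The sketch below shows that step on two sample strings; the input format with a trailing `+00:00` offset is an assumption about what the API returns, inferred from the `[:-6]` slice.

```python
import pandas as pd

# Assumed raw format: local time followed by a 6-character UTC offset.
raw = pd.Series(["2024-02-16 04:45+00:00", "2024-02-16 22:55+00:00"])

# Same two steps as in get_arrival_data: slice off the offset, then parse.
local = pd.to_datetime(raw.str[:-6])
print(local.min(), local.max())
```

Trimming the offset keeps the airport's local wall-clock time, which is the quantity of interest for local e-scooter demand.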

#### Read airports table from SQL DB to get airport_id

In [14]:
df_airports_sql = pd.read_sql("airports", con=connection_string)
df_airports_sql = df_airports_sql[['airport_id', 'icao']]
df_airports_sql

Unnamed: 0,airport_id,icao
0,1,EDDB
1,2,EDDH
2,3,EDDM
3,4,EDDK
4,5,LFPB
5,6,LFPO
6,7,LFPG
7,8,LFMN
8,9,LIRF
9,10,LIML


#### Merge flights with the airports table to get airport_id

In [15]:
flights_merged_df = flights_df.merge(df_airports_sql, how="left", left_on="arrival_airport_icao", right_on="icao")
flights_merged_df.columns

Index(['arrival_airport_icao', 'departure_airport_icao',
       'departure_airport_name', 'scheduled_arrival_time', 'flight_number',
       'data_retrieved_at', 'airport_id', 'icao'],
      dtype='object')

In [16]:
flights_merged_df = flights_merged_df.drop("icao", axis=1)
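
Because the airports table is written with `if_exists='append'`, re-running the notebook could leave duplicate ICAO rows in it, and the merge would then duplicate flights. One possible safeguard is the `validate` option of `DataFrame.merge`, sketched here on made-up frames; the `airport_id` value is illustrative.

```python
import pandas as pd

flights = pd.DataFrame({
    "arrival_airport_icao": ["EGLL", "EGLL"],
    "flight_number": ["BA 58", "BA 74"],
})
airports = pd.DataFrame({"airport_id": [1], "icao": ["EGLL"]})  # illustrative id

def merge_airport_id(flights_df, airports_df):
    """Left-merge airport_id; raise if an ICAO code appears twice in airports_df."""
    merged = flights_df.merge(
        airports_df,
        how="left",
        left_on="arrival_airport_icao",
        right_on="icao",
        validate="many_to_one",
    )
    return merged.drop(columns="icao")

merged = merge_airport_id(flights, airports)
print(merged)

# Duplicate airport rows trigger a MergeError instead of silently duplicating flights.
duplicated_airports = pd.concat([airports, airports], ignore_index=True)
try:
    merge_airport_id(flights, duplicated_airports)
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)  # True
```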

#### Save the flights df

In [17]:
flights_merged_df.to_csv("data/df_flight_arrivals.csv", sep=',', index=False, encoding='utf-8')

#### Create a flights table in SQL DB

In [18]:
flights_merged_df.to_sql('flights',
                  if_exists='append',
                  con=connection_string,
                  index=False)

2311