So far we've learnt how to scrape the web and how to request information from an API. Some websites make working with APIs even easier: check out [RapidAPI](https://rapidapi.com/), which takes care of writing most of the code for you.

We will use the [AeroDataBox API](https://rapidapi.com/aedbx-aedbx/api/aerodatabox/), which can retrieve all sorts of information about flights and airports. We will show you how to retrieve information about airports, and then it's up to you to apply this, along with what you've already learnt this week, to **produce a function that retrieves tomorrow's flight information for the major airports in the cities you web scraped**.

In [158]:
import pandas as pd
#import requests

On the left-hand side of the AeroDataBox API page, you'll see a list of options for information that you can retrieve:
> - Flights API
> - Subscription / PUSH API
> - Airport API
> - Aircraft API
> - Healthcheck & Status API

1. We want to select `Airport API`

2. Then within Airport API we want to select `Search airports by location`

3. Now in the middle third you'll want to enter the `latitude` and `longitude` of any city to test. We chose Berlin: latitude 52.31, longitude 13.24. Next, we changed `radiusKM` to just 50 km. Finally, we set `withFlightInfoOnly` to true, so that only airports with flight data (scheduled or live) available are returned.

4. On the right-hand third of the screen you should see a block of code that looks pretty unfamiliar. This is because the code is probably set to *(Node.js) Axios* by default. However, we can change this to familiar Python: open the dropdown box at the top of the code and select `python > requests`.

Now you can copy the code to your notebook and it should look a little something like the cell below:

In [159]:
import requests

url = "https://aerodatabox.p.rapidapi.com/airports/search/location/52.31/13.24/km/50/10"

querystring = {"withFlightInfoOnly":"true"}

headers = {
	"X-RapidAPI-Key": "687292277emsh6620811a3972b04p1a4ee9jsn8c02f9bc139b",
	"X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
}

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)

{"searchBy":{"lat":52.31,"lon":13.24},"count":1,"items":[{"icao":"EDDB","iata":"BER","name":"Berlin Brandenburg","shortName":"Brandenburg","municipalityName":"Berlin","location":{"lat":52.35139,"lon":13.493889},"countryCode":"DE","timeZone":"Europe/Berlin"}]}
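The request above has the API key pasted straight into the code, which makes it easy to leak when a notebook is shared. A minimal sketch of one alternative, assuming the key has been stored in an environment variable. The variable name `RAPIDAPI_KEY` and the helper `build_headers` are our own choices for illustration, not something RapidAPI prescribes:

```python
import os

def build_headers(api_key=None):
    """Build the RapidAPI headers, reading the key from the environment if none is passed."""
    # RAPIDAPI_KEY is a hypothetical variable name; use whatever name you stored the key under
    if api_key is None:
        api_key = os.environ.get("RAPIDAPI_KEY")
    if not api_key:
        raise ValueError("No API key found: set RAPIDAPI_KEY or pass api_key explicitly")
    return {
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com",
    }
```

The returned dictionary can be passed as `headers=` to `requests.request` in place of the hard-coded one.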


Let's view the response with `.json()` instead of `.text` so that it's easier to read.

In [160]:
response.json()

{'searchBy': {'lat': 52.31, 'lon': 13.24},
 'count': 1,
 'items': [{'icao': 'EDDB',
   'iata': 'BER',
   'name': 'Berlin Brandenburg',
   'shortName': 'Brandenburg',
   'municipalityName': 'Berlin',
   'location': {'lat': 52.35139, 'lon': 13.493889},
   'countryCode': 'DE',
   'timeZone': 'Europe/Berlin'}]}

In [161]:
airport_resp = response.json()

In [162]:
airport_resp["items"][0]["icao"]

'EDDB'
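A response can list several airports, so indexing with `[0]` only gives the first one. A small sketch of collecting every ICAO code, using a hand-copied stand-in for the response above (`sample_resp` is our own name, so the example runs without calling the API):

```python
# Stand-in for response.json(), trimmed from the output shown above
sample_resp = {
    "searchBy": {"lat": 52.31, "lon": 13.24},
    "count": 1,
    "items": [
        {"icao": "EDDB", "iata": "BER", "name": "Berlin Brandenburg"},
    ],
}

# One ICAO code per airport in the response; .get() avoids a KeyError if "items" is missing
icao_codes = [airport["icao"] for airport in sample_resp.get("items", [])]
print(icao_codes)
```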

We can now turn this into a DataFrame using `pd.json_normalize()`.

In [163]:
pd.json_normalize(response.json()['items'])

Unnamed: 0,icao,iata,name,shortName,municipalityName,countryCode,timeZone,location.lat,location.lon
0,EDDB,BER,Berlin Brandenburg,Brandenburg,Berlin,DE,Europe/Berlin,52.35139,13.493889
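To see what `pd.json_normalize()` does with nested fields, here is a short sketch on a hand-copied item from the response above (`items` is our own stand-in for `response.json()['items']`). The nested `location` dictionary is flattened into dot-separated column names:

```python
import pandas as pd

# Trimmed copy of one airport from the response shown earlier
items = [
    {
        "icao": "EDDB",
        "iata": "BER",
        "name": "Berlin Brandenburg",
        "location": {"lat": 52.35139, "lon": 13.493889},
    }
]

airports_df = pd.json_normalize(items)

# The nested "location" dict becomes the columns "location.lat" and "location.lon"
print(airports_df.columns.tolist())
```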


Let's now use this for the latitudes and longitudes of multiple cities.

In [164]:
url = "https://aerodatabox.p.rapidapi.com/airports/search/location/52.31/13.24/km/50/10"

In [165]:
def icao_airport_codes(latitudes, longitudes):

  #assert len(latitudes) == len(longitudes)

  list_for_df = []

  for index, value in enumerate(latitudes):

    url = f"https://aerodatabox.p.rapidapi.com/airports/search/location/{value}/{longitudes[index]}/km/50/10"

    querystring = {"withFlightInfoOnly":"true"}

    headers = {
      "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com",
      "X-RapidAPI-Key": "687292277emsh6620811a3972b04p1a4ee9jsn8c02f9bc139b"
    }

    response = requests.request("GET", url, headers=headers, params=querystring)

    list_for_df.append(pd.json_normalize(response.json()['items']))

  return pd.concat(list_for_df, ignore_index=True)

In [166]:
list(enumerate(latitudes))

[(0, 52.52), (1, 48.8567), (2, 51.5072)]

In [167]:
# coordinates for Berlin, Paris, London
latitudes = [52.5200, 48.8567, 51.5072]
longitudes = [13.4050, 2.3522, -0.1275]

icao_airport_codes(latitudes, longitudes)

Unnamed: 0,icao,iata,name,shortName,municipalityName,countryCode,timeZone,location.lat,location.lon
0,EDDB,BER,Berlin Brandenburg,Brandenburg,Berlin,DE,Europe/Berlin,52.35139,13.493889
1,LFPB,LBG,Paris -Le Bourget,-Le Bourget,Paris,FR,Europe/Paris,48.9694,2.44139
2,LFPO,ORY,Paris -Orly,-Orly,Paris,FR,Europe/Paris,48.7253,2.35944
3,LFPG,CDG,Paris Charles de Gaulle,Charles de Gaulle,Paris,FR,Europe/Paris,49.0128,2.549999
4,EGLC,LCY,London City,City,London,GB,Europe/London,51.5053,0.055277
5,EGLL,LHR,London Heathrow,Heathrow,London,GB,Europe/London,51.4706,-0.461941
6,EGKR,KRH,Redhill Aerodrome,Aerodrome,Redhill,GB,Europe/London,51.2136,-0.138611
7,EGKK,LGW,London Gatwick,Gatwick,London,GB,Europe/London,51.1481,-0.190277
8,EGGW,LTN,London Luton,Luton,London,GB,Europe/London,51.8747,-0.368333
9,EGSS,STN,London Stansted,Stansted,London,GB,Europe/London,51.885,0.234999
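The function above walks through `latitudes` with `enumerate` and looks up the matching longitude by index. An alternative sketch pairs the two lists with `zip` and checks first that they have the same length. `build_search_urls` is a hypothetical helper of ours that only builds the URLs rather than sending requests, and we assume the trailing `10` in the URL is the maximum number of results:

```python
def build_search_urls(latitudes, longitudes, radius_km=50, limit=10):
    """Return one airport-search URL per (latitude, longitude) pair."""
    if len(latitudes) != len(longitudes):
        raise ValueError("latitudes and longitudes must have the same length")

    base = "https://aerodatabox.p.rapidapi.com/airports/search/location"
    # zip pairs the first latitude with the first longitude, and so on
    return [
        f"{base}/{lat}/{lon}/km/{radius_km}/{limit}"
        for lat, lon in zip(latitudes, longitudes)
    ]

urls = build_search_urls([52.5200, 48.8567], [13.4050, 2.3522])
print(urls[0])
```

Raising an error on mismatched lengths matters because `zip` would otherwise silently drop the extra coordinates.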


###### **Challenge:** Arrivals information
Using what you have been shown above, plus the skills you've learnt in the last couple of days:
1. In `AeroDataBox API` use the `Flights API` > `FIDS/Schedules: Airport departures and arrivals (by time range)` section
2. Fill out the parameters in the middle third and then copy the `python: requests` code from the right-hand third
3. Explore the data you get back. What would be useful in your DataFrame and what can be excluded? Remember, Gans wants to know when people are arriving in the city.
4. Make a DataFrame from the information you see as important
5. Condense everything you did above into a function that can take a list of ICAO codes as an input, and as an output gives you a DataFrame with the information for *tomorrow's arrivals*
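For step 5 you need "tomorrow" expressed as a time range. A minimal sketch of computing it with the standard library, assuming the endpoint takes local times formatted like `2024-11-05T00:00` and that a single request may be limited to a window of about 12 hours (check both against the parameters shown on the API page). `tomorrow_windows` is a hypothetical helper of ours:

```python
from datetime import date, datetime, time, timedelta

def tomorrow_windows(today=None, hours=12):
    """Split tomorrow into consecutive (start, end) windows of `hours` hours, as strings."""
    if 24 % hours != 0:
        raise ValueError("hours must divide 24 evenly")
    today = today or date.today()
    start = datetime.combine(today + timedelta(days=1), time.min)
    fmt = "%Y-%m-%dT%H:%M"  # assumed format; adjust to what the API page shows
    windows = []
    for i in range(24 // hours):
        begin = start + timedelta(hours=i * hours)
        # End one minute before the next window starts so the windows do not overlap
        end = begin + timedelta(hours=hours, minutes=-1)
        windows.append((begin.strftime(fmt), end.strftime(fmt)))
    return windows

print(tomorrow_windows(date(2024, 11, 4)))
```

Passing `today` explicitly keeps the helper predictable when testing; leaving it out uses the current date.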

In [171]:
# Departures and arrivals for one airport, here Toronto Pearson (IATA code YYZ)
url = "https://aerodatabox.p.rapidapi.com/flights/airports/iata/YYZ"

# Query parameters, passed via `params` instead of being hard-coded into the URL
querystring = {
    "offsetMinutes": "-120",
    "durationMinutes": "720",
    "withLeg": "true",
    "direction": "Both",
    "withCancelled": "true",
    "withCodeshared": "true",
    "withCargo": "true",
    "withPrivate": "true",
    "withLocation": "true"
}

headers = {
    'x-rapidapi-key': "687292277emsh6620811a3972b04p1a4ee9jsn8c02f9bc139b",
    'x-rapidapi-host': "aerodatabox.p.rapidapi.com"
}

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)

{"departures":[{"departure":{"scheduledTime":{"utc":"2024-11-05 13:10Z","local":"2024-11-05 08:10-05:00"},"revisedTime":{"utc":"2024-11-05 13:02Z","local":"2024-11-05 08:02-05:00"},"terminal":"1","gate":"D31","quality":["Basic","Live"]},"arrival":{"airport":{"icao":"CYOW","iata":"YOW","name":"Ottawa","timeZone":"America/Toronto"},"scheduledTime":{"utc":"2024-11-05 14:17Z","local":"2024-11-05 09:17-05:00"},"revisedTime":{"utc":"2024-11-05 14:17Z","local":"2024-11-05 09:17-05:00"},"terminal":"D","gate":"16","baggageBelt":"4","quality":["Basic","Live"]},"number":"AC 444","callSign":"ACA444","status":"Departed","codeshareStatus":"IsOperator","isCargo":false,"aircraft":{"reg":"C-FYKC","modeS":"C04067","model":"Airbus A319"},"airline":{"name":"Air Canada","iata":"AC","icao":"ACA"}},{"departure":{"scheduledTime":{"utc":"2024-11-05 13:00Z","local":"2024-11-05 08:00-05:00"},"revisedTime":{"utc":"2024-11-05 13:03Z","local":"2024-11-05 08:03-05:00"},"runwayTime":{"utc":"2024-11-05 13:17Z","local"

In [172]:
pd.json_normalize(response.json()['departures']).columns

Index(['number', 'callSign', 'status', 'codeshareStatus', 'isCargo',
       'departure.scheduledTime.utc', 'departure.scheduledTime.local',
       'departure.revisedTime.utc', 'departure.revisedTime.local',
       'departure.terminal', 'departure.gate', 'departure.quality',
       'arrival.airport.icao', 'arrival.airport.iata', 'arrival.airport.name',
       'arrival.airport.timeZone', 'arrival.scheduledTime.utc',
       'arrival.scheduledTime.local', 'arrival.revisedTime.utc',
       'arrival.revisedTime.local', 'arrival.terminal', 'arrival.gate',
       'arrival.baggageBelt', 'arrival.quality', 'aircraft.reg',
       'aircraft.modeS', 'aircraft.model', 'airline.name', 'airline.iata',
       'airline.icao', 'departure.runwayTime.utc',
       'departure.runwayTime.local', 'departure.runway',
       'arrival.runwayTime.utc', 'arrival.runwayTime.local', 'arrival.runway',
       'location.pressureAltitude.meter', 'location.pressureAltitude.km',
       'location.pressureAltitude.mile', 'lo

In [173]:
pd.json_normalize(response.json()['departures']).columns.tolist()

['number',
 'callSign',
 'status',
 'codeshareStatus',
 'isCargo',
 'departure.scheduledTime.utc',
 'departure.scheduledTime.local',
 'departure.revisedTime.utc',
 'departure.revisedTime.local',
 'departure.terminal',
 'departure.gate',
 'departure.quality',
 'arrival.airport.icao',
 'arrival.airport.iata',
 'arrival.airport.name',
 'arrival.airport.timeZone',
 'arrival.scheduledTime.utc',
 'arrival.scheduledTime.local',
 'arrival.revisedTime.utc',
 'arrival.revisedTime.local',
 'arrival.terminal',
 'arrival.gate',
 'arrival.baggageBelt',
 'arrival.quality',
 'aircraft.reg',
 'aircraft.modeS',
 'aircraft.model',
 'airline.name',
 'airline.iata',
 'airline.icao',
 'departure.runwayTime.utc',
 'departure.runwayTime.local',
 'departure.runway',
 'arrival.runwayTime.utc',
 'arrival.runwayTime.local',
 'arrival.runway',
 'location.pressureAltitude.meter',
 'location.pressureAltitude.km',
 'location.pressureAltitude.mile',
 'location.pressureAltitude.nm',
 'location.pressureAltitude.feet',
 

In [174]:
pd.json_normalize(response.json()['arrivals']).columns.tolist()

['number',
 'callSign',
 'status',
 'codeshareStatus',
 'isCargo',
 'departure.airport.icao',
 'departure.airport.iata',
 'departure.airport.name',
 'departure.airport.timeZone',
 'departure.scheduledTime.utc',
 'departure.scheduledTime.local',
 'departure.revisedTime.utc',
 'departure.revisedTime.local',
 'departure.quality',
 'arrival.scheduledTime.utc',
 'arrival.scheduledTime.local',
 'arrival.revisedTime.utc',
 'arrival.revisedTime.local',
 'arrival.terminal',
 'arrival.gate',
 'arrival.baggageBelt',
 'arrival.quality',
 'aircraft.reg',
 'aircraft.modeS',
 'aircraft.model',
 'airline.name',
 'airline.iata',
 'airline.icao',
 'arrival.runwayTime.utc',
 'arrival.runwayTime.local',
 'arrival.runway',
 'departure.runwayTime.utc',
 'departure.runwayTime.local',
 'departure.gate',
 'departure.runway',
 'departure.terminal',
 'location.pressureAltitude.meter',
 'location.pressureAltitude.km',
 'location.pressureAltitude.mile',
 'location.pressureAltitude.nm',
 'location.pressureAltitude.
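The two column lists above are not identical (for example, only the arrivals have `departure.airport.icao`), and a column can be missing entirely if no record contains that field. Selecting with `df[[...]]` raises a `KeyError` in that case. A sketch of a more forgiving selection with `reindex`, on two trimmed arrival records copied from values in this notebook (`records` and `wanted` are our own names):

```python
import pandas as pd

# Two trimmed arrival records; no record has an "arrival.terminal" field here
records = [
    {"number": "F8 622", "departure": {"airport": {"icao": "CYYC"}},
     "arrival": {"scheduledTime": {"utc": "2024-11-05 11:25Z"}}},
    {"number": "UA 8659", "departure": {"airport": {"icao": "CYYG"}},
     "arrival": {"scheduledTime": {"utc": "2024-11-05 12:01Z"}}},
]
df = pd.json_normalize(records)

wanted = ["number", "departure.airport.icao", "arrival.scheduledTime.utc", "arrival.terminal"]

# reindex keeps the requested columns in order and fills absent ones with NaN
selected = df.reindex(columns=wanted)
print(selected)
```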

In [179]:
departures_df = pd.json_normalize(response.json()['departures'])
arrivals_df = pd.json_normalize(response.json()['arrivals'])

# Keep only the columns of interest from each table
departures_df_selected = departures_df[['number', 'arrival.airport.icao']]
arrivals_df_selected = arrivals_df[['number', 'departure.airport.icao', 'arrival.scheduledTime.utc']]

# Inner join on 'number': keeps only flight numbers that appear in both departures and arrivals
merged_df = pd.merge(departures_df_selected, arrivals_df_selected, on='number', how='inner')

# Display the merged DataFrame
merged_df.head()


Unnamed: 0,number,arrival.airport.icao,departure.airport.icao,arrival.scheduledTime.utc
0,AA 4552,KJFK,KJFK,2024-11-05 14:50Z
1,QF 3113,KJFK,KJFK,2024-11-05 14:50Z
2,AA 4347,KLGA,KLGA,2024-11-05 15:14Z
3,AA 1720,KCLT,KCLT,2024-11-05 16:00Z
4,AA 4556,KLGA,KLGA,2024-11-05 20:00Z
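The `arrival.scheduledTime.utc` values are still plain strings. A short sketch of converting them to timezone-aware timestamps with `pd.to_datetime`, using the first rows of the merged table above as sample data (`sample` and the new column names are our own):

```python
import pandas as pd

sample = pd.DataFrame({
    "number": ["AA 4552", "AA 4347"],
    "arrival.scheduledTime.utc": ["2024-11-05 14:50Z", "2024-11-05 15:14Z"],
})

# utc=True yields timezone-aware timestamps in UTC
sample["arrival_time_utc"] = pd.to_datetime(sample["arrival.scheduledTime.utc"], utc=True)

# Convert to the queried airport's local time zone (America/Toronto for YYZ)
sample["arrival_time_local"] = sample["arrival_time_utc"].dt.tz_convert("America/Toronto")
print(sample.dtypes)
```

With real timestamps in place, filtering or sorting by arrival time no longer depends on string comparison.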


In [94]:
pd.json_normalize(response.json()['arrivals']).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 637 entries, 0 to 636
Data columns (total 56 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   number                               637 non-null    object 
 1   callSign                             140 non-null    object 
 2   status                               637 non-null    object 
 3   codeshareStatus                      637 non-null    object 
 4   isCargo                              637 non-null    bool   
 5   departure.airport.icao               637 non-null    object 
 6   departure.airport.iata               637 non-null    object 
 7   departure.airport.name               637 non-null    object 
 8   departure.airport.timeZone           637 non-null    object 
 9   departure.scheduledTime.utc          616 non-null    object 
 10  departure.scheduledTime.local        616 non-null    object 
 11  departure.revisedTime.utc       

In [95]:
pd.json_normalize(response.json()['departures']).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 728 entries, 0 to 727
Data columns (total 54 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   number                               728 non-null    object 
 1   status                               728 non-null    object 
 2   codeshareStatus                      728 non-null    object 
 3   isCargo                              728 non-null    bool   
 4   departure.scheduledTime.utc          728 non-null    object 
 5   departure.scheduledTime.local        728 non-null    object 
 6   departure.revisedTime.utc            724 non-null    object 
 7   departure.revisedTime.local          724 non-null    object 
 8   departure.terminal                   728 non-null    object 
 9   departure.gate                       722 non-null    object 
 10  departure.quality                    728 non-null    object 
 11  arrival.airport.icao            

In [96]:
import pandas as pd

# Function to extract relevant fields for the flights table
def extract_flight_records(data):
    records = []
    
    for i, record in enumerate(data):
        flight_record = {
            'flight_id': i + 1,  # Assuming incremental IDs; adjust if you have a different ID scheme
            'flight_num': record.get('number'),
            'departure_icao': record.get('departure', {}).get('airport', {}).get('icao'),
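            # The arrival records here carry no 'arrival.airport' entry (it is the queried airport), so this comes back as None
            'arrival_icao': record.get('arrival', {}).get('airport', {}).get('icao'),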
            'arrival_time': record.get('arrival', {}).get('scheduledTime', {}).get('utc')
        }
        records.append(flight_record)
    
    return pd.DataFrame(records)

# Assuming 'response' is the API response object
arrivals_data = response.json()['arrivals']
flights_df = extract_flight_records(arrivals_data)
print(flights_df)


     flight_id flight_num departure_icao arrival_icao       arrival_time
0            1     F8 622           CYYC         None  2024-11-05 11:25Z
1            2    UA 8659           CYYG         None  2024-11-05 12:01Z
2            3     AC 631           CYYG         None  2024-11-05 12:01Z
3            4    TS 7234           KLAS         None  2024-11-05 10:42Z
4            5     PD 656           KLAS         None  2024-11-05 10:42Z
..         ...        ...            ...          ...                ...
632        633     PD 162           CYOW         None  2024-11-05 23:40Z
633        634     AC 506           KORD         None  2024-11-05 23:41Z
634        635    UA 8308           KORD         None  2024-11-05 23:41Z
635        636    AC 1702           KLAS         None  2024-11-05 23:43Z
636        637    AC 8970           KSTL         None  2024-11-05 23:45Z

[637 rows x 5 columns]
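In the output above, `arrival_icao` is `None` for every row, since the arrival records do not seem to include the arrival airport (it is the airport that was queried). One way around this, sketched below, is to pass the queried airport's code in and write it to each row, which also makes it straightforward to stack the results for several airports. The function `arrivals_to_df` and the dictionary layout it expects are our own assumptions, and the sample records are trimmed copies of rows shown above rather than a live API call:

```python
import pandas as pd

def arrivals_to_df(arrivals_by_airport):
    """Build one DataFrame from {airport_icao: list of arrival records}."""
    columns = ["flight_num", "departure_icao", "arrival_icao", "arrival_time"]
    rows = []
    for airport_icao, records in arrivals_by_airport.items():
        for record in records:
            rows.append({
                "flight_num": record.get("number"),
                "departure_icao": record.get("departure", {}).get("airport", {}).get("icao"),
                # Taken from the query, since the record itself does not carry it
                "arrival_icao": airport_icao,
                "arrival_time": record.get("arrival", {}).get("scheduledTime", {}).get("utc"),
            })
    return pd.DataFrame(rows, columns=columns)

# CYYZ is the ICAO code for Toronto Pearson (IATA: YYZ)
sample = {
    "CYYZ": [
        {"number": "F8 622", "departure": {"airport": {"icao": "CYYC"}},
         "arrival": {"scheduledTime": {"utc": "2024-11-05 11:25Z"}}},
        {"number": "PD 162", "departure": {"airport": {"icao": "CYOW"}},
         "arrival": {"scheduledTime": {"utc": "2024-11-05 23:40Z"}}},
    ],
}
flights_df = arrivals_to_df(sample)
print(flights_df)
```

In a full solution, the dictionary would be filled by looping over the list of ICAO codes and storing each response's `'arrivals'` list under its code.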
