# Open-Meteo and AeroDataBox API Integration

This section explores integrating the Open-Meteo and AeroDataBox APIs to retrieve weather and flight data. The approach I used is as follows:
* Create the `aeromarket-api` and `openmeteo-api` packages to make each API easier to work with.
* Use environment variables to store API keys and URLs outside the source code.
* Implement classes and methods to fetch and process data from both APIs.

In [1]:
# Reload edited modules automatically before each cell runs
%load_ext autoreload
%autoreload 2

In [2]:
from openmeteoapi.WeatherData import Weather
from openmeteoapi.APICaller import OpenMeteoAPICaller
from aeroapi_market.APICaller import APICaller
from aeroapi_market.Flights import Flights
import pandas as pd
import os
from dotenv import load_dotenv

As a reminder, our goal is to get a dataset with the following format:

# Weather Data Retrieval using the Open-Meteo API

To retrieve weather data, I created a `Weather` class in the `openmeteo-api` package. This class uses an `OpenMeteoAPICaller` to make requests to the Open-Meteo API and fetch weather data for a given location (latitude and longitude), start date, end date, and set of hourly variables. These weather features are what the model will use to relate flight delays to weather conditions.


I defined the desired weather variables inside the `Weather` class for readability. Alternatively, you can specify the variables directly when calling the methods. The desired weather variables are:
- Temperature at 2 meters (°C)
- Precipitation (mm)
- Wind speed at 10 meters (km/h)
- Relative humidity at 2 meters (%)
- Rain (mm)
- Wind gusts at 10 meters (km/h)
- Cloud cover (%)
- Low cloud cover (%)
- Apparent temperature (°C)
- Surface pressure (hPa)
- Pressure at mean sea level (hPa)
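As a rough illustration of how the list above maps onto a request, the sketch below builds the query parameters for Open-Meteo's historical archive endpoint. The variable identifiers match the column names in the outputs shown in this section. The helper name `build_hourly_params` and the approximate JFK coordinates are assumptions made for illustration; they are not taken from the `openmeteo-api` package.

```python
from urllib.parse import urlencode

# Open-Meteo hourly variable identifiers for the features listed above.
HOURLY_VARIABLES = [
    "temperature_2m",
    "precipitation",
    "wind_speed_10m",
    "relative_humidity_2m",
    "rain",
    "wind_gusts_10m",
    "cloud_cover",
    "cloud_cover_low",
    "apparent_temperature",
    "surface_pressure",
    "pressure_msl",
]

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_hourly_params(latitude, longitude, start_date, end_date):
    """Hypothetical helper: assemble query parameters for an hourly request."""
    return {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": ",".join(HOURLY_VARIABLES),
        "timezone": "auto",  # ask for timestamps in the location's local time
    }


# Approximate coordinates for JFK (assumed for the example).
params = build_hourly_params(40.6413, -73.7781, "2024-12-11", "2024-12-25")
request_url = f"{ARCHIVE_URL}?{urlencode(params)}"
print(request_url)
```

Keeping the variable list in one place means the request and the resulting DataFrame columns stay in sync when a feature is added or removed.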

To retrieve weather data, we create an instance of the `Weather` class and call its `to_hourly_dataframe` method (or `to_daily_dataframe`) to get the requested variables as a pandas DataFrame:

In [None]:
api_caller_weather = OpenMeteoAPICaller()
weather_data = Weather(
    api_caller=api_caller_weather,
    airport_code="KJFK",
    start_date="2024-12-11",
    end_date="2024-12-25",
    code_type="ICAO",
)
pd.reset_option('display.max_columns')
pd.reset_option('display.max_rows')
weather_df = weather_data.to_hourly_dataframe()
weather_df

Similarly, we can retrieve weather data for multiple locations by passing a list of airport codes:

In [66]:
airports = ["JFK", "LAX", "LHR", "CDG", "HND", "DXB", "SIN", "SYD", "FRA", "AMS"]
api_caller_weather = OpenMeteoAPICaller()
weather = Weather(
    api_caller=api_caller_weather,
    airport_code=airports,
    start_date="2024-12-11",
    end_date="2024-12-25",
    code_type="IATA",
)
weather_df = weather.to_hourly_dataframe()
weather_df

Unnamed: 0,date,snowfall,rain,precipitation,wind_speed_10m,wind_gusts_10m,cloud_cover_low,cloud_cover,temperature_2m,apparent_temperature,surface_pressure,relative_humidity_2m,pressure_msl,queried_airport_code
0,2024-12-11 00:00:00-05:00,0.0,0.1,0.1,16.201000,33.480000,100.0,100.0,11.75,9.913984,1013.135620,99.670563,1013.500000,JFK
1,2024-12-11 01:00:00-05:00,0.0,0.7,0.7,15.865546,33.839996,100.0,100.0,11.55,9.700219,1012.035706,99.670036,1012.400024,JFK
2,2024-12-11 02:00:00-05:00,0.0,0.7,0.7,19.937794,39.959999,73.0,100.0,12.00,9.650640,1011.436462,98.690514,1011.799988,JFK
3,2024-12-11 03:00:00-05:00,0.0,1.2,1.2,19.408306,39.239998,0.0,100.0,12.85,10.625551,1011.337463,94.262718,1011.700012,JFK
4,2024-12-11 04:00:00-05:00,0.0,0.3,0.3,14.458382,38.880001,0.0,100.0,13.20,11.493306,1010.438416,88.260132,1010.799988,JFK
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3595,2024-12-25 19:00:00+01:00,0.0,0.0,0.0,8.131936,16.559999,100.0,100.0,8.85,7.270769,1034.048218,97.661545,1034.800049,AMS
3596,2024-12-25 20:00:00+01:00,0.0,0.0,0.0,7.100310,15.119999,100.0,100.0,8.75,7.295176,1034.447510,97.659706,1035.199951,AMS
3597,2024-12-25 21:00:00+01:00,0.0,0.0,0.0,6.766180,13.320000,100.0,100.0,8.65,7.230936,1034.647095,97.989471,1035.400024,AMS
3598,2024-12-25 22:00:00+01:00,0.0,0.0,0.0,7.329338,13.679999,100.0,100.0,8.70,7.211770,1034.647461,97.990250,1035.400024,AMS


# AeroDataBox Flight Data Retrieval

To retrieve flight data, I created a `Flights` class in the `aeromarket-api` package. This class uses an `APICaller` to make requests to the AeroDataBox API and fetch flight data for a given flight number, date, and airport code. The model will use these data to predict flight delays.

The first step is to set up the API key and base URL for the AeroDataBox API. Both are read from environment variables so that credentials stay out of the notebook:

In [4]:
load_dotenv()
API_KEY = os.getenv("AERODATABOX_API_KEY")
BASE_URL = os.getenv("AERODATABOX_BASE_URL")
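`os.getenv` returns `None` when a variable is not set, so a missing key would only surface later as a failed request. A minimal sketch of failing early instead is shown below. The helper `require_env` is hypothetical, and the demo values are placeholders so the snippet runs on its own.

```python
import os


def require_env(name):
    """Hypothetical helper: return an environment variable or raise if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Placeholder values for the demo only; real values would come from the .env file.
os.environ.setdefault("AERODATABOX_API_KEY", "demo-key")
os.environ.setdefault("AERODATABOX_BASE_URL", "https://example.invalid")

API_KEY = require_env("AERODATABOX_API_KEY")
BASE_URL = require_env("AERODATABOX_BASE_URL")
```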

In [8]:
api_caller = APICaller(BASE_URL, API_KEY)
flights = Flights(api_caller, 
    from_local="2025-10-26",
    flight_number="DL 27",
    )

As shown below, the `flights.get_airport_flights` method retrieves all departures and arrivals for a specific airport on a given date. It returns the data as JSON (a nested dictionary) that can be processed or analyzed further. The output below is truncated.

In [44]:
flights.get_airport_flights(code_type="iata")

{'departures': [{'movement': {'airport': {'icao': 'RKSI',
     'iata': 'ICN',
     'name': 'Seoul',
     'timeZone': 'Asia/Seoul'},
    'scheduledTime': {'utc': '2025-10-26 03:50Z',
     'local': '2025-10-25 23:50-04:00'},
    'revisedTime': {'utc': '2025-10-26 04:07Z',
     'local': '2025-10-26 00:07-04:00'},
    'runwayTime': {'utc': '2025-10-26 04:07Z',
     'local': '2025-10-26 00:07-04:00'},
    'terminal': 'I',
    'quality': ['Basic', 'Live']},
   'number': 'DL 27',
   'callSign': 'DAL27',
   'status': 'Departed',
   'codeshareStatus': 'IsOperator',
   'isCargo': False,
   'aircraft': {'reg': 'N527DN',
    'modeS': 'A6A347',
    'model': 'Airbus A350-900'},
   'airline': {'name': 'Delta Air Lines', 'iata': 'DL', 'icao': 'DAL'}},
  {'movement': {'airport': {'icao': 'KBOS',
     'iata': 'BOS',
     'name': 'Boston',
     'timeZone': 'America/New_York'},
    'scheduledTime': {'utc': '2025-10-26 02:35Z',
     'local': '2025-10-25 22:35-04:00'},
    'revisedTime': {'utc': '2025-10-26

The `flights.get_airport_flights_df` method formats the retrieved data into a pandas DataFrame for easier analysis and manipulation.

In [45]:
pd.set_option("display.max_columns", None)  # Show all columns in the DataFrame
pd.set_option("display.max_rows", None)  # Show all rows in the DataFrame
flights_df = flights.get_airport_flights_df()
display(flights_df)

Unnamed: 0,direction,flight_number,callsign,status,codeshare_status,airline,airline_iata,aircraft_model,aircraft_reg,airport_icao,airport_iata,airport_name,timezone,scheduled_utc,scheduled_local,actual_utc,actual_local,terminal,runway,is_cargo,quality,date,queried_airport_iata
0,Arrival,F9 1241,FFT1241,Arrived,IsOperator,Frontier,F9,Airbus A321,N717FR,KMCO,MCO,Orlando,America/New_York,2025-10-26 04:18:00+00:00,2025-10-26 00:18:00-04:00,2025-10-26 04:01:00+00:00,2025-10-26 00:01:00-04:00,N,09R,False,"Basic, Live",2025-10-26 00:00:00-04:00,ATL
1,Arrival,DL 748,DAL9964,Arrived,IsOperator,Delta Air Lines,DL,Airbus A321 NEO,N534DT,KLAS,LAS,Las Vegas,America/Los_Angeles,2025-10-25 23:27:00+00:00,2025-10-25 19:27:00-04:00,2025-10-26 04:01:00+00:00,2025-10-26 00:01:00-04:00,S,,False,"Basic, Live",2025-10-25 19:00:00-04:00,ATL
2,Arrival,DL 436,DAL436,Arrived,IsOperator,Delta Air Lines,DL,Airbus A320,N347NW,KDFW,DFW,Dallas-Fort Worth,America/Chicago,2025-10-26 03:12:00+00:00,2025-10-25 23:12:00-04:00,2025-10-26 04:02:00+00:00,2025-10-26 00:02:00-04:00,S,08L,False,"Basic, Live",2025-10-25 23:00:00-04:00,ATL
3,Arrival,YV 4049,ASH4049,Arrived,IsOperator,Mesa Airlines,YV,Embraer 175,N88327,KIAH,IAH,Houston,America/Chicago,2025-10-26 04:01:00+00:00,2025-10-26 00:01:00-04:00,2025-10-26 04:13:00+00:00,2025-10-26 00:13:00-04:00,,09R,False,"Basic, Live",2025-10-26 00:00:00-04:00,ATL
4,Arrival,QR 8170,QTR8170,Arrived,IsOperator,Qatar Airways,QR,Boeing 777,A7-BFX,EDDF,FRA,Frankfurt-am-Main,Europe/Berlin,2025-10-26 04:49:00+00:00,2025-10-26 00:49:00-04:00,2025-10-26 04:53:00+00:00,2025-10-26 00:53:00-04:00,,,False,"Basic, Live",2025-10-26 00:00:00-04:00,ATL
5,Departure,DL 27,DAL27,Departed,IsOperator,Delta Air Lines,DL,Airbus A350-900,N527DN,RKSI,ICN,Seoul,Asia/Seoul,2025-10-26 03:50:00+00:00,2025-10-25 23:50:00-04:00,2025-10-26 04:07:00+00:00,2025-10-26 00:07:00-04:00,I,,False,"Basic, Live",2025-10-25 23:00:00-04:00,ATL
6,Departure,DL 865,DAL865,Departed,IsOperator,Delta Air Lines,DL,Airbus A321 NEO,N527DE,KBOS,BOS,Boston,America/New_York,2025-10-26 02:35:00+00:00,2025-10-25 22:35:00-04:00,2025-10-26 04:17:00+00:00,2025-10-26 00:17:00-04:00,S,08R,False,"Basic, Live",2025-10-25 22:00:00-04:00,ATL
7,Departure,CV 27Y,CLX27Y,Departed,IsOperator,Cargolux,CV,Boeing 747-400,LX-ICL,ELLX,LUX,Luxembourg,Europe/Luxembourg,2025-10-26 04:30:00+00:00,2025-10-26 00:30:00-04:00,2025-10-26 04:40:00+00:00,2025-10-26 00:40:00-04:00,,09L,True,"Basic, Live",2025-10-26 00:00:00-04:00,ATL
8,Departure,TTT 3937,TTT3937,Expected,IsOperator,TTT,,Airbus A340-200,N937VQ,KCLT,CLT,Charlotte,America/New_York,2025-10-26 04:43:00+00:00,2025-10-26 00:43:00-04:00,2025-10-26 04:43:00+00:00,2025-10-26 00:43:00-04:00,,,False,"Basic, Live",2025-10-26 00:00:00-04:00,ATL
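The flattening from nested JSON into table rows can be sketched roughly as follows. This is not the package's actual implementation: `flatten_movement` is a hypothetical helper, and the choice of `revisedTime` as the source of `actual_utc` is an assumption (the sample record has identical `revisedTime` and `runwayTime` values, so the output alone does not settle it). The sample record is the `DL 27` departure shown earlier.

```python
def flatten_movement(record, direction, queried_airport_iata):
    """Hypothetical helper: turn one nested flight record into a flat row."""
    movement = record.get("movement", {})
    airport = movement.get("airport", {})
    aircraft = record.get("aircraft", {})
    airline = record.get("airline", {})
    return {
        "direction": direction,
        "flight_number": record.get("number"),
        "callsign": record.get("callSign"),
        "status": record.get("status"),
        "airline": airline.get("name"),
        "airline_iata": airline.get("iata"),
        "aircraft_model": aircraft.get("model"),
        "aircraft_reg": aircraft.get("reg"),
        "airport_icao": airport.get("icao"),
        "airport_iata": airport.get("iata"),
        "scheduled_utc": movement.get("scheduledTime", {}).get("utc"),
        # Assumption: the revised time is used as the "actual" time.
        "actual_utc": movement.get("revisedTime", {}).get("utc"),
        "terminal": movement.get("terminal"),
        "is_cargo": record.get("isCargo"),
        "quality": ", ".join(movement.get("quality", [])),
        "queried_airport_iata": queried_airport_iata,
    }


# Record copied from the JSON output shown earlier in this section.
sample = {
    "movement": {
        "airport": {"icao": "RKSI", "iata": "ICN", "name": "Seoul"},
        "scheduledTime": {"utc": "2025-10-26 03:50Z"},
        "revisedTime": {"utc": "2025-10-26 04:07Z"},
        "terminal": "I",
        "quality": ["Basic", "Live"],
    },
    "number": "DL 27",
    "callSign": "DAL27",
    "status": "Departed",
    "isCargo": False,
    "aircraft": {"reg": "N527DN", "model": "Airbus A350-900"},
    "airline": {"name": "Delta Air Lines", "iata": "DL", "icao": "DAL"},
}

row = flatten_movement(sample, "Departure", "ATL")
print(row)
```

A list of such rows can be passed directly to `pd.DataFrame` to obtain a table like the one above. Using `.get` with defaults keeps the flattening tolerant of records where optional fields such as `aircraft` or `terminal` are absent.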


We can get the total number of departing and arriving flights at the specified airport (the `scheduled_congestion` feature) using the `get_count_airport_flights` method.

In [46]:
flights.get_count_airport_flights()

9
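The count corresponds to the nine rows of the table above (five arrivals plus four departures). A minimal sketch of that calculation on the raw response, assuming the response also carries an `arrivals` list alongside the `departures` list shown earlier, could look like this. The helper name is hypothetical.

```python
def count_airport_flights(response):
    """Hypothetical helper: total departures plus arrivals in one response."""
    departures = response.get("departures", [])
    arrivals = response.get("arrivals", [])  # assumed key, mirroring "departures"
    return len(departures) + len(arrivals)


# Trimmed-down response using flight numbers from the table above.
sample_response = {
    "departures": [{"number": "DL 27"}, {"number": "DL 865"}],
    "arrivals": [{"number": "F9 1241"}],
}

scheduled_congestion = count_airport_flights(sample_response)
print(scheduled_congestion)
```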

Finally, the `build_final_flight_response` method combines several pieces of information about the flight, such as distance, scheduled congestion, and scheduled elapsed time, into a single-row DataFrame. This matches the format of the dataset that we built for training.

In [47]:
flights.build_final_flight_response()

Unnamed: 0,Origin,Dest,IATA_Code_Operating_Airline,Distance,scheduled_congestion,CRSElapsedTime,arr_datetime,date_local
0,ATL,ICN,DL,7152.0,9,805.0,2025-10-28 04:00:00,2025-10-26 03:00:00
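How the package computes `Distance` is not shown here. The values in this section look consistent with a great-circle distance in statute miles, and one common way to obtain such a figure is the haversine formula. The sketch below is an assumption about the approach, not the package's code, and the airport coordinates are approximate values chosen for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in statute miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate airport coordinates (assumed for the example).
ATL = (33.6407, -84.4277)
ICN = (37.4602, 126.4407)

distance = haversine_miles(*ATL, *ICN)
print(round(distance))
```

With these inputs the result should land in the neighborhood of the `7152.0` reported above; small differences are expected from the choice of Earth radius and coordinate precision.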


## Building the Final API Call with Weather and Flight Data

Given a flight number and a local date supplied by the user, we can now retrieve the flight data and the matching weather data, then join them to build the input for delay prediction.

Let's say our user wants to predict the delay for flight number "UA 2012" on "2026-01-08" (the one from Baltimore to Denver). We can retrieve the necessary data as follows:

In [5]:
api_caller = APICaller(BASE_URL, API_KEY)
flight_bt = Flights(api_caller, 
    from_local="2026-01-08",
    flight_number="UA 2012",
    )
flight_bt.get_airport_flights(code_type="iata")
flight_bt_df = flight_bt.get_airport_flights_df()
flight_bt_df

Unnamed: 0,direction,flight_number,callsign,status,codeshare_status,airline,airline_iata,aircraft_model,aircraft_reg,airport_icao,...,scheduled_utc,scheduled_local,actual_utc,actual_local,terminal,runway,is_cargo,quality,date,queried_airport_iata
0,Arrival,WN 1223,,Expected,Unknown,Southwest,WN,,,KMDW,...,2026-01-08 21:00:00+00:00,2026-01-08 16:00:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
1,Arrival,WN 1060,,Expected,Unknown,Southwest,WN,Boeing 737,,KDEN,...,2026-01-08 21:05:00+00:00,2026-01-08 16:05:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
2,Arrival,WN 3478,,Expected,Unknown,Southwest,WN,,,KSDF,...,2026-01-08 21:05:00+00:00,2026-01-08 16:05:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
3,Arrival,UA 1115,,Expected,Unknown,United,UA,Boeing 737-900,,KIAH,...,2026-01-08 21:06:00+00:00,2026-01-08 16:06:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
4,Arrival,WN 755,,Expected,Unknown,Southwest,WN,,,KRDU,...,2026-01-08 21:25:00+00:00,2026-01-08 16:25:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
5,Arrival,F9 4004,,Expected,Unknown,Frontier,F9,Airbus A320 NEO,,KMCO,...,2026-01-08 21:30:00+00:00,2026-01-08 16:30:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
6,Arrival,DL 2480,,Expected,Unknown,Delta Air Lines,DL,Boeing 717-200,,KDTW,...,2026-01-08 21:38:00+00:00,2026-01-08 16:38:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
7,Arrival,WN 1608,,Expected,Unknown,Southwest,WN,,,MMUN,...,2026-01-08 21:40:00+00:00,2026-01-08 16:40:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
8,Arrival,WN 1730,,Expected,Unknown,Southwest,WN,Boeing 737,,KFLL,...,2026-01-08 21:50:00+00:00,2026-01-08 16:50:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI
9,Arrival,WN 1767,,Expected,Unknown,Southwest,WN,,,,...,2026-01-08 21:55:00+00:00,2026-01-08 16:55:00-05:00,NaT,NaT,,,False,Basic,2026-01-08 16:00:00-05:00,BWI


In [6]:
flight_bt_df = flight_bt.build_final_flight_response()
flight_bt_df

Unnamed: 0,Origin,Dest,IATA_Code_Operating_Airline,Distance,scheduled_congestion,CRSElapsedTime,arr_datetime,date_local
0,BWI,DEN,UA,1491.0,24,220.0,2026-01-08 18:00:00,2026-01-08 21:00:00


Then we can retrieve the weather at the origin airport for this flight's departure date.

In [7]:
from datetime import timedelta

# Take the origin airport and local departure hour from the flight response
airport_code = flight_bt_df.iloc[0]['Origin']
start_date = flight_bt_df.iloc[0]['date_local']
end_date = start_date + timedelta(hours=1)
weather_bt = Weather(
    api_caller=api_caller_weather,
    airport_code=airport_code,
    start_date=start_date.strftime("%Y-%m-%d"),
    end_date=end_date.strftime("%Y-%m-%d"),
    code_type="IATA",
    time="future",  # use the forecast endpoint rather than historical data
)
weather_bt_df = weather_bt.to_hourly_dataframe()
weather_bt_df


Fetching weather data from: https://api.open-meteo.com/v1/forecast


Unnamed: 0,date,snowfall,rain,precipitation,wind_speed_10m,wind_gusts_10m,cloud_cover_low,cloud_cover,temperature_2m,apparent_temperature,surface_pressure,relative_humidity_2m,pressure_msl,queried_airport_code
0,2026-01-08 00:00:00-05:00,0.0,0.0,0.0,3.877318,6.48,78.0,100.0,5.1935,3.359359,1007.097656,97.0,1012.299988,BWI
1,2026-01-08 01:00:00-05:00,0.0,0.0,0.0,5.95906,11.879999,76.0,100.0,5.1935,3.025036,1007.097656,96.0,1012.299988,BWI
2,2026-01-08 02:00:00-05:00,0.0,0.0,0.0,8.587338,21.599998,71.0,90.0,5.4435,2.880919,1007.500366,94.0,1012.700012,BWI
3,2026-01-08 03:00:00-05:00,0.0,0.0,0.0,11.753876,33.48,65.0,75.0,5.8435,2.805331,1008.104553,91.0,1013.299988,BWI
4,2026-01-08 04:00:00-05:00,0.0,0.0,0.0,14.512064,41.399998,58.0,62.0,6.0435,2.513441,1008.80481,87.0,1014.0,BWI
5,2026-01-08 05:00:00-05:00,0.0,0.0,0.0,17.309975,42.119999,49.0,50.0,5.8435,1.641382,1009.696411,80.0,1014.900024,BWI
6,2026-01-08 06:00:00-05:00,0.0,0.0,0.0,19.346441,38.880001,39.0,40.0,5.4435,0.623859,1010.68396,72.0,1015.900024,BWI
7,2026-01-08 07:00:00-05:00,0.0,0.0,0.0,20.620804,35.639999,38.0,39.0,5.0935,-0.17978,1011.672241,65.0,1016.900024,BWI
8,2026-01-08 08:00:00-05:00,0.0,0.0,0.0,19.642281,33.119999,56.0,57.0,4.8935,-0.384938,1012.663269,61.0,1017.900024,BWI
9,2026-01-08 09:00:00-05:00,0.0,0.0,0.0,18.0,30.599998,83.0,83.0,4.7435,-0.374133,1013.655212,59.0,1018.900024,BWI
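The log line above shows that `time="future"` routes the request to Open-Meteo's forecast endpoint. As an alternative to passing that flag by hand, the endpoint could be inferred from the requested date. The sketch below is a hypothetical helper under a deliberately simple rule (today or later means forecast, anything earlier means archive) and does not describe how the `Weather` class works internally.

```python
from datetime import date

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def pick_endpoint(start_date, today=None):
    """Hypothetical helper: forecast for today or later, archive for the past."""
    today = today or date.today()
    return FORECAST_URL if start_date >= today else ARCHIVE_URL


# A fixed "today" keeps the example deterministic.
reference_day = date(2026, 1, 1)
print(pick_endpoint(date(2026, 1, 8), today=reference_day))
print(pick_endpoint(date(2024, 12, 11), today=reference_day))
```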


Now we can join the two datasets on the origin airport code and the local departure hour (`date_local`) to get the single data point needed for prediction.

In [None]:
# Drop the timezone (keeping local wall-clock time) so the timestamps match date_local
weather_bt_df['date'] = pd.to_datetime(weather_bt_df['date']).dt.tz_localize(None)
merged_df = flight_bt_df.merge(
    weather_bt_df,
    left_on=['Origin', 'date_local'],
    right_on=['queried_airport_code', 'date'],
    how='left',
)
merged_df

Unnamed: 0,Origin,Dest,IATA_Code_Operating_Airline,Distance,scheduled_congestion,CRSElapsedTime,arr_datetime,date_local,date,snowfall,...,wind_speed_10m,wind_gusts_10m,cloud_cover_low,cloud_cover,temperature_2m,apparent_temperature,surface_pressure,relative_humidity_2m,pressure_msl,queried_airport_code
0,BWI,DEN,UA,1491.0,24,220.0,2026-01-08 18:00:00,2026-01-08 21:00:00,2026-01-08 21:00:00,0.0,...,5.495161,15.48,0.0,9.0,3.3935,0.075984,1017.907471,64.0,1023.200012,BWI
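The join only finds a match when both sides carry identical keys: the same airport code and the same timezone-naive, on-the-hour timestamp. The plain-Python sketch below illustrates that idea with a dictionary lookup standing in for the pandas merge. The helper `to_join_key` is hypothetical, and the two weather rows reuse values shown in the outputs above.

```python
from datetime import datetime, timedelta, timezone


def to_join_key(airport_code, timestamp):
    """Hypothetical helper: (airport, naive local hour) key for the join."""
    # Drop the timezone but keep the local wall-clock time, truncated to the hour.
    local_hour = timestamp.replace(tzinfo=None, minute=0, second=0, microsecond=0)
    return airport_code, local_hour


eastern = timezone(timedelta(hours=-5))  # UTC offset seen in the weather output

weather_rows = [
    {"queried_airport_code": "BWI",
     "date": datetime(2026, 1, 8, 9, tzinfo=eastern), "temperature_2m": 4.7435},
    {"queried_airport_code": "BWI",
     "date": datetime(2026, 1, 8, 21, tzinfo=eastern), "temperature_2m": 3.3935},
]
flight = {"Origin": "BWI", "Dest": "DEN", "date_local": datetime(2026, 1, 8, 21, 0)}

# Index the weather rows by key, then look up the flight (a left join on one row).
weather_by_key = {
    to_join_key(row["queried_airport_code"], row["date"]): row for row in weather_rows
}
match = weather_by_key.get(to_join_key(flight["Origin"], flight["date_local"]), {})
merged = {**flight, **match}
print(merged["temperature_2m"])
```

If `date_local` were not on the hour, or if the weather timestamps kept their timezone, the lookup would come back empty, which in the pandas merge shows up as a row of missing weather values.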



This concludes our section on integrating the Open-Meteo and AeroDataBox APIs to retrieve weather and flight data. The next step is to use this data to train a machine learning model that predicts flight delays from the retrieved weather and flight information.