# Open-Meteo and AeroDataBox API Integration

The purpose of this section is to explore how to integrate the Open-Meteo and AeroDataBox APIs to retrieve weather and flight data. The methodology I used is as follows:
* Create aeromarket-api and openmeteo-api packages to interact with the respective APIs more easily.
* Use environment variables to securely store API keys and URLs.
* Implement classes and methods to fetch and process data from both APIs. 

In [1]:
# Reload edited modules automatically before each cell runs
%load_ext autoreload
%autoreload 2

In [3]:
%cd ..

d:\OneDrive\Personal_Things\My Website\My website HTML\backend\supplychainAPI


In [13]:
from openmeteo_api.src.openmeteoapi.WeatherData import Weather
from openmeteo_api.src.openmeteoapi.APICaller import OpenMeteoAPICaller
from aeromarket_api.src.aeroapi_market.APICaller import APICaller
from aeromarket_api.src.aeroapi_market.Flights import Flights
import pandas as pd
import os
from dotenv import load_dotenv

As a reminder, our goal is to get a dataset with the following format:

In [6]:
sample_df = pd.read_parquet("./data/sample_df_v3.parquet")

In [8]:
sample_df.head(2)

Unnamed: 0,Origin,Dest,IATA_Code_Operating_Airline,DepDelayMinutes,Distance,CRSElapsedTime,arr_datetime,dep_scheduled_congestion,dep_snowfall,dep_rain,...,arr_wind_speed_10m,arr_wind_gusts_10m,arr_cloud_cover_low,arr_cloud_cover,arr_temperature_2m,arr_apparent_temperature,arr_surface_pressure,arr_relative_humidity_2m,arr_pressure_msl,arr_scheduled_congestion
24296413,SEA,SMF,AS,10.0,605.0,115.0,2021-06-28 08:00:00,28.0,0.0,0.0,...,11.298495,24.84,0.0,0.0,16.6,15.752708,1006.069519,76.995392,1006.900024,17.0
8823672,DSM,DEN,UA,0.0,589.0,116.0,2019-09-15 18:00:00,9.0,0.0,0.0,...,15.188416,42.48,0.0,100.0,28.4,24.0271,840.228271,15.589545,1008.099976,75.0


# Weather Data Retrieval using OpenMeteo API

To retrieve weather data, I created a `Weather` class in the `openmeteo-api` package. It uses an `APICaller` to request data from the Open-Meteo API for parameters such as latitude, longitude, start date, end date, and hourly variables (historical data, or forecast data when called with `time="future"`). These variables let the model take weather conditions into account when predicting flight delays.
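As a rough illustration of what such a request looks like, the sketch below builds an Open-Meteo URL with the standard library only. The endpoint URLs and the `timezone=auto` parameter are taken from Open-Meteo's public documentation (the forecast URL also appears in this notebook's output); the function name is hypothetical, and the package's `APICaller` may build its requests differently.

```python
from urllib.parse import urlencode

# Endpoints from Open-Meteo's public docs (assumption: the package's
# APICaller may use different URLs or parameters).
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def build_openmeteo_url(latitude, longitude, start_date, end_date,
                        hourly, historical=True):
    """Return a GET URL for hourly weather at one coordinate pair."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,      # YYYY-MM-DD
        "end_date": end_date,          # YYYY-MM-DD
        "hourly": ",".join(hourly),    # comma-separated variable names
        "timezone": "auto",            # timestamps in the location's local time
    }
    base = ARCHIVE_URL if historical else FORECAST_URL
    return f"{base}?{urlencode(params)}"


# Approximate JFK coordinates, for illustration only.
url = build_openmeteo_url(40.64, -73.78, "2024-12-11", "2024-12-25",
                          ["temperature_2m", "rain"])
print(url)
```

Note that `urlencode` escapes the commas in `hourly` as `%2C`, which the server decodes like any other query string.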


I defined the desired weather variables in the `Weather` class for readability. Alternatively, you can pass the variables directly when calling the methods. The variables are:
- Snowfall (cm)
- Rain (mm)
- Precipitation (mm)
- Wind speed at 10 meters (km/h)
- Wind gusts at 10 meters (km/h)
- Low cloud cover (%)
- Total cloud cover (%)
- Temperature at 2 meters (°C)
- Apparent temperature (°C)
- Surface pressure (hPa)
- Relative humidity at 2 meters (%)
- Pressure at mean sea level (hPa)
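In code, the variables above might be kept as a single mapping. The keys are the Open-Meteo parameter names, which are also the DataFrame column names in the outputs below; the dict itself and its unit annotations (assumed to be Open-Meteo's defaults) are an illustrative sketch, not the class's actual attribute.

```python
# Open-Meteo hourly variable names mapped to their (assumed default) units.
HOURLY_VARIABLES = {
    "snowfall": "cm",
    "rain": "mm",
    "precipitation": "mm",
    "wind_speed_10m": "km/h",
    "wind_gusts_10m": "km/h",
    "cloud_cover_low": "%",
    "cloud_cover": "%",
    "temperature_2m": "°C",
    "apparent_temperature": "°C",
    "surface_pressure": "hPa",
    "relative_humidity_2m": "%",
    "pressure_msl": "hPa",
}

# The API takes one comma-separated string for the `hourly` parameter;
# iterating a dict yields its keys in insertion order.
hourly_param = ",".join(HOURLY_VARIABLES)
print(hourly_param)
```

Keeping names and units in one place makes it easy to request exactly the columns the model was trained on.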

To retrieve weather data, create a `Weather` instance and call its `to_hourly_dataframe` method (or `to_daily_dataframe` for daily values), which returns the requested variables as a pandas DataFrame:

In [None]:
api_caller_weather = OpenMeteoAPICaller()
weather_data = Weather(
    api_caller=api_caller_weather,
    airport_code="JFK",
    start_date="2024-12-11",
    end_date="2024-12-25",
    code_type="IATA",
)
pd.reset_option('display.max_columns')
pd.reset_option('display.max_rows')
weather_df = weather_data.to_hourly_dataframe()

In [9]:
weather_df.head(2)

Unnamed: 0,date,snowfall,rain,precipitation,wind_speed_10m,wind_gusts_10m,cloud_cover_low,cloud_cover,temperature_2m,apparent_temperature,surface_pressure,relative_humidity_2m,pressure_msl,queried_airport_code
0,2024-12-11 00:00:00-05:00,0.0,0.1,0.1,16.201,33.48,100.0,100.0,11.75,9.913984,1012.649658,99.670563,1013.5,JFK
1,2024-12-11 01:00:00-05:00,0.0,0.7,0.7,15.865546,33.839996,100.0,100.0,11.55,9.700219,1011.549988,99.670036,1012.400024,JFK
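Open-Meteo returns hourly data column-wise: an `hourly` object holding a `time` list plus one list per variable. A minimal sketch of turning that into the shape shown above could look like this. It assumes the request used `timezone=auto`, so times are local and `utc_offset_seconds` supplies the offset; the function is hypothetical and the payload is a hand-written example using the first two JFK values above.

```python
from datetime import timedelta, timezone

import pandas as pd


def hourly_payload_to_df(payload, airport_code):
    """Convert an Open-Meteo-style JSON payload into one row per hour.

    Assumes timezone=auto was requested: `hourly["time"]` holds local ISO
    strings without an offset, and `utc_offset_seconds` supplies it. A fixed
    offset is fine for short windows that do not cross a DST change.
    """
    hourly = dict(payload["hourly"])          # shallow copy, so pop is safe
    times = pd.to_datetime(hourly.pop("time"))
    tz = timezone(timedelta(seconds=payload["utc_offset_seconds"]))
    df = pd.DataFrame(hourly)
    df.insert(0, "date", times.tz_localize(tz))
    df["queried_airport_code"] = airport_code
    return df


# Hand-written payload in Open-Meteo's shape (values from the JFK output above).
sample = {
    "utc_offset_seconds": -18000,  # UTC-05:00
    "hourly": {
        "time": ["2024-12-11T00:00", "2024-12-11T01:00"],
        "rain": [0.1, 0.7],
        "temperature_2m": [11.75, 11.55],
    },
}
print(hourly_payload_to_df(sample, "JFK"))
```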


Similarly, we can retrieve weather data for multiple airports by passing a list of IATA codes:

In [66]:
airports = ["JFK", "LAX", "LHR", "CDG", "HND", "DXB", "SIN", "SYD", "FRA", "AMS"]
api_caller_weather = OpenMeteoAPICaller()
weather = Weather(
    api_caller=api_caller_weather,
    airport_code=airports,
    start_date="2024-12-11",
    end_date="2024-12-25",
    code_type="IATA",
)
weather_df = weather.to_hourly_dataframe()
weather_df

Unnamed: 0,date,snowfall,rain,precipitation,wind_speed_10m,wind_gusts_10m,cloud_cover_low,cloud_cover,temperature_2m,apparent_temperature,surface_pressure,relative_humidity_2m,pressure_msl,queried_airport_code
0,2024-12-11 00:00:00-05:00,0.0,0.1,0.1,16.201000,33.480000,100.0,100.0,11.75,9.913984,1013.135620,99.670563,1013.500000,JFK
1,2024-12-11 01:00:00-05:00,0.0,0.7,0.7,15.865546,33.839996,100.0,100.0,11.55,9.700219,1012.035706,99.670036,1012.400024,JFK
2,2024-12-11 02:00:00-05:00,0.0,0.7,0.7,19.937794,39.959999,73.0,100.0,12.00,9.650640,1011.436462,98.690514,1011.799988,JFK
3,2024-12-11 03:00:00-05:00,0.0,1.2,1.2,19.408306,39.239998,0.0,100.0,12.85,10.625551,1011.337463,94.262718,1011.700012,JFK
4,2024-12-11 04:00:00-05:00,0.0,0.3,0.3,14.458382,38.880001,0.0,100.0,13.20,11.493306,1010.438416,88.260132,1010.799988,JFK
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3595,2024-12-25 19:00:00+01:00,0.0,0.0,0.0,8.131936,16.559999,100.0,100.0,8.85,7.270769,1034.048218,97.661545,1034.800049,AMS
3596,2024-12-25 20:00:00+01:00,0.0,0.0,0.0,7.100310,15.119999,100.0,100.0,8.75,7.295176,1034.447510,97.659706,1035.199951,AMS
3597,2024-12-25 21:00:00+01:00,0.0,0.0,0.0,6.766180,13.320000,100.0,100.0,8.65,7.230936,1034.647095,97.989471,1035.400024,AMS
3598,2024-12-25 22:00:00+01:00,0.0,0.0,0.0,7.329338,13.679999,100.0,100.0,8.70,7.211770,1034.647461,97.990250,1035.400024,AMS
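With several airports, the per-airport frames end up stacked into one long table, tagged by `queried_airport_code` and with a continuous index. A sketch of that step with `pd.concat` follows; the helper and the toy frames are hypothetical, not the package's code.

```python
import pandas as pd


def combine_airports(frames):
    """Stack per-airport hourly frames into one long DataFrame.

    `frames` maps an airport code to that airport's hourly DataFrame.
    """
    tagged = [df.assign(queried_airport_code=code) for code, df in frames.items()]
    # ignore_index gives one continuous 0..n-1 index, like the output above.
    return pd.concat(tagged, ignore_index=True)


hours = pd.date_range("2024-12-11", periods=3, freq="60min")
frames = {
    "JFK": pd.DataFrame({"date": hours, "rain": [0.1, 0.7, 0.7]}),
    "AMS": pd.DataFrame({"date": hours, "rain": [0.0, 0.0, 0.0]}),
}
combined = combine_airports(frames)
print(combined)
```

The long format keeps one schema for any number of airports; filtering on `queried_airport_code` recovers a single airport.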



# AeroDataBox Flight Data Retrieval

To retrieve flight data, I created a `Flights` class in the `aeromarket-api` package. It uses an `APICaller` to request data from the AeroDataBox API for parameters such as flight number, date, and airport code. This supplies the flight-level features (route, airline, distance, scheduled times, and airport congestion) that the model uses alongside the weather.

The first step is to configure the AeroDataBox API key and base URL. For security, they are stored in environment variables (loaded from a `.env` file with `python-dotenv`) rather than hard-coded in the notebook:

In [9]:
load_dotenv()
API_KEY = os.getenv("AERODATABOX_API_KEY")
BASE_URL = os.getenv("AERODATABOX_BASE_URL")
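`os.getenv` returns `None` when a variable is missing, so a typo in `.env` only surfaces later as a failed API call. A small guard such as the hypothetical `require_env` below fails fast with a clear message instead:

```python
import os


def require_env(name):
    """Return an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Hypothetical usage after load_dotenv():
# API_KEY = require_env("AERODATABOX_API_KEY")
# BASE_URL = require_env("AERODATABOX_BASE_URL")
```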

In [14]:
api_caller = APICaller(BASE_URL, API_KEY)
flights = Flights(
    api_caller,
    from_local="2025-10-26",
    flight_number="DL 27",
)

The `flights.get_airport_flights` method retrieves all departures and arrivals at the flight's departure airport (ATL for DL 27) on the given date. It returns the raw JSON response as a Python dictionary, which can be processed or analyzed further:

In [15]:
flights.get_airport_flights(code_type="iata")

{'departures': [{'movement': {'airport': {'icao': 'RKSI',
     'iata': 'ICN',
     'name': 'Seoul',
     'timeZone': 'Asia/Seoul'},
    'scheduledTime': {'utc': '2025-10-27 03:50Z',
     'local': '2025-10-26 23:50-04:00'},
    'revisedTime': {'utc': '2025-10-27 04:04Z',
     'local': '2025-10-27 00:04-04:00'},
    'runwayTime': {'utc': '2025-10-27 04:04Z',
     'local': '2025-10-27 00:04-04:00'},
    'terminal': 'I',
    'runway': '09L',
    'quality': ['Basic', 'Live']},
   'number': 'DL 27',
   'callSign': 'DAL27',
   'status': 'Departed',
   'codeshareStatus': 'IsOperator',
   'isCargo': False,
   'aircraft': {'reg': 'N573DZ',
    'modeS': 'A759F1',
    'model': 'Airbus A350-900'},
   'airline': {'name': 'Delta Air Lines', 'iata': 'DL', 'icao': 'DAL'}},
  {'movement': {'airport': {'icao': 'LFPG',
     'iata': 'CDG',
     'name': 'Paris',
     'timeZone': 'Europe/Paris'},
    'scheduledTime': {'utc': '2025-10-27 03:45Z',
     'local': '2025-10-26 23:45-04:00'},
    'revisedTime': {'u
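Each entry in the response sits under `departures` or `arrivals`, with its times nested under `movement`. The sketch below (hypothetical helpers, run on a trimmed copy of the DL 27 entry above) finds a flight by number and computes how many minutes its revised departure time differs from the scheduled one. It assumes UTC stamps use the `YYYY-MM-DD HH:MMZ` form seen in the output and returns `None` when no revised time is present.

```python
from datetime import datetime


def parse_utc(stamp):
    """Parse UTC stamps in the form seen above, e.g. '2025-10-27 03:50Z'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%MZ")


def find_flight(flights_json, number, direction="departures"):
    """Return the first entry with the given flight number, or None."""
    for flight in flights_json.get(direction, []):
        if flight.get("number") == number:
            return flight
    return None


def delay_minutes(flight):
    """Minutes between revised and scheduled time, or None if not revised."""
    movement = flight["movement"]
    revised = movement.get("revisedTime")
    if revised is None:
        return None
    scheduled = parse_utc(movement["scheduledTime"]["utc"])
    return (parse_utc(revised["utc"]) - scheduled).total_seconds() / 60


# Trimmed copy of the DL 27 departure shown above.
sample = {
    "departures": [{
        "number": "DL 27",
        "status": "Departed",
        "movement": {
            "airport": {"iata": "ICN"},
            "scheduledTime": {"utc": "2025-10-27 03:50Z"},
            "revisedTime": {"utc": "2025-10-27 04:04Z"},
        },
    }]
}
dl27 = find_flight(sample, "DL 27")
print(dl27["movement"]["airport"]["iata"], delay_minutes(dl27))  # ICN 14.0
```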

`flights.get_airport_flights_df` formats the retrieved data into a pandas DataFrame, one row per flight, for easier analysis and manipulation:

In [142]:
pd.set_option('display.max_columns', None)  # Show all columns in the DataFrame
pd.reset_option('display.max_rows')  # Restore the default row limit
flights_df = flights.get_airport_flights_df()
display(flights_df.head(2))

Unnamed: 0,direction,flight_number,callsign,status,codeshare_status,airline,airline_iata,aircraft_model,aircraft_reg,airport_icao,airport_iata,airport_name,timezone,scheduled_utc,scheduled_local,actual_utc,actual_local,terminal,runway,is_cargo,quality,date,queried_airport_iata
0,Arrival,AA 3261,AAL3261,Arrived,IsOperator,American,AA,Boeing 737-800,N953AN,KMIA,MIA,Miami,America/New_York,2025-10-27 03:41:00+00:00,2025-10-26 23:41:00-04:00,2025-10-27 04:01:00+00:00,2025-10-27 00:01:00-04:00,N,,False,"Basic, Live",2025-10-26 23:00:00-04:00,ATL
1,Arrival,AA 1045,AAL1045,Arrived,IsOperator,American,AA,Airbus A321,N974UY,KMIA,MIA,Miami,America/New_York,2025-10-27 02:19:00+00:00,2025-10-26 22:19:00-04:00,2025-10-27 04:05:00+00:00,2025-10-27 00:05:00-04:00,N,,False,"Basic, Live",2025-10-26 22:00:00-04:00,ATL
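The flattening behind a table like this can be sketched as follows. This is an assumption about how `get_airport_flights_df` works: the helper is hypothetical, keeps only a subset of the columns above, and runs on a hand-trimmed payload built from the DL 27 departure and the AA 3261 arrival shown earlier.

```python
import pandas as pd


def flatten_movements(flights_json, queried_iata):
    """Flatten departures and arrivals into one row per flight (subset of columns)."""
    rows = []
    for key, direction in (("departures", "Departure"), ("arrivals", "Arrival")):
        for f in flights_json.get(key, []):
            movement = f.get("movement", {})
            rows.append({
                "direction": direction,
                "flight_number": f.get("number"),
                "status": f.get("status"),
                "airline_iata": f.get("airline", {}).get("iata"),
                "aircraft_model": f.get("aircraft", {}).get("model"),
                # For departures this is the destination, for arrivals the origin.
                "airport_iata": movement.get("airport", {}).get("iata"),
                "scheduled_utc": movement.get("scheduledTime", {}).get("utc"),
                "is_cargo": f.get("isCargo"),
                "queried_airport_iata": queried_iata,
            })
    df = pd.DataFrame(rows)
    if not df.empty:
        # '2025-10-27 03:50Z' -> tz-aware UTC timestamp
        df["scheduled_utc"] = pd.to_datetime(
            df["scheduled_utc"], format="%Y-%m-%d %H:%MZ", utc=True)
    return df


sample = {
    "departures": [{
        "number": "DL 27", "status": "Departed", "isCargo": False,
        "airline": {"iata": "DL"}, "aircraft": {"model": "Airbus A350-900"},
        "movement": {"airport": {"iata": "ICN"},
                     "scheduledTime": {"utc": "2025-10-27 03:50Z"}},
    }],
    "arrivals": [{
        "number": "AA 3261", "status": "Arrived", "isCargo": False,
        "airline": {"iata": "AA"}, "aircraft": {"model": "Boeing 737-800"},
        "movement": {"airport": {"iata": "MIA"},
                     "scheduledTime": {"utc": "2025-10-27 03:41Z"}},
    }],
}
print(flatten_movements(sample, "ATL"))
```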


The `get_count_airport_flights` method returns the number of departures and arrivals at the queried airport, which we use as the scheduled-congestion feature (`dep_scheduled_congestion` / `arr_scheduled_congestion`):

In [17]:
flights.get_count_airport_flights()

36
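Assuming the count is simply the number of movements in the returned response, it can be computed directly from the JSON. One design question is codeshares: a single aircraft can be listed under several airlines' flight numbers. The hypothetical helper below skips entries whose `codeshareStatus` is `IsCodeshared`; that value is an assumption about AeroDataBox's naming (only `IsOperator` appears in the output above), and the `XX 1234` entry is invented.

```python
def count_movements(flights_json, skip_codeshares=True):
    """Count departures plus arrivals in one AeroDataBox-style response."""
    total = 0
    for key in ("departures", "arrivals"):
        for f in flights_json.get(key, []):
            # Assumed status value for duplicate listings of the same aircraft.
            if skip_codeshares and f.get("codeshareStatus") == "IsCodeshared":
                continue
            total += 1
    return total


sample = {
    "departures": [
        {"number": "DL 27", "codeshareStatus": "IsOperator"},
        {"number": "XX 1234", "codeshareStatus": "IsCodeshared"},  # invented
    ],
    "arrivals": [{"number": "AA 3261", "codeshareStatus": "IsOperator"}],
}
print(count_movements(sample), count_movements(sample, skip_codeshares=False))  # 2 3
```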

Finally, the `build_final_flight_response` method combines the flight's attributes (distance, scheduled congestion at both airports, scheduled elapsed time, and arrival time) into a single-row DataFrame that matches the flight columns of our training dataset. For comparison, here is one row of that dataset:

In [34]:
sample_df.head(1)

Unnamed: 0,Origin,Dest,IATA_Code_Operating_Airline,DepDelayMinutes,Distance,CRSElapsedTime,arr_datetime,dep_scheduled_congestion,dep_snowfall,dep_rain,dep_precipitation,dep_wind_speed_10m,dep_wind_gusts_10m,dep_cloud_cover_low,dep_cloud_cover,dep_temperature_2m,dep_apparent_temperature,dep_surface_pressure,dep_relative_humidity_2m,dep_pressure_msl,dep_date_local,arr_date,arr_snowfall,arr_rain,arr_precipitation,arr_wind_speed_10m,arr_wind_gusts_10m,arr_cloud_cover_low,arr_cloud_cover,arr_temperature_2m,arr_apparent_temperature,arr_surface_pressure,arr_relative_humidity_2m,arr_pressure_msl,arr_scheduled_congestion
24296413,SEA,SMF,AS,10.0,605.0,115.0,2021-06-28 08:00:00,28.0,0.0,0.0,0.0,6.725354,7.92,0.0,0.0,20.049999,21.273323,994.155151,79.275009,1007.900024,2021-06-28 06:00:00,2021-06-28 08:00:00-06:00,0.0,0.0,0.0,11.298495,24.84,0.0,0.0,16.6,15.752708,1006.069519,76.995392,1006.900024,17.0


In [39]:
flights.build_final_flight_response()

Unnamed: 0,Origin,Dest,IATA_Code_Operating_Airline,Distance,dep_scheduled_congestion,arr_scheduled_congestion,CRSElapsedTime,arr_datetime,dep_date_local
0,ATL,ICN,DL,7152.0,36,39,805.0,2025-10-28 04:00:00,2025-10-26 23:00:00-04:00


Apart from the weather columns, which are added in the next section, this matches the format of our training dataset. The only other column it lacks is `DepDelayMinutes`, the delay the model is meant to predict.
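Two of these columns are plain time arithmetic. Subtracting timezone-aware scheduled times gives the elapsed time even across time zones, and flooring to the hour gives keys that line up with hourly weather. The sketch below is hypothetical: flooring is an assumption about how the package builds its keys, and the input times are invented scheduled times for UA 2012.

```python
import pandas as pd


def schedule_features(dep_local, arr_local):
    """Derive CRSElapsedTime and hour-level keys from scheduled local times.

    Inputs must carry their UTC offsets; converting both to UTC before
    subtracting yields the real elapsed time across time zones.
    """
    dep = pd.Timestamp(dep_local)
    arr = pd.Timestamp(arr_local)
    elapsed = arr.tz_convert("UTC") - dep.tz_convert("UTC")
    return {
        "CRSElapsedTime": elapsed.total_seconds() / 60,
        # Weather is hourly, so round both ends down to the hour.
        "dep_date_local": dep.floor("60min"),
        "arr_datetime": arr.floor("60min"),
    }


# Invented schedule: 16:20 in Baltimore (UTC-5), 18:00 in Denver (UTC-7).
features = schedule_features("2026-01-08 16:20-05:00", "2026-01-08 18:00-07:00")
print(features["CRSElapsedTime"])  # 220.0
```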

## Building the Final API Call with Weather and Flight Data

Given a flight number and local departure date from the user, we can now fetch the flight data and the weather at both airports, then join them into a single row for prediction.

Let's say a user wants to predict the delay of flight UA 2012 (Baltimore to Denver) on 2026-01-08. We retrieve the flight data as follows:

In [41]:
api_caller = APICaller(BASE_URL, API_KEY)
flight_bt = Flights(
    api_caller,
    from_local="2026-01-08",
    flight_number="UA 2012",
)
flight_bt.get_airport_flights(code_type="iata")
flight_bt_df = flight_bt.get_airport_flights_df()

In [42]:
flight_bt_df = flight_bt.build_final_flight_response()
flight_bt_df

Unnamed: 0,Origin,Dest,IATA_Code_Operating_Airline,Distance,dep_scheduled_congestion,arr_scheduled_congestion,CRSElapsedTime,arr_datetime,dep_date_local
0,BWI,DEN,UA,1491.0,25,111,220.0,2026-01-08 18:00:00,2026-01-08 16:00:00-05:00


Next, we fetch the forecast weather at the origin and destination airports, from the departure date through three days later:

In [133]:
from datetime import timedelta

api_caller_weather = OpenMeteoAPICaller()
airport_codes = [flight_bt_df.iloc[0]['Origin'], flight_bt_df.iloc[0]['Dest']]
# Forecast window: from the departure date through three days later
start_date = flight_bt_df.iloc[0]['dep_date_local']
end_date = start_date + timedelta(days=3)
weather_bt = Weather(
    api_caller=api_caller_weather,
    airport_code=airport_codes,
    start_date=start_date.strftime("%Y-%m-%d"),
    end_date=end_date.strftime("%Y-%m-%d"),
    code_type="IATA",
    time="future",  # query the forecast endpoint instead of historical data
)
weather_bt_df = weather_bt.to_hourly_dataframe()

Fetching weather data from: https://api.open-meteo.com/v1/forecast
Successfully fetched data for 2 airport(s).
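A forecast request only works for dates the forecast covers. Open-Meteo's documentation lists up to 16 forecast days; treating that as the limit (an assumption worth re-checking), a hypothetical helper can compute the same three-day window as the cell above and reject departures that are too far out:

```python
from datetime import date, timedelta

import pandas as pd

# Assumption: Open-Meteo's forecast API covers at most 16 days ahead.
MAX_FORECAST_DAYS = 16


def forecast_window(dep_date_local, days_after=3, today=None):
    """Return (start_date, end_date) strings for a forecast request."""
    start = pd.Timestamp(dep_date_local)
    end = start + timedelta(days=days_after)
    today = today or date.today()
    if (end.date() - today).days > MAX_FORECAST_DAYS:
        raise ValueError("Departure is beyond the forecast horizon")
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


print(forecast_window("2026-01-08 16:00-05:00", today=date(2026, 1, 5)))
# ('2026-01-08', '2026-01-11')
```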


In [None]:
weather_bt_df = weather_bt_df['data']

In [108]:
prefix_dep = "dep_"
prefix_arr = "arr_"
weather_dep = weather_bt_df[weather_bt_df['queried_airport_code'] == flight_bt_df.iloc[0]['Origin']].copy()
weather_dep = weather_dep.add_prefix(prefix_dep)
weather_arr = weather_bt_df[weather_bt_df['queried_airport_code'] == flight_bt_df.iloc[0]['Dest']].copy()
weather_arr = weather_arr.add_prefix(prefix_arr)

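Now we join the flight row with the two weather tables on airport code and local time: departure weather on `Origin` and `dep_date_local`, arrival weather on `Dest` and `arr_datetime`. To make the keys comparable, we first parse the weather timestamps and drop the timezone information from every key, keeping the local wall-clock hour on both sides.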

In [None]:
flight_bt_df['arr_datetime'] = (
    flight_bt_df['arr_datetime']
    .dt.tz_localize(None)
)
flight_bt_df['dep_date_local'] = (
    flight_bt_df['dep_date_local']
    .dt.tz_localize(None)
)

In [110]:
weather_arr['arr_date'] = pd.to_datetime(weather_arr['arr_date'], errors='coerce')
weather_dep['dep_date'] = pd.to_datetime(weather_dep['dep_date'], errors='coerce')

In [None]:
weather_arr['arr_date'] = weather_arr['arr_date'].dt.tz_localize(None)
weather_dep['dep_date'] = weather_dep['dep_date'].dt.tz_localize(None)

In [127]:
merged_df = flight_bt_df.merge(weather_dep, left_on=['Origin', 'dep_date_local'], right_on=['dep_queried_airport_code', 'dep_date'], how='left')
merged_df = merged_df.merge(weather_arr, left_on=['Dest', 'arr_datetime'], right_on=['arr_queried_airport_code', 'arr_date'], how='left')
merged_df

Unnamed: 0,Origin,Dest,IATA_Code_Operating_Airline,Distance,dep_scheduled_congestion,arr_scheduled_congestion,CRSElapsedTime,arr_datetime,dep_date_local,arr_datetime_local,dep_date,dep_snowfall,dep_rain,dep_precipitation,dep_wind_speed_10m,dep_wind_gusts_10m,dep_cloud_cover_low,dep_cloud_cover,dep_temperature_2m,dep_apparent_temperature,dep_surface_pressure,dep_relative_humidity_2m,dep_pressure_msl,dep_queried_airport_code,arr_date,arr_snowfall,arr_rain,arr_precipitation,arr_wind_speed_10m,arr_wind_gusts_10m,arr_cloud_cover_low,arr_cloud_cover,arr_temperature_2m,arr_apparent_temperature,arr_surface_pressure,arr_relative_humidity_2m,arr_pressure_msl,arr_queried_airport_code
0,BWI,DEN,UA,1491.0,25,111,220.0,2026-01-08 18:00:00,2026-01-08 16:00:00,2026-01-08 18:00:00,2026-01-08 16:00:00,0.0,0.0,0.0,4.693825,9.36,0.0,100.0,11.961,9.893611,1015.667053,59.0,1021.400024,BWI,2026-01-08 18:00:00,0.56,0.4,1.2,27.210379,40.32,36.0,100.0,-1.7495,-8.26672,824.80304,91.0,1008.700012,DEN
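The merge steps above can be condensed into one self-contained sketch on toy data (two weather rows whose temperatures are copied from the merged output). The helper name is hypothetical. It shows why the timezone is dropped: flight keys and weather timestamps are each in local time, and stripping the offset leaves the local wall-clock hour as a plain join key on both sides.

```python
import pandas as pd

# Toy flight row with tz-aware local keys (values from the UA 2012 output).
flight = pd.DataFrame({
    "Origin": ["BWI"],
    "Dest": ["DEN"],
    "dep_date_local": pd.to_datetime(["2026-01-08 16:00:00-05:00"]),
    "arr_datetime": pd.to_datetime(["2026-01-08 18:00:00-07:00"]),
})
# Toy hourly weather; each airport's timestamps carry its own UTC offset.
weather = pd.DataFrame({
    "queried_airport_code": ["BWI", "DEN"],
    "date": ["2026-01-08 16:00:00-05:00", "2026-01-08 18:00:00-07:00"],
    "temperature_2m": [11.961, -1.7495],
})


def prep_weather_side(df, code, prefix):
    """Keep one airport, parse its timestamps, drop the tz, prefix the columns."""
    side = df[df["queried_airport_code"] == code].copy()
    # One airport has one offset, so parsing is unambiguous; dropping the tz
    # keeps the local wall-clock hour as the join key.
    side["date"] = pd.to_datetime(side["date"]).dt.tz_localize(None)
    return side.add_prefix(prefix)


for col in ("dep_date_local", "arr_datetime"):
    flight[col] = flight[col].dt.tz_localize(None)  # local wall-clock hour

dep = prep_weather_side(weather, "BWI", "dep_")
arr = prep_weather_side(weather, "DEN", "arr_")
merged = (
    flight
    .merge(dep, how="left", left_on=["Origin", "dep_date_local"],
           right_on=["dep_queried_airport_code", "dep_date"])
    .merge(arr, how="left", left_on=["Dest", "arr_datetime"],
           right_on=["arr_queried_airport_code", "arr_date"])
    .drop(columns=["dep_queried_airport_code", "dep_date",
                   "arr_queried_airport_code", "arr_date"])
)
print(merged)
```

A left merge keeps the flight row even when no weather hour matches, so missing weather shows up as `NaN` rather than silently dropping the prediction input.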


In [None]:
# Drop join keys and duplicate timestamp columns that are not model features
columns_to_drop = [
    'dep_queried_airport_code', 'arr_queried_airport_code',
    'arr_datetime_local', 'dep_date', 'arr_date',
]
merged_df = merged_df.drop(columns=columns_to_drop)

In [140]:
# Compare with the training columns without mutating sample_df (safe to re-run)
sample_cols = sample_df.drop(columns='arr_date').columns
display(len(merged_df.columns), len(sample_cols))

33

34
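The one-column gap is expected, because the training data also contains the target. A hypothetical check like the one below makes that explicit: it treats every training column except `DepDelayMinutes` as a required feature, raises if any is missing, and returns the row in training-column order so it can be fed to a model trained on that layout.

```python
import pandas as pd


def align_to_schema(row, training_columns, target="DepDelayMinutes"):
    """Check a feature row against the training columns and reorder it.

    Every training column except the target is treated as a required feature.
    """
    features = [c for c in training_columns if c != target]
    missing = [c for c in features if c not in row.columns]
    if missing:
        raise KeyError(f"Missing feature columns: {missing}")
    extra = sorted(set(row.columns) - set(features))
    if extra:
        print(f"Ignoring extra columns: {extra}")
    return row[features]


# Toy example; in the notebook these would be the training columns
# (without arr_date) and merged_df.
training_cols = ["Origin", "Dest", "DepDelayMinutes", "Distance"]
row = pd.DataFrame({"Distance": [1491.0], "Dest": ["DEN"], "Origin": ["BWI"]})
aligned = align_to_schema(row, training_cols)
print(list(aligned.columns))  # ['Origin', 'Dest', 'Distance']
```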

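The merged row has 33 columns versus 34 in the training data (after dropping `arr_date`); the one missing column is `DepDelayMinutes`, the target the model will predict. This concludes the integration of the Open-Meteo and AeroDataBox APIs for retrieving weather and flight data. The next step is to train a machine learning model on this data to predict flight delays from the retrieved weather and flight information.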

Recap:
1. Created `openmeteo-api` and `aeromarket-api` packages to interact with the Open-Meteo and AeroDataBox APIs.
2. Implemented classes and methods to fetch and process weather and flight data.
3. Retrieved weather data for specific locations and dates using the Open-Meteo API.
4. Retrieved flight data for specific flights and airports using the AeroDataBox API.
5. Merged weather and flight data to create the desired dataset format for predicting flight delays.