# Open-Meteo and AeroDataBox API Integration

This notebook explores how to integrate the Open-Meteo and AeroDataBox APIs to retrieve weather and flight data. The approach I used is as follows:
* Create the `aeromarket-api` and `openmeteo-api` packages to interact with the respective APIs more easily.
* Use environment variables to keep API keys and URLs out of the code.
* Implement classes and methods to fetch and process data from both APIs.

In [1]:
# Reload imported modules before each cell runs, so edits to the local packages take effect without restarting the kernel
%load_ext autoreload
%autoreload 2

In [4]:
from openmeteoapi.WeatherData import Weather
from openmeteoapi.APICaller import OpenMeteoAPICaller
from aeroapi_market.APICaller import APICaller
from aeroapi_market.Flights import Flights
import os
from dotenv import load_dotenv

# Weather Data Retrieval using OpenMeteo API

To retrieve weather data, I created a `Weather` class in the `openmeteo-api` package. This class uses an `APICaller` to make requests to the Open-Meteo API and fetch historical weather data for the specified latitude, longitude, start date, end date, and hourly variables. These data serve as input features for a model that predicts flight delays from weather conditions.


I defined the desired weather variables in the `Weather` class for easier readability. Alternatively, you can specify the variables directly when calling the methods. The desired weather variables are as follows:
- Temperature at 2 meters (°C)
- Precipitation (mm)
- Wind speed at 10 meters (km/h)
- Relative humidity at 2 meters (%)
- Rain (mm)
- Wind gusts at 10 meters (km/h)
- Cloud cover (%)
- Low cloud cover (%)
- Apparent temperature (°C)
- Surface pressure (hPa)
- Pressure at mean sea level (hPa)
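
To make the mapping between this list and an actual request concrete, here is a minimal sketch of how a query for these variables could be assembled. The endpoint and parameter names are those of the public Open-Meteo historical API; whether the `Weather` class builds its request this way is an assumption, and `build_archive_url` is a hypothetical helper, not part of the `openmeteo-api` package.

```python
from urllib.parse import urlencode

# Public Open-Meteo historical endpoint. That the Weather class targets
# this exact URL is an assumption.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

# Open-Meteo names for the variables listed above.
HOURLY_VARIABLES = [
    "temperature_2m",
    "precipitation",
    "wind_speed_10m",
    "relative_humidity_2m",
    "rain",
    "wind_gusts_10m",
    "cloud_cover",
    "cloud_cover_low",
    "apparent_temperature",
    "surface_pressure",
    "pressure_msl",
]


def build_archive_url(latitude, longitude, start_date, end_date,
                      hourly=HOURLY_VARIABLES, timezone="auto"):
    """Hypothetical helper: build the query URL without sending a request."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": ",".join(hourly),  # the API expects a comma-separated list
        "timezone": timezone,        # "auto" resolves the local time zone
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


url = build_archive_url(40.63993, -73.77869, "2024-12-11", "2024-12-25")
print(url)
```

Building the URL separately from sending it keeps the parameter logic easy to inspect and test without network access.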

To retrieve weather data, create an instance of the `Weather` class and call its `to_hourly_dataframe` method (or `to_daily_dataframe`) to get the desired variables as a pandas DataFrame, as follows:

In [20]:
import airportsdata

airports = airportsdata.load("ICAO")  # load ICAO-indexed data

icao = "KJFK"
airport = airports.get(icao)

if airport:
    lat = airport["lat"]
    lon = airport["lon"]
    print(lat, lon)


40.63993 -73.77869


In [30]:
api_caller_weather = OpenMeteoAPICaller()
weather_data = Weather(
    api_caller=api_caller_weather,
    airport_icao="KJFK",
    start_date="2024-12-11",
    end_date="2024-12-25",
)
weather_df = weather_data.to_hourly_dataframe()
weather_df

America/New_York


Unnamed: 0,date,snowfall,rain,precipitation,wind_speed_10m,wind_gusts_10m,cloud_cover_low,cloud_cover,temperature_2m,apparent_temperature,surface_pressure,relative_humidity_2m,pressure_msl,queried_airport_icao
0,2024-12-11 00:00:00-05:00,0.0,0.1,0.1,16.201000,33.480000,100.0,100.0,11.75,9.913984,1013.135620,99.670563,1013.500000,KJFK
1,2024-12-11 01:00:00-05:00,0.0,0.7,0.7,15.865546,33.839996,100.0,100.0,11.55,9.700219,1012.035706,99.670036,1012.400024,KJFK
2,2024-12-11 02:00:00-05:00,0.0,0.7,0.7,19.937794,39.959999,73.0,100.0,12.00,9.650640,1011.436462,98.690514,1011.799988,KJFK
3,2024-12-11 03:00:00-05:00,0.0,1.2,1.2,19.408306,39.239998,0.0,100.0,12.85,10.625551,1011.337463,94.262718,1011.700012,KJFK
4,2024-12-11 04:00:00-05:00,0.0,0.3,0.3,14.458382,38.880001,0.0,100.0,13.20,11.493306,1010.438416,88.260132,1010.799988,KJFK
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
355,2024-12-25 19:00:00-05:00,0.0,0.0,0.0,9.734988,20.160000,0.0,95.0,-2.50,-6.895881,1029.709839,72.037743,1030.099976,KJFK
356,2024-12-25 20:00:00-05:00,0.0,0.0,0.0,10.981475,21.599998,0.0,97.0,-2.00,-6.606600,1029.810791,67.842216,1030.199951,KJFK
357,2024-12-25 21:00:00-05:00,0.0,0.0,0.0,11.340000,23.400000,0.0,99.0,-2.60,-7.338129,1029.809814,66.410843,1030.199951,KJFK
358,2024-12-25 22:00:00-05:00,0.0,0.0,0.0,10.441552,22.680000,0.0,99.0,-3.00,-7.664631,1029.909302,65.040344,1030.300049,KJFK


# AeroDataBox Flight Data Retrieval

To retrieve flight data, I created a `Flights` class in the `aeromarket-api` package. This class uses an `APICaller` to make requests to the AeroDataBox API and fetch flight data for the specified airport code and local time window. These data provide the flight-level features, such as scheduled and actual times, that the delay prediction model needs.

The first step is to set up the API key and base URL for the AeroDataBox API. Both are read from environment variables so that credentials stay out of the notebook:

In [5]:
load_dotenv()
API_KEY = os.getenv("AERODATABOX_API_KEY")
BASE_URL = os.getenv("AERODATABOX_BASE_URL")
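
`os.getenv` returns `None` when a variable is missing, so a forgotten `.env` entry only surfaces later as a failed request. A minimal sketch of failing early instead; `require_env` is a hypothetical helper, not part of either package:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

After `load_dotenv()`, the cell above could then read `API_KEY = require_env("AERODATABOX_API_KEY")`.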

In [87]:
api_caller = APICaller(BASE_URL, API_KEY)
flights = Flights(api_caller, 
    airport_code="ATL",
    from_local="2025-10-26T00:00",
    to_local="2025-10-26T12:00",
    keep_cols=True
    )

As shown below, the `flights.get_airport_flights` method retrieves all departing and arriving flights for the airport within the requested time window. It returns the response as a nested Python dictionary (parsed JSON) that can be processed or analyzed further.

In [88]:
flights.get_airport_flights(code_type="iata")

{'departures': [{'movement': {'airport': {'icao': 'RKSI',
     'iata': 'ICN',
     'name': 'Seoul',
     'timeZone': 'Asia/Seoul'},
    'scheduledTime': {'utc': '2025-10-26 03:50Z',
     'local': '2025-10-25 23:50-04:00'},
    'revisedTime': {'utc': '2025-10-26 04:07Z',
     'local': '2025-10-26 00:07-04:00'},
    'runwayTime': {'utc': '2025-10-26 04:07Z',
     'local': '2025-10-26 00:07-04:00'},
    'terminal': 'I',
    'quality': ['Basic', 'Live']},
   'number': 'DL 27',
   'callSign': 'DAL27',
   'status': 'Departed',
   'codeshareStatus': 'IsOperator',
   'isCargo': False,
   'aircraft': {'reg': 'N527DN',
    'modeS': 'A6A347',
    'model': 'Airbus A350-900'},
   'airline': {'name': 'Delta Air Lines', 'iata': 'DL', 'icao': 'DAL'}},
  {'movement': {'airport': {'icao': 'KBOS',
     'iata': 'BOS',
     'name': 'Boston',
     'timeZone': 'America/New_York'},
    'scheduledTime': {'utc': '2025-10-26 02:35Z',
     'local': '2025-10-25 22:35-04:00'},
    'revisedTime': {'utc': '2025-10-26

`flights.get_airport_flights_df` formats the retrieved data into a pandas DataFrame for easier analysis and manipulation.

In [89]:
import pandas as pd
pd.set_option('display.max_columns', None)  # Show all columns in the DataFrame
flights_df = flights.get_airport_flights_df()
flights_df

Unnamed: 0,direction,flight_number,callsign,status,codeshare_status,airline,airline_iata,aircraft_model,aircraft_reg,airport_icao,airport_iata,airport_name,timezone,scheduled_utc,scheduled_local,actual_utc,actual_local,terminal,runway,is_cargo,quality,date,queried_airport_iata
0,Arrival,DL 748,DAL9964,Arrived,IsOperator,Delta Air Lines,DL,Airbus A321 NEO,N534DT,KLAS,LAS,Las Vegas,America/Los_Angeles,2025-10-25 23:27:00+00:00,2025-10-25 19:27:00-04:00,2025-10-26 04:01:00+00:00,2025-10-26 00:01:00-04:00,S,,False,"Basic, Live",2025-10-25 19:00:00-04:00,ATL
1,Arrival,F9 1241,FFT1241,Arrived,IsOperator,Frontier,F9,Airbus A321,N717FR,KMCO,MCO,Orlando,America/New_York,2025-10-26 04:18:00+00:00,2025-10-26 00:18:00-04:00,2025-10-26 04:01:00+00:00,2025-10-26 00:01:00-04:00,N,09R,False,"Basic, Live",2025-10-26 00:00:00-04:00,ATL
2,Arrival,DL 436,DAL436,Arrived,IsOperator,Delta Air Lines,DL,Airbus A320,N347NW,KDFW,DFW,Dallas-Fort Worth,America/Chicago,2025-10-26 03:12:00+00:00,2025-10-25 23:12:00-04:00,2025-10-26 04:02:00+00:00,2025-10-26 00:02:00-04:00,S,08L,False,"Basic, Live",2025-10-25 23:00:00-04:00,ATL
3,Arrival,YV 4049,ASH4049,Arrived,IsOperator,Mesa Airlines,YV,Embraer 175,N88327,KIAH,IAH,Houston,America/Chicago,2025-10-26 04:01:00+00:00,2025-10-26 00:01:00-04:00,2025-10-26 04:13:00+00:00,2025-10-26 00:13:00-04:00,,09R,False,"Basic, Live",2025-10-26 00:00:00-04:00,ATL
4,Arrival,QR 8170,QTR8170,Arrived,IsOperator,Qatar Airways,QR,Boeing 777,A7-BFX,EDDF,FRA,Frankfurt-am-Main,Europe/Berlin,2025-10-26 04:49:00+00:00,2025-10-26 00:49:00-04:00,2025-10-26 04:53:00+00:00,2025-10-26 00:53:00-04:00,,,False,"Basic, Live",2025-10-26 00:00:00-04:00,ATL
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
763,Departure,N7 CZ,N70CZ,Departed,IsOperator,Unknown/Private owner,,Cessna 525C Citation CJ4,N70CZ,KICT,ICT,Wichita,America/Chicago,2025-10-26 15:53:00+00:00,2025-10-26 11:53:00-04:00,2025-10-26 15:53:00+00:00,2025-10-26 11:53:00-04:00,,,False,"Basic, Live",2025-10-26 11:00:00-04:00,ATL
764,Departure,DL 1030,DAL1030,Departed,IsOperator,Delta Air Lines,DL,Boeing 757-200,N698DL,KMCO,MCO,Orlando,America/New_York,2025-10-26 15:45:00+00:00,2025-10-26 11:45:00-04:00,2025-10-26 15:54:00+00:00,2025-10-26 11:54:00-04:00,S,,False,"Basic, Live",2025-10-26 11:00:00-04:00,ATL
765,Departure,DL 414,DAL414,Departed,IsOperator,Delta Air Lines,DL,Airbus A321,N384DN,KDFW,DFW,Dallas-Fort Worth,America/Chicago,2025-10-26 15:36:00+00:00,2025-10-26 11:36:00-04:00,2025-10-26 15:57:00+00:00,2025-10-26 11:57:00-04:00,S,,False,"Basic, Live",2025-10-26 11:00:00-04:00,ATL
766,Departure,DL 775,DAL775,Departed,IsOperator,Delta Air Lines,DL,Boeing 737-900,N933DZ,KCLE,CLE,Cleveland,America/New_York,2025-10-26 15:21:00+00:00,2025-10-26 11:21:00-04:00,2025-10-26 15:58:00+00:00,2025-10-26 11:58:00-04:00,S,,False,"Basic, Live",2025-10-26 11:00:00-04:00,ATL
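
Each row above corresponds to one nested record from the raw response. The flattening step could look roughly like the sketch below. It is not the actual implementation of `get_airport_flights_df`: `flatten_movement` is a hypothetical helper, and taking `actual_utc` from `runwayTime` is an assumption, since the package may use `revisedTime` or a fallback between the two.

```python
def flatten_movement(record: dict, direction: str) -> dict:
    """Hypothetical helper: turn one nested flight record into a flat row."""
    movement = record.get("movement", {})
    airport = movement.get("airport", {})
    return {
        "direction": direction,
        "flight_number": record.get("number"),
        "callsign": record.get("callSign"),
        "status": record.get("status"),
        "airline": record.get("airline", {}).get("name"),
        "aircraft_model": record.get("aircraft", {}).get("model"),
        "airport_icao": airport.get("icao"),
        "airport_iata": airport.get("iata"),
        "scheduled_utc": movement.get("scheduledTime", {}).get("utc"),
        # Assumption: the runway time is treated as the actual time.
        "actual_utc": movement.get("runwayTime", {}).get("utc"),
        "terminal": movement.get("terminal"),
    }


# Trimmed copy of the first departure in the raw output shown earlier.
sample = {
    "movement": {
        "airport": {"icao": "RKSI", "iata": "ICN", "name": "Seoul"},
        "scheduledTime": {"utc": "2025-10-26 03:50Z"},
        "runwayTime": {"utc": "2025-10-26 04:07Z"},
        "terminal": "I",
    },
    "number": "DL 27",
    "callSign": "DAL27",
    "status": "Departed",
    "aircraft": {"model": "Airbus A350-900"},
    "airline": {"name": "Delta Air Lines"},
}

row = flatten_movement(sample, "Departure")
print(row)
```

Using `.get` with empty-dict defaults keeps the function from failing on records that lack optional fields such as `terminal` or `runwayTime`; those fields come back as `None`.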


Now we can join the weather data and the flight data on the airport code and the `date` column, which holds the scheduled local time truncated to the hour, to create a combined dataset for analysis.

In [82]:
start_date = min(flights_df['date']).strftime('%Y-%m-%d')
end_date = max(flights_df['date']).strftime('%Y-%m-%d')
airport_code = flights_df['queried_airport_iata'].unique()[0]
weather_data = Weather(
    api_caller=api_caller_weather,
    airport_code=airport_code,
    code_type="iata",
    start_date=start_date,
    end_date=end_date,
)
merged_df = pd.merge(
    flights_df,
    weather_data.to_hourly_dataframe(),
    left_on=['queried_airport_iata', 'date'],
    right_on=['queried_airport_code', 'date'],
    how='left'
)
merged_df

Unnamed: 0,direction,date,queried_airport_iata,snowfall,rain,precipitation,wind_speed_10m,wind_gusts_10m,cloud_cover_low,cloud_cover,temperature_2m,apparent_temperature,surface_pressure,relative_humidity_2m,pressure_msl,queried_airport_code
0,Arrival,2025-10-25 19:00:00-04:00,ATL,0.0,0.0,0.0,12.781267,29.160000,0.0,100.0,17.25,13.933897,987.797729,40.934200,1023.400024,ATL
1,Arrival,2025-10-26 00:00:00-04:00,ATL,0.0,0.0,0.0,15.121070,30.599998,0.0,100.0,13.95,10.177357,988.940796,48.473717,1025.000000,ATL
2,Arrival,2025-10-25 23:00:00-04:00,ATL,0.0,0.0,0.0,14.830076,32.039997,0.0,100.0,14.65,10.938719,988.930481,46.656029,1024.900024,ATL
3,Arrival,2025-10-26 00:00:00-04:00,ATL,0.0,0.0,0.0,15.121070,30.599998,0.0,100.0,13.95,10.177357,988.940796,48.473717,1025.000000,ATL
4,Arrival,2025-10-26 00:00:00-04:00,ATL,0.0,0.0,0.0,15.121070,30.599998,0.0,100.0,13.95,10.177357,988.940796,48.473717,1025.000000,ATL
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
763,Departure,2025-10-26 11:00:00-04:00,ATL,0.0,0.0,0.0,15.568700,36.000000,100.0,100.0,13.85,10.278877,989.121582,53.641556,1025.199951,ATL
764,Departure,2025-10-26 11:00:00-04:00,ATL,0.0,0.0,0.0,15.568700,36.000000,100.0,100.0,13.85,10.278877,989.121582,53.641556,1025.199951,ATL
765,Departure,2025-10-26 11:00:00-04:00,ATL,0.0,0.0,0.0,15.568700,36.000000,100.0,100.0,13.85,10.278877,989.121582,53.641556,1025.199951,ATL
766,Departure,2025-10-26 11:00:00-04:00,ATL,0.0,0.0,0.0,15.568700,36.000000,100.0,100.0,13.85,10.278877,989.121582,53.641556,1025.199951,ATL


Now we have a merged dataset that combines both flight and weather data. This dataset can be used for further analysis, such as predicting flight delays based on weather conditions.
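
A left join silently keeps flights that have no matching weather row and silently duplicates flights if the weather side has repeated keys. The sketch below shows one way to guard against both with the `validate` and `indicator` options of `DataFrame.merge`. The two small frames are hand-built from values in the outputs above and are parsed to UTC for simplicity; they stand in for `flights_df` and the hourly weather DataFrame.

```python
import pandas as pd

# Small stand-ins for flights_df and the hourly weather DataFrame.
flights_sample = pd.DataFrame({
    "queried_airport_iata": ["ATL", "ATL", "ATL"],
    "date": pd.to_datetime(
        ["2025-10-25 23:00:00-04:00",
         "2025-10-26 00:00:00-04:00",
         "2025-10-26 00:00:00-04:00"],
        utc=True,  # parse to UTC so both key columns share one dtype
    ),
    "flight_number": ["DL 436", "F9 1241", "YV 4049"],
})
weather_sample = pd.DataFrame({
    "queried_airport_code": ["ATL", "ATL"],
    "date": pd.to_datetime(
        ["2025-10-25 23:00:00-04:00", "2025-10-26 00:00:00-04:00"],
        utc=True,
    ),
    "temperature_2m": [14.65, 13.95],
})

merged_sample = flights_sample.merge(
    weather_sample,
    left_on=["queried_airport_iata", "date"],
    right_on=["queried_airport_code", "date"],
    how="left",
    validate="many_to_one",  # raises if a weather hour appears more than once
    indicator=True,          # adds a _merge column describing each row's match
)

# Flights without a weather match are marked "left_only".
unmatched = int((merged_sample["_merge"] == "left_only").sum())
print(merged_sample[["flight_number", "temperature_2m", "_merge"]])
print("flights without weather:", unmatched)
```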

The model also needs the number of departing and arriving flights at the airport in the queried window, which the following method returns:

In [83]:
flights.get_count_airport_flights()

768

We can also get the flight count for each hour using the following method:

In [84]:
flights.get_hourly_airport_flights_count()

Unnamed: 0,date,flight_count
0,2025-10-25 19:00:00-04:00,1
1,2025-10-25 21:00:00-04:00,1
2,2025-10-25 22:00:00-04:00,3
3,2025-10-25 23:00:00-04:00,3
4,2025-10-26 00:00:00-04:00,6
5,2025-10-26 01:00:00-04:00,1
6,2025-10-26 02:00:00-04:00,4
7,2025-10-26 03:00:00-04:00,3
8,2025-10-26 05:00:00-04:00,21
9,2025-10-26 06:00:00-04:00,39
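
The counts above are grouped by the scheduled local time truncated to the hour. A minimal standard-library sketch of that idea follows; `hourly_counts` is a hypothetical helper rather than the package's implementation, and the input is limited to the `scheduled_local` values of the first five rows of `flights_df`, so the numbers are smaller than in the full output.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(scheduled_local):
    """Count timestamps per hour after truncating each one to the hour."""
    floored = (
        datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        for ts in scheduled_local
    )
    return sorted(Counter(floored).items())


# scheduled_local values of the first five rows of flights_df shown earlier.
sample_times = [
    "2025-10-25 19:27:00-04:00",
    "2025-10-26 00:18:00-04:00",
    "2025-10-25 23:12:00-04:00",
    "2025-10-26 00:01:00-04:00",
    "2025-10-26 00:49:00-04:00",
]

counts = hourly_counts(sample_times)
for hour, flight_count in counts:
    print(hour.isoformat(sep=" "), flight_count)
```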


In [86]:
merged_df2 = pd.merge(
    merged_df,
    flights.get_hourly_airport_flights_count(),
    on='date',  # both frames share the hourly `date` key
    how='left'
)
merged_df2

Unnamed: 0,direction,date,queried_airport_iata,snowfall,rain,precipitation,wind_speed_10m,wind_gusts_10m,cloud_cover_low,cloud_cover,temperature_2m,apparent_temperature,surface_pressure,relative_humidity_2m,pressure_msl,queried_airport_code,flight_count
0,Arrival,2025-10-25 19:00:00-04:00,ATL,0.0,0.0,0.0,12.781267,29.160000,0.0,100.0,17.25,13.933897,987.797729,40.934200,1023.400024,ATL,1
1,Arrival,2025-10-26 00:00:00-04:00,ATL,0.0,0.0,0.0,15.121070,30.599998,0.0,100.0,13.95,10.177357,988.940796,48.473717,1025.000000,ATL,6
2,Arrival,2025-10-25 23:00:00-04:00,ATL,0.0,0.0,0.0,14.830076,32.039997,0.0,100.0,14.65,10.938719,988.930481,46.656029,1024.900024,ATL,3
3,Arrival,2025-10-26 00:00:00-04:00,ATL,0.0,0.0,0.0,15.121070,30.599998,0.0,100.0,13.95,10.177357,988.940796,48.473717,1025.000000,ATL,6
4,Arrival,2025-10-26 00:00:00-04:00,ATL,0.0,0.0,0.0,15.121070,30.599998,0.0,100.0,13.95,10.177357,988.940796,48.473717,1025.000000,ATL,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
763,Departure,2025-10-26 11:00:00-04:00,ATL,0.0,0.0,0.0,15.568700,36.000000,100.0,100.0,13.85,10.278877,989.121582,53.641556,1025.199951,ATL,102
764,Departure,2025-10-26 11:00:00-04:00,ATL,0.0,0.0,0.0,15.568700,36.000000,100.0,100.0,13.85,10.278877,989.121582,53.641556,1025.199951,ATL,102
765,Departure,2025-10-26 11:00:00-04:00,ATL,0.0,0.0,0.0,15.568700,36.000000,100.0,100.0,13.85,10.278877,989.121582,53.641556,1025.199951,ATL,102
766,Departure,2025-10-26 11:00:00-04:00,ATL,0.0,0.0,0.0,15.568700,36.000000,100.0,100.0,13.85,10.278877,989.121582,53.641556,1025.199951,ATL,102


This concludes the data retrieval process using the Open-Meteo and AeroDataBox APIs. The retrieved weather data, flight data, and hourly flight counts can now be used for further analysis and for modeling flight delays.
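
As a pointer toward the modeling step, a delay target could be derived from the `scheduled_utc` and `actual_utc` columns of `flights_df`. The sketch below assumes that delay is defined as actual minus scheduled time in minutes; the notebook does not fix a definition, and `delay_minutes` is a hypothetical helper.

```python
from datetime import datetime


def delay_minutes(scheduled_utc: str, actual_utc: str) -> float:
    """Minutes between scheduled and actual time; negative means early."""
    scheduled = datetime.fromisoformat(scheduled_utc)
    actual = datetime.fromisoformat(actual_utc)
    return (actual - scheduled).total_seconds() / 60


# Values taken from rows 2 (DL 436) and 1 (F9 1241) of flights_df.
late = delay_minutes("2025-10-26 03:12:00+00:00", "2025-10-26 04:02:00+00:00")
early = delay_minutes("2025-10-26 04:18:00+00:00", "2025-10-26 04:01:00+00:00")
print(late, early)
```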