Lighthouse Labs midterm project: predicting flight delays using machine learning algorithms in Python

hapl/predict-flight-delays

Predicting Flight Delays Project

This project was completed by:

Process

Data review and extraction in PostgreSQL DB for Flights

  • Ran multiple queries in PostgreSQL to explore the data structure and extract samples.
  • Extracted 100K entries in random order (ORDER BY RANDOM()) to reduce sampling bias.
  • For airports, we pulled the geolocation of each airport in the Flights database from the OurAirports website.
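
The random sampling step can be sketched as follows. This is a minimal illustration, not the project's actual query: it uses Python's built-in `sqlite3` with an in-memory table as a stand-in for the PostgreSQL database, and the `flights` table layout and the `sample_rows` helper are assumptions made for the example. The `ORDER BY RANDOM() LIMIT n` pattern itself is also valid in PostgreSQL.

```python
import sqlite3


def sample_rows(conn, table, n):
    """Return n rows drawn at random (without replacement) from `table`."""
    # Table names cannot be bound as query parameters, so validate first.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    query = f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT ?"
    return conn.execute(query, (n,)).fetchall()


# Tiny in-memory stand-in for the flights table (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER PRIMARY KEY, arr_delay REAL)")
conn.executemany(
    "INSERT INTO flights (arr_delay) VALUES (?)",
    [(float(i % 60),) for i in range(1000)],
)

sample = sample_rows(conn, "flights", 100)
print(len(sample))
```

Sorting the whole table by a random key is simple but scans every row, so on very large tables it can be slow; for a one-off extraction of 100K rows that trade-off is usually acceptable.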

Weather data extraction

Weather data was extracted with the Meteostat library, which provides historical weather data drawn from well-known sources such as NOAA and Environment Canada. For more information, see the Meteostat website.

Data cleaning and Feature Engineering

  • After extracting the data from the database, we reviewed its quality and cleaned or transformed null values to improve model accuracy.
  • Airport latitude and longitude were added to the original data.
  • Label encoding was used to transform the carrier code.
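
To illustrate what label encoding does to the carrier code, here is a minimal pure-Python sketch. It assigns integers to the distinct codes in sorted order, the same assignment scikit-learn's `LabelEncoder` produces. The helper names and the sample carrier codes are assumptions for the example, not taken from the project notebooks.

```python
def fit_label_encoder(values):
    """Map each distinct value to an integer, assigned in sorted order."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def transform(values, mapping):
    """Replace each value with its integer code."""
    return [mapping[value] for value in values]


# Hypothetical carrier codes standing in for the real column.
carriers = ["UA", "AA", "DL", "AA", "WN"]
mapping = fit_label_encoder(carriers)
encoded = transform(carriers, mapping)

print(mapping)   # {'AA': 0, 'DL': 1, 'UA': 2, 'WN': 3}
print(encoded)   # [2, 0, 1, 0, 3]
```

The integer codes carry an arbitrary ordering, which tree-based models generally tolerate but linear models may read as a numeric relationship between carriers.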

For more information on this topic, go to the exploratory analysis notebook

Model Evaluation and Predictions

  • Tested different regression models to find the one that gave the best results.
  • Tuned hyperparameters to optimize the models.
  • The models tested were:
    • Linear Regression, including Lasso and Ridge
    • XGBoost Regressor
    • Random Forest Classifier (after transforming delay times into class labels)
  • Created pickles to save our models and avoid retraining them every time we wanted to evaluate results; they can be found here
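
Two of the steps above can be sketched briefly: turning a delay in minutes into a class label, and caching a fitted model with `pickle`. This is a hedged sketch only. The thresholds (15 and 60 minutes), the helper names, and the dictionary standing in for a trained model are all assumptions for the example; the project's actual bins and model objects are in the modeling notebook.

```python
import os
import pickle
import tempfile


def delay_to_label(minutes, thresholds=(15, 60)):
    """Return the index of the delay bin: 0 below the first threshold, then 1, 2, ..."""
    # Thresholds are hypothetical; the project's real bins are not stated here.
    return sum(minutes >= t for t in thresholds)


def save_model(model, path):
    """Serialize a fitted model so it does not have to be retrained."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a previously saved model. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


labels = [delay_to_label(m) for m in (-5, 20, 90)]
print(labels)  # [0, 1, 2]

# A plain dict stands in for a trained estimator in this sketch.
model = {"name": "stand-in", "coef": [0.5, -1.2]}
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(model, path)
restored = load_model(path)
print(restored == model)  # True
```

Pickled models are tied to the library versions that created them, so it helps to record those versions alongside the files.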

For more details on this topic, go to the modeling notebook

Results

The model evaluation results are located here

Challenges

  • Hyperparameter tuning for some models was CPU-intensive.
  • Weather information was limited, but the Meteostat API gave us what we needed.