Skip to content

Build a machine-learning model based on historical data. The model should be able to predict the delay and accuracy of the prediction (within the tolerance limit). Also included in the model are those variables that can be captured and measured, prior to flight take off.

Notifications You must be signed in to change notification settings

sumedhkumarprasad/Predict-Flight-Delay

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Predict-Flight-Delay

Build a machine-learning model based on historical data. The model should be able to predict the delay and accuracy of the prediction (within the tolerance limit). Also included in the model are those variables that can be captured and measured, prior to flight take off. Predict Flight Delay

Dataset: Dataset downloaded from the Bureau of Transportation Statistics. Time-period of the dataset was between June-2017 to May-2018. Total 109 columns were there in the dataset.

Methodology: Out of 109 columns, 53 columns had more than 80% data missing, so I had removed those columns. After cleaning the data, Exploratory Data Analysis was carried out. Due to the large dataset, I had chosen the test size as 60% for the model building. I started with the Logistic regression model which gave the accuracy of train and test data as 87% and 88% respectively. As my input variables were continuous, therefore used the Gaussian naïve Bayes to increase the model accuracy.

Key Findings: Based on the carrier wise delay, Southwest has the highest delay followed by American Airlines. Friday shows a uniform delay pattern across all carriers. Cases where the origin and destination city are same (ex: Atlanta), highest delayed is shown by carrier Delta Airlines followed by American Airlines. After running different models, Naïve Bayes was giving good results compared to others. Training data-set’s Accuracy, Precision and Recall were ~95%,~99% and ~75% respectively and on testing dataset, Accuracy, Precision and Recall were ~95%,~99% and ~75% respectively.

Tools and Skills: Logistic Regression, Naïve Bayes and Python.

About

Build a machine-learning model based on historical data. The model should be able to predict the delay and accuracy of the prediction (within the tolerance limit). Also included in the model are those variables that can be captured and measured, prior to flight take off.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published