To run the Jupyter Notebook in this repository on Google Colab, follow these steps:
- Open Google Colab.
- Click on
File
>Open notebook
. - Select the
GitHub
tab in the dialog that appears. - Enter the URL of this GitHub repository (https://github.com/pranthicavuturu/PredictiveAnalytics) and press
Enter
. - Colab will list the notebooks in the repository. Click on the PredictiveAnalyticsProject_Source_Code.ipynb to open.
- Once the notebook is open in Colab, you can run it by clicking on
Runtime
>Run all
to execute all cells, or run each cell individually as needed.
The notebook loads the dataset from the GitHub repository. This process will allow you the run without any local setup requirements, using the computing resources provided by Google Colab.
This project aims to predict flight departure delays for Delta Airlines operating out of Hartsfield-Jackson Atlanta International Airport. By applying predictive analytics techniques, our team seeks to enhance operational efficiency, improve passenger experience, and minimize financial losses caused by flight delays.
- Pranthi Cavuturu
- Muhammad Harris
- Hussain Kaide Johar Manasi
Flight delays are a prevalent issue in the airline industry, leading to significant financial losses and decreased customer satisfaction. This project focuses on developing a predictive model to accurately forecast flight delays, thereby assisting Delta Airlines in strategic planning and operational management.
Our goal is to utilize three predictive analytics methods to forecast flight delays:
- SARIMA: Leveraged for its effectiveness in understanding seasonal travel trends.
- LSTM: Employed to capture complex, nonlinear dependencies within historical delay data.
- Prophet Model: Applied for its proficiency in handling daily, weekly, and yearly seasonality, as well as holiday effects.
The analysis is based on approximately 429,000 records from the Bureau of Transportation Statistics, detailing flights from Hartsfield-Jackson Atlanta International Airport during 2022 and 2023. The dataset includes dates, flight details, scheduled and actual departure times, and reasons for delays.
- Handling missing values through linear interpolation.
- Conversion of date formats and extraction of temporal features.
- Feature scaling using MinMaxScaler and label encoding for categorical features.
- Analysis of seasonal trends and operational factors influencing delays.
- SARIMA
- Stationarity checked via ADF test.
- Model selection through Auto ARIMA.
- LSTM
- Architecture designed to overcome the vanishing gradient problem.
- Two-layered LSTM with dropout for regularization.
- Prophet
- Minimal preprocessing required.
- Seasonality and holiday effects explicitly modeled.
- SARIMA: Achieved a MAE of 2.64 and RMSE of 4.31 on test data.
- LSTM: Showed improvement with a MAE of 13.0 and RMSE of 26.78 on test data.
- Prophet Model: Best performance post-tuning with a MAE of 2.49 and RMSE of 3.42 on test data.
The tuned Prophet Model displayed superior predictive performance, making it the most effective for operational planning and customer service enhancements. Our results can significantly impact resource optimization and strategic planning for Delta Airlines.
To successfully run the notebook included in this repository, ensure the following Python libraries are installed:
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-Learn
- Keras
- Prophet
- PMDARIMA
- Statsmodels
The Jupyter Notebook is compatible with Python 3.7 and newer. It is recommended to use a Python environment that meets this version requirement to ensure compatibility with all dependencies.
- Pranthi Cavuturu: Worked on on the Prophet Model, exploratory data analysis, and overall project coordination.
- Muhammad Harris: Worked on the LSTM model development, data sourcing, and problem statement formulation.
- Hussain Kaide Johar Manasi: Worked on the SARIMA model implementation and data preprocessing.