isaacke9/TAMIDS20

 
 

TAMIDS Data Science Competition - Spring 2020

[Texas A&M Institute of Data Science]
Competition Details

Team Big Data Energy


* Members: Isaac Ke '21 and Johnathan Lo '21
* Advisor: Dr. Huiyan Sang
* February 12 - April 22, 2020

Reliable transportation supports a strong economy by facilitating the rapid and timely exchange of goods and services and by bolstering tourism revenue. In 2018, the United States transportation industry accounted for $648 billion, or 3.16% of GDP. Worldwide, the aviation industry contributes $2.7 trillion (3.6%) to the world’s GDP, and it is projected that by 2036 global air transportation will support $5.7 trillion of the global economy. A key metric for evaluating the efficiency of the airline industry is flight delay time. In 2018, flight delays led to an economic loss of $31.2 billion. For individual companies, delays can influence consumer choice; for the industry as a whole, unmitigated delays can push consumers toward substitute services such as automotive or rail-based transport. A major goal of this project was therefore to analyze flight delays and diagnose areas for improvement. Using the provided datasets as well as other publicly available data, we set out to build models that can accurately predict future delays. In doing so, we hoped to uncover significant and controllable factors that can help airline companies reduce flight delays.


Ever since the first commercial airline flight in 1914, the air transportation industry has played an integral part in both boosting the global economy and connecting people from all over the world. Flights are sometimes delayed past their scheduled departure or arrival times, which results in lost revenue and irritated customers. The goal of our analysis was not only to model flight delays but also to build predictive models that forecast future late arrivals, specifically for the third quarter of 2019. We began by gathering and tidying our data and performing exploratory data analysis, then fit and assessed several predictive models (including MLE of conditional distributions, OLS, logistic regression, and a mixed additive model involving a bootstrap time series, an ARIMA (autoregressive integrated moving average) model, and multiple weighted regression on the time-series residuals). Drawing on substantive knowledge about airlines, we then interpreted our results and applied them to creating a model. Interpretability and thoroughness were the driving forces in our analysis.
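
To make the additive idea above more concrete, here is a minimal sketch in Python of a two-stage fit: a time component is estimated first, and a regression is then fit on its residuals. This is not the team's model. A plain linear trend stands in for the ARIMA component, the second stage is unweighted simple regression rather than multiple weighted regression, and the names `fit_ols`, `fit_additive`, `weather_index`, and the toy numbers are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_ols(x, y):
    """Closed-form simple linear regression: y ~ intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_additive(t, covariate, delay):
    """Two-stage additive fit: time component first, then residual regression."""
    # Stage 1: time component. A linear trend is used here purely as a
    # stand-in (assumption); the README describes an ARIMA model instead.
    a0, a1 = fit_ols(t, delay)
    residuals = [d - (a0 + a1 * ti) for ti, d in zip(t, delay)]

    # Stage 2: regress what the time component left unexplained on a covariate.
    b0, b1 = fit_ols(covariate, residuals)

    def predict(ti, ci):
        # The prediction is the sum of the two fitted components.
        return (a0 + a1 * ti) + (b0 + b1 * ci)

    return predict


# Toy, made-up data (not from the competition datasets).
t = [0, 1, 2, 3]                      # time index
weather_index = [1, -1, -1, 1]        # hypothetical covariate
delay = [13, 9, 11, 19]               # hypothetical mean arrival delay (minutes)

predict = fit_additive(t, weather_index, delay)
forecast = predict(4, 1)              # forecast for the next time step
```

Fitting the components in sequence keeps each piece interpretable on its own, which is in line with the emphasis on interpretability stated above. Note that sequential fitting only matches a joint fit when the covariate is uncorrelated with the time component, as it is in this toy data.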

Various tasks and analyses were completed using software and services such as Microsoft Power BI, RStudio, Google Cloud Platform APIs, and the NCDC Weather API. Our report was written in LaTeX (specifically, using TeXworks).
