
pranjals26/Data-Management-Project-Flight-delays


Data-Management-Project-Flight-delays

Data Cleaning and Analysis on Flight Delay & Cancellation

Tableau Dashboards

https://public.tableau.com/app/profile/pranjal.shukla

Identify and describe your dataset

For the project, our group would like to look into flight delays and cancellations in the United States. More specifically, the dataset presents flight data from the large domestic airlines in the year 2015. The data includes flight performance (departures and arrivals behind or ahead of schedule), flight origins and destinations, and flight identifiers. The dataset can be downloaded as a zipped folder from Kaggle (https://www.kaggle.com/datasets/usdot/flight-delays?select=flights.csv) and then unzipped. The dataset’s size is 592.41 MB, and it was last updated in 2015. It contains records of over 5.8 million domestic flights.
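The file is close to 600 MB, so one way to explore it is to stream it row by row rather than load it all at once. The sketch below uses only Python's standard `csv` module. The column names in `WANTED_COLUMNS` are assumed from the header of the Kaggle `flights.csv`. The function names and the file path in the usage comment are hypothetical.

```python
import csv
from typing import Iterable, Iterator, Sequence

# Assumed subset of the flights.csv header (Kaggle "usdot/flight-delays").
# Adjust the names if your copy of the file differs.
WANTED_COLUMNS = (
    "YEAR", "MONTH", "DAY", "AIRLINE",
    "ORIGIN_AIRPORT", "DESTINATION_AIRPORT",
    "DEPARTURE_DELAY", "ARRIVAL_DELAY",
)


def iter_flights(lines: Iterable[str],
                 columns: Sequence[str] = WANTED_COLUMNS) -> Iterator[dict]:
    """Yield one dict per flight, keeping only `columns`.

    Rows are read lazily, so the whole file never has to sit in memory.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        raise ValueError(f"columns not in header: {missing}")
    for row in reader:
        yield {c: row[c] for c in columns}


def count_flights(lines: Iterable[str]) -> int:
    """Count data rows (the header is excluded) without storing them."""
    return sum(1 for _ in csv.DictReader(lines))


# Usage (the path is hypothetical):
# with open("flights.csv", newline="") as f:
#     print(count_flights(f))
```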

Identify dataset source

The flight delay and cancellation data was collected and published by the U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics (BTS), which tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report and in this dataset of 2015 flight delays and cancellations.
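The on-time, delayed, canceled, and diverted categories can be derived from individual records. The sketch below classifies each row of `flights.csv` into one of them. It assumes the Kaggle column names `CANCELLED`, `DIVERTED` (0/1 flags) and `ARRIVAL_DELAY` (minutes). It applies the BTS convention that a flight is on time when it arrives less than 15 minutes after its scheduled time. `flight_status` and `status_summary` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Mapping

# BTS convention: a flight is on time if it arrives less than
# 15 minutes after its scheduled arrival time.
DELAY_THRESHOLD_MIN = 15


def flight_status(row: Mapping[str, str]) -> str:
    """Classify one flights.csv row (column names are assumed)."""
    if row.get("CANCELLED") == "1":
        return "cancelled"
    if row.get("DIVERTED") == "1":
        return "diverted"
    delay = (row.get("ARRIVAL_DELAY") or "").strip()
    if not delay:
        return "unknown"  # no arrival delay recorded for this flight
    return "delayed" if float(delay) >= DELAY_THRESHOLD_MIN else "on_time"


def status_summary(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count flights per status across all rows."""
    return Counter(flight_status(r) for r in rows)
```

Rows with no cancellation, no diversion, and no recorded arrival delay are reported as "unknown" rather than silently counted as on time.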

Why it is important and what appeals to you about it

While we are in the middle of the holiday season, finding a flight or airline that won’t let you down can be tricky. Which airlines are prone to delays? Which airlines have the best overall performance? Which airport seems to be the busiest? By analyzing the performance of airlines and flights, we hope to find insights that help you save time and money.

As we dive deeper into the causes of delays, we can also find insights into those causes and provide recommendations to airport and area management to better manage delays. Another reason for choosing this dataset is that it comes in quite good shape: there are few missing values, and we have many factors to choose from when it comes to data analysis.

Acquire data and perform initial exploration to make sure it is suitable for dimensional modeling and analytical analysis

Yes. This dataset has 40 columns of both numerical and categorical data, representing information about flight origins, destinations, times, and locations. These attributes include air time, distance, arrival time, cancellation reason, etc. I think that with all this information, we can generate many interesting questions for analysis.
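A single pass over a CSV file can check the mix of numerical and categorical columns and how many values are missing. This is a naive heuristic sketch. A column counts as numerical when every non-blank value parses as a number, so code-like columns such as flight numbers will show up as numerical. The function name `profile_columns` is our own.

```python
import csv
from typing import Dict, Iterable


def profile_columns(lines: Iterable[str]) -> Dict[str, dict]:
    """Return the missing share and a naive numerical/categorical label per column."""
    reader = csv.DictReader(lines)
    stats = {name: {"blank": 0, "numeric": True} for name in reader.fieldnames or []}
    total = 0
    for row in reader:
        total += 1
        for name, col in stats.items():
            value = (row.get(name) or "").strip()  # short rows yield None
            if not value:
                col["blank"] += 1
            elif col["numeric"]:
                try:
                    float(value)
                except ValueError:
                    col["numeric"] = False  # one non-number makes it categorical
    return {
        name: {
            "missing_share": col["blank"] / total if total else 0.0,
            "kind": "numerical" if col["numeric"] else "categorical",
        }
        for name, col in stats.items()
    }


# Usage (the path is hypothetical):
# with open("flights.csv", newline="") as f:
#     for name, info in profile_columns(f).items():
#         print(name, info)
```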

Describe the analytical questions you want to answer with the data. A minimum of 3 major questions is required

Business Question 1:

In order to improve operating efficiency at major airports, the FAA would like to develop a system of rewards and penalties based on the performance of major airline carriers. What are some measurements they could use? And based on the results from the busiest route in the USA in 2015, which airline performs the best and which performs the worst?
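One possible set of measurements per airline is:

- on-time rate
- cancellation rate
- mean arrival delay

The sketch below finds the busiest directed (origin, destination) pair by flight count and scores each airline on it. The following are all assumptions:

- Treating A to B and B to A as separate routes.
- Counting cancelled flights as not on time.
- Ranking by on-time rate, with cancellation rate as the tie-breaker.
- The Kaggle column names used.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

Route = Tuple[str, str]
Row = Mapping[str, str]


def busiest_route(rows: Sequence[Row]) -> Route:
    """Directed (origin, destination) pair with the most flights."""
    counts = Counter((r["ORIGIN_AIRPORT"], r["DESTINATION_AIRPORT"]) for r in rows)
    return counts.most_common(1)[0][0]


def airline_scorecard(rows: Sequence[Row], route: Route) -> Dict[str, dict]:
    """Flights, cancellation rate, on-time rate and mean arrival delay per airline."""
    acc = defaultdict(lambda: {"flights": 0, "cancelled": 0, "on_time": 0, "delays": []})
    for r in rows:
        if (r["ORIGIN_AIRPORT"], r["DESTINATION_AIRPORT"]) != route:
            continue
        a = acc[r["AIRLINE"]]
        a["flights"] += 1
        if r.get("CANCELLED") == "1":
            a["cancelled"] += 1
            continue  # a cancelled flight counts against the on-time rate
        delay = (r.get("ARRIVAL_DELAY") or "").strip()
        if delay:
            minutes = float(delay)
            a["delays"].append(minutes)
            if minutes < 15:  # BTS on-time threshold
                a["on_time"] += 1
    card = {}
    for airline, a in acc.items():
        n = a["flights"]
        mean: Optional[float] = sum(a["delays"]) / len(a["delays"]) if a["delays"] else None
        card[airline] = {
            "flights": n,
            "cancel_rate": a["cancelled"] / n,
            "on_time_rate": a["on_time"] / n,
            "mean_arr_delay": mean,
        }
    return card


def rank_airlines(card: Dict[str, dict]) -> List[str]:
    """Best first: higher on-time rate, then lower cancellation rate."""
    return sorted(card, key=lambda k: (-card[k]["on_time_rate"], card[k]["cancel_rate"]))
```

`rows` must be a list (for example, dicts from `csv.DictReader`), since it is scanned twice. The first entry of `rank_airlines` is the best performer and the last is the worst under these particular measures.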

Business Question 2:

During what time period does the air route from Washington, D.C. to New York see the most delays? What are the reasons for the delays? What advice can we give airlines to improve their service?
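To locate the worst time period, flights can be grouped by scheduled departure hour and their mean departure delays compared. The delay-cause columns can then be summed to see which reasons dominate. In this sketch, the sets of Washington and New York airports (`WASHINGTON`, `NEW_YORK`) are hypothetical choices. The column names, including the five delay-cause columns, are assumed from the Kaggle `flights.csv` header, where `SCHEDULED_DEPARTURE` is assumed to be stored as an hhmm number.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional

Row = Mapping[str, str]

# Hypothetical airport groupings for the two metro areas.
WASHINGTON = {"DCA", "IAD", "BWI"}
NEW_YORK = {"JFK", "LGA", "EWR"}
# Delay-cause columns assumed from the Kaggle flights.csv header.
REASON_COLUMNS = ("AIR_SYSTEM_DELAY", "SECURITY_DELAY", "AIRLINE_DELAY",
                  "LATE_AIRCRAFT_DELAY", "WEATHER_DELAY")


def _is_dc_to_ny(r: Row) -> bool:
    return r["ORIGIN_AIRPORT"] in WASHINGTON and r["DESTINATION_AIRPORT"] in NEW_YORK


def _minutes(value: Optional[str]) -> float:
    """Blank or missing cause columns count as zero minutes."""
    value = (value or "").strip()
    return float(value) if value else 0.0


def delays_by_hour(rows: Iterable[Row]) -> Dict[int, float]:
    """Mean departure delay per scheduled departure hour (0-23)."""
    total: Dict[int, float] = defaultdict(float)
    count: Dict[int, int] = defaultdict(int)
    for r in rows:
        delay = (r.get("DEPARTURE_DELAY") or "").strip()
        if not _is_dc_to_ny(r) or not delay:
            continue  # other routes, or no departure recorded
        hour = int(r["SCHEDULED_DEPARTURE"]) // 100  # hhmm, e.g. 1730 -> 17
        total[hour] += float(delay)
        count[hour] += 1
    return {h: total[h] / count[h] for h in sorted(count)}


def delay_reasons(rows: Iterable[Row]) -> Dict[str, float]:
    """Total delay minutes attributed to each cause."""
    sums = {c: 0.0 for c in REASON_COLUMNS}
    for r in rows:
        if _is_dc_to_ny(r):
            for c in REASON_COLUMNS:
                sums[c] += _minutes(r.get(c))
    return sums
```

Grouping by hour is only one way to define a "time period". Month or day of week would work the same way with a different bucketing key.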

Business Question 3:

Is there a relationship between customers' attitudes on social media and the performance of airlines? Should people rely on social media platforms to choose airlines, or should airline companies use social media to monitor their brands?

Describe any concerns with the data and changes you expect to overcome

While taking an initial look at the data, we found a lot of missing data in the cancellation reason column. To make sure we still have a reasonably large dataset to analyze, we might have to delete this column rather than drop the rows with missing values. Another area that needs effort is providing supporting information so that the results can be understood by the public. The names of airlines and airports are often written as their assigned codes or initials, which makes them difficult to report.
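A minimal sketch of both clean-up steps, assuming rows are dicts from `csv.DictReader`:

1. Drop any column whose share of blank values exceeds a threshold. The default of 0.5 is an arbitrary choice. Every row is kept.
2. Add a readable airline name from a lookup table. The lookup is assumed to come from the Kaggle `airlines.csv` file with `IATA_CODE` and `AIRLINE` columns.

All function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Mapping, Sequence

Row = Mapping[str, str]


def load_lookup(lines: Iterable[str], key: str = "IATA_CODE",
                value: str = "AIRLINE") -> Dict[str, str]:
    """Build a code -> name map, e.g. from airlines.csv (column names assumed)."""
    return {r[key]: r[value] for r in csv.DictReader(lines)}


def sparse_columns(rows: Sequence[Row], max_missing: float = 0.5) -> List[str]:
    """Columns whose share of blank values is above `max_missing`."""
    if not rows:
        return []
    return [
        name for name in rows[0]
        if sum(1 for r in rows if not (r.get(name) or "").strip()) / len(rows) > max_missing
    ]


def clean_rows(rows: Sequence[Row], drop: Sequence[str],
               airline_names: Mapping[str, str]) -> List[Dict[str, str]]:
    """Drop sparse columns (keeping every row) and add AIRLINE_NAME."""
    cleaned = []
    for r in rows:
        new = {k: v for k, v in r.items() if k not in drop}
        code = new.get("AIRLINE", "")
        new["AIRLINE_NAME"] = airline_names.get(code, code)  # unknown codes stay as-is
        cleaned.append(new)
    return cleaned
```

The same `load_lookup` helper could decode airport codes, assuming an `airports.csv` with `IATA_CODE` and `AIRPORT` columns.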

Dimensional Modeling

In our dimensional model, the fact table has 8 facts, and we focus mainly on departure delay and arrival delay. Airlines, Flights, Date, and Airports are the four dimensions we created.
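The sketch below shows how flat records could be split into the four dimensions and a fact table. It assigns surrogate keys per dimension and keeps departure delay and arrival delay as facts. The other six facts are not listed here, so they are left out. The following are assumptions, not necessarily how the project's model was built:

- Using (airline, flight number) as the natural key of the Flights dimension.
- Reusing the Airports dimension for both origin and destination.
- The column names.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Iterable, List, Mapping, Optional, Tuple


@dataclass
class Dimension:
    """Hands out a surrogate integer key per distinct natural key."""
    name: str
    keys: Dict[Hashable, int] = field(default_factory=dict)

    def key_for(self, natural_key: Hashable) -> int:
        if natural_key not in self.keys:
            self.keys[natural_key] = len(self.keys) + 1
        return self.keys[natural_key]


def _to_float(value: Optional[str]) -> Optional[float]:
    value = (value or "").strip()
    return float(value) if value else None  # keep blanks as missing, not zero


def build_star(rows: Iterable[Mapping[str, str]]) -> Tuple[Dict[str, Dimension], List[dict]]:
    dims = {n: Dimension(n) for n in ("airline", "flight", "date", "airport")}
    facts = []
    for r in rows:
        facts.append({
            "airline_key": dims["airline"].key_for(r["AIRLINE"]),
            # Hypothetical natural key for the Flights dimension.
            "flight_key": dims["flight"].key_for((r["AIRLINE"], r["FLIGHT_NUMBER"])),
            "date_key": dims["date"].key_for((int(r["YEAR"]), int(r["MONTH"]), int(r["DAY"]))),
            # The Airports dimension plays two roles: origin and destination.
            "origin_key": dims["airport"].key_for(r["ORIGIN_AIRPORT"]),
            "dest_key": dims["airport"].key_for(r["DESTINATION_AIRPORT"]),
            # Two of the eight facts; the other six are not named in the README.
            "departure_delay": _to_float(r.get("DEPARTURE_DELAY")),
            "arrival_delay": _to_float(r.get("ARRIVAL_DELAY")),
        })
    return dims, facts
```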

Final_Iteration Plot
