
Flight Stats from the Bureau of Transportation Statistics

gose/flights

Airline Flights

The United States Department of Transportation publishes flight statistics through the Bureau of Transportation Statistics:

https://transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time

The fields we use are specified in detail further below, but here is a sample record:

"FL_DATE" "OP_CARRIER" "TAIL_NUM" "OP_CARRIER_FL_NUM" "ORIGIN" "DEST" "CRS_DEP_TIME" "DEP_TIME" "DEP_DELAY" "TAXI_OUT" "TAXI_IN" "CRS_ARR_TIME" "ARR_TIME" "ARR_DELAY" "CANCELLED" "CANCELLATION_CODE" "DIVERTED" "CRS_ELAPSED_TIME" "ACTUAL_ELAPSED_TIME" "AIR_TIME" "FLIGHTS" "DISTANCE" "CARRIER_DELAY" "WEATHER_DELAY" "NAS_DELAY" "SECURITY_DELAY" "LATE_AIRCRAFT_DELAY"
2017-01-01 "AA" "N787AA" "1" "JFK" "LAX" "0800" "0831" 31.00 25.00 26.00 "1142" "1209" 27.00 0.00 "" 0.00 402.00 398.00 347.00 1.00 2475.00 27.00 0.00 0.00 0.00 0.00
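As a minimal sketch of how such a record could be read, the Python snippet below parses the sample above with the standard csv module. It assumes fields are separated by a single space, exactly as the sample is shown here; an actual download may use a different delimiter, which can be passed in. The helper name `parse_records` is hypothetical and is not part of this repository.

```python
import csv
import io

# Header and sample record copied from the text above.
SAMPLE = (
    '"FL_DATE" "OP_CARRIER" "TAIL_NUM" "OP_CARRIER_FL_NUM" "ORIGIN" "DEST" '
    '"CRS_DEP_TIME" "DEP_TIME" "DEP_DELAY" "TAXI_OUT" "TAXI_IN" '
    '"CRS_ARR_TIME" "ARR_TIME" "ARR_DELAY" "CANCELLED" "CANCELLATION_CODE" '
    '"DIVERTED" "CRS_ELAPSED_TIME" "ACTUAL_ELAPSED_TIME" "AIR_TIME" '
    '"FLIGHTS" "DISTANCE" "CARRIER_DELAY" "WEATHER_DELAY" "NAS_DELAY" '
    '"SECURITY_DELAY" "LATE_AIRCRAFT_DELAY"\n'
    '2017-01-01 "AA" "N787AA" "1" "JFK" "LAX" "0800" "0831" 31.00 25.00 '
    '26.00 "1142" "1209" 27.00 0.00 "" 0.00 402.00 398.00 347.00 1.00 '
    '2475.00 27.00 0.00 0.00 0.00 0.00\n'
)


def parse_records(text, delimiter=" "):
    """Return one dict per data row, keyed by the header names.

    The delimiter is an assumption based on how the sample is displayed.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    return list(reader)


record = parse_records(SAMPLE)[0]
print(record["ORIGIN"], record["DEST"], record["DEP_DELAY"])
```

All values come back as strings, so numeric fields such as DEP_DELAY still need converting before they are indexed.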

Objective:

  1. Ask questions that can be answered by the data.
  2. Answer the questions in Kibana.
  3. Find surprises in the data.

For example:

  • What's the busiest airport?
  • How many flights were there in 2017?
  • What was the most popular holiday to fly?
  • What aircraft made the most flights?
  • What airport has the most delays (or the least)?

Write down other questions you have so we can answer them with Elastic.
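Kibana visualizations are backed by Elasticsearch queries, so it can help to see what two of the example questions might look like as request bodies. The sketch below builds them as plain Python dictionaries. It assumes documents are indexed with the column names from the sample record, that ORIGIN is mapped as a keyword field and FL_DATE as a date; the helper names are hypothetical. Counting by ORIGIN measures departures only, which is one possible reading of "busiest".

```python
import json


def top_values_query(field, size=10):
    """Terms aggregation: the `size` most frequent values of `field`."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "aggs": {"top_values": {"terms": {"field": field, "size": size}}},
    }


def date_range_query(field, start, end):
    """Range query matching documents whose `field` lies in [start, end]."""
    return {"query": {"range": {field: {"gte": start, "lte": end}}}}


# "What's the busiest airport?" (by number of departures)
busiest_origin = top_values_query("ORIGIN")

# "How many flights were there in 2017?" (body for a count request)
flights_in_2017 = date_range_query("FL_DATE", "2017-01-01", "2017-12-31")

print(json.dumps(busiest_origin, indent=2))
print(json.dumps(flights_in_2017, indent=2))
```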

Ingest

Getting this data into Elastic can be accomplished using:

  • Logstash
  • Beats
  • A programming language

At its core, data is ingested via the Document APIs, a set of RESTful APIs that all of the methods above use under the hood. Use whichever tool (or language) you are most comfortable with. Logstash and Beats provide a configuration-driven approach to ingesting data, while a programming language gives you more flexibility at the cost of verbosity. Each approach has tradeoffs, but the choice is yours.
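To make the Document APIs concrete, here is a minimal sketch of a Bulk API request built with Python's standard library. It is an illustration only, not the Go ingest code used in this exercise. The index name `flights` and the URL `http://localhost:9200` are assumptions, as are the helper names; a secured cluster would also need credentials.

```python
import json
import urllib.request


def build_bulk_body(docs, index="flights"):
    """Build a newline-delimited JSON body for the Bulk API.

    Each document is preceded by an action line and the body must end
    with a newline. The index name is an assumption.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n" if lines else ""


def send_bulk(body, base_url="http://localhost:9200"):
    """POST the body to the _bulk endpoint (assumes an unsecured local node)."""
    request = urllib.request.Request(
        base_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_bulk_body([
    {"FL_DATE": "2017-01-01", "OP_CARRIER": "AA", "ORIGIN": "JFK", "DEST": "LAX"},
])
print(body, end="")
```

Calling `send_bulk(body)` would submit the documents; the response lists one result per action, so it is worth checking its `errors` flag rather than relying on the HTTP status alone.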

When this exercise was written, Go was not among the official Elasticsearch clients supported by Elastic, but a popular Elastic Go library wraps the REST APIs. We will use that library to ingest data.

Data Sources

To get the data used for this exercise, select these data fields from the download form linked to above:

  1. FlightDate
  2. IATA_CODE_Reporting_Airline
  3. Tail_Number
  4. Flight_Number_Reporting_Airline
  5. Origin
  6. Dest
  7. CRSDepTime (CRS Departure Time (local time: hhmm))
  8. DepTime (Actual Departure Time (local time: hhmm))
  9. DepDelay (Difference in minutes between scheduled and actual departure time. Early departures show negative numbers.)
  10. TaxiOut (Taxi Out Time, in Minutes)
  11. TaxiIn (Taxi In Time, in Minutes)
  12. CRSArrTime (CRS Arrival Time (local time: hhmm))
  13. ArrTime (Actual Arrival Time (local time: hhmm))
  14. ArrDelay (Difference in minutes between scheduled and actual arrival time. Early arrivals show negative numbers.)
  15. Cancelled (Cancelled Flight Indicator, 1=Yes, 0=No)
  16. CancellationCode (Specifies The Reason For Cancellation: "A","Carrier", "B","Weather", "C","National Air System", "D","Security")
  17. Diverted (Diverted Flight Indicator, 1=Yes, 0=No)
  18. CRSElapsedTime (CRS Elapsed Time of Flight, in Minutes)
  19. ActualElapsedTime (Elapsed Time of Flight, in Minutes)
  20. AirTime (Flight Time, in Minutes)
  21. Flights (Number of Flights)
  22. Distance (Distance between airports (miles))
  23. CarrierDelay (Carrier Delay, in Minutes)
  24. WeatherDelay (Weather Delay, in Minutes)
  25. NASDelay (National Air System Delay, in Minutes)
  26. SecurityDelay (Security Delay, in Minutes)
  27. LateAircraftDelay (Late Aircraft Delay, in Minutes)
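Several of these fields need light conversion before they are useful for analysis: hhmm strings become timestamps, indicator columns become booleans, and cancellation codes become readable reasons. The Python sketch below shows one way to do that using the column names from the sample record. The function names and output keys are hypothetical. It derives the actual departure from the scheduled time plus DepDelay, so a delay that crosses midnight lands on the right day, and it treats "2400" as midnight of the following day in case that value appears in the data. Times stay in the airport's local time, as in the source.

```python
from datetime import datetime, timedelta

# Codes and reasons as listed for the CancellationCode field above.
CANCELLATION_REASONS = {
    "A": "Carrier",
    "B": "Weather",
    "C": "National Air System",
    "D": "Security",
}


def hhmm_to_datetime(fl_date, hhmm):
    """Combine a YYYY-MM-DD date and an hhmm string into a naive datetime."""
    if not hhmm:
        return None  # e.g. no actual departure time for a cancelled flight
    hhmm = hhmm.zfill(4)
    day = datetime.strptime(fl_date, "%Y-%m-%d")
    return day + timedelta(hours=int(hhmm[:2]), minutes=int(hhmm[2:]))


def to_float(value):
    """Convert a numeric string to float, mapping empty values to None."""
    return float(value) if value not in ("", None) else None


def normalize(record):
    """Return typed departure fields derived from one raw record."""
    scheduled = hhmm_to_datetime(record["FL_DATE"], record["CRS_DEP_TIME"])
    delay = to_float(record["DEP_DELAY"])
    actual = None
    if scheduled is not None and delay is not None:
        actual = scheduled + timedelta(minutes=delay)
    return {
        "scheduled_departure": scheduled,
        "actual_departure": actual,
        "departure_delay_minutes": delay,
        "cancelled": to_float(record["CANCELLED"]) == 1.0,
        "cancellation_reason": CANCELLATION_REASONS.get(record["CANCELLATION_CODE"]),
    }
```

Arrival times could be handled the same way, with the caveat that origin and destination may be in different time zones.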

Then select a month and year and click Download. Unzip the file and rename it to "YEAR-MONTH.csv" (e.g., 2017-02.csv). Repeat for every month you want data for.
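If you download many months, the unzip-and-rename step can be scripted. The sketch below is one possible approach in Python. It assumes each downloaded archive contains exactly one .csv file and raises an error otherwise; `extract_month` and its parameters are hypothetical names.

```python
import shutil
import zipfile
from pathlib import Path


def extract_month(zip_path, year, month, dest_dir):
    """Extract the single CSV in `zip_path` to dest_dir/YEAR-MONTH.csv."""
    dest = Path(dest_dir).expanduser() / f"{year:04d}-{month:02d}.csv"
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if len(csv_names) != 1:
            raise ValueError(f"expected one CSV in {zip_path}, found {len(csv_names)}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Stream the member to disk under its new name.
        with archive.open(csv_names[0]) as src, open(dest, "wb") as out:
            shutil.copyfileobj(src, out)
    return dest
```

For example, `extract_month("download.zip", 2017, 2, "~/data/flights")` would write `~/data/flights/2017-02.csv`, where `download.zip` stands for whatever name the browser gave the archive.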

Download the airport data to get each airport's latitude and longitude:

https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat

  • Rename that file to airports.csv.
  • Open vim to search and replace \" with nothing using: :%s/\\"//g
  • Add the following lines to the top of that file:

0,"Williston Basin International Airport","Williston","United States","XWA","KXWA",48.2608639,-103.7511389,2353,-6,"N","America/Chicago","airport","OurAirports"
0,"Kearney Regional Airport","Kearney","United States","EAR","KEAR",35.156111,-114.559444,2131,-6,"N","America/Chicago","airport","OurAirports"
0,"Laughlin/Bullhead International Airport","Bullhead City","United States","IFP","KIFP",35.156111,-114.559444,707,-7,"N","America/Phoenix","airport","OurAirports"
0,"Stillwater Regional Airport","Stillwater","United States","SWO","KSWO",36.161111,-97.085556,1295,-6,"A","America/Chicago","airport","OurAirports"
0,"Concord Regional Airport","Concord","United States","USA","KJQF",35.387778,-80.709167,705,-5,"A","America/New_York","airport","OurAirports"
0,"Branson Airport","Branson","United States","BKG","KBBG",36.531944,-93.200556,1302,-6,"A","America/Chicago","airport","OurAirports"
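Once airports.csv is prepared, flights can be matched to coordinates by IATA code. The Python sketch below shows one way to build that lookup. It relies on the OpenFlights column order (ID, name, city, country, IATA, ICAO, latitude, longitude, ...) and skips rows with no IATA code, which that dataset marks as `\N`. The function names are hypothetical. `strip_escaped_quotes` is a scripted alternative to the vim substitution described above.

```python
import csv
import io


def strip_escaped_quotes(text):
    # Same effect as the vim substitution: drop every backslash-quote pair.
    return text.replace('\\"', "")


def load_airport_coordinates(text):
    """Map IATA code -> (latitude, longitude) from airports.csv content."""
    coords = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 8:
            continue  # skip blank or malformed lines
        iata = row[4]
        if not iata or iata == "\\N":
            continue  # no IATA code recorded for this airport
        # Keep the first row seen for each code.
        coords.setdefault(iata, (float(row[6]), float(row[7])))
    return coords
```

Because the first row for each code wins, any line placed at the top of the file takes precedence over a later row with the same IATA code. That is a design choice of this sketch, not a property of the data.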

Download the airline data to get each airline's full name:

https://raw.githubusercontent.com/jpatokal/openflights/master/data/airlines.dat

Rename that file to airlines.csv.
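The carrier code in each flight record can then be resolved to a full name. The Python sketch below builds that lookup from airlines.csv, relying on the OpenFlights column order (ID, name, alias, IATA, ICAO, callsign, country, active). Since several airlines in that file can share an IATA code, it prefers rows marked active ("Y"); if more than one active row shares a code, the last one wins. The function name is hypothetical.

```python
import csv
import io


def load_airline_names(text):
    """Map IATA carrier code -> airline name from airlines.csv content."""
    names = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 8:
            continue  # skip blank or malformed lines
        name, iata, active = row[1], row[3], row[7]
        if not iata or iata in ("\\N", "-"):
            continue  # no usable IATA code on this row
        # Take the first row for a code, but let an active airline replace it.
        if iata not in names or active == "Y":
            names[iata] = name
    return names
```

With the sample record shown earlier, `load_airline_names(...)["AA"]` would supply the display name for OP_CARRIER "AA".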

Put all these data files in the directory ~/data/flights.
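Before ingesting, it can be useful to confirm the directory holds what the steps above produce. The short Python sketch below lists the monthly files that follow the YEAR-MONTH.csv naming and reports whether airports.csv or airlines.csv is missing. The function name and the shape of its return value are hypothetical.

```python
import re
from pathlib import Path

# Matches names such as 2017-02.csv (months 01 through 12).
MONTH_FILE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])\.csv$")


def check_data_dir(data_dir="~/data/flights"):
    """Summarize which expected data files are present in `data_dir`."""
    root = Path(data_dir).expanduser()
    names = {p.name for p in root.iterdir()} if root.is_dir() else set()
    return {
        "monthly_files": sorted(n for n in names if MONTH_FILE.match(n)),
        "missing": sorted({"airports.csv", "airlines.csv"} - names),
    }
```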
