FAA TFR Data

The code in this project compiles a dataset of temporary flight restrictions issued by the FAA and their corresponding shapefiles.

The scripts are written to run on an AWS EC2 Linux instance, so you may need to fork this repository and tweak the scripts for your operating system or production environment. I wrote a blog post outlining the process of setting up an EC2 instance with R to run these scripts.

The FAA lists its active TFRs online. Unfortunately, no online repository exists for expired TFRs, which stymies analysis of temporary flight restrictions. The data on active TFRs are conveniently formatted in HTML tables, which makes them easy to parse with the readHTMLTable function from R's XML package.

Download my current TFR data, collected since Sept. 20, 2017.

MIT License

You are free to copy and modify this code as you see fit to collect your own TFR data. If you find a way to improve the dataset, then please submit a pull request!

Dependencies

Preparing the Scripts and Files to Avoid Errors

  1. The shell script shapefile-download.sh is written specifically to run on an EC2 instance, so directory paths (i.e., /home/ec2-user/...) may need to be modified to match the working directory in your production environment.
  2. Edit line 8 of FAA-TFR-scraper.R to set the working directory to match the path to your installation of the tfr-data repo. Paths may also need to be edited on lines 110, 131, and 156 of that file.
  3. For the first collection of TFR data, you'll need to copy tfrData-export-memory-head.csv to a new file named tfrData-export-memory.csv in the working directory. Additional scrapes will append new TFRs as rows to the end of the file, so you won't want to overwrite it!
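The first-run setup in step 3 can be sketched as a small shell snippet. For illustration it works in a temporary directory and fabricates a placeholder header row; in practice you would run the copy inside your tfr-data checkout, where the real tfrData-export-memory-head.csv ships with the repo (its actual column names differ from the placeholder below).

```shell
# Demo directory; in practice, cd into your tfr-data checkout instead.
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR"

# Stand-in for the header-only template shipped with the repo
# (placeholder column names, not the real ones).
printf 'NOTAM,Type,Description\n' > tfrData-export-memory-head.csv

# Seed the cumulative CSV only if it does not exist yet; later scrapes
# append rows to it, so it must never be overwritten.
if [ ! -f tfrData-export-memory.csv ]; then
  cp tfrData-export-memory-head.csv tfrData-export-memory.csv
fi
```

Guarding the copy with a file-existence check makes the setup step safe to re-run without clobbering data collected by earlier scrapes.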

Running the Scripts

The scripts can be run manually to compile the active list of TFRs, download their shapefiles, and append them to the records collected in your dataset thus far.

R CMD BATCH /path/to/scripts/FAA-TFR-scraper.R && /path/to/scripts/shapefile-download.sh

These scripts are designed to be run automatically with no need to oversee their operation. For instance, I have these scripts running on an AWS EC2 instance, with a cron task that runs every hour like so:

0 * * * * R CMD BATCH /home/ec2-user/scripts/FAA-TFR-scraper.R && /home/ec2-user/scripts/shapefile-download.sh

I wrote a blog post outlining the process of setting up an EC2 instance to execute the web scrapes and store your data in an S3 bucket. If you do not currently have an AWS account, you can sign up and run an EC2 instance under the "free tier" for 12 months. You can collect TFR data for a year free of charge!
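One way to compose the hourly cron entry above is sketched below. The log redirect is a suggested addition, not part of the original line; piping the composed crontab to `crontab -` would install it, but here the snippet only prints it.

```shell
# The cron entry shown above, with output logging appended (a suggestion).
CRON_LINE='0 * * * * R CMD BATCH /home/ec2-user/scripts/FAA-TFR-scraper.R && /home/ec2-user/scripts/shapefile-download.sh >> /home/ec2-user/tfr-cron.log 2>&1'

# Print the existing crontab (if any) followed by the new entry.
# Piping this combined output to `crontab -` would install it.
( crontab -l 2>/dev/null; echo "$CRON_LINE" )
```

Appending to the output of `crontab -l` preserves any jobs already scheduled, rather than replacing the whole crontab with a single entry.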

Why Do I Need the Shell Script?

The FAA posts shapefiles for most TFRs. These contain valuable information about the affected land area and altitudes that R's XML package cannot scrape.

Running the shell script after the R script loops over the URLs where each active TFR's shapefiles are stored and downloads them using wget. The script in this repo places the shapefile archives into the directory /tmp/ and then unpacks them into a specified directory.
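The download-and-unpack loop can be sketched as follows. The url-list.txt file name, placeholder URL, and destination directory are assumptions for illustration; the real script derives the URLs from the scraped TFR list. The dry-run flag prints the wget and unzip commands instead of executing them, since the placeholder URL is not fetchable.

```shell
DRY_RUN=1                          # set to 0 to actually download
DEST_DIR="/tmp/tfr-shapefiles"     # assumed unpack destination

mkdir -p "$DEST_DIR"
# Placeholder URL standing in for the scraped shapefile URLs.
printf 'https://example.com/tfr/sect_1234.zip\n' > url-list.txt

while read -r url; do
  archive="/tmp/$(basename "$url")"
  if [ "$DRY_RUN" = 1 ]; then
    echo "wget -q -O $archive $url"
    echo "unzip -o $archive -d $DEST_DIR"
  else
    wget -q -O "$archive" "$url"        # fetch the shapefile archive
    unzip -o "$archive" -d "$DEST_DIR"  # unpack into the destination
  fi
done < url-list.txt
```

Staging archives in /tmp/ and unpacking them into a separate destination, as the repo's script does, keeps the downloaded zips out of the dataset directory.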

Download TFR Data

You can download my copy of the TFR data and shapefiles at this webpage on my personal website. My dataset has TFR information and corresponding shapefiles dating back to September 20, 2017.

You can use the files as the starting point for your own scraper or for analysis.

Load the Data into R

tfrData <- read.table("path/to/tfrData-export-memory.csv", header = TRUE, sep = ",", na.strings = "")
