In [1]:
import urllib
import time

# Objective

The overall purpose of the project is to use traffic camera data to estimate traffic volume and traffic speed.

Before we can do that, we need to build a historical training set. We identified the following data sources for generating the models:

* NYC Real-Time Traffic Cameras:  http://nyctmc.org/
* NYC Real-Time Traffic Speed Data: https://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa
* Current Weather Data: http://openweathermap.org/current

The goal of this script is to build a dataset composed of real-time traffic camera images, traffic speed and weather data.

# Methodology

## Data Collection

For this project, we will collect data for a single road in NYC. After an initial review of the data sources, we selected the Van Wyck Expressway, which offers good coverage and overlap of camera images, traffic speed and weather data.

In particular we will collect camera images from the following locations:

* Van Wyck Expwy @ 87 Ave: http://nyctmc.org/google_popup.php?cid=590
* Van Wyck Expwy @ Hillside Ave: http://nyctmc.org/google_popup.php?cid=587
* Van Wyck Expwy @ 91 Ave: http://nyctmc.org/google_popup.php?cid=586
* Van Wyck Expwy @ 101 Ave SB: http://nyctmc.org/google_popup.php?cid=584
* Van Wyck Expwy @ 101 Ave NB: http://nyctmc.org/google_popup.php?cid=582

We will collect camera images, speed data and weather data at 30-minute intervals for a period of 7 days.
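The sampling plan is easy to check: 30-minute intervals give 48 samples a day, so 7 days give 336 samples. Each sample saves five camera images, one speed file and one weather file, so a complete run should leave 7 × 336 = 2,352 data files plus the log. The variable names below are illustrative only.

```python
# Illustrative arithmetic for the sampling plan described above
minutes_between_samples = 30
days = 7
num_cameras = 5
files_per_sample = num_cameras + 2  # five images, one speed file, one weather file

samples_per_day = 24 * 60 // minutes_between_samples
total_samples = samples_per_day * days
total_files = total_samples * files_per_sample

print(samples_per_day, total_samples, total_files)  # 48 336 2352
```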

### Camera Image URL

The following are the corresponding URLs of the actual images for each of the cameras above: 

* Van Wyck Expwy @ 87 Ave: http://207.251.86.238/cctv594.jpg
* Van Wyck Expwy @ Hillside Ave: http://207.251.86.238/cctv593.jpg
* Van Wyck Expwy @ 91 Ave: http://207.251.86.238/cctv592.jpg
* Van Wyck Expwy @ 101 Ave SB: http://207.251.86.238/cctv590.jpg
* Van Wyck Expwy @ 101 Ave NB: http://207.251.86.238/cctv589.jpg

The actual image URLs were obtained by inspecting the source code of the corresponding camera page.
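This lookup can also be scripted once the page source has been fetched. The sketch below assumes the popup page contains the snapshot URL as an absolute link ending in `cctvNNN.jpg`; the real nyctmc.org markup may differ (for example, the URL could be assembled in JavaScript), so the regular expression and the `sample_html` fragment are assumptions for illustration, and `find_image_urls` is a hypothetical helper.

```python
import re

# Assumed pattern: an absolute http(s) URL ending in cctv<digits>.jpg
CCTV_URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+/cctv\d+\.jpg")


def find_image_urls(html):
    """Hypothetical helper: return distinct camera image URLs in page order."""
    found = []
    for url in CCTV_URL_PATTERN.findall(html):
        if url not in found:
            found.append(url)
    return found


# Made-up page fragment, not the real nyctmc.org markup
sample_html = '<div><img id="cam" src="http://207.251.86.238/cctv594.jpg" width="352"></div>'
print(find_image_urls(sample_html))  # ['http://207.251.86.238/cctv594.jpg']
```

In practice the HTML would come from something like `urllib.request.urlopen(page_url).read().decode()` on the popup URL listed above.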

### Traffic Speed Data URL

The real-time traffic speed data is all contained in one file: http://207.251.86.229/nyc-links-cams/LinkSpeedQuery.txt

We will download a copy of this file at each sampling interval to use as training data.
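Because the file covers every monitored link in the city, each saved copy will need to be filtered down to the Van Wyck segments before training. The sketch below assumes the file is tab-separated, with a quoted header row that includes `Speed`, `DataAsOf` and `linkName` columns; those column names and the layout are assumptions to verify against a downloaded copy, the sample text is made up, and `van_wyck_speeds` is a hypothetical helper.

```python
import csv
import io


def van_wyck_speeds(text, keyword="VAN WYCK"):
    """Hypothetical helper: (linkName, speed, DataAsOf) for rows matching keyword.

    Assumes a tab-separated file whose header row contains 'linkName',
    'Speed' and 'DataAsOf'; adjust the names if the real file differs.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        name = row.get("linkName") or ""
        if keyword in name.upper():
            rows.append((name, float(row["Speed"]), row["DataAsOf"]))
    return rows


# Made-up example in the assumed format
sample = (
    '"Id"\t"Speed"\t"DataAsOf"\t"linkName"\n'
    '"1"\t"34.17"\t"1/1/2016 08:00:00"\t"VAN WYCK EXPWY N ATLANTIC AVE - HILLSIDE AVE"\n'
    '"2"\t"12.42"\t"1/1/2016 08:00:00"\t"BQE N ATLANTIC AVE - LEONARD ST"\n'
)
print(van_wyck_speeds(sample))
```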

### Weather Data URL

After obtaining an API key from OpenWeatherMap, we will use the following URL to download current weather data: http://api.openweathermap.org/data/2.5/weather?q=jamaica,us&appid=2de143494c0b295cca9337e1e96b00e0
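One option is to keep the key out of the notebook and build the query with `urllib.parse.urlencode`. The sketch assumes the key is stored in an environment variable named `OWM_API_KEY` (a hypothetical name) and pulls a few fields out of the JSON returned by the current-weather endpoint, which reports temperature in kelvin and wind speed in m/s unless a `units` parameter is passed. The helper names are illustrative and the sample response is abbreviated and made up.

```python
import json
import os
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def weather_url(city="jamaica,us", api_key=None):
    """Hypothetical helper: build the query URL; key defaults to $OWM_API_KEY."""
    key = api_key or os.environ.get("OWM_API_KEY", "")
    return BASE_URL + "?" + urlencode({"q": city, "appid": key})


def summarize_weather(raw_json):
    """Hypothetical helper: extract fields likely to affect traffic."""
    data = json.loads(raw_json)
    return {
        "condition": data["weather"][0]["main"],  # e.g. 'Rain', 'Clear'
        "temp_k": data["main"]["temp"],           # kelvin by default
        "wind_speed": data.get("wind", {}).get("speed"),  # m/s by default
    }


sample = '{"weather": [{"main": "Rain"}], "main": {"temp": 281.5}, "wind": {"speed": 4.1}}'
print(weather_url(api_key="YOUR_KEY"))
print(summarize_weather(sample))
```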

# The Code

Let's load some metadata about the cameras: the camera name, the image URL and a short name used in the output filenames.

In [2]:
cameras = [
    {'name': 'Van Wyck Expwy @ 87 Ave', 'url': 'http://207.251.86.238/cctv594.jpg', 'short_name': 'cctv594'},
    {'name': 'Van Wyck Expwy @ Hillside Ave', 'url': 'http://207.251.86.238/cctv593.jpg', 'short_name': 'cctv593'},
    {'name': 'Van Wyck Expwy @ 91 Ave', 'url': 'http://207.251.86.238/cctv592.jpg', 'short_name': 'cctv592'},
    {'name': 'Van Wyck Expwy @ 101 Ave SB', 'url': 'http://207.251.86.238/cctv590.jpg', 'short_name': 'cctv590'},
    {'name': 'Van Wyck Expwy @ 101 Ave NB', 'url': 'http://207.251.86.238/cctv589.jpg', 'short_name': 'cctv589'},
]
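Every image URL follows the same pattern (`http://207.251.86.238/` plus the short name plus `.jpg`), so the list could also be generated from (name, id) pairs, which keeps `url` and `short_name` consistent if more cameras are added later. This is an optional refactoring sketch; `IMAGE_HOST` and `make_camera` are illustrative names.

```python
IMAGE_HOST = "http://207.251.86.238/"


def make_camera(name, cctv_id):
    """Hypothetical helper: build one record in the same shape as `cameras`."""
    short_name = "cctv%d" % cctv_id
    return {"name": name, "url": IMAGE_HOST + short_name + ".jpg", "short_name": short_name}


camera_ids = [
    ("Van Wyck Expwy @ 87 Ave", 594),
    ("Van Wyck Expwy @ Hillside Ave", 593),
    ("Van Wyck Expwy @ 91 Ave", 592),
    ("Van Wyck Expwy @ 101 Ave SB", 590),
    ("Van Wyck Expwy @ 101 Ave NB", 589),
]
generated = [make_camera(name, cid) for name, cid in camera_ids]
print(generated[0]["url"])  # http://207.251.86.238/cctv594.jpg
```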

In [3]:
speed_data_url = 'http://207.251.86.229/nyc-links-cams/LinkSpeedQuery.txt'
weather_data_url = 'http://api.openweathermap.org/data/2.5/weather?q=jamaica,us&appid=2de143494c0b295cca9337e1e96b00e0'

Let's also define the functions that download the data from the respective URLs.

In [4]:
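import urllib.request  # Python 3 home of urlretrieve (urllib.urlretrieve in Python 2)


def get_from_url(url, output_filename):
    # Download the resource at url and save it to output_filename
    urllib.request.urlretrieve(url, output_filename)


def get_images(cameras, prefix):
    # Save one snapshot per camera, e.g. <prefix>_cctv594.jpg
    for camera in cameras:
        get_from_url(camera['url'], prefix + '_' + camera['short_name'] + '.jpg')


def get_speed_data(url, prefix):
    # Save the city-wide speed file as <prefix>_speed_data.txt
    get_from_url(url, prefix + '_speed_data.txt')


def get_weather_data(url, prefix):
    # Save the OpenWeatherMap response as <prefix>_weather_data.json
    get_from_url(url, prefix + '_weather_data.json')


def log_entry(logfile, timestamp):
    # Append the sample's timestamp to an already open log file
    logfile.write(timestamp + '\n')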
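Over a week of unattended polling, some requests will likely time out or fail, and a single uncaught exception stops the whole loop. A more defensive variant of `get_from_url` could set a timeout and retry a few times before giving up. The sketch below is Python 3; the name `get_from_url_with_retry` and the retry count, timeout and delay values are assumptions, not part of the original script.

```python
import shutil
import time
import urllib.request


def get_from_url_with_retry(url, output_filename, attempts=3, timeout=30, delay=5):
    """Hypothetical helper: download url to output_filename, retrying on errors.

    Returns True on success and False if every attempt failed, so the caller
    can record a gap instead of crashing the capture loop.
    """
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response, \
                    open(output_filename, "wb") as out:
                shutil.copyfileobj(response, out)
            return True
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            print("attempt %d/%d for %s failed: %s" % (attempt, attempts, url, exc))
            if attempt < attempts:
                time.sleep(delay)
    return False
```

Returning a flag rather than raising lets the loop keep its 30-minute rhythm; a failed download simply leaves that file missing for that sample.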

Now that we have functions that download the camera images, the speed data and the weather data, let's run them in a loop that takes a sample every 30 minutes for a total of 7 days (48 samples per day × 7 days = 336 samples).
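Because a plain `time.sleep(sec_between_samples)` runs after the downloads, each interval lasts 30 minutes plus the download time, so samples gradually drift later over the week. One way to keep them on a fixed grid is to compute each target time from the start time. The helper below is a hypothetical sketch; the clock value is passed in so it can be checked without waiting.

```python
def seconds_until_sample(i, start, interval, now):
    """Hypothetical helper: seconds to wait so sample i lands at start + i * interval.

    Returns 0 if the target time has already passed (e.g. after a slow download).
    """
    target = start + i * interval
    return max(0.0, target - now)


# Started at t=0 with 30-minute spacing; sample 1 is due at t=1800.
# If the downloads for sample 0 took 12 s, wait 1788 s instead of 1800 s.
print(seconds_until_sample(1, start=0.0, interval=1800, now=12.0))  # 1788.0
```

Inside the loop this would replace the fixed sleep with something like `time.sleep(seconds_until_sample(i + 1, start, sec_between_samples, time.time()))`, where `start` is recorded once before the first sample.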

In [5]:
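import os

basedir = 'camera_data'
log_filename = os.path.join(basedir, 'data_capture_log.txt')
samples = 336                    # 48 samples/day x 7 days
sec_between_samples = 30 * 60    # 30 minutes

# Make sure the output directory exists before writing into it
if not os.path.isdir(basedir):
    os.makedirs(basedir)

for i in range(samples):
    # Unix timestamp (seconds) shared by every file captured in this sample
    timestamp = str(int(time.time()))
    prefix = os.path.join(basedir, timestamp)
    get_images(cameras, prefix)
    get_speed_data(speed_data_url, prefix)
    get_weather_data(weather_data_url, prefix)
    # Record the timestamp once all downloads for this sample have completed
    with open(log_filename, 'a') as f:
        log_entry(f, timestamp)
    # No need to wait after the final sample
    if i < samples - 1:
        time.sleep(sec_between_samples)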
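Once the capture has run, the log file lists the sample timestamps, and every file in a sample shares that timestamp as its prefix. A sketch of an index that groups the files per sample and flags any that are missing (for example after a failed download), assuming the naming scheme used above; `load_sample_index` is a hypothetical helper.

```python
import os


def load_sample_index(basedir, short_names, log_name="data_capture_log.txt"):
    """Hypothetical helper: map each logged timestamp to its files and any missing ones."""
    with open(os.path.join(basedir, log_name)) as log:
        timestamps = [line.strip() for line in log if line.strip()]
    index = {}
    for ts in timestamps:
        prefix = os.path.join(basedir, ts)
        # Same filename scheme as get_images, get_speed_data and get_weather_data
        files = {name: prefix + "_" + name + ".jpg" for name in short_names}
        files["speed"] = prefix + "_speed_data.txt"
        files["weather"] = prefix + "_weather_data.json"
        missing = [key for key, path in files.items() if not os.path.exists(path)]
        index[ts] = {"files": files, "missing": missing}
    return index
```

For example, `load_sample_index('camera_data', [c['short_name'] for c in cameras])` would return one entry per logged sample, with an empty `missing` list for complete samples.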
    

