# First Draft: A Project to Optimize the Commute

### This project started out with one goal: to optimize travel times for everyday commuters.

<p>People already have access to real-time traffic information (e.g. Google Maps), but no application helps you predict travel times in advance. Commuters can sometimes guess the best time to leave for work in the morning based on past experience. But what if we could predict travel duration for certain routes with reasonable accuracy using historical traffic data?</p>
<p>To make a prediction, we need at least three variables: 1) the route (an origin/destination pair), 2) the departure time, and 3) the estimated travel time. With this data, one could build a multiple linear regression model in which the route and departure time are the input variables and travel duration is the response variable.</p>
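<p>As a rough illustration of the modelling idea, the sketch below fits an ordinary least-squares line of travel duration against departure time, separately for each route. This is a simplification of the multiple regression described above. The sample observations, the <code>fit_line</code> and <code>predict</code> helpers, and the minutes-after-midnight encoding are all made up for the example.</p>

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical observations: (route, departure in minutes after midnight,
# travel duration in seconds). Real rows would come from the database.
observations = [
    ("Noe Valley -> Financial District", 420, 1500),
    ("Noe Valley -> Financial District", 450, 1650),
    ("Noe Valley -> Financial District", 480, 1800),
    ("Berkeley -> Oakland", 420, 900),
    ("Berkeley -> Oakland", 480, 1020),
]

# Group the observations by route, then fit one line per route.
by_route = {}
for route, departure, duration in observations:
    xs, ys = by_route.setdefault(route, ([], []))
    xs.append(departure)
    ys.append(duration)

models = dict((route, fit_line(xs, ys)) for route, (xs, ys) in by_route.items())


def predict(route, departure):
    """Predicted duration in seconds for a departure time on a known route."""
    slope, intercept = models[route]
    return slope * departure + intercept
```

<p>A straight line is unlikely to capture rush-hour peaks, so this only shows the mechanics of fitting and predicting, not a model choice.</p>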
<p>Unfortunately, historical traffic data is unavailable to the open source community. Furthermore, maps APIs such as Google's typically have quota limits on the number of routes that one can query within a given time period. Given these constraints, I have defined a small list of residential neighborhoods and work neighborhoods (both listed below) to act as origins and destinations, depending on the time of day, for the purposes of this project.</p>
<p>I have set up two sets of Python scripts that collect information every fifteen minutes: in the morning for commuters traveling to work, and in the afternoon for people traveling home. Every 15 minutes between 4am and 11am, a script queries the Google Maps API for public transit travel duration from each residential location to each work location. Every 15 minutes between 2pm and 9pm, a script queries the same API for public transit travel duration from each work location to each residential location. Since Google's APIs do not provide free real-time traffic data, I created separate scripts to query Microsoft's Bing Maps, which offers free real-time traffic information. All information obtained from the Google Maps and Bing Maps APIs is stored in a MySQL database.</p>
<p>So far, most of my time has been spent building the scripts and infrastructure to collect data. As information flows in, I should be able to start building the model.</p>

In [None]:
#Below are the display names and coordinates for...
#origins/destinations in residential and work locations
russian_hill = "Russian Hill, San Francisco, CA"
north_beach = "North Beach, San Francisco, CA"
pacific_heights = "Pacific Heights, San Francisco, CA"
outer_richmond = "Outer Richmond, San Francisco, CA"
outer_sunset = "Outer Sunset, San Francisco, CA"
mission_district = "Mission District, San Francisco, CA"
noe_valley = "Noe Valley, San Francisco, CA"
berkeley = "Berkeley, CA"
oakland = "Oakland, CA"
financial_district = "Financial District, San Francisco, CA"
mountain_view = "Mountain View, CA"
residential_neighborhoods = [russian_hill, north_beach, pacific_heights, outer_richmond,\
                            outer_sunset, mission_district, noe_valley,\
                            oakland, berkeley]
work_neighborhoods = [oakland, financial_district, mountain_view]
residential_coordinates = ['37.8010963,-122.4195558', '37.8060532,-122.4103311',\
                        '37.7925153,-122.4382307', '37.777677,-122.49531',\
                         '37.755445,-122.494069', '37.7598648,-122.4147977',\
                            '37.7502378,-122.4337029', '37.8043637,-122.2711137',\
                             '37.8715926,-122.272747']
work_coordinates = ['37.8043637,-122.2711137', '37.7945742,-122.3999445',\
                    '37.3860517,-122.0838511']
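
The name lists above are matched to their coordinates by position, and Oakland appears as both an origin and a destination. A minimal sketch of how the routes could be enumerated while skipping same-place trips is shown here. The `build_routes` helper and the (name, coordinates) tuple layout are illustrative and not part of the collection scripts.

```python
# (display name, "lat,lng") pairs, in the same order as the lists above.
residential = [
    ("Russian Hill, San Francisco, CA", "37.8010963,-122.4195558"),
    ("North Beach, San Francisco, CA", "37.8060532,-122.4103311"),
    ("Pacific Heights, San Francisco, CA", "37.7925153,-122.4382307"),
    ("Outer Richmond, San Francisco, CA", "37.777677,-122.49531"),
    ("Outer Sunset, San Francisco, CA", "37.755445,-122.494069"),
    ("Mission District, San Francisco, CA", "37.7598648,-122.4147977"),
    ("Noe Valley, San Francisco, CA", "37.7502378,-122.4337029"),
    ("Oakland, CA", "37.8043637,-122.2711137"),
    ("Berkeley, CA", "37.8715926,-122.272747"),
]
work = [
    ("Oakland, CA", "37.8043637,-122.2711137"),
    ("Financial District, San Francisco, CA", "37.7945742,-122.3999445"),
    ("Mountain View, CA", "37.3860517,-122.0838511"),
]


def build_routes(origins, destinations):
    """Pair every origin with every destination, skipping same-place trips."""
    routes = []
    for origin_name, origin_coords in origins:
        for dest_name, dest_coords in destinations:
            if origin_name != dest_name:
                routes.append((origin_name, origin_coords,
                               dest_name, dest_coords))
    return routes


morning_routes = build_routes(residential, work)    # home -> work
afternoon_routes = build_routes(work, residential)  # work -> home
```

Keeping each name next to its coordinates avoids the two parallel lists drifting out of step when a neighborhood is added or removed.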

In [None]:
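#Below is the cron code used to automate the process of querying...
#the Google Maps and Bing Maps APIs in the morning and afternoon.
#The hours appear to be in UTC on the server: 11-18 matches 4am-11am
#and 21-04 matches 2pm-9pm Pacific Daylight Time.
###morning queries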
0,15,30,45 11,12,13,14,15,16,17 * * * python /home/ec2-user/DAT_SF_13_homework/final_project/iter_gmaps_morn.py
0,15,30,45 11,12,13,14,15,16,17 * * * python /home/ec2-user/DAT_SF_13_homework/final_project/iter_mmaps_morn.py
0 18 * * * python /home/ec2-user/DAT_SF_13_homework/final_project/iter_gmaps_morn.py
0 18 * * * python /home/ec2-user/DAT_SF_13_homework/final_project/iter_mmaps_morn.py
###afternoon queries
0,15,30,45 21,22,23,0,1,2,3 * * * python /home/ec2-user/DAT_SF_13_homework/final_project/iter_gmaps_morn.py
0,15,30,45 21,22,23,0,1,2,3 * * * python /home/ec2-user/DAT_SF_13_homework/final_project/iter_mmaps_morn.py
0 4 * * * python /home/ec2-user/DAT_SF_13_homework/final_project/iter_gmaps_morn.py
0 4 * * * python /home/ec2-user/DAT_SF_13_homework/final_project/iter_mmaps_morn.py
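
To check the schedule against an API quota, the crontab fields can be expanded to count how many runs fire in each window. The sketch below handles only the comma-separated lists used above (not ranges or step values). The `expand` and `runs_per_day` helpers are illustrative, and the figure of 26 routes per run (9 origins x 3 destinations, minus the Oakland-to-Oakland pair) is an assumption that applies to a script issuing one request per route.

```python
def expand(field):
    """Expand a comma-separated cron field such as '0,15,30,45' into ints."""
    return [int(value) for value in field.split(",")]


def runs_per_day(entries):
    """Count the distinct (hour, minute) firings for a list of cron entries."""
    firings = set()
    for entry in entries:
        minute_field, hour_field = entry.split()[:2]
        for hour in expand(hour_field):
            for minute in expand(minute_field):
                firings.add((hour, minute))
    return len(firings)


# Minute and hour fields copied from the crontab above.
morning_entries = [
    "0,15,30,45 11,12,13,14,15,16,17 * * *",
    "0 18 * * *",
]
afternoon_entries = [
    "0,15,30,45 21,22,23,0,1,2,3 * * *",
    "0 4 * * *",
]

routes_per_run = 26  # assumed: 9 origins x 3 destinations, minus Oakland -> Oakland
morning_runs = runs_per_day(morning_entries)
afternoon_runs = runs_per_day(afternoon_entries)
requests_per_day = (morning_runs + afternoon_runs) * routes_per_run
```

With this schedule each window fires 29 times, so a script that issues one request per route would make 1,508 requests a day.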

In [None]:
#Below is the script used to query Bing Maps for driving time information in the morning.
#Please note: the script will only run on the server and will not work in this ipython Notebook

#!/Users/gpnaifeh/anaconda/bin/python2.7
import json
import urllib2
#import pandas as pd
import numpy as np
import MySQLdb as mdb
from datetime import datetime
import MySQL_data_file as MySQL_data
import query_data_file

#Neighborhoods to be used
russian_hill = "Russian Hill, San Francisco, CA"
north_beach = "North Beach, San Francisco, CA"
pacific_heights = "Pacific Heights, San Francisco, CA"
outer_richmond = "Outer Richmond, San Francisco, CA"
outer_sunset = "Outer Sunset, San Francisco, CA"
mission_district = "Mission District, San Francisco, CA"
noe_valley = "Noe Valley, San Francisco, CA"
berkeley = "Berkeley, CA"
oakland = "Oakland, CA"
financial_district = "Financial District, San Francisco, CA"
mountain_view = "Mountain View, CA"

residential_neighborhoods = [russian_hill, north_beach, pacific_heights, outer_richmond,\
                            outer_sunset, mission_district, noe_valley,\
                            oakland, berkeley]
work_neighborhoods = [oakland, financial_district, mountain_view]
residential_coordinates = ['37.8010963,-122.4195558', '37.8060532,-122.4103311',\
                        '37.7925153,-122.4382307', '37.777677,-122.49531',\
                         '37.755445,-122.494069', '37.7598648,-122.4147977',\
                            '37.7502378,-122.4337029', '37.8043637,-122.2711137',\
                             '37.8715926,-122.272747']
work_coordinates = ['37.8043637,-122.2711137', '37.7945742,-122.3999445',\
                    '37.3860517,-122.0838511']


#connect to database and execute query
#does not return anything
def query_db(command):
    con = mdb.connect(MySQL_data.my_sql_host, MySQL_data.my_sql_user,\
                        MySQL_data.my_sql_passwd,\
                        MySQL_data.my_sql_database)
    cur = con.cursor()
    cur.execute(command)
    con.commit()
    con.close()


#query the Bing maps API with a given origin and destination
#returns the parsed json response as a dictionary
def queryMmaps(query_origin, query_destination, mmaps_api_key):
    this_url = """http://dev.virtualearth.net/REST/v1/Routes?\
wayPoint.1={}&\
wayPoint.2={}&\
optimize=timeWithTraffic&\
key={}""".\
format(query_origin, query_destination, mmaps_api_key)
    mmaps_query = urllib2.urlopen(this_url)
    query_result = json.loads(mmaps_query.read())
    return query_result


#function that takes the json object and returns an
#np array to be entered into the database
def createEntry(query_result, query_time, query_origin, \
                    query_destination, travel_mode):
    #warn when the API reports a top-level status other than OK
    status_description = query_result.get('statusDescription', 'OK')
    if status_description != 'OK':
        print "The high level status description is listed as {}".format(status_description)
    travel_duration = "NULL"
    travel_duration_traffic = "NULL"
    travel_distance = "NULL"
    traffic_congestion = "NULL"
    try:
        #index with 0 because each dictionary object is embedded in a list
        resource_sets_dict = query_result['resourceSets'][0]
        resources_dict = resource_sets_dict['resources'][0]
        if 'travelDuration' in resources_dict:
            travel_duration = resources_dict['travelDuration']
        if 'travelDurationTraffic' in resources_dict:
            travel_duration_traffic = resources_dict['travelDurationTraffic']
        if 'travelDistance' in resources_dict:
            travel_distance = resources_dict['travelDistance']
        if 'trafficCongestion' in resources_dict:
            traffic_congestion = resources_dict['trafficCongestion']
    except (KeyError, IndexError, TypeError):
        #the response did not have the expected resourceSets/resources layout
        print "Exception triggered when trying to query 'resources_dict' object"
    entry_array = np.array([query_time, query_origin,\
                            query_destination, travel_mode, travel_duration,\
                            travel_duration_traffic, travel_distance,\
                            traffic_congestion])
    return entry_array


def saveToDatabase(array_of_entries):
    for array in array_of_entries:
        query_db("""INSERT INTO mmaps_data_local(datetime,origins,destinations,travel_mode,duration,duration_traffic,distance,congestion)
                    VALUES ('{}','{}','{}','{}',{},{},{},'{}')"""\
                    .format(array[0],array[1],array[2],array[3], array[4],array[5],array[6],array[7]))


def run_trip (start_neighborhoods, start_coordinates,\
                end_neighborhoods, end_coordinates):
    array_of_entries = np.array([])
    for i in np.arange(len(start_neighborhoods)):
        for n in np.arange(len(end_neighborhoods)):
            if start_neighborhoods[i] != end_neighborhoods[n]:
                query_origin = start_coordinates[i]
                query_destination = end_coordinates[n]
                mmaps_api_key = query_data_file.mmaps_api_key
                query_time = datetime.now().isoformat(' ')
                travel_mode = "driving"
                query_result = queryMmaps(query_origin, query_destination,\
                                            mmaps_api_key)
                entry_array = createEntry(query_result, query_time,\
                                            start_neighborhoods[i],\
                                            end_neighborhoods[n],\
                                            travel_mode)
                array_of_entries = np.append(array_of_entries,\
                                                entry_array, axis=0)
    number_of_columns = 8
    number_of_rows = len(array_of_entries) // number_of_columns
    array_of_entries = np.reshape(array_of_entries,\
                                    (number_of_rows, number_of_columns))
    saveToDatabase(array_of_entries)


run_trip(residential_neighborhoods, residential_coordinates, work_neighborhoods, work_coordinates)
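
The insert statement above is built with string formatting, which breaks as soon as a value contains a quote character. One hedged alternative is to pass the values as query parameters. The sketch below uses the standard-library `sqlite3` module so that it can run anywhere; with MySQLdb the placeholder would be `%s` instead of `?`. The table mirrors the column list used above, but the column types and the sample rows are assumptions.

```python
import sqlite3

# In-memory database standing in for the MySQL server.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE mmaps_data_local (
    datetime TEXT, origins TEXT, destinations TEXT, travel_mode TEXT,
    duration INTEGER, duration_traffic INTEGER, distance REAL,
    congestion TEXT)""")

# Hypothetical rows. None is stored as SQL NULL, so no "NULL" strings are
# needed, and the apostrophe in the sample destination needs no escaping.
rows = [
    ("2015-01-01 07:15:00", "Noe Valley, San Francisco, CA",
     "Fisherman's Wharf (sample)", "driving", 1500, 1725, 9.8, "Medium"),
    ("2015-01-01 07:15:00", "Berkeley, CA", "Oakland, CA",
     "driving", None, None, None, None),
]

# The driver substitutes each ? with the matching value from the row.
con.executemany(
    "INSERT INTO mmaps_data_local VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
con.commit()

stored = con.execute(
    "SELECT destinations, duration FROM mmaps_data_local "
    "ORDER BY origins").fetchall()
con.close()
```

Passing plain Python values also avoids converting every field to a string through a NumPy array before it reaches the database.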

In [None]:
#Below is the script used to query google maps for transit data in the morning
#Please note: the script will only run on the server and will not work in this ipython Notebook

#!/Users/gpnaifeh/anaconda/bin/python2.7

import json
import urllib2
#import pandas as pd
import numpy as np
import MySQLdb as mdb
from datetime import datetime
import MySQL_data_file as MySQL_data
import query_data_file

#districts
russian_hill = "Russian+Hill+San+Francisco+CA"
north_beach = "North+Beach+San+Francisco+CA"
pacific_heights = "Pacific+Heights+San+Francisco+CA"
outer_richmond = "Outer+Richmond+San+Francisco+CA"
outer_sunset = "Outer+Sunset+San+Francisco+CA"
mission_district = "Mission+District+San+Francisco+CA"
noe_valley = "Noe+Valley+San+Francisco+CA"

oakland = "Oakland+CA"
berkeley = "Berkeley+CA"

soma_district = "SoMA+San+Francisco+CA"
financial_district = "Financial+District+San+Francisco+CA"
mountain_view = "Mountain+View+CA"

#join district names with "|", the separator between places in the query
def add_districts(list_of_districts):
    return "|".join(list_of_districts)

current_origins = add_districts([russian_hill, north_beach,\
                            pacific_heights, outer_richmond,\
                            outer_sunset, mission_district, noe_valley,\
                            oakland, berkeley])
current_destinations = add_districts([oakland, financial_district, mountain_view])

travel_mode = "transit"

gmaps_query = urllib2.urlopen("""\
https://maps.googleapis.com/maps/api/\
distancematrix/json?\
origins={}&\
destinations={}&\
mode={}&\
key={}&\
departure_time=now""".\
format(current_origins, current_destinations, travel_mode, query_data_file.gmaps_api_key))

query_result = json.loads(gmaps_query.read())

array_of_entries = np.array([])
query_time = datetime.now().isoformat(' ')
query_origins = np.array(query_result['origin_addresses'])
query_destinations = np.array(query_result['destination_addresses'])
#iterate through and print all trips
if query_result['status'] == 'OK':
    query_rows = query_result['rows']
    for i in np.arange(len(query_rows)):
        query_elements = query_rows[i]['elements']
        for n in np.arange(len(query_elements)):
            if query_origins[i] != query_destinations[n]:
                element = query_elements[n]
                #report elements the API could not route
                if element.get('status') != "OK":
                    print "Element status is: {}".format(element.get('status'))
                #fall back to "NULL" when a field is missing from the response
                this_trip_duration = element.get('duration', {}).get('value', "NULL")
                this_trip_distance = element.get('distance', {}).get('value', "NULL")
                this_trip_fare = element.get('fare', {}).get('value', "NULL")
                entry_array = np.array([query_time, query_origins[i],\
                                        query_destinations[n], travel_mode,
                                        this_trip_duration, this_trip_distance,\
                                        this_trip_fare])
                array_of_entries = np.append(array_of_entries, entry_array, axis=0)
else:
    print 'The status of the query was not listed as "OK"'
    print 'The status of the query was listed as: {}'.format(query_result['status'])
#print datetime.now().isoformat(' ')
array_of_entries = np.reshape(array_of_entries, (len(array_of_entries)//7, 7))

#connect to database
def query_db(command):
    con = mdb.connect(MySQL_data.my_sql_host, MySQL_data.my_sql_user, \
                        MySQL_data.my_sql_passwd,\
                        MySQL_data.my_sql_database)
    cur = con.cursor()
    cur.execute(command)
    con.commit()
    con.close()

for array in array_of_entries:
    query_db("""INSERT INTO gmaps_data_local\
                (datetime,origins,destinations,travel_mode,duration,distance,fare)
                VALUES ('{}','{}','{}','{}',{},{},{})"""\
                .format(array[0],array[1],array[2],array[3], array[4],array[5],array[6]))
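
The district names above are encoded by hand with `+` and joined with `|`. A sketch of letting the standard library do the encoding instead is shown here. The `build_distance_matrix_url` helper and the `YOUR_API_KEY` placeholder are illustrative, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def build_distance_matrix_url(origins, destinations, mode, api_key):
    """Build a request URL with '|'-joined places and percent-encoded values."""
    params = [
        ("origins", "|".join(origins)),
        ("destinations", "|".join(destinations)),
        ("mode", mode),
        ("key", api_key),
        ("departure_time", "now"),
    ]
    # urlencode escapes spaces, commas and the '|' separator.
    return BASE_URL + "?" + urlencode(params)


url = build_distance_matrix_url(
    ["Noe Valley, San Francisco, CA", "Berkeley, CA"],
    ["Oakland, CA"],
    "transit",
    "YOUR_API_KEY",  # placeholder, not a real key
)
```

With this approach the neighborhood variables can hold ordinary display names, and the same strings can be reused for the database and for the request.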