# Taxi Interactive Dashboard

Samuel Moijueh

2019-05-10

<img src="images/taxi.jpg" style="width: 500px">


# Introduction
<br>

In this Exploratory Data Analysis, I create an Interactive Taxi Service Map of Chicago and provide insights using Data Science.

The analysis will be based on the questions outlined below. I've divided the problem into four categories. 

<br>

## Step 1: Define the Problem

<ol>
    <li><b>Visualization</b>
     <ul>
         <li>Create an interactive dashboard of the taxi service trips. Any observations? <!--based on the geospatial coordinates of the pickup and dropoff points!--></li>
        </ul>
    </li>
    <br>
    <li><b>Exploratory Data Analysis</b>
     <ul>
         <li>Look for patterns in travel and commute times by distance and time of day.</li>
         <li>When are the peak hours of service?</li>
         <li>Where are the most popular dropoff locations on weekdays vs. weekends?</li>
        </ul>
    </li>
    <br>
    <li><b>Civic Tech</b>
     <ul>
         <li>Is there a connection between areas served by a metro line (the "L" system) and taxi pickup areas? This would give insight into how efficiently public transportation serves people's commutes.</li>
         <li>Are there areas or communities in the city that are underserved according to supply and demand metrics?</li>
        </ul>
    </li>
    <br>
    <li><b>Business Analytics & Business Intelligence</b>
     <ul>
         <li>What are the total revenue and market share of the top-performing taxi companies?</li>
         <li>Can we identify any actionable insights to improve revenue?</li>
         <li>How is business affected by holidays and sporting events? Is there higher demand?</li>
        </ul>
    </li>
</ol>
  
 <br>

I will use <b>Python</b> as the programming language.

In [195]:
#!/usr/bin/env python

# Basic utils
import os
import json
import warnings
from sodapy import Socrata

# Data analysis and visualization
import datetime
import pandas as pd
import folium
from folium.plugins import MarkerCluster
from folium.plugins import MiniMap
from folium.plugins import Search

# Socrata API client. The app token is read from an environment variable
# (SOCRATA_APP_TOKEN is an assumed name) instead of being hard-coded in the notebook;
# reading a public dataset does not require a username or password.
client = Socrata("data.cityofchicago.org",
                 os.environ.get("SOCRATA_APP_TOKEN"))

## Step 2: Gather the data
<br>

The Chicago Taxi Trips data is available to download at <a href="https://catalog.data.gov/dataset/taxi-trips" target="_blank">data.gov</a> courtesy of the City of Chicago. A widget view of the data is also <a href="https://data.cityofchicago.org/widgets/wrvz-psew" target="_blank">available</a>.

<br>

The data was collected from 2013 to 2017, a period with over 112 million taxi trips.

<br>

For brevity, the scope of the data is limited to trips from October 17, 2016 to November 30, 2016: <b>approximately six weeks' worth of data.</b> This is an interesting period. It includes Halloween 2016, the 2016 World Series (the Chicago Cubs' first World Series appearance in 71 years), Thanksgiving, and the pre-holiday shopping event that is Black Friday.

<br>

I will use the Socrata Open Data API (<a href="https://dev.socrata.com/foundry/data.cityofchicago.org/wrvz-psew" target="_blank">SODA</a>) to query the Chicago Taxi Trips data. SODA provides programmatic access to a wealth of open data from governments, non-profits, and NGOs around the world. The SODA page linked above also includes a data dictionary that explains each variable.

<br>

### SQL Queries using SODA

<br> 

I will query for the taxi rides of interest using <a href="https://dev.socrata.com/docs/queries/" target="_blank">SoQL</a>, Socrata's SQL-like query language.

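The query below requests all taxi rides that start on or after 8 p.m. on October 17, 2016 and end by 8 p.m. on November 30, 2016. It keeps only rows with a positive trip distance, a company name, and both pickup and dropoff census tracts and centroid locations. The result is cached to `taxi_data.json`, so later runs skip the API call.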

In [3]:
try:
    with open('taxi_data.json') as json_file:
        taxi_data = json.load(json_file)
except FileNotFoundError:
    # query for all taxi rides, returned as JSON from API / converted to Python list of dictionaries by sodapy
    taxi_data = client.get("wrvz-psew", where="trip_start_timestamp >='2016-10-17T20:00:00.000' \
                     AND trip_end_timestamp <= '2016-11-30T20:00:00.000' \
                     AND trip_miles > 0 \
                     AND company IS NOT NULL \
                     AND dropoff_census_tract IS NOT NULL \
                     AND pickup_census_tract IS NOT NULL \
                     AND dropoff_centroid_location IS NOT NULL \
                     AND pickup_centroid_location IS NOT NULL", limit=1000000)
    
    # write the json object to file
    with open('taxi_data.json', 'w') as outfile:  
        json.dump(taxi_data, outfile)
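The cell above pulls everything in one request capped at `limit=1000000`. SODA also supports paging through results with `$limit` and `$offset`, which keeps each individual request smaller. Below is a minimal sketch of a paging loop, assuming a page source that returns a list of row dictionaries; `fetch_all` and its `fetch_page(limit, offset)` callback are hypothetical names, not part of sodapy.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def fetch_all(fetch_page: Callable[[int, int], List[Row]], page_size: int = 50_000) -> List[Row]:
    """Collect rows by calling fetch_page(limit, offset) until a short or empty page comes back."""
    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        rows.extend(page)
        if len(page) < page_size:  # fewer rows than requested: this was the last page
            return rows
        offset += page_size


# Hypothetical usage with the sodapy client from the setup cell. WHERE_CLAUSE is an assumed
# variable holding the same SoQL filter as above; ordering by the :id system field keeps
# page boundaries stable between requests.
# taxi_data = fetch_all(lambda limit, offset: client.get(
#     "wrvz-psew", where=WHERE_CLAUSE, order=":id", limit=limit, offset=offset))
```

Passing the page fetcher in as a function keeps the loop independent of the HTTP client, so it can be exercised with an in-memory list instead of the live API.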

### Pandas and Data Wrangling

I'll use pandas to load the data into a DataFrame and convert the timestamp strings into datetime objects. From there I can derive the fields the analysis needs.

In [200]:
# Convert to pandas DataFrame.
taxi_df = pd.DataFrame.from_records(taxi_data)

# wrangle data required for analysis
taxi_df["trip_start_timestamp"] = pd.to_datetime(taxi_df["trip_start_timestamp"])
taxi_df["trip_end_timestamp"] = pd.to_datetime(taxi_df["trip_end_timestamp"])
taxi_df["hour"] = taxi_df["trip_start_timestamp"].dt.hour     # hour of day, 0 to 23
taxi_df["day"] = taxi_df["trip_start_timestamp"].dt.weekday   # day of week, Monday=0 ... Sunday=6

The last two lines add columns for the hour of day (0 to 23) and the day of the week (Monday = 0, Sunday = 6). I will use them later to determine the peak hours of service and the most popular dropoff locations on weekdays vs. weekends, respectively.
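With `hour` and `day` in place, the peak-hour question reduces to counting trips per hour, and the weekday-vs-weekend question to a grouped count. The sketch below runs on a few made-up trips, not the real data. `is_weekend` is a hypothetical helper column (days 5 and 6 are Saturday and Sunday under the Monday = 0 convention), and using `dropoff_community_area` as the "location" is an assumption.

```python
import pandas as pd

# Made-up trips standing in for taxi_df
trips = pd.DataFrame({
    "trip_start_timestamp": pd.to_datetime([
        "2016-10-17 08:15", "2016-10-17 08:45", "2016-10-18 17:30",
        "2016-10-22 23:10", "2016-10-23 23:50", "2016-10-23 08:05",
    ]),
    "dropoff_community_area": ["8", "32", "8", "8", "6", "6"],
})
trips["hour"] = trips["trip_start_timestamp"].dt.hour
trips["day"] = trips["trip_start_timestamp"].dt.weekday  # Monday=0 ... Sunday=6
trips["is_weekend"] = trips["day"] >= 5                  # hypothetical helper column

# Trips per hour of day, busiest first; the top entry is the peak hour
trips_per_hour = trips["hour"].value_counts()
peak_hour = trips_per_hour.idxmax()

# Most common dropoff community area on weekdays (False) vs. weekends (True)
top_dropoff = (trips.groupby("is_weekend")["dropoff_community_area"]
                    .agg(lambda s: s.value_counts().idxmax()))
```

On the full DataFrame the same two expressions apply unchanged to `taxi_df`; ties in `value_counts` are resolved by whichever label pandas lists first, so a real analysis may want to inspect the top few counts rather than only the maximum.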

<br>

### Dropoff Location Hotspots

In the snippet below, I define a helper that checks whether a dropoff location falls within a 100-foot radius (1,000 feet for the Loop) of each of the following hotspot locations.

**Travel**: O'Hare International Airport, Midway International Airport

**Commute**: Union Station, Ogilvie Transportation Center, The Loop Metro

**Tourist Attractions**: Millennium Park, Navy Pier, Magnificent Mile, The Chicago Theatre

**Sporting Arenas**: United Center, Soldier Field, Wrigley Field (Chicago Cubs), Guaranteed Rate Field (White Sox)


<br>


In [204]:
hotspots = {
    'ORD' : (41.9742, -87.9073),
    'MDW' : (41.7868, -87.7522),
    'UNION' : (41.8787, -87.6403),
    'OGV' : (41.8830, -87.6405),
    'LOOP': (41.8806, -87.6302),
    'MLP' : (41.8826, -87.6226),
    'CTR' : (41.8853, -87.6276),
    'UTD' : (41.8807, -87.6742),
    'SDF' : (41.8623, -87.6167),
    'WGF' : (41.9484, -87.6553),
    'GRF' : (41.8299, -87.6338)
}

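import math

EARTH_RADIUS_FT = 20_902_231  # mean Earth radius (~6,371 km) in feet

def distance_ft(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in feet between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))

# return the hotspot code if the taxi dropoff location (lat, lon) is within a 100 ft
# radius of the hotspot (1000 ft radius for the LOOP), otherwise 'No'
def is_inside(lat, lon, hotspot_lat, hotspot_lon, hotspot):
    radius = 1000 if hotspot == 'LOOP' else 100
    if distance_ft(float(lat), float(lon), hotspot_lat, hotspot_lon) < radius:
        return hotspot
    return 'No'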
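`is_inside` tests one point against one hotspot at a time. To label every trip at once, a vectorized variant can compute haversine distances in feet over whole columns with NumPy. This is a minimal sketch: `haversine_ft`, `label_hotspots`, and the `dropoff_hotspot` column are hypothetical names, only three hotspots are used, and the sample rows are made up. Per the dataset's data dictionary, the `*_centroid_*` columns are centers of census tracts (or community areas), not exact dropoff points, so a 100-foot radius may match relatively few trips.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_FT = 20_902_231  # mean Earth radius (~6,371 km) in feet


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet; works element-wise on NumPy arrays and pandas Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_FT * np.arcsin(np.sqrt(a))


def label_hotspots(df, hotspots, default_radius_ft=100, radius_overrides=None):
    """Return a Series naming the first hotspot each dropoff falls inside, else 'No'."""
    radius_overrides = radius_overrides or {}
    lat = df["dropoff_centroid_latitude"].astype(float)   # SODA values arrive as strings
    lon = df["dropoff_centroid_longitude"].astype(float)
    labels = pd.Series("No", index=df.index)
    for name, (h_lat, h_lon) in hotspots.items():
        radius = radius_overrides.get(name, default_radius_ft)
        inside = (haversine_ft(lat, lon, h_lat, h_lon) < radius) & (labels == "No")
        labels[inside] = name
    return labels


hotspots = {"ORD": (41.9742, -87.9073), "LOOP": (41.8806, -87.6302), "WGF": (41.9484, -87.6553)}
sample = pd.DataFrame({
    "dropoff_centroid_latitude":  ["41.9742", "41.8830", "41.7000"],
    "dropoff_centroid_longitude": ["-87.9073", "-87.6302", "-87.7000"],
})
sample["dropoff_hotspot"] = label_hotspots(sample, hotspots, radius_overrides={"LOOP": 1000})
```

The second sample point sits about 875 feet north of the Loop coordinate, so it is labeled `LOOP` only because of the 1,000-foot override.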

To take a peek at the results, run this:

In [206]:
taxi_df.shape

#taxi_df.head()

(532609, 25)

## Pandas

The DataFrame contains 532,609 taxi trips between October 17 and November 30, 2016, with 25 columns including the derived `hour` and `day`.

In [164]:
# list the column names (532,609 records)
list(taxi_df)

['company',
 'dropoff_census_tract',
 'dropoff_centroid_latitude',
 'dropoff_centroid_location',
 'dropoff_centroid_longitude',
 'dropoff_community_area',
 'extras',
 'fare',
 'payment_type',
 'pickup_census_tract',
 'pickup_centroid_latitude',
 'pickup_centroid_location',
 'pickup_centroid_longitude',
 'pickup_community_area',
 'taxi_id',
 'tips',
 'tolls',
 'trip_end_timestamp',
 'trip_id',
 'trip_miles',
 'trip_seconds',
 'trip_start_timestamp',
 'trip_total',
 'hour',
 'day']

## Step 3: Interactive Map

Now that the data is ready, I will use Folium to create an interactive map of the Chicago Taxi Ride Services.

In [191]:
CHICAGO_COORDINATES = (41.8781, -87.6298)

# plot only the first MAX_RECORDS trips to keep the map responsive
MAX_RECORDS = 100

# create empty map zoomed in on Chicago
folium_map = folium.Map(location=CHICAGO_COORDINATES, zoom_start=11, control_scale=True)

# separate clusters so pickups and dropoffs can be toggled independently in the LayerControl
dropoff_cluster = MarkerCluster(name="Dropoff").add_to(folium_map)
pickup_cluster = MarkerCluster(name="Pick Up").add_to(folium_map)

minimap = MiniMap(toggle_display=True).add_to(folium_map)

In [185]:
# add map tiles
#tile = folium.TileLayer(tiles='OpenStreetMap', name="Color Map").add_to(folium_map)
tile = folium.TileLayer(tiles='Stamen Terrain', name="Terrain Map").add_to(folium_map)
tile = folium.TileLayer(tiles='cartodbpositron', name="Light Map").add_to(folium_map)

# TO-DO: push onto github using command line
# TO-DO: add pickup icon to pickup marker, add dropoff icon to dropoff marker. change color of pickup (red) vs dropoff marker (blue). blue taxi, red taxi
# TO-DO: look up MarkerCluster object. create FeatureGroup layers for pickup and dropoff

### Plotting Markers for each Taxi Ride

In the code below, I iterate over the first `MAX_RECORDS` rows of the taxi DataFrame and add a pickup marker and a dropoff marker for each trip.

In [186]:
# add a dropoff marker (with a trip-summary popup) and a pickup marker for each trip
for row in taxi_df.head(MAX_RECORDS).itertuples():
    lat_do = float(row.dropoff_centroid_latitude)
    lon_do = float(row.dropoff_centroid_longitude)
    lat_pu = float(row.pickup_centroid_latitude)
    lon_pu = float(row.pickup_centroid_longitude)

    # folium popups render HTML, so <br> (not \n) produces the line breaks
    popup_html = (f"Trip Duration={row.trip_seconds} seconds<br>"
                  f"Trip Distance={row.trip_miles} miles<br>"
                  f"Fare=${row.fare}")

    folium.Marker(location=[lat_do, lon_do],
                  popup=folium.Popup(popup_html, max_width=450)).add_to(dropoff_cluster)

    folium.Marker(location=[lat_pu, lon_pu]).add_to(pickup_cluster)

    # filled blue circle for the dropoff (fill=True is needed for fill_opacity to apply)
    folium.CircleMarker(location=[lat_do, lon_do], radius=4, color='steelblue',
                        fill=True, fill_opacity=0.7).add_to(folium_map)

    # unfilled red circle outline for the pickup
    folium.CircleMarker(location=[lat_pu, lon_pu], radius=4,
                        color='firebrick').add_to(folium_map)

# LayerControl lets the reader toggle the marker clusters and tile layers
folium.LayerControl(collapsed=False).add_to(folium_map)

<folium.map.LayerControl at 0x7ff1c1d98828>

In [202]:
folium_map