### Static heatmap

## basic idea

Our basic idea is to build visualizations that reflect features of the dataset at different levels: overall, state, county, and city. We use three kinds of maps to do this:

1. Heatmap: it has no distinct boundaries, so it works well for showing the overall trend.
2. Choropleth map: it has distinct boundaries, so it clearly shows the differences between counties or states.
3. Bubble map: it marks the positions of individual points on the map and encodes the magnitude of the data at each point, so it works well for comparing major cities with smaller ones.

## data preprocessing

In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from folium.plugins import HeatMap
import folium
import datetime
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline


In [5]:
records=pd.read_csv('random_sample_data_MQ.csv',iterator=True,chunksize=10**5,low_memory=False)
mylist=[]
for chunk in records:
    mylist.append(chunk[['Type','StartTime(UTC)','City','LocalTimeZone','StartPoint_Lat','StartPoint_Lng','ZipCode','County','State']])
records=pd.concat(mylist)

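# Convert UTC timestamps to local time with fixed per-zone offsets.
# The column keeps the name 'StartTime(UTC)' but holds local time after this step.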
records['StartTime(UTC)']=pd.to_datetime(records['StartTime(UTC)'])
records=records[records['LocalTimeZone'].notnull()]
dic={'EDT':datetime.timedelta(hours=-4), 'EST':datetime.timedelta(hours=-5),
     'CDT':datetime.timedelta(hours=-5), 'CST':datetime.timedelta(hours=-6), 
     'MDT':datetime.timedelta(hours=-6), 'MST':datetime.timedelta(hours=-7),
     'PDT':datetime.timedelta(hours=-7), 'PST':datetime.timedelta(hours=-8)}
records['StartTime(UTC)']=records['StartTime(UTC)']+records['LocalTimeZone'].map(lambda x:dic[x])
start=datetime.datetime(2016,2,1)
records=records[(records['StartTime(UTC)']>=start)]

# Keep only the part of ZIP+4 codes before the hyphen (e.g. '12345-6789' -> '12345').
records['ZipCode']=records['ZipCode'].map(lambda x:str(x).split('-')[0] if '-' in str(x) else x)
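
The two preprocessing steps above can be tried on a small, made-up frame. This is only a sketch: `UTC_OFFSET_HOURS`, `to_local_time`, `normalize_zip`, and the toy rows are illustrative names and values that do not appear in the notebook. The offsets are the same fixed values as in the cell above.

```python
import pandas as pd

# Fixed UTC offsets in hours, the same values as the timedelta dictionary above.
UTC_OFFSET_HOURS = {'EDT': -4, 'EST': -5, 'CDT': -5, 'CST': -6,
                    'MDT': -6, 'MST': -7, 'PDT': -7, 'PST': -8}

def to_local_time(utc_times, tz_abbrevs):
    """Shift each UTC timestamp by the offset of its time zone abbreviation."""
    offsets = pd.to_timedelta(tz_abbrevs.map(UTC_OFFSET_HOURS), unit='h')
    return pd.to_datetime(utc_times) + offsets

def normalize_zip(zip_code):
    """Return the part of a ZIP code before the first hyphen, as a string."""
    return str(zip_code).split('-')[0]

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    'StartTime(UTC)': ['2016-02-01 03:30:00', '2016-07-04 12:00:00'],
    'LocalTimeZone': ['EST', 'PDT'],
    'ZipCode': ['12345-6789', '94105'],
})
toy['local_time'] = to_local_time(toy['StartTime(UTC)'], toy['LocalTimeZone'])
toy['zip5'] = toy['ZipCode'].map(normalize_zip)
print(toy[['local_time', 'zip5']])
```

Unlike the cell above, `normalize_zip` always returns a string, which keeps the column type uniform before grouping.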

## heatmap

Tool: folium, which renders HTML-based heatmaps.

The dataset is too large to plot point by point, so we first aggregate it at a coarser granularity. ZIP code is a suitable level.
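
As a rough illustration of this aggregation, the sketch below collapses a few made-up points into one row per ZIP code: the mean position plus the number of records as a weight. The toy values and the names `per_zip`, `lat`, `lng`, and `weight` are assumptions chosen for the example.

```python
import pandas as pd

# Toy records, invented for illustration.
toy = pd.DataFrame({
    'ZipCode': ['10001', '10001', '94105'],
    'StartPoint_Lat': [40.75, 40.76, 37.79],
    'StartPoint_Lng': [-73.99, -74.00, -122.39],
    'Type': ['Event', 'Construction', 'Event'],
})

# One row per ZIP code: mean position and number of records.
per_zip = toy.groupby('ZipCode').agg(
    lat=('StartPoint_Lat', 'mean'),
    lng=('StartPoint_Lng', 'mean'),
    weight=('Type', 'count'),
)

# A list of [lat, lng, weight] rows, the shape a weighted heatmap layer expects.
matrix = per_zip.values.tolist()
print(per_zip)
```

Three points become two weighted rows here; on the full dataset the reduction is much larger.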

In [12]:
def heatmap(matrix,max_val=None):
    result = folium.Map(location=np.array(matrix)[:,:2].mean(axis=0).tolist(),zoom_start = 4)
    HeatMap(matrix,min_opacity=0.45,max_val=max_val,radius=13,blur=15,max_zoom=2).add_to(result)
    return result

def heatmap_weighted(df,count=1):
    zip_pos=df.groupby('ZipCode')[['StartPoint_Lat','StartPoint_Lng']].mean()
    zip_count=df.groupby('ZipCode')['Type'].count()/count
    zip_pos['count']=zip_count
    matrix=zip_pos.values.tolist()
    max_val=zip_count.max()
    return heatmap(matrix,max_val)

In [13]:
heatmap_weighted(records)

### Heatmap of different types.

In [14]:
heatmap_weighted(records[records['Type']=='Event'])

In [10]:
heatmap_weighted(records[records['Type']=='Congestion/Flow'])

In [10]:
heatmap_weighted(records[records['Type']=='Incident/accident'])

In [11]:
heatmap_weighted(records[records['Type']=='Construction'])

The overall map and the per-type maps all show that events concentrate on the East and West Coasts. A preliminary explanation is the size and population of the cities there.

### weekdays and weekends

To remove the effect of the number of days, we split the data into weekdays and weekends and divide the counts by the number of days in each group.

In [15]:
weekday_if=records['StartTime(UTC)'].dt.weekday<5
weekday=records.loc[weekday_if]
weekend=records.loc[~weekday_if]
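
The per-day normalization can be checked on a toy week. This is a minimal sketch with invented dates; the divisors 5 and 2 are the number of weekdays and weekend days in a week, and the variable names are illustrative.

```python
import pandas as pd

# Toy timestamps, invented for illustration: Mon-Fri once each, Sat once, Sun twice.
toy = pd.DataFrame({'StartTime': pd.to_datetime([
    '2016-02-01', '2016-02-02', '2016-02-03', '2016-02-04', '2016-02-05',
    '2016-02-06', '2016-02-07', '2016-02-07',
])})

# Monday is 0 and Sunday is 6, so values below 5 are weekdays.
is_weekday = toy['StartTime'].dt.dayofweek < 5

# Average number of records per day in each group.
per_weekday_day = is_weekday.sum() / 5
per_weekend_day = (~is_weekday).sum() / 2
print(per_weekday_day, per_weekend_day)
```

Dividing by the group's own number of days makes the two groups comparable even though weekdays cover more of the week.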

In [16]:
heatmap_weighted(weekday,5)

In [18]:
heatmap_weighted(weekend,2)

The two maps show no apparent difference.

## choropleth map

Tool: plotly, an online service that requires registration first.

In [48]:
# sign in
import plotly as py
py.tools.set_credentials_file(username='zejian', api_key='7kKMolQDVR14vWTSlMEN')

In [50]:
import plotly.plotly as py
import plotly.figure_factory as ff
import re

Define a helper that generates a color sequence.

In [35]:
def color_generator(n):
    start=245+10*np.random.rand(1,3)
    end=50*np.random.rand(1,3)
    end[:,np.random.randint(0,3)]+=50
    dis=(end-start)/n
    return ['rgb('+str(np.round((start+i*dis),2).tolist())[2:-2]+')' for i in range(0,n)]
color_generator(5)

['rgb(249.16, 249.65, 253.46)',
 'rgb(218.93, 202.56, 207.55)',
 'rgb(188.7, 155.47, 161.65)',
 'rgb(158.46, 108.37, 115.75)',
 'rgb(128.23, 61.28, 69.85)']
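
Because `color_generator` draws its start and end colors at random, every call returns a different palette. A deterministic variant, sketched below, uses the same linear stepping but takes the two colors as arguments. The name `linear_color_scale` and the example colors are assumptions for illustration.

```python
def linear_color_scale(start, end, n):
    """Return n 'rgb(r, g, b)' strings stepping linearly from start toward end.

    As in color_generator, the step is (end - start) / n, so the end color
    itself is not included.
    """
    colors = []
    for i in range(n):
        r, g, b = (round(s + i * (e - s) / n, 2) for s, e in zip(start, end))
        colors.append(f'rgb({r}, {g}, {b})')
    return colors

# Example colors, chosen arbitrarily: near-white fading toward a dark green.
scale = linear_color_scale((250, 250, 250), (50, 100, 0), 5)
print(scale)
```

Fixing the palette keeps colors consistent across the per-type maps, which makes them easier to compare side by side.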

To remove the effect of area, we divide the counts by the area of the corresponding county or state.
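
A toy example shows why this matters: two regions with the same raw count can differ widely in density. The region codes, counts, and areas below are invented for illustration.

```python
import pandas as pd

# Toy counts and areas for two made-up regions.
counts = pd.DataFrame({'State': ['AA', 'BB'], 'events': [100, 100]})
areas = pd.DataFrame({'State': ['AA', 'BB'], 'area_sq_mi': [50.0, 1000.0]})

# Join on the region key, then divide the count by the area.
density = counts.merge(areas, on='State', how='inner')
density['events_per_sq_mi'] = density['events'] / density['area_sq_mi']
print(density)
```

Both regions have 100 events, but the small one is 20 times denser, and the density is what the choropleth colors encode.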

### county

In [39]:
def county_chropleth(df,title):
    count=df.groupby(['County','State'])['Type'].count()
    count=count.reset_index(inplace=False)
    area=pd.read_excel('LND01.xls')[['Areaname','LND010200D']]
    area.columns=['county/state','area']
    area['County']=area['county/state'].map(lambda x:x.split(',')[0].strip())
    area['State']=area['county/state'].map(lambda x:x.split(',')[1].strip() if len(x.split(','))>1 else x)
    count=pd.merge(count,area,how='inner',on=['County','State'])
    count['average']=count['Type']/count['area']
    df_sample = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/laucnty16.csv').iloc[:,1:4]
    df_sample['County']=df_sample['County Name/State Abbreviation'].map(lambda x:re.split('[ ,]',x)[0].strip())
    df_sample['State FIPS Code'] = df_sample['State FIPS Code'].apply(lambda x: str(x).zfill(2))
    df_sample['County FIPS Code'] = df_sample['County FIPS Code'].apply(lambda x: str(x).zfill(3))
    df_sample['FIPS'] = df_sample['State FIPS Code'] + df_sample['County FIPS Code']
    df_sample['State']=df_sample['County Name/State Abbreviation'].map(lambda x:re.split('[ ,]',x)[-1].strip())
    df_sample=pd.merge(df_sample,count,how='left',on=['County','State'])
    df_sample=df_sample.fillna(0)
    colorscale=color_generator(12)
    fips = df_sample['FIPS'].tolist()
    values = df_sample['average'].tolist()
    endpts = list(np.linspace(0,sorted(count['average'])[-int(len(count)/5)], len(colorscale) - 1))
    fig = ff.create_choropleth( fips=fips, values=values, scope=['usa'],binning_endpoints=endpts, colorscale=colorscale,show_state_data=False,show_hover=True,\
                               centroid_marker={'opacity': 0},asp=2.9, title=title)
    return fig
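
Two small steps inside `county_chropleth` are easier to follow in isolation: building the five-digit FIPS code and choosing the binning endpoints. The sketch below restates both as standalone helpers. The names `make_fips` and `binning_endpoints` are hypothetical, and the `max(1, ...)` guard for short inputs is an addition that the function above does not have.

```python
import numpy as np

def make_fips(state_code, county_code):
    """Zero-pad the state code to 2 digits and the county code to 3, then join."""
    return str(state_code).zfill(2) + str(county_code).zfill(3)

def binning_endpoints(values, n_colors):
    """Evenly spaced endpoints from 0 up to the value that starts the top fifth.

    Capping at that value keeps a few extreme counties from stretching the
    color scale. n_colors - 1 endpoints are returned.
    """
    ordered = sorted(values)
    k = max(1, len(ordered) // 5)   # guard added for inputs shorter than 5
    upper = ordered[-k]
    return np.linspace(0, upper, n_colors - 1).tolist()

print(make_fips(6, 37))   # Los Angeles County, California -> '06037'
print(binning_endpoints(range(1, 11), n_colors=5))
```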
    

In [51]:
fig=county_chropleth(records,'frequency of all types in counties')
py.iplot(fig, filename='county_choropleth_overall')

High five! You successfully sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~zejian/0 or inside your plot.ly account where it is named 'county_choropleth_overall'


#### different types.

In [43]:
kinds=records.Type.unique()

In [83]:
fig=county_chropleth(records[records.Type==kinds[0]],'frequency of '+kinds[0]+' in counties')
py.iplot(fig, filename='county_choropleth_overall')



In [53]:
fig=county_chropleth(records[records.Type==kinds[1]],'frequency of '+kinds[1]+' in counties')
py.iplot(fig, filename='county_choropleth_overall')



In [82]:
fig=county_chropleth(records[records.Type==kinds[2]],'frequency of '+kinds[2]+' in counties')
py.iplot(fig, filename='county_choropleth_overall')



In [55]:
fig=county_chropleth(records[records.Type==kinds[3]],'frequency of '+kinds[3]+' in counties')
py.iplot(fig, filename='county_choropleth_overall')



### state

In [61]:
import plotly.plotly as py
import pandas as pd


df=records.groupby('State')['Type'].count()
df=df.reset_index(inplace=False)
temp=pd.merge(pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv').iloc[:,:2],
              pd.read_csv('https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/master/notebooks/data/state-areas.csv'),
              how='left',on='state')
temp.iloc[4,2]=163707

In [63]:
import plotly.plotly as py
import pandas as pd

def state_chropleth(df,title):
    # Group the frame passed in, not the global records, so per-type subsets work.
    df=df.groupby('State')['Type'].count()
    df=df.reset_index(inplace=False)
    temp=pd.merge(pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv').iloc[:,:2],
                  pd.read_csv('https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/master/notebooks/data/state-areas.csv'),
                  how='left',on='state')
    temp.iloc[4,2]=163707
    temp=temp.rename(columns={'code':'State','state':'whole name'})

    df=pd.merge(df,temp,how='right',on='State')
    df=df.fillna(0)
    df['area (sq. mi)']=df['Type']/df['area (sq. mi)']

    for col in df.columns:
        df[col] = df[col].astype(str)

    scl =[list(a) for a in (zip([0.1*i for i in range(11)],color_generator(11)))]
    df['text'] = df['whole name'] + '<br>' +\
            'overall times: '+df['Type']+'<br>average times for area(sq.mi):'+df['area (sq. mi)']
    data = [ dict(
            type='choropleth',
            colorscale = scl,
            autocolorscale = False,
            locations = df['State'],
            z = df['area (sq. mi)'].astype(float),
            locationmode = 'USA-states',
            text = df['text'],
            marker = dict(
                line = dict (
                    color = 'rgb(255,255,255)',
                    width = 2
                ) ),
            colorbar = dict(
                title = "times/(sq.mi)")
            ) ]
    layout = dict(
            title = title,
            geo = dict(
                scope='usa',
                projection=dict( type='albers usa' ),
                showlakes = True,
                lakecolor = 'rgb(255, 255, 255)'),
                 )
    fig = dict( data=data, layout=layout )
    return fig
fig=state_chropleth(records,'frequency of overall in state')
py.iplot( fig, filename='d3-cloropleth-map' )
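
The `scl` list in `state_chropleth` pairs each color with a position between 0 and 1. A small sketch of that pairing follows. The helper name `make_colorscale` is hypothetical, and it computes positions as `i / (n - 1)` rather than `0.1*i`, so it works for any number of colors and always ends exactly at 1.

```python
def make_colorscale(colors):
    """Pair each color with an evenly spaced position from 0.0 to 1.0."""
    n = len(colors)
    if n < 2:
        raise ValueError('a colorscale needs at least two colors')
    return [[i / (n - 1), color] for i, color in enumerate(colors)]

# Three example colors, chosen arbitrarily.
scl = make_colorscale(['rgb(255,255,255)', 'rgb(128,128,128)', 'rgb(0,0,0)'])
print(scl)
```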

#### different type

In [66]:
fig=state_chropleth(records[records.Type==kinds[0]],'frequency of '+kinds[0]+' in state')
py.iplot( fig, filename='d3-cloropleth-map' )

In [67]:
fig=state_chropleth(records[records.Type==kinds[1]],'frequency of '+kinds[1]+' in state')
py.iplot( fig, filename='d3-cloropleth-map' )

In [68]:
fig=state_chropleth(records[records.Type==kinds[2]],'frequency of '+kinds[2]+' in state')
py.iplot( fig, filename='d3-cloropleth-map' )

In [69]:
fig=state_chropleth(records[records.Type==kinds[3]],'frequency of '+kinds[3]+' in state')
py.iplot( fig, filename='d3-cloropleth-map' )

## bubble map

In [73]:
   
def bubble_map(df,title):
    df=df.groupby(['City','County','State'])[['StartPoint_Lng','StartPoint_Lat','Type']].agg({'StartPoint_Lng':'mean','StartPoint_Lat':'mean','Type':'count'})
    df.reset_index(inplace=True)
    df=df.sort_values(by='Type',ascending=False)[0:4000]
    df.index=list(range(len(df)))
    df=df.rename(columns={'StartPoint_Lng':'lon','StartPoint_Lat':'lat'})

    df['text'] = df['City']+" in "+df['County']+'('+df['State']+')' + '<br>times:' + (df['Type']).astype(str)
    # Half-open rank ranges [start, end), so no row is skipped between tiers.
    limits = [(0,5),(5,20),(20,50),(50,100),(100,len(df))]
    colors = ["rgb(0,116,217)","rgb(255,65,54)","rgb(133,20,75)","rgb(255,133,27)","lightgrey"]
    cities = []
    scale = df.loc[0,'Type']/1500

    for i in range(len(limits)):
        lim = limits[i]
        df_sub = df[lim[0]:lim[1]]
        city = dict(
            type = 'scattergeo',
            locationmode = 'USA-states',
            lon = df_sub['lon'],
            lat = df_sub['lat'],
            text = df_sub['text'],
            marker = dict(
                size = df_sub['Type']/scale,
                color = colors[i],
                line = dict(width=0.5, color='rgb(40,40,40)'),
                sizemode = 'area'
            ),
            name = '{0} - {1}'.format(lim[0],lim[1]) )
        cities.append(city)

    layout = dict(
            title = title+'<br>(Click legend to toggle traces)',
            showlegend = True,
            geo = dict(
                scope='usa',
                projection=dict( type='albers usa' ),
                showland = True,
                landcolor = 'rgb(217, 217, 217)',
                subunitwidth=1,
                countrywidth=1,
                subunitcolor="rgb(255, 255, 255)",
                countrycolor="rgb(255, 255, 255)"
            ),
        )
    fig = dict( data=cities, layout=layout )
    return fig
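
The tiering and sizing logic in `bubble_map` can be sketched without plotly. The helper names `split_by_rank` and `marker_sizes`, the boundaries, and the counts below are all invented for illustration. The sketch assumes half-open slices, so consecutive tiers share a boundary and no row is dropped.

```python
def split_by_rank(sorted_counts, boundaries):
    """Split a descending list of counts into tiers at the given rank boundaries."""
    return [sorted_counts[start:end]
            for start, end in zip(boundaries[:-1], boundaries[1:])]

def marker_sizes(sorted_counts, largest_area=1500):
    """Scale counts so the largest one maps to `largest_area`."""
    scale = sorted_counts[0] / largest_area
    return [count / scale for count in sorted_counts]

# Toy city counts, sorted from largest to smallest.
counts = sorted([3, 40, 7, 3000, 90, 12, 300], reverse=True)
tiers = split_by_rank(counts, [0, 2, 5, len(counts)])
sizes = marker_sizes(counts)
print(tiers)
print(sizes)
```

Each tier becomes one trace with its own color, and the sizes are used with `sizemode='area'`, so the marker area rather than the diameter is proportional to the count.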

In [74]:
df=records
fig=bubble_map(df,'overall distribution of cities')
py.iplot( fig, validate=False, filename='d3-bubble-map-populations' )

#### different type

In [76]:
df=records[records.Type==kinds[0]]
fig=bubble_map(df,kinds[0]+' distribution of cities')
py.iplot( fig, validate=False, filename='d3-bubble-map-populations' )

In [77]:
df=records[records.Type==kinds[1]]
fig=bubble_map(df,kinds[1]+' distribution of cities')
py.iplot( fig, validate=False, filename='d3-bubble-map-populations' )

In [79]:
df=records[records.Type==kinds[2]]
fig=bubble_map(df,kinds[2]+' distribution of cities')
py.iplot( fig, validate=False, filename='d3-bubble-map-populations' )

In [80]:
df=records[records.Type==kinds[3]]
fig=bubble_map(df,kinds[3]+' distribution of cities')
py.iplot( fig, validate=False, filename='d3-bubble-map-populations' )