<h1>Library setup</h1>

In [None]:
#pip install shapely --upgrade
#pip install geojsonio --upgrade
#pip install folium --upgrade

<h4>Geospatial plotting is all about points and polygons</h4>
<li>A point is, um, a point (x,y)
<li>A polygon is a shape defined by a set of points
<li>Shapefiles define regions on a map (e.g., counties or zipcodes)

In [None]:
from shapely.geometry import Point,Polygon
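#List the vertices in order around the boundary; a crossing order such as
#(0,0),(1,1),(1,0),(0,1) gives a self-intersecting (invalid) polygon
poly = Polygon([(0,0),(1,0),(1,1),(0,1)])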
type(poly)
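
Vertex order matters when a polygon is built from points. The sketch below, which assumes shapely is installed, compares a unit square whose corners are listed in boundary order with the same four corners listed in a crossing order, and then runs a simple point-in-polygon test. The names `square` and `bowtie` are illustrative only.

```python
from shapely.geometry import Point, Polygon

# Corners listed in order around the boundary: a valid unit square.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

# The same corners in a crossing order: the edges intersect each other.
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])

print(square.is_valid)                    # True
print(bowtie.is_valid)                    # False
print(square.area)                        # 1.0
print(square.contains(Point(0.5, 0.5)))   # True
```

Checking `is_valid` before using a polygon is a cheap way to catch this kind of ordering mistake.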

<h2>GeoJSON</h2>
<li>A format for encoding geographical data in a JSON-like structure
<li>Easy to read
<li>Easy to create
<li>Fast becoming the standard for sharing geographical data


<h3>Example</h3>

In [None]:
columbia="""
{ "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-73.9626, 40.8075]},
        "properties": {"name": "Columbia University"}
}
"""
import geojsonio
geojsonio.display(columbia)

<h3>Structure of a geojson file</h3>
<li>A JSON dictionary. (Construct a dict and dump into json)
<li>Consists of (recursive) geojson objects
<li>Each object has a "type" key (FeatureCollection, Feature, Point, Polygon, etc.)
<li>FeatureCollection is associated with a list of "features"
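<li>Each feature has a "geometry" key (its value may be null for a feature with no location)
<li>The value of "geometry" is itself a geojson object (Point, LineString, Polygon, etc.), which must have a "coordinates" key
<li>For a Point, "coordinates" is a single [longitude, latitude] position (elevation is an optional third element); other geometry types nest lists of such positions
<li>Features may have "properties" (the stuff that shows up when you click on a feature)
<li>geojsonio displays the data on OpenStreetMap maps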
<li>http://geojson.org/geojson-spec.html
<li>https://macwright.org/2015/03/23/geojson-second-bite.html
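
The structure described above can be sketched with plain Python dicts and the standard `json` module. The helper names `make_point_feature` and `make_feature_collection` are hypothetical, introduced only for illustration; they are not part of any geojson library.

```python
import json

def make_point_feature(lon, lat, **properties):
    """Return a GeoJSON Feature dict for one point (hypothetical helper)."""
    return {
        "type": "Feature",
        # GeoJSON positions are [longitude, latitude], in that order.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

def make_feature_collection(features):
    """Wrap a sequence of Feature dicts in a FeatureCollection (hypothetical helper)."""
    return {"type": "FeatureCollection", "features": list(features)}

collection = make_feature_collection([
    make_point_feature(-73.9626, 40.8075, name="Columbia University"),
    make_point_feature(-73.9965, 40.7295, name="New York University"),
])

# Dump the dict into a JSON string, ready to write to a .geojson file.
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```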

In [None]:
example = { "type" : "FeatureCollection",
           "features": [
               {"type": "Feature",
               "geometry": {"type":"Point", "coordinates": [-73.9626, 40.8075]},
                "properties": {"name":"Columbia University"}
               },
               {"type": "Feature",
               "geometry": {"type":"Point", "coordinates": [-73.9965, 40.7295]},
                "properties": {"name":"New York University"}
               },
               {"type": "Feature",
               "geometry": {"type":"LineString","coordinates":
                           [[-73.9626, 40.8075],
                            [-73.9680,40.7489],
                            [-73.9965, 40.7295]
                           ]
                    },
                "properties": {
                    "route":"via United Nations"
                }
               },
               {"type": "Feature",
               "geometry": {"type":"Polygon",
                           "coordinates": [
                               [
                               [-73.9732585597,40.7647613157],[-73.9822190042,40.7686179131],
                               [-73.9585078996,40.8002103312],[-73.9495474551,40.7963555678],
                               [-73.9732585597,40.7647613157]
                               ]
                           ]
                           },
                "properties": {
                    "name":"Central Park"
                }
               }
           ]
          }

import geojsonio
import json
geojsonio.display(json.dumps(example))
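
Note that the Central Park polygon repeats its first coordinate as its last one: a GeoJSON polygon ring must be closed and needs at least four positions. A minimal check, using a hypothetical helper `ring_is_closed`, might look like this:

```python
def ring_is_closed(ring):
    """True if the ring has at least four positions and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]

# The Central Park ring from the example: four corners plus the repeated first point.
central_park_ring = [
    [-73.9732585597, 40.7647613157], [-73.9822190042, 40.7686179131],
    [-73.9585078996, 40.8002103312], [-73.9495474551, 40.7963555678],
    [-73.9732585597, 40.7647613157],
]

# The same corners without the repeated first point.
open_ring = central_park_ring[:-1]

print(ring_is_closed(central_park_ring))  # True
print(ring_is_closed(open_ring))          # False
```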

<h2>Mapping incident locations</h2>
<li>We'll construct a geojson object and map the first 100 incidents in 311 data


<h3>Read 311 data</h3>

In [None]:
def read_311_data(datafile):
    import pandas as pd
    import numpy as np
    
    #Add the fix_zip function
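    def fix_zip(input_zip):
        try:
            input_zip = int(float(input_zip))
        except (TypeError, ValueError, OverflowError):
            try:
                #handle ZIP+4 strings such as '10027-1234'
                input_zip = int(input_zip.split('-')[0])
            except (AttributeError, ValueError):
                return np.nan
        #keep only zip codes in the 10000-19999 range
        if input_zip < 10000 or input_zip > 19999:
            return np.nan
        return str(input_zip)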
    
    #Read the file
    df = pd.read_csv(datafile,index_col='Unique Key')
    
    #fix the zip
    df['Incident Zip'] = df['Incident Zip'].apply(fix_zip)
    
    #drop all rows that have any nans in them (note the easier syntax!)
    
    df = df.dropna(how='any')
    
    #get rid of unspecified boroughs
    df = df[df['Borough'] != 'Unspecified']
    
    #Convert times to datetime and create a processing time column
    
    date_format = '%m/%d/%Y %I:%M:%S %p'
    df['Created Date'] = pd.to_datetime(df['Created Date'], format=date_format)
    df['Closed Date'] = pd.to_datetime(df['Closed Date'], format=date_format)
    df['processing_time'] =  df['Closed Date'] - df['Created Date']
    
    #Finally, get rid of negative processing times and return the final data frame
    
    df = df[df['processing_time']>=pd.Timedelta(0)]
    
    return df
    
datafile = "/Users/hardeepjohar/Documents/Courses/DataAnalyticsFall2017/DataVisualization/nyc_311_data_subset_small.csv"
df = read_311_data(datafile)
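
To see what the zip cleaning step does to individual values, here is a standalone sketch of the same logic using only the standard library. The name `normalize_zip` and the sample inputs are made up for illustration; the function returns `math.nan` where the notebook version returns a NumPy NaN.

```python
import math

def normalize_zip(raw):
    """Return a 5-digit zip string in the 10000-19999 range, or NaN otherwise."""
    try:
        zip_number = int(float(raw))
    except (TypeError, ValueError, OverflowError):
        try:
            # ZIP+4 strings such as '10027-1234': keep the part before the dash.
            zip_number = int(str(raw).split("-")[0])
        except ValueError:
            return math.nan
    if zip_number < 10000 or zip_number > 19999:
        return math.nan
    return str(zip_number)

# Hypothetical raw values of the kinds the cleaning step has to cope with.
for raw in ["10027", 10027.0, "10027-1234", "N/A", "07030", None]:
    print(repr(raw), "->", normalize_zip(raw))
```

The first three inputs come back as `'10027'`; the last three come back as NaN, so `dropna` would discard those rows.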

In [None]:
df.info()
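
The date format used in read_311_data is a 12-hour clock with an AM/PM marker (`%I` and `%p`). A small standard-library sketch, with two made-up timestamps, shows how a processing time falls out of the subtraction and why negative values can be filtered by comparing against a zero timedelta:

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Hypothetical timestamps in the same layout as the 311 date columns.
created = datetime.strptime("01/15/2016 11:30:00 PM", DATE_FORMAT)
closed = datetime.strptime("01/16/2016 01:00:00 AM", DATE_FORMAT)

processing_time = closed - created
print(processing_time)                    # 1:30:00

# Swapping the operands gives a negative duration, which the filter would drop.
print((created - closed) >= timedelta(0))  # False
```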

In [None]:
map_dict = dict()
map_dict["type"] ="FeatureCollection"
features = list()
lats = df['Latitude']
longs = df['Longitude']
agencies = df['Agency']
#Map at most the first 100 incidents (fewer if the data frame is shorter)
for index in range(min(100, len(df))):
    lat,lon,agency = lats.iloc[index],longs.iloc[index],agencies.iloc[index]
    data_point = { "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"Agency": agency}
        }
    features.append(data_point)
map_dict['features'] = features
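
The same construction can be wrapped in a reusable function. This is a sketch under the assumption that the rows arrive as (latitude, longitude, agency) tuples; the function name `points_to_feature_collection` and the sample rows are made up for illustration. `itertools.islice` takes at most `limit` rows, so shorter inputs do not raise an error.

```python
from itertools import islice

def points_to_feature_collection(rows, limit=100):
    """Build a FeatureCollection dict from (lat, lon, agency) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON wants [longitude, latitude], the reverse of the tuple order.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"Agency": agency},
        }
        for lat, lon, agency in islice(rows, limit)
    ]
    return {"type": "FeatureCollection", "features": features}

# Made-up sample rows standing in for the 311 data.
sample_rows = [(40.8075, -73.9626, "NYPD"), (40.7295, -73.9965, "DOT")]
collection = points_to_feature_collection(sample_rows)
print(len(collection["features"]))  # 2
```

With the data frame from this notebook, the rows could be supplied as `zip(df['Latitude'], df['Longitude'], df['Agency'])`.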


In [None]:
len(features)

In [None]:
features[5]

In [None]:
import json
import geojsonio
geojsonio.display(json.dumps(map_dict))

<h2>folium: map drawing library</h2>
<li>Can draw maps using geojson
<li>Can draw maps using its own structure
<li>Easy to view maps "inline" in a Jupyter notebook
<li>Easy to overlay data on geojson shape files to create choropleth maps
<li>Nice heat maps
<li>Timeseries maps
<li>https://www.kaggle.com/daveianhickey/how-to-folium-for-maps-heatmaps-time-analysis

<h3>Choropleth maps</h3> 
A choropleth map (from Greek χῶρος ("area/region") + πλῆθος ("multitude")) is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. (https://en.wikipedia.org/wiki/Choropleth_map)
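
To make the idea of shading by value concrete, here is a small sketch that assigns each region to one of a few color classes using equal-width bins. This is an illustration of the concept only, not a description of how folium chooses its bins; the counts and the palette are made-up values.

```python
def equal_interval_class(value, vmin, vmax, n_classes):
    """Return the 0-based class index of value using equal-width bins."""
    if vmax == vmin:
        return 0
    width = (vmax - vmin) / n_classes
    index = int((value - vmin) / width)
    # The maximum value would land one past the end, so clamp it to the last class.
    return min(index, n_classes - 1)

# Made-up complaint counts per zip code, for illustration only.
counts = {"10025": 120, "10027": 45, "10003": 300, "10031": 10}
palette = ["#ffffcc", "#a1dab4", "#41b6c4", "#225ea8"]  # light to dark

vmin, vmax = min(counts.values()), max(counts.values())
shades = {
    zip_code: palette[equal_interval_class(count, vmin, vmax, len(palette))]
    for zip_code, count in counts.items()
}
print(shades)
```

The zip code with the most complaints gets the darkest color and the ones with the fewest get the lightest.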

In [None]:
import folium
m = folium.Map(location=[40.8075,-73.9626],zoom_start=14)

In [None]:
m

<h2>Adding markers to a folium map</h2>


In [None]:
folium.Marker([40.8075,-73.9626],popup="Columbia University").add_to(m)
folium.Marker([40.72956,-73.9965],popup="New York University").add_to(m)
folium.Marker([40.748,-73.9680],popup="United Nations").add_to(m)


In [None]:
m

<h2>Choropleth map by zip code</h2>

<li>NYC zip code data is available in geojson format at http://data.beta.nyc/dataset/nyc-zip-code-tabulation-areas/resource/6df127b1-6d04-4bb7-b983-07402a2c3f90?view_id=b34c6552-9fdb-4f95-8810-0588ad1a4cc8
<li>Download the data into a file "zipcode.geojson"
<li>We'll use a choropleth map to see how zipcodes differ by the number of complaints


In [None]:
m

In [None]:
df.info()


<h3>Generate incident count by zipcode</h3>

In [None]:
import pandas as pd
#Count the incidents in each zip code
zip_groups = df.groupby("Incident Zip")
sizes = pd.DataFrame(zip_groups.size())
sizes.rename(columns={0:"size"},inplace=True)
#Turn the 'Incident Zip' index back into a regular column
sizes.reset_index(level=0, inplace=True)
sizes['Zip'] = sizes['Incident Zip']
sizes.info()

In [None]:

sizes
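
For comparison, the count per zip code can also be produced in one chained expression, since `reset_index` on a Series accepts a `name` for the value column. The sketch below uses a tiny made-up data frame (`toy`) rather than the 311 data.

```python
import pandas as pd

# Made-up incidents: two in 10027 and one in 10003.
toy = pd.DataFrame({"Incident Zip": ["10027", "10027", "10003"]})

# Group, count rows per group, and turn the result into a two-column frame.
toy_sizes = toy.groupby("Incident Zip").size().reset_index(name="size")
print(toy_sizes)
```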

In [None]:
import folium
#Center the map at Times Square
m = folium.Map(location = [40.7589,-73.9851],zoom_start=12)

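#Map.choropleth() was deprecated and later removed from folium; use the folium.Choropleth class
folium.Choropleth(geo_data='zipcode.geojson', data=sizes,
                  columns=['Zip','size'],
                  key_on='feature.properties.postalCode',
                  fill_color='RdYlGn', fill_opacity=0.7, line_opacity=0.8,
                  legend_name='Distribution of Incidents').add_to(m)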
folium.LayerControl().add_to(m)

In [None]:
m
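
If some zip codes come out unshaded, a likely cause is that the values in the data column do not match the property named in `key_on`, either in value or in type (for example integers versus strings). A rough way to check this, using a hypothetical helper `unmatched_keys` and a toy FeatureCollection in place of zipcode.geojson:

```python
def unmatched_keys(geojson_dict, property_name, data_keys):
    """Return the data keys that have no feature with a matching property value."""
    shape_keys = {
        feature["properties"].get(property_name)
        for feature in geojson_dict["features"]
    }
    return sorted(set(data_keys) - shape_keys)

# Toy stand-in for the zip code shapes; geometry is omitted for brevity.
toy_shapes = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"postalCode": "10027"}},
        {"type": "Feature", "geometry": None, "properties": {"postalCode": "10003"}},
    ],
}

# String keys: only the zip code with no shape is reported.
print(unmatched_keys(toy_shapes, "postalCode", ["10027", "10003", "10999"]))  # ['10999']

# Integer keys never equal the string property values, so nothing matches.
print(unmatched_keys(toy_shapes, "postalCode", [10027, 10003]))  # [10003, 10027]
```

With the real file, the dict could be loaded with `json.load` and checked against `sizes['Zip']`.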

<h3>Folium examples</h3>
http://nbviewer.jupyter.org/github/python-visualization/folium/tree/master/examples/