# Visualization of Manhattan Trees using Plotly and Mapbox

## Import required libraries

------------------

Note that Plotly requires an API key for Python, which is not stored in this file.

Attempting to graph without authentication will NOT WORK.

View instructions here for API setup:

https://plot.ly/python/getting-started/

If the Mapbox API does not work, create a Mapbox account, go to "API access tokens", and paste your own key into mapbox_access_token.

In [15]:
import numpy as np
import pandas as pd

import plotly.plotly as py
from plotly.graph_objs import *

mapbox_access_token = 'pk.eyJ1IjoiamFja2x1byIsImEiOiJjaXhzYTB0bHcwOHNoMnFtOWZ3YWdreDB3In0.pjROwb9_CEuyKPE-x0lRUw'
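
As an alternative to pasting the token into the notebook, it can be read from an environment variable. The following is only a sketch: the variable name `MAPBOX_ACCESS_TOKEN` and the helper `load_mapbox_token` are assumptions made for illustration and are not part of the original notebook.

```python
import os

# Hypothetical variable name; use whatever name you export in your shell.
TOKEN_ENV_VAR = "MAPBOX_ACCESS_TOKEN"


def load_mapbox_token(env=os.environ):
    """Return the Mapbox token found in `env`, or raise a clear error."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            "Set the %s environment variable to your Mapbox access token"
            % TOKEN_ENV_VAR
        )
    return token
```

With the variable exported before starting the notebook, the assignment would become `mapbox_access_token = load_mapbox_token()`.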

## Unzip dataset beforehand (else code will NOT work)

In [18]:
filename = "data/2015_Street_Tree_Census_-_Tree_Data.csv"
chart_filename = "Manhattan trees"

## Create functions

------------------

These two functions normalize outlier sizes and clean up species text, respectively.

In [19]:
# Halves the trunk diameter (tree_dbh) and clamps the result to the range 1-10,
# so that markers neither vanish nor dominate the map
def get_size(tree_dbh):
    size = tree_dbh / 2
    size = max(1, size)
    return min(10, size)

# Returns properly formatted text for species
def title(x):
    x = str(x)
    return x.title()
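
To see what the two helpers do, here is a small sketch that runs them on made-up inputs (Python 3 assumed; the diameters and species names are illustrative, not taken from the dataset). The functions are repeated so the snippet runs on its own.

```python
def get_size(tree_dbh):
    size = tree_dbh / 2
    size = max(1, size)
    return min(10, size)


def title(x):
    return str(x).title()


# Made-up diameters: small values are raised to 1, large ones capped at 10
for dbh in (0, 1, 6, 20, 425):
    print(dbh, "->", get_size(dbh))

print(title("honeylocust"))
print(title("london planetree"))

# A missing value (NaN) is converted with str() first, so it becomes "Nan"
print(title(float("nan")))
```

The last call shows a side effect worth knowing: if a species value is missing, the hover text will read "Nan" rather than being empty.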

## Wrangle data

-----------------

pandas' built-in CSV reader will be used to load the 100+ MB dataset.

Comments and notes are inside the code.

In [20]:
# Loads the CSV file and times it 

%time df = pd.read_csv(filename, encoding="utf-8-sig")
# print(df.columns)

# Limits borough to Manhattan only, otherwise map may crash due to too many (500k+) points

df = df[df["boroname"] == "Manhattan"]

# Divides the trees into three health categories: healthy, fair, and poor
# This is done so that the final map has 3 toggleable traces

healthy = df[df["health"] == "Good"]
fair = df[df["health"] == "Fair"]
poor = df[df["health"] == "Poor"]

# For each health level, get Series for latitudes, longitudes,
# capitalized species, and normalized sizes (with Series.apply())

h_latitudes = healthy["latitude"]
h_longitudes = healthy["longitude"]
h_species = healthy["spc_common"].apply(title)
h_sizes = healthy["tree_dbh"].apply(get_size)

f_latitudes = fair["latitude"]
f_longitudes = fair["longitude"]
f_species = fair["spc_common"].apply(title)
f_sizes = fair["tree_dbh"].apply(get_size)

p_latitudes = poor["latitude"]
p_longitudes = poor["longitude"]
p_species = poor["spc_common"].apply(title)
p_sizes = poor["tree_dbh"].apply(get_size)

CPU times: user 3.59 s, sys: 208 ms, total: 3.8 s
Wall time: 3.8 s
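
The three near-identical blocks above can also be written as a loop over the health labels. The sketch below uses a tiny made-up CSV held in memory (same column names as the census file, invented rows) so that it runs without the real dataset. The `usecols` argument and the `groups` dict are choices made for this illustration, not something the original notebook does.

```python
import io

import pandas as pd

# Tiny made-up sample with the same column names as the census file
sample_csv = io.StringIO(
    "tree_id,boroname,health,spc_common,tree_dbh,latitude,longitude\n"
    "1,Manhattan,Good,honeylocust,8,40.78,-73.96\n"
    "2,Manhattan,Fair,pin oak,30,40.79,-73.95\n"
    "3,Queens,Good,red maple,3,40.72,-73.84\n"
    "4,Manhattan,Poor,ginkgo,1,40.77,-73.97\n"
)


def get_size(tree_dbh):
    return min(10, max(1, tree_dbh / 2))


def title(x):
    return str(x).title()


# Reading only the needed columns keeps memory use down on the full file
columns = ["boroname", "health", "spc_common", "tree_dbh", "latitude", "longitude"]
df = pd.read_csv(sample_csv, usecols=columns)
df = df[df["boroname"] == "Manhattan"]

# One entry per health level instead of three copies of the same code
groups = {}
for level in ("Good", "Fair", "Poor"):
    subset = df[df["health"] == level]
    groups[level] = {
        "lat": subset["latitude"],
        "lon": subset["longitude"],
        "text": subset["spc_common"].apply(title),
        "size": subset["tree_dbh"].apply(get_size),
    }

for level, cols in groups.items():
    print(level, len(cols["lat"]), cols["text"].tolist())
```

Each `groups[level]` then holds the same four Series as the `h_*`, `f_*` and `p_*` variables.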


## Create traces

-------------

Here we create 3 Scattermapbox traces that are used to plot the trees on the map. 

Healthy trees are colored green, fair ones yellow, and poor ones orange.

Attributes of a Scattermapbox() object used here are:

- lat : list of latitudes of each point on trace
- lon : list of longitudes of each point on trace
- mode : markers (set by default, i.e. we are simply plotting points)
- name : name of the trace (e.g. Healthy, Fair, Poor)
- text : list of text shown when hovering on individual points on trace (e.g. species)

And for Marker(), we have:

- color : color of all markers for a given trace (optionally, list of colors for each marker)
- size : list of sizes for each point on trace (size scales w.r.t diameter, not area)
- opacity : opacity of each point

Note that name identifies one of the three traces, while text displays the species of each individual tree.

Also, none of these are actual Python lists here: Plotly accepts a pandas Series directly, with no conversion needed.


In [21]:

# Healthy, green
trace0 = Scattermapbox(
    lat = h_latitudes,
    lon = h_longitudes,
    mode = 'markers',
    marker = Marker(
        color = "#46BE60",
        size = h_sizes,
        opacity = 0.5,   
    ),
    name = "Healthy",
    text = h_species,
)

# Fair, yellow
trace1 = Scattermapbox(
    lat = f_latitudes,
    lon = f_longitudes,
    mode = 'markers',
    marker = Marker(
        color = "#D6C13C",
        size = f_sizes,
        opacity = 0.5,   
    ),
    name = "Fair",
    text = f_species,
)

# Poor, orange
trace2 = Scattermapbox(
    lat = p_latitudes,
    lon = p_longitudes,
    mode = 'markers',
    marker = Marker(
        color = "#C2772C",
        size = p_sizes,
        opacity = 0.5,   
    ),
    name = "Poor",
    text = p_species,
)
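
Since the three traces differ only in name, color and data, they could also be produced by one helper. The sketch below builds each trace as a plain dict with `type="scattermapbox"` instead of using the `Scattermapbox()` and `Marker()` classes. The helper name `make_trace` and the sample points are hypothetical.

```python
def make_trace(name, color, lats, lons, species, sizes, opacity=0.5):
    """Build one scattermapbox trace as a plain dict (hypothetical helper)."""
    return dict(
        type="scattermapbox",
        lat=lats,
        lon=lons,
        mode="markers",
        marker=dict(color=color, size=sizes, opacity=opacity),
        name=name,
        text=species,
    )


# Made-up points for illustration; the notebook would pass its pandas Series
specs = [
    ("Healthy", "#46BE60", [40.78], [-73.96], ["Honeylocust"], [4]),
    ("Fair", "#D6C13C", [40.79], [-73.95], ["Pin Oak"], [10]),
    ("Poor", "#C2772C", [40.77], [-73.97], ["Ginkgo"], [1]),
]
traces = [make_trace(*spec) for spec in specs]

for trace in traces:
    print(trace["name"], trace["marker"]["color"])
```

Changing the opacity or adding a fourth category then means editing one place rather than three.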


## Create layout

-------------

Specifies the look of the map and properties that are not tied to individual data points.

- Annotations() : list of one Annotation() that tells the user they can toggle individual traces
- autosize : determines whether the graph resizes dynamically
- height, width : specify the default height and width (if autosize is true, they affect the map ratio)
- Font() : specifies default font properties (font-family, color and size)
- margin : specifies margins of the map
- paper_bgcolor : specifies background color of the margin
- hovermode : specifies whether marker info is shown for the closest point only, or for all points at a given axis position (e.g. to compare prices of 3 different stocks at the same time)
- legend : specifies location of the legend (moved slightly downwards to make room for the annotation)
- title : title of the graph

As for Mapbox specifications:

- accesstoken : Mapbox API key (set here from mapbox_access_token)
- bearing : degree of map rotation (for Manhattan, 28.5 degrees gives streets perpendicular to main map axis)
- center : specifies the coordinates the map should be centered on
- pitch : viewing angle of the map; left at 0 here (bird's-eye view), larger values give a lower viewing angle
- zoom : specifies how zoomed in the map should be
- style : uses the dark Mapbox style for a more aesthetic visualization

Note that Plotly objects can alternatively be structured as dicts.


In [22]:
layout = Layout(
    annotations=Annotations([
       Annotation(
           x=1,
           y=1,
           align='right',
           showarrow=False,
           text='    Toggle view:',
           xanchor='left',
           xref='paper',
           yref='paper'
       )]),
    autosize=True,
    height=1024,
    width=1024,
    font=Font(
        family = 'Overpass',
        color = "#CCCCCC",
        size = 14,
    ),
    margin=Margin(
        t=80,
        l=40,
        b=40,
        r=40,
        pad=0,
    ),
    paper_bgcolor = "#020202",
    hovermode='closest',
    legend = dict(x=1, y=0.97),
    title = "Visualization of trees in Manhattan",
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=28.5,
        center=dict(
            lat=40.785,
            lon=-73.96
        ),
        pitch=0,
        zoom=11.15,
        style="dark"
    ),
)

## Upload map

--------------

Plotly figures (e.g. graphs, or a map in this case) are structured in two parts: data and layout. These are stored in the Figure() object.

Data contains all the traces, while layout specifies the layout.

Besides the figure, a plot also takes a filename and other file creation options (such as fileopt) not discussed here.

py.iplot() takes all these objects, creates a Plotly plot, and displays it inline in the notebook (py.plot() would open it in the browser instead).

In [25]:
traces = [trace0, trace1, trace2]
data = Data(traces)
figure = Figure(data=data, layout=layout)
py.iplot(figure, filename=chart_filename, fileopt="overwrite")
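
The data/layout split described above can be illustrated without any Plotly classes, since a figure can also be written as nested dicts. This is a minimal sketch with a single made-up point and a trimmed-down layout. It only builds and prints the structure and does not upload anything.

```python
import json

# One made-up point; the real traces carry the pandas Series built earlier
trace = dict(
    type="scattermapbox",
    lat=[40.785],
    lon=[-73.96],
    mode="markers",
    name="Healthy",
)

# Trimmed-down version of the layout, keeping a few of the same keys
layout = dict(
    title="Visualization of trees in Manhattan",
    hovermode="closest",
    mapbox=dict(
        bearing=28.5,
        center=dict(lat=40.785, lon=-73.96),
        zoom=11.15,
        style="dark",
    ),
)

# A figure is just these two parts: a list of traces plus one layout
figure = dict(data=[trace], layout=layout)

print(json.dumps(figure, indent=2, sort_keys=True))
```

Printing the figure as JSON is a quick way to check its structure before handing it to a plotting call.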