Introduction to the Assignment (Provided by Class Teacher)

In [1]:
import pandas as pd
import numpy as np

In this module we'll be looking at the New York City tree census. This data was collected by a volunteer-driven census in 2015, and we'll be accessing it via the Socrata API. The main site for the data is https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh, and on the upper right-hand side of that page you'll be able to see the link to the API.

The data is conveniently available in JSON format, so we should be able to read it directly into pandas:

In [2]:
url = 'https://data.cityofnewyork.us/resource/nwxe-4ae8.json'
trees = pd.read_json(url)
trees.head(10)

Unnamed: 0,address,bbl,bin,block_id,boro_ct,borocode,boroname,brch_light,brch_other,brch_shoe,...,tree_dbh,tree_id,trnk_light,trnk_other,trunk_wire,user_type,x_sp,y_sp,zip_city,zipcode
0,108-005 70 AVENUE,4022210000.0,4052307.0,348711,4073900,4,Queens,No,No,No,...,3,180683,No,No,No,TreesCount Staff,1027431.0,202756.7687,Forest Hills,11375
1,147-074 7 AVENUE,4044750000.0,4101931.0,315986,4097300,4,Queens,No,No,No,...,21,200540,No,No,No,TreesCount Staff,1034456.0,228644.8374,Whitestone,11357
2,390 MORGAN AVENUE,3028870000.0,3338310.0,218365,3044900,3,Brooklyn,No,No,No,...,3,204026,No,No,No,Volunteer,1001823.0,200716.8913,Brooklyn,11211
3,1027 GRAND STREET,3029250000.0,3338342.0,217969,3044900,3,Brooklyn,No,No,No,...,10,204337,No,No,No,Volunteer,1002420.0,199244.2531,Brooklyn,11211
4,603 6 STREET,3010850000.0,3025654.0,223043,3016500,3,Brooklyn,No,No,No,...,21,189565,No,No,No,Volunteer,990913.8,182202.426,Brooklyn,11215
5,8 COLUMBUS AVENUE,1011310000.0,1076229.0,106099,1014500,1,Manhattan,No,No,No,...,11,190422,No,No,No,Volunteer,988418.7,219825.5227,New York,10023
6,120 WEST 60 STREET,1011310000.0,1076229.0,106099,1014500,1,Manhattan,No,No,No,...,11,190426,No,No,No,Volunteer,988311.2,219885.2785,New York,10023
7,311 WEST 50 STREET,1010410000.0,1086093.0,103940,1012700,1,Manhattan,No,No,No,...,9,208649,No,No,No,Volunteer,987769.1,217157.8561,New York,10019
8,65 JEROME AVENUE,,,407443,5006400,5,Staten Island,No,No,No,...,6,209610,No,No,No,TreesCount Staff,963073.2,156635.5542,Staten Island,10305
9,638 AVENUE Z,3072350000.0,3320727.0,207508,3037402,3,Brooklyn,No,No,No,...,21,192755,No,No,No,TreesCount Staff,992653.7,152903.6306,Brooklyn,11223


Looks good, but let's take a look at the shape of this data:

In [3]:
trees.shape

(1000, 45)

1000 seems like too few trees for a city like New York, and a suspiciously round number. What's going on?

Socrata's API returns at most 1000 rows per request by default. Raw data is meant to be "paged" through by applications, with the expectation that a UX wouldn't be able to handle a full dataset.

As a simple example, if we had a mobile app with limited space that only displayed trees 5 at a time, we could view the first 5 trees in the dataset with the url below:

In [4]:
firstfive_url = 'https://data.cityofnewyork.us/resource/nwxe-4ae8.json?$limit=5&$offset=0'
firstfive_trees = pd.read_json(firstfive_url)
firstfive_trees

Unnamed: 0,address,bbl,bin,block_id,boro_ct,borocode,boroname,brch_light,brch_other,brch_shoe,...,tree_dbh,tree_id,trnk_light,trnk_other,trunk_wire,user_type,x_sp,y_sp,zip_city,zipcode
0,108-005 70 AVENUE,4022210001,4052307,348711,4073900,4,Queens,No,No,No,...,3,180683,No,No,No,TreesCount Staff,1027431.148,202756.7687,Forest Hills,11375
1,147-074 7 AVENUE,4044750045,4101931,315986,4097300,4,Queens,No,No,No,...,21,200540,No,No,No,TreesCount Staff,1034455.701,228644.8374,Whitestone,11357
2,390 MORGAN AVENUE,3028870001,3338310,218365,3044900,3,Brooklyn,No,No,No,...,3,204026,No,No,No,Volunteer,1001822.831,200716.8913,Brooklyn,11211
3,1027 GRAND STREET,3029250001,3338342,217969,3044900,3,Brooklyn,No,No,No,...,10,204337,No,No,No,Volunteer,1002420.358,199244.2531,Brooklyn,11211
4,603 6 STREET,3010850052,3025654,223043,3016500,3,Brooklyn,No,No,No,...,21,189565,No,No,No,Volunteer,990913.775,182202.426,Brooklyn,11215


If we wanted the next 5, we would use this url:

In [5]:
nextfive_url = 'https://data.cityofnewyork.us/resource/nwxe-4ae8.json?$limit=5&$offset=5'
nextfive_trees = pd.read_json(nextfive_url)
nextfive_trees

Unnamed: 0,address,bbl,bin,block_id,boro_ct,borocode,boroname,brch_light,brch_other,brch_shoe,...,tree_dbh,tree_id,trnk_light,trnk_other,trunk_wire,user_type,x_sp,y_sp,zip_city,zipcode
0,8 COLUMBUS AVENUE,1011310000.0,1076229.0,106099,1014500,1,Manhattan,No,No,No,...,11,190422,No,No,No,Volunteer,988418.6997,219825.5227,New York,10023
1,120 WEST 60 STREET,1011310000.0,1076229.0,106099,1014500,1,Manhattan,No,No,No,...,11,190426,No,No,No,Volunteer,988311.19,219885.2785,New York,10023
2,311 WEST 50 STREET,1010410000.0,1086093.0,103940,1012700,1,Manhattan,No,No,No,...,9,208649,No,No,No,Volunteer,987769.1163,217157.8561,New York,10019
3,65 JEROME AVENUE,,,407443,5006400,5,Staten Island,No,No,No,...,6,209610,No,No,No,TreesCount Staff,963073.1998,156635.5542,Staten Island,10305
4,638 AVENUE Z,3072350000.0,3320727.0,207508,3037402,3,Brooklyn,No,No,No,...,21,192755,No,No,No,TreesCount Staff,992653.7253,152903.6306,Brooklyn,11223


You can read more about paging using the Socrata API here
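
The two requests above generalize to a loop that keeps asking for pages until a short page comes back. The sketch below is a minimal illustration, not the course's method: fetch_all_pages and fake_fetch are hypothetical names, and the fetcher is passed in as an argument so the loop can be tried without network access. Against the real API, the fetcher would build a URL with $limit and $offset and hand it to pd.read_json.

```python
def fetch_all_pages(fetch_page, page_size=1000):
    """Collect rows page by page until a page comes back short.

    fetch_page is any callable taking (limit, offset) and returning a list
    of rows; it stands in for a real request to the API.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        rows.extend(page)
        # A page shorter than page_size means there is nothing left to read.
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# Stand-in data source: 12 fake rows served 5 at a time.
fake_rows = list(range(12))
requested_offsets = []

def fake_fetch(limit, offset):
    requested_offsets.append(offset)
    return fake_rows[offset:offset + limit]

all_rows = fetch_all_pages(fake_fetch, page_size=5)
print(len(all_rows), requested_offsets)
```

When paging a live dataset, adding an explicit $order clause is a sensible precaution so that rows keep the same position from one request to the next.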

In these docs, you'll also see more advanced functions (called SoQL) under the "filtering and query" section. These functions should remind you of SQL.

Think about the shape you want your data to be in before querying it. Using SoQL is a good way to avoid the limits of the API. For example, with the query below I can easily obtain the count of each species of tree in the Bronx:

In [6]:
boro = 'Bronx'
# Use the boro variable in the filter rather than repeating the literal
soql_url = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?' +\
        '$select=spc_common,count(tree_id)' +\
        '&$where=boroname=\'' + boro + '\'' +\
        '&$group=spc_common').replace(' ', '%20')
soql_trees = pd.read_json(soql_url)

soql_trees

Unnamed: 0,count_tree_id,spc_common
0,4619,
1,662,silver maple
2,18,pagoda dogwood
3,3917,littleleaf linden
4,12,American larch
5,1483,northern red oak
6,1889,green ash
7,7,pignut hickory
8,56,eastern cottonwood
9,177,shingle oak


This behavior is very common with web APIs, and I think this is useful when thinking about building interactive data products. When in a Jupyter Notebook or RStudio, there's an expectation that (unless you're dealing with truly large datasets) the data you want can be brought in memory and manipulated.

Dash and Shiny abstract away the need to distinguish between client side and server side to make web development more accessible to data scientists. This can lead to some unintentional design mistakes if you don't think about how costly your callback functions are (for example, nothing in Dash will stop you from running a costly model every time a dropdown value changes).
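
One common way to keep a callback cheap is to cache the expensive step so that repeated selections reuse earlier results. The sketch below uses functools.lru_cache from the standard library; summarize_species and its body are hypothetical stand-ins for whatever costly work a callback might trigger.

```python
from functools import lru_cache

call_log = []  # records each time the expensive body actually runs

@lru_cache(maxsize=64)
def summarize_species(species):
    """Hypothetical costly step, e.g. a remote query or a model fit."""
    call_log.append(species)
    return {'species': species, 'name_length': len(species)}

# Selecting the same dropdown value twice only runs the body once.
first = summarize_species('silver maple')
second = summarize_species('silver maple')
third = summarize_species('green ash')
print(len(call_log))
```

The cache lives in the server process, so this only helps when the same inputs recur and the underlying data does not change between calls.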

The goal of using the Socrata API is to force you to think about where your data operations are happening, and not resort to pulling in the data and performing all operations in local memory.

NOTE: One tip in dealing with URLs: you may need to replace spaces with '%20'. I personally just write out the url and then follow the string with a replace:

In [7]:
url='https://api-url.com/?query with spaces'.replace(' ', '%20')
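
As an alternative to replacing spaces by hand, the standard library can percent-encode the query for you. This is a sketch rather than a required approach: build_soql_url is a hypothetical helper, and the set of characters left unescaped is an assumption chosen to match the URLs used in this notebook.

```python
from urllib.parse import urlencode, quote

def build_soql_url(base_url, **clauses):
    """Build a SoQL URL from keyword clauses such as select=..., where=...

    Each keyword becomes a '$'-prefixed parameter. Spaces are encoded as
    %20, while $ , ( ) ' and = are left readable.
    """
    params = {'$' + name: value for name, value in clauses.items()}
    return base_url + '?' + urlencode(params, safe="$,()'=", quote_via=quote)

url = build_soql_url(
    'https://data.cityofnewyork.us/resource/nwxe-4ae8.json',
    select='spc_common, count(tree_id)',
    where="boroname='Staten Island'",
    group='spc_common',
)
print(url)
```

Because the clauses are joined by urlencode, the '&' separators between them cannot be forgotten.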

Assignment

Module 4
In this module we’ll be looking at data from the New York City tree census:
https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh
This data is collected by volunteers across the city, and is meant to catalog information
about every single tree in the city.
Build a Dash app for an arborist studying the health of various tree species (as defined by the
variable ‘spc_common’) across each borough (defined by the variable ‘borough’). This
arborist would like to answer the following two questions for each species and in each
borough:
1. What proportion of trees are in good, fair, or poor health according to the ‘health’
variable?
2. Are stewards (steward activity measured by the ‘steward’ variable) having an impact
on the health of trees?
Please see the accompanying notebook for an introduction and some notes on the Socrata
API.
Deployment: Dash deployment is more complicated than deploying shiny apps, so
deployment in this case is optional (and will result in extra credit). You can read instructions
on deploying a dash app to heroku here: https://dash.plot.ly/deployment

In [8]:
#grab tree_id, health, and boro. I do not believe we need any more fields
#'https://data.cityofnewyork.us/resource/nwxe-4ae8.json?$limit=50000&$offset=0'

#q1_url = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?$limit=700000&$offset=0').replace(' ', '%20')

#q1 = pd.read_json(q1_url)

#q1.head(5)

I was having trouble getting the SoQL query to work. I wanted to pull only the three fields without calling on the entire data table, but I was greeted with a bad gateway error. This was my attempt (note that there is no '&' separator before '$limit', which leaves the query string malformed):

q1_url = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?' +\
'$select=tree_id,health, boroname' +\
'$limit=50000').replace(' ', '%20')
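
To see what the missing separator does, the standard library can parse both versions of the query string. This is only a sketch of the URL structure; it does not show that the separator was the cause of the bad gateway error, and fixed_url is an illustrative name.

```python
from urllib.parse import parse_qs, urlsplit

base = 'https://data.cityofnewyork.us/resource/nwxe-4ae8.json?'

# The attempt as written: '$limit' runs straight on from the $select value.
broken_url = (base + '$select=tree_id,health, boroname' +
              '$limit=50000').replace(' ', '%20')

# Same clauses, joined with '&' so they are separate parameters.
fixed_url = (base + '$select=tree_id,health,boroname' +
             '&$limit=50000').replace(' ', '%20')

broken_params = parse_qs(urlsplit(broken_url).query)
fixed_params = parse_qs(urlsplit(fixed_url).query)

print(sorted(broken_params))  # only one parameter is recognised
print(sorted(fixed_params))   # both parameters are recognised
```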

To get around this, I will use the limit clause to query the whole data frame and then use pandasql to grab only the data I need for each question.

In [9]:

q1_url = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?' +\
        '$select=boroname,health, count(tree_id)' +\
        '&$group=boroname,health').replace(' ', '%20')

q1 = pd.read_json(q1_url)

q1

Unnamed: 0,boroname,count_tree_id,health
0,Bronx,10887,Fair
1,Bronx,66603,Good
2,Bronx,3095,Poor
3,Bronx,4618,
4,Brooklyn,25073,Fair
5,Brooklyn,138212,Good
6,Brooklyn,6459,Poor
7,Brooklyn,7549,
8,Manhattan,11460,Fair
9,Manhattan,47358,Good


Data prep:

We can do this more efficiently using Python's version of sqldf:

In [10]:
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

In [11]:
#grab count of unique trees per boro, spc, health, and status
#(health IS NOT NULL states the intent directly: skip rows with no health value)
soql_url = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?' +\
        '$select=count(tree_id),boroname,spc_common, health,status' +\
         '&$where=health IS NOT NULL' +\
        '&$group=boroname,health,status,spc_common').replace(' ', '%20')
soql_trees = pd.read_json(soql_url)

#grab count of unique ids by boro, health, and steward
soql_url_2 = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?' +\
        '$select=count(tree_id), boroname, health, steward' +\
         '&$where=health IS NOT NULL' +\
        '&$group=health,steward, boroname').replace(' ', '%20')
soql_trees_2 = pd.read_json(soql_url_2)

#grab count of unique trees by spc common, health, and steward (across all regions to easily find proportions)
soql_url_3 = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?' +\
        '$select=count(tree_id), spc_common, health, steward' +\
         '&$where=health IS NOT NULL' +\
        '&$group=health,steward, spc_common').replace(' ', '%20')
soql_trees_3 = pd.read_json(soql_url_3)



We pulled tree counts at three levels of granularity: by borough and species (with health and status), by borough (with health and steward), and by species (with health and steward). The last query has no borough grouping, so it gives citywide counts for computing proportions. Let's begin to merge our data.

In [12]:
soql_trees.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
boroname         1000 non-null object
count_tree_id    1000 non-null int64
health           1000 non-null object
spc_common       997 non-null object
status           1000 non-null object
dtypes: int64(1), object(4)
memory usage: 39.1+ KB


In [13]:
soql_trees_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 4 columns):
boroname         60 non-null object
count_tree_id    60 non-null int64
health           60 non-null object
steward          60 non-null object
dtypes: int64(1), object(3)
memory usage: 2.0+ KB


In [14]:
soql_trees_3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
count_tree_id    1000 non-null int64
health           1000 non-null object
spc_common       997 non-null object
steward          1000 non-null object
dtypes: int64(1), object(3)
memory usage: 31.3+ KB
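
Two of the three results above have exactly 1000 rows, which matches the default page size described in the introduction, so they may be truncated and any proportions computed from them may be missing groups. A minimal sketch of a guard follows; may_be_truncated and with_limit are hypothetical helper names, and the 50000 value is an assumption borrowed from the commented-out query earlier in this notebook.

```python
DEFAULT_PAGE_SIZE = 1000  # default row cap described in the introduction

def may_be_truncated(row_count, page_size=DEFAULT_PAGE_SIZE):
    """A result with exactly page_size rows probably hit the cap."""
    return row_count == page_size

def with_limit(url, limit=50000):
    """Append an explicit $limit clause to a URL that already has a query."""
    return url + '&$limit=' + str(limit)

grouped_url = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?'
               '$select=count(tree_id),spc_common,health,steward'
               '&$group=health,steward,spc_common')

print(may_be_truncated(1000), may_be_truncated(60))
print(with_limit(grouped_url))
```

In the notebook, the check could be applied as may_be_truncated(len(soql_trees)) right after each read.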


In [15]:
# I do not like warnings

import warnings

#warnings.filterwarnings('ignore')



In [16]:
soql_trees_sum_2 = soql_trees_2.groupby(['boroname', 'health']).agg({'count_tree_id': [np.sum]})

soql_trees_sum_2b=pd.DataFrame(soql_trees_sum_2.to_records())

soql_trees_sum_2b.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 3 columns):
boroname                    15 non-null object
health                      15 non-null object
('count_tree_id', 'sum')    15 non-null int64
dtypes: int64(1), object(2)
memory usage: 440.0+ bytes


In [17]:
soql_merged_2 = pd.merge(soql_trees_2, soql_trees_sum_2b, on=['boroname', 'health'])

# The summed column comes back named "('count_tree_id', 'sum')", so rename it
soql_merged_2.columns = ['boroname', 'count_tree_id', 'health','steward', 'sum_count_tree_id']

soql_merged_2.head(10)

Unnamed: 0,boroname,count_tree_id,health,steward,sum_count_tree_id
0,Bronx,2130,Fair,1or2,10887
1,Bronx,125,Fair,3or4,10887
2,Bronx,7,Fair,4orMore,10887
3,Bronx,8625,Fair,,10887
4,Brooklyn,6490,Fair,1or2,25073
5,Brooklyn,760,Fair,3or4,25073
6,Brooklyn,59,Fair,4orMore,25073
7,Brooklyn,17764,Fair,,25073
8,Manhattan,4471,Fair,1or2,11460
9,Manhattan,1415,Fair,3or4,11460
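
The aggregate-then-merge steps above can also be written in one pass with groupby(...).transform('sum'), which returns each group's total aligned to every original row. This is an alternative sketch, not the approach used in this notebook. The rows are the Bronx and Brooklyn 'Fair' counts from the output above, with the steward level that prints as blank written as 'None', the value the plotting code later filters on.

```python
import pandas as pd

counts = pd.DataFrame({
    'boroname': ['Bronx'] * 4 + ['Brooklyn'] * 4,
    'health': ['Fair'] * 8,
    'steward': ['1or2', '3or4', '4orMore', 'None'] * 2,
    'count_tree_id': [2130, 125, 7, 8625, 6490, 760, 59, 17764],
})

# transform('sum') broadcasts each (boroname, health) total back to its rows,
# so no separate summary frame, to_records call, or merge is needed.
counts['sum_count_tree_id'] = (
    counts.groupby(['boroname', 'health'])['count_tree_id'].transform('sum')
)
counts['prop_steward'] = counts['count_tree_id'] / counts['sum_count_tree_id']

print(counts)
```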


In [18]:
soql_trees_sum_3 = soql_trees_3.groupby(['spc_common', 'health']).agg({'count_tree_id': [np.sum]})

soql_trees_sum_3b=pd.DataFrame(soql_trees_sum_3.to_records())

soql_merged_3 = pd.merge(soql_trees_3, soql_trees_sum_3b, on=['spc_common','health'])

soql_merged_3.columns = ['count_tree_id', 'health', 'spc_common','steward', 'sum_count_tree_id']

soql_merged_3.head(10)


Unnamed: 0,count_tree_id,health,spc_common,steward,sum_count_tree_id
0,227,Fair,Amur maple,,302
1,1,Fair,Amur maple,4orMore,302
2,74,Fair,Amur maple,1or2,302
3,5,Fair,pine,1or2,33
4,28,Fair,pine,,33
5,4,Good,flowering dogwood,4orMore,823
6,761,Good,flowering dogwood,,823
7,58,Good,flowering dogwood,3or4,823
8,7,Poor,Turkish hazelnut,1or2,29
9,1,Poor,Turkish hazelnut,3or4,29


In [19]:
soql_trees_sum = soql_trees.groupby(['boroname', 'spc_common']).agg({'count_tree_id': [np.sum]})

soql_trees_sumb=pd.DataFrame(soql_trees_sum.to_records())

soql_merged = pd.merge(soql_trees, soql_trees_sumb, on=['boroname','spc_common'])

soql_merged.columns = ['boroname', 'count_tree_id','health', 'spc_common', 'status', 'sum_count_tree_id']

soql_merged.head(10)

Unnamed: 0,boroname,count_tree_id,health,spc_common,status,sum_count_tree_id
0,Brooklyn,8,Fair,white pine,Alive,8
1,Bronx,44,Good,magnolia,Alive,48
2,Bronx,4,Poor,magnolia,Alive,48
3,Staten Island,8,Poor,European hornbeam,Alive,166
4,Staten Island,158,Good,European hornbeam,Alive,166
5,Queens,2,Poor,pignut hickory,Alive,45
6,Queens,43,Good,pignut hickory,Alive,45
7,Manhattan,441,Poor,Callery pear,Alive,5823
8,Manhattan,5382,Good,Callery pear,Alive,5823
9,Brooklyn,3,Good,black pine,Alive,3


Now that our data has been collected, we can compute proportions by dividing each count by the total number of trees in its category and borough. We will gather the data needed for each question.

In [40]:
soql_merged['prop_health'] = soql_merged['count_tree_id'] / soql_merged['sum_count_tree_id']

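# sort_values returns a new DataFrame; the result is not assigned, so soql_merged keeps its original order
soql_merged.sort_values(by=['boroname','spc_common'])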

q1 = soql_merged[['boroname','health','spc_common','prop_health']]

q1.head(10)

Unnamed: 0,boroname,health,spc_common,prop_health
0,Brooklyn,Fair,white pine,1.0
1,Bronx,Good,magnolia,0.916667
2,Bronx,Poor,magnolia,0.083333
3,Staten Island,Poor,European hornbeam,0.048193
4,Staten Island,Good,European hornbeam,0.951807
5,Queens,Poor,pignut hickory,0.044444
6,Queens,Good,pignut hickory,0.955556
7,Manhattan,Poor,Callery pear,0.075734
8,Manhattan,Good,Callery pear,0.924266
9,Brooklyn,Good,black pine,1.0


In [20]:
soql_merged_2['prop_steward'] = soql_merged_2['count_tree_id'] / soql_merged_2['sum_count_tree_id']

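# sort_values returns a new DataFrame; the result is not assigned, so soql_merged_2 keeps its original order
soql_merged_2.sort_values(by=['boroname','steward'])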

q2 = soql_merged_2

q2.head(10)

Unnamed: 0,boroname,count_tree_id,health,steward,sum_count_tree_id,prop_steward
0,Bronx,2130,Fair,1or2,10887,0.195646
1,Bronx,125,Fair,3or4,10887,0.011482
2,Bronx,7,Fair,4orMore,10887,0.000643
3,Bronx,8625,Fair,,10887,0.792229
4,Brooklyn,6490,Fair,1or2,25073,0.258844
5,Brooklyn,760,Fair,3or4,25073,0.030311
6,Brooklyn,59,Fair,4orMore,25073,0.002353
7,Brooklyn,17764,Fair,,25073,0.708491
8,Manhattan,4471,Fair,1or2,11460,0.39014
9,Manhattan,1415,Fair,3or4,11460,0.123473
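
prop_steward above is the share of each steward level within a borough and health category. For the question of whether stewardship affects health, it may be more direct to compute the share of each health category within a steward level instead. A minimal sketch of that alternative normalization follows; the counts below are made-up numbers for illustration only, not census results, and the variable names are hypothetical.

```python
import pandas as pd

# Made-up counts, for illustration only.
toy = pd.DataFrame({
    'steward': ['None', 'None', 'None', '1or2', '1or2', '1or2'],
    'health': ['Good', 'Fair', 'Poor', 'Good', 'Fair', 'Poor'],
    'count_tree_id': [70, 20, 10, 40, 8, 2],
})

# Normalize within each steward level so the health shares in a row sum to 1.
steward_totals = toy.groupby('steward')['count_tree_id'].transform('sum')
toy['prop_health'] = toy['count_tree_id'] / steward_totals

# One row per steward level, one column per health category.
health_by_steward = toy.pivot(index='steward', columns='health',
                              values='prop_health')
print(health_by_steward)
```

Comparing the rows of this table shows directly whether the share of trees in good health differs between steward levels.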


In [21]:
soql_merged_3['prop_steward'] = soql_merged_3['count_tree_id'] / soql_merged_3['sum_count_tree_id']

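# sort_values returns a new DataFrame; the result is not assigned, so soql_merged_3 keeps its original order
soql_merged_3.sort_values(by=['spc_common','steward'])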

q3 = soql_merged_3

q3.head(10)

Unnamed: 0,count_tree_id,health,spc_common,steward,sum_count_tree_id,prop_steward
0,227,Fair,Amur maple,,302,0.751656
1,1,Fair,Amur maple,4orMore,302,0.003311
2,74,Fair,Amur maple,1or2,302,0.245033
3,5,Fair,pine,1or2,33,0.151515
4,28,Fair,pine,,33,0.848485
5,4,Good,flowering dogwood,4orMore,823,0.00486
6,761,Good,flowering dogwood,,823,0.924666
7,58,Good,flowering dogwood,3or4,823,0.070474
8,7,Poor,Turkish hazelnut,1or2,29,0.241379
9,1,Poor,Turkish hazelnut,3or4,29,0.034483


Our data has been gathered. We can now proceed to building an app using Dash.

In [None]:
import dash
import dash_core_components as dcc
import dash_html_components as html

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

df = q1

available_indicators = df['spc_common'].unique()

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)



app.layout = html.Div([
    html.H1('Find the Prop. of health by Borough for each SPC'),
    html.Div('''
        spc_common
    '''),
    dcc.Dropdown(
        id='my-dropdown',
        options=[{'label': i, 'value': i} for i in available_indicators],
        value='Atlas cedar'
    ),
    dcc.Graph(
        id='example-graph'    
    )
    
])

@app.callback(
    dash.dependencies.Output('example-graph', 'figure'),
    [dash.dependencies.Input('my-dropdown', 'value')])
def update_output(selected_dropdown_value):
    dff = df[df['spc_common'] == selected_dropdown_value]
    figure = {
            'data': [
                {'x': dff.boroname[dff['health'] == 'Good'], 'y': dff.prop_health[dff['health'] == 'Good'], 'type': 'bar', 'name': 'Good'},
                {'x': dff.boroname[dff['health'] == 'Fair'], 'y': dff.prop_health[dff['health'] == 'Fair'], 'type': 'bar', 'name': 'Fair'},
                {'x': dff.boroname[dff['health'] == 'Poor'], 'y': dff.prop_health[dff['health'] == 'Poor'], 'type': 'bar', 'name': 'Poor'}
            ],
            'layout': {
                'title': 'Prop. of health by boroname'
            }
        }
    return figure 


if __name__ == '__main__':
    app.run_server()

 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
127.0.0.1 - - [23/Mar/2019 10:50:49] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:50:51] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:50:51] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:50:51] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:50:52] "GET /_favicon.ico HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:51:49] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:51:50] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:51:50] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:51:50] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 10:51:54] "POST /_dash-update-component HTTP/1.1" 200 -
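
The app above reads from a DataFrame prepared ahead of time. In the spirit of the introduction's advice to keep data operations on the server, a callback could instead request only the counts for the selected species and borough. A minimal sketch of the URL such a callback might build follows; health_counts_url is a hypothetical helper, and escaping a single quote by doubling it is an assumption carried over from standard SQL.

```python
from urllib.parse import quote

BASE_URL = 'https://data.cityofnewyork.us/resource/nwxe-4ae8.json'

def health_counts_url(species, borough):
    """Return a SoQL URL that counts trees by health for one species and borough."""
    def literal(value):
        # Assumption: a quote inside a string literal is escaped by doubling it.
        return "'" + value.replace("'", "''") + "'"

    where = ('spc_common=' + literal(species) +
             ' AND boroname=' + literal(borough))
    return (BASE_URL +
            '?$select=health,count(tree_id)' +
            '&$where=' + quote(where, safe="'=") +
            '&$group=health')

print(health_counts_url('Atlas cedar', 'Queens'))
```

The result has at most a handful of rows per selection, so it stays far below the API's row limit.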


In [None]:
import dash
import dash_core_components as dcc
import dash_html_components as html

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

df = q2

available_indicators = df['boroname'].unique()

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)



app.layout = html.Div([
    html.H1('Find the Prop. of Steward by health for each Borough'),
    html.Div('''
        boroname
    '''),
    dcc.Dropdown(
        id='my-dropdown',
        options=[{'label': i, 'value': i} for i in available_indicators],
        value='Queens'
    ),
    dcc.Graph(
        id='example-graph'    
    )
    
])

@app.callback(
    dash.dependencies.Output('example-graph', 'figure'),
    [dash.dependencies.Input('my-dropdown', 'value')])
def update_output(selected_dropdown_value):
    dff = df[df['boroname'] == selected_dropdown_value]
    figure = {
            'data': [
                {'x': dff.health[dff['steward'] == 'None'], 'y': dff.prop_steward[dff['steward'] == 'None'], 'type': 'bar', 'name': 'None'},
                {'x': dff.health[dff['steward'] == '1or2'], 'y': dff.prop_steward[dff['steward'] == '1or2'], 'type': 'bar', 'name': '1or2'},
                {'x': dff.health[dff['steward'] == '3or4'], 'y': dff.prop_steward[dff['steward'] == '3or4'], 'type': 'bar', 'name': '3or4'},
                {'x': dff.health[dff['steward'] == '4orMore'], 'y': dff.prop_steward[dff['steward'] == '4orMore'], 'type': 'bar', 'name': '4orMore'}
            ],
            'layout': {
                'title': 'Prop. of Steward by health'
            }
        }
    return figure 


if __name__ == '__main__':
    app.run_server()

 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
127.0.0.1 - - [23/Mar/2019 17:25:12] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 17:25:13] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 17:25:13] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2019 17:25:13] "POST /_dash-update-component HTTP/1.1" 200 -
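
The figure-building logic in the callback can be pulled into a plain function so it can be checked without starting the server. This is a sketch under that assumption: steward_figure is a hypothetical name, and the sample rows are the Bronx 'Fair' proportions from the q2 output above, with the blank steward level written as 'None' as in the callback's filter.

```python
STEWARD_LEVELS = ('None', '1or2', '3or4', '4orMore')

def steward_figure(rows, levels=STEWARD_LEVELS):
    """Build a grouped-bar figure dict with one trace per steward level.

    rows is a list of dicts with 'health', 'steward' and 'prop_steward' keys.
    """
    traces = []
    for level in levels:
        subset = [row for row in rows if row['steward'] == level]
        traces.append({
            'x': [row['health'] for row in subset],
            'y': [row['prop_steward'] for row in subset],
            'type': 'bar',
            'name': level,
        })
    return {'data': traces, 'layout': {'title': 'Prop. of Steward by health'}}

sample_rows = [
    {'health': 'Fair', 'steward': '1or2', 'prop_steward': 0.195646},
    {'health': 'Fair', 'steward': '3or4', 'prop_steward': 0.011482},
    {'health': 'Fair', 'steward': '4orMore', 'prop_steward': 0.000643},
    {'health': 'Fair', 'steward': 'None', 'prop_steward': 0.792229},
]
figure = steward_figure(sample_rows)
print([trace['name'] for trace in figure['data']])
```

Inside the callback, this could be called as steward_figure(dff.to_dict('records')).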


In [None]:
import dash
import dash_core_components as dcc
import dash_html_components as html

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

df = q3

available_indicators = df['spc_common'].unique()

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)



app.layout = html.Div([
    html.H1('Find the Prop. of Steward by health for each spc_common'),
    html.Div('''
        spc_common
    '''),
    dcc.Dropdown(
        id='my-dropdown',
        options=[{'label': i, 'value': i} for i in available_indicators],
        value='Atlas cedar'
    ),
    dcc.Graph(
        id='example-graph'    
    )
    
])

@app.callback(
    dash.dependencies.Output('example-graph', 'figure'),
    [dash.dependencies.Input('my-dropdown', 'value')])
def update_output(selected_dropdown_value):
    dff = df[df['spc_common'] == selected_dropdown_value]
    figure = {
            'data': [
                {'x': dff.health[dff['steward'] == 'None'], 'y': dff.prop_steward[dff['steward'] == 'None'], 'type': 'bar', 'name': 'None'},
                {'x': dff.health[dff['steward'] == '1or2'], 'y': dff.prop_steward[dff['steward'] == '1or2'], 'type': 'bar', 'name': '1or2'},
                {'x': dff.health[dff['steward'] == '3or4'], 'y': dff.prop_steward[dff['steward'] == '3or4'], 'type': 'bar', 'name': '3or4'},
                {'x': dff.health[dff['steward'] == '4orMore'], 'y': dff.prop_steward[dff['steward'] == '4orMore'], 'type': 'bar', 'name': '4orMore'}
            ],
            'layout': {
                'title': 'Prop. of Steward by health'
            }
        }
    return figure 


if __name__ == '__main__':
    app.run_server()