![](https://i.imgur.com/CSRPuEo.png)

# How To Easily Make a Choropleth with Plotly

I've run into Plotly choropleth problems a few times over the last few months, and I was always frustrated at how little _clear_ information was available online. The aim of this Kernel is therefore to lay out how to easily make a Plotly choropleth and to explain some key details for beginners. At the end I add some extra features that could be useful for dashboards and the like. I'll try to keep the text as short and concise as possible.

I'll be using the US EPA Toxic Release Inventory Dataset provided to Kaggle by the EPA.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
print(os.listdir("../input"))

In [None]:
data = pd.read_csv('../input/basic_data_files.csv', nrows=2000000, low_memory=False)

First, let's check the Plotly version:

In [None]:
import plotly
print('Plotly Version: ',plotly.__version__)

import plotly.offline as py
import plotly.graph_objs as go
py.init_notebook_mode(connected=True)

### Data: <br>
The bare minimum data to make a choropleth is a list of states/countries in a standardized format. Plotly recognizes ISO-3 country codes, two-letter US state abbreviations, and country names. You'll also need a corresponding list of values for the colour intensity of each geographic area. Plotly scales the colour automatically, so these values can be as large or as small as you like.

I make a DataFrame 'reports' to list the number of reported chemical spills in each state. Note that I needed to call sort_index() on the result of value_counts() to keep the values consistent with the labels (more relevant when you have multiple years and some states are not included).
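
As a minimal sketch of that counting step, here is the same value_counts().sort_index() pattern on a tiny made-up frame. The `toy` rows and the name `reports_toy` are invented purely for illustration and are not taken from the EPA data:

```python
import pandas as pd

# Hypothetical toy data: one row per report, with a state code and a year.
toy = pd.DataFrame({
    "ST": ["TX", "CA", "TX", "OH", "CA", "TX"],
    "YEAR": ["2016"] * 6,
})

# Count reports per state, then sort by state code so the labels (index)
# and the counts (values) come out in the same, predictable order.
counts = toy[toy.YEAR == "2016"].ST.value_counts().sort_index()

# One column per year, indexed by state code.
reports_toy = counts.to_frame(name="2016")
print(reports_toy)
```

The index of `reports_toy` is what gets passed as `locations`, and the column is what gets passed as `z`.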

Below is a pretty plain choropleth that can be quickly copied.

In [None]:
# We'll make a choropleth of the total reported release incidents for 2016:
reports = pd.DataFrame(index=data[data.YEAR=='2016'].ST.value_counts().sort_index().index.tolist(), 
                       columns=['2016'],
                       data=data[data.YEAR == '2016'].ST.value_counts().sort_index().values
                      )

# initial, simple choropleth
trace = dict(type='choropleth',
                     name='2016',
                     locations = reports.index,
                     z = reports['2016'],
                     locationmode = 'USA-states',
                     colorbar = dict(title='# Reported Chemical Releases')
                    )

# Plot layout
layout = dict(title = 'Number of Toxic Chemicals Released in US States in 2016',
              geo = dict(scope='usa', projection=dict(type='albers usa'))
             )

fig = dict(data=[trace], layout=layout)
py.iplot(fig)

Wow, Texas has quite the number of chemical spills!
This was just the beginning. Plotly allows you to customize a lot of different features, such as:

Trace Elements:
- colorscale:
> colorscale =  [ 'Blackbody' | 'Bluered' | 'Blues' | 'Earth' | 'Electric' | 'Greens' |'Greys' | 'Hot' | 'Jet' | 'Picnic' | 'Portland' | 'Rainbow' | 'RdBu' | 'Reds' | 'Viridis' | 'YlGnBu' | 'YlOrRd' ]
- reversescale: 
> reversescale=True/False
- locationmode:
> locationmode= ["ISO-3" | "USA-states" | "country names"]
    - must match locations provided

Layout Elements:
- geo > projection > type:
> type = [ enumerated : "equirectangular" | "mercator" | "orthographic" | "natural earth" | "kavrayskiy7" | "miller" | "robinson" | "eckert4" | "azimuthal equal area" | "azimuthal equidistant" | "conic equal area" | "conic conformal" | "conic equidistant" | "gnomonic" | "stereographic" | "mollweide" | "hammer" | "transverse mercator" | "albers usa" | "winkel tripel" | "aitoff" | "sinusoidal" ]
    - sets the map projection; experiment to see some cool projections
    - you can also set the rotation, initial view angles, etc.
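
To show how these options fit together, here is a rough sketch of a customized figure built from plain dicts. The three ISO-3 codes and their values are made up for illustration only. The point is that `locationmode` matches the kind of codes in `locations`, and that the projection type sits under `geo`, next to `scope`:

```python
# Illustrative values only: three ISO-3 country codes with made-up numbers.
trace = dict(
    type="choropleth",
    locations=["USA", "CAN", "MEX"],
    z=[3, 1, 2],
    locationmode="ISO-3",      # must match the codes used in `locations`
    colorscale="Viridis",
    reversescale=True,
    colorbar=dict(title="Example value"),
)

layout = dict(
    title="Customized choropleth (toy data)",
    geo=dict(scope="north america", projection=dict(type="natural earth")),
)

fig = dict(data=[trace], layout=layout)
# In a notebook set up like the cells above, py.iplot(fig) should render it.
```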

There are many more adjustments, but they're rather hard to find. With a bit of work you can sift through the chart reference to find more: https://plot.ly/python/reference/#choropleth 

Out of interest, I've made a widget that lets you choose the colourscale and view the resulting map. It won't work on Kaggle, but you can experiment with it in a Jupyter notebook. I've provided (and hidden) the code below to make the widget work with this dataset:


In [None]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual, Layout, interactive

# We'll make a choropleth of the total reported release incidents for 2016:
reports = pd.DataFrame(index=data[data.YEAR == '2016'].ST.value_counts().sort_index().index.tolist(), 
                   columns=['2016'],
                   data=data[data.YEAR == '2016'].ST.value_counts().sort_index().values
                  )

# Plot layout
layout = dict(title = 'Number of Toxic Chemicals Released in US States in 2016',
              geo = dict(scope='usa', projection=dict(type='albers usa'))
             )
@interact
def use_colorscale(colorscale = ['Blackbody', 'Bluered', 'Blues', 'Earth', 'Electric', 'Greens', 'Greys',
                                 'Hot', 'Jet', 'Picnic', 'Portland', 'Rainbow', 'RdBu', 'Reds', 'Viridis', 'YlGnBu', 'YlOrRd'],
                   reversescale = [True, False]):
    # change trace
    trace = dict(type='choropleth',
                         colorscale=colorscale,
                         reversescale=reversescale,
                         name='2016',
                         locations = reports.index,
                         z = reports['2016'],
                         locationmode = 'USA-states',
                         colorbar = dict(title='# Reported Chemical Releases')
                        )
    
    fig = dict(data=[trace], layout=layout)
    py.iplot(fig)   

# Adding a Slider:

Our dataset actually has many years of data, so it would be convenient to interact with the plot and show the data changing over time. This is where Plotly's 'slider' comes in handy.

The slider works by 'restyling' the input data. You must create the 'trace' for each plot ahead of time, store them all in a list, and then tell Plotly which trace to show with a boolean list corresponding to the trace list. It's kind of confusing, but you can see how it works below:
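
Before the full version, here is a stripped-down sketch of just the boolean-list idea, using three placeholder traces. The names `years_demo`, `traces` and `active` are hypothetical. Setting `visible` on each trace so that only the initially active year is drawn is an addition in this sketch, not something the code in this kernel does:

```python
years_demo = [2014, 2015, 2016]   # hypothetical short list of years
active = 1                        # slider position shown first

# One placeholder trace per year; only the active one starts out visible.
traces = [dict(type="choropleth", name=str(y), visible=(i == active))
          for i, y in enumerate(years_demo)]

# One slider step per trace: a boolean mask that is True only for the
# trace belonging to that step.
steps = []
for i, y in enumerate(years_demo):
    mask = [j == i for j in range(len(traces))]
    steps.append(dict(method="restyle", args=["visible", mask], label=str(y)))

sliders = [dict(active=active, steps=steps)]

for step in steps:
    print(step["label"], step["args"][1])
```

Each step carries its own mask, so moving the slider simply swaps which single trace is visible.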

In [None]:
# add slider

# get the list of years in order; easier to create manually than to search for with list(set())
years = [i for i in range(2000, 2017)]

#create data frame with index=states and columns=year
state_counts = pd.DataFrame(index=data[data.YEAR == '2016'].ST.value_counts().sort_index().index.tolist(), columns=years)

# fill each year column with the number of contaminated sites in each state
for i in years: state_counts[i] = data[data.YEAR == str(i)].ST.value_counts()

# create a list, loop through every year, build the trace for that year and add it
# to data_bal, so that data_bal ends up holding one trace per year
data_bal = []
for i in years:
    data_upd = [dict(type='choropleth',
                     name=i,
                     colorscale = 'Blues',
                     reversescale=True,
                     locations = state_counts[i].index,
                     z = state_counts[i].values,
                     locationmode = 'USA-states',
                     colorbar = dict(title='# Reported Chemical Releases'
                                     )
                    )
               ]
    
    data_bal.extend(data_upd)
    
# set menus inside the plot
# Create a list called 'steps', where each element is a boolean list indicating which trace 
# in data_bal should be shown. The length of data_bal = number of years in the slider, so for 
# each year on the slider we will have a boolean list that is the length of 'years', with 
# every value set to 'False', except for the element corresponding to the trace for that year, 
# which we set with step['args'][1][i] = True. Each list is used by the slider to
# tell plotly which trace to show for that slider option. The 'restyle' method means we are
# editing data in the plot, and the 'visible' argument is the bool array mentioned previously.
steps = []
for i in range(0,len(data_bal)):
    step = dict(method = "restyle",
                args = ["visible", [False]*len(data_bal)],
                label = years[i]) 
    step['args'][1][i] = True
    steps.append(step)

# Sliders layout:
sliders = [dict(active = 10,
                currentvalue = {"prefix": "Year: "},
                pad = {"t": 50},
                steps = steps)]

# Plot layout
layout = dict(title = 'Number of Toxic Chemicals Released in US States',
              geo = dict(scope='usa',
                         projection=dict( type='albers usa')),
              sliders = sliders)

fig = dict(data=data_bal, layout=layout)
py.iplot(fig)

Nice, we have our slider all set up. But if you look carefully, you can see that each choropleth is scaled individually, so the plot may not show changes over time properly. I ran into this issue when using this dataset with a slider from 1987-2016 (there are far, far more reported spills nowadays than in 1993, but the plots looked the same).

# Normalize Choropleths 
If you add a row to the state_counts DataFrame where every element is the maximum value from the entire dataframe, Plotly will automatically scale each plot to that value.

See the change below:

In [None]:
def plot_choro(state_counts):
    data_bal = []
    for i in years:
        data_upd = [dict(type='choropleth',
                         name=i,
                         colorscale = 'Blues',
                         reversescale=True,
                         locations = state_counts[i].index,
                         z = state_counts[i].values,
                         locationmode = 'USA-states',
                         colorbar = dict(title='# Reported Chemical Releases'
                                         )
                        )
                   ]

        data_bal.extend(data_upd)
    steps = []
    for i in range(0,len(data_bal)):
        step = dict(method = "restyle",
                    args = ["visible", [False]*len(data_bal)],
                    label = years[i]) 
        step['args'][1][i] = True
        steps.append(step)
    # Sliders layout:
    sliders = [dict(active = 10,
                    currentvalue = {"prefix": "Year: "},
                    pad = {"t": 50},
                    steps = steps)]
    # Plot layout
    layout = dict(title = 'Number of Toxic Chemicals Released in US States',
                  geo = dict(scope='usa',
                             projection=dict( type='albers usa')),
                  sliders = sliders)
    fig = dict(data=data_bal, layout=layout)
    py.iplot(fig)

In [None]:
years = [i for i in range(2000, 2016+1)]
state_counts = pd.DataFrame(index=data[data.YEAR == str(2016)].ST.value_counts().sort_index().index.tolist(), columns=years)
for i in years: state_counts[i] = data[data.YEAR == str(i)].ST.value_counts()

# normalize the scale bar by adding a row 'norm' so that the plotly choropleth
# scales each column to the same maximum; use the maximum value of the entire
# DataFrame so the scale bar is consistent over all years
state_counts.loc['norm'] = state_counts.max().max()

# plot_choro (same lines as before):
plot_choro(state_counts) 
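
An alternative to the extra 'norm' row, sketched here under the assumption that you would rather pin the colour range explicitly, is to pass the same `zmin` and `zmax` to every trace. The `counts_demo` frame below is made up for illustration; with the real data, `state_counts` would take its place:

```python
import pandas as pd

# Hypothetical counts: rows are states, columns are years.
counts_demo = pd.DataFrame({2015: [5, 20, 1], 2016: [8, 40, 2]},
                           index=["CA", "TX", "OH"])

# Largest value over all states and all years (NaNs are skipped by max()).
zmax = int(counts_demo.max().max())

# Every trace gets the same zmin/zmax, so the colour bar is shared across years.
traces = [dict(type="choropleth",
               name=str(year),
               locations=counts_demo.index,
               z=counts_demo[year].values,
               locationmode="USA-states",
               zmin=0,
               zmax=zmax)
          for year in counts_demo.columns]
```

This leaves the DataFrame untouched, which may be preferable if it is reused for other calculations later.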

I highly recommend using the ipywidgets feature in notebooks. It's quite simple once you've already made the plot, and it gives so much extra flexibility.

That's it for this kernel. Hopefully this helps someone make a choropleth, and please leave any comments or suggestions in the comment section!