# Visualization with Plotly Python Library On Different Datasets

# Introduction

Data Visualization is an important part of Data Science to communicate information clearly and efficiently via statistical graphics, plots and information graphics. It makes complex data more accessible, understandable and usable.

To present data clearly, we need to choose the visualization technique carefully, because different chart types have different strengths and suit different datasets. This tutorial introduces plots, charts and graphs on several datasets and explains why each technique was chosen for its dataset. The Plotly Python Library provides plenty of visualization tools, and we will use it throughout this tutorial.

Plotly's Python graphing library makes interactive, publication-quality graphs online. It supports line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.

## Tutorial Content
This tutorial shows how to use the Plotly Python Library to visualize data and help analyze it. When it introduces a plot or chart, it explains why that tool suits the dataset and how to build the visualization.

It uses the Wine Dataset and Iris Dataset in sklearn, [Seattle Pet Dataset](https://data.seattle.gov/api/views/jguv-t9rb/rows.csv?accessType=DOWNLOAD), [Volcano Dataset](https://vincentarelbundock.github.io/Rdatasets/csv/datasets/volcano.csv) and [US Population Dataset in 500 Cities](https://www.kaggle.com/cdc/500-cities/downloads/500_Cities_CDC.csv), which are introduced in the 'Loading Data' section. Then it shows how to visualize these datasets.

The tutorial will cover the following topics:
- [Installation](#Installation)
- [Initialization and Basic Plotting](#Initialization-and-Basic-Plotting)
- [Loading Data](#Loading-Data)
- [Basic Visualization](#Basic-Visualization)
- [Further Visualization](#Further-Visualization)
- [Summary and References](#Summary-and-References)

# Installation
- Before installing the Plotly Python Library, make sure you have **pip** installed. If not, please install **pip** first by:

    `sudo apt install python3-venv python3-pip` for Python3

    `sudo apt install python-pip` for Python 2


- To install Plotly's Python package, use the package manager **pip** inside your terminal:

    `$ pip install plotly`   
    
    or  
    
    `$ sudo pip install plotly`


- Plotly's Python package is updated frequently! To upgrade, run:

    `$ pip install plotly --upgrade`
    

- After installation, make sure the following command works:

In [271]:
import plotly

# Initialization and Basic Plotting

Plotly provides two ways of plotting: Online Plotting and Offline Plotting. Here, we introduce the initialization for both, but the tutorial will use **Online Plotting** for the following topics. Make sure you can run the sample code for basic plotting before moving on.

## Online Plotting Initialization
To get started, create a free account [here](https://plot.ly/feed/) to plot with the web service for hosting graphs provided by Plotly. Your graphs will then be saved in your online Plotly account. You can control their privacy, but only public hosting is free. If you need private hosting, check [here](https://plot.ly/products/cloud/) for pricing.

Once your Plotly package is ready, you may fire up Python by:

`$ python`

then you are ready to set up credentials:

In [272]:
import plotly
plotly.tools.set_credentials_file(username='YourAccount', api_key='YourAPIKey')

Remember to replace 'YourAccount' and 'YourAPIKey' with your real username and [API key](https://plot.ly/settings/api). You can find your API key [here](https://plot.ly/accounts/login/?next=%2Fsettings%2Fapi).

### Online Plot Privacy
You may set your plot to one of 3 privacy levels: public, private or secret.

- public: Anyone can view this graph. It will appear in your profile and can appear in search engines. You do not need to be logged in to Plotly to view this chart.

- private: Only you can view this plot. It will not appear in the Plotly feed, your profile, or search engines. You must be logged in to Plotly to view this graph. You can privately share this graph with other Plotly users in your online Plotly account and they will need to be logged in to view this plot.
- secret: Anyone with this secret link can view this chart. It will not appear in the Plotly feed, your profile, or search engines. If it is embedded inside a webpage or an IPython notebook, anybody who is viewing that page will be able to view the graph. You do not need to be logged in to view this plot.

By default all plots are set to public. Users with a free account may keep one private plot. If you need to save more private plots, upgrade to a pro account. If you're a Personal or Professional user and would like your plots to be private by default, you can edit your Plotly configuration:

In [273]:
import plotly

# Do not run the following code if you want to continue this tutorial using online plotting.
# plotly.tools.set_config_file(world_readable=False, sharing='private')

**Do not run the previous code if you want to continue this tutorial using online plotting.**

Please visit [Python privacy documentation](https://plot.ly/python/privacy/) for more examples about online privacy.


### Start Online Plotting
For online plotting, there are two main methods: `plotly.plotly.plot()` and `plotly.plotly.iplot()`. Both create a unique URL for the plot and save it in your Plotly account.

- `plotly.plotly.plot()`: returns the unique URL and optionally opens it.

In [297]:
import plotly
from plotly.graph_objs import *

trace0 = Scatter(
    x=[1, 2, 3, 4],
    y=[10, 15, 13, 17]
)
trace1 = Scatter(
    x=[1, 2, 3, 4],
    y=[16, 5, 11, 9]
)
data = Data([trace0, trace1])

plotly.plotly.plot(data, filename = 'basic-plotting')

- `plotly.plotly.iplot()`: displays the plot in a Jupyter Notebook.

In [275]:
import plotly
from plotly.graph_objs import *

trace1 = Scatter(x=[1, 2, 3, 4, 5], y=[12, 3, 15, 2, 11])
trace2 = Scatter(x=[1, 2, 3, 4, 5], y=[1, 15, 11, 9, 7])

sample_data = Data([trace1, trace2])

plotly.plotly.iplot(sample_data, filename = 'basic-plotting')

![1.JPG](attachment:1.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/0/).

- You can also create Plotly graphs with matplotlib syntax. For more, check out [Plotly's matplotlib documentation](https://plot.ly/matplotlib/).

## Offline Plotting Initialization

For offline plotting, there are two main methods `plotly.offline.plot()` and `plotly.offline.iplot()`. Note that Plotly Offline allows you to create graphs offline and save them locally.

### Start Offline Plotting

- Before plotting offline, check your Plotly version. Version 1.9.4 or later is needed for offline plotting. You may check it as follows:

In [276]:
import plotly
plotly.__version__

'2.4.1'
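
Comparing version strings as text can mislead (as text, `'1.10.0'` sorts before `'1.9.4'`), so a minimal sketch of the check compares integer tuples instead. It assumes a `major.minor.patch` style version string; the helper names `version_tuple` and `supports_offline` are hypothetical and not part of Plotly.

```python
import re

# Minimum version for offline plotting, as stated in this tutorial.
MINIMUM_OFFLINE_VERSION = (1, 9, 4)


def version_tuple(version):
    """Turn a string such as '2.4.1' into the tuple (2, 4, 1)."""
    parts = re.findall(r"\d+", version)[:3]
    return tuple(int(part) for part in parts)


def supports_offline(version):
    """Return True if the version is at least the offline minimum."""
    return version_tuple(version) >= MINIMUM_OFFLINE_VERSION


print(supports_offline("2.4.1"))
print(supports_offline("1.9.3"))
```

In a notebook, the value of `plotly.__version__` could be passed to `supports_offline()` in place of the literal strings.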

- `plotly.offline.plot()`: creates a standalone HTML file that is saved locally and opened inside your web browser.


In [277]:
import plotly
from plotly.graph_objs import Scatter, Layout

trace1 = Scatter(x=[1, 2, 3, 4, 5], y=[10, 6, 7, 3, 8])
trace2 = Scatter(x=[1, 2, 3, 4, 5], y=[5, 11, 2, 12, 8])

plotly.offline.plot({
    "data": [trace1, trace2],
    "layout": Layout(title="Offline Plotting by plotly.offline.Plot()")
})

'file://C:\\Users\\kuozh\\Desktop\\Graduate_2\\PDS\\Tutorial\\temp-plot.html'

- `plotly.offline.iplot()`: displays the plot in a Jupyter Notebook when working offline.

In [296]:
import plotly  # needed so that plotly.offline can be referenced below
from plotly.graph_objs import Scatter, Layout

plotly.offline.init_notebook_mode(connected=True)

trace1 = Scatter(x=[1, 2, 3, 4, 5], y=[10, 6, 7, 3, 8])
trace2 = Scatter(x=[1, 2, 3, 4, 5], y=[5, 11, 2, 12, 8])

plotly.offline.iplot({
    "data": [trace1, trace2],
    "layout": Layout(title="Basic Offline Plotting by plotly.offline.iPlot()")
})

![Capture2.JPG](attachment:Capture2.JPG)

- For more examples on plotting offline with Plotly in python please check out on the official [offline documentation](https://plot.ly/python/offline/).

## Summary for Initialization 
Now you know how to initialize your Plotly and how to do basic plotting. For more types of plotting and examples, please check out [here](https://plot.ly/python/#basic-charts).
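
The offline example above passes a dictionary with `data` and `layout` keys to `plotly.offline.plot()`. As a rough sketch of that structure, the figure can be written entirely with plain Python dictionaries and inspected as JSON. This assumes that traces may be given as dictionaries with a `type` key, as the choropleth example later in this tutorial does; the trace names and title here are illustrative.

```python
import json

# Two line traces described as plain dictionaries (values are illustrative).
trace1 = {
    "type": "scatter",
    "mode": "lines",
    "name": "series A",
    "x": [1, 2, 3, 4],
    "y": [10, 15, 13, 17],
}
trace2 = {
    "type": "scatter",
    "mode": "lines",
    "name": "series B",
    "x": [1, 2, 3, 4],
    "y": [16, 5, 11, 9],
}

# A figure is a list of traces plus a layout.
figure = {
    "data": [trace1, trace2],
    "layout": {"title": "Basic plotting from a dictionary"},
}

# Serializing to JSON is a quick way to inspect what will be sent to Plotly.
figure_json = json.dumps(figure, indent=2)
print(figure_json)
```

A dictionary like `figure` should be usable wherever the examples above pass a `{"data": ..., "layout": ...}` argument.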

# Loading Data

In this tutorial, we'll use an appropriate dataset for each plot and chart. Here are the datasets we'll use and the charts each one is used for. See [Basic Visualization](#Basic-Visualization) and [Further Visualization](#Further-Visualization) for why we select each visualization technique.

- Wine Dataset in sklearn: The wine dataset is a classic and very easy multi-class classification dataset. The bar charts and pie charts could show the structure of its data efficiently.


- Iris Dataset in sklearn: The iris dataset is a multi-class classification dataset. The data has three categories: 'setosa', 'versicolor' and 'virginica'. Since it has 4 dimensions (Sepal Length, Sepal Width, Petal Length and Petal Width), scatter plots are appropriate for showing its distribution.


- [Seattle Pet Licenses Dataset](https://data.seattle.gov/api/views/jguv-t9rb/rows.csv?accessType=DOWNLOAD): Pet licenses issued by the Seattle Animal Shelter between 2005 and early 2017. The number of licenses changes with time, so the time series charts will be a good choice for it.


- [Volcano Dataset](https://vincentarelbundock.github.io/Rdatasets/csv/datasets/volcano.csv): The altitude data of a volcano. The heat maps and surface plots could show it perfectly. 


- [US Population Dataset in 500 Cities](https://www.kaggle.com/cdc/500-cities/downloads/500_Cities_CDC.csv): The population data of 500 cities in the US. Since it contains location information, Choropleth maps should be selected to show it.

Please download the last three datasets by clicking their names and save the CSV files in the same directory as your notebook.
We will use pandas to load the data as DataFrames. If you are not familiar with the pandas DataFrame, please check out [this document](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).

Before loading data, make sure the following packages are installed and the commands work for you.

In [279]:
from sklearn import datasets
import pandas as pd

- Load wine and iris data from sklearn datasets.

In [280]:
# Load wine data
wine = datasets.load_wine()
wine_df = pd.DataFrame(wine.data)

wine_df['Class'] = wine.target
wine_df['Class'] = wine_df['Class'].astype('category')

print("The Wine Dataset in sklearn:")
print(wine_df)

# Load iris data
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data)

iris_df.columns  = ['S_Length','S_Width','P_Length','P_Width']

# Map the numeric targets 0, 1, 2 to their species names as a categorical column.
iris_df['Class'] = pd.Categorical.from_codes(iris.target, iris.target_names)

print("The Iris Dataset in sklearn:")
print(iris_df)

The Wine Dataset in sklearn:
         0     1     2     3      4     5     6     7     8          9    10  \
0    14.23  1.71  2.43  15.6  127.0  2.80  3.06  0.28  2.29   5.640000  1.04   
1    13.20  1.78  2.14  11.2  100.0  2.65  2.76  0.26  1.28   4.380000  1.05   
2    13.16  2.36  2.67  18.6  101.0  2.80  3.24  0.30  2.81   5.680000  1.03   
3    14.37  1.95  2.50  16.8  113.0  3.85  3.49  0.24  2.18   7.800000  0.86   
4    13.24  2.59  2.87  21.0  118.0  2.80  2.69  0.39  1.82   4.320000  1.04   
5    14.20  1.76  2.45  15.2  112.0  3.27  3.39  0.34  1.97   6.750000  1.05   
6    14.39  1.87  2.45  14.6   96.0  2.50  2.52  0.30  1.98   5.250000  1.02   
7    14.06  2.15  2.61  17.6  121.0  2.60  2.51  0.31  1.25   5.050000  1.06   
8    14.83  1.64  2.17  14.0   97.0  2.80  2.98  0.29  1.98   5.200000  1.08   
9    13.86  1.35  2.27  16.0   98.0  2.98  3.15  0.22  1.85   7.220000  1.01   
10   14.10  2.16  2.30  18.0  105.0  2.95  3.32  0.22  2.38   5.750000  1.25   
11   14.12 

- Load Seattle Pet Licenses data.

In [281]:
SPL_df = pd.read_csv('Seattle_Pet_Licenses.csv')
print("The first 10 lines of Seattle Pet Licenses Dataset:")
print(SPL_df[:10]) # Give a sample to show data structure.

The first 10 lines of Seattle Pet Licenses Dataset:
  License Issue Date License Number Animal's Name Species  \
0      March 29 2005         130651          Ozzy     Dog   
1   December 23 2009         898148          Jack     Dog   
2    January 20 2006          29654        Ginger     Dog   
3   February 07 2006          75432        Pepper     Cat   
4     August 04 2006         729899          Addy     Dog   
5       July 24 2007         437433        Rustie     Dog   
6   December 06 2006         959078          Lady     Dog   
7   November 19 2007         895915         Emily     Cat   
8   December 04 2007         957798        Pancho     Dog   
9    January 31 2008          26600       Sampson     Dog   

                       Primary Breed      Secondary Breed ZIP Code  
0  Dachshund, Standard Smooth Haired                  NaN    98104  
1               Schnauzer, Miniature         Terrier, Rat    98107  
2                  Retriever, Golden  Retriever, Labrador    98117  


- Load Volcano data.

In [282]:
volcano_df = pd.read_csv('volcano.csv')
print("The Volcano Dataset.")
print(volcano_df)

The Volcano Dataset.
    Unnamed: 0   V1   V2   V3   V4   V5   V6   V7   V8   V9 ...   V52  V53  \
0            1  100  100  101  101  101  101  101  100  100 ...   107  107   
1            2  101  101  102  102  102  102  102  101  101 ...   108  108   
2            3  102  102  103  103  103  103  103  102  102 ...   109  108   
3            4  103  103  104  104  104  104  104  103  103 ...   109  109   
4            5  104  104  105  105  105  105  105  104  104 ...   110  109   
5            6  105  105  105  106  106  106  106  105  105 ...   110  110   
6            7  105  106  106  107  107  107  107  106  106 ...   110  111   
7            8  106  107  107  108  108  108  108  107  107 ...   113  112   
8            9  107  108  108  109  109  109  109  108  108 ...   115  114   
9           10  108  109  109  110  110  110  110  109  109 ...   117  115   
10          11  109  110  110  111  111  111  111  110  110 ...   118  116   
11          12  110  110  111  113  112  11

- Load US Population in 500 Cities data.

In [283]:
population_df = pd.read_csv('500_Cities_CDC.csv')
print("The first 10 rows of US Population Dataset with columns needed.")
print(population_df[['StateAbbr', 'PlaceName', 'PlaceFIPS', 'Population2010']][:10]) # First 10 rows to show the data structure.

The first 10 rows of US Population Dataset with columns needed.
  StateAbbr   PlaceName  PlaceFIPS  Population2010
0        AL  Birmingham     107000          212237
1        AL      Hoover     135896           81619
2        AL  Huntsville     137000          180105
3        AL      Mobile     150000          195111
4        AL  Montgomery     151000          205764
5        AL  Tuscaloosa     177256           90468
6        AK   Anchorage     203000          291826
7        AZ    Avondale     404720           76238
8        AZ    Chandler     412000          236123
9        AZ     Gilbert     427400          208453


- We could also load data using pandas and SQLite. Check out [here](https://plot.ly/python/big-data-analytics-with-pandas-and-sqlite/) to learn more.

# Basic Visualization 
Plotly provides tools for different kinds of basic charts. The tutorial will only show basic visualization examples of most frequently used charts. You could find more [here](https://plot.ly/python/basic-charts/).

In [284]:
import plotly.plotly as py
import plotly.graph_objs as go

## Bar Charts on Wine Dataset

Bar charts present categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. So, a bar chart is a good choice for a classification dataset. Here we give a sample bar chart on the Wine Dataset in sklearn.

In [285]:
wine_data = [go.Bar(x=['Class-1','Class-2','Class-3'],
y=[wine_df.loc[wine_df['Class']==0].shape[0], 
   wine_df.loc[wine_df['Class']==1].shape[0],
   wine_df.loc[wine_df['Class']==2].shape[0]])]

layout = go.Layout(title='Bar Chart on Wine Dataset',
                   xaxis=dict(title='Wine Dataset - Class of Wine'), yaxis=dict(title='Count'))

bar_fig = go.Figure(data=wine_data, layout=layout)
py.iplot(bar_fig)

![Capture3.JPG](attachment:Capture3.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/10/).
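
The bar heights above are simply the number of rows in each class. A minimal sketch of that counting step, using only the standard library and a short hypothetical list of labels standing in for `wine.target`, might look like this:

```python
from collections import Counter

# Hypothetical class labels standing in for wine.target.
labels = [0, 0, 1, 2, 1, 1, 0, 2, 1]

# Count how many samples belong to each class.
counts = Counter(labels)
classes = sorted(counts)

# The bar chart needs one x label and one height per class.
bar_trace = {
    "type": "bar",
    "x": ["Class-{}".format(c + 1) for c in classes],
    "y": [counts[c] for c in classes],
}
print(bar_trace)
```

Counting once and reusing the result avoids filtering the DataFrame separately for every class.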

## Pie Charts on Wine Dataset

A pie chart is a circular statistical graphic divided into slices to illustrate proportion.
So, it's also appropriate for a classification dataset. Here we give a sample pie chart on the Wine Dataset in sklearn.

In [286]:
wine_data = [go.Pie(labels=['Class-1','Class-2','Class-3'], 
                   values=[wine_df.loc[wine_df['Class']==0].shape[0], 
                           wine_df.loc[wine_df['Class']==1].shape[0],
                           wine_df.loc[wine_df['Class']==2].shape[0]])]

py.iplot(wine_data, filename='basic_pie_chart')

![Capture4.JPG](attachment:Capture4.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/12/).
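
Each slice of the pie is the class count divided by the total. The sketch below works through that arithmetic, assuming the class sizes 59, 71 and 48 listed in scikit-learn's description of the wine dataset; it is worth checking them against your own `wine_df` before relying on them.

```python
# Class sizes assumed from scikit-learn's description of the wine dataset.
class_counts = {"Class-1": 59, "Class-2": 71, "Class-3": 48}

total = sum(class_counts.values())

# Share of each class as a percentage of all samples.
shares = {name: 100.0 * count / total for name, count in class_counts.items()}

for name, share in shares.items():
    print("{}: {:.1f}%".format(name, share))
```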

## Histograms on Iris Dataset

A histogram represents the distribution of numerical data and gives an estimate of the probability distribution of a continuous variable. Sepal Width in the iris data is continuous, and here is a sample histogram of Sepal Width vs Count.

In [287]:
# Column 1 of iris.data is the sepal width (column 0 is the sepal length).
# Add a fill color and an outline to make the bars clear.
iris_data = [go.Histogram(
    x=iris.data[:, 1],
    marker=dict(
        color='rgb(102,204,250)',
        line=dict(color='rgb(8,48,107)', width=1.5),
    ),
)]

layout = go.Layout(title='Histograms on Iris Dataset - S_Width',
xaxis=dict(title='S_Width'), yaxis=dict(title='Count'))

histograms_fig = go.Figure(data=iris_data, layout=layout)
py.iplot(histograms_fig)

![Capture5.JPG](attachment:Capture5.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/14/).
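
Plotly chooses the bins automatically in the example above. To make the idea concrete, here is a minimal sketch of equal-width binning in plain Python; the helper `bin_counts` and the sample values are hypothetical. As far as we know, `go.Histogram` also accepts an `xbins` dictionary with `start`, `end` and `size` keys if you want to fix the bins yourself, but treat that as an assumption to verify for your Plotly version.

```python
def bin_counts(values, start, size, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // size)
        # Values outside the covered range are ignored.
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Illustrative sepal-width-like values, not the real dataset.
widths = [2.0, 2.2, 2.9, 3.0, 3.1, 3.4, 3.8, 4.4]

# Bins: [2.0, 2.5), [2.5, 3.0), [3.0, 3.5), [3.5, 4.0), [4.0, 4.5)
counts = bin_counts(widths, start=2.0, size=0.5, n_bins=5)
print(counts)
```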

## Scatter Plots On Iris Dataset

A scatter plot uses Cartesian coordinates to display values for, typically, two variables of a dataset. Here we plot the iris data by Sepal Length and Sepal Width.

In [288]:
# Set color for each Class.
color_dict = {'setosa':'rgb(102,154,250)', 'versicolor':'rgb(250,154,100)' ,'virginica':'rgb(102,250,204)'}

iris_data = [go.Scatter(x = iris_df["S_Length"],y = iris_df["S_Width"], mode='markers', 
                        marker=dict(color = [color_dict[c] for c in iris_df['Class']]))]

layout = go.Layout(title='Scatter Plots on Iris Dataset - S_Length vs S_Width',
xaxis=dict(title='S_Length'),
yaxis=dict(title='S_Width'))

SP_fig = go.Figure(data=iris_data, layout=layout)
py.iplot(SP_fig)

![Capture6.JPG](attachment:Capture6.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/16/).
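
The example above draws all points in one trace and colors them individually, so the legend cannot tell the classes apart. One possible alternative, sketched here with plain dictionaries and a few illustrative rows, is to build one trace per class so that each class gets its own legend entry:

```python
from collections import defaultdict

# Illustrative (sepal length, sepal width, class) rows.
rows = [
    (5.1, 3.5, "setosa"),
    (4.9, 3.0, "setosa"),
    (7.0, 3.2, "versicolor"),
    (6.4, 3.2, "versicolor"),
    (6.3, 3.3, "virginica"),
]

# Group the coordinates by class name.
points_by_class = defaultdict(lambda: {"x": [], "y": []})
for length, width, label in rows:
    points_by_class[label]["x"].append(length)
    points_by_class[label]["y"].append(width)

# One marker trace per class; the trace name is what the legend shows.
traces = [
    {"type": "scatter", "mode": "markers", "name": label,
     "x": points["x"], "y": points["y"]}
    for label, points in sorted(points_by_class.items())
]
print([trace["name"] for trace in traces])
```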

## Bubble Charts on Iris Dataset

A bubble chart displays three dimensions of data. Building on the scatter plot of the Iris Dataset, we add Petal Length as the third dimension, shown as the marker size. This helps show how the data is distributed over the third variable.

In [289]:
# Set color for each Class.
color_dict = {'setosa':'rgb(102,154,250)', 'versicolor':'rgb(250,154,100)' ,'virginica':'rgb(102,250,204)'}

iris_data = [go.Scatter(x = iris_df["S_Length"],y = iris_df["S_Width"], mode = 'markers', 
                    marker=dict(color = [color_dict[c] for c in iris_df['Class']],       # Set colors for each class.
                    size=iris_df["P_Length"] * 5))]                                      # Multiply by 5 to make the markers clearly visible

layout = go.Layout(title='Iris Dataset - S_Length vs S_Width with P_Length as 3rd Dimension',
xaxis=dict(title='S_Length'),
yaxis=dict(title='S_Width'))

bubble_fig = go.Figure(data=iris_data, layout=layout)
py.iplot(bubble_fig)

![Capture7.JPG](attachment:Capture7.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/18/).
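
Multiplying by a fixed factor works, but the factor has to be retuned for every dataset. A hedged alternative is to let the bubble area scale with the value through the marker options `sizemode` and `sizeref`. The sketch below uses the scaling rule suggested in Plotly's bubble chart documentation as we understand it; the sample values and `max_marker_px` are assumptions.

```python
# Illustrative petal lengths, not the full dataset.
petal_lengths = [1.4, 4.7, 6.0, 5.1, 1.5]

# Hypothetical diameter, in pixels, wanted for the largest bubble.
max_marker_px = 40

# Scaling rule for area-based sizing: 2 * max(size) / (max diameter ** 2).
sizeref = 2.0 * max(petal_lengths) / (max_marker_px ** 2)

marker = {
    "size": petal_lengths,
    "sizemode": "area",   # bubble area, not diameter, encodes the value
    "sizeref": sizeref,
    "sizemin": 4,         # keep the smallest bubbles visible
}
print(marker["sizeref"])
```

A dictionary like `marker` could be passed as the `marker` argument of `go.Scatter` in place of the manual multiplication.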

## Box Plot on Iris Dataset

A box plot graphically depicts groups of numerical data through their quartiles. It shows the median, the lower and upper quartiles, and whiskers marking the range of the data, with outliers drawn as individual points. Sepal Width is distributed differently in each of the 3 categories of the iris data, and a box plot shows these differences clearly.

In [290]:
iris_data = [go.Box(y=iris_df.loc[iris_df["Class"]=='setosa','S_Width'],name='Setosa'),
go.Box(y=iris_df.loc[iris_df["Class"]=='versicolor','S_Width'],name='Versicolor'),
go.Box(y=iris_df.loc[iris_df["Class"]=='virginica','S_Width'],name='Virginica')]

layout = go.Layout(title='Box Plot on Iris Dataset - S_Width',
yaxis=dict(title='S_Width'))

boxplot_fig = go.Figure(data=iris_data, layout=layout)
py.iplot(boxplot_fig)

![Capture8.JPG](attachment:Capture8.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/20/).
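
The box in each plot is built from the quartiles of the group. As a small worked sketch with illustrative numbers, the standard library can compute them; it also shows why the median is less sensitive to an outlier than the mean. This assumes Python 3.8 or later for `statistics.quantiles`, and Plotly's own quartile method may differ slightly from the `inclusive` method used here.

```python
import statistics

# Illustrative values, not the iris data.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Lower quartile, median and upper quartile.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # height of the box
print(q1, median, q3, iqr)

# A single outlier moves the mean much more than the median.
with_outlier = values + [100]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)
print(mean_with_outlier, median_with_outlier)
```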

## Line Plot on Seattle Pet Licenses Dataset
A line plot connects data points with line segments, which makes it well suited to showing how a value changes over time. The number of pet licenses changes with time, so we can plot the counts in time order.

- Time Series charts on Seattle Pet Licenses Dataset

In [291]:
# Add a column 'License Issue Year' in order to group the records by year.
SPL_df['License Issue Year'] = pd.Series([date[-4:] for date in SPL_df['License Issue Date']],  index=SPL_df.index)

SPL_data = [go.Scatter(x=[str(year) for year in range(2010,2017)], 
                       y=[SPL_df.loc[SPL_df['License Issue Year']==str(year)].shape[0] for year in range(2010,2017)])]
layout = go.Layout(
title='Time Series Plot on Seattle Pet Licenses Dataset',
xaxis=dict(title='Time', range = ['2010-01-01','2016-12-31']),
yaxis=dict(title='Pet Licenses Count'))

time_series_fig = go.Figure(data=SPL_data, layout=layout)
py.iplot(time_series_fig)

![Capture9.JPG](attachment:Capture9.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/22/).

- Time Series charts with traces on Seattle Pet Licenses Dataset

In [292]:
# Split the dataset by 'Species'.
dog_trace = go.Scatter(name='Dog Licenses', x=[str(year) for year in range(2010,2017)], 
                       y=[SPL_df.loc[(SPL_df['License Issue Year']==str(year))  & (SPL_df['Species'] == 'Dog')].shape[0] for year in range(2010,2017)])
cat_trace = go.Scatter(name='Cat Licenses', x=[str(year) for year in range(2010,2017)], 
                      y=[SPL_df.loc[(SPL_df['License Issue Year']==str(year))  & (SPL_df['Species'] == 'Cat')].shape[0] for year in range(2010,2017)])
SPL_data = [dog_trace, cat_trace]

layout = go.Layout(
title='Time Series Plot by Species on Seattle Pet Licenses Dataset',
xaxis=dict(title='Time', range = ['2010-01-01','2016-12-31']),
yaxis=dict(title='Pet Licenses Count'))

time_series_fig = go.Figure(data=SPL_data, layout=layout)
py.iplot(time_series_fig)

![Capture10.JPG](attachment:Capture10.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/24/).
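
The examples above take the year from the last four characters of the date string. A slightly more defensive sketch parses the full date and counts the records per year. It assumes every date follows the `March 29 2005` pattern shown in the sample output and that month names are in English (the `%B` directive depends on the locale); the dates are copied from that sample.

```python
from collections import Counter
from datetime import datetime

# License issue dates copied from the sample rows printed earlier.
issue_dates = [
    "March 29 2005",
    "December 23 2009",
    "January 20 2006",
    "February 07 2006",
    "August 04 2006",
    "July 24 2007",
]

# Parse each date and keep only the year.
years = [datetime.strptime(text, "%B %d %Y").year for text in issue_dates]

# Count licenses per year and order the result for plotting.
licenses_per_year = Counter(years)
x = sorted(licenses_per_year)
y = [licenses_per_year[year] for year in x]
print(x, y)
```

Parsing fails loudly on a malformed date, whereas slicing would silently produce a wrong year.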

# Further Visualization

## Heat Maps on Volcano Dataset

A heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors. Since the volcano dataset is a matrix of 2 dimensions, we can show the altitude using a heat map.

In [293]:
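# Skip the first column (the row index from the CSV); .values replaces the deprecated as_matrix().
volcano_data = [go.Heatmap(z=volcano_df.iloc[:, 1:].values)]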
layout = go.Layout(title='Heat Maps on Volcano Dataset')
heat_map_fig = go.Figure(data=volcano_data, layout=layout)
py.iplot(heat_map_fig)

![Capture11.JPG](attachment:Capture11.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/26/).
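
The printed DataFrame in the Loading Data section shows an `Unnamed: 0` column that holds row numbers rather than altitudes, and passing it into `z` would draw it as an extra column of cells. As a minimal sketch of dropping it, the following reads a tiny stand-in for `volcano.csv` with the standard library; the two data rows reuse the first values of the printed output, and the exact quoting of the real file is an assumption.

```python
import csv
import io

# A tiny stand-in for volcano.csv: an index column followed by V1..V3.
sample_csv = '"","V1","V2","V3"\n"1",100,100,101\n"2",101,101,102\n'

reader = csv.reader(io.StringIO(sample_csv))
header = next(reader)  # column names, starting with the unnamed index

# Keep every column except the first one, converting text to integers.
z = [[int(cell) for cell in row[1:]] for row in reader]

heatmap_trace = {"type": "heatmap", "z": z}
print(heatmap_trace)
```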

## Topographical 3D Surface Plot on Volcano Dataset

A surface plot is a more intuitive way to show the 3D shape of the volcano.

In [294]:
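# Skip the first column (the row index from the CSV); .values replaces the deprecated as_matrix().
volcano_data = [go.Surface(z=volcano_df.iloc[:, 1:].values)]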
layout = go.Layout(title='Topographical 3D Surface Plot on Volcano Dataset', autosize=False, width=800, height=800, 
                   margin=dict(l=50, r=50, b=50, t=50))

top_3D_surface_fig = go.Figure(data=volcano_data, layout=layout)
py.iplot(top_3D_surface_fig)

![Capture12.JPG](attachment:Capture12.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/28/).

## Choropleth Maps on Population Dataset 

A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map. Since the population data has the location attribute, we could show it intuitively by state on the map.

In [295]:
population_data = [ dict(
        type='choropleth',
        colorscale = [[0.0, 'rgb(242,240,247)'],[0.2, 'rgb(218,218,235)'],[0.4, 'rgb(188,189,220)'],\
            [0.6, 'rgb(158,154,200)'],[0.8, 'rgb(117,107,177)'],[1.0, 'rgb(84,39,143)']],
        autocolorscale = False,  # use the custom colorscale above; this option expects a boolean
        locations = population_df['StateAbbr'],
        z = [population_df.loc[population_df['StateAbbr'] == location]['Population2010'].sum() for location in population_df['StateAbbr']],
        locationmode = 'USA-states',
        marker = dict(
            line = dict (
                color = 'rgb(255,255,255)',
                width = 1
            ) ),
        colorbar = dict(
            title = "Populations")
        ) ]

layout = dict(
        title = 'US Populations by State',
        geo = dict(
            scope='usa',
            projection=dict( type='albers usa' ),
            showlakes = False,
            lakecolor = 'rgb(255, 255, 255)'),
             )
    
choropleth_fig = dict( data=population_data, layout=layout )
py.iplot(choropleth_fig, filename='d3-cloropleth-map' )

![Capture13.JPG](attachment:Capture13.JPG)
You could check the interactive plot [here](https://plot.ly/~kuozhao/6/).
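
The `z` values above are state totals, recomputed once for every city row. A minimal sketch of the same aggregation done in a single pass is shown below, using the ten sample rows printed in the Loading Data section; the variable names are illustrative.

```python
from collections import defaultdict

# (state, city, 2010 population) rows copied from the sample output.
rows = [
    ("AL", "Birmingham", 212237),
    ("AL", "Hoover", 81619),
    ("AL", "Huntsville", 180105),
    ("AL", "Mobile", 195111),
    ("AL", "Montgomery", 205764),
    ("AL", "Tuscaloosa", 90468),
    ("AK", "Anchorage", 291826),
    ("AZ", "Avondale", 76238),
    ("AZ", "Chandler", 236123),
    ("AZ", "Gilbert", 208453),
]

# Sum the city populations for each state in one pass.
population_by_state = defaultdict(int)
for state, _city, population in rows:
    population_by_state[state] += population

# One location and one z value per state.
states = sorted(population_by_state)
choropleth_trace = {
    "type": "choropleth",
    "locationmode": "USA-states",
    "locations": states,
    "z": [population_by_state[state] for state in states],
}
print(choropleth_trace["locations"], choropleth_trace["z"])
```

With one entry per state, the trace stays small even when the dataset has many cities per state.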

# Summary and References
This tutorial briefly introduces how to visualize data with the Plotly Python Library and gives some examples of visualization. It also explains why different types of plots suit specific datasets. Choosing the visualization technique according to the circumstances is a vital concept in data science.

The tutorial refers to the following materials; please check them for more detail.

- Plotly Python Library: https://plot.ly/python/

- Create interactive data visualization using Plotly: https://www.analyticsvidhya.com/blog/2017/01/beginners-guide-to-create-beautiful-interactive-data-visualizations-using-plotly-in-r-and-python/ 