# Introduction
[Plotly](https://plot.ly/) is a handy open-source tool for creating, editing, and sharing data visualizations over the internet. It is available for various programming languages and platforms, such as R, Python, and MATLAB.

One of Plotly's main advantages is that its charts and graphs are interactive. After a plot is created with Plotly, users can easily move, rotate, and download it. If widgets are included, users can also modify the data and see the plot update instantly. This makes Plotly a good fit for interactive data visualization.

Plotly is also well integrated with Jupyter Notebook and can host user-created plots on its servers, which makes it easy to share charts or graphs without creating or publishing an online database. This removes much of the difficulty of sharing code, data, and graphs across different systems and platforms.

This tutorial introduces the Plotly Python library and some of the charts and graphs it can create.

To learn more about Plotly, please visit the official website: https://plot.ly/

# How to start

Use `pip` to install the Plotly Python library:

`$ pip install plotly`

We are now ready to start using Plotly.

To upgrade Plotly:

`$ pip install plotly --upgrade`
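
To confirm which version ended up installed, the standard library can report it without importing the package. This is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if Plotly is installed, otherwise None.
print(installed_version("plotly"))
```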

To use Plotly's online hosting feature, you need a Plotly account. Graphs are saved in the online account, and public hosting is free. All plots saved in the account can also be edited from Graph, the online chart management platform provided by Plotly.

In [1]:
import plotly
plotly.tools.set_credentials_file(username='plotly_username', api_key='api_key')

In [2]:
import numpy as np 
import plotly.plotly as py
import plotly.graph_objs as go
from sklearn import datasets

# Common plots

Like many plotting libraries in Python, Plotly supports most of the common plots used in data science projects, including scatter plots, line plots, bar charts, and pie charts. Plots created in Plotly can be downloaded in PNG format, zoomed, and rescaled using the buttons in the top right corner.

For demonstration purposes, I am using the iris dataset from the scikit-learn library. The dataset can also be downloaded in CSV format from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/index.php.


## Scatter and line plots

Scatter and line plots in Plotly both use `go.Scatter`. The difference lies in the `mode` parameter, which can be set to `markers`, `lines`, `text`, any combination of these joined by "+" (for example `lines+markers`), or `none`.

When drawing plots in Plotly, a trace can be given only one dimension of the data, such as x or y; the missing coordinate then defaults to the point index. This makes it easy to draw multiple plots in the same coordinate system. The scatter and line plot tutorial is [here](https://plot.ly/python/line-and-scatter/).


In [3]:
# import iris dataset
iris = datasets.load_iris()

# scatter and line plots

# use the first column (sepal length) as a 1-D array of y values
scatter_Y = iris.data[:, 0]

scatter_trace1 = go.Scatter(y=scatter_Y+5, name = "markers", mode = 'markers') # point only plot
scatter_trace2 = go.Scatter(y=scatter_Y, name = "lines",mode = "lines") # line only plot
scatter_trace3 = go.Scatter(y=scatter_Y-5, name = "lines+markers", mode = "lines+markers") # point and line plot

scatter_data = [scatter_trace1,scatter_trace2,scatter_trace3]

py.iplot(scatter_data, filename = "scatter plot")
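
The cell above covers `markers`, `lines`, and `lines+markers`. As a small sketch of the remaining `text` flag, the trace below labels each point with its value. It is written as plain dictionaries, which is an assumption of convenience here: the figure can be passed to `py.iplot` in the same way as the `fig = dict(...)` cells later in this tutorial. The variable names and the figure title are illustrative.

```python
# First five sepal lengths of the iris dataset, typed in by hand so the
# sketch runs without scikit-learn.
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]

text_trace = dict(
    type="scatter",
    y=sepal_length,                        # x is omitted, so the point index is used
    mode="markers+text",                   # flags joined by "+"
    text=[str(v) for v in sepal_length],   # one label per point
    textposition="top center",
    name="markers+text",
)

text_fig = dict(data=[text_trace], layout=dict(title="Sepal length with labels"))

# To render in a notebook (the filename is a placeholder):
# py.iplot(text_fig, filename="scatter text")
```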


## Bar chart

Bar charts can be grouped or stacked, vertical or horizontal, and drawn with custom colors and a custom base. The bar chart tutorial can be found [here](https://plot.ly/python/bar-charts/).

In [4]:
# Bar chart

setosa = iris.data[iris.target==0]
versicolor = iris.data[iris.target==1]
virginica = iris.data[iris.target==2]

bar_X = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]
bar_trace1 = go.Bar(x = bar_X, y = np.mean(setosa,axis=0), name = "Setosa",
                    marker = dict(color = "rgb(130,191,110)"))
bar_trace2 = go.Bar(x = bar_X, y = np.mean(versicolor,axis=0),name = "Versicolor",
                    marker = dict(color = "rgb(243,163,42)"))
bar_trace3 = go.Bar(x = bar_X, y = np.mean(virginica,axis=0),name = "Virginica",
                    marker = dict(color = "rgb(60,180,203)"))

layout = go.Layout(
    title='Average on Features',
    yaxis=dict(title='average value'),
    barmode='group',
    bargap=0.05,
    bargroupgap=0.1)

fig = go.Figure(data=[bar_trace1,bar_trace2,bar_trace3], layout=layout)

py.iplot(fig,filename = "bar_chart")
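
The cell above draws grouped vertical bars. A stacked horizontal variant only needs `barmode='stack'` in the layout and `orientation='h'` on each trace, with the values moved to `x` and the categories to `y`. The sketch below builds such a figure from plain dictionaries; the helper name `stacked_horizontal_bars` and the toy numbers are assumptions for illustration, not part of Plotly or of the iris data.

```python
def stacked_horizontal_bars(labels, series):
    """Build a figure dict with one horizontal bar trace per entry in `series`.

    `series` maps a trace name to a list of values, one per label.
    """
    traces = [
        dict(type="bar", orientation="h", name=name,
             x=list(values),   # bar lengths go on the x axis
             y=list(labels))   # categories go on the y axis
        for name, values in series.items()
    ]
    layout = dict(barmode="stack", title="Stacked horizontal bars")
    return dict(data=traces, layout=layout)


# Toy numbers for illustration only; they are not taken from the iris dataset.
demo_fig = stacked_horizontal_bars(["A", "B"], {"first": [1, 2], "second": [3, 4]})
```

With the variables from the cell above, the call could look like `stacked_horizontal_bars(bar_X, {"Setosa": np.mean(setosa, axis=0)})`, followed by `py.iplot` on the result.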

## Histogram

Histograms can show absolute or normalized counts, be horizontal or vertical, stacked or overlaid, and optionally cumulative. The tutorial is [here](https://plot.ly/python/histograms/).

In [5]:
# Histogram
hist_trace1 = go.Histogram(x=scatter_Y, marker = dict(color="rgb(255,201,136)"))

layout = go.Layout(bargap=0.05)
fig = go.Figure(data=[hist_trace1], layout=layout)

py.iplot(fig,filename = "histogram")
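
The histogram above shows absolute counts. The other options mentioned earlier map to a few trace attributes: `histnorm` for normalization, `cumulative` for running totals, and passing the data as `y` instead of `x` for horizontal bars. Here is a hedged sketch using plain dictionaries; the helper `histogram_trace` is a hypothetical convenience function, not a Plotly API.

```python
def histogram_trace(values, normalized=False, cumulative=False, horizontal=False):
    """Return a histogram trace dict with the requested options."""
    trace = dict(type="histogram")
    # Binning along y instead of x turns the bars horizontal.
    trace["y" if horizontal else "x"] = list(values)
    if normalized:
        trace["histnorm"] = "probability"         # bar heights sum to 1
    if cumulative:
        trace["cumulative"] = dict(enabled=True)  # running total over the bins
    return trace


# First five sepal lengths of the iris dataset, typed in by hand.
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]

hist_fig = dict(
    data=[histogram_trace(sepal_length, normalized=True, cumulative=True)],
    layout=dict(bargap=0.05, title="Cumulative share of sepal lengths"),
)
```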

## Boxplot

A box plot is a lesser-known type of plot, but it is very useful for statistical data. It shows several frequently used summary measures, such as the minimum, maximum, median, and first and third quartiles. A box plot can also highlight suspected outliers. The tutorial is [here](https://plot.ly/python/box-plots/).

In [6]:
# Box plot

# pass the sepal length column as a 1-D array to each trace
box_trace1 = go.Box(y=setosa[:, 0],
                    name='setosa',
                    boxpoints = 'suspectedoutliers')

box_trace2 = go.Box(y=versicolor[:, 0],
                    name = 'versicolor',
                    boxpoints = 'suspectedoutliers')

box_trace3 = go.Box(y=virginica[:, 0],
                    name = 'virginica',
                    boxpoints = 'suspectedoutliers')

layout = go.Layout(title="Box plot of sepal length")
fig = go.Figure(data=[box_trace1,box_trace2,box_trace3],layout=layout)
py.iplot(fig,filename="box plot")
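
To make the measures shown by a box plot concrete, the sketch below computes them by hand with the standard library. It assumes Python 3.8 or later and uses the conventional Tukey rule (points more than 1.5 times the interquartile range beyond the quartiles count as outliers). Quartile conventions differ between tools, so Plotly's own numbers may differ slightly; the function name `box_summary` and the toy data are illustrative.

```python
import statistics


def box_summary(values):
    """Return the five-number summary plus Tukey's 1.5 * IQR fences."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return dict(
        minimum=data[0], q1=q1, median=median, q3=q3, maximum=data[-1],
        lower_fence=lower_fence, upper_fence=upper_fence,
        outliers=[v for v in data if v < lower_fence or v > upper_fence],
    )


# Toy data for illustration only: nine small values and one extreme one.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["q1"], summary["median"], summary["q3"])  # 3.25 5.5 7.75
print(summary["outliers"])                              # [100]
```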

For other basic plots, there are many resources in the basic charts section of the Plotly website: https://plot.ly/python/basic-charts/. The cheat sheet is also useful for some of the frequently used plots: https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf

# Maps

With the development of smart transportation, geographical data is becoming common in data science projects. Plotly has some well-built functions for visualizing locations and data on maps.

One way is to use a `choropleth` trace in Plotly. When the location mode is set to `USA-states`, `ISO-3`, or `country names`, Plotly matches each entry in `locations` to the corresponding region on the map.
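
As a rough sketch of how the location mode and the entries fit together, the helper below checks the mode and builds a minimal choropleth trace as a plain dictionary. The function name `choropleth_trace` is hypothetical, and the list of modes reflects the ones I know of for the Plotly version this tutorial targets. The three values are the total exports for Alabama, Alaska, and Arizona from the dataset printed below.

```python
LOCATION_MODES = ("ISO-3", "USA-states", "country names")


def choropleth_trace(locations, z, locationmode):
    """Return a choropleth trace dict after checking the location mode."""
    if locationmode not in LOCATION_MODES:
        raise ValueError("locationmode must be one of %s" % (LOCATION_MODES,))
    if len(locations) != len(z):
        raise ValueError("locations and z must have the same length")
    return dict(type="choropleth", locationmode=locationmode,
                locations=list(locations), z=list(z))


# Two-letter state codes go with the "USA-states" mode.
state_trace = choropleth_trace(["AL", "AK", "AZ"],
                               [1390.63, 13.31, 1463.17],
                               "USA-states")
```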

In [8]:
import pandas as pd
usa_df = pd.read_csv('2011_us_ag_exports.csv')
print(usa_df.head())

  code        state category  total exports   beef  pork  poultry   dairy  \
0   AL      Alabama    state        1390.63   34.4  10.6    481.0    4.06   
1   AK       Alaska    state          13.31    0.2   0.1      0.0    0.19   
2   AZ      Arizona    state        1463.17   71.3  17.9      0.0  105.48   
3   AR     Arkansas    state        3586.02   53.2  29.4    562.9    3.53   
4   CA   California    state       16472.88  228.7  11.1    225.4  929.95   

   fruits fresh  fruits proc  total fruits  veggies fresh  veggies proc  \
0           8.0         17.1         25.11            5.5           8.9   
1           0.0          0.0          0.00            0.6           1.0   
2          19.3         41.0         60.27          147.5         239.4   
3           2.2          4.7          6.88            4.4           7.1   
4        2791.8       5944.6       8736.40          803.2        1303.5   

   total veggies  corn  wheat   cotton  
0          14.33  34.9   70.0   317.61  
1   

In [9]:
# create a column with the hover text to show; the text uses HTML line breaks

for col in usa_df.columns:
    usa_df[col]=usa_df[col].astype('str') 
    
usa_df['text'] = (usa_df.state + "<br>" + 
               "Beef " + usa_df.beef + "<br>" + 
               "Pork " + usa_df.pork + "<br>" + 
               "Dairy " + usa_df.dairy + "<br>" + 
               "Fruits " + usa_df['total fruits'] + "<br>" + 
               "Corn " + usa_df.corn + "<br>" + 
               "Cotton " + usa_df.cotton
              )
print(usa_df.head())

  code        state category total exports   beef  pork poultry   dairy  \
0   AL      Alabama    state       1390.63   34.4  10.6   481.0    4.06   
1   AK       Alaska    state         13.31    0.2   0.1     0.0    0.19   
2   AZ      Arizona    state       1463.17   71.3  17.9     0.0  105.48   
3   AR     Arkansas    state       3586.02   53.2  29.4   562.9    3.53   
4   CA   California    state      16472.88  228.7  11.1   225.4  929.95   

  fruits fresh fruits proc total fruits veggies fresh veggies proc  \
0          8.0        17.1        25.11           5.5          8.9   
1          0.0         0.0          0.0           0.6          1.0   
2         19.3        41.0        60.27         147.5        239.4   
3          2.2         4.7         6.88           4.4          7.1   
4       2791.8      5944.6       8736.4         803.2       1303.5   

  total veggies  corn  wheat   cotton  \
0         14.33  34.9   70.0   317.61   
1          1.56   0.0    0.0      0.0   
2    

In [10]:
data = dict(
        type='choropleth',
        locationmode = 'USA-states',
        colorscale = "Greens",
        locations = usa_df['code'],
        z = usa_df['total exports'].astype(float),
        text = usa_df['text'],
        marker = dict(
            line = dict (
                color = 'rgb(255,255,255)',
                width = 2
            ) ),
        colorbar = dict(
            title = "Millions USD")
        )

layout = dict(
        title = 'US Agriculture Export Summary 2011',
        geo = dict(scope='usa',
            projection=dict( type='albers usa'),
            showlakes = True,
            lakecolor = 'rgb(255, 255, 255)'),
             )
    
fig = dict(data=[data], layout=layout)

py.iplot(fig, filename='USA map' )

Another way to draw maps is to use [Mapbox](https://www.mapbox.com/), a data platform that provides building blocks for adding location features to web applications. Plotly is integrated with Mapbox for plotting longitude and latitude on interactive maps. To use Mapbox for maps, you need to create a Mapbox account and obtain an access token.

For demonstration purposes, I am using a dataset from [Tianchi](https://tianchi.aliyun.com/index.htm?spm=5176.100066.5610778.6.6a4ad780dkgKPl&_lang=en_US), a Chinese counterpart of Kaggle hosted by Alibaba Group. The dataset contains the longitude and latitude of 10 service branches of an express delivery company in Shanghai, China. More information about the dataset can be found [here](https://tianchi.aliyun.com/datalab/dataSet.htm?id=15).

In [11]:
mapbox_token = "mapbox_token"
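
Since the access token is a secret, one option is to keep it out of the notebook and read it from an environment variable instead. This is only a sketch: the variable name `MAPBOX_TOKEN` and the helper `load_mapbox_token` are assumptions for illustration, not something Plotly or Mapbox requires.

```python
import os


def load_mapbox_token(env_var="MAPBOX_TOKEN"):
    """Read the Mapbox access token from an environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError("Set the %s environment variable first" % env_var)
    return token


# Usage (uncomment once the variable is set in your shell):
# mapbox_token = load_mapbox_token()
```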

In [12]:
# load the dataset from a csv file that has no header row
import pandas as pd

map_df = pd.read_csv("new_1.csv",header = None)
map_df.columns = ['branch','lon','lat']
print(map_df)

  branch         lon        lat
0   A116  121.226536  31.013124
1   A051  121.746743  31.191404
2   A074  121.490155  31.250216
3   A001  121.486181  31.270203
4   A007  121.640596  31.245883
5   A065  121.529457  31.220285
6   A044  121.133601  31.254956
7   A073  121.486280  31.261953
8   A012  121.282956  31.404118
9   A069  121.372248  31.238009


In [13]:
map_trace1 = go.Scattermapbox(
    lat = map_df.lat,
    lon = map_df.lon,
    mode = 'markers',
    marker = dict(
        size = 17,
        color = 'rgb(84,186,216)',
        opacity = 0.6
    ),
    text = map_df.branch,
    hoverinfo = 'none'
    )

map_trace2 = go.Scattermapbox(
    lat = map_df.lat,
    lon = map_df.lon,
    mode = 'markers',
    marker = dict(
        size = 10,
        color = 'rgb(0,147,198)',
        opacity = 0.6
    ),
    text = map_df.branch,
    hoverinfo = 'text'
    )

layout = go.Layout(
    title = 'Location of Delivery Company Branches',
    autosize = True,
    hovermode = 'closest',
    showlegend = False,
    mapbox = dict(
        accesstoken = mapbox_token,
        bearing = 0,
        pitch = 0,
        zoom = 8,
        style = 'light',
        center = dict(
            lat = map_df.lat.mean(),
            lon = map_df.lon.mean())
    ),
)

fig = dict(data=[map_trace1,map_trace2], layout=layout)

py.iplot(fig, filename='Location of Delivery Company Branches')

# Plot with widgets

Plots can include widgets such as sliders and dropdown boxes. When a widget value changes, Plotly updates the plot instantly. Combined with the animation functions, this is ideal for demonstrating time-series data or comparing categorical data.


In [14]:
# drawing an interactive plot of the normal distribution with mean = 0 and
# standard deviation ranging from 1.0 to 4.9 in steps of 0.1

# create data for all lines within the range
x = np.linspace(-10,10,1000)
data = [dict(
        visible = False,
        line=dict(color='rgb(240,220,27)', width=4),
        name = "stddev = {:.1f}".format(sigma),  # format to avoid names like 1.2000000000000002
        x = x,
        y = np.exp(-x**2/(2*sigma**2))/(sigma*np.sqrt(2*np.pi))) 
        for sigma in np.arange(1,5,0.1)]

# set the default line to be visible
data[0]["visible"]=True

steps = []
for i in range(len(data)):
    step = dict(
        label = "{:.1f}".format(1 + i / 10.0),  # string label that matches the trace's sigma
        method = 'restyle',
        args = ['visible', [False] * len(data)],
    )
    step['args'][1][i] = True # Toggle i'th trace to "visible"
    steps.append(step)

# create slider
sliders = [dict(
    active = 0,  # start on the first step, matching data[0] made visible above
    currentvalue = {"prefix": "Stddev = "},
    pad = {"t": 50},
    steps = steps
)]

layout = dict(sliders=sliders, title = "Normal Distribution with Slider")
fig = dict(data=data, layout=layout)

py.iplot(fig, filename='Normal Distribution with Slider')
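
The same `restyle` mechanism used for the slider above can also drive a dropdown box through the layout's `updatemenus` entry. The sketch below builds one button per trace, each showing exactly one trace. The helper name `dropdown_for_traces` and the three trace names are assumptions for illustration; in practice the names would come from the `data` list.

```python
def dropdown_for_traces(names):
    """Build an `updatemenus` entry that shows exactly one trace at a time."""
    buttons = []
    for i, name in enumerate(names):
        visible = [False] * len(names)
        visible[i] = True  # only the i-th trace stays visible
        buttons.append(dict(label=name, method="restyle",
                            args=["visible", visible]))
    return dict(type="dropdown", buttons=buttons, active=0)


menu = dropdown_for_traces(["stddev = 1.0", "stddev = 2.0", "stddev = 3.0"])
dropdown_layout = dict(updatemenus=[menu],
                       title="Normal Distribution with Dropdown")
```

As with the slider, the trace that is initially visible should correspond to the `active` button, here the first one.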

# Animated plots

It is also handy to create animations with plots in Plotly.

There are two modes for creating animated plots: online and offline. Online mode creates the plot via Plotly's [v2 API](https://api.plot.ly/v2/) and saves it under your account on the Plotly server, while offline mode keeps the plot in the local environment. In this tutorial, the plot is created offline.

In addition to the `data` parameter, which holds the initial static traces, animated plots take one extra parameter, `frames`, which holds the data for each step of the animation. Plotly generates the animation automatically from these frames. Here is the documentation on [offline mode](https://plot.ly/python/offline/).

Control buttons can also be added through the layout to control the animation.

Refer to the [Plotly website](https://plot.ly/python/animations/) for more on how to make animations.

In [15]:
# plotting a quadratic curve and a point moving along the curve

from plotly.offline import init_notebook_mode, iplot
import numpy as np

# setting offline mode for animation
init_notebook_mode(connected=True)

# create x and y for formula y = x ^ 2
x=np.linspace(-5,5,100)
y=x**2
# create the maximum and minimum for axis
xm=np.min(x)-5
xM=np.max(x)+5
ym=np.min(y)-5
yM=np.max(y)+5

# create the coordinates of the moving point
# the point moves in a straight line between frames, so the number of frames
# determines how closely the movement follows the curve

N=100
s=np.linspace(-5,5,N)
xx=s
yy=s**2


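# the curve is added twice: each frame replaces the first trace with the
# moving point, so the second copy keeps the curve on screen
data=[
    dict(x=x, y=y,
         mode='lines',
         line=dict(width=1, color='rgb(120,166,189)'),
         name = "line"
        ),
    dict(x=x, y=y,
         mode='lines',
         line=dict(width=1, color='rgb(120,166,189)'),
         name = "line"
        )
    ]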

layout=dict(xaxis=dict(range=[xm, xM], autorange=False, zeroline=True,
                      title = "$y = x^2$"),
            yaxis=dict(range=[ym, yM], autorange=False, zeroline=True),
            title='Quadratic equation Curve with Moving Point', 
            updatemenus= [{'type': 'buttons',
                           'buttons': [{'label': 'Play',
                                        'method': 'animate',
                                        'args': [None]}]}],
           showlegend=False)

frames=[dict(data=[dict(x=[xx[k]], 
                        y=[yy[k]], 
                        mode='markers', 
                        marker=dict(color='rgb(186,85,84)', size=10),
                        name = "point"
                        )
                  ]) for k in range(N)]    
          
figure1=dict(data=data, layout=layout, frames = frames)          
iplot(figure1)
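
The figure above only has a Play button. A Pause button can be added to the same `updatemenus` entry. The sketch below follows the pattern I have seen in the Plotly animation documentation, where animating to `[None]` in `immediate` mode halts playback; the frame duration of 50 ms is an arbitrary choice, and the exact arguments may need adjusting for your Plotly version.

```python
play_button = dict(
    label="Play",
    method="animate",
    # None means "play all frames"; fromcurrent resumes after a pause.
    args=[None, dict(frame=dict(duration=50, redraw=False), fromcurrent=True)],
)

pause_button = dict(
    label="Pause",
    method="animate",
    # Animating to the frame list [None] in immediate mode stops the playback.
    args=[[None], dict(frame=dict(duration=0, redraw=False),
                       mode="immediate",
                       transition=dict(duration=0))],
)

animation_menus = [dict(type="buttons", buttons=[play_button, pause_button])]

# To use it, pass updatemenus=animation_menus in the layout dict of the figure.
```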

# 3D plots

Plotly also supports 3D plots, such as 3D line plots, 3D scatter plots, and 3D surface plots. The 3D plots can be dragged with the mouse to view them from different angles. The `camera` parameter can be preset to give users a view of the data from a certain plane, such as X-Y, Y-Z, or X-Z.
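
As a sketch of such presets, the dictionaries below place the camera `eye` along one axis at a time so that the scene is viewed face-on to a plane. The eye distance of 2.5 and the names `CAMERA_PRESETS` and `scene_with_camera` are illustrative choices, not fixed Plotly values.

```python
# Keep z as the "up" direction and look at the middle of the scene.
UP = dict(x=0, y=0, z=1)
CENTER = dict(x=0, y=0, z=0)

CAMERA_PRESETS = {
    "X-Y": dict(up=UP, center=CENTER, eye=dict(x=0, y=0, z=2.5)),  # view from above
    "X-Z": dict(up=UP, center=CENTER, eye=dict(x=0, y=2.5, z=0)),  # look along the y axis
    "Y-Z": dict(up=UP, center=CENTER, eye=dict(x=2.5, y=0, z=0)),  # look along the x axis
}


def scene_with_camera(plane):
    """Return a `scene` dict that uses the preset camera for the given plane."""
    return dict(camera=CAMERA_PRESETS[plane])


top_view_scene = scene_with_camera("X-Y")
```

The resulting dictionary can be passed as the `scene` argument of `go.Layout`.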

In [16]:
# 3D scatter plot of sepal length, sepal width and petal length for the 3 species

trace1 = go.Scatter3d(x=setosa[:, 0],
                      y=setosa[:, 1],
                      z=setosa[:, 2],
                      mode="markers",
                      marker=dict(size=4,
                                  color="rgb(255,201,136)"),
                      name="setosa",
                      opacity=0.7)

trace2 = go.Scatter3d(x=versicolor[:, 0],
                      y=versicolor[:, 1],
                      z=versicolor[:, 2],
                      mode="markers",
                      marker=dict(size=4,
                                  color="rgb(232,128,172)"),
                      name="versicolor",
                      opacity=0.7)

trace3 = go.Scatter3d(x=virginica[:, 0],
                      y=virginica[:, 1],
                      z=virginica[:, 2],
                      mode="markers",
                      marker=dict(size=4,
                                  color="rgb(124,232,213)"),
                      name="virginica",
                      opacity=0.7)

background = dict(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230, 230)')

camera = dict(
    up=dict(x=0, y=0, z=1),
    center=dict(x=0, y=0, z=0),
    eye=dict(x=2, y=2, z=0.1)
)

layout = go.Layout(title = "Iris 3D plot",
                   width=600,
                   height=400,
                   autosize = False,
                   scene = dict(
                       xaxis = background,
                       yaxis = background,
                       zaxis = background,
                       camera = camera),
                  )

data = [trace1,trace2,trace3]
fig = go.Figure(data=data,layout=layout)

py.iplot(fig, filename = "3D line plot")

# References

There are many more plotting and design features available in Plotly. If you are interested, you can check out the links below:

1. Plotly Python API library: https://plot.ly/python/
2. Plotly Python full reference: https://plot.ly/python/reference/
3. Jupyter notebook tutorial: https://plot.ly/python/ipython-notebook-tutorial/
4. Scientific charts: https://plot.ly/python/#scientific-charts
5. 3D charts: https://plot.ly/python/#3d-charts

I hope you enjoy using Plotly to visualize interesting data!
