# Interactive Visualization with Plotly

This tutorial is a part of the [Zero to Data Science Bootcamp by Jovian](https://zerotodatascience.com)

![](https://i.imgur.com/jPcmxke.jpg)

Plotly is a Python library used for creating interactive visualizations (graphs & charts). Unlike Matplotlib and Seaborn which create static images, Plotly renders an HTML document and uses JavaScript under the hood to enable interactivity. Plotly also offers a large selection of chart types to choose from.


The following topics are covered in this tutorial:

- Creating figures & adding interactive elements
- A quick tour of popular interactive charts
- Using Plotly as a plotting backend for Pandas 
- Creating and exploring 3D graphs
- Adding controls and animating graphs

### How to run the code

This tutorial is an executable [Jupyter notebook](https://jupyter.org) hosted on [Jovian](https://www.jovian.ai). You can _run_ this tutorial and experiment with the code examples in a couple of ways: *using free online resources* (recommended) or *on your computer*.

#### Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the **Run** button at the top of this page and select **Run on Binder**. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on [Google Colab](https://colab.research.google.com) or [Kaggle](https://kaggle.com) to use these platforms.


#### Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up [Python](https://www.python.org), download the notebook and install the required libraries. We recommend using the [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) distribution of Python. Click the **Run** button at the top of this page, select the **Run Locally** option, and follow the instructions.



Let's begin by installing and importing the required libraries.

In [1]:
%pip install pandas numpy plotly "nbformat>=4.2.0"

In [None]:
import pandas as pd
import numpy as np

## Creating Figures and Adding Interactive Elements

Like Matplotlib, Plotly provides several low-level functions for creating and customizing figures. While they offer fine-grained control over various aspects of a graph, they're quite verbose and can be cumbersome to use. We'll start out by using Plotly Express, a high-level API similar to Seaborn, that allows creating and customizing charts with a single line of code.

Plotly Express is often imported using the alias `px`.

In [None]:
import plotly.express as px

Let's download a [country-wise population dataset](https://data.worldbank.org/indicator/SP.POP.TOTL) from World Bank Open Data.

In [None]:
population_df = pd.read_csv('population.csv', index_col='Year')
population_df[:5]

Let's use `px.line` to create a line chart showing the population of Hungary from 1960 to 2019.

In [None]:
?px.line

In [None]:
px.line(population_df['Hungary'], title="Population")

Note the following:

* `px.line` automatically picks the index of the series as the X-axis. 
* You can hover over any point on the line to view the exact value.
* You can zoom in and out using the controls to take a closer look at specific areas of the chart.
* There are several other controls, e.g. pan, autoscale, download PNG, etc.

For fine-grained control over various aspects of the chart, we can use the Figure object returned by `px.line`. Let's change the axis labels, chart colors, and ensure that the y-axis starts at 0.

In [None]:
fig = px.line(population_df['Hungary'])

In [None]:
# Set axis & legend labels
fig.update_layout(
    title="Year-Wise Population",
    xaxis_title="Year",
    yaxis_title="Population",
    legend_title="Country",
    plot_bgcolor='#ffcc9c',
    font=dict(
        family="Arial",
        size=14,
        color="#cc3e0e"
    )
)

# Start the Y axis from 0
fig.update_yaxes(rangemode='tozero')

Here's a list of properties you can set using `update_layout`: https://plotly.com/python/reference/layout/

Plotly also has built-in support for Pandas dataframes.


**Note**: If `fig.show()` does not display the graph in your Jupyter notebook, run the code below in a code cell.


```
#Import necessary libraries
import plotly.offline as pyo
import plotly.graph_objs as go
#Set notebook mode to work in offline
pyo.init_notebook_mode()

```

In [None]:
europe_df = population_df[['Hungary', 'Czech Republic', 'Switzerland']]
europe_df.head()

In [None]:
fig = px.line(europe_df,
              title='Population', 
              color_discrete_sequence=["aquamarine", "cornflowerblue", "goldenrod"])

fig.update_layout(yaxis_title='Population', 
                  legend_title='Countries', 
                  font_size=14)

fig.update_yaxes(rangemode='tozero')

fig.show()

Note that apart from providing RGB hex codes for colors, we can also use named CSS colors: https://www.w3schools.com/cssref/css_colors.asp

In [None]:
?px.line

Switching from a line chart to a bar chart is simply a matter of replacing `px.line` with `px.bar`.

In [None]:
px.bar(population_df[['Bangladesh', 'Pakistan']], 
       title="Population", 
       barmode='group')

> **EXERCISE**: Compare the annual population increase of India and China using a line chart. Which of the two countries is growing faster? Hint: Use `population_df.diff()`

> **EXERCISE**: Compare the populations of 10 most populous African countries (as of 2019) over the last 50 years using line charts. 

> **EXERCISE**: Explore the documentation for `px.line`, `fig.update_layout` and `fig.update_yaxes`. Use their arguments to style the charts you've created.

## A quick tour of popular interactive charts

Plotly Express provides more than 30 functions for creating different types of figures. Let's explore some popular interactive visualization techniques. We'll use the [built-in datasets](https://plotly.com/python-api-reference/generated/plotly.express.data.html) from `px.data` to demonstrate their usage.

### Scatter Plot



In [None]:
iris_df = px.data.iris()
iris_df

In [None]:
px.scatter(iris_df, 
           x="sepal_width", 
           y="sepal_length", 
           color="species",
           size='petal_length', 
           hover_data=['petal_width'])

> **EXERCISE**: Download the dataset `px.data.gapminder()` and visualize the relationship between GDP per capita and life expectancy in the year 2007 using a scatter plot. Set proper titles for the figure and the axes. Color the dots using the values in `continent` column, and show the country name and population on hover. 

In [None]:
px.data.gapminder()

### Bar Chart

We'll use the dataset `medals_long`, which represents the medal table for Olympic Short Track Speed Skating for the top three nations as of 2020.

In [None]:
long_df = px.data.medals_long()
long_df

In [None]:
fig = px.bar(long_df, 
             x="nation", 
             y="count", 
             color="medal", 
             title="Olympic Short Track Speed Skating Medals",
             color_discrete_sequence=["#AF9500", "#B4B4B4", "#6A3805"])
fig.show()

> **EXERCISE**: Look up the documentation of `px.bar` and show the bars in the above chart side by side instead of stacking them on top of each other.

> **EXERCISE**: Replicate the above example with a different dataset. Find a dataset online or pick one from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

### Treemap and Sunburst

In [None]:
# Copy the filtered rows so that a new column can be added without a pandas warning
gapminder_df = px.data.gapminder().query("year == 2007").copy()
gapminder_df["world"] = "World"
gapminder_df

In [None]:
fig = px.treemap(gapminder_df, 
                 path=['world','continent', 'country'], 
                 values='pop',
                 color='lifeExp', 
                 color_continuous_scale='RdBu')
fig.show()

> **EXERCISE**: Replace `px.treemap` with `px.sunburst` in the above example and study the figure that's created. Can you tell what it represents?

In [None]:
fig = px.sunburst(gapminder_df, 
                 path=['continent', 'country'], 
                 values='pop',
                 color='lifeExp', 
                 color_continuous_scale='RdBu')
fig.show()

> **EXERCISE**: Replicate the above example with a different dataset. Find a dataset online or pick one from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

Learn more about Treemap and Sunburst charts here: 

* https://plotly.com/python/treemaps/
* https://plotly.com/python/sunburst-charts/


### Histogram and Rug Plots

In [None]:
tips_df = px.data.tips()
tips_df

In [None]:
fig = px.histogram(tips_df, 
                   x="total_bill", 
                   color="sex", 
                   marginal="rug", 
                   hover_data=tips_df.columns)
fig.show()

> **EXERCISE**: Replace `"rug"` with `"box"` or `"violin"` in the above example and study the charts. Do you understand what the marginal plots represent?

> **EXERCISE**: Replicate the above example with a different dataset. Find a dataset online or pick one from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

Learn more about histograms in Plotly here: https://plotly.com/python/histograms/

### Polar Chart

In [None]:
wind_df = px.data.wind()
wind_df

In [None]:
fig = px.line_polar(wind_df, 
                    r="frequency",
                    theta="direction", 
                    color="strength", 
                    line_close=True,
                    color_discrete_sequence=px.colors.sequential.Plasma_r,
                    template="plotly_dark")
fig.show()

> **EXERCISE**: Replace `line_polar` with `scatter_polar` or `bar_polar` in the example above and interpret the chart. When would you use one vs. the other?

> **EXERCISE**: Replicate the above example with a different dataset. Find a dataset online or pick one from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

Learn more about polar charts in Plotly here: https://plotly.com/python/polar-chart/

## Using Plotly as a plotting backend for Pandas

We can configure Pandas to use Plotly as the backend for the `plot` methods of Pandas data frames & series. You can learn more about this here: https://plotly.com/python/pandas-backend/. 

The Plotly backend can be enabled as follows:

In [None]:
pd.options.plotting.backend = "plotly"

The Plotly backend supports the following kinds of Pandas plots: `scatter`, `line`, `area`, `bar`, `barh`, `hist` and `box`. Let's look at some examples.

In [None]:
europe_df = population_df[['Germany', 'United Kingdom', 'France', 'Italy']]

europe_df.plot(kind='area', title="Population")

In [None]:
long_df

In [None]:
long_df.plot(x='count', 
             y='nation', 
             kind='barh', 
             color='medal', 
             barmode='group', 
             title="Olympic Short Track Speed Skating Medals",
             color_discrete_sequence=["#AF9500", "#B4B4B4", "#6A3805"])

In [None]:
tips_df

In [None]:
fig = tips_df.plot('total_bill', 
                   kind='hist', 
                   title="Distribution of Total Bill")
fig.update_layout(bargap=0.1)
fig.show()

In [None]:
tips_df.plot('tip', kind='box', color='sex')

> **EXERCISE**: Replicate the above examples with different datasets. Find a dataset online or pick from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

## Creating and exploring 3D graphs

Plotly can also be used to create 3D graphs. Let's look at an example of 3D surface plots, by plotting the elevation of a mountain.

In [None]:
import pandas as pd

z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv', index_col=0)
z_data

We need to use the low-level `graph_objects` API to create a 3D surface and attach it to a figure.

In [None]:
import plotly.graph_objects as go

surface=go.Surface(z=z_data.values)
fig = go.Figure(surface)
fig.update_layout(title='Mt. Bruno Elevation')
fig.show()

Plotly Express can be used to create 3D line and scatter plots.

In [None]:
df = px.data.gapminder().query("country=='Brazil'")
fig = px.line_3d(df, x="gdpPercap", y="pop", z="year")
fig.show()

In [None]:
iris_df

In [None]:
px.scatter_3d(iris_df, 
              x='sepal_length', 
              y='sepal_width', 
              z='petal_width', color='species')

> **EXERCISE**: Create some 3D surface, line and scatter plots using some other datasets.

## Adding controls and animating graphs

Plotly express graphs can be animated by specifying an `animation_frame` column. Additionally, an `animation_group` can be specified to uniquely identify objects across frames. An animated graph also provides slider controls to skip to any frame.

In [None]:
df = px.data.gapminder()
df

In [None]:
df = px.data.gapminder()

fig = px.scatter(df, 
                 x="gdpPercap", 
                 y="lifeExp", 
                 animation_frame="year", 
                 animation_group="country",
#                  size="pop",     
                 color="continent", 
                 hover_name="country",
                 log_x=True, 
                 size_max=80, 
                 range_x=[100,100000], 
                 range_y=[25,90])

fig.show()

In [None]:
tips_df

In [None]:
px.box(tips_df, x='sex', y='total_bill', color='smoker', animation_frame='day')

> **EXERCISE**: Create some graphs with controls and animations using other datasets.

Learn more about Plotly animations and custom controls here: https://plotly.com/python/#controls

## Summary and Further Reading

We've covered the following topics in this tutorial:

- Creating figures & adding interactive elements
- A quick tour of popular interactive charts
- Using Plotly as a plotting backend for Pandas 
- Creating and exploring 3D graphs
- Adding controls and animating graphs



## Questions for Revision
1. What is Plotly?
2. How is Plotly different from Matplotlib and Seaborn?
3. How do you plot a line chart with Plotly? Illustrate with an example.
4. What is the difference between `plt.bar`, `sns.barplot` and `px.bar`?
5. What are the popular interactive charts in Plotly?
6. What is the `color_discrete_sequence` parameter in Plotly?
7. How do you set titles for axes in Plotly?
8. What is the `hover_data` parameter in Plotly?
9. How do you plot a sunburst chart with Plotly? Illustrate with an example.
10. Illustrate the usage of a treemap in Plotly.
11. What is the `marginal` parameter in Plotly?
12. What is a polar chart? Illustrate with an example.
13. How can you use Plotly as the backend for the `plot` methods of Pandas data frames and series?
14. What is a box plot? Plot it using Plotly.
15. What is `graph_objects` in Plotly?
16. How do you import `graph_objects`?
17. What kinds of 3D plots can be created using Plotly?
18. What is an animated graph? Illustrate with an example.
19. What is an `animation_frame`?
20. What is an `animation_group`?

## Solutions for Exercises


> **EXERCISE**: Compare the annual population increase of India and China using a line chart. Which of the two countries is growing faster? Hint: Use `population_df.diff()`

In [None]:
annual_population_increase_df = population_df[["India", "China"]].diff(axis = 0)
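To see what `diff` computes, here is a small sketch on made-up numbers (`toy_df` and its values are hypothetical). Each row of the result is the current row minus the previous one, so the first row has no value.

```python
import pandas as pd

# Hypothetical toy data with made-up values, indexed by year
toy_df = pd.DataFrame(
    {"A": [100, 110, 125], "B": [200, 205, 207]},
    index=pd.Index([2000, 2001, 2002], name="Year"),
)

# diff() subtracts the previous row from each row; the first row has no predecessor
increase_df = toy_df.diff()
print(increase_df)
```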

In [None]:
fig =px.line(annual_population_increase_df, title = 'Annual population increase of India and China')
fig.update_layout(yaxis_title='Population Increase', 
                  xaxis_title='Year',
                  legend_title='Countries')
fig.show()

**OBSERVATION**: Rapid population growth is often described as *exponential growth*, in which the population increases at a steadily increasing rate (definition adapted from Toppr). In the graph above, China's annual population increase rises rapidly and then declines, whereas India's annual increase grows consistently over the years.

> **EXERCISE**: Compare the populations of 10 most populous African countries (as of 2019) over the last 50 years using line charts. 

In [None]:
continent_df=pd.read_csv('https://raw.githubusercontent.com/dbouquin/IS_608/master/NanosatDB_munging/Countries-Continents.csv')
african_countries=list(continent_df[continent_df['Continent']=='Africa']['Country'])

In [None]:
population_of_2019=pd.DataFrame(population_df.loc[2019]).reset_index()
population_of_2019.rename(columns = {'index':'Country', 2019:'Population'}, inplace = True)
is_african_country = population_of_2019.Country.isin(african_countries)
top_10_african_countries = population_of_2019[is_african_country].sort_values('Population', ascending = False).head(10)
list_of_top_10_african_countries = list(top_10_african_countries['Country'])
Most_populous_african_countries = population_df[list_of_top_10_african_countries].loc[1970:]

In [None]:
fig = px.line(Most_populous_african_countries)
fig.show()

**OBSERVATION**: Nigeria has the highest population growth in the last 50 years.

> **EXERCISE**: Explore the documentation for `px.line`, `fig.update_layout` and `fig.update_yaxes`. Use their arguments to style the charts you've created.

In [None]:
fig = px.line(Most_populous_african_countries,title='Top 10(2019) African countries population growth over 50 years')
fig.update_layout(yaxis_title='Population', 
                  xaxis_title='Year',
                  legend_title='Countries')
fig.update_xaxes(color='brown')
fig.update_yaxes(color='brown')

> **EXERCISE**: Download the dataset `px.data.gapminder()` and visualize the relationship between GDP per capita and life expectancy in the year 2007 using a scatter plot. Set proper titles for the figure and the axes. Color the dots using the values in `continent` column, and show the country name and population on hover. 

In [None]:
gapminder_df=px.data.gapminder()

In [None]:
data_of_2007 = gapminder_df[gapminder_df.year ==2007]

In [None]:
fig=px.scatter(data_of_2007, 
           x="gdpPercap", 
           y="lifeExp", 
           color="continent", 
           hover_data=['country',"pop"],
           title='Relationship between GDP per capita and life expectancy in the year 2007')
fig.update_layout(xaxis_title='GDP per capita', 
                  yaxis_title='Life expectancy',
                  legend_title='Continent')

**OBSERVATION**: GDP per capita and life expectancy in the year 2007 seem to have a 'curvilinear relationship'.
- Curvilinear relationship - A relationship between two variables that follows a curve rather than a straight line. Here, life expectancy rises quickly with GDP per capita at low incomes and then levels off at higher incomes.

> **EXERCISE**: Look up the documentation of `px.bar` and show the bars in the above chart side by side instead of stacking them on top of each other. 

- [*Reference to Bar Chart `long_df` visualization.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C42)

In [None]:
fig = px.bar(long_df, 
             x="nation", 
             y="count", 
             color="medal", 
             title="Olympic Short Track Speed Skating Medals",
             barmode = "group",
             color_discrete_sequence=["#AF9500", "#B4B4B4", "#6A3805"])
fig.show()

> **EXERCISE**: Replicate the above example with a different dataset. Find a dataset online or pick one from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

- [*Reference to Bar chart visualization exercise*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C42)

In [None]:
tips_df =  px.data.tips()

In [None]:
tips_df

In [None]:
fig = px.bar(tips_df,
            x="time",
            y="tip",
            color="sex",
            title = "Tips received at different times of the day",
            barmode='group',
            color_discrete_sequence=["#AF9500", "#B4B4B4"])
fig.show()            

**OBSERVATION**: Tips from male customers at dinner are the highest. Interesting. Note that `sex` in this dataset refers to the person paying the bill, so the large difference more likely reflects who pays the bill at dinner than who works the evening shift.

> **EXERCISE**: Replicate the above example with a different dataset. Find a dataset online or pick one from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

- [*Reference to treemap and sunburst visualization exercise.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C52)

In [None]:
elections = px.data.election()
elections.head(5)

In [None]:
fig = px.treemap(elections,
               path=["winner","result","district_id"],
               values= "total",
               color_continuous_scale="RdBu")
fig.show()

In [None]:
fig = px.sunburst(elections,
               path=["winner","result","district_id"],
               values= "total",
               color_continuous_scale="RdBu")
fig.show()

**OBSERVATION**: Coderre has the highest number of votes with the most plurality wins, and Joly has the least number of votes with only two majority wins.

> **EXERCISE**: Replace `"rug"` with `"box"` or `"violin"` in the above example and study the charts. Do you understand what the marginal plots represent?

- [*Reference to `tips_df` histogram visualization.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C62)

In [None]:
fig = px.histogram(tips_df, 
                   x="total_bill", 
                   color="sex", 
                   marginal="box", 
                   hover_data=tips_df.columns)
fig.show()

In [None]:
fig = px.histogram(tips_df, 
                   x="total_bill", 
                   color="sex", 
                   marginal="violin", 
                   hover_data=tips_df.columns)
fig.show()

**OBSERVATION**: Marginal plots are the small subplots above the main plot. They show the distribution of the data along a single dimension.

> **EXERCISE**: Replicate the above example with a different dataset. Find a dataset online or pick one from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

- [*Reference to histogram visualization exercise.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C62)

In [None]:
car_share=px.data.carshare()
car_share

In [None]:
fig = px.histogram(car_share, 
                   x="car_hours", 
                   marginal="rug", 
                   hover_data=car_share.columns)
fig.show()

In [None]:
fig = px.histogram(car_share, 
                   x="car_hours", 
                   marginal="box", 
                   hover_data=car_share.columns)
fig.show()

In [None]:
fig = px.histogram(car_share, 
                   x="car_hours", 
                   marginal="violin", 
                   hover_data=car_share.columns)
fig.show()

**OBSERVATION**: `car_hours` seems to follow a normal distribution with a few outliers.

> **EXERCISE**: Replace `line_polar` with `scatter_polar` or `bar_polar` in the example above and interpret the chart. When would you use one vs. the other?

- [*Reference to `wind_df` polar chart visualization.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C74)

In [None]:
fig = px.scatter_polar(wind_df, 
                    r="frequency",
                    theta="direction", 
                    color="strength", 
                    color_discrete_sequence=px.colors.sequential.Plasma_r,
                    template="plotly_dark")
fig.show()

In [None]:
fig = px.bar_polar(wind_df, 
                    r="frequency",
                    theta="direction", 
                    color="strength",
                    color_discrete_sequence=px.colors.sequential.Plasma_r,
                    template="plotly_dark")
fig.show()

**OBSERVATION**: Bar polar charts are mostly used for categorical data, whereas scatter polar charts are used for data containing amount and direction values.

> **EXERCISE**: Replicate the above example with a different dataset. Find a dataset online or pick one from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

- [*Reference to polar chart visualization exercise.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C74)

In [None]:
tips_df

In [None]:
fig = px.scatter_polar(tips_df, 
                    r="total_bill",
                    theta="day", 
                    color="time",
                    template="plotly_dark")
fig.show()


**OBSERVATION**: Dinner on weekends recorded the highest bills at the restaurant.

In [None]:
fig = px.bar_polar(tips_df, 
                    r="tip",
                    theta="day", 
                    color="time",
                    template="plotly_dark")
fig.show()


**OBSERVATION**: Dinner on weekends received the highest total tips.

> **EXERCISE**: Replicate the above examples with different datasets. Find a dataset online or pick from this page: https://plotly.com/python-api-reference/generated/plotly.express.data.html

- [*Reference to Using Plotly as a plotting backend for Pandas exercise.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C83)

In [None]:
elections.head()

In [None]:
elections.plot(x='total', 
             y='winner',  
             color='result', 
             kind='barh',
             barmode='group',
             title="Elections total votes of the winners",
             color_discrete_sequence=["#AF9500", "#B4B4B4", "#6A3805"])

**OBSERVATION**: Coderre has the highest number of votes with the most plurality wins, and Joly has the least number of votes with only two majority wins.

In [None]:
fig = elections.plot('winner', 
                   kind='hist', 
                   title="Distribution of Winners")
fig.update_layout(bargap=0.1)
fig.show()

**OBSERVATION**: Coderre has the maximum number of entries in the `elections` dataset indicating that this person must have won elections in many districts. 

In [None]:
gapminder_df.plot('lifeExp', kind='box', color='continent')

**OBSERVATION**: Europe and Africa have more outliers in life expectancy than the other continents. Asia has the highest life expectancy at 82.6 years, and Africa has the lowest at 23.5 years.

> **EXERCISE**: Create some 3D surface, line and scatter plots using some other datasets.

- [*Reference to Creating and exploring 3D graphs exercise.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C96)

In [None]:
px.scatter_3d(elections, 
              x='winner', 
              y='total', 
              z='Joly', color='result')

> **EXERCISE**: Create some graphs with controls and animations using other datasets.

- [*Reference to Adding controls and animating graphs exercise.*](https://jovian.ai/aakashns/interactive-visualization-plotly/v/18#C108)

In [None]:
df = px.data.carshare()
df.sort_values(by='peak_hour',inplace=True)

In [None]:
fig = px.bar(df, 
                 x=df.index, 
                 y="car_hours", 
                 animation_frame="peak_hour",  
                 log_x=True,
                 range_x=[10,100])

fig.show()