# 9. Plotting with plotly

So far in this tutorial we have been using seaborn and pandas, two mature libraries designed around matplotlib.
These libraries all focus on building "static" visualizations: visualizations that have no moving parts. In other words,
all of the plots we've built thus far could appear in a dead-tree journal article.

The web unlocks a lot of possibilities when it comes to interactivity and animations. There are a number of plotting libraries available which try to provide these features.

In this section we will examine plotly, an open-source plotting library that's one of the most popular of these libraries.


In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import chart_studio.plotly as py
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go


In [4]:
root_path="https://minio.lab.sspcloud.fr/pengfei/diffusion/data_viz"
university_input_file = f'{root_path}/university.csv'

In [5]:
university=pd.read_csv(university_input_file)

In [6]:
university.head()

Unnamed: 0,world_rank,university_name,country,teaching,international,research,citations,income,total_score,num_students,student_staff_ratio,international_students,female_male_ratio,year
0,1,Harvard University,United States of America,99.7,72.4,98.7,98.8,34.5,96.1,20152,8.9,25%,,2011
1,2,California Institute of Technology,United States of America,97.7,54.6,98.0,99.9,83.7,96.0,2243,6.9,27%,33 : 67,2011
2,3,Massachusetts Institute of Technology,United States of America,97.8,82.3,91.4,99.9,87.5,95.6,11074,9.0,33%,37 : 63,2011
3,4,Stanford University,United States of America,98.3,29.5,98.1,99.2,64.3,94.3,15596,7.8,22%,42 : 58,2011
4,5,Princeton University,United States of America,90.9,70.3,95.4,99.9,-,94.2,7929,8.4,27%,45 : 55,2011


In [7]:
university.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2603 entries, 0 to 2602
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   world_rank              2603 non-null   object 
 1   university_name         2603 non-null   object 
 2   country                 2603 non-null   object 
 3   teaching                2603 non-null   float64
 4   international           2603 non-null   object 
 5   research                2603 non-null   float64
 6   citations               2603 non-null   float64
 7   income                  2603 non-null   object 
 8   total_score             2603 non-null   object 
 9   num_students            2544 non-null   object 
 10  student_staff_ratio     2544 non-null   float64
 11  international_students  2536 non-null   object 
 12  female_male_ratio       2370 non-null   object 
 13  year                    2603 non-null   int64  
dtypes: float64(4), int64(1), object(9)
memor

## 9.1 Line chart

The line chart below compares the citation and teaching scores of the top 100 universities in the world ranking.

In [10]:
top_100 = university.iloc[:100,:]

# first line plot for citation
citation = go.Scatter(
x=top_100.world_rank,
y=top_100.citations,
mode = "lines",
marker={"color":"rgba(48, 64, 101, 0.8)"},
text= top_100.university_name)

# second line plot for teaching
teaching = go.Scatter(
x=top_100.world_rank,
y=top_100.teaching,
mode = "lines+markers",
marker=dict(color="rgba(0,123,221,.8)"),
text=top_100.university_name)
data1 = [citation,teaching]
layout = dict(title = "Citation and Teaching vs World Rank", xaxis= dict(title= 'World Rank',ticklen= 5,zeroline= False))
fig = dict(data = data1, layout = layout)
iplot(fig)
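
Because `iplot` in the cell above receives a plain `dict`, a figure can also be described entirely with ordinary dictionaries. The following is a minimal sketch of that idea. The helper `line_trace` is a hypothetical name, not part of plotly, and the five values are copied from the `head()` output above. It sets the color through `line`, the attribute that styles the line itself, and gives each trace a `name` so the legend is readable.

```python
def line_trace(x, y, name, color, with_markers=False):
    """Return a scatter trace as a plain dict (hypothetical helper)."""
    return {
        "type": "scatter",
        "x": list(x),
        "y": list(y),
        "name": name,  # shown in the legend
        "mode": "lines+markers" if with_markers else "lines",
        "line": {"color": color},
    }

# First five rows of the dataset, as printed by university.head()
ranks = [1, 2, 3, 4, 5]
citation_scores = [98.8, 99.9, 99.9, 99.2, 99.9]
teaching_scores = [99.7, 97.7, 97.8, 98.3, 90.9]

fig_spec = {
    "data": [
        line_trace(ranks, citation_scores, "citations", "rgba(48, 64, 101, 0.8)"),
        line_trace(ranks, teaching_scores, "teaching", "rgba(0, 123, 221, 0.8)",
                   with_markers=True),
    ],
    "layout": {
        "title": {"text": "Citation and Teaching vs World Rank"},
        "xaxis": {"title": {"text": "World Rank"}, "ticklen": 5, "zeroline": False},
    },
}
```

A dictionary like `fig_spec` should be accepted by `iplot` in the same way as `fig` above.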

## 9.2 Scatter plot

This time, we want to see the citation scores of the top 100 universities (by world rank) in 2014, 2015 and 2016.

In [14]:

rank_2014 = university[university.year == 2014].iloc[:100,:]
rank_2015 = university[university.year == 2015].iloc[:100,:]
rank_2016 = university[university.year == 2016].iloc[:100,:]

In [15]:
trace1 =go.Scatter(
                    x = rank_2014.world_rank,
                    y = rank_2014.citations,
                    mode = "markers",
                    name = "2014",
                    marker = dict(color = 'rgba(255, 128, 255, 0.8)'),
                    text= rank_2014.university_name)
# creating trace2
trace2 =go.Scatter(
                    x = rank_2015.world_rank,
                    y = rank_2015.citations,
                    mode = "markers",
                    name = "2015",
                    marker = dict(color = 'rgba(255, 128, 2, 0.8)'),
                    text= rank_2015.university_name)
# creating trace3
trace3 =go.Scatter(
                    x = rank_2016.world_rank,
                    y = rank_2016.citations,
                    mode = "markers",
                    name = "2016",
                    marker = dict(color = 'rgba(0, 255, 200, 0.8)'),
                    text= rank_2016.university_name)
data1 = [trace1, trace2, trace3]
layout = dict(title = 'Citation vs world rank of top 100 universities for 2014, 2015 and 2016',
              xaxis= dict(title= 'World Rank',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Citation',ticklen= 5,zeroline= False)
             )
fig = dict(data = data1, layout = layout)
iplot(fig)
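
The three traces differ only in the year and the color, so they can also be generated in a loop. This is a sketch using plain dictionaries and a few made-up rows: the `rows` list and its values are hypothetical, not taken from the dataset. With the real data, the filtering would be done on the `university` DataFrame as above.

```python
# Hypothetical rows standing in for the university table.
rows = [
    {"year": 2014, "world_rank": 1, "citations": 99.8, "name": "University A"},
    {"year": 2014, "world_rank": 2, "citations": 98.5, "name": "University B"},
    {"year": 2015, "world_rank": 1, "citations": 99.7, "name": "University A"},
    {"year": 2016, "world_rank": 1, "citations": 99.9, "name": "University A"},
    {"year": 2016, "world_rank": 2, "citations": 97.0, "name": "University C"},
]

# Same colors as the three traces above.
year_colors = {
    2014: "rgba(255, 128, 255, 0.8)",
    2015: "rgba(255, 128, 2, 0.8)",
    2016: "rgba(0, 255, 200, 0.8)",
}

traces = []
for year, color in year_colors.items():
    subset = [r for r in rows if r["year"] == year]
    traces.append({
        "type": "scatter",
        "mode": "markers",
        "name": str(year),
        "x": [r["world_rank"] for r in subset],
        "y": [r["citations"] for r in subset],
        "text": [r["name"] for r in subset],  # hover text
        "marker": {"color": color},
    })
```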

## 9.3 Bar chart


### 9.3.1 Standard bar plot

The chart below groups the top 100 universities of 2016 by country and shows the three countries with the most entries. x is the country name, y is the number of top 100 universities that country has.


In [16]:

index = rank_2016["country"].value_counts().head(3).index
value = rank_2016["country"].value_counts().head(3).values
trace1 = go.Bar(
x = index,
y = value,
marker = {"color":"rgba(131,26,93,0.4)"}
)
data4 = [trace1]

iplot(data4)
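
`value_counts().head(3)` counts how often each country appears and keeps the three most frequent. The same idea can be sketched with the standard library; the `countries` list below is hypothetical and only illustrates the counting step.

```python
from collections import Counter

# Hypothetical country column of a ranking table.
countries = [
    "United States of America", "United Kingdom", "United States of America",
    "Germany", "United Kingdom", "United States of America",
    "Canada", "Germany", "United Kingdom", "United States of America",
]

# Count occurrences and keep the three most frequent countries.
top3 = Counter(countries).most_common(3)

# Split into x (labels) and y (counts) for a bar trace.
bar_trace = {
    "type": "bar",
    "x": [country for country, _ in top3],
    "y": [count for _, count in top3],
}
```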

The bar chart below shows the citations and teaching scores of the top 3 universities in 2014.

In [17]:

x2014 = university[university.year == 2014].iloc[:3,:]
trace1 = go.Bar(
x = x2014.university_name,
y = x2014.citations,
name="citations",
marker = {"color":"rgba(111,23,155,0.5)"},
text=x2014.university_name
)
trace2 = go.Bar(
x = x2014.university_name,
y = x2014.teaching,
name="teaching",
marker = {"color":"rgb(47,69,187)"},
text = x2014.university_name
)
data3 = [trace1,trace2]
layout = go.Layout(barmode = "group")
fig = go.Figure(data = data3, layout = layout)
iplot(fig)

### 9.3.2 Stacked bar plot

We can also draw a different type of bar plot, the **stacked bar plot**. In plotly, it is produced here with `barmode = "relative"`, which stacks the bars of each trace on top of one another.

In [18]:

layout = go.Layout(barmode = "relative")
fig = go.Figure(data = data3,layout = layout)
iplot(fig)
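
To make the stacking rule concrete, here is a simplified model of where each bar starts under `barmode = "relative"`: positive values pile up above the axis and negative values pile up below it. This is only a sketch for intuition, not plotly's actual implementation, and the function name `relative_bases` and the sample numbers are hypothetical.

```python
def relative_bases(series):
    """For each trace, return the starting height (base) of every bar.

    `series` is a list of traces; each trace is a list of bar values,
    one per category. Positive bars stack upward from zero, negative
    bars stack downward from zero.
    """
    n_categories = len(series[0])
    positive_top = [0.0] * n_categories
    negative_bottom = [0.0] * n_categories
    bases = []
    for values in series:
        trace_bases = []
        for i, value in enumerate(values):
            if value >= 0:
                trace_bases.append(positive_top[i])
                positive_top[i] += value
            else:
                trace_bases.append(negative_bottom[i])
                negative_bottom[i] += value
        bases.append(trace_bases)
    return bases

# Two hypothetical traces over three categories.
bases = relative_bases([[10, 20, -5], [5, -10, -5]])
```

In the second trace, the first bar starts at 10 (on top of the first trace), the second starts at 0 because it is the first negative value in its category, and the third starts at -5, below the earlier negative bar. When every value is positive, as with the scores in this dataset, the result is an ordinary stacked bar plot.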

### 9.3.3 Bubble chart

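The chart below is a bubble chart of the top 20 universities in 2016, showing student numbers alongside teaching quality.

The x axis is the world rank, the y axis is the teaching score, and the diameter of each circle is the number of students in thousands (the code replaces the thousands separator with a decimal point, so a value like 20,152 becomes 20.152). The marker color encodes the `international` score.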

In [25]:
rank_2016_20 = university[university.year == 2016].iloc[:20,:]
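# replace the thousands separator with a decimal point, e.g. "20,152" -> 20.152 (thousands of students)
num_students_size  = [float(each.replace(',', '.')) for each in rank_2016_20.num_students]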
international_color = [float(each) for each in rank_2016_20.international]
trace1 = go.Scatter(
x = rank_2016_20.world_rank,
y = rank_2016_20.teaching,
mode = "markers",
marker=dict(
color = international_color,
size = num_students_size,
showscale = True
),
text = rank_2016_20.university_name
)
data5 = [trace1]
iplot(data5)
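
An alternative to turning the counts into thousands is to keep the real student numbers and let plotly scale the bubbles. The sketch below assumes the counts are stored as strings with a comma as thousands separator (as the `replace` call above suggests) and uses the `sizemode` and `sizeref` marker attributes with the scaling formula given in the plotly bubble chart documentation. The name `parse_count` and the target size of 40 pixels are assumptions made for illustration.

```python
def parse_count(text):
    """Convert a string such as '20,152' to the integer 20152."""
    return int(text.replace(",", ""))

# Student numbers of the first three rows, written with a thousands separator.
raw_counts = ["20,152", "2,243", "11,074"]
counts = [parse_count(text) for text in raw_counts]

# Largest bubble should be roughly this many pixels wide (assumed value).
desired_max_px = 40

# With sizemode="area", the bubble area (not the diameter) is proportional
# to the value, and sizeref sets the overall scale.
sizeref = 2.0 * max(counts) / desired_max_px ** 2

marker = {
    "size": counts,
    "sizemode": "area",
    "sizeref": sizeref,
    "sizemin": 4,  # keep small universities visible
}
```

Scaling by area rather than diameter avoids visually exaggerating the largest universities, since a circle with twice the diameter has four times the area.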

## 9.4 3D Scatter plot

A 2D scatter plot uses position to express only two numeric variables (hue is typically used for a categorical variable rather than a numeric one).

In plotly, a 3D scatter plot adds a third axis, and marker color and size can carry further numeric variables.

In the plot below, the x axis is the university world rank, the y axis is the teaching score and the z axis is the citation score, all for 2016. Color encodes the research score. The marker size is fixed at 10, but it can be replaced by the total score (`size_3d`).

In [22]:

color_3d = rank_2016_20.research
# the size here is the diameter of the circle, it must be numeric;
# convert into a new Series instead of assigning back into a slice of `university`
size_3d = rank_2016_20.total_score.astype(float)
trace = go.Scatter3d(
x = rank_2016_20.world_rank,
y = rank_2016_20.teaching,
z = rank_2016_20.citations,
mode = "markers",
# you can replace the size value by size_3d, and check the result
marker = dict ( color = color_3d,size = 10),
text = rank_2016_20.university_name
)
data6 = [trace]
layout = go.Layout(margin = dict(l=0,r=0,b=0,t=0))
fig = go.Figure(data = data6,layout = layout)
iplot(fig)
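
If the total score is used directly as the marker size, every marker ends up with a diameter between roughly 94 and 96 pixels for the top universities, which is both large and hard to tell apart. One possible remedy, sketched below, is to rescale the scores linearly to a chosen pixel range first. The function `scale_to_range` and the range of 5 to 25 pixels are assumptions for illustration; the scores are the `total_score` values from the `head()` output.

```python
def scale_to_range(values, low=5.0, high=25.0):
    """Linearly map values onto [low, high] (min-max scaling)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values identical: use the middle of the range.
        return [(low + high) / 2.0] * len(values)
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]

total_scores = [96.1, 96.0, 95.6, 94.3, 94.2]
sizes = scale_to_range(total_scores)

# Marker dict that could replace `dict(color = color_3d, size = 10)` above.
marker_3d = {"size": sizes}
```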

## 9.5 Faceting

Plotly can place several graphs in the same figure, for example scatter + line, or line + histogram.

The example below insets a bar chart in a scatter plot.

In [26]:
# draw a scatter plot with an inset bar plot
trace1 = go.Bar(
x = rank_2016_20.world_rank,
y = rank_2016_20.teaching,
marker = dict(color = "rgba(25,63,86,0.5)"),
xaxis = "x2",
yaxis ="y2",
name = "t"
)
trace2 = go.Scatter(
x = rank_2016_20.world_rank,
y = rank_2016_20.citations,
mode = "markers",
marker = dict (color = "rgba(52,65,144,0.55)"),
text = rank_2016_20.world_rank,
name = "c"
)
data7 = [trace1,trace2]
layout = go.Layout(
xaxis2=dict(
        domain=[0.6, 0.95],
        anchor='y2',
    ),
    yaxis2=dict(
        domain=[0.6, 0.95],
        anchor='x2',
    ),
title = "teaching and citations vs university rank"
)
fig = go.Figure(data = data7,layout = layout)
iplot(fig)
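
The inset works through two pieces: the layout defines a second pair of axes whose `domain` is a fraction of the figure (here from 60% to 95% in both directions), and the inset trace refers to those axes with `xaxis = "x2"` and `yaxis = "y2"`. The sketch below packages both steps with plain dictionaries; the helper names `inset_layout` and `on_inset` are hypothetical.

```python
def inset_layout(x_domain, y_domain, title=""):
    """Layout with a second axis pair confined to the given domains.

    Domains are [start, end] fractions of the figure width or height.
    """
    return {
        "title": {"text": title},
        "xaxis2": {"domain": list(x_domain), "anchor": "y2"},
        "yaxis2": {"domain": list(y_domain), "anchor": "x2"},
    }

def on_inset(trace):
    """Return a copy of the trace that is drawn on the inset axes."""
    return {**trace, "xaxis": "x2", "yaxis": "y2"}

# Hypothetical data: the main scatter trace and the inset bar trace.
main_trace = {"type": "scatter", "mode": "markers", "x": [1, 2, 3], "y": [99.9, 99.8, 95.3]}
inset_trace = on_inset({"type": "bar", "x": [1, 2, 3], "y": [95.6, 86.5, 92.5]})

fig_spec = {
    "data": [inset_trace, main_trace],
    "layout": inset_layout([0.6, 0.95], [0.6, 0.95],
                           title="teaching and citations vs university rank"),
}
```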

One more figure with multiple plots: a line plot with an inset box plot.


In [None]:
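# assign the result back instead of using inplace=True on a column, which may not update the DataFrame in recent pandas
university["total_score"] = university["total_score"].replace("-", 10)
university["total_score"] = university["total_score"].astype(float)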

top_100 = university.iloc[:100,:]
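
Replacing the "-" placeholder with 10 puts an artificial low score into the data, which can distort summary statistics such as the median. An alternative, sketched here on a small hypothetical Series, is to convert the placeholder into a missing value with `pd.to_numeric(..., errors="coerce")`, so that pandas skips it in its calculations.

```python
import pandas as pd

# Hypothetical excerpt of the total_score column; "-" marks a missing score.
scores = pd.Series(["96.1", "-", "94.3"])

# Anything that cannot be parsed as a number becomes NaN.
numeric_scores = pd.to_numeric(scores, errors="coerce")

# pandas ignores NaN by default when computing statistics.
median_score = numeric_scores.median()
```

Whether missing scores should be dropped or filled depends on the analysis; the cell above keeps the tutorial's original choice of filling with 10.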

In [27]:
trace1 = go.Scatter(
x = top_100.world_rank,
y = top_100.citations,
mode = "lines",
marker = dict(color = "rgba(200,140,61,0.4)"),
text = top_100.university_name
)
trace2 = go.Box(
y = top_100.total_score,
name = "total score",
marker = dict(color = "rgba(42,155,204,0.6)"),
xaxis = "x2",
yaxis = "y2" ,
)
data9 = [trace1,trace2]
layout = go.Layout(
 xaxis2=dict(
        domain=[0.6, 0.95],
        anchor='y2',
    ),
    yaxis2=dict(
        domain=[0.6, 0.95],
        anchor='x2',
    ),
title = "citations and total score"
)
fig = go.Figure(data = data9,layout = layout)
iplot(fig)

### 9.5.2 Add multiple plot in a figure

We can also draw several separate plots, each showing a different feature, in one figure.

The figure below shows four line plots, of teaching, citations, total score and research, in the same figure.

In [28]:

trace0 = go.Scatter(
x = top_100.world_rank,
y = top_100.teaching,
mode = "lines",
name = "teaching",
marker = dict(color = 'rgba(12, 12, 140,.4)'),
text = top_100.university_name
)
trace1 = go.Scatter(
x = top_100.world_rank,
y = top_100.citations,
mode ="lines + markers",
name = "citation",
marker = dict(color = "rgba(155,98,160,.6)"),
xaxis = "x2",
yaxis = "y2",
text = top_100.university_name
)
trace2 = go.Scatter(
x = top_100.world_rank,
y= top_100.total_score,
mode = "lines",
name = "total",
marker = {"color":"rgba(36,120,153,.4)"},
xaxis = "x3",
yaxis = "y3",
text = top_100.university_name
)
trace3 = go.Scatter(
x = top_100.world_rank,
y = top_100.research,
mode = "lines + markers",
name = "research",
marker = {"color":"rgba(65,46,178,0.4)"},
xaxis = "x4",
yaxis = "y4",
text = top_100.university_name
)
data9 = [trace0,trace1,trace2,trace3]
layout = go.Layout(
    xaxis=dict(
        domain=[0, 0.45]
    ),
    yaxis=dict(
        domain=[0, 0.45]
    ),
    xaxis2=dict(
        domain=[0.55, 1]
    ),
    xaxis3=dict(
        domain=[0, 0.45],
        anchor='y3'
    ),
    xaxis4=dict(
        domain=[0.55, 1],
        anchor='y4'
    ),
    yaxis2=dict(
        domain=[0, 0.45],
        anchor='x2'
    ),
    yaxis3=dict(
        domain=[0.55, 1]
    ),
    yaxis4=dict(
        domain=[0.55, 1],
        anchor='x4'
    ),
    title = "Multiple"
)
fig = go.Figure(data = data9,layout = layout)
iplot(fig)
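
The domains in this layout follow a regular pattern: each axis gets an equal share of the figure, with a gap of 0.1 between neighbours. The sketch below computes that pattern for an arbitrary grid instead of typing the numbers by hand. The function names `grid_domains` and `grid_layout` are hypothetical, and the axes are numbered row by row starting from the bottom left, as in the layout above.

```python
def grid_domains(n, gap=0.1):
    """Split the interval [0, 1] into n equal domains separated by `gap`."""
    width = (1.0 - gap * (n - 1)) / n
    domains = []
    for i in range(n):
        start = i * (width + gap)
        domains.append([round(start, 4), round(start + width, 4)])
    return domains

def grid_layout(n_rows, n_cols, gap=0.1):
    """Build xaxis/yaxis entries for an n_rows x n_cols grid of plots."""
    x_domains = grid_domains(n_cols, gap)
    y_domains = grid_domains(n_rows, gap)
    layout = {}
    for row in range(n_rows):
        for col in range(n_cols):
            k = row * n_cols + col + 1
            suffix = "" if k == 1 else str(k)  # first pair is plain xaxis/yaxis
            layout["xaxis" + suffix] = {"domain": x_domains[col], "anchor": "y" + suffix}
            layout["yaxis" + suffix] = {"domain": y_domains[row], "anchor": "x" + suffix}
    return layout

layout_2x2 = grid_layout(2, 2)
```

For grids like this, plotly also provides `plotly.subplots.make_subplots`, which creates the axes and domains automatically.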

## 9.6 Scatter Matrix Plots

If we want to see the relationship between each pair of columns, together with a box plot of each column, arranged in a grid, we can use a scatter matrix plot.

In [31]:
# First, import the figure factory module.
import plotly.figure_factory as ff
# copy the slice so that adding a column does not raise SettingWithCopyWarning
df2015 = rank_2015.loc[:,["teaching","citations","research"]].copy()
df2015["index"] = np.arange(1,len(df2015)+1)
fig = ff.create_scatterplotmatrix(df2015, diag = "box",index = "index",
                                  colormap = "Portland", colormap_type='cat',
                                  height=700, width=700)
iplot(fig)
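
To read the resulting figure, it helps to picture the grid it is built on. The sketch below only enumerates the cells of that grid for the three columns used above; it is a conceptual model, assuming that `diag = "box"` places one box plot per column on the diagonal and a scatter plot of one column against another everywhere else.

```python
columns = ["teaching", "citations", "research"]

# One cell per (row, column) pair of the matrix.
cells = []
for row, y_name in enumerate(columns):
    for col, x_name in enumerate(columns):
        cells.append({
            "row": row,
            "col": col,
            "kind": "box" if row == col else "scatter",
            "x": x_name,
            "y": y_name,
        })

n_box = sum(1 for cell in cells if cell["kind"] == "box")
n_scatter = sum(1 for cell in cells if cell["kind"] == "scatter")
```

With three columns the grid has nine cells: three box plots on the diagonal and six scatter plots, where each pair of columns appears twice with the axes swapped.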