# Knowledge Transfer Session - Data Visualisation with Plotly


---

The tutorial is divided into two parts:

1. In the first part, we will learn how to create basic plots with Plotly, build subplots, use multiple axes, and save our plots to a portable HTML file.

2. In the second part, the attendees will split into 2-3 groups and work on new data, trying to create a visualisation for it. Each group will need to save its figure as an HTML file and send it to me so we can share what they did!

---

# Tutorial Part 1.

---

## 1. Scatter plot

A scatter plot is the simplest plot we have: each observation is drawn as a single point on the x-y plane.


<b> Reference: </b> https://plotly.com/python/line-and-scatter/

---

### 1a) Generate data

We use the numpy package to generate 1000 random integers (from 1 to 99) for each of the x and y axes.

In [77]:
import numpy as np

random_X = np.random.randint(1, 100, 1000)
random_y = np.random.randint(1, 100, 1000)
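
Because the data are random, the plot will look different every time the cell runs. If you want everyone in the room to see the same points, one option is a seeded generator. This is only a sketch: the names `rng`, `seeded_X` and `seeded_y` and the seed value are my own choices, not part of the tutorial data.

```python
import numpy as np

# Assumption: the seed value 42 is arbitrary; any fixed integer works.
rng = np.random.default_rng(seed=42)

# Like np.random.randint, Generator.integers excludes the upper bound by default,
# so both arrays hold 1000 integers between 1 and 99.
seeded_X = rng.integers(1, 100, size=1000)
seeded_y = rng.integers(1, 100, size=1000)
```

Running the cell again with the same seed gives the same two arrays, so the scatter plot stays identical between runs.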

### 1b) Create simple scatter plot

Now we create our first and simplest scatter plot.

The only compulsory component we need to provide to create a plot is the <i> data component </i>. The steps are:

1) create the data component with go.Scatter(), where we specify the data for the x and y axes and the drawing mode (here 'markers')

2) create the figure with go.Figure(), where we provide the components of the plot (such as the <i> data component </i>)

3) show the plot with fig.show()

In [82]:
import plotly.graph_objs as go

data_component = go.Scatter(x=random_X,
                            y=random_y,
                            mode='markers')

fig = go.Figure(data=data_component)

fig.show()

---

### 1c) Improving the scatter plot - add title and axes labels

Another component that we can provide to a Plotly plot is the <i> layout component </i>, created with go.Layout(), which defines how the figure looks. You pass the <i> layout component </i> to go.Figure().

In [257]:
import plotly.graph_objs as go

data_component = go.Scatter(x=random_X,
                            y=random_y,
                            mode='markers')

layout_component = go.Layout(title='My first scatter plot',
                             xaxis_title='random x',
                             yaxis_title='random y')

fig = go.Figure(data=data_component,
                layout=layout_component)

fig.show()

### 1d) Changing the shape, colour and size of markers

You can change how the markers look with the <i> marker </i> parameter, which takes a dictionary. Some of the keys it accepts are:

<b> Colour: </b> You can simply name the colour you want - for example 'red'. Or you can provide a colour code, such as those listed here: https://plotly.com/python/discrete-color/

<b> Size: </b> Marker size in pixels, given as a number - for example 12.

<b> Symbol: </b> Symbol is a part of styling markers. You can find the entire list here: https://plotly.com/python/marker-style/

<b> Opacity: </b> Transparency of the markers (from 0 - invisible, to 1 - fully visible)

<b> Line: </b> Border of the markers (as a dictionary)

In [84]:
import plotly.graph_objs as go

data_component = go.Scatter(x=random_X,
                            y=random_y,
                            mode='markers',
                            marker=dict(size=12,
                                        color='green',
                                        symbol='hexagon',
                                        opacity=0.5,
                                        line=dict(width=2,
                                                  color='red')))

layout_component = go.Layout(title='My first scatter plot',
                             xaxis_title='random x',
                             yaxis_title='random y')

fig = go.Figure(data=data_component,
                layout=layout_component)

fig.show()

---

## 2. Line plot

A line plot is created in much the same way as a scatter plot: the only difference is what you specify as <i> mode </i>: <i> mode = 'lines' </i>.

<b> Reference: </b> https://plotly.com/python/line-charts/

---

### 2a) Load the data

The data we will visualise in the line charts are daily average temperatures in London from May 2019 to May 2020. I have uploaded them to my GitHub so you can easily load them straight into the notebook with pandas, as below. The data are stored in a pandas DataFrame (df).

In [85]:
import pandas as pd

data_source = 'https://raw.githubusercontent.com/kamiloster/plotly_workshop/main/temperature_london.csv'

df = pd.read_csv(data_source)

df

Unnamed: 0,date,tavg,tmin,tmax
0,23/05/2019,16.9,9.0,24.7
1,24/05/2019,16.9,10.8,21.1
2,25/05/2019,18.6,13.9,23.3
3,26/05/2019,17.2,14.0,20.1
4,27/05/2019,14.2,9.0,18.6
...,...,...,...,...
360,17/05/2020,14.3,7.7,20.2
361,18/05/2020,16.7,10.2,23.9
362,19/05/2020,18.4,11.6,26.0
363,20/05/2020,20.1,13.4,27.2
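
Note that the dates arrive as text in day/month/year order, so Plotly may treat them as plain labels rather than as points on a time axis. A minimal sketch of converting them to real dates with pandas is below; the small `sample` table is just a few rows copied from the output above so that the example runs without a download.

```python
import pandas as pd

# A few rows copied from the table above (hypothetical variable name `sample`).
sample = pd.DataFrame({'date': ['25/05/2019', '23/05/2019', '24/05/2019'],
                       'tavg': [18.6, 16.9, 16.9]})

# The dates are written day/month/year, so we state the format explicitly
# instead of letting pandas guess.
sample['date'] = pd.to_datetime(sample['date'], format='%d/%m/%Y')

# Sorting by date keeps a line plot from zig-zagging if rows are out of order.
sample = sample.sort_values('date')
```

With a datetime column on the x axis, Plotly can place the ticks by month and zoom in on date ranges.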


### 2b) Plot 3 different modes: markers, lines and their combination

We will compare the 3 different modes available: markers (scatter plot), markers+lines (scatter with lines), and lines (line plot). I have added 15 and 30 °C to the temperature passed on the y axis for two of the traces, just so you can see the difference between the modes (otherwise they would lie on top of each other).

In [89]:
import plotly.graph_objs as go 

trace_markers = go.Scatter(x=df['date'],
                           y=df['tavg'],
                           mode='markers',
                           name='markers')

trace_lines = go.Scatter(x=df['date'],
                         y=df['tavg'] + 15,
                         mode='lines',
                         name='lines - added 15 °C')

trace_markers_lines = go.Scatter(x=df['date'],
                                 y=df['tavg'] + 30,
                                 mode='markers+lines',
                                 name='markers+lines - added 30 °C')

data_component = [trace_markers, trace_lines, trace_markers_lines]

layout_component = go.Layout(title='Comparison of different modes: markers, lines and markers+lines',
                             xaxis_title='Date',
                             yaxis_title='Daily average temperature (°C)',
                             hovermode='x')

fig = go.Figure(data=data_component,
                layout=layout_component)

fig.show()

---

## 3. Bar chart

Bar charts come in 3 different types: normal, stacked and nested (Plotly calls the nested type 'grouped').

<b> Reference:</b> https://plotly.com/python/bar-charts/

---

### 3a) Load the data

The data I prepared are for the 2018 Winter Olympics. They summarise how many gold, silver, bronze and total medals each country won.

In [259]:
import pandas as pd

data_source = 'https://github.com/kamiloster/plotly_workshop/raw/main/2018WinterOlympics.csv'

df = pd.read_csv(data_source)

df

Unnamed: 0,Rank,NOC,Gold,Silver,Bronze,Total
0,1,Norway,14,14,11,39
1,2,Germany,14,10,7,31
2,3,Canada,11,8,10,29
3,4,United States,9,8,6,23
4,5,Netherlands,8,6,6,20
5,6,Sweden,7,6,1,14
6,7,Republic of Korea,5,8,4,17
7,8,Switzerland,5,6,4,15
8,9,France,5,4,6,15
9,10,Austria,5,3,6,14


### 3b) Normal bar chart

In [260]:
import plotly.graph_objs as go

data_component = go.Bar(x=df['NOC'],
                        y=df['Total'])

layout_component = go.Layout(title='Medals in 2018 Olympics',
                             xaxis_title='Country',
                             yaxis_title='Total number of medals')

fig = go.Figure(data=data_component,
                layout=layout_component)

fig.show()

### 3c) Nested bar chart

In [261]:
import plotly.graph_objs as go

trace_1 = go.Bar(x=df['NOC'],
                 y=df['Gold'],
                 name='Gold')

trace_2 = go.Bar(x=df['NOC'],
                 y=df['Silver'],
                 name='Silver')

trace_3 = go.Bar(x=df['NOC'],
                 y=df['Bronze'],
                 name='Bronze')

data_component = [trace_1, trace_2, trace_3]

layout_component = go.Layout(title='Medals in 2018 Olympics',
                             xaxis_title='Country',
                             yaxis_title='Number of medals')

fig = go.Figure(data=data_component,
                layout=layout_component)

fig.show()

### 3d) Stacked bar chart

In [262]:
import plotly.graph_objs as go

trace_1 = go.Bar(x=df['NOC'],
                 y=df['Gold'],
                 name='Gold',
                 marker=dict(color='gold'))

trace_2 = go.Bar(x=df['NOC'],
                 y=df['Silver'],
                 name='Silver',
                 marker=dict(color='silver'))

trace_3 = go.Bar(x=df['NOC'],
                 y=df['Bronze'],
                 name='Bronze',
                 marker=dict(color='brown'))

data_component = [trace_1, trace_2, trace_3]

layout_component = go.Layout(title='Medals in 2018 Olympics',
                             xaxis_title='Country',
                             yaxis_title='Number of medals',
                             barmode='stack')


fig = go.Figure(data=data_component, layout=layout_component)

fig.show()

---

## 4. Bubble plots

Bubble plots are useful for showing more than two dimensions on an x-y plot: extra variables are encoded in the colour and size of the markers, which creates the bubble-like look.

<b> Reference: </b> https://plotly.com/python/bubble-charts/

---

### 4a) Load the data

Cars come with different parameters - number of cylinders, horsepower, weight, etc. This dataset summarises these parameters for all sorts of cars.

In [98]:
import pandas as pd

data_source = 'https://github.com/kamiloster/plotly_workshop/raw/main/mpg.csv'

df = pd.read_csv(data_source)

df

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,name
0,18.0,8,307.0,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140,3449,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
393,27.0,4,140.0,86,2790,15.6,82,1,ford mustang gl
394,44.0,4,97.0,52,2130,24.6,82,2,vw pickup
395,32.0,4,135.0,84,2295,11.6,82,1,dodge rampage
396,28.0,4,120.0,79,2625,18.6,82,1,ford ranger


### 4b) Create the bubble plot

In [100]:
import plotly.graph_objs as go

data_component = go.Scatter(x=df['horsepower'],
                            y=df['mpg'],
                            text=df['name'],
                            mode='markers',
                            marker=dict(size=2 * df['cylinders'],
                                        color=df['weight']))

layout_component = go.Layout(title='Bubble chart',
                             xaxis_title='Horse power',
                             yaxis_title='mpg')

fig = go.Figure(data=data_component,
                layout=layout_component)

fig.show()

## 5. Box plots

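Box plots are very important in statistical analysis. They show you how the data are distributed: the box spans the lower and upper quartiles with the median marked inside it, the whiskers mark the upper/lower limits of the typical values, and points beyond the whiskers are drawn as outliers. The mean and standard deviation are not drawn by default, but go.Box() can add them with its boxmean parameter.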

<b> Reference: </b> https://plotly.com/python/box-plots/
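
To make the parts of a box plot concrete, here is a minimal sketch that computes them by hand with numpy. It assumes the common rule that the whiskers stop at 1.5 times the interquartile range; the numbers in `values` are made up, and Plotly's own quartile calculation may differ slightly from numpy's default interpolation.

```python
import numpy as np

# Made-up sample with one obvious outlier.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# The box: lower quartile, median and upper quartile.
q1, median, q3 = np.percentile(values, [25, 50, 75])

# The interquartile range is the height of the box.
iqr = q3 - q1

# The whiskers reach the most extreme points that are still inside these fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is drawn as an individual outlier point.
outliers = values[(values < lower_fence) | (values > upper_fence)]
```

Here only the value 100 falls outside the fences, so it is the only point that would be drawn on its own.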

---

### 5a) Load the data

The abalone dataset is very popular in machine learning. Abalone is a type of shellfish whose age is often determined by counting the rings on the shell. Machine learning has been used to correlate this age with other properties: length, diameter, height, and others. It is important to visualise the statistics of these properties.

In [264]:
import pandas as pd

data_source = 'https://github.com/kamiloster/plotly_workshop/raw/main/abalone.csv'

df = pd.read_csv(data_source)

df

Unnamed: 0,sex,length,diameter,height,whole_weight,shucked_weight,viscera_weight,shell_weight,rings
0,M,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.1500,15
1,M,0.350,0.265,0.090,0.2255,0.0995,0.0485,0.0700,7
2,F,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.2100,9
3,M,0.440,0.365,0.125,0.5160,0.2155,0.1140,0.1550,10
4,I,0.330,0.255,0.080,0.2050,0.0895,0.0395,0.0550,7
...,...,...,...,...,...,...,...,...,...
4172,F,0.565,0.450,0.165,0.8870,0.3700,0.2390,0.2490,11
4173,M,0.590,0.440,0.135,0.9660,0.4390,0.2145,0.2605,10
4174,M,0.600,0.475,0.205,1.1760,0.5255,0.2875,0.3080,9
4175,F,0.625,0.485,0.150,1.0945,0.5310,0.2610,0.2960,10


### 5b) Other ways to create plotly plots

So far we have passed the data and layout components straight into go.Figure() (method 1). You can also add traces to an already created figure with fig.add_trace(), and then update the layout with fig.update_layout() (method 2):

1) Create figure fig = go.Figure()

2) Add all the plots you want with fig.add_trace()

3) Update the layout with fig.update_layout()

4) Show the figure fig.show()

In [265]:
import plotly.graph_objs as go

fig = go.Figure()

fig.add_trace(go.Box(y=df['length'],
                     name='Length'))

fig.add_trace(go.Box(y=df['diameter'],
                     name='Diameter'))

fig.add_trace(go.Box(y=df['height'],
                     name='Height'))

fig.add_trace(go.Box(y=df['whole_weight'],
                     name='Whole weight'))

fig.update_layout(title='Box plots for basic properties of abalone shellfish',
                  xaxis_title='Property',
                  yaxis_title='Property value')

fig.show()

---

## 6. Histogram

<b> Reference: </b> https://plotly.com/python/histograms/

---

## Exercise 1.

Using the abalone dataset and the same columns as in the box plot, create a histogram. The reference link is given above. You can use either way of creating Plotly charts: method 1, passing the data and layout components to go.Figure(), or method 2, adding traces with fig.add_trace() (I encourage you to use method 2).

<b> Hint: </b> In go.Box() we pass the data as y. Check the reference page to find out how the data are passed to go.Histogram().

---

## Exercise 1. - SOLUTION

In [266]:
import plotly.graph_objs as go

fig = go.Figure()

fig.add_trace(go.Histogram(x=df['length'],
                           name='Length'))

fig.add_trace(go.Histogram(x=df['diameter'],
                           name='Diameter'))

fig.add_trace(go.Histogram(x=df['height'],
                           name='Height'))

fig.add_trace(go.Histogram(x=df['whole_weight'],
                           name='Whole weight'))

fig.update_layout(title='Histogram for basic properties of abalone shellfish',
                  xaxis_title='Bin',
                  yaxis_title='Property count')

fig.show()

The same histograms again, this time with the bars stacked on top of each other (barmode='stack'):

In [267]:
import plotly.graph_objs as go

fig = go.Figure()

fig.add_trace(go.Histogram(x=df['length'],
                           name='Length'))

fig.add_trace(go.Histogram(x=df['diameter'],
                           name='Diameter'))

fig.add_trace(go.Histogram(x=df['height'],
                           name='Height'))

fig.add_trace(go.Histogram(x=df['whole_weight'],
                           name='Whole weight'))

fig.update_layout(title='Histogram for basic properties of abalone shellfish',
                  xaxis_title='Bin',
                  yaxis_title='Property count',
                  barmode='stack')

fig.show()

## 7. Heat maps

Heat maps are very useful for visualising how a value z changes across two variables x and y - for example, the Pearson correlation coefficient between pairs of variables.

<b> Reference: </b> https://plotly.com/python/heatmaps/

---

### 7a) Load the data

The next dataset we are looking at is the hourly average temperature in Santa Barbara, California.

In [108]:
import pandas as pd

data_source = 'https://github.com/kamiloster/plotly_workshop/raw/main/2010SantaBarbaraCA.csv'

df = pd.read_csv(data_source)

df

Unnamed: 0,LST_DATE,DAY,LST_TIME,T_HR_AVG
0,20100601,TUESDAY,0:00,12.7
1,20100601,TUESDAY,1:00,12.7
2,20100601,TUESDAY,2:00,12.3
3,20100601,TUESDAY,3:00,12.5
4,20100601,TUESDAY,4:00,12.7
...,...,...,...,...
163,20100607,MONDAY,19:00,15.6
164,20100607,MONDAY,20:00,14.8
165,20100607,MONDAY,21:00,14.3
166,20100607,MONDAY,22:00,14.4


In [109]:
import plotly.graph_objs as go

fig = go.Figure()

fig.add_trace(go.Heatmap(x=df['DAY'],
                         y=df['LST_TIME'],
                         z=df['T_HR_AVG']))

fig.update_layout(title='Hourly average across the week in Santa Barbara (California)',
                  xaxis_title='Day of the week',
                  yaxis_title='Hour of the day')

fig.show()

## 8. Shared axes

Sometimes, when we plot variables whose values differ greatly in size, it makes more sense to create two y axes that share one x axis (or two x axes that share one y axis).

<b> Reference: </b> https://plotly.com/python/multiple-axes/

---

In [218]:
import pandas as pd

data_source = 'https://raw.githubusercontent.com/kamiloster/plotly_workshop/main/temperature_flow_rate.csv'

df = pd.read_csv(data_source)

df

Unnamed: 0,Date,Temperature,Flow rate
0,24/07/2016,565.930603,17.128586
1,25/07/2016,568.573181,17.151127
2,25/07/2016,567.713318,17.263832
3,26/07/2016,567.783081,17.209528
4,26/07/2016,568.730286,17.241804
...,...,...,...
994,18/03/2017,711.218689,9.588627
995,18/03/2017,711.049438,15.312500
996,18/03/2017,689.779785,18.983606
997,19/03/2017,686.987915,9.658299


In [230]:
import plotly.graph_objs as go
from plotly.subplots import make_subplots

fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(go.Scatter(x=df['Date'],
                         y=df['Temperature'],
                         name='Temperature'),
              secondary_y=False)

fig.add_trace(go.Scatter(x=df['Date'],
                         y=df['Flow rate'],
                         name='Flow rate'),
              secondary_y=True)

fig.update_layout(title_text="Plot with two axes: temperature and flow rate")
fig.update_xaxes(title_text="Date")
fig.update_yaxes(title_text="Temperature", secondary_y=False)
fig.update_yaxes(title_text="Flow rate", secondary_y=True)

fig.show()

## 9. Creating subplots

Subplots are very useful when we want to compare different types of plots side by side in one figure.

<b> Reference: </b> https://plotly.com/python/subplots/

---

### 9a) Load the data and get correlation map with df.corr()

In [244]:
import pandas as pd

data_source = 'https://raw.githubusercontent.com/kamiloster/plotly_workshop/main/temperature_london.csv'

df = pd.read_csv(data_source)

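# The date column is text, so we keep only the numeric columns;
# recent pandas versions raise an error if text columns are passed to corr().
correlation_map = df.select_dtypes(include='number').corr()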

df

Unnamed: 0,date,tavg,tmin,tmax
0,23/05/2019,16.9,9.0,24.7
1,24/05/2019,16.9,10.8,21.1
2,25/05/2019,18.6,13.9,23.3
3,26/05/2019,17.2,14.0,20.1
4,27/05/2019,14.2,9.0,18.6
...,...,...,...,...
360,17/05/2020,14.3,7.7,20.2
361,18/05/2020,16.7,10.2,23.9
362,19/05/2020,18.4,11.6,26.0
363,20/05/2020,20.1,13.4,27.2
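
If you have not used a correlation map before, the sketch below shows what it looks like for the first five rows of the table above. The names `sample` and `corr` are my own; the result is a small square table with one row and one column per numeric variable.

```python
import pandas as pd

# The first five rows of the London temperature table.
sample = pd.DataFrame({'date': ['23/05/2019', '24/05/2019', '25/05/2019',
                                '26/05/2019', '27/05/2019'],
                       'tavg': [16.9, 16.9, 18.6, 17.2, 14.2],
                       'tmin': [9.0, 10.8, 13.9, 14.0, 9.0],
                       'tmax': [24.7, 21.1, 23.3, 20.1, 18.6]})

# Pearson correlation between every pair of numeric columns.
corr = sample[['tavg', 'tmin', 'tmax']].corr()
```

Each cell holds a value between -1 and 1, the diagonal is always 1 (every variable correlates perfectly with itself), and the table is symmetric, which is why it suits a heat map so well.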


### 9b) Create subplot

In [256]:
import plotly.graph_objs as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=2,
                    cols=2,
                    vertical_spacing=0.2,
                    subplot_titles=('Line plot',
                                    'Heatmap',
                                    'Histogram',
                                    'Box plot'))

# Line plots - 1 x 1
fig.add_trace(go.Scatter(x=df['date'],
                         y=df['tmin'],
                         name='Minimum temperature',
                         mode='lines',
                         marker=dict(color='#B6E880')),
              row=1,
              col=1)

fig.add_trace(go.Scatter(x=df['date'],
                         y=df['tmax'],
                         name='Maximum temperature',
                         mode='lines',
                         marker=dict(color='#17BECF')),
              row=1,
              col=1)

fig.add_trace(go.Scatter(x=df['date'],
                         y=df['tavg'],
                         name='Average temperature',
                         mode='lines',
                         marker=dict(color='black')),
              row=1,
              col=1)

# Heatmap - 1 x 2
fig.add_trace(go.Heatmap(z=correlation_map,
                         x=df.columns[1:],
                         y=df.columns[1:],
                         showscale=False),
              row=1,
              col=2)


# Histograms - 2 x 1
fig.add_trace(go.Histogram(x=df['tavg'],
                           name='Average temperature',
                           marker=dict(color='black'),
                           showlegend=False),
              row=2,
              col=1)

fig.add_trace(go.Histogram(x=df['tmin'],
                           name='Minimum temperature',
                           marker=dict(color='#B6E880'),
                           showlegend=False),
              row=2,
              col=1)

fig.add_trace(go.Histogram(x=df['tmax'],
                           name='Maximum temperature',
                           marker=dict(color='#17BECF'),
                           showlegend=False),
              row=2,
              col=1)

# Box plots - 2 x 2
fig.add_trace(go.Box(y=df['tavg'],
                     name='Average temperature',
                     marker=dict(color='black'),
                     showlegend=False,
                     boxpoints='all'),
              row=2,
              col=2)

fig.add_trace(go.Box(y=df['tmin'],
                     name='Minimum temperature',
                     marker=dict(color='#B6E880'),
                     showlegend=False,
                     boxpoints='all'),
              row=2,
              col=2)

fig.add_trace(go.Box(y=df['tmax'],
                     name='Maximum temperature',
                     marker=dict(color='#17BECF'),
                     showlegend=False,
                     boxpoints='all'),
              row=2,
              col=2)

fig.update_layout(legend=dict(y=1.3, x=0),
                  barmode='stack')

fig.update_yaxes(title_text='Temperature (°C)',
                 row=1,
                 col=1)

fig.update_xaxes(title_text='Date',
                 showticklabels=False,
                 row=1,
                 col=1)

fig.update_yaxes(title_text='Count',
                 row=2,
                 col=1)

fig.update_xaxes(title_text='Bin',
                 row=2,
                 col=1)

fig.update_yaxes(title_text='Temperature (°C)',
                 row=2,
                 col=2)

# Saves the figure next to the notebook; change the path to save it elsewhere.
fig.write_html('figure.html')

fig.show()