# Data Visualisation with Python

This notebook is an introduction to the "Data Visualization with Python" training course.

## Why Data Visualization?
Data visualization (DataViz) is an essential tool for exploring data and finding insights in it. Before jumping to complex machine learning or multivariate models, one should always take a first look at the data through simple visualization techniques. Visualization provides a unique perspective on the dataset and can reveal potential challenges or specificities in your data that should be taken into account in later, more in-depth analysis.

## Objectives of the course
The goal of this session is to discover how to make 1D, 2D, 3D and eventually multidimensional data visualization with Python. 

We will explore four different libraries:

* Matplotlib (very similar to Matlab's syntax): classic Python library for data visualization.
* Pandas: its main purpose is to handle data frames. It also provides basic visualization modules.
* Seaborn: it provides a high-level interface to draw statistical graphics.
* Plotly: interactive graphing library


# Part 2 - Introduction to Plotly

[Plotly](https://plotly.com/python/) is an open source interactive graphing library written in JavaScript (mainly d3.js) with Python and R implementations.

We will explore two parts of the library:
* plotly.express, a good starting point
* plotly.graph_objects, a more advanced use of plotly

The Plotly project also offers the possibility to create **interactive dashboards** without knowing frontend languages such as HTML, CSS or JavaScript. [Dash](https://dash.plotly.com/introduction) is a Python framework for building web applications, like R-Shiny. It is based on the core plotly functionalities and the web framework [Flask](https://flask.palletsprojects.com/en/2.0.x/).

Resources:
- [Official documentation](https://plotly.com/python/plotly-fundamentals/)
- [Video tutorial](https://www.youtube.com/watch?v=GGL6U0k8WYA)
- [Visualization with Plotly.Express: Comprehensive guide](https://towardsdatascience.com/visualization-with-plotly-express-comprehensive-guide-eb5ee4b50b57)
- [Plotly and Dash](https://medium.com/analytics-vidhya/interactive-visualization-with-plotly-and-dash-f3f840b786fa)

## **Plotly express**

The [plotly.express](https://plotly.com/python/plotly-express/) module contains functions that can create entire figures **at once**. Plotly Express is the recommended starting point for creating most common figures, for instance:

* Basics: scatter, line, area, bar
* Part-of-Whole: pie, sunburst, treemap
* 1D Distributions: histogram, boxplot, violin
* 2D Distributions: density_heatmap, density_contour
* Matrix Input: imshow

In the following code, we import the `express` module of the `plotly` library under the alias px.

In [None]:
# Import the necessary package
import plotly.express as px
from pandas import DatetimeIndex

In [None]:
# Load some useful datasets
iris = px.data.iris()

from vega_datasets import data
seattle_weather = data.seattle_weather()

# Extract the month and the year from the date to create a new variable
seattle_weather['month'] = DatetimeIndex(seattle_weather['date']).month_name()
seattle_weather['year'] = DatetimeIndex(seattle_weather['date']).year

### **Scatter plot**

The function `scatter()` plots data points. It is equivalent to `scatter()` in `matplotlib` and `plot()` in R.

In [None]:
# Arguments: a dataset (pandas DataFrame, numpy array or dict) and the labels of the features we want to visualize (x and y)
fig = px.scatter(data_frame=iris, x="sepal_width", y="sepal_length")
fig.show()

Set the arguments `color`, `size` and `hover_data` to add more information to the figure. Each value is the name of a column in the dataset (or a list of column names for `hover_data`).

In [None]:
# Scatter plot with more arguments
fig = px.scatter(iris, x="sepal_width", y="sepal_length", color="species",
                 size="petal_length", hover_data=["petal_width"])
fig.show()

*Exercises: for the two previous figures*
1.   *Modify the axis labels and the title*
2.   *Move the legend to the top left of the figure*




The function `scatter_matrix` is equivalent to `pairplot` in seaborn and `pairs` in R.

In [None]:
fig = px.scatter_matrix(iris, dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"], color="species", symbol="species")
fig.update_traces(diagonal_visible=False)
fig.show()

### **Histogram and barplot**

*   The function `histogram` is equivalent to `hist` in matplotlib and R
*   The function `bar` is equivalent to `bar` in matplotlib and `barplot` in R



In [None]:
# Arguments: a dataset (pandas DataFrame, numpy array or dict) and the label of the feature we want to visualize (x)
fig = px.histogram(data_frame=seattle_weather, x="temp_max")
fig.show()

In [None]:
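# Select the data to be visualized: monthly means for the year 2012
# numeric_only=True skips non-numeric columns such as 'weather',
# which recent pandas versions refuse to average
gb = seattle_weather.groupby(["year", "month"])
df = gb.mean(numeric_only=True).loc[2012].reset_index()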

# Bar plot
# Arguments: a dataset (pandas DataFrame, numpy array or dict) and the labels of the features we want to visualize (x and y)
fig = px.bar(data_frame=df, x="month", y="temp_max")
fig.show()

*Exercises:*
1. *Re-order the categories in the x-axis using the argument `category_orders`*
2. *Rotate the x-axis tick labels by 45 degrees*

### **Heatmaps and images**

The function `imshow` is equivalent to `imshow` in matplotlib and `heatmap` in seaborn.

In [None]:
# Compute the correlation matrix of the iris dataset
# numeric_only=True (pandas >= 1.5) ignores the text column 'species'
correlations = iris.corr(numeric_only=True)

# Plot the corresponding image
# Argument: an array-like image
fig = px.imshow(img=correlations)
fig.show()

*Exercises:*

1. *Change the color map using the argument `color_continuous_scale`*
2. *(Hard) Replace the numerical labels of the axis by the true feature names*

The function `imshow` can also be used to display photos.

In [None]:
# Load data
from skimage import data
img = data.camera()

# Plot the image
fig = px.imshow(img, color_continuous_scale='gray')
fig.update_layout(coloraxis_showscale=False)
fig.update_xaxes(showticklabels=False)
fig.update_yaxes(showticklabels=False)
fig.show()

### **Lines**

The function `line` plots line charts.

In [None]:
# Arguments: a dataset (pandas DataFrame, numpy array or dict) and the labels of the features we want to visualize (x and y)
fig = px.line(data_frame=seattle_weather, x='date', y='temp_max')
fig.show()

In [None]:
# Using add_scatter function to add more lines
fig = px.line(data_frame=seattle_weather, x="date", y="temp_max")
fig.add_scatter(x=seattle_weather["date"], y=seattle_weather["temp_min"], mode="lines")
fig.show()

*Exercises: Change the style of the previous chart*
  * *use dashed and/or dotted lines*
  * *modify the width of the lines*
  * *(modify the opacity of the lines)*



### **Boxplot**

The function `box` is equivalent to the function `boxplot` in R.

In [None]:
fig = px.box(seattle_weather, x="month", y="temp_max")
fig.show()

In [None]:
fig = px.box(seattle_weather, x="month", y="temp_max", color="year")
fig.show()

## **Graph objects**

### What are Graph objects

The module `plotly.graph_objects` contains Python classes that represent parts of a figure. As in `matplotlib`, figures are central in plotly.

Broadly speaking, a figure is created by instantiating the class `plotly.graph_objects.Figure`. Instances of this class have many convenience methods for manipulating their attributes (e.g. `.update_layout()` or `.add_trace()`), rendering them (e.g. `.show()`) and exporting them to various formats (e.g. `.to_json()`, `.write_image()` or `.write_html()`). For instance:

```
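import plotly.graph_objects as go

# Create an empty figure object
fig = go.Figure()

# Add a go.Bar trace using the add_trace method (illustrative data)
fig.add_trace(go.Bar(x=["a", "b", "c"], y=[1, 3, 2]))

# Write to HTML (the file name is an example)
fig.write_html("figure.html")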
```

Another way to use graph objects is to define the elements separately and then pass them to the figure container:

```
import plotly.graph_objects as go

# Create a go.Bar object (illustrative data)
bar = go.Bar(x=["a", "b", "c"], y=[1, 3, 2])

# Create a layout object, changing the size or anything else
layout = go.Layout(height=600, width=800)

# Add the two objects to the final figure
fig = go.Figure(data=[bar], layout=layout)

# Write to HTML (the file name is an example)
fig.write_html("figure.html")
```

There are **two main reasons** to use graph objects:
* Certain kinds of figures are not yet possible to create with Plotly Express
* It can be easier to start from an empty `plotly.graph_objects.Figure` object and progressively **add traces** and **update attributes**.


**Official documentation:** [Graph objects in Python](https://plotly.com/python/graph-objects/)





In [None]:
# Import graph_object under the alias go
import plotly.graph_objects as go

### **Comparison with Plotly Express**

The functions in Plotly Express are all built on top of graph objects, and all return instances of `plotly.graph_objects.Figure`.

The official documentation recommends `plotly.express` for the sake of simplicity, but the same plots can be obtained with `plotly.graph_objects` at the cost of more lines of code (see the comparison below).

Note that every plotly documentation page lists the Plotly Express option at the top if a Plotly Express function exists to make the kind of chart in question, and then the graph objects version below.

In [None]:
# First generate data
import pandas as pd
df = pd.DataFrame({
  "Fruit": ["Apples", "Oranges", "Bananas", "Apples", "Oranges", "Bananas"],
  "Contestant": ["Alex", "Alex", "Alex", "Jordan", "Jordan", "Jordan"],
  "Number Eaten": [2, 1, 3, 1, 3, 2],
})

# With plotly.express
import plotly.express as px
fig = px.bar(df, x="Fruit", y="Number Eaten", color="Contestant", barmode="group")
fig.show()

# With graph_object
fig = go.Figure()
for contestant, group in df.groupby("Contestant"):
    fig.add_trace(go.Bar(x=group["Fruit"], 
                         y=group["Number Eaten"],
                         name=contestant,
                         hovertemplate="Contestant=%s<br>Fruit=%%{x}<br>Number Eaten=%%{y}<extra></extra>"% contestant))
fig.show()

Mixing a bar chart and a scatter plot:

In [None]:
bar = go.Bar(x=["a", "b", "c"], y=[1, 3, 2])
scatter = go.Scatter(x=["a", "b", "c"], y=[1, 3, 2])
layout = go.Layout(height=600, width=800)
fig = go.Figure(data=[bar, scatter], layout=layout)
fig.show()

*Exercise: add axis labels using the methods `update_xaxes` and `update_yaxes`, and modify the legend labels*

### **Histogram and bar chart**

In [None]:
# Generate Gaussian data
import numpy as np
x0 = np.random.randn(500)
x1 = np.random.randn(500) + 1

# Create a figure
fig = go.Figure()

# Add the first histogram to the figure
fig.add_trace(go.Histogram(
    x=x0,
    histnorm='percent',
    name='control', # name used in legend and hover labels
    xbins=dict( # bins used for histogram
        start=-4.0,
        end=3.0,
        size=0.5
    ),
    marker_color='#EB89B5',
    opacity=0.75
))

# Add the second histogram to the figure
fig.add_trace(go.Histogram(
    x=x1,
    histnorm='percent',
    name='experimental',
    xbins=dict(
        start=-3.0,
        end=4,
        size=0.5
    ),
    marker_color='#330C73',
    opacity=0.75
))

# Update the layout
fig.update_layout(
    title_text='Sampled Results', # title of plot
    xaxis_title_text='Value', # xaxis label
    yaxis_title_text='Percent', # yaxis label (histnorm='percent' is used above)
    bargap=0.2, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)

fig.show()

In [None]:
# Generate data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
values1 = [20, 14, 25, 16, 18, 22, 19, 15, 12, 16, 14, 17]
values2 = [19, 14, 22, 14, 16, 19, 15, 14, 10, 12, 12, 16]

fig = go.Figure()

fig.add_trace(go.Bar(
    x=months,
    y=values1,
    name='Primary Product',
    marker_color='indianred'
))

fig.add_trace(go.Bar(
    x=months,
    y=values2,
    name='Secondary Product',
    marker_color='lightsalmon'
))

# Here we modify the tickangle of the xaxis, resulting in rotated labels.
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()

*Exercise: generate the two previous figures with the following pattern*

```
chart1 = ...
chart2 = ...
layout = ...
fig = go.Figure(data=[chart1, chart2], layout=layout)
```

### **Line**

In [None]:
# Create data
month = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
         'August', 'September', 'October', 'November', 'December']
high_2000 = [32.5, 37.6, 49.9, 53.0, 69.1, 75.4, 76.5, 76.6, 70.7, 60.6, 45.1, 29.3]
low_2000 = [13.8, 22.3, 32.5, 37.2, 49.9, 56.1, 57.7, 58.3, 51.2, 42.8, 31.6, 15.9]
high_2007 = [36.5, 26.6, 43.6, 52.3, 71.5, 81.4, 80.5, 82.2, 76.0, 67.3, 46.1, 35.0]
low_2007 = [23.6, 14.0, 27.0, 36.8, 47.6, 57.7, 58.9, 61.2, 53.3, 48.5, 31.0, 23.6]
high_2014 = [28.8, 28.5, 37.0, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9]
low_2014 = [12.7, 14.3, 18.6, 35.5, 49.9, 58.0, 60.0, 58.6, 51.7, 45.2, 32.2, 29.1]

# Create the figure object
fig = go.Figure()

# Create and style traces
fig.add_trace(go.Scatter(x=month, y=high_2014, name='High 2014',
                         line=dict(color='firebrick', width=4)))
fig.add_trace(go.Scatter(x=month, y=low_2014, name = 'Low 2014',
                         line=dict(color='royalblue', width=4)))
fig.add_trace(go.Scatter(x=month, y=high_2007, name='High 2007',
                         line=dict(color='firebrick', width=4, dash='dash') # dash options include 'dash', 'dot', and 'dashdot'
))
fig.add_trace(go.Scatter(x=month, y=low_2007, name='Low 2007',
                         line = dict(color='royalblue', width=4, dash='dash')))
fig.add_trace(go.Scatter(x=month, y=high_2000, name='High 2000',
                         line = dict(color='firebrick', width=4, dash='dot')))
fig.add_trace(go.Scatter(x=month, y=low_2000, name='Low 2000',
                         line=dict(color='royalblue', width=4, dash='dot')))

# Edit the layout
fig.update_layout(title='Average High and Low Temperatures in New York',
                  xaxis_title='Month',
                  yaxis_title='Temperature (degrees F)')


fig.show()

### **Heatmap**

In [None]:
# Correlation matrix (numeric_only=True ignores the text column 'species')
correlations = iris.corr(numeric_only=True)

# Heatmap
heatmap = go.Heatmap(z=correlations, x=correlations.index, y=correlations.index)
fig = go.Figure(data=heatmap)
fig.show()

## **Going further with the Dash library**

Dash is a Python framework for interactive dashboards. It creates a web application in which the figures are displayed (see [here](https://dash.plotly.com/layout)).

There are two ways to run such an application:
1. Implement the features in a program called `app.py` and run it using `python3 app.py` in a terminal
2. Use Jupyter Notebook

Note: the dashboard has to be run locally since it is accessed through a web browser. It is possible to use Google Colab, but it is not straightforward, and we recommend using one of the two ways above.





**Example of a very simple dashboard:**

```
# Dash >= 2.0: dcc and html are part of the dash package
# (they replace dash_core_components and dash_html_components)
from dash import Dash, dcc, html
import plotly.graph_objects as go

fig = go.Figure() # or any Plotly Express function e.g. px.bar(...)
# fig.add_trace( ... )
# fig.update_layout( ... )

# Instantiate the class Dash
app = Dash(__name__)

# Add a layout
app.layout = html.Div([
    dcc.Graph(figure=fig)
])

# Run the server (older Dash versions use app.run_server instead of app.run)
app.run(debug=True, use_reloader=False)  # Turn off reloader if inside Jupyter
```

Running the code above outputs something like:

```
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "test_dash" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
```

Then go to `http://127.0.0.1:8050/`.



**Documentation:**
* Tutorial: [https://dash.plotly.com/](https://dash.plotly.com/)
* Examples of dashboards: [Dash app gallery](https://dash-gallery.plotly.host/Portal/)
* Github repository: [dash-sample-apps](https://github.com/plotly/dash-sample-apps)

*Exercises:*
* *Read the official tutorial*
* *Create a simple dashboard with two or three figures one below the other*