# Codeschool - Introduction to plotting with Plotly in Python

GitHub doesn't render Plotly plots natively - but you can view this notebook with the interactive plots through nbviewer: https://nbviewer.org/

### What is Plotly?

Plotly is an interactive, open-source plotting library with over 40 unique chart types. Plotly itself is a JavaScript library; the Python wrapper is what we'll be talking about here.

### Why is Plotly useful?

* Quick and easy to make graphs (low code/effort)
* Very customisable
* Can create interactive plots, which is useful when:
    * You want to be able to turn certain data series on and off
    * You want to be able to zoom into certain parts
    * You want to be able to identify specific data points and their values by hovering over them
    * You want to be able to save plots either as they are or with specific data series turned off/zoomed to show certain parts
    * You want to make pretty plots that look fancy to impress people

### Creating figures

* With `plotly.express` for simple, quick plots (`px`)
* With `plotly.graph_objects` for more customisation (`go`)
* With `plotly.figure_factory` (more advanced)
* With plotly and Dash (e.g. for a dynamic dashboard)

### Format of a Plotly figure

3 main components:
1. Layout: Dictionary which controls the style of the figure (one per figure)
2. Data: List of dictionaries which sets the graph type and holds the data itself
    - Data + type = a trace, can be multiple per plot
3. Frames: relevant for animated plots

In [None]:
# Set renderer so plotly plots show on nbviewer
import plotly.io as pio
pio.renderers.default = "notebook_connected"

In [63]:
## PENGUIN PICTURE

In [13]:
# We're going to use the palmerpenguins dataset. More info: https://www.kaggle.com/datasets/parulpandey/palmer-archipelago-antarctica-penguin-data
# Can install this with

# pip install palmerpenguins

from palmerpenguins import load_penguins

In [64]:
# Take a look at what the penguins dataset looks like
penguins = load_penguins()
penguins

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
3,Adelie,Torgersen,,,,,,2007
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
...,...,...,...,...,...,...,...,...
339,Chinstrap,Dream,55.8,19.8,207.0,4000.0,male,2009
340,Chinstrap,Dream,43.5,18.1,202.0,3400.0,female,2009
341,Chinstrap,Dream,49.6,18.2,193.0,3775.0,male,2009
342,Chinstrap,Dream,50.8,19.0,210.0,4100.0,male,2009


8 columns:
* *species*: penguin species (Chinstrap, Adélie, or Gentoo)
* *island*: island name (Dream, Torgersen, or Biscoe) in the Palmer Archipelago (Antarctica)
* *bill_length_mm*: bill (culmen) length (mm)
* *bill_depth_mm*: bill (culmen) depth (mm)
* *flipper_length_mm*: flipper length (mm)
* *body_mass_g*: body mass (g)
* *sex*: penguin sex
* *year*: year the penguin was recorded


### plotly.express

`plotly.express` contains functions that create entire figures at once, and is the recommended way to create most common figures. These figures can also be created using `plotly.graph_objects`, but doing so requires 5-100x more code.

In [9]:
# Import Plotly express
import plotly.express as px

# Import pandas for dataframes
import pandas as pd

### Univariate plots

Univariate plots display data on one variable, e.g. bar charts, histograms or box and whisker plots.

In [75]:
px.histogram(
    penguins,
    x='species',
    color='sex',
    barmode='group'
)

In [70]:
bill_length = px.box(
    penguins,
    x='species',
    y='bill_length_mm',
    color='species'
)

bill_length.show()

In [71]:
# Update the main title and axes titles, remove the legend
bill_length.update_layout(
    title={
        'text': 'Bill length distribution across species',
        'xanchor': 'center',
        'x': 0.5
    },
    xaxis_title="Species",
    yaxis_title="Bill length (mm)",
    showlegend=False
)

### Scatter plot

In [26]:
scatter_plot = px.scatter(
    penguins,
    x='flipper_length_mm',
    y='body_mass_g',
    color='sex'
)

scatter_plot.show()

### plotly.graph_objects

In [None]:

# Import Plotly graph objects
import plotly.graph_objects as go

### Example - TAT audit

In [None]:
# Read in a CSV of the audit information on each run
audit_df = pd.read_csv('run_info_2023-06-16_2023-07-07.csv')

In [None]:
# Take a look at the data in the dataframe
audit_df

In [None]:
# Create new figure
fig = go.Figure()

In [None]:
# Add a trace for the time between data upload and the first job being run
fig.add_trace(
    go.Bar(
        x=audit_df["run_name"],
        y=audit_df["upload_to_first_job"],
        name="Upload to processing start",
        legendrank=4
    )
)
fig.show()

In [None]:
# Add trace for the time between the first and last job (time spent running the pipeline)
fig.add_trace(
    go.Bar(
        x=audit_df["run_name"],
        y=audit_df["processing_time"],
        name="Pipeline running",
        legendrank=3
    )
)
fig.show()

In [None]:

# Add trace for the time between processing ending and us releasing the data
# The text label on this trace shows the total upload-to-release time (to 1 dp)
fig.add_trace(
    go.Bar(
        x=audit_df["run_name"],
        y=audit_df["processing_end_to_release"],
        name="Processing end to all samples released",
        legendrank=2,
        text=round(audit_df['upload_to_release'], 1)
    )
)
fig.show()

In [None]:
# Update bars to be stacked
# barmode='relative' is used (instead of barmode='stack') because the data
# may contain negative values, e.g. if timestamps are incorrect.
# With 'stack', negative bars overlap other bars instead of sticking out
# from the negative side of the chart
fig.update_layout(barmode='relative')
fig.show()

In [None]:
# Add a line to show the upper limit of the audit standard
fig.add_hline(y=4, line_dash="dash")
fig.show()

In [None]:
# Change angle of X labels and sort the runs alphanumerically by name
# (this gives date ascending order when run names begin with the date)
fig.update_xaxes(tickangle=45, categoryorder='category ascending')
fig.show()

In [None]:
# Change the format of the hover labels to add the run name, stage name
# and number of days (to 2 dp)
# Change the position of the text to be outside the bar
fig.update_traces(
    hovertemplate=(
        '<br><b>Run</b>: %{x}<br>'
        '<b>Stage</b>: %{data.name}<br>'
        '<b>Days</b>: %{y:.2f}<br>'
        '<extra></extra>'
    ),
    textposition='outside'
)
fig.show()

In [None]:
# Change title text, position and size
# Change x and y axes text
# Change size of plot
# Change font to Helvetica
# Reverse the order of the legend
fig.update_layout(
    title={
        'text': "Audit Turnaround Times",
        'xanchor': 'center',
        'x': 0.5,
        'font_size': 20
    },
    xaxis_title="Run name",
    yaxis_title="Number of days",
    width=1100,
    height=700,
    font_family='Helvetica',
    legend_traceorder="reversed"
)
fig.show()

### Colour scales

Plotly comes with built-in discrete and continuous colour scales. See documentation here: https://plotly.com/python/builtin-colorscales/

### Example - Gait analysis report

### Example - Athena