# Codeschool - Introduction to plotting with Plotly in Python

GitHub doesn't render Plotly plots natively, but you can view this notebook with its interactive plots through nbviewer: https://nbviewer.org/

### What is Plotly?

Plotly is an interactive, open-source plotting library with over 40 chart types. Plotly itself is a JavaScript library; this notebook covers its Python wrapper.

### Why is Plotly useful?

* Quick and easy to make graphs (low code/effort)
* Very customisable
* Can create interactive plots, which is useful when:
    * You want to be able to turn certain data series on and off
    * You want to be able to zoom into certain parts
    * You want to be able to identify specific data points and their values by hovering over them
    * You want to be able to save plots either as they are or with specific data series turned off/zoomed to show certain parts
    * You want to make pretty plots that look fancy to impress people

### Creating figures

* With `plotly.express` for simple, quick plots (`px`)
* With `plotly.graph_objects` for more customisation (`go`)
* With `plotly.figure_factory` (more advanced)
* With plotly and Dash (e.g. for a dynamic dashboard)

### Format of a Plotly figure

3 main components:
1. Layout: a dictionary that controls the style of the figure (one per figure)
2. Data: a list of dictionaries, each of which sets a graph type and holds the data itself
    - Data + type = a trace; a figure can contain multiple traces
3. Frames: only relevant for animated plots

In [10]:
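# Set up Plotly so figures render inside the notebook
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected=True)

# plotly.io provides renderer settings and figure export
import plotly.io as pio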

### plotly.express

In [2]:
# Import Plotly express
import plotly.express as px

# Import pandas for dataframes
import pandas as pd

### plotly.graph_objects

In [3]:

# Import Plotly graph objects
import plotly.graph_objects as go

### Example - TAT audit

In [4]:
# Read in a CSV of the audit information on each run
audit_df = pd.read_csv('run_info_2023-06-16_2023-07-07.csv')

In [5]:
# Take a look at the data in the dataframe
audit_df

Unnamed: 0,assay_type,run_name,upload_time,first_job,processing_finished,jira_resolved,jira_status,upload_to_first_job,processing_time,processing_end_to_release,upload_to_release
0,CEN,230623_A01295_0195_AHFCFVDRX3,2023-06-24 13:47:45,2023-06-24 13:50:51,2023-06-26 14:37:14,2023-06-26 18:26:06,All samples released,0.002,2.032,0.159,2.193
1,CEN,230616_A01303_0213_BHF7YMDRX3,2023-06-17 11:41:23,2023-06-17 11:44:17,2023-06-19 13:10:49,2023-06-19 14:29:21,All samples released,0.002,2.06,0.055,2.117
2,CEN,230621_A01303_0216_BHFKL7DRX3,2023-06-22 16:04:11,2023-06-22 16:08:57,2023-06-23 14:17:55,2023-06-23 16:40:47,All samples released,0.003,0.923,0.099,1.025
3,MYE,230629_A01295_0196_AHHGGLDRX3,2023-06-30 15:28:17,2023-06-30 15:29:42,2023-06-30 18:27:47,2023-07-04 10:29:22,All samples released,0.001,0.124,3.668,3.792
4,MYE,230622_A01295_0194_BHF52FDRX3,2023-06-23 20:15:32,2023-06-23 20:20:37,2023-06-23 23:26:39,2023-06-26 14:16:06,All samples released,0.004,0.129,2.618,2.75
5,MYE,230622_A01303_0217_BHF52LDRX3,2023-06-23 16:42:27,2023-06-23 16:45:28,2023-06-23 19:40:14,2023-06-26 14:16:28,All samples released,0.002,0.121,2.775,2.899
6,MYE,230619_A01303_0214_BHF7MHDRX3,2023-06-20 14:03:23,2023-06-20 14:06:46,2023-06-22 10:52:07,2023-06-22 13:16:11,All samples released,0.002,1.865,0.1,1.967
7,TSO500,230703_A01303_0221_BHFCH3DRX3,2023-07-04 08:39:36,2023-07-04 08:42:50,2023-07-05 08:42:04,2023-07-05 12:51:03,All samples released,0.002,0.999,0.173,1.175
8,TSO500,230629_A01303_0220_AHF52MDRX3,2023-06-30 08:42:20,2023-06-30 08:43:33,2023-07-01 18:40:16,2023-07-03 12:36:09,All samples released,0.001,1.414,1.747,3.162
9,TSO500,230626_A01303_0218_BHFCGVDRX3,2023-06-27 09:01:11,2023-06-27 09:03:51,2023-06-28 07:33:53,2023-06-29 18:27:14,All samples released,0.002,0.938,1.454,2.393
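
The values in the output above suggest that the day-valued columns are differences between the timestamp columns, expressed in days. The sketch below checks that assumption for `upload_to_release` using the first two rows; how the CSV was actually generated is not shown in this notebook, so treat this as an illustration only.

```python
import pandas as pd

# First two rows of the audit data, timestamps copied from the output above
df = pd.DataFrame({
    "run_name": ["230623_A01295_0195_AHFCFVDRX3", "230616_A01303_0213_BHF7YMDRX3"],
    "upload_time": ["2023-06-24 13:47:45", "2023-06-17 11:41:23"],
    "jira_resolved": ["2023-06-26 18:26:06", "2023-06-19 14:29:21"],
})

# Parse the timestamp strings into datetimes
for col in ["upload_time", "jira_resolved"]:
    df[col] = pd.to_datetime(df[col])

# Elapsed time from upload to release, converted from seconds to days
elapsed = df["jira_resolved"] - df["upload_time"]
df["upload_to_release"] = (elapsed.dt.total_seconds() / 86400).round(3)

print(df[["run_name", "upload_to_release"]])
```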


In [6]:
# Create new figure
fig = go.Figure()

In [11]:
# Add a trace for the time between data upload and the first job being run
fig.add_trace(
    go.Bar(
        x=audit_df["run_name"],
        y=audit_df["upload_to_first_job"],
        name="Upload to processing start",
        legendrank=4
    )
)
fig

In [31]:
# Add trace for the time between the first and last job (time spent running the pipeline)
fig.add_trace(
    go.Bar(
        x=audit_df["run_name"],
        y=audit_df["processing_time"],
        name="Pipeline running",
        legendrank=3
    )
)
fig.show()

In [32]:

# Add trace for time between processing ending and us releasing the data
fig.add_trace(
    go.Bar(
        x=audit_df["run_name"],
        y=audit_df["processing_end_to_release"],
        name="Processing end to all samples released",
        legendrank=2,
        # Label each bar with the total turnaround time in days (1 dp)
        text=audit_df['upload_to_release'].round(1)
    )
)
fig.show()

In [33]:
# Update the bars to be stacked.
# barmode='relative' is used instead of barmode='stack' because, if the data
# contain any negative values (e.g. because a timestamp is incorrect), 'stack'
# draws those bars overlapping the others, whereas 'relative' draws them
# sticking out from the negative side of the chart
fig.update_layout(barmode='relative')
fig.show()

In [34]:
# Add a line to show the upper limit of the audit standard
fig.add_hline(y=4, line_dash="dash")
fig.show()

In [35]:
# Change the angle of the x-axis labels and sort the runs alphabetically by name
# (run names start with a date, so this also orders them by date ascending)
fig.update_xaxes(tickangle=45, categoryorder='category ascending')
fig.show()
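
Sorting the categories alphabetically gives date order here only because each run name appears to start with a date in `YYMMDD` form. That naming convention is an assumption read off the data, and the sketch below checks it for three of the run names.

```python
from datetime import datetime

# Three run names taken from the audit data above
run_names = [
    "230623_A01295_0195_AHFCFVDRX3",
    "230616_A01303_0213_BHF7YMDRX3",
    "230703_A01303_0221_BHFCH3DRX3",
]

# Alphabetical order, which is what categoryorder='category ascending' uses
alphabetical = sorted(run_names)

# Order by the date parsed from the first field (assumed to be YYMMDD)
by_date = sorted(
    run_names,
    key=lambda name: datetime.strptime(name.split("_")[0], "%y%m%d"),
)

print(alphabetical == by_date)
```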

In [36]:
# Change the format of the hover labels to show the run name, stage name
# and number of days (to 2 dp)
# Change the position of the text to be outside the bar
fig.update_traces(
    hovertemplate=(
        '<br><b>Run</b>: %{x}<br>'
        '<b>Stage</b>: %{data.name}<br>'
        '<b>Days</b>: %{y:.2f}<br>'
        '<extra></extra>'
    ),
    textposition='outside'
)
fig.show()

In [37]:
# Change title text, position and size
# Change x and y axes text
# Change size of plot
# Change font to Helvetica
# Reverse the order of the legend
fig.update_layout(
    title={
        'text': "Audit Turnaround Times",
        'xanchor': 'center',
        'x': 0.5,
        'font_size': 20
    },
    xaxis_title="Run name",
    yaxis_title="Number of days",
    width=1100,
    height=700,
    font_family='Helvetica',
    legend_traceorder="reversed"
)
fig.show()

### Colour scales

See built-in colour scales and documentation here: https://plotly.com/python/builtin-colorscales/

### Example - Gait analysis report

### Example - Athena