# Plotly

- plotly is a suite of open source libraries plus commercial software
- libraries exist for multiple programming languages, including Python
- used for data visualization and interactive applications
- commercial offerings focus on interactive applications and hosting
- high-level interface: `plotly.express`
- low-level object interface: `plotly.graph_objects`
- uniform API built around tidy data, similar to but not quite the same as seaborn
- renders figures as interactive HTML in a notebook

```shell
pip install plotly
```

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px

In [2]:
df = px.data.tips()

## Continuous and Categorical

In [3]:
px.box(df, y='tip', x='time')

In [4]:
px.violin(df, y='time', x='total_bill')

In [5]:
# NB: we have to aggregate first; px.bar won't do it for us the way seaborn's barplot does
tips_by_day = df.groupby('day').tip.mean()
px.bar(tips_by_day)

In [6]:
tips_by_day_and_time = df.groupby(['day', 'time'], as_index=False).tip.mean()
px.bar(tips_by_day_and_time, y='tip', x='day', color='time', barmode='group')

### Treemaps

Treemaps are usually only useful for sums, where each rectangle represents a percentage of the whole.

In [7]:
px.treemap(df, values='total_bill', path=['day'])





## Heatmaps

In [8]:
ctab = pd.crosstab(df.time, df['size'])
px.imshow(ctab, color_continuous_scale=['white', 'green'], text_auto=True)

In [9]:
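# drop the non-numeric species column too; recent pandas versions raise on string columns in .corr()
correlation_table = px.data.iris().drop(columns=['species', 'species_id']).corr()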
px.imshow(
    correlation_table,
    zmin=-1, zmax=1,
    color_continuous_scale=['red', 'white', 'green'],
    text_auto=True,
)

## Continuous and Continuous

In [10]:
px.scatter(df, y='tip', x='total_bill')

In [11]:
np.random.seed(123)

ts_df = pd.DataFrame({
    'x': pd.date_range('2022', freq='D', periods=100),
    'y': np.random.randn(100).cumsum(),
})
px.line(ts_df, x='x', y='y')

## Adding Dimensions

- color, symbol, size
- facet

In [12]:
px.scatter(df, y='tip', x='total_bill', color='time')

In [13]:
px.scatter(df, y='tip', x='total_bill', symbol='smoker', size='size')

In [14]:
px.scatter(df, y='tip', x='total_bill', facet_col='day', facet_row='time')

## Customizing Figures

### Titles and Labels

In [15]:
fig = px.scatter(df, y='tip', x='total_bill')
fig.update_layout(xaxis_title='Total Bill ($)', yaxis_title='Tip Amount ($)', title='Tip vs Total Bill')

In [16]:
# Alternatively...
fig = px.scatter(df, y='tip', x='total_bill')
fig.layout.xaxis.title = 'Total Bill ($)'
fig

### Horizontal and Vertical Lines

In [17]:
fig = px.scatter(df, y='tip', x='total_bill')
fig.add_vline(
    df.total_bill.mean(), line_dash='dot', opacity=.7,
    annotation_text=f'Average Total Bill: ${df.total_bill.mean():.2f}',
    annotation_position='top right'
)
fig.add_hline(
    df.tip.mean(), line_dash='dot', opacity=.7,
    annotation_text=f'Average Tip: ${df.tip.mean():.2f}',
    annotation_position='bottom right'
)

### Axis Ticks

In [18]:
fig = px.scatter(df, y='tip', x='total_bill')
fig.update_layout(
    xaxis_tickmode='array', xaxis_tickvals=[10, 20, 22.5, 40],
    yaxis_tickmode='linear', yaxis_tick0=1, yaxis_dtick=0.5,
)

### Axis Limits

In [19]:
fig = px.scatter(df, y='tip', x='total_bill')
fig.update_layout(xaxis_range=[10, 25], yaxis_range=[0, 8])

### Annotations

In [20]:
fig = px.scatter(df, y='tip', x='total_bill')
fig.add_annotation(
    x=df.total_bill.max(), y=df.tip.max(),
    ayref='y', ay=9, axref='x', ax=55,
    text='Highest Tip <br />and Total Bill'
)

### Hover Text

In [21]:
fig = px.scatter(df, y='tip', x='total_bill', hover_name='time')
fig

In [22]:
fig = px.scatter(df, y='tip', x='total_bill', hover_data=['day', 'time'])
fig

## Additional Features

### Saving Figures

You can always take a screenshot (Command + Shift + 5 on macOS) or click the "download plot as png" button in the figure's toolbar. Figures can also be saved from code:

In [23]:
fig = px.scatter(df, y='tip', x='total_bill')
# static image export requires the kaleido package (pip install kaleido)
fig.write_image('scatter_tip_total_bill.png')
fig.write_html('scatter_tip_total_bill.html')

Note that the HTML file embeds both your data and the plotly.js library, so it can be quite large!

In [24]:
import os

png_size = os.path.getsize('scatter_tip_total_bill.png')
html_size = os.path.getsize('scatter_tip_total_bill.html')

print(f'''
PNG file size  = {png_size / 1024:7.2f} K ({png_size / 1024 / 1024:.2f} M)
HTML file size = {html_size / 1024:.2f} K ({html_size / 1024 / 1024:.2f} M)
'''.strip())

PNG file size  =   34.62 K (0.03 M)
HTML file size = 3601.24 K (3.52 M)


### Pandas Plotting Backend

Once the backend is set to plotly, any pandas `.plot` call produces a plotly figure instead of a matplotlib one.

In [25]:
pd.options.plotting.backend = 'plotly'

df.plot.scatter(y='tip', x='total_bill')

In [26]:
import pydataset

In [27]:
pydataset.data().sample(20)

| | dataset_id | title |
|---|---|---|
| 553 | CNES | Variables from the 1997 Canadian National Election Study |
| 346 | USclassifiedDocuments | Official Secrecy of the United States Government |
| 559 | cgd | Chronic Granulotomous Disease data |
| 588 | JobSatisfaction | Job Satisfaction Data |
| 274 | Caschool | The California Test Score Data Set |
| 539 | prussian | Prussian army horse kick data |
| 161 | education | Education Expenditure Data |
| 138 | lung | data from Exercise 4.4, p120 |
| 338 | StrikeNb | Number of Strikes in Us Manufacturing |
| 317 | ModeChoice | Data to Study Travel Mode Choice |


## Exercise

1. Use the code snippet below to get started with a dataset of various characteristics of Scooby-Doo episodes:

    ```python
    df = pd.read_csv('https://github.com/rfordatascience/tidytuesday/raw/master/data/2021/2021-07-13/scoobydoo.csv')
    ```

1. Do episodes where the monster is an animal or ghost have higher IMDb ratings?
1. Does the number of "zoinks" correlate with the number of "jinkies"? Does whether or not the episode contains a door gag affect this?
1. Does the setting terrain affect the IMDb rating of an episode? What if you take into account whether or not Scrappy-Doo was in the episode?
1. Does the number of monsters correlate with the number of "jeepers"? Does this vary by network?
1. Use plotly express to continue exploring the Scooby-Doo episode dataset.

---

1. Download the Kickstarter dataset from Kaggle: https://www.kaggle.com/datasets/kemical/kickstarter-projects?select=ks-projects-201801.csv
1. Visualize the relationship between the goal and pledged amounts by category.
1. Visualize the percentage of successful projects by category. How does the number of backers affect this?
1. Visualize the number of successful projects over time.
1. What is the relationship between campaign length (deadline - launch date) and the number of backers? How does this vary between successful and failed projects?
1. Use plotly express to further explore the Kickstarter dataset.