# Interactive visualization
Author: [Ryan Parker](https://github.com/rparkr)  
Created: Aug 2023

This notebook contains an introduction to interactive visualization using the open-source [`ipyvizzu-story`](https://github.com/vizzuhq/ipyvizzu-story) Python package.

**References**  
- [Streamlit blog: ipyvizzu tutorial](https://blog.streamlit.io/create-an-animated-data-story-with-ipyvizzu-and-streamlit/), a great overall introduction to ipyvizzu and how to create a data story
- [ipyvizzu-story docs: tutorial](https://ipyvizzu-story.vizzuhq.com/latest/tutorial/), explains the basic usage pattern for creating data stories in ipyvizzu-story
- [ipyvizzu docs: tutorial](https://ipyvizzu.vizzuhq.com/latest/tutorial/), explains concepts relevant to chart creation, configuration, filtering, and styling
- [ipyvizzu: chart reference](https://ipyvizzu.vizzuhq.com/latest/examples/), guide to all the different kinds of charts available in ipyvizzu
- [Bike Sharing Demand dataset on OpenML.org](https://openml.org/search?type=data&status=active&id=42712), the dataset used in this demonstration, which has hourly counts of bike rentals in Washington, D.C., from 2011 to 2012, along with weather information at each hour. This version has been modified from [the original dataset](https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset) to enhance clarity and interpretability.

**Attribution**  
Some of the visualization ideas came from the excellent scikit-learn tutorial: [Time-related feature engineering](https://scikit-learn.org/stable/auto_examples/applications/plot_cyclical_feature_engineering.html).

# Setup
Import packages and download the Bike Sharing Demand dataset from OpenML.org.

In [1]:
# %pip install ipyvizzu-story --quiet

In [1]:
from ipyvizzu import Data, Config, Style
from ipyvizzustory import Story, Slide, Step

import pandas as pd
from sklearn.datasets import fetch_openml

In [5]:
# Load the dataset
bike_sharing = fetch_openml(
    "Bike_Sharing_Demand", version=2, as_frame=True, parser="pandas"
)
df = bike_sharing.frame

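# Combine 'heavy_rain' into 'rain' since there are only 3 occurrences of heavy rain.
# Assign the result back instead of calling replace(inplace=True) on a column
# selection, which may leave df unchanged in recent pandas versions.
df['weather'] = df['weather'].astype('object').replace('heavy_rain', 'rain')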

# Use the day name rather than the day number
df['weekday'] = df['weekday'].apply(lambda x: ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'][x])

# ipyvizzu expects strings, rather than Categorical dtype, so we'll convert those
# df.select_dtypes(include='category').columns  # show which columns are categorical
df = df.astype({
    'season': 'object',
    'year': 'object',
    'month': 'object',
    'hour': 'object',
    'weekday': 'object',
    'weather': 'object'})

# 'holiday' and 'workingday' hold the strings 'True' and 'False'. Casting them
# with astype('bool') would turn every non-empty string into True, so compare
# against 'True' instead.
for col in ['holiday', 'workingday']:
    df[col] = df[col].astype(str) == 'True'
df.head()

Unnamed: 0,season,year,month,hour,holiday,weekday,workingday,weather,temp,feel_temp,humidity,windspeed,count
0,spring,0,1,0,False,Sat,False,clear,9.84,14.395,0.81,0.0,16
1,spring,0,1,1,False,Sat,False,clear,9.02,13.635,0.8,0.0,40
2,spring,0,1,2,False,Sat,False,clear,9.02,13.635,0.8,0.0,32
3,spring,0,1,3,False,Sat,False,clear,9.84,14.395,0.75,0.0,13
4,spring,0,1,4,False,Sat,False,clear,9.84,14.395,0.75,0.0,1
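
A note on boolean flags: if columns such as `holiday` and `workingday` arrive as the strings `'True'` and `'False'` (an assumption about how this dataset loads), a plain boolean cast marks every row as `True`, because any non-empty string is truthy in Python. The sketch below uses plain Python lists rather than a DataFrame, and `parse_flag` is a hypothetical helper written for illustration.

```python
def parse_flag(text):
    """Convert the strings 'True'/'False' to real booleans (hypothetical helper)."""
    if text == "True":
        return True
    if text == "False":
        return False
    raise ValueError(f"unexpected flag value: {text!r}")


# Toy values standing in for a string-typed flag column
flags = ["False", "True", "False"]

# A naive cast treats every non-empty string as True
naive = [bool(flag) for flag in flags]

# Comparing against the expected text preserves the meaning
parsed = [parse_flag(flag) for flag in flags]

print(naive)
print(parsed)
```

A quick sanity check on the real data is that a Saturday should never be flagged as a working day.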


In [55]:
pd.date_range(start='2011-01-01', end='2012-12-31', periods=len(df.index)).date

array([datetime.date(2011, 1, 1), datetime.date(2011, 1, 1),
       datetime.date(2011, 1, 1), ..., datetime.date(2012, 12, 30),
       datetime.date(2012, 12, 30), datetime.date(2012, 12, 31)],
      dtype=object)

In [6]:
# Note the data type of the columns: we can use categorical features and
# numerical features for different kinds of visualizations.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      17379 non-null  object 
 1   year        17379 non-null  object 
 2   month       17379 non-null  object 
 3   hour        17379 non-null  object 
 4   holiday     17379 non-null  bool   
 5   weekday     17379 non-null  object 
 6   workingday  17379 non-null  bool   
 7   weather     17379 non-null  object 
 8   temp        17379 non-null  float64
 9   feel_temp   17379 non-null  float64
 10  humidity    17379 non-null  float64
 11  windspeed   17379 non-null  float64
 12  count       17379 non-null  int64  
dtypes: bool(2), float64(4), int64(1), object(6)
memory usage: 1.5+ MB


# Visualization
`ipyvizzu` works by encoding columns of data into visualization _channels_ like the x and y axes, color, shape, size, opacity, or data labels. It then creates smooth animations to show the data from different perspectives.
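
As a rough illustration of channel mapping, the sketch below treats a chart configuration as a plain dictionary from channel names to column names and checks that every referenced column exists. The helper `missing_columns` and the `CHANNELS` tuple are hypothetical and not part of `ipyvizzu`; the column list is copied from the `df.info()` output above.

```python
# Channels used in this notebook, plus 'size' (illustrative, not exhaustive)
CHANNELS = ("x", "y", "color", "lightness", "size", "label")

# Column names of the bike sharing DataFrame used in this notebook
COLUMNS = [
    "season", "year", "month", "hour", "holiday", "weekday", "workingday",
    "weather", "temp", "feel_temp", "humidity", "windspeed", "count",
]


def missing_columns(config, columns=COLUMNS):
    """Return the columns referenced by channels in `config` that are not in `columns`."""
    missing = []
    for channel in CHANNELS:
        value = config.get(channel)
        if value is None:
            continue  # channel unused or explicitly cleared
        names = [value] if isinstance(value, str) else list(value)
        for name in names:
            if name not in columns and name not in missing:
                missing.append(name)
    return missing


# One column per channel, or a list of columns to stack on a channel
config = {
    "x": "hour",
    "y": ["weekday", "count"],
    "color": "weekday",
    "title": "Bike rentals by hour, by weekday",  # not a channel, so ignored
}
print(missing_columns(config))
print(missing_columns({"x": "hours", "y": "count"}))
```

A dictionary shaped like `config` is what the cells below pass to `Config(...)`.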

`ipyvizzu-story` organizes those animations into slides: each slide shares one aspect of a story and can include multiple animation _steps_ that lead to the final chart layout for that slide. Each slide transitions seamlessly into the next using arrow-key navigation.

## The story
Imagine your manager asks you to perform an analysis on the bike sharing dataset. She has a few questions and wants you to investigate further.

Let's use the interactive visualization capabilities of `ipyvizzu-story` to provide perspective and guide her through a story about this data.

## The questions
1. Are weekends or weekdays busier?
2. What are the seasonal patterns of rentals? What are the busiest days of the year?
3. How does the weather impact bike rentals? Do warm days have more riders than cool days? Does rain reduce rentals?
4. What is the average utilization of bikes across the fleet?

### Step 1: load the data

In [48]:
# Initialize the data object, which is used for filtering
data = Data()
data.add_data_frame(df)

# Initialize the Story (presentation). The data won't change after the Story
# is created, but you can filter the data as needed.
story = Story(data=data)

# Set the display size, CSS-style. Default is 800px by 480px
story.set_size(width='100%', height='400px')

# Add tooltips to be displayed on hover
story.set_feature("tooltip", True)

# Choose which slide the story will start on
# story.start_slide = 3

### Step 2: create slides

What are the busiest times each day?

In [49]:
slide1 = Slide()
slide1.add_step(
    Step(
        Config({
            'x': 'hour',
            'y': 'count',
            'title': 'Bike rentals by hour',
            'label': 'count'}),
        Style({
            'plot.marker.label.numberFormat': 'prefixed',
            'plot.marker.label.maxFractionDigits': 0})
    )
)

slide2 = Slide(
    Step(
        Config({
            'y': ['weekday', 'count'],
            'color': 'weekday',
            'label': None,  # remove the label
            'title': 'Bike rentals by hour, by weekday'}
        )
    )
)

slide2.add_step(Step(Config({'geometry': 'area'})))

slide3 = Slide(Step(Config({'align': 'stretch'})))

slide4 = Slide(Step(Config({'split': True, 'align': 'min'})))  # 'align' options: 'none', 'min', 'center', 'max', 'stretch'

slide5 = Slide(
    Step(
        Config.heatmap({
            'x': 'hour',
            'y': 'weekday',
            'lightness': 'count',
            'title': 'Bike rentals by hour, by weekday: heatmap'}
        ),
        Style({
            'plot.marker.rectangleSpacing': 0
        })
    )
)

story.add_slide(slide1)
story.add_slide(slide2)
story.add_slide(slide3)
story.add_slide(slide4)
story.add_slide(slide5)
story
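
Notice that the later slides above set only the options that change: `slide2` never repeats `'x': 'hour'`, and `slide3` sets only `'align'`. The sketch below is a conceptual model of that carry-over using plain dictionaries. It is not how `ipyvizzu-story` is implemented, and `apply_step` is a hypothetical helper.

```python
def apply_step(state, changes):
    """Return a new chart state with `changes` layered over `state`."""
    new_state = dict(state)
    new_state.update(changes)
    return new_state


# The configuration passed to each step in the slides above, in order
steps = [
    {"x": "hour", "y": "count", "title": "Bike rentals by hour", "label": "count"},
    {"y": ["weekday", "count"], "color": "weekday", "label": None,
     "title": "Bike rentals by hour, by weekday"},
    {"geometry": "area"},
    {"align": "stretch"},
]

state = {}
for changes in steps:
    state = apply_step(state, changes)

# Options set in earlier steps are still present after the last one
print(state["x"])
print(state["geometry"])
print(state["label"])
```

Under this model, setting an option to `None` (as with `'label'` in `slide2`) is how a channel gets cleared, since leaving it out would keep the earlier value.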

### Step 2: build a slide

In [22]:
slide1 = Slide()
slide1.add_step(
    Step(
        Config({'x': 'hour', 'y': 'count', 'title': 'Bike rentals by hour'})
    )
)
slide1.add_step(
    Step(
        Config(
            {'x': 'hour',
            'y': ['count', 'weekday'],
            'color': 'weekday',
            'title': 'Bike rentals by hour, by weekday',
            'geometry': 'area'})
    )
)

story.add_slide(slide1)

slide2 = Slide(
    Step(
        Config({
            'x': 'season',
            'y': ['weather', 'count'],
            'color': 'weather',
            'geometry': 'rectangle'})
    )
)

story.add_slide(slide2)

story

In [14]:
# Create a Step, or a chart animation
slide1 = Slide(
    Step(
        # Use Data to filter
        Data.filter("record['weather'] == 'clear'"),

        # Use Config to choose the type of chart
        Config(
            dict(
                x = 'hour',
                y = 'count',
                # label = 'count',
                geometry = 'area',  # options: 'area', 'circle', 'line', 'rectangle'
                title = 'Bike rentals by time of day, on clear days'
            )
        ),
        # Use Style to adjust the display, like the font size, text orientation, and spacing
        Style(
            {
                'title': {'fontSize': '24px'},
                'plot.marker.colorPalette': '#7c2727' # or, 'plot': {'marker': {'colorPalette': '#7c2727'}}
                # 'plot.marker.label.maxFractionDigits': '-3'
            })
    )
)

story.add_slide(slide1)

story.play()
# Or, just use as the last line in the cell:
# story
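
The string passed to `Data.filter` above is a JavaScript expression evaluated against each `record`. When a filter should match several values, the expression can be assembled in Python first. The sketch below shows one way to do that; `equals_any` is a hypothetical helper, not part of `ipyvizzu`, and it uses `json.dumps` so that column names and values are quoted safely.

```python
import json


def equals_any(column, values):
    """Build a JavaScript filter expression matching any of `values` in `column`."""
    if not values:
        raise ValueError("values must not be empty")
    col = json.dumps(column)
    clauses = [f"record[{col}] == {json.dumps(value)}" for value in values]
    return " || ".join(clauses)


expr = equals_any("weather", ["clear", "rain"])
print(expr)
# The resulting string could then be passed as Data.filter(expr)
```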

In [None]:
# Create a second slide
# Create a Step, or a chart animation
slide2 = Slide(
    Step(
        # Use Data to filter
        Data.filter("record['weather'] == 'clear'"),

        # Use Config to choose the type of chart
        Config(
            dict(
                x = ['hour'],
                y = ['count', 'weekday'],
                label = None,
                color = ['weekday'],
                # legend = ['col_name'],
                # lightness = ['col_name'],
                # size = ['col_name'],
                # sort = 'byValue',  # or None
                # reverse = True,
                geometry = 'area',  # options: 'area', 'circle', 'line', 'rectangle'
                title = 'Bike rentals by time of day, by weekday, on clear days'
            )
        )
    )
)

story.add_slide(slide2)

story.play()

In [7]:
# You can export the Story to an HTML file:
# story.export_to_html(filename="mystory.html")

# Or you can get the HTML Story as a string:
# html = story.to_html()
# print(html)

# You can display the Story with the `play` method:
# story.play()

# Or you can rely on the `_repr_html_` method by ending the cell with:
# story