# Plotly and ipywidgets

This notebook shows how to connect plotly figures with Jupyter notebook widgets (ipywidgets).

## Environment preparation

We prepare the environment and load the French real estate transactions dataset (DVF, Paris, 2022) used in previous lectures.

In [1]:
import plotly.graph_objects as go
import pandas as pd

In [2]:
FILE = (
    "https://files.data.gouv.fr/geo-dvf/latest/csv/2022/"
    "departements/75.csv.gz"
)
df = pd.read_csv(FILE, compression="gzip")



Keep only sales (`nature_mutation == "Vente"`) and the relevant columns, and drop rows with missing values:

In [3]:
df = df[df["nature_mutation"] == "Vente"]
df = df[[
    "type_local", "surface_reelle_bati", "valeur_fonciere",
    "nombre_pieces_principales", "latitude", "longitude",
]].dropna()
df.shape


(48755, 6)

## Introducing ipywidgets

`ipywidgets` is a Python package for inserting interactive widgets (sliders, dropdowns, checkboxes, etc.) into Jupyter notebooks.

We'll illustrate this with the following example. 

Imagine we have a function that returns descriptive statistics of a subset of the data.

The subset is specified by: 
- The dwellings which have a given number of rooms (`n_rooms`).
- The dwellings which have a given type (`type_local`).
- The dwellings which have a surface smaller than a given value (`max_surface`). 

In [4]:
def describe_group(n_rooms, max_surface, type_local):
    df_group = df[df["type_local"] == type_local]
    df_group = df_group[df_group["nombre_pieces_principales"] == n_rooms]
    df_group = df_group[df_group["surface_reelle_bati"] < max_surface]
    return df_group.describe().transpose()

In [5]:
# Example:
describe_group(2, 100, type_local="Appartement")

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| surface_reelle_bati | 14471.0 | 39.19252 | 12.5682 | 8.0 | 30.0 | 37.0 | 46.0 | 99.0 |
| valeur_fonciere | 14471.0 | 1918963.0 | 12580510.0 | 1.0 | 315000.0 | 415000.0 | 590000.0 | 606210300.0 |
| nombre_pieces_principales | 14471.0 | 2.0 | 0.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| latitude | 14471.0 | 48.86295 | 0.02060604 | 48.819412 | 48.84555 | 48.863903 | 48.882041 | 48.90057 |
| longitude | 14471.0 | 2.344704 | 0.0355652 | 2.25718 | 2.320098 | 2.347069 | 2.373924 | 2.410879 |
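The three successive filters in `describe_group` amount to a single boolean mask. A sketch of that equivalent form, on a small hypothetical DataFrame `toy` that only reuses the column names of `df` (the values are made up); `select_group` is likewise a hypothetical name:

```python
import pandas as pd

# Hypothetical toy data reusing the column names of df
toy = pd.DataFrame({
    "type_local": ["Appartement", "Appartement", "Maison", "Appartement"],
    "nombre_pieces_principales": [2, 2, 2, 3],
    "surface_reelle_bati": [45.0, 120.0, 80.0, 60.0],
    "valeur_fonciere": [400_000.0, 1_100_000.0, 700_000.0, 550_000.0],
})

def select_group(data, n_rooms, max_surface, type_local):
    # One mask combining the three conditions; each one needs parentheses with &
    mask = (
        (data["type_local"] == type_local)
        & (data["nombre_pieces_principales"] == n_rooms)
        & (data["surface_reelle_bati"] < max_surface)
    )
    return data[mask]

# Only row 0 is a 2-room apartment smaller than 100 m²
print(select_group(toy, 2, 100, "Appartement"))
```

Calling `.describe().transpose()` on the result would give the same kind of table as above.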


### Understanding interact

`interact` is a function from the `ipywidgets` library that automatically creates widgets for choosing argument values, and calls a "target" function with the selected values each time the user changes them.

In short, it offers a simple way to create a user interface for a function.

The arguments you give to `interact` are as follows:

1. **Function**: The first argument is the function that you want to interact with. This function will be called whenever the interactive widgets are manipulated.

2. **Keyword arguments**: The remaining arguments are keyword arguments, where each keyword corresponds to an argument of the function. The value of each keyword argument specifies the default value of the corresponding function argument and determines the type of widget that will be created for that argument.

    - If the value is a boolean, a checkbox is created.
    - If the value is a string, a text box is created.
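    - If the value is an integer or a float, a slider is created.
    - If the value is a tuple of numbers `(min, max)` or `(min, max, step)`, a slider with that range is created.
    - If the value is a list or a dictionary, a dropdown menu is created.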
    

3. **Fixed values**: If you want to set a function argument to a fixed value, you can use the `fixed` function from `ipywidgets`. This will set the argument to the specified value and no widget will be created for this argument.


In [6]:
from ipywidgets import interact, fixed
import ipywidgets as widgets

interact(describe_group, n_rooms=2, max_surface=100, type_local=fixed("Appartement"))

interactive(children=(IntSlider(value=2, description='n_rooms', max=6, min=-2), IntSlider(value=100, descripti…

<function __main__.describe_group(n_rooms, max_surface, type_local)>

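The automatically chosen ranges are not always sensible: for a positive integer default `v`, `interact` creates a slider from `-v` to `3v`, which is why the `n_rooms` slider above goes from -2 to 6. You can instead pass specific widgets such as `widgets.IntSlider` or `widgets.Dropdown` to control each widget in more detail: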

In [7]:
interact(
    describe_group,
    n_rooms=widgets.IntSlider(min=1, max=6, step=1, value=2),
    max_surface=widgets.IntSlider(min=10, max=200, step=10, value=100),
    type_local=widgets.Dropdown(options=df["type_local"].unique(), value="Appartement"),
)

interactive(children=(IntSlider(value=2, description='n_rooms', max=6, min=1), IntSlider(value=100, descriptio…

<function __main__.describe_group(n_rooms, max_surface, type_local)>

### Interact as a decorator

`interact` also offers a decorator syntax:

In [8]:
# Using interact as a decorator

@interact(
    n_rooms=widgets.IntSlider(min=1, max=6, step=1, value=2),
    max_surface=widgets.IntSlider(min=10, max=200, step=10, value=100),
    type_local=widgets.Dropdown(options=df["type_local"].unique(), value="Appartement"),
)
def describe_group2(n_rooms, max_surface, type_local):
    df_group = df[df["type_local"] == type_local]
    df_group = df_group[df_group["nombre_pieces_principales"] == n_rooms]
    df_group = df_group[df_group["surface_reelle_bati"] < max_surface]
    return df_group.describe().transpose()

interactive(children=(IntSlider(value=2, description='n_rooms', max=6, min=1), IntSlider(value=100, descriptio…

## Combining ipywidgets and plotly

A common use case is to use widgets to control what is plotted with plotly. 

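However, using `interact` alone does not work well with plotly: the target function has to build and display a brand-new figure on every change, so the whole plot is re-rendered (losing any zoom or pan) each time. A better option is to create the plot once as a `go.FigureWidget`, update its data in place, and put it in a widget container next to the controls. Let's see this with a specific example: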

### Use case: Clustering real estate properties

Let's say we are business analysts looking for "submarkets" (groups of dwellings with similar features). We apply a clustering algorithm (k-means) and plot the clusters with plotly:

In [9]:
# cluster the data according to latitude, longitude, price, and surface
# (normalize the data first)

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

scaler = StandardScaler()
X = df[["latitude", "longitude", "valeur_fonciere", "surface_reelle_bati"]]
X = scaler.fit_transform(X)

# n_init set explicitly and a fixed seed, so the clusters are reproducible
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X)

fig = go.Figure()
for cluster in sorted(df["cluster"].unique()):
    df_cluster = df[df["cluster"] == cluster]
    fig.add_trace(go.Scatter(
        x=df_cluster["longitude"],  # longitude on x, latitude on y, as on a map
        y=df_cluster["latitude"],
        mode="markers",
        marker={"size": 10},
        name=f"Cluster {cluster}"
    ))

fig.show()
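The cell above normalizes the data before clustering. The reason is that k-means relies on Euclidean distances, so a column measured in euros would otherwise dwarf one measured in degrees. A sketch with a few made-up rows of (latitude, price):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [latitude in degrees, price in euros]
raw = np.array([
    [48.85, 300_000.0],
    [48.86, 450_000.0],
    [48.87, 900_000.0],
    [48.88, 350_000.0],
])

# Unscaled: the distance between rows 0 and 1 is essentially their price gap
d_raw = np.linalg.norm(raw[0] - raw[1])
print(d_raw)  # about 150000: the 0.01 degree latitude gap barely matters

# StandardScaler gives each column mean 0 and standard deviation 1,
# so both features weigh comparably in the distance
scaled = StandardScaler().fit_transform(raw)
print(scaled.mean(axis=0), scaled.std(axis=0))
```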

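A common question in clustering is how many clusters to use, and it is often decided qualitatively, by looking at the result. Say we want to add a slider to select the number of clusters, rerun the clustering and display the result.

The recommended process is:

- Create an `IntSlider` widget to select the number of clusters.
- Create a `FigureWidget` object to plot the data. A `FigureWidget` is a plotly figure that is also a widget, so its data can be changed in place from Python.
- Write a function that updates the plot for a new value of `n`, and register it with the slider's `observe` method so it runs whenever the value changes.
- Put the slider and the `FigureWidget` into a container widget (here a `VBox`).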
- Display the container object. 

In [10]:
n = widgets.IntSlider(min=1, max=10, step=1, value=1)

# The figure is created once; the callback below only changes its data
interactive_fig = go.FigureWidget(
    go.Scatter(
        x=df["longitude"],  # longitude on x, latitude on y, as on a map
        y=df["latitude"],
        mode="markers",
        marker={"size": 5}
    )
)

container = widgets.VBox([n, interactive_fig])

def update_clusters(n_clusters):
    """Rerun k-means with n_clusters and recolor the points in place."""
    if n_clusters <= 1:
        # A single cluster: one uniform color
        with interactive_fig.batch_update():
            interactive_fig.data[0].marker.color = "blue"
        return
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    df["cluster"] = kmeans.fit_predict(X)
    # batch_update sends all changes made inside the block as one update
    with interactive_fig.batch_update():
        interactive_fig.data[0].marker.color = df["cluster"]

# Call update_clusters with the new slider value whenever it changes
n.observe(lambda change: update_clusters(change.new), names="value")

container


VBox(children=(IntSlider(value=1, max=10, min=1), FigureWidget({
    'data': [{'marker': {'size': 5},
        …
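The `n.observe(...)` line in the cell above is what connects the slider to the figure: whenever the slider's `value` changes, the callback receives a `change` object (usable as `change.new` or `change["new"]`), and the lambda passes the new value on to `update_clusters`. A minimal sketch of what that object contains, using a standalone slider and a hypothetical `on_change` callback, with no figure involved:

```python
import ipywidgets as widgets

slider = widgets.IntSlider(min=1, max=10, value=1)
received = []  # records every change notification

def on_change(change):
    # `change` carries the trait name, the old and new values, and the widget
    received.append((change["name"], change["old"], change["new"]))

# names="value" restricts notifications to changes of the slider's value
slider.observe(on_change, names="value")

# Assigning the value from Python notifies observers, like moving the slider
slider.value = 4
slider.value = 7
print(received)  # [('value', 1, 4), ('value', 4, 7)]
```

Setting `slider.value` from code this way is also a convenient manual check that a callback is wired correctly.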

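The same pattern extends to figures with several subplots. Below, a second panel shows a histogram of the cluster labels (how many dwellings fall in each cluster), and both panels are updated by the same callback:

In [11]:
from plotly.subplots import make_subplots

# A separate slider for this figure, so the slider `n` above is not shadowed
n2 = widgets.IntSlider(min=1, max=10, step=1, value=1)

# Two side-by-side panels: scatter plot (left) and histogram (right)
interactive_fig2 = go.FigureWidget(
    make_subplots(rows=1, cols=2)
)

interactive_fig2.add_trace(
    go.Scatter(
        x=df["longitude"],
        y=df["latitude"],
        mode="markers",
        marker={"size": 5}
    ), row=1, col=1)

# Histogram of the cluster labels; with a single cluster every label is 0
interactive_fig2.add_trace(go.Histogram(x=[0] * len(df)), row=1, col=2)

container2 = widgets.VBox([n2, interactive_fig2])

# A new name rather than redefining `update_clusters`: the first slider's lambda
# looks that name up when it runs, so redefining it would redirect that slider here
def update_clusters_hist(n_clusters):
    if n_clusters <= 1:
        with interactive_fig2.batch_update():
            interactive_fig2.data[0].marker.color = "blue"
            interactive_fig2.data[1].x = [0] * len(df)  # one cluster with every dwelling
        return
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    df["cluster"] = kmeans.fit_predict(X)
    with interactive_fig2.batch_update():
        interactive_fig2.data[0].marker.color = df["cluster"]
        interactive_fig2.data[1].x = df["cluster"]

n2.observe(lambda change: update_clusters_hist(change.new), names="value")

container2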


VBox(children=(IntSlider(value=1, max=10, min=1), FigureWidget({
    'data': [{'marker': {'size': 5},
        …
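The histogram panel shows how many dwellings carry each cluster label. The same counts can be computed exactly as a table with `value_counts`; a sketch on made-up labels (in the notebook, the input would be `df["cluster"]`):

```python
import pandas as pd

# Hypothetical cluster labels for six dwellings
labels = pd.Series([0, 2, 1, 0, 0, 2], name="cluster")

# Number of dwellings per cluster, ordered by label rather than by count
counts = labels.value_counts().sort_index()
print(counts)
```

`sort_index()` is there because `value_counts` orders by frequency by default.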