# Interactive 2-Level Plotly Explorer

**What is this notebook?**  
This is an **interactive data exploration tool** built with **Plotly** and **ipywidgets** for the **Bank Marketing dataset**.  
It allows you to investigate the data through two levels of selection:
1. **Group** → a category of related variables (Demography, Finance & Credit, Current contact, History)  
2. **Variable** → a specific feature within that group.

**Why was it created?**  
To **quickly analyze and compare** how different customer attributes relate to the target variable `y` (whether the client subscribed to a term deposit).  
It removes repetitive coding by letting you switch variables instantly, which speeds up **Exploratory Data Analysis (EDA)** and makes it more visual and interactive.

**Who is it for?**  
- **For us, the project's data science team**, exploring the dataset before modeling and looking for customer insights and patterns.

**When should you use it?**  
During the **EDA phase** of your project, when you want to:  
- Understand variable distributions.  
- Identify trends, patterns, or outliers.  
- Compare numeric vs categorical features in relation to the target.

**Where can it be used?**  
In any **Jupyter Notebook environment** (local Jupyter, JupyterLab, Google Colab) as part of our data science workflow.

**Inputs**  
- CSV file: `bank-full.csv` with all relevant columns.  
- Group selection (Demography, Finance & Credit, Current contact, History).  
- Variable selection within the chosen group.
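Before building the menus, it can help to confirm that the CSV really contains every grouped variable plus the target. The notebook reads the file with `sep=";"`, so the sample below uses the same semicolon-separated format. This is a minimal sketch: the helper `missing_columns` and the two-row excerpt are hypothetical and made up for illustration, not taken from `bank-full.csv`.

```python
import io
import pandas as pd

# Same grouping as the notebook's GROUP_VARS
GROUP_VARS = {
    "Demography": ["age", "job", "marital", "education"],
    "Finance & Credit": ["default", "balance", "housing", "loan"],
    "Current contact": ["contact", "day", "month", "duration"],
    "History": ["campaign", "pdays", "previous", "poutcome"],
}

def missing_columns(df, group_vars, target="y"):
    """Hypothetical helper: list expected columns (group variables + target) absent from df."""
    expected = {c for cols in group_vars.values() for c in cols} | {target}
    return sorted(expected - set(df.columns))

# Made-up excerpt in the same semicolon-separated format (only two columns)
sample = io.StringIO('"age";"y"\n30;"no"\n45;"yes"\n')
toy = pd.read_csv(sample, sep=";")

missing = missing_columns(toy, GROUP_VARS)
print(len(missing))  # 15: every grouped variable except "age"
```

On the real file, an empty list means all the dropdown entries can be plotted.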

**How does it work?**  
1. **Choose a Group** in the first dropdown menu.  
2. The **Variable menu** updates automatically to show only the variables in that group.  
3. The notebook detects the variable type:  
   - **Numeric** → grouped histogram or box plot.  
   - **Categorical** → stacked bar chart.  
4. Visuals use a **consistent color scheme** (green = "Yes", orange = "No") and clear legends.  
5. Results update **instantly** when you change selections, allowing rapid comparisons.
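Step 3 comes down to inspecting the pandas dtype. The sketch below mirrors the `dtype.kind in "biufc"` test used later in `render_plot`: bool, signed int, unsigned int, float, and complex count as numeric, and everything else counts as categorical. A column can also be forced to be categorical, which is how the notebook handles `day`. The helper name `variable_kind` is hypothetical, and the sample values are illustrative only.

```python
import pandas as pd

def variable_kind(series, force_categorical=False):
    """Hypothetical helper: classify a column as "numeric" or "categorical".

    dtype.kind codes b/i/u/f/c = bool, int, unsigned int, float, complex.
    A pandas "category" column has kind "O", so it falls through to categorical.
    """
    if force_categorical:
        return "categorical"
    return "numeric" if series.dtype.kind in "biufc" else "categorical"

ages = pd.Series([30, 45, 61])
jobs = pd.Series(["admin.", "technician"], dtype="category")
days = pd.Series([5, 6, 7])  # day of month: stored as int, but shown as discrete bars

print(variable_kind(ages))                          # numeric
print(variable_kind(jobs))                          # categorical
print(variable_kind(days, force_categorical=True))  # categorical
```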


**Outputs**  
- An **interactive Plotly chart** adapted to the selected variable type.  
- Automatic updates of the chart when changing group or variable.  



## 1. Imports
Import the Python libraries needed for:
- Data manipulation (pandas, numpy)
- Interactive charts (plotly.express)
- Widgets for UI (ipywidgets)
- Notebook display formatting (IPython.display)


```python
# What : Here I import the libraries needed for the 2-level interactive Plotly explorer
# Why  : Because I want to manipulate the data, create interactive charts, and build a widget-based UI

import pandas as pd
import numpy as np
import plotly.express as px
import ipywidgets as widgets
from IPython.display import display
```


## 2. Load Data and Prepare Variables
Load the dataset, define variable groups, cast categorical columns, and set visualization options (colors, category orders).


```python
# What : Here I load the dataset and prepare it for the 2-level explorer
# Why  : Because I need a clean, structured dataset with categories and formatting ready for visualization
# How  : To do this, I read the CSV, define the groups/columns, and set options (types, colors, orders)

# 1. Load data
DATA_PATH = "../data/bank-full.csv"   # adjust if needed
df = pd.read_csv(DATA_PATH, sep=";")  # the file is semicolon-separated

# 2. Define groups and their variables
GROUP_VARS = {
    "Demography": ["age", "job", "marital", "education"],
    "Finance & Credit": ["default", "balance", "housing", "loan"],
    "Current contact": ["contact", "day", "month", "duration"],
    "History": ["campaign", "pdays", "previous", "poutcome"],
}

# 3. Create relevant subset (all grouped variables + the target)
all_cols = sorted({c for cols in GROUP_VARS.values() for c in cols} | {"y"})
df_sub = df[all_cols].copy()

# 4. Cast categorical columns
cat_cols = ["job", "marital", "education", "default", "housing", "loan", "contact", "month", "poutcome", "y"]
for c in cat_cols:
    if c in df_sub.columns:
        df_sub[c] = df_sub[c].astype("category")
if "y" in df_sub:
    df_sub["y"] = df_sub["y"].cat.set_categories(["yes", "no"])  # put "yes" first

# 5. Set colors and category orders
COLOR_MAP = {"yes": "#2ca02c", "no": "#ff7f0e"}  # green / orange
MONTH_ORDER = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
if "month" in df_sub:
    months = df_sub["month"].astype(str).str.lower()
    if set(months).issubset(MONTH_ORDER):
        # set_categories (unlike reorder_categories) still works when some months are absent
        df_sub["month"] = months.astype("category").cat.set_categories(MONTH_ORDER, ordered=True)
```
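A caveat on ordering the months: pandas' `reorder_categories` requires the new list to contain exactly the existing categories. It therefore raises a `ValueError` when the data, for example a filtered subset, lacks some months. `set_categories(..., ordered=True)` accepts the full calendar list either way and leaves absent months as empty categories. The small series below is made up to show the difference.

```python
import pandas as pd

MONTH_ORDER = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]

# A made-up filtered sample that only contains three distinct months
months = pd.Series(["may", "jan", "dec", "may"], dtype="category")

# reorder_categories needs exactly the same set of categories -> fails here
try:
    months.cat.reorder_categories(MONTH_ORDER, ordered=True)
    reorder_ok = True
except ValueError:
    reorder_ok = False

# set_categories accepts the full list; missing months become unused categories
ordered = months.cat.set_categories(MONTH_ORDER, ordered=True)

print(reorder_ok)                      # False
print(ordered.min(), ordered.max())    # jan dec
print(ordered.sort_values().tolist())  # ['jan', 'may', 'may', 'dec']
```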


## 3. Create Menus (Group and Variable)
Create two dropdown menus:
- Group → selects a set of related variables
- Variable → updates dynamically based on the chosen group


```python
# What : Here I create the menus (Group, Variable)
# Why  : Because I want to select a group of variables first, then choose a specific variable dynamically
# How  : To do this, I create two dropdowns and link the Variable menu to the Group menu

# 1. Instantiate menus (Variable starts with the variables of the initially selected group)
group_dd = widgets.Dropdown(options=list(GROUP_VARS.keys()), description="Group")
var_dd   = widgets.Dropdown(options=GROUP_VARS[group_dd.value], description="Variable")

# 2. Sync Variable with Group
def _on_group(change):
    g = change["new"]
    var_dd.options = GROUP_VARS[g]    # update variable options
    var_dd.value   = GROUP_VARS[g][0] # reset to first variable
group_dd.observe(_on_group, names="value")  # listen for group change
```
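The Group → Variable link is an observer pattern. Changing the group value calls every registered handler with a `change` dict, and `_on_group` reads `change["new"]` from it. The sketch below replays that sequence without Jupyter. `ToyDropdown` is a hypothetical stand-in, **not** `ipywidgets.Dropdown`. It mimics only `observe(handler, names="value")` and sets the new value before notifying, which is how traitlets-based widgets behave. The trimmed `GROUP_VARS` is illustrative.

```python
class ToyDropdown:
    """Hypothetical stand-in for widgets.Dropdown (NOT the ipywidgets class)."""

    def __init__(self, options):
        self._observers = []
        self.options = list(options)
        self._value = self.options[0]

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new  # store first, then notify
        if new != old:
            for handler in self._observers:
                handler({"name": "value", "old": old, "new": new})

    def observe(self, handler, names="value"):
        # `names` is accepted for signature parity; only "value" is supported
        self._observers.append(handler)


# Trimmed, hypothetical version of GROUP_VARS
GROUP_VARS = {"Demography": ["age", "job"], "History": ["campaign", "poutcome"]}

group_dd = ToyDropdown(GROUP_VARS)                # options: the group names
var_dd = ToyDropdown(GROUP_VARS[group_dd.value])  # options: variables of the current group

def _on_group(change):
    g = change["new"]
    var_dd.options = GROUP_VARS[g]   # swap the variable list
    var_dd.value = GROUP_VARS[g][0]  # reset to the group's first variable

group_dd.observe(_on_group, names="value")

# Record the (group, variable) pair a render callback would see
renders = []
var_dd.observe(lambda change: renders.append((group_dd.value, change["new"])))

group_dd.value = "History"
print(var_dd.options, var_dd.value)  # ['campaign', 'poutcome'] campaign
print(renders)                       # [('History', 'campaign')]
```

Because the group value is already updated when `_on_group` runs, the render callback never sees a variable from the wrong group in this sequence.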


## 4. Render Plot
Display the appropriate chart based on:
- Group and Variable selected
- Variable type (numeric vs categorical)

Logic:
- Demography: `age` (numeric) → grouped histogram in percent (Yes/No); categorical variables → stacked bar chart
- Finance & Credit: numeric (`balance`) → box plot; categorical → stacked bar chart
- Current contact: `duration` → grouped histogram in percent; `contact`, `day`, `month` → stacked bar chart
- History: numeric (`campaign`, `pdays`, `previous`) → grouped histogram in percent; `poutcome` → stacked bar chart
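The rules above amount to a small lookup from (group, variable, numeric?) to a chart type. The sketch below isolates that decision from Plotly so the rules can be checked without drawing anything. `chart_for` is a hypothetical helper, not part of the notebook. The `NUMERIC` set lists the columns stored as integers in the dataset; `day` is then forced to categorical, as in `render_plot`.

```python
def chart_for(group, var, is_numeric):
    """Hypothetical helper: return the chart type the explorer draws."""
    if group == "Demography":
        return "grouped histogram (%)" if var == "age" else "stacked bar"
    if group == "Finance & Credit":
        return "box plot" if is_numeric else "stacked bar"
    if group == "Current contact":
        return "grouped histogram (%)" if var == "duration" else "stacked bar"
    # History
    if var in {"campaign", "pdays", "previous"} and is_numeric:
        return "grouped histogram (%)"
    return "stacked bar"


GROUP_VARS = {
    "Demography": ["age", "job", "marital", "education"],
    "Finance & Credit": ["default", "balance", "housing", "loan"],
    "Current contact": ["contact", "day", "month", "duration"],
    "History": ["campaign", "pdays", "previous", "poutcome"],
}
NUMERIC = {"age", "balance", "day", "duration", "campaign", "pdays", "previous"}
FORCED_CATEGORICAL = {"day"}

for group, cols in GROUP_VARS.items():
    for var in cols:
        numeric = var in NUMERIC and var not in FORCED_CATEGORICAL
        print(f"{group:17} {var:10} {chart_for(group, var, numeric)}")
```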


```python
# What : Here I display the right chart according to the selected group and variable
# Why  : Because I want to compare Yes/No distributions (in percent) or Yes/No counts per category, depending on variable type
# How  : To do this, I apply specific chart logic for each group and variable type, with consistent titles/colors/legends

def render_plot(group, var):
    # Skip transient states while the Variable menu is refreshed after a group change
    if var not in GROUP_VARS.get(group, []):
        return

    # Force certain columns as categorical ("day" is stored as an integer)
    force_categorical = {"day"} if group == "Current contact" else set()
    is_numeric = (df_sub[var].dtype.kind in "biufc") and (var not in force_categorical)

    title = f"{var.capitalize()} vs Target (y)"  # consistent title format

    # Demography
    if group == "Demography":
        if var == "age":
            fig = px.histogram(
                df_sub, x="age", color="y",
                nbins=30, histnorm="percent", barmode="group",
                color_discrete_map=COLOR_MAP, title=title
            )
        else:
            fig = px.histogram(
                df_sub, x=var, color="y", barmode="stack",
                color_discrete_map=COLOR_MAP, title=title
            )
            fig.update_layout(xaxis_tickangle=-45)

    # Finance & Credit
    elif group == "Finance & Credit":
        if is_numeric:
            fig = px.box(
                df_sub, x="y", y=var, color="y",
                color_discrete_map=COLOR_MAP, points="outliers", title=title
            )
        else:
            fig = px.histogram(
                df_sub, x=var, color="y", barmode="stack",
                color_discrete_map=COLOR_MAP, title=title
            )
            fig.update_layout(xaxis_tickangle=-30)

    # Current contact
    elif group == "Current contact":
        if var == "duration":
            fig = px.histogram(
                df_sub, x="duration", color="y",
                nbins=40, histnorm="percent", barmode="group",
                color_discrete_map=COLOR_MAP, title=title
            )
        else:
            # Plot "day" as discrete bars and give day/month an explicit calendar order
            x_series = df_sub[var].astype(str) if var == "day" else df_sub[var]
            if var == "day":
                x_order = [str(d) for d in sorted(df_sub["day"].unique())]
            elif var == "month":
                x_order = [m for m in MONTH_ORDER if m in set(df_sub["month"].astype(str))]
            else:
                x_order = None
            fig = px.histogram(
                df_sub.assign(_x=x_series), x="_x", color="y", barmode="stack",
                category_orders={"_x": x_order} if x_order else None,
                color_discrete_map=COLOR_MAP, title=title
            )
            fig.update_layout(xaxis_title=var, xaxis_tickangle=-45)

    # History
    else:
        if var in {"campaign", "pdays", "previous"} and is_numeric:
            fig = px.histogram(
                df_sub, x=var, color="y",
                nbins=30, histnorm="percent", barmode="group",
                color_discrete_map=COLOR_MAP, title=title
            )
        else:
            fig = px.histogram(
                df_sub, x=var, color="y", barmode="stack",
                color_discrete_map=COLOR_MAP, title=title
            )
            fig.update_layout(xaxis_tickangle=-30)

    # Rename traces
    fig.for_each_trace(lambda t: t.update(
        name="Yes — They Subscribed" if t.name == "yes" else "No — They Didn't"
    ))

    # Legend settings
    fig.update_layout(
        legend_title_text="Subscription Status",
        legend=dict(traceorder="reversed")
    )

    # Y-axis title if histogram is in percent
    if any(getattr(t, "histnorm", None) == "percent" for t in fig.data):
        fig.update_yaxes(title="Percent")

    fig.show()

# Display UI
out = widgets.interactive_output(render_plot, {"group": group_dd, "var": var_dd})
display(widgets.HBox([group_dd, var_dd]))
display(out)
```
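How to read the percent histograms: with `color="y"`, Plotly Express draws one trace per class, and `histnorm="percent"` normalizes each trace on its own. The Yes bars therefore sum to 100% and so do the No bars. The charts compare the *shape* of the two distributions, not subscription rates. The sketch below reproduces that per-class normalization with a row-normalized `pd.crosstab` on made-up toy data. It does not use `bank-full.csv`, and the 20-year bins are an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up toy data for illustration only (not from bank-full.csv)
toy = pd.DataFrame({
    "age": [25, 35, 45, 28, 52, 61],
    "y":   ["yes", "no", "no", "yes", "no", "no"],
})

# 20-year bins labelled by their lower edge: 20, 40, 60
age_bin = (toy["age"] // 20 * 20).rename("age_bin")

# normalize="index": each row (class) sums to 1, then scale to percent,
# i.e. the same per-class normalisation histnorm="percent" applies per trace
pct = pd.crosstab(toy["y"], age_bin, normalize="index") * 100
print(pct)
print(pct.sum(axis=1))  # each class sums to 100
```

Here both "yes" clients fall in the 20 bin (100%), while the four "no" clients split 25% / 50% / 25%. The absolute counts differ, but the bars stay comparable.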

