## Data Visualization
### Interactivity & Animation
- Xinali Zeng

- SOE, Xiamen University



### 🎯 Today's Goals: plotly 

#### Interactivity

#### Animation


### Interactive Visualization
#### Interactivity: Creating engaging, dynamic storytelling through interaction.

- Examples of interactive visualizations for statistical concepts: https://rpsychologist.com/viz

#### Interactive Tools
- plotly package: adds interactivity to existing plots (such as zooming, tooltips, and animation).




### Plotly
Plotly is a modern data visualization library for Python, R, MATLAB, Julia, and other languages, built on top of plotly.js. It offers comprehensive interactive capabilities and flexible plotting options.

Reference documentation:   https://plotly.com/python/



Compared to earlier libraries such as Matplotlib and Seaborn, which mainly produce static images, Plotly charts are interactive by default. Plotly comes with built-in interactivity and editing tools, supports both online and offline modes, and provides a stable API for integration with existing applications. Plotly charts can be displayed directly in web browsers or saved as local copies.

Almost all the plots we have learned so far can be drawn with Plotly. Plotly also ships with several sample datasets (under px.data) that we can use for practice.

#### Install and import the plotly package:

In your command line (cmd):

- pip install plotly

In Python:
- import plotly.express as px

In [None]:
import plotly.express as px
print(dir(px.data))


Two main ways to draw plots with Plotly:
- plotly.express
- plotly.graph_objects

### Basic plots with Plotly

In [None]:
import plotly.express as px
import seaborn as sns
import pandas as pd

# Load the data and count penguins per species
df = sns.load_dataset("penguins")
counts = df['species'].value_counts().reset_index()
counts.columns = ['species', 'count']
# Custom colors
color_map = {
    'Adelie': '#1f77b4',      # blue
    'Chinstrap': '#ff7f0e',   # orange
    'Gentoo': '#2ca02c'       # green
}
# Draw the plot
fig = px.bar(
    counts, x='species', y='count', title='Penguin Species Count',
    color='species', color_discrete_map=color_map, text='count'
)
fig.show()

### Bar plot

In [None]:
import plotly.express as px
import seaborn as sns
import pandas as pd

# Load the penguins data and count rows per species and sex
df = sns.load_dataset("penguins").dropna(subset=['species', 'sex'])
counts = df.groupby(['species', 'sex']).size().reset_index(name='count')
fig = px.bar(
    counts, x='species', y='count', color='sex', barmode='stack', text='count',
    title='Count of Penguins by Species and Sex',
    color_discrete_map={'Male': '#1f77b4', 'Female': '#ff7f0e'}
)
fig.show()

### Pie plot

In [None]:
import plotly.express as px
import pandas as pd
import seaborn as sns

# Load the penguins data
df = sns.load_dataset("penguins")
species_counts = df['species'].value_counts().reset_index()
species_counts.columns = ['species', 'count']

# Manual color mapping (optional)
color_map = {'Adelie': '#636EFA','Gentoo': '#EF553B','Chinstrap': '#00CC96'}

# Add transparency (alpha) by using rgba colors
color_map_alpha = {
    'Adelie': 'rgba(99,110,250,0.5)',     # dark blue, slightly transparent
    'Gentoo': 'rgba(239,85,59,1)',      # orange-red, opaque
    'Chinstrap': 'rgba(0,204,150,0.3)'    # green, more transparent
}

# Draw the pie chart
fig = px.pie(species_counts,names='species',values='count',
    title='Proportion of Penguin Species',
    color='species',  # tells plotly which column the color map applies to
    color_discrete_map=color_map_alpha
)

# Pull one slice out for emphasis (e.g., Gentoo)
fig.update_traces(pull=[0, 0.1, 0])  # corresponds to ['Adelie', 'Gentoo', 'Chinstrap']

# Labels and font styling
fig.update_traces(textinfo='label+percent+value', textfont_size=14)

# Title font styling
fig.update_layout(title_font_size=20)

fig.show()

### Treemap (rectangular area chart)

In [None]:
df_gap = px.data.gapminder()
df_2007 = df_gap[df_gap['year'] == 2007].copy()
df_2007['total_gdp'] = df_2007['pop'] * df_2007['gdpPercap']
df_top15 = df_2007.sort_values(by='total_gdp', ascending=False).head(15)

fig_treemap = px.treemap(df_top15, path=['country'], values='total_gdp',
                         title='Treemap - Top 15 Countries by Total GDP (2007)')
fig_treemap.show()

In [None]:

df = px.data.gapminder()
df_2007 = df[df['year'] == 2007].copy()

# Compute total GDP (unit: US dollars)
df_2007['total_gdp'] = df_2007['gdpPercap'] * df_2007['pop']

fig = px.treemap(df_2007,
                 path=['continent', 'country'],
                 values='total_gdp',
                 color='continent',
                 color_discrete_sequence=px.colors.qualitative.Pastel,  
                 title='Treemap: Total GDP by Country in 2007')
fig.show()

### Histogram

In [None]:
import plotly.express as px
import seaborn as sns
import pandas as pd

# Load the penguins data
df = sns.load_dataset("penguins").dropna(subset=["body_mass_g", "species",])

# Build the figure
fig = px.histogram(
    df,x="body_mass_g",color="species",
    labels={
        "body_mass_g": "Body Mass (g)",
    },
    title="Histogram of Penguin Body Mass",
    nbins=20,
    template="plotly_white"
)

# Adjust the font size
fig.update_traces(textfont_size=12)

fig.show()

### Density plot

In [None]:
import plotly.express as px
from scipy.stats import gaussian_kde
import numpy as np
import seaborn as sns

# Load the data
df = sns.load_dataset("penguins").dropna()
x = df[df["species"] == "Adelie"]["body_mass_g"]

# Kernel density estimate
kde = gaussian_kde(x)
x_range = np.linspace(x.min(), x.max(), 200)
y_vals = kde(x_range)

# Draw the plot
fig = px.line(x=x_range, y=y_vals)

fig.update_layout(title='1D Density Plot of Adelie Body Mass', xaxis_title='Body Mass (g)', yaxis_title='Density')
fig.show()

### 2D density plot

In [None]:
import plotly.express as px
import seaborn as sns

df = sns.load_dataset("penguins").dropna()

fig = px.density_contour(
    df,
    x="body_mass_g",
    y="bill_depth_mm",
    # color="species",  # optional grouping
    facet_col="species",
    marginal_x="box",  # marginal plot along the x axis
    marginal_y="box",  # marginal plot along the y axis
    title="2D Density Contour of Body Mass vs. Bill Depth"
)
fig.show()

In [None]:
fig = px.density_heatmap(
    df,
    x="body_mass_g",
    y="bill_depth_mm",
    nbinsx=30,
    nbinsy=30,
    color_continuous_scale="magma_r",
    facet_col="species",

    title="2D Density Heatmap of Body Mass vs. Bill Depth"
)
fig.show()

### Scatter plot

In [None]:
import plotly.express as px
import seaborn as sns

# Load dataset and clean
penguins = sns.load_dataset("penguins").dropna(subset=["species", "sex", "body_mass_g", "bill_length_mm"])

# Create a custom tooltip column (same as text=... in ggplot2)
penguins["text"] = "sex: " + penguins["sex"]

# Rename columns to match R labels (optional, for labeling consistency)
penguins = penguins.rename(columns={
    "body_mass_g": "Body mass",
    "bill_length_mm": "Bill length"
})

# Create plotly plot
fig = px.scatter(
    penguins,
    x="Body mass",
    y="Bill length",
    color="species",
    hover_name="species",       # Show species as main title on hover
    hover_data={
        "text": True,           # Show sex info
        "species": False,       # Already shown in hover_name, hide here
        "Body mass": True,
        "Bill length": True
    },
    color_discrete_sequence=px.colors.qualitative.Set2,
    opacity=0.7
)

fig.update_traces(marker=dict(size=10))
fig.update_layout(title="Interactive Penguin Plot", title_x=0.5)
fig.show()

How about visualizing a trend? (Note: trendline="ols" requires the statsmodels package.)

In [None]:
import plotly.express as px
import seaborn as sns

# Load dataset and clean
penguins = sns.load_dataset("penguins").dropna(subset=["species", "sex", "body_mass_g", "bill_length_mm"])

# Create a custom tooltip column (same as text=... in ggplot2)
penguins["text"] = "sex: " + penguins["sex"]

# Rename columns to match R labels (optional, for labeling consistency)
penguins = penguins.rename(columns={
    "body_mass_g": "Body mass",
    "bill_length_mm": "Bill length"
})

# Create plotly plot
fig = px.scatter(
    penguins,
    x="Body mass",
    y="Bill length",
    color="species",
    hover_name="species",
    hover_data={
        "text": True,           # Show sex info
        "species": False,       # Already shown in hover_name, hide here
        "Body mass": True,
        "Bill length": True
    },
    color_discrete_sequence=px.colors.qualitative.Set2,
    opacity=0.7,
    trendline="ols"             # Fit an ordinary least squares line per species
)

fig.update_traces(marker=dict(size=10))
fig.update_layout(title="Interactive Penguin Plot", title_x=0.5)
fig.show()

Different functions in plotly.express:
- px.bar(), px.pie(), px.treemap()
- px.histogram(), px.box(), px.violin()
- px.scatter(), px.line()
- px.density_contour(), px.density_heatmap()

Common parameters across most px functions:
- x, y:	Data axes
- color, size:	Map variables to color/size
- symbol, facet_col, facet_row:	Symbol mapping or split panels
- marginal_x, marginal_y:	Add side plots ('histogram', 'box', 'violin', 'rug')
- hover_name, hover_data:	Control tooltip details
- animation_frame:	Animate over a column (like year)
- labels:	Rename columns in legends/tooltips
- template:	Visual theme (e.g., 'plotly_white')

### hover_name, hover_data:	Control tooltip details


In [None]:
import plotly.express as px
import seaborn as sns

# Load the penguins data
df = sns.load_dataset("penguins").dropna(subset=["body_mass_g", "bill_length_mm", "species", "sex", "island"])

# Build the scatter plot
fig = px.scatter(
    df,
    x="bill_length_mm",
    y="body_mass_g",
    color="species",
    hover_name="species",  # shown as the bold title of the tooltip
    hover_data={
        "sex": True,
        "island": False,
        "body_mass_g": ":.0f",
        "bill_length_mm": ":.1f"
    },
    labels={"bill_length_mm": "Bill Length (mm)",
        "body_mass_g": "Body Mass (g)"},
    title="Penguin Body Mass vs Bill Length by Species"
)

fig.show()

### plotly.graph_objects

- Low-level Plotly interface for full control

- Object-oriented API (go.Figure, go.Bar, go.Scatter, etc.)

- Unlike plotly.express, you manually construct every element

- Supports complex multi-layer plots, subplots, and custom hover



Anatomy of a go.Figure

In [None]:

import plotly.graph_objects as go

fig = go.Figure()

# Add trace (data layer)
fig.add_trace(go.Bar(x=['A', 'B'], y=[10, 15]))

# Layout settings
fig.update_layout(title="Basic Bar", template='plotly_white')

fig.show()

### Different plots in go:
- bar plot:	go.Bar()
- histogram:	go.Histogram()
- box plot:	go.Box()
- violin plot:	go.Violin()
- scatter/line plot:	go.Scatter() (mode='markers' for points, mode='lines' for lines)

Example: Pyramid plot

In [None]:
import plotly.graph_objects as go
import seaborn as sns
import numpy as np
import pandas as pd

# Load the penguins data
df = sns.load_dataset("penguins").dropna(subset=['body_mass_g', 'sex'])

# Split by sex
male_mass = df[df['sex'] == 'Male']['body_mass_g']
female_mass = df[df['sex'] == 'Female']['body_mass_g']

# Use the same bins for both groups
counts_m, bins = np.histogram(male_mass, bins=20)
counts_f, _ = np.histogram(female_mass, bins=bins)
bin_centers = 0.5 * (bins[:-1] + bins[1:])
heights = np.diff(bins)

# Create the figure
fig = go.Figure()

fig.add_trace(go.Bar(
    x=counts_m,
    y=bin_centers,
    orientation='h',
    name='Male',
    marker_color='salmon',
    hovertemplate='Male: %{x} penguins<extra></extra>'
))

fig.add_trace(go.Bar(
    x=-counts_f,
    y=bin_centers,
    orientation='h',
    name='Female',
    marker_color='skyblue',
    hovertemplate='Female: %{customdata} penguins<extra></extra>',
    customdata=counts_f  # positive counts, shown in the tooltip instead of the negative x values
))

# Axis titles and layout styling
fig.update_layout(
    title='Male vs Female Body Mass - Population Pyramid',
    xaxis_title='Frequency',
    yaxis_title='Body Mass (g)',
    barmode='overlay', bargap=0.1, template='plotly_white',
    legend=dict(x=0.75, y=1.05)
)

# Remove the minus sign from the x-axis tick labels
xticks = fig.layout.xaxis.tickvals or np.linspace(-max(counts_f), max(counts_m), 5, dtype=int)
fig.update_xaxes(tickvals=xticks,ticktext=[str(abs(x)) for x in xticks])

fig.show()

### Tip for using Plotly (or any visualization task): ask an LLM for help!

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns  # importing seaborn registers the 'rocket' colormap with matplotlib
from palmerpenguins import load_penguins

penguins = load_penguins().dropna()

x = penguins["flipper_length_mm"]
y = penguins["body_mass_g"]
color_var = penguins["bill_length_mm"]

plt.figure(figsize=(8, 6))
sc = plt.scatter(x, y, c=color_var, cmap='rocket', alpha=0.7)

plt.xlabel("Flipper Length (mm)")
plt.ylabel("Body Mass (g)")
plt.title("Body Mass vs. Flipper Length (colored by Bill Length)")
plt.colorbar(sc, label="Bill Length (mm)")
plt.savefig('example_scatter')
plt.show()

<div>
<img src="./gpt1.png" width="2000"/>
</div>

<div>
<img src="./gpt2.png" width="2000"/>
</div>

In [None]:
import plotly.express as px
from palmerpenguins import load_penguins

# Load and clean the data
penguins = load_penguins().dropna()

# Create interactive scatter plot
fig = px.scatter(
    penguins,
    x="flipper_length_mm",
    y="body_mass_g",
    color="bill_length_mm",
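    color_continuous_scale="rocket",  # 'rocket' is a seaborn colormap, not a built-in Plotly colorscale; the next cell works around this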
    opacity=0.7,
    labels={
        "flipper_length_mm": "Flipper Length (mm)",
        "body_mass_g": "Body Mass (g)",
        "bill_length_mm": "Bill Length (mm)"
    },
    title="Body Mass vs. Flipper Length (colored by Bill Length)"
)

# Customize layout if needed
fig.update_layout(
    coloraxis_colorbar=dict(title="Bill Length (mm)"),
    width=800,
    height=600
)

fig.show()

<div>
<img src="./gpt3.png" width="2000"/>
</div>

In [None]:
import plotly.express as px
from palmerpenguins import load_penguins

# Load and clean the data
penguins = load_penguins().dropna()

# Approximate seaborn's 'rocket' colorscale as [position, hex color] pairs
rocket_colorscale = [
    [0.0, "#03051A"],
    [0.1, "#3D0F3F"],
    [0.2, "#7B1C3F"],
    [0.3, "#B9322D"],
    [0.4, "#E34A1C"],
    [0.5, "#F26B1C"],
    [0.6, "#F79044"],
    [0.7, "#FBB56C"],
    [0.8, "#FCD29E"],
    [0.9, "#FEEDD0"],
    [1.0, "#FCFDBF"]
]

# Create the interactive scatter plot
fig = px.scatter(
    penguins,
    x="flipper_length_mm",
    y="body_mass_g",
    color="bill_length_mm",
    color_continuous_scale=rocket_colorscale,
    opacity=0.7,
    labels={
        "flipper_length_mm": "Flipper Length (mm)",
        "body_mass_g": "Body Mass (g)",
        "bill_length_mm": "Bill Length (mm)"
    },
    title="Body Mass vs. Flipper Length (colored by Bill Length)"
)

fig.update_layout(
    coloraxis_colorbar=dict(title="Bill Length (mm)"),
    width=800,
    height=600
)

fig.show()

<div>
<img src="./gpt4.png" width="2000"/>
</div>

In [None]:
import plotly.express as px
from palmerpenguins import load_penguins

# Load and clean the data
penguins = load_penguins().dropna()

# Approximate seaborn's 'rocket' colorscale as [position, hex color] pairs
rocket_colorscale = [
    [0.0, "#03051A"],
    [0.1, "#3D0F3F"],
    [0.2, "#7B1C3F"],
    [0.3, "#B9322D"],
    [0.4, "#E34A1C"],
    [0.5, "#F26B1C"],
    [0.6, "#F79044"],
    [0.7, "#FBB56C"],
    [0.8, "#FCD29E"],
    [0.9, "#FEEDD0"],
    [1.0, "#FCFDBF"]
]

# Create the interactive scatter plot
fig = px.scatter(
    penguins,
    x="flipper_length_mm",
    y="body_mass_g",
    color="bill_length_mm",
    color_continuous_scale=rocket_colorscale,
    opacity=0.7,
    labels={
        "flipper_length_mm": "Flipper Length (mm)",
        "body_mass_g": "Body Mass (g)",
        "bill_length_mm": "Bill Length (mm)"
    },
    title="Body Mass vs. Flipper Length (colored by Bill Length)"
)

# Enlarge marker size
fig.update_traces(marker=dict(size=10))  # You can adjust the value as needed

fig.update_layout(
    coloraxis_colorbar=dict(title="Bill Length (mm)"),
    width=800,
    height=600
)

fig.show()

### Basics of Dash

What is Dash?

- Dash is a Python framework for building interactive web applications.

- Developed by Plotly, built on top of Flask, React.js, and Plotly.js.

- Simple apps can be written entirely in Python, without writing HTML, CSS, or JavaScript by hand.

- Ideal for data science dashboards, analytical tools, and custom visualizations.

In [None]:
import dash
from dash import html, dcc
import plotly.express as px
import pandas as pd

# Prepare the data
df = px.data.iris()

# Create the Dash app
app = dash.Dash(__name__)

# Define the layout
app.layout = html.Div([
    html.H2("Iris Dataset Plot"),
    dcc.Dropdown(
        id="feature",
        options=[{"label": col, "value": col} for col in df.columns if col != "species"],
        value="sepal_length"
    ),
    dcc.Graph(id="graph")
])

# Define the callback
@app.callback(
    dash.dependencies.Output("graph", "figure"),
    [dash.dependencies.Input("feature", "value")]
)
def update_graph(feature):
    fig = px.histogram(df, x=feature, color="species")
    return fig

# Run the server
if __name__ == "__main__":
    app.run(debug=True)  # recommended form


In [None]:
import dash
from dash import html, dcc
import plotly.express as px
from palmerpenguins import load_penguins

# Load and clean the data
df = load_penguins().dropna()

# Create the Dash app
app = dash.Dash(__name__)

# Define the layout
app.layout = html.Div([
    html.H2("Penguins Dataset Plot (Y Axis Fixed to Body Mass)"),
    dcc.Dropdown(
        id="feature",
        options=[
            {"label": col, "value": col}
            for col in df.columns
            if col not in ["species", "body_mass_g", "sex", "year","island"]
        ],
        value="flipper_length_mm"
    ),
    dcc.Graph(id="graph")
])

# Define the callback
@app.callback(
    dash.dependencies.Output("graph", "figure"),
    [dash.dependencies.Input("feature", "value")]
)
def update_graph(feature):
    fig = px.scatter(
        df,
        x=feature,
        y="body_mass_g",
        color="species",
        opacity=0.7,
        labels={
            feature: feature.replace("_", " ").title(),
            "body_mass_g": "Body Mass (g)"
        },
        title=f"Body Mass vs. {feature.replace('_', ' ').title()}"
    )
    fig.update_traces(marker=dict(size=10))
    return fig

# Run the server
if __name__ == "__main__":
    app.run(debug=True)

### Animation plot

#### What is Animation?

- Adding time dimension to static visualizations
- Enables display of changes in data over time
- Creates engaging and dynamic content
- Useful for temporal data and complex patterns

#### Basic animation plot using matplotlib

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
 
fig = plt.figure()
x = np.arange(-np.pi, np.pi, 0.1)
 
ims = []
for a in range(80):
    y = np.sin(x - a / 20 *  np.pi)
    im1, = plt.plot(x, y, "b")  # x and y are both arrays
    im2, = plt.plot(x[0], y[0], marker='o', color='r')
    ims.append([im1, im2])

ani = animation.ArtistAnimation(fig, ims, interval=33)
ani.save('sample.gif', writer='pillow')

#### This animation is frame-based:

- Each frame depicts a dynamically moving sine wave.
- The red dot im2 is always positioned at the beginning of the curve, creating a "motion trajectory" effect.
- ArtistAnimation collects all frames to generate a .gif.
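
For comparison, here is a sketch of the same sine-wave animation written with `FuncAnimation`, which is a different technique from the `ArtistAnimation` used above: instead of pre-drawing every frame, it reuses two artists and updates their data frame by frame. The commented-out file name `sample_func.gif` is a hypothetical choice.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

x = np.arange(-np.pi, np.pi, 0.1)

fig, ax = plt.subplots()
line, = ax.plot(x, np.sin(x), "b")            # the moving curve
dot, = ax.plot([x[0]], [np.sin(x[0])], "ro")  # red dot at the left end of the curve
ax.set_ylim(-1.2, 1.2)

def update(a):
    """Shift the phase of the sine wave for frame number a."""
    y = np.sin(x - a / 20 * np.pi)
    line.set_ydata(y)
    dot.set_data([x[0]], [y[0]])
    return line, dot

ani = animation.FuncAnimation(fig, update, frames=80, interval=33, blit=True)

# Uncomment to write a GIF (requires the pillow package; hypothetical file name)
# ani.save("sample_func.gif", writer="pillow")
```

Only two artists exist for the whole animation, which uses less memory than storing 80 pre-drawn frames.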

How about plotly?

Ask GPT!


In [None]:
import numpy as np
import plotly.graph_objects as go

# Prepare data
x = np.arange(-np.pi, np.pi, 0.1)
frames = []
num_frames = 80

# Create figure with initial frame
y0 = np.sin(x - 0 / 20 * np.pi)
fig = go.Figure(
    data=[
        go.Scatter(x=x, y=y0, mode="lines", name="Sine Wave", line=dict(color="blue")),
        go.Scatter(x=[x[0]], y=[y0[0]], mode="markers", name="Start Point", marker=dict(color="red", size=10)),
    ],
    layout=go.Layout(
        title="Animated Sine Wave with Moving Phase",
        xaxis=dict(range=[-np.pi, np.pi], title="x"),
        yaxis=dict(range=[-1.2, 1.2], title="sin(x - phase)"),
        updatemenus=[dict(
            type="buttons",
            showactive=False,
            buttons=[dict(label="Play", method="animate", args=[None])]
        )]
    ),
    frames=[
        go.Frame(
            data=[
                go.Scatter(x=x, y=np.sin(x - a / 20 * np.pi), mode="lines", line=dict(color="blue")),
                go.Scatter(x=[x[0]], y=[np.sin(x[0] - a / 20 * np.pi)], mode="markers", marker=dict(color="red", size=10)),
            ]
        )
        for a in range(num_frames)
    ]
)

fig.show()


<div>
<img src="./gpt5.png" width="2000"/>
</div>

<div>
<img src="./gpt6.png" width="2000"/>
</div>

<div>
<img src="./gpt7.png" width="2000"/>
</div>

<div>
<img src="./gpt8.png" width="2000"/>
</div>

In [None]:
import pandas as pd

score_by_match_df = pd.read_csv('match_points')
score_by_match_df

In [None]:
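import plotly.express as px

score_by_match_df.index.name = "Round"

# Convert to long format
score_by_match_df_long = score_by_match_df.reset_index().melt(id_vars="Round", var_name="Team", value_name="Points")
# Build cumulative frames: the frame for round r holds every row up to round r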
frames = []
for round_num in sorted(score_by_match_df_long["Round"].unique()):
    frame_data = score_by_match_df_long[score_by_match_df_long["Round"] <= round_num].copy()
    frame_data["animation_round"] = round_num
    frames.append(frame_data)

df_anim = pd.concat(frames)

# Animated line chart
fig = px.line(
    df_anim,
    x="Round",
    y="Points",
    color="Team",
    animation_frame="animation_round",
    labels={"Round": "Match Round", "Points": "Total Points"},
    title="Premier League Points Growth by Round"
)

# Fix the axis ranges so the animation does not jump
fig.update_layout(
    xaxis=dict(range=[df_anim["Round"].min(), df_anim["Round"].max()]),
    yaxis=dict(range=[0, df_anim["Points"].max() + 5]),
    height=500
)

fig.show()

### Hans Rosling TED talk: probably the best data visualization
- https://www.bilibili.com/video/BV1ns411o7kY/?spm_id_from=333.337.search-card.all.click
- https://www.bilibili.com/video/BV1954ReKE6Q/?spm_id_from=333.337.search-card.all.click

In [None]:
import plotly.express as px

# Load the gapminder data
df = px.data.gapminder()

# Create the animated bubble chart
fig = px.scatter(
    df,
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    hover_name="country",
    log_x=True,
    size_max=60,
    animation_frame="year",
    animation_group="country",
    range_x=[100, 100000],
    range_y=[25, 90],
    title="Gapminder World Development Bubble Chart (after Hans Rosling)"
)

fig.show()

When to use animation?
- When it adds emphasis to your story or conclusion.
    - e.g., the Premier League title race
- When it adds another useful dimension (often time).
    - e.g., the growth of an economy

Do not use animation when it does not provide more information!