# Exploration – Synthetic Data for Stroke Execution Quality

This notebook explores synthetic stroke-related features
and their relationship to execution quality labels.

The goal is to validate problem formulation and feature usefulness
before introducing machine learning models.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
np.random.seed(42)
N = 300

data = pd.DataFrame({
    "swing_speed": np.random.normal(30, 5, N),        # km/h
    "racket_angle": np.random.normal(0, 10, N),       # deviation from ideal (degrees)
    "time_pressure": np.random.normal(0.35, 0.05, N), # seconds
    "body_balance": np.random.uniform(0.5, 1.0, N),   # normalized
    "shuttle_height": np.random.normal(1.8, 0.3, N),  # meters
})

data.head()

## Feature Semantics

- **swing_speed**: approximates force generation and offensive intent.
- **racket_angle**: deviation from the ideal racket face angle at contact.
- **time_pressure**: a proxy for temporal constraint during stroke execution,
  reflecting how rushed the player is when initiating the stroke.
- **body_balance**: represents overall body stability and coordination at execution.
- **shuttle_height**: contact point height, describing the spatial context of execution.

In [None]:
score = (
    0.08 * data["swing_speed"]
    - 1.2 * data["time_pressure"]
    + 1.5 * data["body_balance"]
    - 0.05 * np.abs(data["racket_angle"])
    + 0.6 * data["shuttle_height"]
)

data["execution_score"] = score
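
Before thresholding the score, it helps to see how much each term actually moves it. The sketch below regenerates the same synthetic features and computes the standard deviation of every additive term, using the weights from the cell above. The names `terms` and `term_spread` are illustrative and not part of the original notebook. From the generating parameters alone, the `time_pressure` term should have the smallest spread (about 1.2 × 0.05 = 0.06), while `swing_speed` should be around 0.08 × 5 = 0.4.

```python
import numpy as np
import pandas as pd

# Regenerate the synthetic features exactly as in the cells above.
np.random.seed(42)
N = 300
data = pd.DataFrame({
    "swing_speed": np.random.normal(30, 5, N),
    "racket_angle": np.random.normal(0, 10, N),
    "time_pressure": np.random.normal(0.35, 0.05, N),
    "body_balance": np.random.uniform(0.5, 1.0, N),
    "shuttle_height": np.random.normal(1.8, 0.3, N),
})

# Each additive term of the execution score, with the notebook's weights.
terms = pd.DataFrame({
    "swing_speed": 0.08 * data["swing_speed"],
    "time_pressure": -1.2 * data["time_pressure"],
    "body_balance": 1.5 * data["body_balance"],
    "racket_angle": -0.05 * np.abs(data["racket_angle"]),
    "shuttle_height": 0.6 * data["shuttle_height"],
})

# Standard deviation per term: how much each one shifts the score across strokes.
term_spread = terms.std().sort_values(ascending=False)
print(term_spread)
```

A term with a small spread can only rarely flip a stroke across the median threshold, which is worth keeping in mind when reading the plots that follow.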

In [None]:
threshold = data["execution_score"].median()
data["quality_label"] = (data["execution_score"] > threshold).astype(int)

data[["execution_score", "quality_label"]].head()

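The boxplot below compares the distribution of time pressure across the two quality labels.
Strokes executed under higher time pressure are expected to be labeled as poor execution more often,
supporting the role of time pressure as a negative execution condition.
Because time_pressure enters the score with a small weighted spread (1.2 × 0.05 s), the separation is likely to be modest.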

In [None]:
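fig, ax = plt.subplots(figsize=(6, 4))

# Pass the axes explicitly: boxplot(by=...) otherwise opens its own figure
# and the one created above is left empty.
data.boxplot(
    column="time_pressure",
    by="quality_label",
    ax=ax,
    grid=False
)

fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
ax.set_title("Time Pressure vs Execution Quality")
ax.set_xlabel("Execution Quality (0 = poor, 1 = good)")
ax.set_ylabel("Time Pressure (s)")

plt.show()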


Time pressure alone shows a weak but consistent negative association
with execution quality, suggesting that additional contextual factors
(e.g., body balance) may moderate its effect.
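
A visual impression of a weak association is easier to trust with numbers next to it. The following sketch, which rebuilds the data, score, and label from the earlier cells, summarizes `time_pressure` per quality label and computes the point-biserial correlation (the Pearson correlation between a continuous variable and a binary one). The names `by_label` and `corr` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic data, score, and label as in the cells above.
np.random.seed(42)
N = 300
data = pd.DataFrame({
    "swing_speed": np.random.normal(30, 5, N),
    "racket_angle": np.random.normal(0, 10, N),
    "time_pressure": np.random.normal(0.35, 0.05, N),
    "body_balance": np.random.uniform(0.5, 1.0, N),
    "shuttle_height": np.random.normal(1.8, 0.3, N),
})
data["execution_score"] = (
    0.08 * data["swing_speed"]
    - 1.2 * data["time_pressure"]
    + 1.5 * data["body_balance"]
    - 0.05 * np.abs(data["racket_angle"])
    + 0.6 * data["shuttle_height"]
)
threshold = data["execution_score"].median()
data["quality_label"] = (data["execution_score"] > threshold).astype(int)

# Summary of time_pressure within each quality label.
by_label = data.groupby("quality_label")["time_pressure"].agg(
    ["mean", "median", "std", "count"]
)

# Point-biserial correlation between time_pressure and the binary label.
corr = data["time_pressure"].corr(data["quality_label"])

print(by_label)
print(f"point-biserial r = {corr:.3f}")
```

With 300 samples, a correlation close to zero is within sampling noise, so the sign and size of `corr` are worth checking before relying on the boxplot alone.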


In [None]:
# Split body_balance into high / low groups using median
balance_threshold = data["body_balance"].median()

data["balance_group"] = (
    data["body_balance"] > balance_threshold
).astype(int)
# 1 = high balance, 0 = low balance

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Low balance
data[data["balance_group"] == 0].boxplot(
    column="time_pressure",
    by="quality_label",
    ax=axes[0],
    grid=False
)
axes[0].set_title("Low Body Balance")
axes[0].set_xlabel("Execution Quality (0 = poor, 1 = good)")
axes[0].set_ylabel("Time Pressure")

# High balance
data[data["balance_group"] == 1].boxplot(
    column="time_pressure",
    by="quality_label",
    ax=axes[1],
    grid=False
)
axes[1].set_title("High Body Balance")
axes[1].set_xlabel("Execution Quality (0 = poor, 1 = good)")

plt.suptitle("Interaction: Time Pressure × Body Balance", fontsize=14)
plt.show()

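The negative effect of time pressure on execution quality
appears attenuated when body balance is high,
suggesting an interaction between temporal pressure and physical stability.
Note that the synthetic score contains no explicit interaction term,
so any such pattern stems from thresholding the additive score or from sampling noise.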
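
The attenuation read from the two panels can be quantified as a difference of group means. The sketch below rebuilds the data and both median splits, then computes, within each balance group, the mean `time_pressure` of good strokes minus that of poor strokes. The table name `gap` and the column `good_minus_poor` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic data, score, label, and balance split as in the cells above.
np.random.seed(42)
N = 300
data = pd.DataFrame({
    "swing_speed": np.random.normal(30, 5, N),
    "racket_angle": np.random.normal(0, 10, N),
    "time_pressure": np.random.normal(0.35, 0.05, N),
    "body_balance": np.random.uniform(0.5, 1.0, N),
    "shuttle_height": np.random.normal(1.8, 0.3, N),
})
data["execution_score"] = (
    0.08 * data["swing_speed"]
    - 1.2 * data["time_pressure"]
    + 1.5 * data["body_balance"]
    - 0.05 * np.abs(data["racket_angle"])
    + 0.6 * data["shuttle_height"]
)
data["quality_label"] = (
    data["execution_score"] > data["execution_score"].median()
).astype(int)
data["balance_group"] = (
    data["body_balance"] > data["body_balance"].median()
).astype(int)

# Mean time_pressure for each (balance_group, quality_label) combination.
gap = (
    data.groupby(["balance_group", "quality_label"])["time_pressure"]
    .mean()
    .unstack("quality_label")
)

# Difference between good (1) and poor (0) strokes within each balance group.
gap["good_minus_poor"] = gap[1] - gap[0]
print(gap)
```

If the `good_minus_poor` value is closer to zero in the high-balance row than in the low-balance row, that is consistent with the attenuation described above; similar values in both rows would argue against it.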

In [None]:
from sklearn.tree import DecisionTreeClassifier, plot_tree

features = [
    "time_pressure",
    "body_balance"
]

X = data[features]
y = data["quality_label"]

tree = DecisionTreeClassifier(
    max_depth=3,         # keep the tree shallow enough to read
    min_samples_leaf=10, # avoid overly fragmented leaves
    random_state=42
)

tree.fit(X, y)

In [None]:
plt.figure(figsize=(16, 6))
plot_tree(
    tree,
    feature_names=features,
    class_names=["poor", "good"],
    filled=True,
    rounded=True
)
plt.title("Decision Tree for Stroke Execution Quality")
plt.show()

Decision Tree analysis shows that body balance is the primary factor
in determining execution quality, while time pressure plays a critical
role only when physical stability is insufficient.
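
The reading of the tree diagram can be cross-checked against the fitted model itself. The sketch below refits a tree with the same settings on regenerated data, then prints the impurity-based feature importances and a text rendering of the split rules via `export_text`. The variable names `importances` and `rules` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Rebuild the synthetic data, score, and label as in the cells above.
np.random.seed(42)
N = 300
data = pd.DataFrame({
    "swing_speed": np.random.normal(30, 5, N),
    "racket_angle": np.random.normal(0, 10, N),
    "time_pressure": np.random.normal(0.35, 0.05, N),
    "body_balance": np.random.uniform(0.5, 1.0, N),
    "shuttle_height": np.random.normal(1.8, 0.3, N),
})
data["execution_score"] = (
    0.08 * data["swing_speed"]
    - 1.2 * data["time_pressure"]
    + 1.5 * data["body_balance"]
    - 0.05 * np.abs(data["racket_angle"])
    + 0.6 * data["shuttle_height"]
)
data["quality_label"] = (
    data["execution_score"] > data["execution_score"].median()
).astype(int)

features = ["time_pressure", "body_balance"]
X = data[features]
y = data["quality_label"]

# Same hyperparameters as the tree fitted above.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10, random_state=42)
tree.fit(X, y)

# Impurity-based importances (sum to 1 once the tree has at least one split).
importances = pd.Series(tree.feature_importances_, index=features)
print(importances)

# Text version of the split rules, easier to inspect than the plot.
rules = export_text(tree, feature_names=features)
print(rules)
```

If `body_balance` carries most of the importance and `time_pressure` only appears in deeper splits on the low-balance side, the printed rules support the interpretation above. Keep in mind that the tree is evaluated on its own training data here.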

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Features without the interaction term
features_no_interaction = [
    "time_pressure",
    "body_balance"
]

X_no = data[features_no_interaction]
y = data["quality_label"]

pipe_no = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression())
])

pipe_no.fit(X_no, y)

coef_no = pd.DataFrame({
    "feature": features_no_interaction,
    "coefficient": pipe_no.named_steps["clf"].coef_[0]
})

coef_no

In [None]:
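# coef_df is undefined in this notebook. Assumption: it holds coefficients of the
# same pipeline refit with a time_pressure * body_balance interaction feature.
X_with = X_no.assign(tp_x_balance=X_no["time_pressure"] * X_no["body_balance"])
pipe_with = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())]).fit(X_with, y)
coef_df = pd.DataFrame({"feature": X_with.columns, "coefficient": pipe_with.named_steps["clf"].coef_[0]})

coef_with = coef_df.copy()
coef_with["model"] = "with_interaction"

coef_no["model"] = "no_interaction"

compare = pd.concat([coef_with, coef_no], ignore_index=True)
compare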

This comparison highlights that the interaction term is not merely an
additional feature, but a necessary structural component for maintaining
a meaningful and interpretable relationship between time pressure,
body balance, and execution quality.
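
Coefficient tables describe how the two models differ, but not whether the interaction term improves predictions. One way to probe that, sketched below under assumptions, is to compare 5-fold cross-validated accuracy for the two feature sets. The interaction column `tp_x_balance` (the product of `time_pressure` and `body_balance`) and the helper `make_pipe` are hypothetical names introduced here, since the notebook does not show how its interaction model was built.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Rebuild the synthetic data, score, and label as in the cells above.
np.random.seed(42)
N = 300
data = pd.DataFrame({
    "swing_speed": np.random.normal(30, 5, N),
    "racket_angle": np.random.normal(0, 10, N),
    "time_pressure": np.random.normal(0.35, 0.05, N),
    "body_balance": np.random.uniform(0.5, 1.0, N),
    "shuttle_height": np.random.normal(1.8, 0.3, N),
})
data["execution_score"] = (
    0.08 * data["swing_speed"]
    - 1.2 * data["time_pressure"]
    + 1.5 * data["body_balance"]
    - 0.05 * np.abs(data["racket_angle"])
    + 0.6 * data["shuttle_height"]
)
y = (data["execution_score"] > data["execution_score"].median()).astype(int)


def make_pipe():
    """Hypothetical helper: the scaler + logistic regression pipeline used above."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression()),
    ])


X_no = data[["time_pressure", "body_balance"]]
# Assumed form of the interaction feature: a simple product of the two inputs.
X_with = X_no.assign(tp_x_balance=X_no["time_pressure"] * X_no["body_balance"])

# Mean accuracy over 5 stratified folds for each feature set.
acc_no = cross_val_score(make_pipe(), X_no, y, cv=5).mean()
acc_with = cross_val_score(make_pipe(), X_with, y, cv=5).mean()

print(f"no interaction:   {acc_no:.3f}")
print(f"with interaction: {acc_with:.3f}")
```

If the two accuracies are close, the interaction term mainly changes how the coefficients are interpreted rather than what the model can predict.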