# 02 - Separation Gain

Compute separation gain metric.



Let’s turn our metric into a **guided notebook plan** with a clear purpose, inputs, grouping logic, computation, validation, and visualization, so you don’t drift off or accidentally mix data across players.

---

# 🧠 Metric Build Guide: **Separation Gain**

---

## 🎯 **Purpose**

Quantify how much *separation* (space between the receiver and their nearest defender) changes between the **throw moment** and the **catch/incompletion moment**.

It measures *how well a receiver creates or maintains space while the ball is in the air.*
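
One way to write this down, using notation introduced here only for illustration ($D$ is the set of defenders on the play, and $t_{\text{throw}}$, $t_{\text{catch}}$ are the two key frames):

$$
\text{sep}_r(t) = \min_{d \in D} \sqrt{\bigl(x_r(t) - x_d(t)\bigr)^2 + \bigl(y_r(t) - y_d(t)\bigr)^2}
$$

$$
\text{separation\_gain}_r = \text{sep}_r(t_{\text{catch}}) - \text{sep}_r(t_{\text{throw}})
$$

A positive value means the receiver had more space at the catch point than at the throw; a negative value means the nearest defender closed in.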

---

## 🧩 **You’ll Need**

From your reconstructed + feature dataset (the output of Step 4):

* `game_id`, `play_id`, `nfl_id`
* `x`, `y`, `frame_id`, `phase`
* `player_role` or `player_position` (to identify WRs vs defenders)
* `pass_result` (from supplementary data)
* optionally `team_side` (to separate offense/defense)

---

## ⚙️ **Implementation Plan**

We’ll do this in **three mini-stages** to keep it robust.

---

### **Stage 1: Identify key frames (throw & catch)**

**Purpose:** Define the time window for measuring separation.

**Action:**

```python
# Identify frame ranges per play
throw_frames = (
    df[df['phase'] == 'pre_throw']
      .groupby(['game_id','play_id'])['frame_id']
      .max()
      .reset_index(name='t_throw')
)

catch_frames = (
    df[df['phase'] == 'post_throw']
      .groupby(['game_id','play_id'])['frame_id']
      .max()
      .reset_index(name='t_catch')
)

key_frames = throw_frames.merge(catch_frames, on=['game_id','play_id'], how='inner')
```

**Validation:**

* `t_catch > t_throw` for all plays.
* Inspect 2–3 plays manually.
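
The first check can be automated. Here is a minimal sketch, assuming `key_frames` has the `t_throw` and `t_catch` columns built above; the helper name `check_key_frames` and the toy data are made up for illustration.

```python
import pandas as pd

def check_key_frames(key_frames: pd.DataFrame) -> pd.DataFrame:
    """Return the plays whose catch frame is not strictly after the throw frame."""
    bad = key_frames[key_frames['t_catch'] <= key_frames['t_throw']]
    return bad.reset_index(drop=True)

# Toy data (hypothetical values): play 11 has t_catch == t_throw and should be flagged
toy_key_frames = pd.DataFrame({
    'game_id': [1, 1, 1],
    'play_id': [10, 11, 12],
    't_throw': [25, 30, 18],
    't_catch': [40, 30, 35],
})

bad_plays = check_key_frames(toy_key_frames)
print(bad_plays)
```

On the real data, an empty result means every play passes; anything returned is a good candidate for the manual inspection.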

---

### **Stage 2: Compute per-frame separation**

**Purpose:** For each receiver frame, find the nearest defender’s distance.

**Action:**

We’ll build a helper function that operates *per play* to avoid cross-play mixing.

```python
import numpy as np
import pandas as pd

def compute_separation(play_df):
    # split offense and defense
    receivers = play_df[play_df['player_role'] == 'receiver']
    defenders = play_df[play_df['player_role'] == 'defender']
    
    if receivers.empty or defenders.empty:
        return pd.DataFrame()  # skip invalid plays

    result_rows = []

    for r_id, r_data in receivers.groupby('nfl_id'):
        for t, frame in r_data.groupby('frame_id'):
            rx, ry = frame.iloc[0][['x','y']]
            # compute distance to all defenders at same frame
            def_frame = defenders[defenders['frame_id'] == t]
            if def_frame.empty:
                continue
            dists = np.sqrt((rx - def_frame['x'])**2 + (ry - def_frame['y'])**2)
            min_sep = dists.min()
            result_rows.append({
                'game_id': frame.iloc[0]['game_id'],
                'play_id': frame.iloc[0]['play_id'],
                'nfl_id': r_id,
                'frame_id': t,
                'separation': min_sep
            })
    return pd.DataFrame(result_rows)

sep_df = (
    df.groupby(['game_id','play_id'], group_keys=False)
      .apply(compute_separation)
      .reset_index(drop=True)
)
```

**Validation:**

* Plot a histogram of `separation`.
* Mean should be around 2–4 yards (typical WR–CB gap).
* Check one random play visually:

  ```python
  sep_df.query("game_id==2023091010 and play_id==1234").head()
  ```
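
If the nested loops in `compute_separation` turn out to be slow on a full season, a merge-based version is one possible alternative. This is a sketch, not a drop-in replacement: it assumes the same `player_role` values (`'receiver'`, `'defender'`) and column names as above, and the function name `compute_separation_vectorized` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def compute_separation_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Nearest-defender distance per receiver per frame, without Python loops."""
    keys = ['game_id', 'play_id', 'frame_id']
    receivers = df.loc[df['player_role'] == 'receiver', keys + ['nfl_id', 'x', 'y']]
    defenders = df.loc[df['player_role'] == 'defender', keys + ['x', 'y']]

    # Pair every receiver row with every defender row from the same play and frame,
    # so players from different plays can never be compared
    pairs = receivers.merge(defenders, on=keys, suffixes=('_rec', '_def'))
    pairs['separation'] = np.hypot(pairs['x_rec'] - pairs['x_def'],
                                   pairs['y_rec'] - pairs['y_def'])

    # Keep only the closest defender for each receiver-frame
    return (
        pairs.groupby(['game_id', 'play_id', 'nfl_id', 'frame_id'], as_index=False)['separation']
             .min()
    )

# Toy play (made-up coordinates): one receiver, two defenders, two frames
toy = pd.DataFrame({
    'game_id': 1,
    'play_id': 1,
    'frame_id': [1, 1, 1, 2, 2, 2],
    'nfl_id': [100, 200, 201, 100, 200, 201],
    'player_role': ['receiver', 'defender', 'defender'] * 2,
    'x': [10.0, 13.0, 10.0, 12.0, 12.0, 16.0],
    'y': [20.0, 24.0, 21.0, 20.0, 23.0, 23.0],
})

sep_toy = compute_separation_vectorized(toy)
print(sep_toy)
```

In the toy play the nearest defender is 1 yard away in frame 1 and 3 yards away in frame 2. The trade-off is memory: the merge creates one row per receiver-defender pair per frame before collapsing to the minimum.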

---

### **Stage 3: Compute Separation Gain**

**Purpose:** Collapse per-frame distances into a single value per receiver per play.

**Action:**

```python
# merge throw/catch frames
sep_summary = sep_df.merge(key_frames, on=['game_id','play_id'], how='left')

def get_gain(g):
    s_throw = g.loc[g['frame_id'] == g['t_throw'], 'separation'].mean()
    s_catch = g.loc[g['frame_id'] == g['t_catch'], 'separation'].mean()
    gain = s_catch - s_throw
    return pd.Series({
        'sep_throw': s_throw,
        'sep_catch': s_catch,
        'separation_gain': gain
    })

sep_metrics = (
    sep_summary.groupby(['game_id','play_id','nfl_id'], group_keys=False)
               .apply(get_gain)
               .reset_index()
)
```

**Validation:**

* Check `sep_throw`, `sep_catch`, `separation_gain` distributions.
* `separation_gain` should roughly center around 0.
* Positive → gained space; negative → defender closed in.
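
To make the sign convention concrete, here is a small worked example on made-up numbers. It assumes the `sep_df` and `key_frames` column layout from the stages above, and one row per receiver per frame; the variable names are illustrative only.

```python
import pandas as pd

# Toy per-frame separation for one receiver (hypothetical values)
sep_toy = pd.DataFrame({
    'game_id': 1,
    'play_id': 1,
    'nfl_id': 100,
    'frame_id': [20, 21, 22, 23],
    'separation': [2.0, 2.4, 3.1, 3.5],
})
key_toy = pd.DataFrame({'game_id': [1], 'play_id': [1], 't_throw': [20], 't_catch': [23]})

merged = sep_toy.merge(key_toy, on=['game_id', 'play_id'], how='left')
ids = ['game_id', 'play_id', 'nfl_id']

# Separation at the throw frame and at the catch frame, indexed by receiver-play
at_throw = merged.loc[merged['frame_id'] == merged['t_throw']].set_index(ids)['separation']
at_catch = merged.loc[merged['frame_id'] == merged['t_catch']].set_index(ids)['separation']

# Gain is catch minus throw, aligned on (game_id, play_id, nfl_id)
gain = (at_catch - at_throw).rename('separation_gain').reset_index()
print(gain)
```

Here the receiver goes from 2.0 yards at the throw to 3.5 yards at the catch, so `separation_gain` is +1.5. A receiver missing either key frame ends up with `NaN`, which is worth counting before you aggregate.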

---

## 🎨 **Visualization**

Example: visualize 1 play showing WR vs nearest DB.

```python
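import matplotlib.pyplot as plt

# Sort by frame so each path is drawn in time order
sample_play = (
    df.query("game_id==2023091010 and play_id==1234")
      .sort_values('frame_id')
)

receivers = sample_play[sample_play['player_role']=='receiver']
defenders = sample_play[sample_play['player_role']=='defender']

# Draw one line per player so separate tracks are not joined together
for i, (r_id, r) in enumerate(receivers.groupby('nfl_id')):
    first = i == 0  # only label the first track to keep the legend short
    plt.plot(r['x'], r['y'], 'b-', label='Receiver' if first else None)
    plt.scatter(r.iloc[0]['x'], r.iloc[0]['y'], color='blue', label='Start' if first else None)
    plt.scatter(r.iloc[-1]['x'], r.iloc[-1]['y'], color='green', label='Catch' if first else None)
for i, (d_id, d) in enumerate(defenders.groupby('nfl_id')):
    plt.plot(d['x'], d['y'], 'r-', alpha=0.5, label='Defender' if i == 0 else None)
plt.legend()
plt.title("Receiver vs Defenders Path: Separation Visualization")
plt.show()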
```

---

## 🧪 **Cross-Metric Validation**

To verify that your metric makes sense:

```python
df_test = sep_metrics.merge(df[['game_id','play_id','pass_result']].drop_duplicates(), on=['game_id','play_id'])
df_test.groupby('pass_result')['separation_gain'].mean()
```

✅ You should see a higher mean `separation_gain` for **complete passes** than for incompletions.
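
A mean alone can be skewed by a few extreme plays or a tiny group, so it may help to look at counts and medians as well. This is a sketch on toy data; the helper name `gain_by_outcome` and the outcome codes `'C'` / `'I'` are assumptions, so use whatever values your `pass_result` column actually contains.

```python
import pandas as pd

def gain_by_outcome(df_test: pd.DataFrame) -> pd.DataFrame:
    """Sample size, mean and median separation gain for each pass outcome."""
    return (
        df_test.groupby('pass_result')['separation_gain']
               .agg(n='count', mean='mean', median='median')
               .reset_index()
    )

# Toy data (made-up values and outcome codes)
toy_test = pd.DataFrame({
    'pass_result': ['C', 'C', 'I', 'I', 'I'],
    'separation_gain': [1.0, 2.0, -1.0, 0.0, -2.0],
})

summary = gain_by_outcome(toy_test)
print(summary)
```

If one outcome has only a handful of rows, treat its mean as anecdotal rather than as validation.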

---

## 💾 **Save Output**

```python
from pathlib import Path
output_path = Path("data/processed/separation_gain_metrics.parquet")
output_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
sep_metrics.to_parquet(output_path, index=False)
print(f"✅ Separation Gain metrics saved at {output_path}")
```

---

## 🧭 **Quick Recap**

| Step                             | Goal                             | Output        |
| -------------------------------- | -------------------------------- | ------------- |
| 1️⃣ Identify throw/catch         | Define the time window           | `key_frames`  |
| 2️⃣ Compute per-frame separation | Find WR–DB distance each frame   | `sep_df`      |
| 3️⃣ Collapse to single metric    | Difference between throw & catch | `sep_metrics` |
| 4️⃣ Visual + validation          | Verify in both plots + logic     | sanity plots  |

---



Perfect question. 🔥
That right there, “**how do we make it *seen* and *understood***?”, is the **real bridge between data science and football intelligence.**

Building the metric was just *Step 1*.
Now we need to **translate it into football meaning and visual clarity**, the part that wins judges, coaches, and analysts alike.

Let’s make this *metric come alive.*

---

# 🧭 SEPARATION GAIN: From Metric to Insight

## ⚽ What Coaches Care About

Coaches don’t care about “∆distance per frame.”
They care about **“who got open, when, and how much.”**

So, you’re going to *present the metric* in a way that answers:

1. 🕓 *When did the receiver create space?*
2. 📏 *How much did that space change while the ball was in the air?*
3. 🧠 *Was it technique, timing, or defensive positioning that made it happen?*

---

## 🧩 Phase 1: Turn Numbers into Visual Stories

### **1️⃣ Play-Level Animation**

🎯 **Purpose:** Let coaches “see” separation evolve in motion.

**Implementation:**

* Use **Matplotlib’s animation**, **Plotly**, or **Bokeh** to animate `x, y` for receiver + nearest defender.
* Overlay a **line connecting** them that changes color as separation increases/decreases.

**Example (Plotly):**

```python
import plotly.express as px

# One play, ordered by frame so the animation slider advances in time
sample_play = (
    df.query("game_id==2023091010 and play_id==1234")
      .sort_values('frame_id')
)

# px.scatter expects a single DataFrame, not a list of per-frame slices.
# Drop 'player_name' from hover_data if that column is not in your df.
fig = px.scatter(
    sample_play, x='x', y='y', color='player_role',
    animation_frame='frame_id', animation_group='nfl_id',
    hover_data=['player_name', 'player_role'],
    range_x=[0, 120], range_y=[0, 53.3],  # fix the axes to field size (yards)
    title="Separation Over Time",
)
fig.show()
```

**Why:**
Coaches instantly grasp the story: “The WR started in tight coverage, broke free by 2.5 yards near the sideline, and caught the ball clean.”
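
To draw the connecting line between the receiver and the nearest defender, you first need to know which defender is nearest in each frame. Below is a minimal sketch for a single play; the helper name `nearest_defender_track`, the `defender_id` column, and the toy coordinates are all hypothetical, and the `'defender'` role value follows the earlier stages.

```python
import numpy as np
import pandas as pd

def nearest_defender_track(play_df: pd.DataFrame, receiver_id: int) -> pd.DataFrame:
    """Per frame: receiver position, nearest defender position, and their distance."""
    rec = play_df.loc[play_df['nfl_id'] == receiver_id, ['frame_id', 'x', 'y']]
    dfn = play_df.loc[play_df['player_role'] == 'defender', ['frame_id', 'nfl_id', 'x', 'y']]
    dfn = dfn.rename(columns={'nfl_id': 'defender_id'})

    # Pair the receiver with every defender in the same frame
    pairs = rec.merge(dfn, on='frame_id', suffixes=('_rec', '_def'))
    pairs['separation'] = np.hypot(pairs['x_rec'] - pairs['x_def'],
                                   pairs['y_rec'] - pairs['y_def'])

    # idxmin gives the row label of the closest defender within each frame
    nearest = pairs.loc[pairs.groupby('frame_id')['separation'].idxmin()]
    return nearest.sort_values('frame_id').reset_index(drop=True)

# Toy play (made-up coordinates): the nearest defender switches between frames
toy_play = pd.DataFrame({
    'frame_id': [1, 1, 1, 2, 2, 2],
    'nfl_id': [100, 200, 201, 100, 200, 201],
    'player_role': ['receiver', 'defender', 'defender'] * 2,
    'x': [10.0, 13.0, 10.0, 12.0, 12.0, 16.0],
    'y': [20.0, 24.0, 21.0, 20.0, 23.0, 23.0],
})

track = nearest_defender_track(toy_play, receiver_id=100)
print(track[['frame_id', 'defender_id', 'separation']])
```

Each row of `track` gives the two endpoints of the line for that frame (`x_rec`, `y_rec`) to (`x_def`, `y_def`), and `separation` can drive the line color.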

---

### **2️⃣ Separation Timeline Plot**

🎯 **Purpose:** Quantify when space was gained or lost.

**Implementation:**
Plot `separation(t)` (yards) over `frame_id`.

```python
import matplotlib.pyplot as plt

play_filter = "game_id==2023091010 and play_id==1234"
sample_sep = sep_df.query(f"{play_filter} and nfl_id==54611").sort_values('frame_id')
play_keys = key_frames.query(play_filter).iloc[0]  # throw/catch frames for this play

plt.plot(sample_sep['frame_id'], sample_sep['separation'], label='Separation (yards)')
plt.axvline(x=play_keys['t_throw'], color='orange', linestyle='--', label='Throw')
plt.axvline(x=play_keys['t_catch'], color='green', linestyle='--', label='Catch')
plt.xlabel("Frame")
plt.ylabel("Separation (yards)")
plt.title("Receiver-Defender Separation Over Time")
plt.legend()
plt.show()
```

**Why:**
This tells *how separation evolved dynamically*: did the WR break free instantly or build distance gradually?

---

## 📊 Phase 2: Aggregate Insights for Coaches

### **1️⃣ League-Wide Patterns**

Show a **distribution plot** of `separation_gain`.

```python
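import matplotlib.pyplot as plt
import seaborn as sns
# Drop NaN gains (receivers missing a throw or catch frame) before plotting
sns.histplot(sep_metrics['separation_gain'].dropna(), bins=30)
plt.title("Distribution of Separation Gain: All Plays")
plt.xlabel("Separation Gain (yards)")
plt.show()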
```

💡 Coaches can compare players or teams:

* +3 yds = elite route separation
* <0 = defender recovery or tight coverage

---

### **2️⃣ Team & Player Leaderboards**

```python
leaderboard = (
    sep_metrics.groupby('nfl_id')['separation_gain']
    .mean()
    .reset_index()
    .sort_values('separation_gain', ascending=False)
    .head(10)
)
print(leaderboard)
```

**Add context:**
Merge with `player_name` and `team` to show:

> “Top 10 receivers by average separation gain when targeted.”

✅ *Football meaning:* “Who consistently gets open when it matters.”
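
A plain mean can put a receiver with one lucky play at the top of the board. One possible refinement is to require a minimum number of plays. This is a sketch, with the function name `build_leaderboard`, the `min_plays` default, and the toy data chosen arbitrarily for illustration.

```python
import pandas as pd

def build_leaderboard(sep_metrics: pd.DataFrame, min_plays: int = 20, top_n: int = 10) -> pd.DataFrame:
    """Rank receivers by mean separation gain, keeping only those with enough plays."""
    stats = (
        sep_metrics.groupby('nfl_id')['separation_gain']
                   .agg(mean_gain='mean', n_plays='count')
                   .reset_index()
    )
    return (
        stats[stats['n_plays'] >= min_plays]
            .sort_values('mean_gain', ascending=False)
            .head(top_n)
            .reset_index(drop=True)
    )

# Toy data (made up): receiver 2 has a huge gain but only one play
toy_metrics = pd.DataFrame({
    'nfl_id': [1, 1, 1, 2, 3, 3],
    'separation_gain': [1.0, 2.0, 3.0, 9.0, 0.5, 1.5],
})

board = build_leaderboard(toy_metrics, min_plays=2)
print(board)
```

With `min_plays=2`, receiver 2 drops out despite the +9.0 play, and receivers 1 and 3 are ranked on their averages. Showing `n_plays` next to the mean also helps coaches judge how much to trust each row.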

---

### **3️⃣ Play-Type Comparisons**

Use `pass_result` or `player_position` to show context:

```python
sns.boxplot(x='pass_result', y='separation_gain', data=df_test)
plt.title("Separation Gain vs Pass Outcome")
plt.show()
```

* Completed passes → higher separation gain
* Incomplete → lower or negative gain

If your data shows this pattern, that’s good evidence your metric tracks real performance.

---

## 🧠 Phase 3: Turn Metrics into Coaching Narratives

**Example insights you could include in a report:**

> “Across Week 1, receivers averaged +1.2 yards of separation gain on completions, but -0.8 yards on incompletions.
> CBs who managed to reduce separation by over 1 yard during the ball flight were involved in 64% of defended passes.”

Or visual dashboards:

| Metric            | Meaning                   | Typical Range |
| :---------------- | :------------------------ | :------------ |
| `sep_throw`       | Initial distance at throw | 1–3 yds       |
| `sep_catch`       | Distance at catch         | 2–5 yds       |
| `separation_gain` | Change (catch - throw)    | -2 to +3 yds  |

---

## 💡 Phase 4: Optional Advanced Visualization

If you want to go **beyond static plots**, you can use:

* **Plotly Dash App:** interactive dashboard where coaches pick a player or play and replay motion.
* **3D field map:** use real field scaling and team color-coding for realism.
* **Event overlay:** mark throw and catch moments on the field.

---

## 🚀 Final Note: How Coaches Will See It

When presented right, your metric *doesn’t feel like data*; it feels like *truth*.
You’re not saying:

> “Separation gain = +2.1”

You’re showing:

> “This receiver broke 2 yards free while the ball was midair, and that’s the difference between a completion and an incompletion.”

That’s what scouts, analysts, and position coaches want to *see*.

---

Would you like me to design a **simple interactive notebook dashboard layout** (using Plotly or Streamlit) that you can plug your metrics into to visualize separation per player or per play instantly? It’ll turn your results into a coach-friendly interface.
