# 02 - Separation Gain

Compute the separation gain metric.

Let's turn our metric into a **guided notebook plan**, with clear purpose, inputs, grouping logic, computation, validation, and visualization, so you don't drift off or accidentally mix data across players.

---

# 🧠 Metric Build Guide: **Separation Gain**

---

## 🎯 **Purpose**

Quantify how much *separation* (space between the receiver and their nearest defender) changes between the **throw moment** and the **catch/incompletion moment**.

It measures *how well a receiver creates or maintains space while the ball is in the air.*
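
As a toy illustration of this definition, the sketch below computes the separation at two moments for one receiver and two defenders, then takes the difference. The coordinates and the helper name `nearest_defender_distance` are made up for the example and are not part of the dataset.

```python
import numpy as np

def nearest_defender_distance(receiver_xy, defenders_xy):
    """Distance from one receiver to the closest defender (same units as x/y)."""
    receiver_xy = np.asarray(receiver_xy, dtype=float)
    defenders_xy = np.asarray(defenders_xy, dtype=float)
    # Euclidean distance to every defender, then keep the smallest one
    return float(np.linalg.norm(defenders_xy - receiver_xy, axis=1).min())

# Hypothetical positions at the throw frame and at the catch frame
sep_throw = nearest_defender_distance((30.0, 20.0), [(33.0, 24.0), (40.0, 20.0)])
sep_catch = nearest_defender_distance((45.0, 22.0), [(45.0, 24.0), (52.0, 22.0)])

# Negative here: the nearest defender closed in while the ball was in the air
separation_gain = sep_catch - sep_throw
print(sep_throw, sep_catch, separation_gain)
```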

---

## 🧩 **You'll Need**

From your reconstructed + feature dataset (the output of Step 4):

* `game_id`, `play_id`, `nfl_id`
* `x`, `y`, `frame_id`, `phase`
* `player_role` or `player_position` (to identify WRs vs defenders)
* `pass_result` (from supplementary data)
* optionally `team_side` (to separate offense/defense)
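
Before starting, it may help to confirm that these columns are actually present. The sketch below is one way to do that; the helper `missing_columns` and the list `REQUIRED_COLUMNS` are hypothetical names, and the list assumes you use `player_role` rather than `player_position`. `pass_result` is left out because it is only needed for the cross-metric check at the end.

```python
import pandas as pd

# Assumed column set for the three stages; adjust if you use player_position instead
REQUIRED_COLUMNS = [
    "game_id", "play_id", "nfl_id",
    "x", "y", "frame_id", "phase", "player_role",
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the DataFrame."""
    return [col for col in required if col not in frame.columns]

# Toy frame that lacks two of the required columns
toy = pd.DataFrame({
    "game_id": [1], "play_id": [1], "nfl_id": [10],
    "x": [0.0], "y": [0.0], "frame_id": [1],
})
print(missing_columns(toy))
```

On your real data, call `missing_columns(df)` and resolve anything it reports before moving on to Stage 1.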

---

## ⚙️ **Implementation Plan**

We'll do this in **three mini-stages** to keep it robust.

---

### **Stage 1: Identify key frames (throw & catch)**

**Purpose:** Define the time window for measuring separation.

**Action:**

```python
# Identify key frames per play:
# throw = last 'pre_throw' frame, catch/incompletion = last 'post_throw' frame
throw_frames = (
    df[df['phase'] == 'pre_throw']
      .groupby(['game_id','play_id'])['frame_id']
      .max()
      .reset_index(name='t_throw')
)

catch_frames = (
    df[df['phase'] == 'post_throw']
      .groupby(['game_id','play_id'])['frame_id']
      .max()
      .reset_index(name='t_catch')
)

key_frames = throw_frames.merge(catch_frames, on=['game_id','play_id'], how='inner')
```

**Validation:**

* `t_catch > t_throw` for all plays.
* Inspect 2–3 plays manually.
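
The first check can be automated. This is a minimal sketch that assumes a `key_frames` table shaped like the one built above; the helper name `invalid_key_frames` and the toy values are illustrative only.

```python
import pandas as pd

def invalid_key_frames(key_frames):
    """Rows where the catch frame is not strictly after the throw frame.

    Negating the comparison also flags rows with a missing t_throw or t_catch.
    """
    return key_frames[~(key_frames["t_catch"] > key_frames["t_throw"])]

# Toy key-frame table: play 11 violates the ordering
toy_key_frames = pd.DataFrame({
    "game_id": [1, 1, 1],
    "play_id": [10, 11, 12],
    "t_throw": [25, 30, 18],
    "t_catch": [40, 30, 35],
})

bad = invalid_key_frames(toy_key_frames)
print(bad)
```

If this flags many plays on real data, one possible cause is that `frame_id` is not continuous across phases in your reconstructed dataset, which is worth checking before Stage 2.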

---

### **Stage 2: Compute per-frame separation**

**Purpose:** For each receiver frame, find the nearest defender's distance.

**Action:**

We'll build a helper function that operates *per play* to avoid cross-play mixing.

```python
import numpy as np
import pandas as pd

def compute_separation(play_df):
    # split offense and defense
    receivers = play_df[play_df['player_role'] == 'receiver']
    defenders = play_df[play_df['player_role'] == 'defender']
    
    if receivers.empty or defenders.empty:
        return pd.DataFrame()  # skip invalid plays

    result_rows = []

    for r_id, r_data in receivers.groupby('nfl_id'):
        for t, frame in r_data.groupby('frame_id'):
            rx, ry = frame.iloc[0][['x','y']]
            # compute distance to all defenders at same frame
            def_frame = defenders[defenders['frame_id'] == t]
            if def_frame.empty:
                continue
            dists = np.sqrt((rx - def_frame['x'])**2 + (ry - def_frame['y'])**2)
            min_sep = dists.min()
            result_rows.append({
                'game_id': frame.iloc[0]['game_id'],
                'play_id': frame.iloc[0]['play_id'],
                'nfl_id': r_id,
                'frame_id': t,
                'separation': min_sep
            })
    return pd.DataFrame(result_rows)

sep_df = (
    df.groupby(['game_id','play_id'], group_keys=False)
      .apply(compute_separation)
      .reset_index(drop=True)
)
```
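
The nested loops above can be slow on a full season of tracking data. One possible alternative, sketched here under the same column assumptions (`player_role` values `'receiver'` and `'defender'`), pairs receivers with defenders through a merge on the play and frame keys, so players from different plays can never be matched. The function name `compute_separation_vectorized` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def compute_separation_vectorized(df):
    """Nearest-defender distance per receiver per frame, without Python loops."""
    keys = ["game_id", "play_id", "frame_id"]
    rec = df.loc[df["player_role"] == "receiver", keys + ["nfl_id", "x", "y"]]
    dfn = df.loc[df["player_role"] == "defender", keys + ["x", "y"]]
    # Every receiver row is paired with every defender row of the same play and frame
    pairs = rec.merge(dfn, on=keys, suffixes=("", "_def"))
    pairs["separation"] = np.hypot(pairs["x"] - pairs["x_def"],
                                   pairs["y"] - pairs["y_def"])
    # Keep only the closest defender for each receiver frame
    return pairs.groupby(
        ["game_id", "play_id", "nfl_id", "frame_id"], as_index=False
    )["separation"].min()

# Toy play: one receiver (nfl_id 10) and two defenders over two frames
toy = pd.DataFrame({
    "game_id": 1,
    "play_id": 1,
    "frame_id": [1, 2, 1, 2, 1, 2],
    "nfl_id": [10, 10, 20, 20, 21, 21],
    "player_role": ["receiver"] * 2 + ["defender"] * 4,
    "x": [0.0, 10.0, 3.0, 10.0, 6.0, 13.0],
    "y": [0.0, 0.0, 4.0, 1.0, 8.0, 4.0],
})

toy_sep = compute_separation_vectorized(toy)
print(toy_sep)
```

Comparing this output against `sep_df` on a handful of plays is a cheap way to cross-check both implementations.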

**Validation:**

* Plot a histogram of `separation`.
* Mean should be around 2–4 yards (typical WR–CB gap).
* Check one random play visually:

  ```python
  sep_df.query("game_id==2023091010 and play_id==1234").head()
  ```

---

### **Stage 3: Compute Separation Gain**

**Purpose:** Collapse per-frame distances into a single value per receiver per play.

**Action:**

```python
# merge throw/catch frames
sep_summary = sep_df.merge(key_frames, on=['game_id','play_id'], how='left')

def get_gain(g):
    s_throw = g.loc[g['frame_id'] == g['t_throw'], 'separation'].mean()
    s_catch = g.loc[g['frame_id'] == g['t_catch'], 'separation'].mean()
    gain = s_catch - s_throw
    return pd.Series({
        'sep_throw': s_throw,
        'sep_catch': s_catch,
        'separation_gain': gain
    })

sep_metrics = (
    sep_summary.groupby(['game_id','play_id','nfl_id'], group_keys=False)
               .apply(get_gain)
               .reset_index()
)
```

**Validation:**

* Check `sep_throw`, `sep_catch`, `separation_gain` distributions.
* `separation_gain` should roughly center around 0.
* Positive → gained space; negative → defender closed in.
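
These checks can be sketched as a quick summary. The example below uses a made-up `toy_metrics` table in place of `sep_metrics`, and the helper `gain_direction` is a hypothetical name; it labels each row by the sign of its gain and keeps missing values visible instead of silently dropping them.

```python
import numpy as np
import pandas as pd

def gain_direction(gain):
    """Label each gain value by its sign; NaN values are labelled 'missing'."""
    g = gain.to_numpy(dtype=float)
    return np.select(
        [g > 0, g < 0, g == 0],
        ["gained", "lost", "unchanged"],
        default="missing",
    )

# Toy stand-in for sep_metrics
toy_metrics = pd.DataFrame({"separation_gain": [1.5, -0.8, 0.0, np.nan, -2.1]})
toy_metrics["direction"] = gain_direction(toy_metrics["separation_gain"])

# Distribution summary: the median should sit reasonably close to 0
print(toy_metrics["separation_gain"].describe())
print(toy_metrics["direction"].value_counts())
```

A large share of `missing` rows on real data would suggest that the throw or catch frame was not found for those receivers, which points back to Stage 1.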

---

## 🎨 **Visualization**

Example: visualize one play, showing a single receiver's path against the defenders' paths.

```python
import matplotlib.pyplot as plt

sample_play = df.query("game_id==2023091010 and play_id==1234").sort_values('frame_id')

# Plot a single receiver so the path is not drawn across several players
receivers = sample_play[sample_play['player_role']=='receiver']
receiver = receivers[receivers['nfl_id'] == receivers['nfl_id'].iloc[0]]
defenders = sample_play[sample_play['player_role']=='defender']

plt.plot(receiver['x'], receiver['y'], 'b-', label='Receiver')
for i, (d_id, d) in enumerate(defenders.groupby('nfl_id')):
    # Label only the first defender path to keep the legend short
    plt.plot(d['x'], d['y'], 'r-', alpha=0.5, label='Defenders' if i == 0 else None)
plt.scatter(receiver.iloc[0]['x'], receiver.iloc[0]['y'], color='blue', label='Start')
plt.scatter(receiver.iloc[-1]['x'], receiver.iloc[-1]['y'], color='green', label='Catch')
plt.legend()
plt.title("Receiver vs Defenders Path: Separation Visualization")
plt.show()
```

---

## 🧪 **Cross-Metric Validation**

To verify that your metric makes sense:

```python
df_test = sep_metrics.merge(df[['game_id','play_id','pass_result']].drop_duplicates(), on=['game_id','play_id'])
df_test.groupby('pass_result')['separation_gain'].mean()
```

✅ If the metric behaves as intended, you should see a higher mean `separation_gain` for **complete passes** than for incompletions.

---

## 💾 **Save Output**

```python
from pathlib import Path
output_path = Path("data/processed/separation_gain_metrics.parquet")
output_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
sep_metrics.to_parquet(output_path, index=False)
print(f"✅ Separation Gain metrics saved at {output_path}")
```

---

## 🧭 **Quick Recap**

| Step                             | Goal                             | Output        |
| -------------------------------- | -------------------------------- | ------------- |
| 1️⃣ Identify throw/catch         | Define the time window           | `key_frames`  |
| 2️⃣ Compute per-frame separation | Find WR–DB distance each frame   | `sep_df`      |
| 3️⃣ Collapse to single metric    | Difference between throw & catch | `sep_metrics` |
| 4️⃣ Visual + validation          | Verify in both plots + logic     | sanity plots  |

---

