# Part 4: Advanced Tactical Metrics Framework

To analyze tactical archetypes and tournament translation patterns, we calculate **12 metrics** across three dimensions: Possession, Progression, and Defensive systems.

---

### 4.1 Metric Categories Overview

Our 12 metrics are organized into 3 tactical dimensions:

| **Dimension** | **Metrics** | **Purpose** |
|--------------|-------------|-------------|
| **POSSESSION** | 1. Possession %<br>2. Field Tilt<br>3. Possession Value (EPR)<br>4. Sequence Length | Measure ball control quality and territorial dominance |
| **PROGRESSION** | 5. Progressive Passes<br>6. Progressive Carries<br>7. Progressive Actions<br>8. Packing* | Quantify forward ball movement and chance creation |
| **DEFENSIVE** | 9. PPDA<br>10. High Turnovers<br>11. Defensive Line Height<br>12. Defensive Actions by Zone | Evaluate pressing intensity and defensive structure |

*Supplementary metric (requires 360° data, available for 9.3% of matches)

**Calculation Levels:**
- **Team-level:** Aggregated per team per match/season (for archetype clustering)
- **Player-level:** Individual contributions per 90 minutes (for dashboard profiles)

### 4.2 POSSESSION METRICS

Possession metrics measure how teams control the ball, not just how much of it they have.

#### Metric 1: Possession Percentage (Team-level)

**Definition:** Percentage of events (touches) controlled by a team

**Calculation:**
```python
possession_pct = (team_events / total_match_events) * 100
```

**Output:** `possession_overall_pct.csv`

**Why It Matters:**
- Foundation metric separating dominant vs reactive styles
- Range: 30-70% (extremes indicate tactical identity)
- Barcelona-style possession: 60-70%
- Counter-attacking teams: 35-45%

**Interpretation:**
- High possession ≠ dominance (must combine with Field Tilt and Possession Value)
- Context matters: Tournament vs club, opponent quality
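
As a rough illustration of the formula above, the sketch below computes possession share per team per match with pandas. The column names `match_id` and `team` and the toy event log are assumptions for the example, not the project's actual schema.

```python
import pandas as pd

# Toy event log; the column names 'match_id' and 'team' are assumptions.
events = pd.DataFrame({
    "match_id": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "team":     ["A", "A", "A", "B", "B", "A", "C", "C", "C"],
})

def possession_pct(events: pd.DataFrame) -> pd.DataFrame:
    """Return one row per team per match with its share of match events."""
    out = (
        events.groupby(["match_id", "team"])
        .size()
        .rename("team_events")
        .reset_index()
    )
    # Total events in each match, repeated on every team row of that match
    out["total_match_events"] = out.groupby("match_id")["team_events"].transform("sum")
    out["possession_pct"] = out["team_events"] / out["total_match_events"] * 100
    return out

possession = possession_pct(events)
print(possession)
```

Within each match the shares sum to 100, which is a cheap sanity check to run before exporting.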

#### Metric 2: Field Tilt (Team-level)

**Definition:** Percentage of touches in opponent's half

**Calculation:**
```python
# StatsBomb pitch: 120x80 yards, so the opponent's half is x > 60
opponent_half_events = events[events['location_x'] > 60]
field_tilt = (len(opponent_half_events) / total_events) * 100
```

**Output:** `possession_field_tilt.csv`

**Why It Matters:**
- Distinguishes territorial dominance from mere possession
- >60% = high press/attacking dominance
- <40% = defensive/counter-attack style

**Tactical Signature:**
- High Possession + High Field Tilt = Dominant possession (e.g., Spain)
- High Possession + Low Field Tilt = Patient build-up (e.g., Italy)
- Low Possession + Low Field Tilt = Deep block (e.g., defensive tournament teams)
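
A minimal per-team version of this calculation might look like the following. It follows the definition used here (share of a team's own touches with x > 60); the `team` and `location_x` columns and the toy values are assumptions.

```python
import pandas as pd

# Toy touches; 'team' and 'location_x' are assumed column names.
events = pd.DataFrame({
    "team":       ["A", "A", "A", "A", "B", "B"],
    "location_x": [30.0, 65.0, 90.0, 110.0, 20.0, 70.0],
})

HALFWAY_X = 60  # StatsBomb pitch is 120 yards long

def field_tilt(events: pd.DataFrame) -> pd.Series:
    """Percentage of each team's touches that occur in the opponent's half."""
    in_opponent_half = (events["location_x"] > HALFWAY_X).astype(float)
    # The mean of a 0/1 flag is the fraction of touches past the halfway line
    return in_opponent_half.groupby(events["team"]).mean() * 100

tilt = field_tilt(events)
print(tilt)
```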

#### Metric 3: Possession Value / EPR (Team-level)

**Definition:** Expected Possession Value based on field location and context

**Calculation:**
```python
# Assigns value to possession based on pitch zones
# Higher values in dangerous areas (final third, central channels)
possession_value = weighted_location_value(events)
```

**Output:** `possession_efficiency_epr.csv`

**Why It Matters:**
- **Quality** of possession, not just quantity
- Possession in final third >> possession in own half
- Separates effective possession from sterile circulation

**Use Case:**
- Team A: 60% possession, low EPR = sterile possession
- Team B: 45% possession, high EPR = efficient possession in dangerous areas
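
The weighting scheme behind `weighted_location_value` is not spelled out here, so the sketch below shows only one possible shape for it: each touch is weighted by the third of the pitch it occurs in, then averaged per team. The zone weights are hypothetical placeholders, and the central-channel and context adjustments are left out.

```python
import pandas as pd

ZONE_EDGES = [0, 40, 80, 120]    # thirds of a 120-yard pitch
ZONE_WEIGHTS = [0.2, 0.5, 1.0]   # hypothetical weights, not the project's values

def weighted_location_value(events: pd.DataFrame) -> pd.Series:
    """Average zone weight of each team's touches (illustrative only)."""
    zone = pd.cut(events["location_x"], bins=ZONE_EDGES,
                  labels=False, include_lowest=True)
    weights = zone.map(dict(enumerate(ZONE_WEIGHTS)))
    return weights.groupby(events["team"]).mean()

# Team A has 60% of the touches but mostly in its own third
events = pd.DataFrame({
    "team":       ["A"] * 6 + ["B"] * 4,
    "location_x": [10, 20, 30, 35, 50, 55, 70, 90, 100, 110],
})
possession_value = weighted_location_value(events)
print(possession_value)
```

In this toy data Team A sees more of the ball yet scores lower than Team B, which is the "sterile possession" pattern described in the use case.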

#### Metric 4: Sequence Length (Team-level)

**Definition:** Average number of touches per possession sequence

**Calculation:**
```python
# Group events by possession chains
sequences = events.groupby(['match_id', 'possession']).size()
avg_sequence_length = sequences.mean()
```

**Output:** `possession_sequence_style.csv`

**Why It Matters:**
- Patient build-up (Barcelona): 15+ touches/sequence
- Direct play (counter teams): 3-5 touches/sequence
- **Tactical identity marker**

**Tournament Insight:**
- Longer sequences harder to maintain under tournament pressure
- Expect sequence length to **decrease** from club → tournament ("tactical compression")
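
To compare teams, the same grouping can be split by the team in possession. The sketch below assumes a `possession_team` column alongside `match_id` and `possession`; the data is a toy example.

```python
import pandas as pd

# Toy events: possession 1 (team A, 4 events), 2 (team B, 2), 3 (team A, 2)
events = pd.DataFrame({
    "match_id":        [1] * 8,
    "possession":      [1, 1, 1, 1, 2, 2, 3, 3],
    "possession_team": ["A", "A", "A", "A", "B", "B", "A", "A"],
})

# Number of events in each possession chain
sequences = events.groupby(["match_id", "possession", "possession_team"]).size()

# Average chain length per team
avg_sequence_length = sequences.groupby(level="possession_team").mean()
print(avg_sequence_length)
```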

---

### 4.3 PROGRESSION METRICS

Progression metrics quantify how teams advance the ball toward goal.

#### Metric 5: Progressive Passes (Player + Team-level)

**Definition:** Passes that move the ball ≥10 yards toward the opponent's goal, or that end within 30 yards of the goal line

**Calculation:**
```python
# Progressive if the pass:
# 1. Moves the ball 10+ yards forward, OR
# 2. Ends within 30 yards of the goal line
progressive_pass = (
    (pass_end_x - pass_start_x >= 10) |
    (pass_end_x >= 90)  # 120-yard pitch: 120 - 30 = 90
)
```

**Outputs:**
- Player-level: `progression_passes.csv`
- Team-level: `progression_team_detail.csv`

**Why It Matters:**
- Identifies **playmakers** and progressive teams
- Separates sideways circulation from forward intent
- Key metric for "creative midfielder" role

**Dashboard Use:**
- Player profiles: Progressive passes per 90
- Team comparison: % of passes that are progressive
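
Both dashboard figures can be derived from the progressive flag. The sketch below is one way to do it, assuming a pass table with `player`, `start_x` and `end_x` columns and a separate minutes-played lookup (all hypothetical names).

```python
import pandas as pd

# Toy passes; column names and the minutes lookup are assumptions.
passes = pd.DataFrame({
    "player":  ["P1", "P1", "P1", "P2", "P2"],
    "start_x": [30.0, 50.0, 85.0, 40.0, 70.0],
    "end_x":   [45.0, 55.0, 92.0, 42.0, 95.0],
})
minutes_played = pd.Series({"P1": 90, "P2": 45})

# Same rule as above: 10+ yards forward, or ending at x >= 90
passes["progressive"] = (
    (passes["end_x"] - passes["start_x"] >= 10) | (passes["end_x"] >= 90)
)

by_player = passes.groupby("player")["progressive"]
prog_per_90 = by_player.sum() * 90 / minutes_played   # progressive passes per 90
prog_share = by_player.mean() * 100                    # % of passes that are progressive
print(prog_per_90, prog_share, sep="\n")
```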

#### Metric 6: Progressive Carries (Player + Team-level)

**Definition:** Ball carries (dribbles) advancing ≥10 yards toward goal

**Calculation:**
```python
# Carry events with significant forward movement
progressive_carry = (carry_end_x - carry_start_x >= 10)
```

**Outputs:**
- Player-level: `progression_carries.csv`
- Team-level: `progression_team_detail.csv`

**Why It Matters:**
- Identifies **dribblers** vs **passers**
- High carry % = dribbling culture (Bundesliga)
- Low carry % = passing culture (La Liga)

**Tactical Signature:**
- Prog Passes >> Prog Carries = Passing team
- Prog Carries >> Prog Passes = Dribbling team
- Balanced = Versatile progression
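
The ">>" in the signature needs a concrete cutoff before it can be applied. The sketch below uses a ratio of 2 purely as a placeholder; that threshold is an assumption, not a value taken from this analysis.

```python
def progression_style(prog_passes: float, prog_carries: float, ratio: float = 2.0) -> str:
    """Label a team's progression style from its progressive pass and carry counts.

    `ratio` is a hypothetical cutoff for what counts as "much greater than".
    """
    if prog_passes == 0 and prog_carries == 0:
        return "No progression"
    if prog_passes >= ratio * prog_carries:
        return "Passing team"
    if prog_carries >= ratio * prog_passes:
        return "Dribbling team"
    return "Versatile progression"

print(progression_style(60, 20))
print(progression_style(20, 50))
print(progression_style(40, 35))
```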

#### Metric 7: Progressive Actions (Player + Team-level)

**Definition:** Progressive passes + Progressive carries (non-overlapping)

**Calculation:**
```python
# Combines both progression methods
# De-duplicates to avoid double-counting same progression
progressive_actions = progressive_passes + progressive_carries_unique
```

**Outputs:**
- Player-level: `progression_actions_no_overlap.csv`
- Team-level: `progression_team_summary.csv`

**Why It Matters:**
- **Total progression contribution** regardless of method
- Primary metric for dashboard "creativity" rating
- Enables player comparisons across different roles

**Use Case:**
- Deep-lying playmaker: High prog passes, low prog carries
- Winger: High prog carries, medium prog passes
- Both contribute to team progression → measured by this metric

#### Metric 8: Packing* (Player-level, 360° required)

**Definition:** Number of opponents eliminated by a pass or carry

**Calculation:**
```python
# Requires 360° tracking data to know player positions
# Counts how many opponents are "behind" the ball after action
opponents_packed = count_players_between(start_pos, end_pos, opponent_positions)
```

**Output:** `advanced_packing_stats.csv`

**Why It Matters:**
- **Elite playmaking indicator**
- Pass that eliminates 5 defenders >> pass that eliminates 1
- Used by top clubs for player recruitment

**Limitation:**
- Only 9.3% of matches have 360° data
- **Supplementary metric only** (not core to archetype clustering)
- Available for specific tournaments (UEFA Euro 100%, Bundesliga 10%, no La Liga/PL)
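
The helper `count_players_between` is only named above, so the following is a deliberately simplified guess at its behaviour: an opponent counts as packed if the ball starts behind them and ends beyond them along the x-axis. Positions are `(x, y)` tuples in the attacking team's frame; lateral position and goalkeeper handling are ignored.

```python
from typing import Iterable, Tuple

Position = Tuple[float, float]

def count_players_between(start_pos: Position, end_pos: Position,
                          opponent_positions: Iterable[Position]) -> int:
    """Count opponents whose x lies strictly between the ball's start and end x."""
    start_x, end_x = start_pos[0], end_pos[0]
    return sum(1 for x, _y in opponent_positions if start_x < x < end_x)

# Toy freeze frame: six opponents at various depths
opponents = [(35, 40), (50, 20), (60, 45), (70, 30), (80, 50), (100, 40)]
packed = count_players_between((40, 40), (75, 30), opponents)
print(packed)
```

A backward pass has `end_x < start_x`, so the condition is never met and the count is zero.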

---

### 4.4 DEFENSIVE METRICS

Defensive metrics evaluate pressing intensity and defensive structure.

#### Metric 9: PPDA - Passes Allowed Per Defensive Action (Team-level)

**Definition:** Opponent passes allowed per defensive action (opponent passes divided by the team's defensive actions)

**Calculation:**
```python
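# Count opponent passes in the pressing team's attacking 2/3 of the pitch.
# x is measured in the passing team's attacking direction, so that zone is x < 80.
opponent_passes = opponent_events[
    (opponent_events['type'] == 'Pass') & (opponent_events['location_x'] < 80)
]
defensive_actions = team_events[
    team_events['type'].isin(['Pressure', 'Tackle', 'Interception'])
]

ppda = len(opponent_passes) / len(defensive_actions)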
```

**Output:** `defensive_ppda.csv`

**Why It Matters:**
- **THE** pressing intensity metric
- Lower = more aggressive pressing
- Higher = deeper defensive block

**Scale:**
- <8 = Ultra-high press (Liverpool under Klopp)
- 8-12 = Medium press
- >12 = Low block (Atletico Madrid)

**Tournament Insight:**
- High-press teams are expected to post a higher PPDA in tournaments (less aggressive pressing)
- Likely driver: physical and mental fatigue in a compressed schedule
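
Putting the calculation and the scale together, a small worked example could look like this. The event tables are toy data, and the `type` and `location_x` column names are assumptions. The label thresholds come from the scale above.

```python
import pandas as pd

def ppda(opponent_events: pd.DataFrame, team_events: pd.DataFrame, max_x: float = 80) -> float:
    """Opponent passes (x < max_x in the opponent's frame) per defensive action."""
    passes = opponent_events[
        (opponent_events["type"] == "Pass") & (opponent_events["location_x"] < max_x)
    ]
    actions = team_events[team_events["type"].isin(["Pressure", "Tackle", "Interception"])]
    # Guard against matches with no recorded defensive actions
    return len(passes) / len(actions) if len(actions) else float("inf")

def press_label(value: float) -> str:
    """Map a PPDA value onto the scale: <8, 8-12, >12."""
    if value < 8:
        return "Ultra-high press"
    if value <= 12:
        return "Medium press"
    return "Low block"

opponent_events = pd.DataFrame({
    "type":       ["Pass"] * 8 + ["Carry"],
    "location_x": [10, 25, 40, 55, 65, 75, 90, 110, 30],
})
team_events = pd.DataFrame({"type": ["Pressure", "Tackle", "Pass", "Shot"]})

value = ppda(opponent_events, team_events)
print(value, press_label(value))
```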

#### Metric 10: High Turnovers (Team-level)

**Definition:** Ball recoveries in attacking third

**Calculation:**
```python
# Ball recoveries in the final third (x > 80)
high_turnovers = team_events[
    (team_events['type'] == 'Ball Recovery') &
    (team_events['location_x'] > 80)
]
```

**Output:** `defensive_high_turnovers.csv`

**Why It Matters:**
- Measures **gegenpressing success**
- Low PPDA + High turnovers = Effective pressing
- Low PPDA + Low turnovers = Ineffective pressing (all effort, no reward)

**Combined Analysis:**
```
PPDA vs High Turnovers Matrix:

               Low Turnovers    High Turnovers
Low PPDA       Ineffective      Elite Press
               Press            (Liverpool)

High PPDA      Deep Block       Mid-Block
               (Atletico)       Counter
```
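
The matrix can be turned into a simple lookup once cutoffs are chosen. In the sketch below both cutoffs (`ppda_cutoff`, `turnover_cutoff`) are hypothetical defaults; in practice they would more likely be set from the sample distribution, for example the median.

```python
def press_quadrant(ppda: float, high_turnovers: float,
                   ppda_cutoff: float = 10.0, turnover_cutoff: float = 8.0) -> str:
    """Place a team in the PPDA vs High Turnovers matrix (cutoffs are assumptions)."""
    low_ppda = ppda < ppda_cutoff
    many_turnovers = high_turnovers >= turnover_cutoff
    if low_ppda and many_turnovers:
        return "Elite Press"
    if low_ppda:
        return "Ineffective Press"
    if many_turnovers:
        return "Mid-Block Counter"
    return "Deep Block"

print(press_quadrant(7.5, 11))   # low PPDA, many high turnovers
print(press_quadrant(14.0, 3))   # high PPDA, few high turnovers
```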

#### Metric 11: Defensive Line Height (Team-level)

**Definition:** Average X-coordinate (distance from own goal line) of defensive actions

**Calculation:**
```python
# Average location of interceptions, clearances, blocks, and tackles
defensive_events = team_events[team_events['type'].isin([
    'Interception', 'Clearance', 'Block', 'Tackle'
])]

defensive_line_height = defensive_events['location_x'].mean()
```

**Output:** `defensive_line_height_team.csv`

**Why It Matters:**
- High line (x > 60) = aggressive, space behind defense
- Deep block (x < 40) = conservative, compact

**Tactical Implication:**
- High line requires fast defenders
- Deep block vulnerable to possession teams
- Tournament teams often drop line (risk aversion)

#### Metric 12: Defensive Actions by Zone (Team-level)

**Definition:** Distribution of tackles/interceptions across thirds

**Calculation:**
```python
# Divide the 120-yard pitch into thirds by x-coordinate
x = actions['location_x']
defensive_third = x < 40
middle_third = (x >= 40) & (x < 80)
attacking_third = x >= 80

# Count defensive actions in each third
zone_distribution = {
    'defensive': int(defensive_third.sum()),
    'middle': int(middle_third.sum()),
    'attacking': int(attacking_third.sum())
}
```

**Output:** `defensive_actions_by_zone.csv`

**Why It Matters:**
- **Visualizes defensive structure**
- Front-heavy = high press
- Back-heavy = absorb pressure

**Pattern Recognition:**
- High press: 30% defensive, 40% middle, 30% attacking
- Mid-block: 40% defensive, 50% middle, 10% attacking
- Deep block: 60% defensive, 35% middle, 5% attacking
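
To compare a team against these reference patterns, the counts need to be expressed as percentages. The sketch below does that with `pd.cut`, using the same thirds as the calculation above; the `location_x` values are toy data.

```python
import pandas as pd

# Toy defensive actions for one team; 'location_x' is an assumed column name.
actions = pd.DataFrame({
    "location_x": [10, 20, 35, 39, 50, 60, 70, 85, 95, 110],
})

zone_labels = ["defensive", "middle", "attacking"]
zones = pd.cut(
    actions["location_x"],
    bins=[float("-inf"), 40, 80, float("inf")],
    right=False,              # intervals are [low, high): x = 40 falls in 'middle'
    labels=zone_labels,
)

# Share of defensive actions per third, in percent
zone_share = zones.value_counts(normalize=True).reindex(zone_labels) * 100
print(zone_share)
```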

---

### 4.5 Supplementary Metrics (Not in Core 12)

Additional metrics calculated for player profiling:

#### xG Chain (Player-level)

**Definition:** Total xG from possessions a player participated in

**Output:** `advanced_xg_chain_raw.csv`

**Use:** Identifies players involved in dangerous sequences (even without direct goal contribution)

---

#### xG Buildup (Player + Team-level)

**Definition:** xG from possessions in which the player was involved, not counting involvement through the shot or the assist

**Outputs:**
- Player: `advanced_xg_buildup_raw.csv`
- Team: `advanced_xg_buildup_team.csv`

**Use:** Deep-lying playmakers (Busquets-type) who build attacks without direct involvement in final action
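
The difference between xG Chain and xG Buildup is easiest to see side by side. The sketch below is one plausible implementation, not necessarily the one behind the exported files: every player with an event in a possession is credited with that possession's total shot xG, and for Buildup the shot and key-pass events are dropped before crediting. The column names (`is_key_pass`, `shot_xg`) and the data are assumptions.

```python
import pandas as pd

events = pd.DataFrame({
    "possession":  [1, 1, 1, 2, 2, 2],
    "player":      ["Mid", "Wing", "Fwd", "Def", "Mid", "Wing"],
    "type":        ["Pass", "Pass", "Shot", "Pass", "Pass", "Shot"],
    "is_key_pass": [False, True, False, False, True, False],
    "shot_xg":     [0.0, 0.0, 0.3, 0.0, 0.0, 0.1],
})

# Total xG generated by each possession
possession_xg = events.groupby("possession")["shot_xg"].sum()

def credit_players(evts: pd.DataFrame) -> pd.Series:
    """Credit each player once per possession they appear in."""
    involved = evts[["possession", "player"]].drop_duplicates()
    return involved["possession"].map(possession_xg).groupby(involved["player"]).sum()

xg_chain = credit_players(events)

# Buildup: ignore involvement through the shot or the key pass
buildup_events = events[(events["type"] != "Shot") & ~events["is_key_pass"]]
xg_buildup = credit_players(buildup_events).reindex(xg_chain.index, fill_value=0.0)

print(pd.DataFrame({"xg_chain": xg_chain, "xg_buildup": xg_buildup}))
```

Here `Fwd` and `Wing` have chain value but no buildup value, while `Def` is credited only through buildup play.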

---

#### Player Role Classification

**Definition:** Automatic role assignment based on metric clusters

**Output:** `advanced_player_roles_master.csv`

**Roles:** Goal Threat, Creator, Deep Playmaker, Balanced, Defensive

**Use:** Dashboard player profiles and team composition analysis

---

### 4.6 Data Pipeline Overview

```
Raw Events (events.parquet)
         ↓
  [Metric Calculation Scripts]
  • src/metrics.py
  • run_pipeline.py
         ↓
  Individual CSV Exports
  • outputs/raw_metrics/
  • 20+ metric files
         ↓
  [Aggregation & Normalization]
  • Team-level summaries
  • Per-90 adjustments
         ↓
  [Clustering Analysis] ⟵ Next step
  • PCA/t-SNE dimensionality reduction
  • K-means clustering
  • Archetype profiling
         ↓
  Dashboard Visualization
```


**Current Status:**
- Raw metrics calculated (see `outputs/raw_metrics/`)
- Next: Aggregate to team-season level
- Next: Dimensionality reduction & clustering
- Next: Archetype validation

**Metric Files Generated:**

**Possession:**
- `possession_overall_pct.csv`
- `possession_field_tilt.csv`
- `possession_efficiency_epr.csv`
- `possession_sequence_style.csv`

**Progression:**
- `progression_passes.csv` (player)
- `progression_carries.csv` (player)
- `progression_actions_no_overlap.csv` (player)
- `progression_team_summary.csv` (team)
- `progression_team_detail.csv` (team)

**Defensive:**
- `defensive_ppda.csv`
- `defensive_high_turnovers.csv`
- `defensive_line_height_team.csv`
- `defensive_actions_by_zone.csv`

**Advanced:**
- `advanced_xg_chain_raw.csv`
- `advanced_xg_buildup_raw.csv`
- `advanced_xg_buildup_team.csv`
- `advanced_player_roles_master.csv`
- `advanced_packing_stats.csv` (if 360° available)

---

### 4.7 How Metrics Enable Tactical Analysis

Our 12 metrics combine to identify **tactical archetypes**:

#### Example Archetype Profiles:

**"Dominant Possession"** (e.g., Barcelona-style)
- High Possession % (60-70%)
- High Field Tilt (>60%)
- Long Sequence Length (15+ touches)
- High Progressive Passes
- Low PPDA (<10)
- High Defensive Line (x > 60)

**"High Press Counter"** (e.g., Liverpool-style)
- Medium Possession % (50-55%)
- High Field Tilt (55-65%)
- Short Sequence Length (5-8 touches)
- High Progressive Carries
- Very Low PPDA (<8)
- High Turnovers in attacking third

**"Deep Block Counter"** (e.g., Tournament pragmatism)
- Low Possession % (35-45%)
- Low Field Tilt (<40%)
- Short Sequence Length (3-5 touches)
- Low Progressive Actions
- High PPDA (>12)
- Low Defensive Line (x < 40)

**"Mid-Block Control"** (e.g., Spain 2023)
- High Possession % (55-65%)
- Medium Field Tilt (50-55%)
- Medium Sequence Length (10-12 touches)
- Balanced Progression (passes + carries)
- Medium PPDA (10-12)
- Medium Defensive Line (x = 50-55)

#### Tournament Translation Hypothesis:

We expect **tactical compression** from club → tournament:
- Shorter sequences (fatigue, pressure)
- Higher PPDA (less aggressive pressing)
- Lower defensive line (risk aversion)
- Fewer progressive actions (conservative play)
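
One way to make the hypothesis testable is to compare each metric's club-to-tournament change against its expected direction. The sketch below does this for a single team; the metric names and all numbers are made-up illustrations, not results.

```python
import pandas as pd

# Illustrative values only; not measured results.
club = pd.Series({
    "sequence_length": 12.0, "ppda": 9.0,
    "defensive_line_height": 55.0, "progressive_actions": 80.0,
})
tournament = pd.Series({
    "sequence_length": 10.0, "ppda": 11.0,
    "defensive_line_height": 50.0, "progressive_actions": 70.0,
})

# Expected direction of change under tactical compression (+1 up, -1 down)
expected_sign = pd.Series({
    "sequence_length": -1, "ppda": 1,
    "defensive_line_height": -1, "progressive_actions": -1,
})

delta = tournament - club
consistent = (delta * expected_sign) > 0   # True where the change matches the hypothesis

print(pd.DataFrame({"delta": delta, "consistent": consistent}))
```

Across many teams, the share of `consistent` flags per metric gives a first read on whether compression shows up in the data.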


---

## Summary of Part 4

### Metric Framework:

**12 Core Metrics Defined**
- 4 Possession metrics (control quality)
- 4 Progression metrics (forward movement)
- 4 Defensive metrics (pressing & structure)

**Calculation Pipeline Operational**
- 20+ CSV files generated
- Team & player-level metrics
- Per-90 normalization ready

**Tactical Archetype Framework Established**
- Metrics combine to form tactical signatures
- 4-6 archetypes expected
- Tournament compression hypothesis defined

---