In [1]:
from src.data_loader import load_match_data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from src.analyze import prepare_possession_data, link_runs_to_possessions, analyze_run_impact
from src.analyze import analyze_untargeted_runs, analyze_defensive_impact

## 1. Load data

Following Merging_Dynamic_Events_and_Tracking_Data_Tutorial.ipynb (from the opendata repo), the tracking and event data are loaded and merged.

In [2]:
event_data, enriched_tracking_data, synced = load_match_data(1886347, minutes=90)

*Loading data for all matches (too heavy to run on my local machine):

In [None]:
# AVAILABLE_MATCHES = {1886347: "Match 1886347", 1899585: "Match 1899585", 1925299: "Match 1925299", 1953632: "Match 1953632",
#                      1996435: "Match 1996435", 2006229: "Match 2006229", 2011166: "Match 2011166", 2013725: "Match 2013725", 
#                      2015213: "Match 2015213", 2017461: "Match 2017461"}

In [None]:
# all_events = []
# all_tracking = []
# all_synced = []

# for match_id in AVAILABLE_MATCHES:
#     print(f"Loading match {match_id}...")

#     event_data, enriched_tracking_data, synced = load_match_data(
#         match_id,
#         minutes=90
#     )

#     # Add match_id for traceability
#     event_data["match_id"] = match_id
#     enriched_tracking_data["match_id"] = match_id
#     synced["match_id"] = match_id

#     all_events.append(event_data)
#     all_tracking.append(enriched_tracking_data)
#     all_synced.append(synced)

# # Concatenate
# events_df = pd.concat(all_events, ignore_index=True)
# tracking_df = pd.concat(all_tracking, ignore_index=True)
# synced_df = pd.concat(all_synced, ignore_index=True)

# print(
#     events_df.shape,
#     tracking_df.shape,
#     synced_df.shape)

## 2. Analyze data

Since the data contains a lot of geometric information about the players, the focus is on runs (movements made by players without the ball) and their effect on the final outcome of the play.

Disclaimer: The data comes from a single match, so the results may not be fully accurate or generalizable to other contexts.

### 2.1 Prepare data 

Extract possessions and runs and link them together. Note that the data is sampled every 0.1 seconds (10 Hz), so absolute values depend on the sampling rate and relative comparisons are more informative.
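A minimal sketch of what linking runs to possessions could look like, assuming each passing option represents one run and carries a possession identifier plus `dangerous` and `targeted` flags. The field names (`possession_id`, `dangerous`, `targeted`) and the helpers `summarize_possessions` and `frames_to_seconds` are hypothetical; `link_runs_to_possessions` in `src.analyze` may work differently. The frame conversion uses the 10 Hz sampling rate stated above.

```python
from collections import defaultdict

SAMPLING_RATE_HZ = 10  # one tracking frame every 0.1 seconds


def frames_to_seconds(n_frames: int) -> float:
    """Convert a number of tracking frames to seconds."""
    return n_frames / SAMPLING_RATE_HZ


def summarize_possessions(passing_options):
    """Aggregate run-level passing options into per-possession flags.

    Hypothetical schema: each option is a dict with 'possession_id',
    'dangerous' (bool) and 'targeted' (bool, True if the runner got the pass).
    """
    summary = defaultdict(lambda: {
        "n_runs": 0,
        "has_dangerous": False,
        "has_untargeted_dangerous": False,
        "dangerous_targeted": False,
    })
    for option in passing_options:
        s = summary[option["possession_id"]]
        s["n_runs"] += 1
        if option["dangerous"]:
            s["has_dangerous"] = True
            if option["targeted"]:
                s["dangerous_targeted"] = True
            else:
                s["has_untargeted_dangerous"] = True
    return dict(summary)


options = [
    {"possession_id": 1, "dangerous": True, "targeted": True},
    {"possession_id": 1, "dangerous": True, "targeted": False},
    {"possession_id": 2, "dangerous": False, "targeted": False},
]
linked = summarize_possessions(options)
print(linked[1])
print(frames_to_seconds(600))  # 60.0
```

Possession 1 ends up flagged both as having a targeted dangerous run and an untargeted one, which is why possession-level counts of "targeted" and "untargeted" need not add up to the number of dangerous-run possessions.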

In [3]:
possessions, passing_options = prepare_possession_data(synced)
possessions_with_runs = link_runs_to_possessions(possessions, passing_options)

Extracted 20636 possessions and 52624 passing options
Successfully linked 860 possessions to their runs


### 2.2 Analyze run impact

In [4]:
analyze_run_impact(possessions_with_runs)


POSSESSION OUTCOMES BY RUN CHARACTERISTICS

Possessions with at least 1 dangerous run: 2574
Possessions with untargeted dangerous runs: 1738

--- Comparing possessions WITH vs WITHOUT dangerous runs ---

pass_outcome distribution:
With dangerous runs:
pass_outcome
successful      0.56701
unsuccessful    0.43299
Name: proportion, dtype: float64

xthreat:
  With dangerous run: nan
  Without dangerous run: nan


* There are many decoys (dangerous runs that did not receive the ball): possessions with at least one untargeted dangerous run account for 67.5% (1,738 / 2,574) of all possessions with a dangerous run.
* In possessions with dangerous runs, 56.7% of passes were successful, which suggests the runs were often a smart move.
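The decoy share is a simple ratio of the two possession counts printed by `analyze_run_impact`. The helper `decoy_share` below is hypothetical and only reproduces that arithmetic; the numerator counts possessions with at least one untargeted dangerous run, and the denominator counts possessions with at least one dangerous run of any kind.

```python
def decoy_share(n_untargeted_dangerous: int, n_dangerous: int) -> float:
    """Share (in %) of dangerous-run possessions containing at least one decoy."""
    if n_dangerous == 0:
        raise ValueError("no possessions with dangerous runs")
    return 100 * n_untargeted_dangerous / n_dangerous


# Counts printed by analyze_run_impact above
share = decoy_share(1738, 2574)
print(f"{share:.1f}%")  # 67.5%
```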

### 2.3 Defensive impact on the runs

Let's analyze the defensive impact of dangerous runs that were targeted (the pass to the runner was completed) and ignored (no pass was completed to the runner).
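The targeted and ignored groups printed by the next cell add up to all 2,574 dangerous-run possessions (2,112 + 462), while 1,738 possessions contain an untargeted dangerous run. One rule that produces this kind of overlap, sketched here as an assumption rather than the definition used in `analyze_untargeted_runs`, is: a possession is "targeted" if any of its dangerous runs received the pass and "ignored" if none did. Under that rule a possession can be "targeted" and still contain a decoy. The function name and the `(dangerous, targeted)` tuple layout are hypothetical.

```python
def split_dangerous_possessions(possessions):
    """Split possessions with at least one dangerous run into targeted vs ignored.

    Hypothetical rule: 'targeted' if any dangerous run in the possession
    received the pass, 'ignored' if none did. Each possession maps to a
    list of (dangerous, targeted) booleans, one per run.
    """
    targeted, ignored = [], []
    for pid, runs in possessions.items():
        dangerous_runs = [was_targeted for is_dangerous, was_targeted in runs if is_dangerous]
        if not dangerous_runs:
            continue  # no dangerous run: outside this comparison
        (targeted if any(dangerous_runs) else ignored).append(pid)
    return targeted, ignored


example = {
    "A": [(True, True), (True, False)],  # targeted, but also contains a decoy
    "B": [(True, False)],                # dangerous run ignored
    "C": [(False, False)],               # no dangerous run
}
print(split_dangerous_possessions(example))  # (['A'], ['B'])
```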

In [5]:
targeted_dangerous, ignored_dangerous = analyze_untargeted_runs(possessions_with_runs)


UNTARGETED RUN VALUE HYPOTHESIS

Possessions where dangerous run WAS targeted: 2112
Possessions where dangerous run was IGNORED: 462

Did ignoring the dangerous run hurt the outcome?
  xthreat: Targeted=nan, Ignored=nan, Diff=nan


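In most cases (2,112 of 2,574 possessions) the pass reached the runner. The question now is whether ignoring the run hurt the outcome. Since xThreat is NaN for this match, the defensive metrics are used instead.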

In [6]:
analyze_defensive_impact(possessions_with_runs, ignored_dangerous)


DEFENSIVE IMPACT - DID RUNS CREATE SPACE?

Defensive metrics when dangerous runs were ignored:
  n_opponents_ahead_start: 4.95
  n_opponents_ahead_end: 4.71
  separation_start: 4.95
  separation_end: 2.50
  separation_gain: -2.45

Did the actual pass benefit from the decoy run?
  Avg separation gained: -2.45


In [7]:
analyze_defensive_impact(possessions_with_runs, targeted_dangerous)


DEFENSIVE IMPACT - DID RUNS CREATE SPACE?

Defensive metrics when dangerous runs were ignored:
  n_opponents_ahead_start: 4.77
  n_opponents_ahead_end: 5.03
  separation_start: 5.14
  separation_end: 3.13
  separation_gain: -2.00

Did the actual pass benefit from the decoy run?
  Avg separation gained: -2.00


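* The printout of `analyze_defensive_impact` always reads "when dangerous runs were ignored"; the second call reports the targeted group.
* The starting conditions are very similar (4.95 vs 4.77 opponents ahead, 4.95 vs 5.14 separation, for ignored vs targeted runs).
* Targeted runs attract defenders: opponents ahead rise from 4.77 to 5.03, whereas they fall from 4.95 to 4.71 when the run is ignored.
* Dangerous runs naturally attract defenders, so separation drops in both cases. We expected decoy runs to create space by pulling defenders away, but this is not the case: targeted runs lose less separation (-2.00 vs -2.45, about 18% less).
* Conclusion: dangerous runs should be TARGETED, not used as decoys.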
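The separation figures can be checked by hand, assuming `separation_gain` is the average of `separation_end - separation_start` (this matches the printed averages to within rounding). The 18% figure is the relative reduction in separation lost, (2.45 - 2.00) / 2.45. Both helper names are hypothetical.

```python
def separation_gain(start: float, end: float) -> float:
    """Change in separation over the possession (negative = space lost)."""
    return end - start


def relative_improvement(loss_a: float, loss_b: float) -> float:
    """How much smaller loss_b is than loss_a, as a fraction of loss_a."""
    return (abs(loss_a) - abs(loss_b)) / abs(loss_a)


ignored_gain = separation_gain(4.95, 2.50)   # -2.45
targeted_gain = separation_gain(5.14, 3.13)  # -2.01 from the rounded averages (printed: -2.00)
improvement = relative_improvement(-2.45, -2.00)
print(f"{improvement:.0%}")  # 18%
```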

### 2.4 Compare attacks with and without runs

In [8]:
from src.analyze import compare_with_vs_without_runs

In [9]:
comparison_results = compare_with_vs_without_runs(possessions_with_runs)


RUN VALUE ADDED (RVA) METRIC DEVELOPMENT

Possessions without runs: 12474
Possessions with runs: 8162

IMPACT ANALYSIS: WITH RUNS vs WITHOUT RUNS

PASS_SUCCESS:
  With runs: 0.741
  Without runs: 0.737
  Difference: +0.004 
  p-value: 0.5197

PROGRESSION:
  With runs: -0.492
  Without runs: -0.716
  Difference: +0.225 ***
  p-value: 0.0000

SEPARATION_GAINED:
  With runs: -2.667
  Without runs: -1.584
  Difference: -1.083 ***
  p-value: 0.0000

LEAD_TO_SHOT:
  With runs: 0.078
  Without runs: 0.044
  Difference: +0.034 ***
  p-value: 0.0000


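* Possessions with runs lead to a shot more often (7.8% vs 4.4%, p < 0.001) and lose less ground (progression of -0.492 vs -0.716).
* Pass success is essentially unchanged (0.741 vs 0.737, p = 0.52).
* Separation drops more with runs (-2.667 vs -1.584, p < 0.001), consistent with defenders paying more attention to the runners.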
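The output does not say which statistical test `compare_with_vs_without_runs` uses. For the two binary outcomes (`pass_success`, `lead_to_shot`), a standard choice is a two-proportion z-test; the sketch below applies it to the printed rates and group sizes, so it is only approximate because the rates are rounded. The continuous metrics (`progression`, `separation_gained`) would need per-possession values, which are not printed. The `stars` helper assumes the common convention of `***` for p < 0.001.

```python
import math


def two_proportion_ztest(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def stars(p_value: float) -> str:
    """Significance stars (assumed convention: *** p<0.001, ** p<0.01, * p<0.05)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


N_WITH, N_WITHOUT = 8162, 12474  # possessions with / without runs

for name, p_with, p_without in [("pass_success", 0.741, 0.737),
                                ("lead_to_shot", 0.078, 0.044)]:
    z, p = two_proportion_ztest(p_with, N_WITH, p_without, N_WITHOUT)
    print(f"{name}: z={z:.2f}, p={p:.4f} {stars(p)}")
```

On the rounded rates this gives p of roughly 0.52 for `pass_success` and p < 0.001 for `lead_to_shot`, in line with the reported results.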

### 2.5 Calculate run value added (RVA)

Based on this evidence, a new metric, Run Value Added (RVA), was built to quantify how much value each player's runs add to (or subtract from) the play.

In [10]:
from src.analyze import calculate_run_value_added, summarize_rva

The metric combines five components, listed in the output below.

In [11]:
passing_options_with_rva = calculate_run_value_added(passing_options, possessions_with_runs)


RUN VALUE ADDED (RVA) FORMULA - EVIDENCE-BASED V2

RVA components based on empirical findings:
1. Shot Creation Value: Only credited to TARGETED dangerous runs
2. Direct Threat Value: xThreat * completion when targeted
3. Progression Value: Helps advance play (+0.225m per possession)
4. Decoy Penalty: Ignored dangerous runs LOSE shot credit + penalty
5. Simultaneous Run Bonus: Multiple runs stress defense more


In [12]:
player_rva = summarize_rva(passing_options_with_rva)


RVA SUMMARY STATISTICS

Average RVA per run: 0.0022
Average RVA (targeted): 0.0068
Average RVA (untargeted): -0.0001
Average RVA (untargeted dangerous): -0.0005

--- RVA Component Breakdown ---
shot_value: 0.0013
direct_value: 0.0013
progression_value: 0.0007
decoy_penalty: -0.0011
overload_value: 0.0000

TOP RUN VALUE CREATORS (by total RVA)
             total_RVA  avg_RVA  n_runs n_targeted n_dangerous
player_name
G. May        500.9861   0.0095   52756      16456       21296
L. Rogerson   357.7217   0.0076   46948      16456        8712
J. Brimmer    305.5459   0.0046   66792      17908       10648
L. Gillion    242.3198   0.0036   66792      22748       14036
F. Gallegos   166.0886   0.0023   72116      20328        5808
E. Adams      142.9063   0.0028   51788      18392        4840
F. De Vries   141.5312   0.0021   66308      23232        4356
B. Gibson     132.6351   0.0034   38720       9680          96
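The table's columns are consistent with `total_RVA = avg_RVA × n_runs` (for example, 500.9861 / 52,756 ≈ 0.0095). The sketch below recomputes `avg_RVA` from the printed totals and run counts and re-ranks the players by it; the dict layout is illustrative. Because total RVA grows with the number of runs, ranking by the average rewards efficiency rather than volume.

```python
# Printed values from the table above: (total_RVA, n_runs)
players = {
    "G. May": (500.9861, 52756),
    "L. Rogerson": (357.7217, 46948),
    "J. Brimmer": (305.5459, 66792),
    "L. Gillion": (242.3198, 66792),
    "F. Gallegos": (166.0886, 72116),
    "E. Adams": (142.9063, 51788),
    "F. De Vries": (141.5312, 66308),
    "B. Gibson": (132.6351, 38720),
}

# avg_RVA = total_RVA / n_runs
avg_rva = {name: total / n for name, (total, n) in players.items()}
for name, avg in avg_rva.items():
    print(f"{name:<12} {avg:.4f}")

# Re-rank by efficiency (average RVA per run) instead of volume (total RVA)
by_avg = sorted(avg_rva, key=avg_rva.get, reverse=True)
print(by_avg)
```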

#### Conclusions

**1. Targeting is EVERYTHING**

- Targeted: 0.0068 (about 3x the overall average of 0.0022)
- Untargeted: -0.0001 (roughly neutral, slightly negative)

**2. Dangerous runs are HIGH RISK**

- If targeted: Very positive (shot + direct value)
- If ignored: Negative (-0.0005)

**3. The Formula is Balanced**

- Positive components: 0.0013 + 0.0013 + 0.0007 = 0.0033
- Negative component: -0.0011
- Net average: 0.0022 
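Because RVA is the sum of its components, its average over all runs equals the sum of the component averages (linearity of the mean), which is why the breakdown adds up to the overall 0.0022. A quick check on the printed (rounded) averages, plus the linearity property on a few hypothetical per-run values:

```python
import math
from statistics import mean

# Printed component averages (rounded to 4 decimals)
components = {
    "shot_value": 0.0013,
    "direct_value": 0.0013,
    "progression_value": 0.0007,
    "decoy_penalty": -0.0011,
    "overload_value": 0.0000,
}
positive = sum(v for v in components.values() if v > 0)
negative = sum(v for v in components.values() if v < 0)
net = positive + negative
print(f"positive={positive:.4f}, negative={negative:.4f}, net={net:.4f}")
# positive=0.0033, negative=-0.0011, net=0.0022

# Hypothetical per-run components (shot, direct, decoy): the mean of the
# per-run sums equals the sum of the per-component means.
toy_runs = [(0.01, 0.0, -0.002), (0.0, 0.004, 0.0), (0.0, 0.0, -0.001)]
mean_of_sums = mean(sum(run) for run in toy_runs)
sum_of_means = sum(mean(col) for col in zip(*toy_runs))
print(math.isclose(mean_of_sums, sum_of_means))  # True
```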

#### Players performance

1. G. May: “The Complete Threat”
Highest total RVA (500.99) and average RVA per run (0.0095), combining the most dangerous runs (21,296) with precise targeting and strong shot involvement.

2. L. Rogerson: “The Efficient Engine”
Second in both total (357.72) and average RVA (0.0076) from fewer runs than G. May (46,948 vs 52,756), creating shots consistently with little wasted movement.

3. J. Brimmer: “The Shot Builder”
Smart, high-volume runner (66,792 runs) focused on shot creation, with room to improve danger per run (10,648 dangerous runs, average RVA of 0.0046).