In [4]:
%pip install statsbomb mplsoccer

Note: you may need to restart the kernel to use updated packages.


In [None]:
# numpy: numerical array operations; imported for convenience, not used directly in the cells below
# pandas: tabular data manipulation; used to inspect and filter the shot event DataFrame
# statsbomb: alternative StatsBomb parser library (distinct from statsbombpy); provides
#   an object-oriented interface with event_id-based loading and typed get_dataframe() output
# mplsoccer.VerticalPitch: renders a football pitch rotated 90 degrees into portrait orientation,
#   which is the standard layout for shot maps — it focuses the visual space on the attacking half
import numpy as np
import pandas as pd
import statsbomb as sb

from mplsoccer import VerticalPitch

In [None]:
# sb.Events() instantiates an Events object for the match identified by event_id.
# This uses the statsbomb package (not statsbombpy), which fetches data from the
# StatsBomb Open Data GitHub repository.
# event_id='3923881' identifies a specific match in the open data catalog.
# The events are not converted to a DataFrame until get_dataframe() is called.
events = sb.Events(event_id='3923881')
events

In [None]:
# get_dataframe() parses the raw event data into a typed pandas DataFrame.
# event_type='shot' filters the full event stream to shot events only,
# discarding passes, carries, pressures, and all other event types.
# The resulting DataFrame contains shot-specific columns, including:
#   statsbomb_xg       : expected goals probability (0–1) from StatsBomb's xG model
#   start_location_x/y : shot origin in StatsBomb's 120x80 coordinate system
#   end_location_x/y/z : where the shot ended; z is the ball height there, when recorded
#   outcome            : result, e.g. Goal, Saved, Blocked, Off T, Wayward, Post
#   technique          : shot technique, e.g. Normal, Volley, Half Volley, Overhead Kick, Lob
#   body_part          : Head, Left Foot, Right Foot, Other
#   play_pattern       : context, e.g. Regular Play, From Corner, From Free Kick, From Throw In
df = events.get_dataframe(event_type='shot')

In [None]:
# Inspect the first 5 rows to verify the schema.
# start_location_x/y: shot origin on StatsBomb's 120x80 grid. StatsBomb records locations from
#   the perspective of the team performing the event, attacking towards x=120, so values near
#   x=120 are close to the goal being attacked regardless of which team took the shot.
#   y is the position across the pitch width (0–80).
# end_location_x/y/z: where the shot ended; z (height) is not recorded for every shot,
#   so check for missing values before using it, e.g. in goalkeeping analysis.
# statsbomb_xg: a float between 0 and 1 giving the estimated probability that the shot is scored.
#   As a rough rule of thumb, xG > 0.5 is a very good chance and xG < 0.05 is a low-probability
#   attempt; these cut-offs are conventions, not fixed definitions.
df.head()
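
To make the coordinate convention concrete, the sketch below derives two quantities that xG models commonly rely on, distance to goal and the angle subtended by the goalposts, from a shot's `start_location_x/y`. It assumes the StatsBomb pitch model, with the target goal on the x=120 line and posts at y=36 and y=44. The function name `shot_distance_and_angle` is hypothetical and not part of `statsbomb` or `mplsoccer`.

```python
import math

# Assumed StatsBomb pitch constants (120x80 grid).
GOAL_X = 120.0
LEFT_POST_Y = 36.0
RIGHT_POST_Y = 44.0


def shot_distance_and_angle(x, y):
    """Return (distance to goal centre, goal-mouth angle in degrees) for a shot at (x, y)."""
    goal_centre_y = (LEFT_POST_Y + RIGHT_POST_Y) / 2
    distance = math.hypot(GOAL_X - x, goal_centre_y - y)

    # Angle between the lines from the shot location to each post.
    angle_left = math.atan2(LEFT_POST_Y - y, GOAL_X - x)
    angle_right = math.atan2(RIGHT_POST_Y - y, GOAL_X - x)
    angle = abs(angle_right - angle_left)
    return distance, math.degrees(angle)


# Penalty spot: 12 units out from the goal line, central.
distance, angle = shot_distance_and_angle(108.0, 40.0)
print(f"distance={distance:.1f}, angle={angle:.1f} degrees")
```

Applied row by row to the shot DataFrame, the same function would give a quick sanity check that high `statsbomb_xg` values cluster at short distances and wide angles.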

In [None]:
# VerticalPitch renders the pitch in portrait orientation (goal at top, halfway line at bottom).
# half=True crops the pitch to the attacking half only, from the midline to the goal.
# This is a common layout for shot maps: it maximizes visual space in the final third
# and drops the defensive half, where shots are rare.
# pitch.draw() initializes a matplotlib Figure and Axes with all pitch markings rendered.
pitch = VerticalPitch(
    half=True
)
fig, ax = pitch.draw(figsize=(10, 8))

In [None]:
# pitch.scatter() plots each shot as a circular marker positioned at its origin coordinates.
# x=start_location_x maps to the vertical axis (distance from goal) in the VerticalPitch orientation.
# y=start_location_y maps to the horizontal axis (width of pitch).
# All shots are rendered at equal size here — no quality encoding yet.
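# ax is a required argument of pitch.scatter(): it names the Axes to draw on, here the one
# returned by pitch.draw() in the previous cell.
# With the inline notebook backend, a figure created in an earlier cell is not re-rendered
# automatically; display fig again to see the added markers.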
sc = pitch.scatter(x=df["start_location_x"], y=df["start_location_y"], ax=ax)

In [None]:
# Encode shot quality (xG) as marker size using a linear scaling formula.
# s = statsbomb_xg * 500 + 100:
#   - The +100 base ensures every shot remains visible regardless of its xG value.
#   - The *500 multiplier amplifies differences — a 0.5 xG shot has s=350,
#     while a 0.1 xG shot has s=150, creating a clear visual hierarchy.
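# This shows both shot location and shot quality in a single plot.
# Matplotlib interprets s as marker area (points squared), so sizes convey the ordering of
# chances; add a size legend if readers need to recover actual xG values.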
pitch = VerticalPitch(
    half=True
)
fig, ax = pitch.draw(figsize=(10, 8))
sc = pitch.scatter(
    x=df["start_location_x"],
    y=df["start_location_y"],
    ax=ax,
    s=df["statsbomb_xg"] * 500 + 100
)
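
The arithmetic behind the size formula in the cell above can be checked in isolation. This is a minimal sketch using plain Python; the helper name `xg_to_marker_size` is hypothetical, and the `base` and `scale` defaults simply mirror the constants used in the notebook.

```python
def xg_to_marker_size(xg, base=100.0, scale=500.0):
    """Linear size encoding: s = xg * scale + base."""
    return xg * scale + base


for xg in (0.02, 0.1, 0.5):
    size = xg_to_marker_size(xg)
    print(f"xG={xg:.2f} -> s={size:.0f}")

# The xG ratio between a 0.5 and a 0.1 shot is 5:1, but the marker-area ratio
# is smaller because of the constant base term.
ratio = xg_to_marker_size(0.5) / xg_to_marker_size(0.1)
print(f"area ratio = {ratio:.2f}")
```

The base term keeps low-xG shots visible at the cost of compressing the differences between markers, which is worth remembering when comparing marker sizes by eye.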

In [None]:
# Filter the shot DataFrame to a single team for isolated team-level shot analysis.
# Boolean indexing on df["team"] == 'Nigeria' selects only Nigeria's shot events.
# .copy() is called to produce an independent DataFrame copy, avoiding the
# SettingWithCopyWarning that arises when modifying a slice of the original DataFrame.
nigeria_color = "#46f415"
nigeria_df = df[df["team"] == 'Nigeria'].copy()

pitch = VerticalPitch(
    half=True
)
fig, ax = pitch.draw(figsize=(10, 8))

# c= assigns a uniform hex color to all markers in this scatter series,
#   visually separating Nigeria's shots from any other team's shots if overlaid.
# label= registers a legend entry string for this series.
# s= continues the xG-as-size encoding established in the previous cell,
#   maintaining consistency across all shot map views in this notebook.
nigeria_sc = pitch.scatter(
    x=nigeria_df["start_location_x"],
    y=nigeria_df["start_location_y"],
    ax=ax,
    c=nigeria_color,
    label="Nigeria",
    s=nigeria_df["statsbomb_xg"] * 500 + 100
)
ax.legend()
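
A per-team summary is a useful companion to the filtered plot. The sketch below uses a small invented DataFrame with the same `team` and `statsbomb_xg` column names as the notebook; the values and the `"Opponent"` label are placeholders, not data from match `3923881`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's shot DataFrame; values are invented.
shots = pd.DataFrame(
    {
        "team": ["Nigeria", "Nigeria", "Nigeria", "Opponent", "Opponent"],
        "statsbomb_xg": [0.05, 0.30, 0.10, 0.40, 0.08],
    }
)

# Shot count, total xG and average xG per shot for each team.
team_summary = (
    shots.groupby("team")["statsbomb_xg"]
    .agg(shots="count", total_xg="sum", xg_per_shot="mean")
    .round(3)
)
print(team_summary)
```

Running the same `groupby` on `df` should give the totals that the marker sizes on the shot map only hint at.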

## Summary: Shot Map Visualization with xG Encoding

### What This Notebook Does

This notebook constructs a progressive shot map visualization using StatsBomb event data and the `mplsoccer` library. It starts from an empty pitch and builds toward a team-filtered, xG-encoded scatter plot that communicates both shot location and shot quality in a single view.

Typical use cases for this visualization are post-match reporting, opponent scouting, and, once shots from several matches are combined, identifying patterns in a team's attacking output over a season.

### Key Concepts

- **VerticalPitch with `half=True`**: Renders only the attacking half in portrait orientation. This is the standard layout for shot maps because it concentrates visual attention on the final third and goal-threatening areas, making spatial patterns immediately readable.
- **xG (Expected Goals)**: A model-derived probability (0–1) that a given shot results in a goal, conditioned on factors such as distance from goal, angle to goal, body part used, play pattern (open play vs set piece), and whether the shot followed a dribble. StatsBomb includes its per-shot xG values in the open data, which is why `statsbomb_xg` is available here without fitting a model.
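- **Size encoding for xG**: Mapping `statsbomb_xg` to the `s` parameter (`xg * 500 + 100`) is a common technique in football visualization. It encodes a second dimension (quality) into the same plot as location without needing a color bar. Matplotlib treats `s` as marker area, and the `+100` offset means areas are not proportional to xG: a 0.5 xG shot (`s=350`) is drawn about 2.3 times as large as a 0.1 xG shot (`s=150`), not 5 times. The encoding preserves the ordering of chances rather than their exact ratios, so add a size legend when readers need to read values off the plot.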
- **`pitch.scatter()`**: A wrapper around `matplotlib`'s scatter that handles the coordinate transformation required by the pitch orientation, simplifying the mapping from StatsBomb coordinates to the rendered axes.

### Data Available

The `df` DataFrame contains all shot events for match `3923881` with the following key columns:

| Column | Description |
|---|---|
| `start_location_x/y` | Shot origin in StatsBomb 120x80 coordinates |
| `end_location_x/y/z` | Where the shot ended; `z` (height) is not recorded for every shot |
| `statsbomb_xg` | Expected goals probability per shot |
| `outcome` | Goal, Saved, Blocked, Off T, Wayward, Post |
| `team`, `player` | Attribution for each shot |
| `play_pattern` | Context: Regular Play, From Corner, From Free Kick |
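
Because `outcome` is categorical, it maps naturally onto a color lookup. The sketch below is one possible mapping using the outcome labels listed in the table; the hex colors, the fallback color, and the helper name `outcome_to_color` are illustrative choices, not part of any library.

```python
# Hypothetical color choices keyed by the outcome labels listed above.
OUTCOME_COLORS = {
    "Goal": "#2ca02c",     # green
    "Saved": "#ffbf00",    # amber
    "Post": "#ffbf00",     # amber
    "Blocked": "#7f7f7f",  # grey
    "Off T": "#d62728",    # red
    "Wayward": "#d62728",  # red
}
DEFAULT_COLOR = "#7f7f7f"


def outcome_to_color(outcome):
    """Look up the marker color for a shot outcome, falling back to grey."""
    return OUTCOME_COLORS.get(outcome, DEFAULT_COLOR)


# The last label is not in the lookup, so it exercises the fallback.
outcomes = ["Goal", "Saved", "Off T", "Saved to Post"]
colors = [outcome_to_color(o) for o in outcomes]
print(colors)
```

With the notebook's DataFrame, `df["outcome"].map(outcome_to_color)` should yield one color per shot, which can then be passed as `c=` to `pitch.scatter()`.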

### Ideas to Extract More Value

- **Outcome color coding**: Map `outcome` to distinct colors (e.g. green for Goal, amber for Saved, red for off target) to visually distinguish scoring shots from misses without adding a separate plot.
- **Shot zone aggregation**: Divide the pitch into standard zones (six-yard box, penalty spot area, edge of box, outside box) using x/y thresholds and compute total xG and conversion rate per zone to identify team strengths and weaknesses by area.
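- **Shot placement map (a step towards xGOT-style analysis)**: Use `end_location_y` and `end_location_z` to map on-target shots onto a goal frame graphic, revealing which areas of the goal are most frequently targeted and where saves are being made. xGOT (Expected Goals on Target) itself is a separate post-shot model value; this plot shows placement only.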
- **Multi-match shot accumulation**: Call `get_dataframe(event_type='shot')` across all matches in a competition and concatenate the results into a single DataFrame to produce a season-level shot profile, revealing attacking patterns and set-piece dependence.
- **Goalkeeper save map**: Filter to `outcome == 'Saved'` and plot the `end_location_y` / `end_location_z` values on a goal-frame coordinate system to show the distribution of saves across the goal frame — a useful tool for goalkeeper performance analysis.
- **Shot quality vs volume comparison**: Compute average xG per shot (quality) and total shot volume per team across a league season and plot them as a scatter to classify teams: high-quality/low-volume finishers vs high-volume/low-quality approaches.
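
The shot zone aggregation idea above can be sketched as follows. This is a simplified version with three zones rather than four, assuming StatsBomb pitch markings (penalty area from x=102 and y=18 to 62, six-yard box from x=114 and y=30 to 50). The function `shot_zone` and the example shots are invented for illustration.

```python
from collections import defaultdict


def shot_zone(x, y):
    """Classify a shot location into a coarse zone using assumed StatsBomb markings."""
    if x >= 114 and 30 <= y <= 50:
        return "six-yard box"
    if x >= 102 and 18 <= y <= 62:
        return "penalty area"
    return "outside box"


# Invented example shots: (x, y, xG, scored).
example_shots = [
    (116.0, 40.0, 0.45, True),
    (108.0, 44.0, 0.20, False),
    (105.0, 30.0, 0.08, False),
    (95.0, 40.0, 0.03, False),
]

# Accumulate shot count, total xG and goals per zone.
totals = defaultdict(lambda: {"shots": 0, "xg": 0.0, "goals": 0})
for x, y, xg, scored in example_shots:
    zone = totals[shot_zone(x, y)]
    zone["shots"] += 1
    zone["xg"] += xg
    zone["goals"] += int(scored)

for name, stats in totals.items():
    conversion = stats["goals"] / stats["shots"]
    print(f"{name}: shots={stats['shots']}, xG={stats['xg']:.2f}, conversion={conversion:.0%}")
```

Finer zones (for example splitting the penalty area into a central strip and wide channels) only require extra branches in `shot_zone`; the aggregation loop stays the same.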