# Accessing the StatsBomb API From Flow

**Flow** includes built-in support for querying the StatsBomb API directly, covering both paid access and the free open data, so you can fetch, explore, and analyze football data in just a few lines of code.

Whether you're working with events, lineups, or 360 data, `Flow` lets you stream and query it without flattening or wrangling.

## 🔑 Setup

If you are only using StatsBomb's free open data, no credentials are needed. Otherwise, make sure you have access credentials for the StatsBomb API before you begin.

You can set them via environment variables:

```bash
export SB_USERNAME="your-username"
export SB_PASSWORD="your-password"
```

Or pass them directly via the `creds` argument.
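
As a minimal sketch, the credentials could be read from the environment and collected into a dictionary before being passed as `creds`. The `{"user": ..., "passwd": ...}` shape below is the one used by `statsbombpy`; that Flow accepts the same shape is an assumption here, and `load_statsbomb_creds` is a hypothetical helper name.

```python
import os


def load_statsbomb_creds(env=None):
    """Build a credentials dict from SB_USERNAME / SB_PASSWORD.

    Hypothetical helper: returns None when either variable is missing,
    which is the expected case when only the free open data is used.
    """
    env = os.environ if env is None else env
    user = env.get("SB_USERNAME")
    password = env.get("SB_PASSWORD")
    if not user or not password:
        return None
    # Assumed shape: the same keys that statsbombpy expects.
    return {"user": user, "passwd": password}


creds = load_statsbomb_creds()
```

When the result is not `None`, it can be passed wherever a `creds` argument is accepted.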

## 🚀 Getting Started

```python
from penaltyblog.matchflow import Flow

# Get all available competitions
comps = Flow.statsbomb.competitions().collect()
comps[0]
```

```text
{'competition_id': 9,
 'season_id': 281,
 'country_name': 'Germany',
 'competition_name': '1. Bundesliga',
 'competition_gender': 'male',
 'competition_youth': False,
 'competition_international': False,
 'season_name': '2023/2024',
 'match_updated': '2024-07-15T14:15:54.671676',
 'match_updated_360': '2024-07-15T14:17:00.877356',
 'match_available_360': '2024-07-15T14:17:00.877356',
 'match_available': '2024-07-15T14:15:54.671676'}
```

## 📅 Loading Match Metadata

```python
matches = Flow.statsbomb.matches(competition_id=43, season_id=106).collect()
len(matches)
```

```text
64
```

## 🎯 Filtering Events (e.g. Shots)

```python
from penaltyblog.matchflow import where_equals

shots = (
    Flow.statsbomb.events(match_id=3788741)
    .filter(where_equals("type.name", "Shot"))
    .select("player.name", "location", "shot.statsbomb_xg")
    .collect()
)

shots[0]
```

```text
{'player.name': 'Ciro Immobile',
 'location': [114.6, 50.1],
 'shot.statsbomb_xg': 0.05223599}
```
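
To make the dotted field names concrete, here is a plain-Python sketch of how a path such as `type.name` can be resolved against a nested event record. It illustrates the idea only and is not Flow's implementation; `get_path` and the sample `event` are made up for this example.

```python
def get_path(record, path, default=None):
    """Follow a dotted path like "shot.statsbomb_xg" through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Illustrative record shaped like a shot event (all values are made up)
event = {
    "type": {"id": 16, "name": "Shot"},
    "player": {"id": 1, "name": "Example Player"},
    "location": [100.0, 40.0],
    "shot": {"statsbomb_xg": 0.1},
}

# Equivalent in spirit to a filter on "type.name" followed by a select
fields = ["player.name", "location", "shot.statsbomb_xg"]
is_shot = get_path(event, "type.name") == "Shot"
selected = {field: get_path(event, field) for field in fields}
```

A path that does not exist in the record, such as `pass.length` on this shot, resolves to the default value rather than raising an error.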

## ⚽ Working with Lineups

```python
lineups = (
    Flow.statsbomb.lineups(match_id=3788741)
    .select("lineup")
    .collect()
)

lineups[0]["lineup"][0]
```

```text
{'player_id': 4355,
 'player_name': 'Emerson Palmieri dos Santos',
 'player_nickname': 'Emerson',
 'jersey_number': 13,
 'country': {'id': 112, 'name': 'Italy'},
 'cards': [],
 'positions': []}
```

## 🧠 Join with Lineup Metadata

```python
from penaltyblog.matchflow import where_exists

# Player records from the second lineup entry, used as the right side of the join
meta = Flow(lineups[1]["lineup"])

# Keep only events that have a player and expose the nested id as a top-level field
events = (
    Flow.statsbomb.events(3788741)
    .filter(where_exists("player.id"))
    .assign(player_id=lambda r: r["player"].get("id"))
)

joined = (
    events
    .join(meta, left_on="player_id", right_on="player_id", how="left")
    .select("jersey_number", "type")
    .limit(10)
    .collect()
)

joined[0]
```

```text
{'jersey_number': 17, 'type': {'id': 30, 'name': 'Pass'}}
```
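
The left join above keeps every event and attaches lineup fields where `player_id` matches. A plain-Python sketch of that behaviour follows, using small made-up records; `left_join` is a hypothetical helper and not part of Flow.

```python
def left_join(left, right, on):
    """Left-join two lists of dicts on a shared key."""
    index = {row[on]: row for row in right}  # last row wins per key
    joined = []
    for row in left:
        match = index.get(row[on], {})
        # Left-hand fields take priority when names collide
        joined.append({**match, **row})
    return joined


# Made-up sample data for illustration
sample_events = [
    {"player_id": 1, "type": {"name": "Pass"}},
    {"player_id": 2, "type": {"name": "Shot"}},
]
sample_lineup = [{"player_id": 1, "jersey_number": 17}]

rows = left_join(sample_events, sample_lineup, on="player_id")
jerseys = [row.get("jersey_number") for row in rows]
```

In this sketch, an event whose player is missing from the right-hand side (for example, a player from the other team) is kept, but has no jersey number.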

## 👟 Presets for Common Tasks

Flow also includes preset filters and transforms for common StatsBomb use cases - such as isolating shots or passes - so you don’t have to rewrite the same filters each time.

```python
from penaltyblog.matchflow.statsbomb.presets import shots_only, passes_only

# All shots from a match
shots = Flow.statsbomb.events(3788741).pipe(shots_only).collect()

passes = Flow.statsbomb.events(3788741).pipe(passes_only).collect()

len(shots), len(passes)
```

```text
(27, 1059)
```

Presets can also take arguments to filter events further. For example:

```python
from penaltyblog.matchflow.statsbomb.presets import xg_above

shots = Flow.statsbomb.events(3788741).pipe(xg_above(0.25)).collect()

len(shots)
```

```text
1
```
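
A preset that takes arguments can be thought of as a function that returns another function. The sketch below shows that pattern on a plain list of event dicts; `xg_above_sketch` is a hypothetical stand-in and does not reproduce how the real `xg_above` preset is implemented. In particular, the strict `>` comparison is an assumption.

```python
def xg_above_sketch(threshold):
    """Return a transform that keeps shots whose xG exceeds `threshold`."""

    def transform(events):
        return [
            e
            for e in events
            if e.get("type", {}).get("name") == "Shot"
            and e.get("shot", {}).get("statsbomb_xg", 0.0) > threshold
        ]

    return transform


# Made-up events for illustration
sample_events = [
    {"type": {"name": "Shot"}, "shot": {"statsbomb_xg": 0.31}},
    {"type": {"name": "Shot"}, "shot": {"statsbomb_xg": 0.05}},
    {"type": {"name": "Pass"}},
]

# Calling the factory yields a one-argument function, the shape .pipe() is given above
high_xg = xg_above_sketch(0.25)(sample_events)
```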

## 🛠️ Tip: Combine with .materialize() to safely reuse data

Since Flow streams data, it's often a good idea to materialize a flow that you plan to reuse:

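```python
from penaltyblog.matchflow import Flow
from penaltyblog.matchflow.statsbomb.presets import passes_only, shots_only, xg_above

# Materialize once so the pipelines below can all reuse the same events
flow = Flow.statsbomb.events(3788741).materialize()

shots = flow.pipe(shots_only).collect()
passes = flow.pipe(passes_only).collect()
high_xg_shots = flow.pipe(xg_above(0.25)).collect()
```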
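
The reasoning can be illustrated with a plain-Python sketch: a lazy source is re-run each time it is consumed, while a materialized copy is produced once and reused. The `fetch_events` function and its call counter are invented for this illustration and say nothing about Flow's internals.

```python
calls = {"n": 0}


def fetch_events():
    """Stand-in for an expensive source such as an API request."""
    calls["n"] += 1
    yield {"type": "Shot"}
    yield {"type": "Pass"}
    yield {"type": "Pass"}


# Lazy: every consumer triggers its own pass over the source
lazy_shots = [e for e in fetch_events() if e["type"] == "Shot"]
lazy_passes = [e for e in fetch_events() if e["type"] == "Pass"]
calls_when_lazy = calls["n"]

# Materialized: consume the source once, then reuse the stored list
calls["n"] = 0
cached = list(fetch_events())
shots = [e for e in cached if e["type"] == "Shot"]
passes = [e for e in cached if e["type"] == "Pass"]
calls_when_materialized = calls["n"]
```

Here the lazy version runs the source twice and the materialized version runs it once, which is the saving that reusing a materialized flow is meant to provide.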

## ✅ Summary

- Use `Flow.statsbomb` to access all available endpoints
- Filter, select, and join nested fields without flattening
- Works out of the box with the official statsbombpy client
- Ideal for quick exploration, dashboards, or building pipelines
