# Utility, Inspection, & Interoperability

Beyond transforming data, `Flow` provides several tools to help you inspect, debug, and connect your pipelines to other libraries like pandas - all while preserving the lazy, composable nature of your workflow.


## Inspecting Your Flow

`Flow` includes methods for peeking into the stream, checking fields, or verifying structure. Most avoid materializing the `Flow`; the ones that do are flagged below.

### `.first()`: Peek at the First Record

Returns the first record without materializing the `Flow` (where possible). If the stream is empty, returns `None`.

```python
first_event = events_flow.first()
```

Useful for sanity checks or schema inspection.
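
To see why peeking need not consume a lazy stream, here is a minimal plain-Python sketch. The `peek_first` helper is hypothetical and is not part of `Flow`; it only illustrates one way a first record can be read and then placed back in front of the remaining records.

```python
from itertools import chain

def peek_first(records):
    """Return (first_record_or_None, iterator_over_all_records)."""
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        # Empty stream: nothing to peek at, hand back an empty iterator
        return None, iter(())
    # Put the record we read back in front of the rest of the stream
    return first, chain([first], iterator)

# A generator stands in for a lazy source (assumption for illustration)
events = ({"id": i, "type_name": "Shot"} for i in range(3))
first, events = peek_first(events)
remaining = list(events)  # still yields every record, including the first
```

The same idea covers the empty case: when there is nothing to read, the helper returns `None`, mirroring the behaviour described for `.first()`.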

### `.last()`: Get the Final Record

Returns the last record in the stream.

```python
last_event = events_flow.last()
```

💡 This materializes the `Flow`.

### `.is_empty()`: Check for Records

Returns `True` if the stream is empty.

```python
if events_flow.is_empty():
    print("No events found.")
```

💡 Safe to call - doesn’t materialize the `Flow`.

### `.keys(limit=None)`: Discover Field Names

Returns the union of all keys across records (up to an optional limit).

```python
events_flow.keys(limit=10)
events_flow.keys()
```

Helpful for exploring semi-structured or nested JSON.
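
As a rough illustration of what "union of keys, up to a limit" can mean, the sketch below scans plain dicts. `union_keys` is a hypothetical helper, and treating `limit` as the number of records scanned is an assumption, not a statement about how `Flow.keys()` is implemented.

```python
from itertools import islice

def union_keys(records, limit=None):
    """Collect the distinct keys seen in the first `limit` records (all if None)."""
    seen = {}  # a dict preserves the first-seen order of keys
    for record in islice(records, limit):
        for key in record:
            seen.setdefault(key, None)
    return list(seen)

events = [
    {"type_name": "Pass", "player_name": "A"},
    {"type_name": "Shot", "shot_xg": 0.31},
    {"type_name": "Foul", "card": "Yellow"},
]

all_keys = union_keys(events)            # scans every record
first_two = union_keys(events, limit=2)  # never sees the "card" key
```

A limit trades completeness for speed: fields that only appear in later records are missed.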

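### `.len()`: Count the Records

Pass the `Flow` to Python's built-in `len()` to count its records. Calling `.cache()` first, as below, keeps the count repeatable and the `Flow` reusable.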

```python
flow = Flow(data).cache()
print(len(flow))  # Safe and repeatable
```

💡 This materializes the `Flow`.
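
The reason for caching before counting can be seen with an ordinary generator, which is a reasonable mental model for a lazy stream (an analogy, not `Flow` internals). Counting consumes the generator, so a second pass sees nothing unless the records were stored first.

```python
def load_events():
    # Stand-in for a lazy source such as a file reader (hypothetical)
    for i in range(4):
        yield {"id": i}

stream = load_events()
first_count = sum(1 for _ in stream)   # consumes the generator
second_count = sum(1 for _ in stream)  # nothing left to count

cached = list(load_events())           # store the records once
cached_counts = (len(cached), len(cached))  # same answer every time
```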

### `print(flow)`: Summary Preview

Printing a `Flow` shows a summary and a sample:

```python
print(flow)
# <Penaltyblog Flow | n≈? | sample=[..., ...]>
```

💡 The count shows as `n≈?` in the example above. Once `.cache()` or `.len()` has been called, the count becomes accurate.

## Materializing the Flow

### `.collect()`: Get All Records as a List

Materializes the full pipeline and returns a list of dicts.

```python
records = events_flow.filter(...).assign(...).collect()
```
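
To make "materializes" concrete, the following sketch uses a plain generator expression as a stand-in for a lazy pipeline. It is an analogy under that assumption, not `Flow` code: building the pipeline runs nothing, and the work only happens when the results are pulled into a list.

```python
calls = []

def is_shot(record):
    calls.append(record["id"])  # note that the predicate actually ran
    return record["type_name"] == "Shot"

events = [
    {"id": 1, "type_name": "Pass"},
    {"id": 2, "type_name": "Shot"},
]

lazy = (r for r in events if is_shot(r))  # builds the pipeline, runs nothing
calls_before = list(calls)

records = list(lazy)                      # materialize: the work happens here
calls_after = list(calls)
```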

## Custom Logic and Pipelines

### `.pipe()`: Insert Your Own Function

Use `.pipe()` to plug your own logic into the pipeline. Perfect for reusable functions, integrations, or branching.

```python
def process_shots(flow, min_xg=0.25):
    return (
        flow
        .filter(lambda r: r.get("type_name") == "Shot" and r.get("shot_xg", 0) >= min_xg)
        .assign(is_high_xg=True)
    )

# Chain your custom step using .pipe
high_xg = Flow(events).pipe(process_shots, min_xg=0.3)
results = high_xg.select("player_name", "shot_xg", "is_high_xg").collect()
```

💡 Your function should return a `Flow` if you want to keep chaining, or return any object if you’re done (e.g., DataFrame, plot, summary).
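
The pattern behind a `pipe`-style method is small enough to sketch in a few lines. `MiniFlow` below is a toy stand-in written for this illustration, not the real `Flow` class; it assumes only that `pipe` passes the object to your function and returns whatever the function returns.

```python
class MiniFlow:
    """Toy stand-in for a pipeline object; not the real Flow class."""

    def __init__(self, records):
        self.records = list(records)

    def pipe(self, func, *args, **kwargs):
        # Hand the whole object to func and return whatever func returns
        return func(self, *args, **kwargs)

def only_shots(flow, min_xg=0.0):
    kept = [
        r for r in flow.records
        if r.get("type_name") == "Shot" and r.get("shot_xg", 0) >= min_xg
    ]
    return MiniFlow(kept)  # returning a MiniFlow keeps the chain going

def count(flow):
    return len(flow.records)  # returning a plain value ends the chain

events = [
    {"type_name": "Shot", "shot_xg": 0.4},
    {"type_name": "Shot", "shot_xg": 0.1},
    {"type_name": "Pass"},
]

n_high_xg = MiniFlow(events).pipe(only_shots, min_xg=0.3).pipe(count)
```

Because extra positional and keyword arguments are forwarded, a step like `only_shots` can stay an ordinary, independently testable function.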

### Use `.pipe()` with External Tools

You can bridge into pandas (or anything else) via `.pipe()`:

```python
def team_summary(flow):
    df = flow.to_pandas()
    return df.groupby("team_name")["shot_xg"].agg(["sum", "mean", "count"])

summary_df = (
    Flow(events)
    .filter(lambda r: r.get("type_name") == "Shot")
    .pipe(team_summary)
)
```

## Integration with pandas

### `.to_pandas()`: Convert to DataFrame

```python
df = Flow(events).filter(lambda r: r.get("period") == 1).to_pandas()
print(df.head())
```

💡 This materializes the `Flow` into memory.

### `.describe()`: Summary Stats

Returns the result of `pandas.DataFrame.describe()` computed over the full `Flow`.

```python
# Parentheses are needed so the chained calls can span multiple lines
(
    Flow(events)
    .select("shot_xg", "location_x", "location_y")
    .describe()
)
```

You can pass pandas-style args:

```python
# Show object/string field summaries
Flow(events).describe(include="object")

# Change percentiles
Flow(events).describe(percentiles=(0.1, 0.5, 0.9))
```

## Summary

These utility methods let you:

- Inspect flows safely (`first()`, `keys()`, `is_empty()`)
- Debug or validate pipelines (`print(flow)`)
- Materialize and export results (`collect()`, `to_pandas()`)
- Integrate your own logic cleanly (`pipe()`)

`Flow` gives you structure when you need it - and flexibility when you don’t.

## What’s Next?

Next, we’ll look at best practices for working with `Flow`.