# Helper Functions: Clean, Safe Pipelines in MatchFlow

Working with real-world JSON data often means writing a lot of code like this:

```python
.filter(lambda r: r.get("type", {}).get("name") == "Shot")
.assign(player=lambda r: r.get("player", {}).get("name"))
```

It works, but it’s repetitive, hard to read, and incompatible with parallel pipelines (e.g. `folder_flow`), because lambdas cannot be pickled.

To solve this, **MatchFlow includes a set of helper functions** that make your pipelines cleaner, safer, and more reusable.

## Why Use Helpers?

- **Readable:** Declarative, short expressions
- **Safe:** Handles missing keys and optional fields
- **Parallel-friendly:** Fully pickleable (unlike lambdas)
- **Reusable:** Compose them like building blocks

## Field Access Helpers

### `get_field(path: str, default=None)`

Access nested fields using dot notation:

```python
from penaltyblog.matchflow import get_field, where_equals

flow.filter(where_equals("type.name", "Shot"))
flow.assign(player=get_field("player.name"))
```

This safely returns `None` (or the supplied default) if any part of the path doesn’t exist.
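To make the fallback behaviour concrete, here is a minimal sketch of a safe dot-path lookup written in plain Python. It is not MatchFlow's implementation: `lookup_path` is a hypothetical stand-in, and the sample record is invented for illustration.

```python
from typing import Any

def lookup_path(record: dict, path: str, default: Any = None) -> Any:
    """Hypothetical stand-in: walk a dot-separated path through nested dicts."""
    current: Any = record
    for key in path.split("."):
        # Give up as soon as a key is missing or the value is not a dict.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

record = {"type": {"name": "Shot"}, "player": None}  # invented sample event

print(lookup_path(record, "type.name"))             # Shot
print(lookup_path(record, "player.name"))           # None
print(lookup_path(record, "team.name", "Unknown"))  # Unknown
```

The point of the sketch is that no step of the walk can raise a `KeyError` or `AttributeError`, which is what the chained `.get(..., {})` calls were guarding against.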

### `get_index(key: str, index: int, default=None)`

Extract an item from a list field (e.g. location coordinates):

```python
from penaltyblog.matchflow import get_index

flow.assign(
    x=get_index("location", 0),
    y=get_index("location", 1)
)
```
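As a rough illustration of what a safe index lookup involves, the sketch below handles a missing field and an out-of-range index without raising. `safe_index` is a hypothetical function written for this example and may differ from how `get_index` behaves internally.

```python
from typing import Any

def safe_index(record: dict, key: str, index: int, default: Any = None) -> Any:
    """Hypothetical stand-in for the per-record lookup behind get_index."""
    value = record.get(key)
    # Only lists and tuples are indexed by position; anything else yields the default.
    if isinstance(value, (list, tuple)) and -len(value) <= index < len(value):
        return value[index]
    return default

event = {"location": [102.5, 34.0]}  # invented sample event

print(safe_index(event, "location", 0))  # 102.5
print(safe_index(event, "location", 2))  # None (index out of range)
print(safe_index({}, "location", 0))     # None (field missing)
```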

## Predicate Helpers

### `where_equals(path: str, value)`

Shortcut for filtering where a field equals a value:

```python
from penaltyblog.matchflow import where_equals

flow.filter(where_equals("type.name", "Shot"))
```

### `where_in(path: str, values: set | list | tuple)`

Shortcut for filtering where a field's value is one of a given collection:

```python
from penaltyblog.matchflow import where_in

flow.filter(where_in("player.name", {"Bukayo Saka", "Mohamed Salah"}))
```

### `where_exists(path: str)`

Keep only records where a field exists and is not `None`:

```python
from penaltyblog.matchflow import where_exists

flow.filter(where_exists("player.name"))
```
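As a mental model, each predicate corresponds to a plain Python check on a single record. The sketch below spells those checks out on a few invented events. The `lookup` function is a hypothetical helper, and the comments describe approximate equivalents rather than MatchFlow's internals.

```python
from typing import Any

def lookup(record: dict, path: str) -> Any:
    """Hypothetical dot-path lookup that returns None when any key is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current

events = [
    {"type": {"name": "Shot"}, "player": {"name": "Bukayo Saka"}},
    {"type": {"name": "Pass"}, "player": {"name": "Mohamed Salah"}},
    {"type": {"name": "Half Start"}},  # no player attached
]

# Roughly what where_equals("type.name", "Shot") keeps
shots = [e for e in events if lookup(e, "type.name") == "Shot"]

# Roughly what where_in("player.name", {...}) keeps
wanted = {"Bukayo Saka", "Mohamed Salah"}
by_player = [e for e in events if lookup(e, "player.name") in wanted]

# Roughly what where_exists("player.name") keeps
with_player = [e for e in events if lookup(e, "player.name") is not None]

print(len(shots), len(by_player), len(with_player))  # 1 2 2
```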

## Field Combination & Transformation

### `combine_fields(target: str, *paths: str, join_str: str = "")`

Concatenate multiple fields into a new one:

```python
from penaltyblog.matchflow import combine_fields

flow.assign(
    player_full_name=combine_fields(
        "player_full_name", "player.first", "player.last", join_str=" "
    )
)
```
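The effect of `join_str` is easiest to see on a concrete record. The sketch below is a plain-Python approximation, not the library's code: `join_fields` is a hypothetical function, and skipping missing values is an assumption made for this example.

```python
def join_fields(record: dict, *paths: str, join_str: str = "") -> str:
    """Hypothetical stand-in: concatenate the values found at several dot-paths."""
    parts = []
    for path in paths:
        current = record
        for key in path.split("."):
            current = current.get(key) if isinstance(current, dict) else None
        # Assumption: missing values are skipped rather than rendered as "None".
        if current is not None:
            parts.append(str(current))
    return join_str.join(parts)

record = {"player": {"first": "Bukayo", "last": "Saka"}}  # invented sample

print(join_fields(record, "player.first", "player.last"))                # BukayoSaka
print(join_fields(record, "player.first", "player.last", join_str=" "))  # Bukayo Saka
```

With the default empty separator the parts run together, so a separator such as `" "` is usually what you want when building display names.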

## In Parallel Pipelines

These helpers replace lambdas in multiprocessing environments, like `folder_flow`, where functions must be pickleable:

```python
# flow_fn must be a named, top-level function so it can be pickled
def shots_only(f):
    return f.filter(where_equals("type.name", "Shot"))

folder_flow(
    input_folder="data/",
    flow_fn=shots_only,
    output_folder="out/"
)
```

## Creating Your Own Helpers

In addition to the built-in helpers like `get_field()` and `get_index()`, you can define your own reusable functions to keep your Flow pipelines clean and expressive.

### Why Write Your Own?

- Avoid repetition across multiple filters or assignments
- Improve readability by naming your logic
- Enable pickle-safe functions for use with `folder_flow()`
- Make your analysis easier to share and maintain

### Example: A Reusable Pass Filter

Rather than repeating this logic:

```python
flow.filter(lambda r: r.get("type", {}).get("name") == "Pass")
```

Define a named function:

```python
from penaltyblog.matchflow import get_field

def is_pass(record: dict) -> bool:
    return get_field("type.name")(record) == "Pass"
```

Then use it:

```python
flow.filter(is_pass)
```

### Example: Filtering for a Specific Player

```python
def player_is(name: str):
    def checker(record: dict) -> bool:
        return get_field("player.name")(record) == name
    return checker

flow.filter(player_is("Harry Kane"))
```
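The closure returned by `player_is` is fine inside a single process, but nested functions generally cannot be pickled. If the predicate itself has to cross a process boundary, one option is to bind the argument with `functools.partial` around a top-level function. The sketch below uses plain dict access instead of `get_field` so that it runs on its own; `_player_equals` and `player_is_picklable` are hypothetical names.

```python
from functools import partial

def _player_equals(name: str, record: dict) -> bool:
    """Top-level function, so pickle can locate it by module and name."""
    player = record.get("player") or {}
    return player.get("name") == name

def player_is_picklable(name: str):
    # A partial object can be pickled when the wrapped function and its arguments can.
    return partial(_player_equals, name)

is_kane = player_is_picklable("Harry Kane")

print(is_kane({"player": {"name": "Harry Kane"}}))  # True
print(is_kane({"type": {"name": "Pass"}}))          # False
```

The trade-off is a slightly less tidy definition in exchange for a predicate that does not depend on a closure.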

### Example: Filtering for Successful Passes

```python
def is_successful_pass(record: dict) -> bool:
    type_ = get_field("type.name")(record)
    outcome = get_field("pass.outcome.name", default="Successful")(record)
    return type_ == "Pass" and outcome == "Successful"
```

Now your Flow is much cleaner:

```python
passes = (
    flow
    .filter(is_successful_pass)
    .filter(player_is("Harry Kane"))
    .assign(...)  # etc.
)
```

### Example: Assigning with a Custom Field Extractor

If you’re assigning new fields repeatedly, consider helpers for that too:

```python
from typing import Optional

def extract_coords(field: str, index: int):
    def accessor(record: dict) -> Optional[float]:
        # Treat a missing or null field as an empty list
        coords = get_field(field, default=[])(record) or []
        return coords[index] if len(coords) > index else None
    return accessor

flow.assign(
    start_x=extract_coords("location", 0),
    start_y=extract_coords("location", 1),
)
```

### Tip: Always Use Named Functions in `folder_flow()`

Because `folder_flow()` relies on Python's multiprocessing, any function passed to it (like `flow_fn` or `reduce_fn`) must be pickleable. This means:

- ✅ Named functions defined at the top level
- ❌ Lambdas or nested functions

Defining custom helpers as named, top-level functions keeps your pipelines compatible with `folder_flow()` and easy to reuse.
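You can check whether a function will survive the trip to a worker process by trying to pickle it yourself. The sketch below uses only the standard library; `is_shot`, `make_nested`, and `can_pickle` are hypothetical names chosen for this example.

```python
import pickle

def is_shot(record: dict) -> bool:
    """Named, top-level predicate."""
    return record.get("type", {}).get("name") == "Shot"

def make_nested():
    def nested(record: dict) -> bool:
        return True
    return nested

def can_pickle(obj) -> bool:
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # The exact exception type varies by object and Python version.
        return False

print("named function: ", can_pickle(is_shot))
print("lambda:         ", can_pickle(lambda r: True))
print("nested function:", can_pickle(make_nested()))
```

Run as a script, the named function is expected to pickle, while the lambda and the nested function are not.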
