# Working with Files: Input & Output

Flow makes it easy to **load, stream, and save structured JSON data** from a variety of sources. Whether you’re pulling from disk, an API, or a folder of `.jsonl` files, Flow provides a consistent, lazy interface for building pipelines.

---

## 📥 Loading Data into Flow

Use `Flow.from_*` methods to create a new Flow from Python objects or files.

---

### 🧠 From Python Data: `.from_records(...)`

```python
from penaltyblog.matchflow import Flow

data = [{"id": 1, "value": "A"}, {"id": 2, "value": "B"}]
flow = Flow.from_records(data)
```

It also works with a single dict or a generator:

```python
flow = Flow.from_records({"id": 3, "value": "C"})

def gen():
    for i in range(3):
        yield {"id": i}

flow = Flow.from_records(gen())
```

> ⚠️ If you mutate records (e.g. with `.assign()`), Flow modifies them in place. Use `.copy()` or `copy.deepcopy()` to protect your originals.
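The in-place warning can be illustrated without Flow at all. The sketch below is plain Python, not Flow's implementation: `assign_in_place` is a hypothetical stand-in for a step that writes new keys onto the dict it receives, and the records are copied with `copy.deepcopy()` before being handed to it.

```python
import copy

# Hypothetical stand-in for an in-place step such as `.assign()`:
# it writes new keys directly onto the dict it is given.
def assign_in_place(record, **fields):
    record.update(fields)
    return record

original = [{"id": 1, "value": "A"}, {"id": 2, "value": "B"}]

# Deep-copy first so the pipeline works on independent dicts.
safe_copy = copy.deepcopy(original)
for record in safe_copy:
    assign_in_place(record, flagged=True)

print(original[0])   # {'id': 1, 'value': 'A'}
print(safe_copy[0])  # {'id': 1, 'value': 'A', 'flagged': True}
```

Without the `deepcopy` call, both names would refer to the same dicts and `original` would gain the `flagged` key too.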

---

### 📄 From JSON Lines File: `.from_jsonl(...)`

```python
flow = Flow.from_jsonl("data/events.jsonl")
```
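For readers new to the format, a JSON Lines file holds one JSON object per line, which is what makes lazy reading possible. The following is a minimal stdlib sketch of that idea, not Flow's actual reader; `iter_jsonl` and the sample file contents are illustrative assumptions.

```python
import json
import os
import tempfile

def iter_jsonl(path):
    """Yield one dict per non-empty line, reading the file lazily."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Write a tiny JSONL file: one JSON object per line.
tmp_dir = tempfile.mkdtemp()
jsonl_path = os.path.join(tmp_dir, "events.jsonl")
with open(jsonl_path, "w", encoding="utf-8") as fh:
    fh.write('{"id": 1, "type": "Pass"}\n')
    fh.write('{"id": 2, "type": "Shot"}\n')

# Only one line is parsed at a time until the generator is exhausted.
records = list(iter_jsonl(jsonl_path))
print(records)
```

Because the generator parses a line only when asked for the next record, memory use stays roughly constant regardless of file size.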

---

### 📂 From Folder of JSON Files: `.from_folder(...)`

```python
flow = Flow.from_folder("data/events/")
```

Reads all `.json` and `.jsonl` files in a directory.

Each `.json` file must contain either:

- A single dict
- A list of dicts

Files are streamed one at a time, which keeps memory use low during bulk ingestion.
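The rules above can be sketched in plain Python. This is an illustration of the described behaviour under assumptions, not Flow's source: `iter_folder` is a hypothetical helper, and the alphabetical file ordering is a choice made here for a predictable demo.

```python
import json
import tempfile
from pathlib import Path

def iter_folder(folder):
    """Yield records from every .json / .jsonl file, one file at a time."""
    for path in sorted(Path(folder).iterdir()):
        if path.suffix == ".jsonl":
            with path.open(encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        yield json.loads(line)
        elif path.suffix == ".json":
            with path.open(encoding="utf-8") as fh:
                payload = json.load(fh)
            # A single dict is one record; a list contributes one record per item.
            if isinstance(payload, dict):
                yield payload
            else:
                yield from payload

folder = Path(tempfile.mkdtemp())
(folder / "a.json").write_text('{"id": 1}', encoding="utf-8")
(folder / "b.json").write_text('[{"id": 2}, {"id": 3}]', encoding="utf-8")
(folder / "c.jsonl").write_text('{"id": 4}\n{"id": 5}\n', encoding="utf-8")

folder_ids = [r["id"] for r in iter_folder(folder)]
print(folder_ids)  # [1, 2, 3, 4, 5]
```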

---

### ✨ From Glob Pattern: `.from_glob(...)`

```python
flow = Flow.from_glob("data/**/*.json")
```

Searches recursively using `glob.glob`. Same behavior as `.from_folder`, but more flexible for matching paths and subfolders.
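To see what a `**` pattern matches, the standard library's `glob.glob` can be run directly. The sketch below assumes the recursive search corresponds to `recursive=True`; the directory layout is made up for the demonstration.

```python
import glob
import os
import tempfile

# Build a small tree: one file at the top, others nested one and two levels down.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "2023", "cup"))
for rel in (
    "top.json",
    os.path.join("2023", "league.json"),
    os.path.join("2023", "cup", "final.json"),
    os.path.join("2023", "notes.txt"),
):
    with open(os.path.join(root, rel), "w", encoding="utf-8") as fh:
        fh.write("{}")

pattern = os.path.join(root, "**", "*.json")

# With recursive=True, `**` matches zero or more directory levels.
matches = sorted(os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True))

# Without it, `**` behaves like a single `*` and matches exactly one level.
shallow = glob.glob(pattern)

print(len(matches), len(shallow))  # 3 1
```

The `.txt` file is never matched, and the top-level and deeply nested `.json` files are only found in recursive mode.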

---

### 🧾 From JSON File (Single Object or Array): `.from_json(...)`

```python
flow = Flow.from_json("data/game.json")
```

The file may contain either:

- A single object, loaded as one record
- A list of objects, loaded as multiple records

> This reads the entire file into memory. Use `.from_jsonl()` for streaming large datasets.

---

### ⚽ From StatsBomb GitHub (Open Data): `.statsbomb.from_github_file(...)`

```python
flow = Flow.statsbomb.from_github_file(match_id=266516, type="events")
```

Supported `type` values: `"events"`, `"lineups"`, `"matches"`, `"three-sixty"`.

---

## 💾 Saving Data from a Flow

Once your pipeline is complete, use `.to_*()` methods to export the result.

### `.to_jsonl(path)`

Write one record per line:

```python
flow.to_jsonl("output/events.jsonl")
```

---

### `.to_json(path)`

Write all records as a JSON array:

```python
flow.to_json("summary.json", indent=4)
```

> This collects the entire stream before writing.

---

### `.to_json_files(folder, by="id")`

Write each record to its own `.json` file:

```python
flow.to_json_files("out/", by="event_id")
```

Each file is named after the record's `by` field, for example:

- `out/123.json`
- `out/456.json`

The `by` field's value must be a string, or something that converts cleanly to a valid filename.
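The one-file-per-record idea can be sketched with the standard library. This is an assumption-laden illustration rather than Flow's implementation: `write_json_files` is a hypothetical helper, and it simply formats the `by` value as text to build the filename.

```python
import json
import tempfile
from pathlib import Path

def write_json_files(records, folder, by="id"):
    """Write each record to <folder>/<record[by]>.json and return the paths."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        # Formatting the value as text turns ints and similar values into a file stem.
        path = out_dir / f"{record[by]}.json"
        path.write_text(json.dumps(record), encoding="utf-8")
        written.append(path)
    return written

out_dir = Path(tempfile.mkdtemp()) / "out"
events = [{"event_id": 123, "type": "Pass"}, {"event_id": 456, "type": "Shot"}]
paths = write_json_files(events, out_dir, by="event_id")
print([p.name for p in paths])  # ['123.json', '456.json']
```

In this sketch, two records sharing the same `by` value would overwrite each other, so a unique field is the safer choice.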

---

### `.to_pandas()`

Convert the flow to a Pandas DataFrame:

```python
df = flow.select("player_name", "shot_xg").to_pandas()
```

> Best used after filtering/flattening to avoid deeply nested fields.
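To show why flattening helps, here is a small plain-Python sketch that collapses nested dicts into dotted keys, so each value can land in its own column. The `flatten` helper and the sample record are hypothetical and are not part of Flow's API.

```python
def flatten(record, parent_key="", sep="."):
    """Collapse nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key path along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

nested = {"player": {"name": "Player A", "id": 10}, "shot": {"xg": 0.12}}
flat_row = flatten(nested)
print(flat_row)
# {'player.name': 'Player A', 'player.id': 10, 'shot.xg': 0.12}
```

pandas also ships `pandas.json_normalize`, which performs a similar flattening on a list of dicts.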

---

## ✅ Summary of Input Options

| Source Format    | Method                | Streaming? | Notes                        |
| ---------------- | --------------------- | ---------- | ---------------------------- |
| Python objects   | `.from_records()`     | ✅          | Lists, dicts, or generators  |
| JSONL file       | `.from_jsonl()`       | ✅          | Efficient for large datasets |
| Single JSON file | `.from_json()`        | ❌          | Loads entire file at once    |
| Folder of files  | `.from_folder()`      | ✅          | Streams one file at a time   |
| Glob pattern     | `.from_glob()`        | ✅          | Recursively matches files    |
| StatsBomb GitHub | `.statsbomb.from_...` | ✅          | Downloads open match data    |

## 📤 Summary of Output Options

| Output Method      | Format          | Streaming? | Notes                         |
| ------------------ | --------------- | ---------- | ----------------------------- |
| `.to_jsonl()`      | JSONL           | ✅          | One line per record           |
| `.to_json()`       | JSON array      | ❌          | Collects before writing       |
| `.to_json_files()` | Folder of files | ✅          | One file per record           |
| `.to_pandas()`     | DataFrame       | ❌          | Collects all data into memory |

## 🧠 What’s Next?

Now that you can load and save data, let’s look at inspecting, debugging, and explaining your flows using `.head()`, `.keys()`, `.explain()` and more.