Pandas-style toolkit for nested JSON/NDJSON. It makes semi-structured data—event logs, API responses, tracking payloads, JSONL—feel like working with Pandas: one-line load, path selection, controlled flattening, array explode, and reconstruction back to nested records.
- PyPI: coming soon
- License: MIT
- Python: 3.9+
- `json_normalize` becomes unwieldy on complex arrays and loses hierarchy.
- Analysts need to switch between “preserving the original nested structure” and “an analyzable flat table”.
- Selecting deep paths and exploding arrays while preserving parent-child relationships is repetitive and error-prone.
- `NestedFrame` provides `from_records`, path-based `select`, controlled `explode`, and `to_nested` to reconstruct records.
- Columns use dotted paths (e.g., `user.id`, `meta.country`). When exploding `items`, child fields become `items.id`, `items.qty`.
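
To make the dotted-path convention concrete, here is a minimal standard-library sketch of how nested dicts could be flattened into such columns. The helper name `flatten` is hypothetical and is not part of the `nestedframe` API; the library's actual flattening may differ, for example in how it treats lists.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one dict keyed by dotted paths.

    Assumption: lists are left untouched so they can be exploded later.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted prefix.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


record = {
    "user": {"id": 1, "name": "A"},
    "meta": {"country": "CN"},
    "items": [{"id": "i1", "qty": 2}],
}
row = flatten(record)
print(sorted(row))  # ['items', 'meta.country', 'user.id', 'user.name']
```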
pip install nestedframe

For local development:

pip install -e .

from nestedframe import NestedFrame
records = [
{"user": {"id": 1, "name": "A"}, "items": [{"id": "i1", "qty": 2}, {"id": "i2", "qty": 1}], "meta": {"country": "CN"}},
{"user": {"id": 2, "name": "B"}, "items": [{"id": "i3", "qty": 5}], "meta": {"country": "US"}}
]
nf = NestedFrame.from_records(records)
nf.to_pandas()
nf2 = nf.explode("items")
nf2.to_pandas()
nf2.to_nested(group_by="_root_id")

- Dotted path columns with robust flattening of nested dicts.
- Controlled array explode that preserves a `_root_id` for reconstruction.
- Flexible column selection using `fnmatch`-style patterns.
- Round-trip conversion back to nested records for export.
- NDJSON/JSON IO helpers with `.gz` JSONL support.
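
As an illustration of the controlled explode described in the list above, the following sketch works on plain lists of dicts rather than on a `NestedFrame`. The function `explode_rows` is a hypothetical stand-in, and keeping a parent row when its array is empty or missing is an assumption, not documented library behavior.

```python
from typing import Any, Dict, List


def explode_rows(rows: List[Dict[str, Any]], path: str) -> List[Dict[str, Any]]:
    """Emit one row per element of the list at `path`, keeping parent columns."""
    out: List[Dict[str, Any]] = []
    for index, row in enumerate(rows):
        # Reuse an existing _root_id so repeated explodes keep the original parent.
        base = {"_root_id": row.get("_root_id", index)}
        base.update({k: v for k, v in row.items() if k != path})
        # Assumption: a parent with an empty or missing array still yields one row.
        elements = row.get(path) or [None]
        for element in elements:
            new_row = dict(base)
            if isinstance(element, dict):
                # Child fields become "<path>.<field>" columns.
                for field, value in element.items():
                    new_row[f"{path}.{field}"] = value
            else:
                new_row[path] = element
            out.append(new_row)
    return out


rows = [
    {"user.id": 1, "items": [{"id": "i1", "qty": 2}, {"id": "i2", "qty": 1}]},
    {"user.id": 2, "items": [{"id": "i3", "qty": 5}]},
]
exploded = explode_rows(rows, "items")
for r in exploded:
    print(r)
```

Each output row carries the `_root_id` of the record it came from, which is what makes later regrouping possible.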
- `from_pandas(df)` wraps an existing DataFrame as a `NestedFrame` while ensuring `_root_id` is present.
- `select(patterns=None, exclude=None)` supports include and exclude patterns.
- `subset(patterns=None, exclude=None)` returns a new `NestedFrame` with the selected columns.
- `explode_many(paths)` sequentially explodes multiple array paths.
- `schema()` returns available columns and top-level prefixes.
- `to_ndjson(path, group_by="_root_id")` writes reconstructed records to NDJSON.
- `read_jsonl(path)` is an alias for `read_ndjson` and automatically supports `.gz` files.
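
The include/exclude behavior of pattern-based selection can be sketched with the standard-library `fnmatch` module. The helper `select_columns` below is hypothetical. It assumes that omitting `patterns` means "all columns" and that `exclude` is applied after the include step, which may not match the library exactly.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_columns(
    columns: Iterable[str],
    patterns: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return columns matching any include pattern and no exclude pattern."""
    include = list(patterns) if patterns else ["*"]  # assumption: default keeps everything
    drop = list(exclude) if exclude else []
    return [
        col
        for col in columns
        if any(fnmatchcase(col, p) for p in include)
        and not any(fnmatchcase(col, p) for p in drop)
    ]


columns = ["_root_id", "user.id", "user.name", "meta.country", "items"]
picked = select_columns(columns, patterns=["user.*", "meta.country"], exclude=["*.name"])
print(picked)  # ['user.id', 'meta.country']
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.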
from nestedframe import read_json, read_ndjson, read_jsonl
nf = read_json("data.json")
nf = read_ndjson("events.jsonl")
nf = read_jsonl("events.jsonl.gz")
nf.to_ndjson("out.jsonl")

df = nf.select(patterns=["user.*", "meta.country"], exclude=["*.name"])
nf_sub = nf.subset(patterns=["items.*", "user.id"], exclude=["items.qty"])

nf3 = nf.explode_many(["items"])  # add more paths as needed

- `NestedFrame.from_records(records)`
- `NestedFrame.from_pandas(df)`
- `NestedFrame.to_pandas()`
- `NestedFrame.select(patterns=None, exclude=None)` → DataFrame
- `NestedFrame.subset(patterns=None, exclude=None)` → NestedFrame
- `NestedFrame.explode(path)` → NestedFrame
- `NestedFrame.explode_many(paths)` → NestedFrame
- `NestedFrame.to_nested(group_by=None)` → list[dict]
- `NestedFrame.schema()` → dict
- `NestedFrame.to_ndjson(path, group_by="_root_id")`
- `read_json(path)` → NestedFrame
- `read_ndjson(path)` / `read_jsonl(path)` → NestedFrame
- `write_ndjson(records, path)`
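
For readers curious how transparent `.gz` handling can work for line-delimited JSON, the sketch below picks the opener from the file extension. `dump_lines` and `load_lines` are hypothetical names chosen to avoid confusion with the library's own `write_ndjson` and `read_ndjson`; this is one possible approach, not a description of the package's internals.

```python
import gzip
import json
import os
import tempfile
from typing import IO, Any, Dict, Iterable, List


def open_text(path: str, mode: str) -> IO[str]:
    """Open `path` as UTF-8 text, using gzip when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def dump_lines(records: Iterable[Dict[str, Any]], path: str) -> None:
    """Write one JSON document per line."""
    with open_text(path, "w") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_lines(path: str) -> List[Dict[str, Any]]:
    """Read one JSON document per line, skipping blank lines."""
    with open_text(path, "r") as fh:
        return [json.loads(line) for line in fh if line.strip()]


records = [{"user": {"id": 1}}, {"user": {"id": 2}}]
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "events.jsonl")
    packed = os.path.join(tmp, "events.jsonl.gz")
    dump_lines(records, plain)
    dump_lines(records, packed)
    from_plain = load_lines(plain)
    from_gzip = load_lines(packed)
```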
- Explode operations iterate per-row and per-element; prefer targeted paths and pre-filtering.
- Reconstruction uses grouping by `_root_id`; ensure this column persists through transformations.
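
The reason `_root_id` has to survive transformations is easiest to see in a reconstruction sketch. The code below regroups exploded rows into nested records using plain dicts. `to_nested_records` and `set_path` are hypothetical helpers, and handling a single exploded array path is a simplifying assumption.

```python
from typing import Any, Dict, List


def set_path(target: Dict[str, Any], dotted: str, value: Any) -> None:
    """Assign `value` at a dotted path, creating intermediate dicts as needed."""
    keys = dotted.split(".")
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = value


def to_nested_records(
    rows: List[Dict[str, Any]], array_path: str, group_by: str = "_root_id"
) -> List[Dict[str, Any]]:
    """Group rows by `group_by` and rebuild one nested record per group."""
    grouped: Dict[Any, Dict[str, Any]] = {}
    prefix = array_path + "."
    for row in rows:
        # Without the grouping key there is no way to know which rows belong together.
        record = grouped.setdefault(row[group_by], {array_path: []})
        element: Dict[str, Any] = {}
        for column, value in row.items():
            if column in (group_by, array_path):
                continue
            if column.startswith(prefix):
                set_path(element, column[len(prefix):], value)
            else:
                set_path(record, column, value)
        if element:
            record[array_path].append(element)
    return list(grouped.values())


rows = [
    {"_root_id": 0, "user.id": 1, "items.id": "i1", "items.qty": 2},
    {"_root_id": 0, "user.id": 1, "items.id": "i2", "items.qty": 1},
    {"_root_id": 1, "user.id": 2, "items.id": "i3", "items.qty": 5},
]
nested = to_nested_records(rows, "items")
print(len(nested))  # 2
```

If `_root_id` were dropped by an intermediate step, the three rows above could no longer be assigned to their two parent records.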
- Run tests with `pytest`.
- Please keep public APIs stable and add tests for new features.
MIT