handsomevictor/nestedframe

nestedframe

A pandas-style toolkit for nested JSON/NDJSON. It makes semi-structured data (event logs, API responses, tracking payloads, JSONL) feel like working with pandas: one-line loading, path-based selection, controlled flattening, array explode, and reconstruction back to nested records.

  • PyPI: coming soon
  • License: MIT
  • Python: 3.9+

Why

  • json_normalize becomes unwieldy on complex arrays and loses hierarchy.
  • Analysts need to switch between “preserving the original nested structure” and “an analyzable flat table”.
  • Selecting deep paths and exploding arrays while preserving parent-child relationships is repetitive and error-prone.

What

  • NestedFrame provides from_records, path-based select, controlled explode, and to_nested to reconstruct records.
  • Columns use dotted paths (e.g., user.id, meta.country). When exploding items, child fields become items.id, items.qty.
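
The dotted-path convention above can be illustrated with a small standard-library sketch. This is not the library's implementation; `flatten_record` is a hypothetical helper written only to show how nested dicts map to columns such as user.id and meta.country, and it assumes arrays are left intact until an explode step.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one dict keyed by dotted paths.

    Lists are left untouched (assumption: they are handled by a later explode).
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted prefix.
            flat.update(flatten_record(value, path))
        else:
            flat[path] = value
    return flat


record = {
    "user": {"id": 1, "name": "A"},
    "meta": {"country": "CN"},
    "items": [{"id": "i1", "qty": 2}],
}
flat = flatten_record(record)
print(sorted(flat))  # ['items', 'meta.country', 'user.id', 'user.name']
```

The array under items stays a single column here, which is why a separate explode step is needed before its child fields appear as items.id and items.qty.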

Installation

Once the package is published on PyPI (currently listed as coming soon):

pip install nestedframe

For local development:

pip install -e .

Quickstart

from nestedframe import NestedFrame

records = [
    {"user": {"id": 1, "name": "A"}, "items": [{"id": "i1", "qty": 2}, {"id": "i2", "qty": 1}], "meta": {"country": "CN"}},
    {"user": {"id": 2, "name": "B"}, "items": [{"id": "i3", "qty": 5}], "meta": {"country": "US"}}
]

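# Load nested records; nested dicts become dotted-path columns (user.id, meta.country)
nf = NestedFrame.from_records(records)
nf.to_pandas()

# Explode the items array; child fields become items.id and items.qty
nf2 = nf.explode("items")
nf2.to_pandas()

# Reconstruct nested records by grouping rows on _root_id
nf2.to_nested(group_by="_root_id")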

Key Features

  • Dotted path columns with robust flattening of nested dicts.
  • Controlled array explode that preserves a _root_id for reconstruction.
  • Flexible column selection using fnmatch-style patterns.
  • Round-trip conversion back to nested records for export.
  • NDJSON/JSON IO helpers with .gz JSONL support.

New in this release

  • from_pandas(df) to wrap an existing DataFrame as NestedFrame while ensuring _root_id.
  • select(patterns=None, exclude=None) supports include and exclude patterns.
  • subset(patterns=None, exclude=None) returns a new NestedFrame with selected columns.
  • explode_many(paths) sequentially explodes multiple array paths.
  • schema() returns available columns and top-level prefixes.
  • to_ndjson(path, group_by="_root_id") writes reconstructed records to NDJSON.
  • read_jsonl(path) alias for read_ndjson, automatically supports .gz files.
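
As a rough picture of what "available columns and top-level prefixes" could look like, here is a minimal sketch. The function name `describe_columns` and the dictionary keys "columns" and "prefixes" are assumptions made for illustration; the actual shape of the dict returned by schema() is not documented here.

```python
from typing import Dict, List


def describe_columns(columns: List[str]) -> Dict[str, List[str]]:
    """Return the column list plus the distinct top-level prefixes.

    The prefix of a dotted path is everything before the first dot.
    """
    prefixes = sorted({column.split(".", 1)[0] for column in columns})
    return {"columns": list(columns), "prefixes": prefixes}


columns = ["_root_id", "user.id", "user.name", "meta.country", "items.id", "items.qty"]
info = describe_columns(columns)
print(info["prefixes"])  # ['_root_id', 'items', 'meta', 'user']
```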

IO

from nestedframe import read_json, read_ndjson, read_jsonl

nf = read_json("data.json")
nf = read_ndjson("events.jsonl")
nf = read_jsonl("events.jsonl.gz")

nf.to_ndjson("out.jsonl")
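
For readers curious how .gz JSONL can be read transparently, the following standard-library sketch shows one common approach. It is not the library's code: `iter_ndjson` is a hypothetical helper, and choosing the opener from the file extension is an assumption about how detection might work.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def iter_ndjson(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of an NDJSON file."""
    # Assumption: compression is detected from the ".gz" suffix.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Demo: write a small gzipped JSONL file to a temp directory and read it back.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "events.jsonl.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as out:
    out.write('{"user": {"id": 1}}\n{"user": {"id": 2}}\n')

loaded = list(iter_ndjson(demo_path))
print(loaded)  # [{'user': {'id': 1}}, {'user': {'id': 2}}]
```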

Column Selection

df = nf.select(patterns=["user.*", "meta.country"], exclude=["*.name"])
nf_sub = nf.subset(patterns=["items.*", "user.id"], exclude=["items.qty"])
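
The include/exclude behaviour can be sketched with the standard fnmatch module. `select_columns` is a hypothetical stand-in, not the library's method; in particular, whether the real select keeps _root_id automatically is not stated in this README, so the sketch only returns columns that match.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def select_columns(
    columns: List[str],
    patterns: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep columns matching any include pattern and no exclude pattern."""
    include = patterns or ["*"]  # assumption: no patterns means "everything"
    skip = exclude or []
    return [
        column
        for column in columns
        if any(fnmatchcase(column, p) for p in include)
        and not any(fnmatchcase(column, p) for p in skip)
    ]


columns = ["_root_id", "user.id", "user.name", "meta.country", "items"]
selected = select_columns(columns, patterns=["user.*", "meta.country"], exclude=["*.name"])
print(selected)  # ['user.id', 'meta.country']
```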

Multiple Explodes

nf3 = nf.explode_many(["items"])  # add more paths as needed
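
To make "sequentially explodes" concrete, here is a standard-library sketch that explodes one path at a time over plain row dicts. The helpers `explode_path` and `explode_paths`, the tags field, and the handling of empty or non-list values are all assumptions for illustration; the library's behaviour may differ. Note that exploding two sibling arrays this way multiplies the row count.

```python
from functools import reduce
from typing import Any, Dict, List

Row = Dict[str, Any]


def explode_path(rows: List[Row], path: str) -> List[Row]:
    """Emit one row per element of the list stored under `path`."""
    out: List[Row] = []
    for row in rows:
        elements = row.get(path)
        if not isinstance(elements, list):
            out.append(dict(row))  # nothing to explode, keep the row as-is
            continue
        base = {k: v for k, v in row.items() if k != path}
        if not elements:
            out.append(base)  # assumption: an empty array keeps the parent row
        for element in elements:
            new_row = dict(base)  # parent fields, including _root_id, are copied
            if isinstance(element, dict):
                new_row.update({f"{path}.{k}": v for k, v in element.items()})
            else:
                new_row[path] = element
            out.append(new_row)
    return out


def explode_paths(rows: List[Row], paths: List[str]) -> List[Row]:
    """Apply explode_path once per path, in the order given."""
    return reduce(explode_path, paths, rows)


rows = [{
    "_root_id": 0,
    "user.id": 1,
    "items": [{"id": "i1", "qty": 2}, {"id": "i2", "qty": 1}],
    "tags": ["a", "b"],  # hypothetical second array, not part of the Quickstart data
}]
exploded = explode_paths(rows, ["items", "tags"])
print(len(exploded))  # 4
```

Every output row carries the same _root_id as its parent, which is what makes later reconstruction possible.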

API Reference

  • NestedFrame.from_records(records)
  • NestedFrame.from_pandas(df)
  • NestedFrame.to_pandas()
  • NestedFrame.select(patterns=None, exclude=None) → DataFrame
  • NestedFrame.subset(patterns=None, exclude=None) → NestedFrame
  • NestedFrame.explode(path) → NestedFrame
  • NestedFrame.explode_many(paths) → NestedFrame
  • NestedFrame.to_nested(group_by=None) → list[dict]
  • NestedFrame.schema() → dict
  • NestedFrame.to_ndjson(path, group_by="_root_id")
  • read_json(path) → NestedFrame
  • read_ndjson(path) / read_jsonl(path) → NestedFrame
  • write_ndjson(records, path)

Performance Notes

  • Explode operations iterate per-row and per-element; prefer targeted paths and pre-filtering.
  • Reconstruction uses grouping by _root_id; ensure this column persists through transformations.
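
The reliance on _root_id can be seen in a minimal reconstruction sketch. This is only an illustration in plain Python: `rebuild_records`, `set_path`, and the `array_path` parameter are hypothetical and are not part of the documented to_nested signature. Rows are grouped by _root_id, parent columns are un-flattened, and child columns are collected back into a list.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def set_path(target: Dict[str, Any], path: str, value: Any) -> None:
    """Assign value at a dotted path, creating intermediate dicts as needed."""
    parts = path.split(".")
    for part in parts[:-1]:
        target = target.setdefault(part, {})
    target[parts[-1]] = value


def rebuild_records(rows: List[Row], array_path: str = "items",
                    group_by: str = "_root_id") -> List[Dict[str, Any]]:
    """Group exploded rows by `group_by` and rebuild one nested record per group."""
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        # Raises KeyError if the grouping column was dropped along the way.
        groups.setdefault(row[group_by], []).append(row)

    child_prefix = array_path + "."
    records = []
    for group_rows in groups.values():
        record: Dict[str, Any] = {}
        children = []
        for row in group_rows:
            child: Dict[str, Any] = {}
            for column, value in row.items():
                if column == group_by:
                    continue
                if column.startswith(child_prefix):
                    set_path(child, column[len(child_prefix):], value)
                else:
                    set_path(record, column, value)  # parent fields repeat per row
            if child:
                children.append(child)
        record[array_path] = children
        records.append(record)
    return records


rows = [
    {"_root_id": 0, "user.id": 1, "user.name": "A", "meta.country": "CN", "items.id": "i1", "items.qty": 2},
    {"_root_id": 0, "user.id": 1, "user.name": "A", "meta.country": "CN", "items.id": "i2", "items.qty": 1},
    {"_root_id": 1, "user.id": 2, "user.name": "B", "meta.country": "US", "items.id": "i3", "items.qty": 5},
]
rebuilt = rebuild_records(rows)
print(len(rebuilt))  # 2
```

If a transformation drops the _root_id column, the grouping step in this sketch fails immediately, which mirrors the note above about keeping that column through every step.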

Contributing

  • Run tests with pytest.
  • Please keep public APIs stable and add tests for new features.

License

MIT
