Add `defstruct` #105

jcrist · 2022-05-09T22:32:52Z

Adds a function for dynamically defining new struct types. This is helpful
when types aren't known until runtime, but you still want type validation
when encoding/decoding.



jcrist · 2022-05-10T03:17:34Z

One use of this is dynamically defining a type that extracts only a few known fields from a larger structure. When the needed fields are static, a regular `msgspec.Struct` subclass suffices. But when the fields aren't known until runtime, `msgspec.defstruct` becomes necessary.

For example, here's a small script that parses and queries conda-forge's `current_repodata.json` file. It defines a struct type at runtime that holds only the fields the query needs, avoiding allocations for data that is never used.

```python
from operator import attrgetter

import msgspec


def top10_packages(sort_field):
    # Dynamically define a new type with only the required fields
    Package = msgspec.defstruct("Package", ["name", sort_field])
    # dict[str, Package] as a runtime expression requires Python 3.9+
    RepoData = msgspec.defstruct("RepoData", [("packages", dict[str, Package])])

    # Load and parse the data into this new type
    with open("current_repodata.json", "rb") as f:
        repo_data = msgspec.json.decode(f.read(), type=RepoData)

    # Sort by the designated field
    packages = list(repo_data.packages.values())
    getter = attrgetter(sort_field)
    packages.sort(key=getter, reverse=True)

    # Return the results
    return [(p.name, getter(p)) for p in packages[:10]]


for name, size in top10_packages("size"):
    print(f"- {name}: {size / (2 ** 20):.2f} MiB")
```

Results:

```
$ python example.py
- spacy-model-en_core_web_lg: 630.93 MiB
- spacy-model-en_vectors_web_lg: 584.45 MiB
- geant4-data-ndl: 572.80 MiB
- proj-data: 565.82 MiB
- spacy-model-en_core_web_trf: 442.50 MiB
- nltk_data: 428.19 MiB
- geant4-data-emlow: 295.86 MiB
- scitime: 287.50 MiB
- pyspark: 267.96 MiB
- cartopy_offlinedata: 216.34 MiB
```
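The sort-then-slice step in the script sorts every package even though only ten are printed. One possible alternative from the standard library is `heapq.nlargest`, which the Python docs describe as equivalent to `sorted(iterable, key=key, reverse=True)[:n]` and which performs best for small `n`. The sketch below uses `SimpleNamespace` objects with made-up names and sizes as stand-ins for decoded `Package` structs; `top_n` is a hypothetical helper.

```python
import heapq
from operator import attrgetter
from types import SimpleNamespace

# Stand-ins for decoded Package structs (hypothetical data).
packages = [
    SimpleNamespace(name="pkg-a", size=3 * 2**20),
    SimpleNamespace(name="pkg-b", size=10 * 2**20),
    SimpleNamespace(name="pkg-c", size=1 * 2**20),
    SimpleNamespace(name="pkg-d", size=7 * 2**20),
]


def top_n(items, field, n=10):
    """Return (name, value) pairs for the n items with the largest `field`."""
    getter = attrgetter(field)
    # Avoids sorting the whole list when n is small relative to len(items).
    best = heapq.nlargest(n, items, key=getter)
    return [(p.name, getter(p)) for p in best]


for name, size in top_n(packages, "size", n=2):
    print(f"- {name}: {size / (2 ** 20):.2f} MiB")
```

For a single query over one file the difference is likely small; the full sort in the original script is simpler and equally correct.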

jcrist added 2 commits May 9, 2022 17:30

Add defstruct

6b4300f

Adds a method for dynamically defining new struct types. This is helpful for situations where types aren't known until runtime, but you still want to provide type validation when encoding/decoding.

Fixup mypy

d80fbcc

Recent mypy upgrade broke the CI setup.

jcrist merged commit d85078c into master May 10, 2022

jcrist deleted the defstruct branch May 10, 2022 03:01
