
Create explicit schema/type for otData.py #4076

Merged
behdad merged 3 commits into fonttools:main from wmedrano:wm-schema on Apr 5, 2026

Conversation

@wmedrano (Contributor) commented Mar 31, 2026

Moves OpenType table definitions from raw tuples to a structured dataclass.

  • Introduces FieldSpec(type, name, repeat, aux, description).
  • Updates otData.py to use the new dataclass for all schema definitions.
  • Adjusts otConverters.py and otTables.py to consume the dataclass objects.
  • Makes field usage easier to track, and the schema easier to extend.
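For illustration, here is a minimal sketch of what a FieldSpec dataclass with the five fields listed above might look like, assuming rows keep the (type, name, repeat, aux, description) order. The default values, the frozen=True choice, and the sample row are assumptions of this sketch, not copied from the PR.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)  # frozen is a choice of this sketch, not necessarily the PR's
class FieldSpec:
    """One row of an OpenType table definition (defaults are assumed)."""

    type: str                          # e.g. "uint16" or "Offset"
    name: str                          # field name, e.g. "LookupCount"
    repeat: Optional[str] = None       # name of the count field for arrays, if any
    aux: Union[int, str, None] = None  # extra data some converters use
    description: str = ""


# Before: a raw positional tuple; readers must remember what index 2 means.
# (Illustrative row in the style of otData.py, not an exact copy.)
raw_row = ("Offset", "Lookup", "LookupCount", 0, "Array of offsets to Lookup tables")

# After: the same row with named, self-documenting fields.
spec = FieldSpec(*raw_row)
print(spec.repeat)  # LookupCount
```

Attribute access such as spec.repeat replaces index lookups like row[2], which is what makes usages easy to search for.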

@behdad (Member) commented Mar 31, 2026

This looks good to me. @justvanrossum?

@justvanrossum (Collaborator) commented

No objection, but I would like to know if there’s any measurable performance impact.

@wmedrano (Contributor, Author) commented Mar 31, 2026

Ack, I'll run some performance tests. I can also try a NamedTuple.

  • Comb through the diff more carefully. The test failures are caused by TupleList defaulting repeat to None instead of "".
  • Measure the performance impact
    • I'll also try a named tuple
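To illustrate the kind of failure described in the first bullet: code written against a "" default for repeat can break when the default becomes None, because None passes a `!= ""` check but cannot be used as a string. The two row types and the describe() consumer below are hypothetical, not fontTools code.

```python
from typing import NamedTuple, Optional


class SpecWithNone(NamedTuple):
    """Hypothetical row type whose missing repeat defaults to None."""

    type: str
    name: str
    repeat: Optional[str] = None


class SpecWithEmpty(NamedTuple):
    """Hypothetical row type whose missing repeat defaults to ""."""

    type: str
    name: str
    repeat: str = ""


def describe(spec) -> str:
    """Hypothetical consumer written against the old "" convention."""
    if spec.repeat != "":
        # Assumes repeat is a string; None also passes the check above.
        return spec.name + "[" + spec.repeat + "]"
    return spec.name


print(describe(SpecWithEmpty("uint16", "LookupCount")))            # LookupCount
print(describe(SpecWithEmpty("Offset", "Lookup", "LookupCount")))  # Lookup[LookupCount]
try:
    describe(SpecWithNone("uint16", "LookupCount"))
except TypeError as exc:
    # str + None is not allowed, so the None default surfaces as a TypeError.
    print("TypeError:", exc)
```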

Is there a preference between NamedTuple and DataClass? I kind of like the DataClass syntax better, but I don't have a strong preference either way.
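One practical difference that may bear on this choice: a NamedTuple instance is still a tuple, so code that unpacks rows positionally keeps working, while a plain dataclass instance is not iterable. A minimal sketch with hypothetical class names:

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class RowNT(NamedTuple):
    type: str
    name: str
    repeat: Optional[str] = None


@dataclass
class RowDC:
    type: str
    name: str
    repeat: Optional[str] = None


nt = RowNT("uint16", "LookupCount")
dc = RowDC("uint16", "LookupCount")

# NamedTuple is a tuple subclass, so positional unpacking still works.
tp, name, repeat = nt
print(isinstance(nt, tuple), tp, name, repeat)  # True uint16 LookupCount None

# A plain dataclass is not iterable, so the same unpacking raises TypeError.
try:
    tp, name, repeat = dc
except TypeError as exc:
    print("dataclass:", exc)
```

Both types support attribute access (nt.name, dc.name); only the NamedTuple also remains a drop-in replacement for the old tuples.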

@behdad (Member) commented Mar 31, 2026

> Is there a preference between NamedTuple and DataClass?

I think the performance might guide us there.

@wmedrano (Contributor, Author) commented Apr 5, 2026

I switched to the class-based NamedTuple, which has syntax similar to DataClass but is slightly faster. Compared to the original tuples, NamedTuple increases the time to import otTables (which runs _buildClasses()) by about 0.6%.
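For reference, the two NamedTuple spellings that appear in the benchmarks are the class-based form and the functional form from the typing module; both produce equivalent tuple subclasses. The field list follows the PR description; the class names and the defaults on the class-based version are assumptions of this sketch.

```python
from typing import NamedTuple, Optional, Union


# Class-based syntax: fields are annotated attributes, defaults allowed inline.
class FieldSpec(NamedTuple):
    type: str
    name: str
    repeat: Optional[str] = None
    aux: Union[int, str, None] = None
    description: str = ""


# Functional syntax: the same fields as a list of (name, type) pairs.
# This sketch gives it no defaults, so every field is passed explicitly.
FieldSpecFn = NamedTuple(
    "FieldSpecFn",
    [
        ("type", str),
        ("name", str),
        ("repeat", Optional[str]),
        ("aux", Union[int, str, None]),
        ("description", str),
    ],
)

a = FieldSpec("uint16", "LookupCount")
b = FieldSpecFn("uint16", "LookupCount", None, None, "")
print(tuple(a) == tuple(b))  # True
```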

Benchmarks

Python Interpreter Loading

This measures only interpreter startup and shutdown, as a baseline: about 5.5 ms on my system.

hyperfine --warmup 3 --runs 1000 'python -c "pass"'
| Method | Mean |
|---|---|
| dataclass | 5.5 ms ± 0.3 ms |
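Readers without hyperfine can get a rough version of the same measurement from Python itself. This sketch is not hyperfine: it just times repeated subprocess launches with time.perf_counter and reports the mean and standard deviation, so its numbers will not match hyperfine's exactly. The function name time_command is hypothetical.

```python
import statistics
import subprocess
import sys
import time


def time_command(args, runs=20, warmup=3):
    """Return (mean_ms, stdev_ms) of wall-clock time for running `args`."""
    for _ in range(warmup):
        subprocess.run(args, check=True)  # warm OS caches; timings discarded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)  # needs runs >= 2


if __name__ == "__main__":
    # Interpreter startup/shutdown only, like `python -c "pass"`.
    mean, stdev = time_command([sys.executable, "-c", "pass"])
    print(f"{mean:.1f} ms ± {stdev:.1f} ms")
```

Swapping the -c argument for the otTables import gives the second benchmark.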

Creating classes

This measures the time to import otTables, which runs the _buildClasses() code that reads the otData.py definitions. The total also includes interpreter startup (about 5.5 ms, measured above).

hyperfine --warmup 3 --runs 1000 'python -c "from fontTools.ttLib.tables.otTables import *"'
| Method | Mean | Relative to tuple |
|---|---|---|
| tuple (original/base) | 35.8 ms ± 1.5 ms | 100% |
| DataClass | 36.3 ms ± 1.5 ms | 101.4% |
| NamedTuple (class) | 36.0 ms ± 1.6 ms | 100.6% |
| NamedTuple (functional) | 36.1 ms ± 1.6 ms | 100.8% |
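The last column of the table is each mean divided by the tuple baseline. As a rough extra step that was not part of the original comment, subtracting the ~5.5 ms interpreter startup from the baseline gives the overhead relative to the import work alone (which still includes other imports, not just _buildClasses()). Keep in mind that the differences (0.2 to 0.5 ms) are smaller than the reported run-to-run standard deviation.

```python
# Mean wall-clock times (ms) copied from the benchmark table above.
MEANS_MS = {
    "tuple (original/base)": 35.8,
    "DataClass": 36.3,
    "NamedTuple (class)": 36.0,
    "NamedTuple (functional)": 36.1,
}
STARTUP_MS = 5.5  # from the `python -c "pass"` baseline


def relative_times(means, baseline, startup_ms):
    """Return {method: (percent_of_baseline, extra_percent_excluding_startup)}."""
    base = means[baseline]
    return {
        method: (
            mean / base * 100,                         # matches the table's last column
            (mean - base) / (base - startup_ms) * 100,  # overhead vs. import work only
        )
        for method, mean in means.items()
    }


rows = relative_times(MEANS_MS, "tuple (original/base)", STARTUP_MS)
for method, (rel, extra) in rows.items():
    print(f"{method:24} {rel:6.1f}%  +{extra:.1f}% excluding startup")
```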

behdad merged commit 88fe4f3 into fonttools:main on Apr 5, 2026 (12 checks passed)

@madig (Collaborator) commented Apr 6, 2026

Random thought: What effort would be required to not dynamically generate all these classes on load but instead code-generate them from a schema, kinda like fontations does? It would enable better type checker integration (which doesn't deal with dynamically generated code) and would make loading the file a trivial module load, which Python can cache in a .pyc file.
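To make the suggestion concrete: a code generator turns schema rows into ordinary Python source ahead of time, which type checkers can read and which Python caches as a .pyc like any other module. This is a toy sketch under assumed names (generate_class, PY_TYPES, the reduced (type, name) row format, a BaseTable base class); it is not modeled on how fontations or fontTools actually generate code.

```python
from typing import Dict, List, Tuple

# Assumed mapping from OpenType field types to Python annotations.
PY_TYPES: Dict[str, str] = {"uint8": "int", "uint16": "int", "Tag": "str"}


def generate_class(table_name: str, fields: List[Tuple[str, str]]) -> str:
    """Emit Python source for one table class from (type, name) schema rows."""
    lines = [f"class {table_name}(BaseTable):"]
    for ot_type, field_name in fields:
        # Unknown types fall back to `object` in this sketch.
        lines.append(f"    {field_name}: {PY_TYPES.get(ot_type, 'object')}")
    if not fields:
        lines.append("    pass")
    return "\n".join(lines) + "\n"


# In a real build step this text would be written to a .py file and imported.
source = generate_class("LookupList", [("uint16", "LookupCount"), ("Offset", "Lookup")])
print(source)
```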

@behdad (Member) commented Apr 6, 2026


The main annoyance would be that, in the current design, otTables.py can define a class, and otBase.py can augment that class based on the fields in otData. If we generate code, it's not clear how to augment the generated code with handwritten methods.
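For context, the dynamic pattern described above can be sketched roughly as follows: a hand-written class, if one exists, is reused and has schema-derived data attached to it; otherwise a bare class is created with type() at import time. This is a heavily simplified, hypothetical sketch (build_classes, SCHEMA, and the fields attribute are made-up names), not the actual _buildClasses() implementation.

```python
from typing import Dict, List, Tuple


class BaseTable:
    """Stand-in for a generic base class driven by schema data."""

    fields: Tuple[str, ...] = ()


class LookupList(BaseTable):
    """Hand-written class adding a custom method (hypothetical)."""

    def summary(self) -> str:
        return f"LookupList with fields {self.fields}"


# Hypothetical schema: table name -> field names.
SCHEMA: List[Tuple[str, List[str]]] = [
    ("LookupList", ["LookupCount", "Lookup"]),
    ("ScriptList", ["ScriptCount", "ScriptRecord"]),
]


def build_classes(namespace: Dict[str, type]) -> None:
    """Reuse hand-written classes from `namespace`; create the rest with type()."""
    for name, field_names in SCHEMA:
        cls = namespace.get(name)
        if cls is None:
            # No hand-written class: create a bare subclass at runtime.
            cls = type(name, (BaseTable,), {})
            namespace[name] = cls
        # Either way, attach the schema-derived data to the class.
        cls.fields = tuple(field_names)


# A real module would pass its own globals(); a dict keeps the sketch explicit.
namespace: Dict[str, type] = {"LookupList": LookupList}
build_classes(namespace)
print(LookupList().summary())          # hand-written method sees the schema data
print(namespace["ScriptList"].fields)  # exists only at runtime, invisible to type checkers
```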

4 participants