Python: share the ty-types type registry across a project parse#7932

Merged
knutwannheden merged 1 commit into main from fix-multi-file-python-supertype-attribution on Jun 7, 2026

knutwannheden commented on Jun 7, 2026

Motivation

rewrite-python's whole-project parse path (handle_parse_project, used by the CLI's mod build via rpc.parseProject) parses every .py file in a project through a single shared ty-types --serve session. ty's session deduplicates type descriptors: each type is emitted in full exactly once — in the first getTypes response that references it — and later responses reference it by id only.
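A toy model may make that dedup contract concrete. The sketch below is hypothetical: `DedupSession` and `get_types` are illustrative names, not ty's API, and it assumes a file's type information can be modeled as a dict from type id to descriptor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class DedupSession:
    """Hypothetical stand-in for a ty --serve session (not ty's real code)."""

    # Ids whose full descriptor has already been sent in this session.
    emitted: Set[int] = field(default_factory=set)

    def get_types(self, referenced: Dict[int, dict]) -> Tuple[Dict[int, dict], List[int]]:
        """Return (descriptors sent in full, every id the file references)."""
        full = {}
        for type_id, descriptor in referenced.items():
            if type_id not in self.emitted:
                # First reference anywhere in this session: send the whole descriptor.
                self.emitted.add(type_id)
                full[type_id] = descriptor
        return full, list(referenced)


base_model = {"name": "pydantic.main.BaseModel", "supertypes": []}
session = DedupSession()
first, _ = session.get_types({1: base_model, 2: {"name": "a.User", "supertypes": [1]}})
second, second_ids = session.get_types({1: base_model, 3: {"name": "b.Order", "supertypes": [1]}})
print(sorted(first))               # [1, 2]
print(sorted(second), second_ids)  # [3] [1, 3]
```

The second response still references id 1 but no longer carries its descriptor, which is exactly what a per-file consumer has to cope with.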

But the parser builds a fresh per-file type registry, populated solely from that file's getTypes response. So any type first seen in an earlier file (e.g. pydantic.BaseModel) is absent from later files' registries, and supertypes pointing into it can't resolve. The net effect on a real multi-file repo: first-party classes in every file but the first lose their supertypes, self no longer resolves as a BaseModel subclass, and a type-aware recipe like ReplaceModelFieldsInstanceAccess silently does nothing across most of the project.
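The failure mode can be reproduced with a per-file registry built only from each file's own response. This is a hypothetical sketch assuming descriptors of the shape `{"name": ..., "supertypes": [ids]}`; `supertype_names` is an illustrative helper, not part of rewrite-python.

```python
from typing import Dict, List, Optional


def supertype_names(registry: Dict[int, dict], type_id: int) -> List[Optional[str]]:
    """Names of type_id's supertypes; None marks an id the registry cannot resolve."""
    return [
        registry[sup]["name"] if sup in registry else None
        for sup in registry[type_id]["supertypes"]
    ]


# Responses from one shared session: BaseModel (id 1) is sent in full only in
# the first file's response; the second file references it by id alone.
file_a = {1: {"name": "pydantic.main.BaseModel", "supertypes": []},
          2: {"name": "a.User", "supertypes": [1]}}
file_b = {3: {"name": "b.Order", "supertypes": [1]}}

# Per-file registries, each populated solely from that file's response.
registry_a = dict(file_a)
registry_b = dict(file_b)

print(supertype_names(registry_a, 2))  # ['pydantic.main.BaseModel']
print(supertype_names(registry_b, 3))  # [None]: id 1 is unknown in file b
```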

ty type ids are stable within a session (e.g. id 1 == BaseModel across all files), which is what makes a session-scoped registry sound.

Summary

  • TyTypesClient now accumulates every descriptor returned across all getTypes calls into a cumulative, session-scoped session_types table, keyed by ty's session-stable type ids. The table is reset when a new session starts (a fresh client, or initialize with a different project root) so no state leaks across unrelated parses.
  • PythonTypeMapping references that table and, in _build_index, back-fills any descriptor missing from the current file's own response (and seeds _class_literal_index from the cumulative class literals). File-local descriptors take precedence, preserving the existing FQN-based dedup of Class objects.
  • This keeps ty's dedup performance benefit (descriptors are still emitted only once per session) while restoring correct supertype resolution in every file of a project.
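The two pieces described above (a cumulative, resettable table and a back-fill with file-local precedence) can be sketched as follows. This is a hypothetical illustration, not the actual `TyTypesClient` or `PythonTypeMapping._build_index` code: `SessionTypeTable`, `record` and `build_file_index` are illustrative names, and seeding of `_class_literal_index` is left out.

```python
from typing import Dict, Optional


class SessionTypeTable:
    """Hypothetical sketch, not rewrite-python's TyTypesClient.

    Accumulates every descriptor a session returns, keyed by ty's
    session-stable type id, and forgets them when a new session starts.
    """

    def __init__(self) -> None:
        self.project_root: Optional[str] = None
        self.types: Dict[int, dict] = {}

    def initialize(self, project_root: str) -> None:
        # A different project root starts a new session whose ids are
        # unrelated to the old ones, so drop everything accumulated so far.
        if project_root != self.project_root:
            self.types.clear()
            self.project_root = project_root

    def record(self, response: Dict[int, dict]) -> None:
        # Called for every getTypes response in the session.
        self.types.update(response)


def build_file_index(response: Dict[int, dict], table: SessionTypeTable) -> Dict[int, dict]:
    """Back-fill descriptors missing from this file's response; file-local ones win."""
    index = dict(table.types)  # everything the session has seen so far
    index.update(response)     # the file's own descriptors take precedence
    return index


table = SessionTypeTable()
table.initialize("/project")
table.record({1: {"name": "pydantic.main.BaseModel", "supertypes": []},
              2: {"name": "a.User", "supertypes": [1]}})
response_b = {3: {"name": "b.Order", "supertypes": [1]}}
table.record(response_b)
index_b = build_file_index(response_b, table)
print(index_b[1]["name"])  # pydantic.main.BaseModel
```

Copying the whole table into each file's index keeps the sketch short; looking up missing ids lazily would avoid the copy. The reset on a changed root matters because ids are only stable within one session.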

Alternatives considered: (a) a fresh ty session per file in handle_parse_project, which is correct but re-initializes ty for every file and loses the shared-session speedup; (b) a "no-dedup" mode for ty --serve, which leaves us the least control and would regress payload size. The session-scoped registry retains the performance benefit and is fully under our control.

Test plan

  • Added TestProjectParseSupertypeAcrossFiles to tests/python/test_type_attribution.py, exercising handle_parse_project exactly as mod build does. It uses two peer model files so the failure is order-independent (whichever file ty parses second loses its base), asserting that self resolves as a pydantic.main.BaseModel subclass in both files. Fails before this change (the second file drops the supertype), passes after.
  • The single-file TestExternalSupertypeResolutionInParsePath positive/negative tests still pass.
  • Full tests/python/test_type_attribution.py green (130 passed).
  • Broader tests/python + tests/recipes suites green (1565 passed, 6 skipped).
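The order-independence argument behind the two peer files can be checked on a toy model. This is a hypothetical sketch, not TestProjectParseSupertypeAcrossFiles: it simulates a deduplicating session over two in-memory files instead of running ty-types, and `parse_project` and `share_registry` are illustrative names.

```python
from itertools import permutations
from typing import Dict, Iterable, Set

BASE_ID = 1
# Two peer model files that subclass the same external base, so whichever
# file is processed second receives only the base's id, not its descriptor.
FILES: Dict[str, Dict[int, dict]] = {
    "a.py": {BASE_ID: {"name": "pydantic.main.BaseModel", "supertypes": []},
             2: {"name": "a.User", "supertypes": [BASE_ID]}},
    "b.py": {BASE_ID: {"name": "pydantic.main.BaseModel", "supertypes": []},
             3: {"name": "b.Order", "supertypes": [BASE_ID]}},
}


def parse_project(order: Iterable[str], share_registry: bool) -> Dict[str, bool]:
    """Return, per file in processing order, whether its model resolves its base."""
    emitted: Set[int] = set()  # ids the toy session has already sent in full
    session_types: Dict[int, dict] = {}
    resolved: Dict[str, bool] = {}
    for path in order:
        # Dedup: only descriptors not yet emitted in this session are sent.
        response = {i: d for i, d in FILES[path].items() if i not in emitted}
        emitted.update(response)
        session_types.update(response)
        index = {**session_types, **response} if share_registry else dict(response)
        model_id = next(i for i in FILES[path] if i != BASE_ID)
        resolved[path] = BASE_ID in index and BASE_ID in index[model_id]["supertypes"]
    return resolved


def test_supertypes_resolve_in_any_order() -> None:
    for order in permutations(FILES):
        # With a shared registry, both peers keep their base in either order.
        assert all(parse_project(order, share_registry=True).values())
        # With per-file registries, whichever file comes second loses it.
        assert list(parse_project(order, share_registry=False).values()) == [True, False]


test_supertypes_resolve_in_any_order()
```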

(The new test relies on ty-types and uv being available and pydantic not being importable in the test interpreter; it skips otherwise by design.)
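Those preconditions can be expressed as an explicit skip condition. This is a hypothetical sketch using only the standard library: `project_parse_skip_reason` and `ProjectParseSupertypeTest` are illustrative names, and it assumes ty-types and uv are looked up on PATH under those executable names.

```python
import importlib.util
import shutil
import unittest
from typing import Optional


def project_parse_skip_reason() -> Optional[str]:
    """Return why the multi-file test cannot run here, or None if it can."""
    for tool in ("ty-types", "uv"):
        if shutil.which(tool) is None:
            return f"{tool} is not on PATH"
    if importlib.util.find_spec("pydantic") is not None:
        # Per the PR, the test expects pydantic to be unimportable here.
        return "pydantic is importable in the test interpreter"
    return None


SKIP_REASON = project_parse_skip_reason()


@unittest.skipIf(SKIP_REASON is not None, SKIP_REASON or "")
class ProjectParseSupertypeTest(unittest.TestCase):
    """Hypothetical test class; real assertions would parse a two-file project."""
```

With pytest, the same condition can be passed to `pytest.mark.skipif(condition, reason=...)`.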

ty's `--serve` session deduplicates type descriptors: each type is emitted
in full only once, in the first `getTypes` response that references it, and
later responses reference it by id only. `handle_parse_project` parses every
file in a project through a single shared session, but each file built its
own per-file type registry from only that file's response. A type first seen
in an earlier file (e.g. `pydantic.BaseModel`) was therefore absent from
later files' registries, so first-party classes in every file but the first
lost their supertypes and `self` stopped resolving as a subclass.

Accumulate every descriptor returned across a session into a cumulative
`TyTypesClient.session_types` table (keyed by ty's session-stable ids), and
have `PythonTypeMapping` back-fill any descriptor missing from a file's own
response from that table. File-local descriptors take precedence; the table
is reset when a new session starts so no state leaks across parses. This
keeps ty's dedup performance benefit while restoring resolution in every
file.

github-project-automation (bot) moved this to In Progress in OpenRewrite on Jun 7, 2026
knutwannheden changed the title from "rewrite-python: share the ty-types type registry across a project parse" to "Python: share the ty-types type registry across a project parse" on Jun 7, 2026
knutwannheden merged commit b6b2941 into main on Jun 7, 2026 (1 check passed)
github-project-automation (bot) moved this from In Progress to Done in OpenRewrite on Jun 7, 2026
knutwannheden deleted the fix-multi-file-python-supertype-attribution branch on June 7, 2026 at 10:27

Projects

Status: Done
