Fix JSON node types cache: commit missing file and repair union deserialization #235
Merged
bashandbone merged 5 commits into fix-insecure-deserialization-node-types-cache-15389003928708879868 on Mar 16, 2026
…tract helper methods, add cache file Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
…c, simplify adapter guard Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix insecure deserialization vulnerability in node types parser" to "Fix JSON node types cache: commit missing file and repair union deserialization" on Mar 16, 2026
Pull request overview
Fixes the semantic node-types JSON cache so it can be loaded reliably at startup (instead of silently falling back to full JSON parsing), and commits the generated cache artifact needed for runtime loading.
Changes:
- Repair cache deserialization by validating cached connections as raw dicts and reconstructing `DirectConnection` vs `PositionalConnections` using `"role"` as a discriminator.
- Add and license the generated `node_types_cache.json` artifact; update preprocessing to emit compact JSON.
- Bump the `check-added-large-files` hook limit and pin `hk` in `mise.toml` to accommodate the cache artifact.
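The discriminator approach in the first bullet can be sketched roughly as follows. This is a minimal illustration using plain dataclasses, not the project's pydantic models: the field names `source` and `targets` and the function `reconstruct_connection` are hypothetical, and the only detail taken from the PR is that the presence of a `"role"` key separates the two connection kinds.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DirectConnection:
    # Hypothetical fields for illustration; the real model is defined in the project.
    source: str
    role: str
    targets: tuple[str, ...]


@dataclass(frozen=True)
class PositionalConnections:
    # Hypothetical fields for illustration; note that there is no "role" key.
    source: str
    targets: tuple[str, ...]


def reconstruct_connection(raw: dict[str, Any]) -> DirectConnection | PositionalConnections:
    """Choose the concrete class from the dict's shape instead of trying each class in turn."""
    targets = tuple(raw["targets"])
    if "role" in raw:
        return DirectConnection(source=raw["source"], role=raw["role"], targets=targets)
    return PositionalConnections(source=raw["source"], targets=targets)


# Raw dicts as they might come out of the JSON cache (made-up sample data).
cached = [
    {"source": "function_definition", "role": "name", "targets": ["identifier"]},
    {"source": "block", "targets": ["statement"]},
]
connections = [reconstruct_connection(item) for item in cached]
```

Dispatching on the key up front means a dict without `"role"` never reaches the `DirectConnection` constructor, so nothing has to catch a failed first attempt.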
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/codeweaver/semantic/node_type_parser.py | Fixes cache load/deserialization; adds cached TypeAdapter and reconstruction helpers. |
| src/codeweaver/semantic/data/node_types_cache.json | Adds the missing runtime cache artifact (large JSON). |
| src/codeweaver/semantic/data/node_types_cache.json.license | Adds REUSE/SPDX licensing sidecar for the cache artifact. |
| scripts/build/preprocess-node-types.py | Writes compact JSON to reduce cache size. |
| mise.toml | Pins hk tool version. |
| hk.pkl | Raises large-file hook threshold to allow committing the cache file. |
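For the compact-JSON change to `scripts/build/preprocess-node-types.py`, the effect can be illustrated with the standard library alone. The PR does not show the script's actual serialization call, so the snippet below is only an assumption about the general technique (dropping indentation and separator whitespace), with made-up sample data.

```python
import json

# Made-up sample standing in for the node types data.
sample = {"python": {"function_definition": {"fields": ["name", "body"]}}}

# Indented output is easier to read but noticeably larger.
pretty = json.dumps(sample, indent=2)

# Compact output: no indentation and no whitespace after separators.
compact = json.dumps(sample, separators=(",", ":"))

# Both forms decode to the same object; only the whitespace differs.
assert json.loads(pretty) == json.loads(compact)
print(len(pretty), len(compact))
```

On deeply nested data the whitespace saved this way grows with the nesting depth, which is why it matters for a multi-megabyte artifact.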
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Merged commit a6438c7 into fix-insecure-deserialization-node-types-cache-15389003928708879868. 3 checks passed.
bashandbone added a commit that referenced this pull request on Mar 16, 2026
* 🔒 Replace insecure pickle with JSON for node types cache

  Mitigate insecure deserialization vulnerability by switching the tree-sitter node types cache from pickle to JSON.
  - Updated scripts/build/preprocess-node-types.py to serialize cache to JSON.
  - Updated src/codeweaver/semantic/node_type_parser.py to load and validate the JSON cache using pydantic.TypeAdapter.
  - Corrected build and CI artifact paths in mise.dev.toml.
  - Updated documentation in src/codeweaver/semantic/data/__init__.py.

  Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* 🔒 Switch node types cache to JSON and fix CI issues
  - Switch tree-sitter node types cache from pickle to JSON to mitigate insecure deserialization (S301).
  - Use Pydantic's TypeAdapter for safe validation and loading.
  - Fix CI failures on Python 3.14 by using the project's internal uuid7 utility instead of uuid_extensions.
  - Clean up classification_result during cache loading to ensure fresh recomputation.
  - Correct artifact paths in mise.dev.toml and documentation.

  Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>

* Potential fix for pull request finding 'Unused import'

  Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
  Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

* Fix JSON node types cache: commit missing file and repair union deserialization (#235)
  * Initial plan
  * Fix JSON cache loading: resolve connection union, fix type errors, extract helper methods, add cache file
    Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
  * Address code review: clarify noqa comment, improve circular-import doc, simplify adapter guard
    Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
  * Potential fix for pull request finding
    Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
    Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
  * Potential fix for pull request finding
    Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
    Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>

  ---------

  Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI added a commit that referenced this pull request on Mar 17, 2026
The JSON cache was never actually loading: the missing `node_types_cache.json` file caused fallback to full parse on every startup, and a silent `KeyError` bug in the union deserialization meant the cache would have failed even if the file existed.

Root cause: `DirectConnection | PositionalConnections` union always failed

`TypeAdapter` tries `DirectConnection` first on all connection dicts. `Connection.__init__` calls `super().__init__(**data)`, which triggers pydantic's internal field extractor, a compiled lambda that raises `KeyError: 'role'` (not `ValidationError`) when a `PositionalConnections` dict is processed. The original code silently swallowed this via `except KeyError`, making the cache a permanent no-op.

Changes
- `node_types_cache.json`: generated and committed (13 MB compact JSON); added REUSE license sidecar
- `node_type_parser.py`: fix union deserialization; move `ValidationError`/`TypeAdapter` imports to module scope (fixes fragile import-inside-try); extract `_ensure_cache_adapter()`, `_reconstruct_cache()`, `_clear_stale_cached_properties()` helpers; cache `TypeAdapter` as `ClassVar[TypeAdapter[Any] | None]` built once per class lifetime; remove redundant `json.JSONDecodeError` (covered by `ValidationError`)
- `preprocess-node-types.py`: emit compact JSON (no indentation) to stay under the 13 MB VCS limit
- `hk.pkl`: raise `check-added-large-files` limit from 10 000 → 15 000 KB to accommodate the cache artifact
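The "built once per class lifetime" pattern for the cached adapter might look roughly like the sketch below. It is not the code from `node_type_parser.py`: `NodeTypeCacheLoader`, `StandInAdapter`, and `build_count` are hypothetical names, and `StandInAdapter` is a standard-library stand-in for an expensive-to-build validator rather than pydantic's `TypeAdapter`. Only the helper name `_ensure_cache_adapter` and the `ClassVar[... | None]` shape come from the description above.

```python
from __future__ import annotations

from typing import Any, ClassVar


class StandInAdapter:
    """Stand-in for an expensive-to-build validator (not pydantic's TypeAdapter)."""

    def validate_python(self, raw: Any) -> dict[str, Any]:
        # Minimal shape check; a real adapter would validate the full schema.
        if not isinstance(raw, dict):
            raise TypeError("cache root must be a JSON object")
        return raw


class NodeTypeCacheLoader:
    # Hypothetical class. The adapter slot starts empty and is filled on first use.
    _cache_adapter: ClassVar[StandInAdapter | None] = None
    # Counter used only to show that construction happens once.
    build_count: ClassVar[int] = 0

    @classmethod
    def _ensure_cache_adapter(cls) -> StandInAdapter:
        """Build the adapter on the first call and reuse it on every later call."""
        if cls._cache_adapter is None:
            cls.build_count += 1
            cls._cache_adapter = StandInAdapter()
        return cls._cache_adapter


first = NodeTypeCacheLoader._ensure_cache_adapter()
second = NodeTypeCacheLoader._ensure_cache_adapter()
```

Keeping the slot typed as `... | None` makes the "not built yet" state explicit, so the guard is a single identity check rather than a `hasattr` probe or a try/except.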