Fix JSON Schema comment injection in generated code#2870
Merged
Conversation
Follow-up to the initial comment-injection fix, closing gaps found in review: - Escape the block-comment opener `/*` (not just the closer `*/`), so a bare `/*` in a description can't open a nested comment in Kotlin and Scala 3, whose block comments nest and would otherwise swallow the rest of the file. - In triple-quoted comments (Python docstrings, Elixir moduledocs), double backslashes and escape a line-ending quote before an inline closing `"""`, so a description ending in `\` or `"` can't unbalance the delimiter. Removes the now-redundant Python trailing-quote hack. - Normalize U+0085/U+2028/U+2029 to `\n` alongside CR, since JavaScript and C# lexers treat them as line terminators that would otherwise break out of a single-line comment. - Entity-escape XML metacharacters in C# `///` doc comments (both dense and normal paths). - Neutralize a trailing backslash in C, C++, and Objective-C line/block comments, which those lexers splice into the next line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add /* opener, XML metacharacters, Unicode line terminators, and trailing \ / " / """ payloads to the schema fixtures. - Add an enum-schema variant of the nested-comment payload and route enum-only tree-sitter targets to it, so each sample run does distinct work instead of regenerating identical output. - Scan generated bytes for injection classes the grammars can't detect (their lexers are more lenient than the real compilers): forbid any surviving U+0085/U+2028/U+2029, and add per-target forbiddenSubstrings (C# rejects the raw XML payload). - Strip the extern-C guard blocks before parsing cjson output so a real MISSING-node injection is caught, instead of blanket-allowing missing nodes; work around a tree-sitter-dart false positive on `/*` inside line comments. - Cache tree-sitter grammars per WASM path across targets and samples. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add comment-injection-objective-c to the matrix on macos-latest as an experimental (continue-on-error) job, since Objective-C needs Apple's clang + Foundation and the full objective-c fixture used to segfault there. This fixture was registered but never selected by the matrix. - Document the new escapes, payloads, content scan, and fixture workarounds. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The macos-latest trial confirmed the generated .m compiles cleanly but the compiled ./test binary segfaults at runtime (SIGSEGV / exit 139) on the sample JSON — the same pre-existing Objective-C harness issue that keeps the full objective-c fixture out of CI, unrelated to comment injection. Keep the fixture registered for local macOS runs, but drop the CI matrix entry. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
emitCommentLines/emitCommentsso renderers do not need to escape manually.*/), Elm/Haskell (-}), and triple-quoted (""") documentation comments.CI coverage
The PR workflow now includes a dedicated row for:
comment-injection-treesittercomment-injection-typescriptcomment-injection-typescript-zodcomment-injection-typescript-effect-schemaExisting schema fixture rows also pick up the new schema samples for languages that already run
schema-*in CI.Not yet in CI: Objective-C comment-injection fixture, because Objective-C is currently disabled/commented out in the workflow. PHP is covered by the tree-sitter parser fixture, not a runtime fixture.
Validation run locally
npm run buildnpx tsc -p test/tsconfig.json --noEmitCPUs=1 QUICKTEST=true FIXTURE=comment-injection-treesitter npm test -- test/inputs/schema/comment-injection.schemaCPUs=1 QUICKTEST=true FIXTURE=schema-typescript npm test -- test/inputs/schema/comment-injection.schemaCPUs=1 QUICKTEST=true FIXTURE=comment-injection-typescript npm testCPUs=1 QUICKTEST=true FIXTURE=comment-injection-typescript-zod npm testCPUs=1 QUICKTEST=true FIXTURE=comment-injection-typescript-effect-schema npm testCPUs=1 QUICKTEST=true FIXTURE=comment-injection-treesitter,comment-injection-typescript,comment-injection-typescript-zod,comment-injection-typescript-effect-schema npm test