Skip to content

Fix JSON Schema comment injection in generated code#2870

Merged
schani merged 24 commits into
masterfrom
fix-comment-injection
Jul 3, 2026
Merged

Fix JSON Schema comment injection in generated code#2870
schani merged 24 commits into
masterfrom
fix-comment-injection

Conversation

@schani

@schani schani commented Jun 26, 2026

Copy link
Copy Markdown
Member

Summary

  • Add JSON Schema fixtures that reproduce comment injection through schema descriptions.
  • Add a parser-only tree-sitter regression fixture that loops over covered generated languages.
  • Normalize description line endings before comment emission.
  • Centralize comment delimiter escaping in emitCommentLines/emitComments so renderers do not need to escape manually.
  • Escape C-style (*/), Elm/Haskell (-}), and triple-quoted (""") documentation comments.
  • Add focused comment-injection fixtures to PR CI.

CI coverage

The PR workflow now includes a dedicated row for:

  • comment-injection-treesitter
  • comment-injection-typescript
  • comment-injection-typescript-zod
  • comment-injection-typescript-effect-schema

Existing schema fixture rows also pick up the new schema samples for languages that already run schema-* in CI.

Not yet in CI: Objective-C comment-injection fixture, because Objective-C is currently disabled/commented out in the workflow. PHP is covered by the tree-sitter parser fixture, not a runtime fixture.

Validation run locally

  • npm run build
  • npx tsc -p test/tsconfig.json --noEmit
  • CPUs=1 QUICKTEST=true FIXTURE=comment-injection-treesitter npm test -- test/inputs/schema/comment-injection.schema
  • CPUs=1 QUICKTEST=true FIXTURE=schema-typescript npm test -- test/inputs/schema/comment-injection.schema
  • CPUs=1 QUICKTEST=true FIXTURE=comment-injection-typescript npm test
  • CPUs=1 QUICKTEST=true FIXTURE=comment-injection-typescript-zod npm test
  • CPUs=1 QUICKTEST=true FIXTURE=comment-injection-typescript-effect-schema npm test
  • CPUs=1 QUICKTEST=true FIXTURE=comment-injection-treesitter,comment-injection-typescript,comment-injection-typescript-zod,comment-injection-typescript-effect-schema npm test

Open in Devin Review

@schani schani changed the title Fix JSON Schema comment injection in generated docs Fix JSON Schema comment injection in generated code Jun 26, 2026

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

Open in Devin Review

Comment thread packages/quicktype-core/src/ConvenienceRenderer.ts
Comment thread packages/quicktype-core/src/ConvenienceRenderer.ts Outdated
schani and others added 14 commits June 26, 2026 18:15
Follow-up to the initial comment-injection fix, closing gaps found in
review:

- Escape the block-comment opener `/*` (not just the closer `*/`), so a
  bare `/*` in a description can't open a nested comment in Kotlin and
  Scala 3, whose block comments nest and would otherwise swallow the
  rest of the file.
- In triple-quoted comments (Python docstrings, Elixir moduledocs),
  double backslashes and escape a line-ending quote before an inline
  closing `"""`, so a description ending in `\` or `"` can't unbalance
  the delimiter. Removes the now-redundant Python trailing-quote hack.
- Normalize U+0085/U+2028/U+2029 to `\n` alongside CR, since JavaScript
  and C# lexers treat them as line terminators that would otherwise
  break out of a single-line comment.
- Entity-escape XML metacharacters in C# `///` doc comments (both dense
  and normal paths).
- Neutralize a trailing backslash in C, C++, and Objective-C line/block
  comments, which those lexers splice into the next line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add /* opener, XML metacharacters, Unicode line terminators, and
  trailing \ / " / """ payloads to the schema fixtures.
- Add an enum-schema variant of the nested-comment payload and route
  enum-only tree-sitter targets to it, so each sample run does distinct
  work instead of regenerating identical output.
- Scan generated bytes for injection classes the grammars can't detect
  (their lexers are more lenient than the real compilers): forbid any
  surviving U+0085/U+2028/U+2029, and add per-target forbiddenSubstrings
  (C# rejects the raw XML payload).
- Strip the extern-C guard blocks before parsing cjson output so a real
  MISSING-node injection is caught, instead of blanket-allowing missing
  nodes; work around a tree-sitter-dart false positive on `/*` inside
  line comments.
- Cache tree-sitter grammars per WASM path across targets and samples.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add comment-injection-objective-c to the matrix on macos-latest as an
  experimental (continue-on-error) job, since Objective-C needs Apple's
  clang + Foundation and the full objective-c fixture used to segfault
  there. This fixture was registered but never selected by the matrix.
- Document the new escapes, payloads, content scan, and fixture
  workarounds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The macos-latest trial confirmed the generated .m compiles cleanly but
the compiled ./test binary segfaults at runtime (SIGSEGV / exit 139) on
the sample JSON — the same pre-existing Objective-C harness issue that
keeps the full objective-c fixture out of CI, unrelated to comment
injection. Keep the fixture registered for local macOS runs, but drop
the CI matrix entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@schani schani merged commit 64b3546 into master Jul 3, 2026
22 checks passed
@schani schani deleted the fix-comment-injection branch July 3, 2026 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant