Preserve Unicode escapes in Javadoc HTML element and attribute names by timtebeek · Pull Request #7394 · openrewrite/rewrite

timtebeek · 2026-04-16T16:00:40Z

Summary

Fixes an `IllegalStateException: ... is not print idempotent` thrown when parsing Javadoc that uses Unicode-escaped characters (e.g. `<\u00ef>`) in HTML element or attribute names. The Java compiler decodes `\u00ef` into `ï` before the `com.sun.source.doctree` parser runs, but the raw source still contains the 6-character escape. As a result, `cursor += name.length()` in `visitStartElement`, `visitEndElement`, and `visitAttribute` drifted by 5 characters per escape and mangled parsing of subsequent tags. The fix consumes names with the same escape-aware logic already used in `visitText`, applied across all five `ReloadableJavaNJavadocVisitor` implementations (Java 8/11/17/21/25), and adds a shared reproducer to `rewrite-java-tck`'s `JavadocTest`.
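The escape-aware name consumption described above can be sketched roughly as follows. `EscapeAwareCursor.consumeName` is a hypothetical, standalone helper for illustration, not code from this PR or from the `ReloadableJava*JavadocVisitor` classes. For each character of the decoded name it accepts either the literal character or a Unicode escape that decodes to it, and advances the cursor by however many source characters that took. It assumes names are matched one UTF-16 `char` at a time and, for simplicity, ignores the JLS rule that a backslash preceded by an odd number of backslashes does not start an escape.

```java
// Hypothetical, standalone sketch; not the actual ReloadableJava*JavadocVisitor code.
public final class EscapeAwareCursor {

    private static final String HEX_DIGITS = "0123456789abcdefABCDEF";

    private EscapeAwareCursor() {
    }

    // Advances `cursor` past the raw source text that javac decoded into `decodedName`.
    // Each decoded char may appear literally or as a backslash-u Unicode escape.
    public static int consumeName(String source, int cursor, String decodedName) {
        for (int i = 0; i < decodedName.length(); i++) {
            char expected = decodedName.charAt(i);
            int escapeEnd = unicodeEscapeEnd(source, cursor);
            if (escapeEnd > 0
                    && (char) Integer.parseInt(source.substring(escapeEnd - 4, escapeEnd), 16) == expected) {
                cursor = escapeEnd; // skip the whole escape (6 or more source chars)
            } else if (cursor < source.length() && source.charAt(cursor) == expected) {
                cursor++; // literal char: one source char per name char
            } else {
                throw new IllegalStateException(
                        "Source does not match name '" + decodedName + "' at index " + cursor);
            }
        }
        return cursor;
    }

    // Returns the index just past a Unicode escape starting at `pos`, or -1 if there is none.
    private static int unicodeEscapeEnd(String source, int pos) {
        if (pos + 1 >= source.length() || source.charAt(pos) != '\\' || source.charAt(pos + 1) != 'u') {
            return -1;
        }
        int p = pos + 1;
        while (p < source.length() && source.charAt(p) == 'u') {
            p++; // the JLS allows one or more 'u' characters after the backslash
        }
        if (p + 4 > source.length()) {
            return -1;
        }
        for (int k = p; k < p + 4; k++) {
            if (HEX_DIGITS.indexOf(source.charAt(k)) < 0) {
                return -1;
            }
        }
        return p + 4;
    }

    public static void main(String[] args) {
        String source = "<\\u00ef>"; // raw text as written in the .java file (8 chars)
        String name = "\u00ef";      // decoded element name reported by the doctree parser
        System.out.println(consumeName(source, 1, name)); // 7: the index of '>'
        System.out.println(1 + name.length());            // 2: the naive cursor += name.length()
    }
}
```

Comparing the decoded value of each escape against the expected character, rather than blindly skipping any escape, means a mismatch between source and name fails fast instead of silently drifting the cursor.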

Test plan

```shell
./gradlew :rewrite-java-8:compatibilityTest :rewrite-java-11:compatibilityTest :rewrite-java-17:compatibilityTest :rewrite-java-21:compatibilityTest :rewrite-java-25:compatibilityTest --tests "org.openrewrite.java.tree.JavadocTest"
```

When the Java compiler expands Unicode escapes (e.g. `\u00ef`) before the `com.sun.source.doctree` parser runs, the parser returns the decoded character but the source still contains the 6-character escape. The existing `cursor += name.length()` in `visitStartElement`, `visitEndElement`, and `visitAttribute` assumed one source char per name char, drifting the cursor and breaking print idempotency. Consume names using the same escape-aware logic already present in `visitText`.
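To make the drift concrete, the sketch below imitates, in simplified form, the Unicode-escape translation javac performs before tokenizing (JLS §3.3), then compares raw and decoded lengths for `<\u00ef>`. `UnicodeEscapeDrift` is a hypothetical class for illustration only; unlike javac, it copies malformed escapes through instead of reporting a compile error.

```java
// Hypothetical sketch of javac's Unicode-escape pre-translation (JLS 3.3), simplified:
// malformed escapes are copied through instead of being reported as errors.
public final class UnicodeEscapeDrift {

    private static final String HEX_DIGITS = "0123456789abcdefABCDEF";

    private UnicodeEscapeDrift() {
    }

    static String decode(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '\\' && isEligible(raw, i) && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int p = i + 1;
                while (p < raw.length() && raw.charAt(p) == 'u') {
                    p++; // one or more 'u' characters are allowed
                }
                if (p + 4 <= raw.length() && isHex(raw, p, p + 4)) {
                    out.append((char) Integer.parseInt(raw.substring(p, p + 4), 16));
                    i = p + 4;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // A backslash can start an escape only if preceded by an even number of backslashes.
    private static boolean isEligible(String raw, int i) {
        int backslashes = 0;
        for (int k = i - 1; k >= 0 && raw.charAt(k) == '\\'; k--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (HEX_DIGITS.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String raw = "<\\u00ef>";     // as written in the source file
        String decoded = decode(raw); // what the doctree parser sees
        int rawNameLength = raw.length() - 2;         // 6: the escape between '<' and '>'
        int decodedNameLength = decoded.length() - 2; // 1: the single char U+00EF
        System.out.println("raw: " + raw.length() + ", decoded: " + decoded.length()); // raw: 8, decoded: 3
        System.out.println("drift per escaped name char: " + (rawNameLength - decodedNameLength)); // 5
    }
}
```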

github-project-automation bot added this to OpenRewrite Apr 16, 2026

github-project-automation bot moved this to In Progress in OpenRewrite Apr 16, 2026

moderne-meeseeks bot assigned timtebeek Apr 16, 2026

Fix formatting in JavadocTest.java (0756b80)

timtebeek marked this pull request as ready for review April 16, 2026 22:10

Merge branch 'main' into tim/javadoc-html-diaeresis (feaab43)

timtebeek requested a review from sambsnyd April 16, 2026 22:11

sambsnyd approved these changes Apr 16, 2026


github-project-automation bot moved this from In Progress to Ready to Review in OpenRewrite Apr 16, 2026

timtebeek merged commit 89a3475 into main Apr 16, 2026
1 check passed

timtebeek deleted the tim/javadoc-html-diaeresis branch April 16, 2026 22:54

github-project-automation bot moved this from Ready to Review to Done in OpenRewrite Apr 16, 2026


timtebeek merged 3 commits into main from tim/javadoc-html-diaeresis

2 participants