
Preserve Unicode escapes in Javadoc HTML element and attribute names #7394

Merged
timtebeek merged 3 commits into main from tim/javadoc-html-diaeresis on Apr 16, 2026

@timtebeek timtebeek commented Apr 16, 2026

Summary

Fixes an `IllegalStateException: ... is not print idempotent` when parsing Javadoc that uses Unicode-escaped characters (e.g. `<\u00ef>`) in HTML element or attribute names. The Java compiler decodes `\u00ef` into `ï` before the `com.sun.source.doctree` parser runs, but the raw source still contains the 6-character escape. As a result, `cursor += name.length()` in `visitStartElement`, `visitEndElement`, and `visitAttribute` advanced the cursor 5 characters too few per escape and mangled subsequent tag parsing. The fix consumes names with the same escape-aware logic already used in `visitText`, applied across all five ReloadableJavaNJavadocVisitor implementations (Java 8/11/17/21/25). A shared reproducer is added to `JavadocTest` in `rewrite-java-tck`.
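
To make the size of the drift concrete, here is a minimal standalone sketch. It is not code from this PR: `CursorDriftDemo` and `drift` are hypothetical names used only for illustration. It compares the raw source text of an escaped name with the decoded name the doctree parser reports.

```java
public class CursorDriftDemo {
    // Number of source characters a cursor misses when it is advanced by
    // decodedName.length() instead of by the length of the raw source text.
    public static int drift(String rawName, String decodedName) {
        return rawName.length() - decodedName.length();
    }

    public static void main(String[] args) {
        // Raw element name as written in the source file:
        // a backslash, the letter u, and four hex digits (6 chars).
        String raw = "\\u00ef";
        // Name as reported after the escape has been decoded (1 char).
        String decoded = "\u00ef";
        System.out.println(drift(raw, decoded)); // prints 5
    }
}
```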

Test plan

  • ./gradlew :rewrite-java-8:compatibilityTest :rewrite-java-11:compatibilityTest :rewrite-java-17:compatibilityTest :rewrite-java-21:compatibilityTest :rewrite-java-25:compatibilityTest --tests "org.openrewrite.java.tree.JavadocTest"

When the Java compiler expands Unicode escapes (e.g. `\u00ef`) before
the `com.sun.source.doctree` parser runs, the parser returns the decoded
character but the source still contains the 6-character escape. The
existing `cursor += name.length()` in `visitStartElement`, `visitEndElement`,
and `visitAttribute` assumed one source char per name char, drifting the
cursor and breaking print idempotency. Consume names using the same
escape-aware logic already present in `visitText`.
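
The escape-aware consumption described above can be sketched roughly as follows. This is an illustrative assumption about the approach, not the actual `visitText` logic or the code merged here: `EscapeAwareCursor`, `escapeLength`, and `consumeName` are hypothetical names, and the sketch ignores the rule that a backslash preceded by another backslash does not start an escape.

```java
public class EscapeAwareCursor {
    // Length of the Unicode escape starting at index i of source, or 0 if there is none.
    // An escape is a backslash, one or more 'u' characters, and four hex digits.
    public static int escapeLength(String source, int i) {
        if (i + 1 >= source.length() || source.charAt(i) != '\\' || source.charAt(i + 1) != 'u') {
            return 0;
        }
        int j = i + 1;
        while (j < source.length() && source.charAt(j) == 'u') {
            j++;
        }
        if (j + 4 > source.length()) {
            return 0;
        }
        for (int k = j; k < j + 4; k++) {
            if (Character.digit(source.charAt(k), 16) < 0) {
                return 0;
            }
        }
        return j + 4 - i;
    }

    // Returns the index just past the name in source, starting at cursor.
    // Each char of the decoded name consumes either one raw char or one whole escape.
    public static int consumeName(String source, int cursor, String decodedName) {
        int i = cursor;
        for (int n = 0; n < decodedName.length(); n++) {
            int esc = escapeLength(source, i);
            i += esc > 0 ? esc : 1;
        }
        return i;
    }

    public static void main(String[] args) {
        String source = "<\\u00ef>text</\\u00ef>"; // raw Javadoc text as it appears in the file
        String name = "\u00ef";                    // decoded element name
        int naive = 1 + name.length();             // cursor after the old name.length() advance
        int aware = consumeName(source, 1, name);  // cursor after escape-aware consumption
        System.out.println(naive + " " + aware);   // prints "2 7"
    }
}
```

With the escape-aware advance the cursor lands on the closing `>` of the start tag (index 7), whereas the naive advance stops inside the escape at index 2.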
@timtebeek timtebeek marked this pull request as ready for review April 16, 2026 22:10
@timtebeek timtebeek requested a review from sambsnyd April 16, 2026 22:11
@github-project-automation github-project-automation bot moved this from In Progress to Ready to Review in OpenRewrite Apr 16, 2026
@timtebeek timtebeek merged commit 89a3475 into main Apr 16, 2026
1 check passed
@timtebeek timtebeek deleted the tim/javadoc-html-diaeresis branch April 16, 2026 22:54
@github-project-automation github-project-automation bot moved this from Ready to Review to Done in OpenRewrite Apr 16, 2026