
rewrite-xml: support HTML void elements in JSP/HTML parsing #7906

Merged
jkschneider merged 1 commit into main from lucky-quail
Jun 6, 2026
@knutwannheden
Contributor

Motivation

JSP and HTML files are parsed by the rewrite-xml XML grammar (the JSP grammar extensions and the .jsp file extension are both handled there). The grammar's element rule only knew two shapes, <a>…</a> and <a/>, with no notion of an HTML void element. HTML5 allows void elements such as <br>, <img>, <input> and <meta> to be written without a trailing slash.

When the parser hit <br>, it treated it as the start of a normal element expecting </br>. ANTLR error recovery then mangled the tree (an entire <html>…</html> block collapsed to <html/> on reprint), the reprint no longer matched the input, and Parser.requirePrintEqualsInput downgraded the whole file to a ParseError. The original text was preserved, but the file became opaque to recipes.

This surfaced on real, valid Spring Boot JSP smoke tests (welcome.jsp), whose only "offense" was an unclosed <br>.

Examples

Previously this .jsp failed to parse (became a ParseError); it now round-trips as a proper Xml.Document:

<html lang="en">
<body>
	<br>
	Message
	<br>
</body>
</html>

Void elements with attributes are supported too:

<meta charset="utf-8">
<link rel="stylesheet" href="app.css">
<img src="logo.png" alt="logo">
<input type="text" name="q">
<hr>

Strict XML is unchanged: an element that merely shares a name with a void element still parses as a container (<link>https://example.com</link>), and an unclosed <br> in a plain .xml file remains a ParseError.
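To make the HTML-only gating concrete, here is a minimal sketch of how a file-extension check and a name predicate could combine. It is not the code from this PR: the class name `VoidElementSketch`, the method `isHtmlLike`, the two-argument form of `isVoidElement`, and the case-insensitive matching are all assumptions for illustration. The set of names is the list of void elements in the HTML Living Standard; the PR may recognize a different set.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the XMLParserBase added by this PR.
final class VoidElementSketch {

    // Void elements as listed in the HTML Living Standard.
    static final Set<String> VOID_NAMES = Set.of(
            "area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr");

    // HTML mode applies only to the extensions named in the PR description.
    static boolean isHtmlLike(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jsp") || lower.endsWith(".jspx")
                || lower.endsWith(".html") || lower.endsWith(".htm");
    }

    // With htmlMode off, no name is ever treated as void, so strict XML
    // keeps parsing <link>...</link> as an ordinary container element.
    // Case-insensitive matching is an assumption of this sketch.
    static boolean isVoidElement(String name, boolean htmlMode) {
        return htmlMode
                && name != null
                && VOID_NAMES.contains(name.toLowerCase(Locale.ROOT));
    }
}
```

Under these assumptions, `isVoidElement("br", isHtmlLike("welcome.jsp"))` is true, while `isVoidElement("link", isHtmlLike("feed.xml"))` is false, which mirrors the behavior described above for plain .xml files.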

Summary

  • XMLParser.g4: added a third element alternative for void elements, gated by an isVoidElement($name.text) semantic predicate, plus an empty voidClose marker rule so the choice is detectable in the parse tree.
  • Added XMLParserBase and wired it via the grammar's superClass option. It holds the htmlMode flag and isVoidElement(...), keeping the .g4 free of target-specific (Java) members so the C# generation in rewrite-csharp is not broken. (When the C# sources are next regenerated they will need a matching XMLParserBase in the OpenRewrite.Xml.Grammar namespace; this is noted in a grammar comment.)
  • XmlParser: enables htmlMode only for .jsp/.jspx/.html/.htm sources.
  • XmlParserVisitor: maps the void shape to a Tag with null content/closing and attaches an HtmlVoidElement marker.
  • New HtmlVoidElement marker + XmlPrinter: a marked tag prints a bare > instead of />. A marker (rather than a model-field change) keeps the LST shape and serialization unchanged.
  • Regenerated the Java ANTLR sources.
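The marker-driven printing described in the bullets above can be sketched in isolation. This is a simplified stand-in, not the real XmlPrinter or Xml.Tag: the `Tag` class, its fields, and the boolean `htmlVoid` (standing in for the presence of an HtmlVoidElement marker) are hypothetical names chosen for the example.

```java
// Hypothetical illustration only; the real LST types and printer differ.
final class VoidPrintSketch {

    // Simplified tag: content == null means "no children and no closing tag".
    static final class Tag {
        final String name;
        final String attributes; // pre-rendered, including the leading space
        final String content;    // null for childless tags
        final boolean htmlVoid;  // stands in for an attached HtmlVoidElement marker

        Tag(String name, String attributes, String content, boolean htmlVoid) {
            this.name = name;
            this.attributes = attributes;
            this.content = content;
            this.htmlVoid = htmlVoid;
        }
    }

    static String print(Tag tag) {
        StringBuilder sb = new StringBuilder("<").append(tag.name).append(tag.attributes);
        if (tag.content != null) {
            // Ordinary container element: open tag, content, closing tag.
            return sb.append('>').append(tag.content)
                    .append("</").append(tag.name).append('>').toString();
        }
        // Childless tag: a marked tag ends with a bare '>', any other with '/>'.
        return sb.append(tag.htmlVoid ? ">" : "/>").toString();
    }
}
```

Because the void shape is carried by a flag on an otherwise ordinary childless tag, an unmarked tag prints exactly as before. That is the property the PR attributes to using a marker rather than changing a model field.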

Test plan

  • New XmlParserTest cases: <br> in a .jsp; void elements with attributes (<meta>/<link>/<img>/<input>/<hr>); the full Spring Boot welcome.jsp round-trip.
  • Regression guards: void-named container elements (<link>…</link>, <source>…</source>) still parse in XML mode; an unclosed <br> in .xml remains a ParseError (void leniency is HTML-only).
  • Existing JSP tests (jsp, jspScriptlet, mixedJspElements, …) still pass.
  • ./gradlew :rewrite-xml:check is green (tests + license).

JSP and HTML sources are parsed by the XML grammar, whose `element` rule
only accepted fully-closed (`<a>…</a>`) and self-closing (`<a/>`) tags.
An HTML void element written without a slash (e.g. `<br>`) was parsed as
the start of a normal element; ANTLR error recovery then mangled the tree,
the reprint no longer matched the input, and the whole file was downgraded
to a ParseError.

Add HTML void-element support, enabled only for HTML-like sources
(.jsp/.jspx/.html/.htm) so strict XML parsing is unaffected:

- the grammar gains a void-element alternative gated by a semantic
  predicate, plus an empty `voidClose` marker rule to detect it
- htmlMode and isVoidElement live in a hand-written XMLParserBase wired
  via the grammar's superClass option, keeping the .g4 free of
  target-specific members so the C# generation is not broken
- void tags carry an HtmlVoidElement marker so the printer emits a bare
  `>` instead of `/>`

@github-project-automation (bot) moved this to In Progress in OpenRewrite, Jun 4, 2026
@jkschneider merged commit 7f3a204 into main, Jun 6, 2026
1 check passed
@jkschneider deleted the lucky-quail branch, June 6, 2026 21:37
@github-project-automation (bot) moved this from In Progress to Done in OpenRewrite, Jun 6, 2026
