Properly format javadocs by sugmanue · Pull Request #1168 · smithy-lang/smithy-java

sugmanue · 2026-05-02T22:27:22Z

Convert Smithy documentation traits to properly formatted Javadoc

Smithy's @documentation trait accepts Markdown, and AWS service models use raw HTML in their documentation traits. Previously, this content was written to Javadoc comments as-is, producing poorly formatted output with excessive whitespace, missing paragraph separation, invalid HTML nesting, and no line wrapping.

This change introduces a proper conversion pipeline that produces clean, well-formatted Javadoc from both Markdown and HTML input.

Pipeline

Input (Markdown or HTML)

Escape Java generics (List<String> -> List&lt;String&gt;)
commonmark HtmlRenderer (Markdown -> HTML; HTML passes through)
jsoup DOM parser (HTML string -> DOM tree)
DOM cleanup (unwrap <p> inside <li>, remove empty <p>)
Javadoc renderer (DOM -> formatted string with indentation and wrapping)
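The first pipeline step exists so that type arguments such as List<String> are not read as HTML tags by commonmark and jsoup. The PR does not say how generics are detected, so the following is only a minimal sketch under an assumed heuristic: a `<` glued to a preceding identifier character, closed by a balanced `>` on the same line, whose contents look like type arguments and are not one of a few HTML tag names, gets its angle brackets escaped. The class `GenericsEscaper`, its character set, and its tag list are hypothetical.

```java
import java.util.Set;
import java.util.regex.Pattern;

final class GenericsEscaper {
    // Assumption: type arguments contain only identifiers, dots, commas,
    // wildcards, array brackets, whitespace, and nested angle brackets.
    private static final Pattern TYPE_ARGS = Pattern.compile("[A-Za-z0-9_$.,?\\s\\[\\]<>]+");

    // Hypothetical list of lowercase tag names that must never be escaped.
    private static final Set<String> HTML_TAGS =
            Set.of("a", "b", "i", "p", "em", "strong", "code", "pre", "ul", "ol", "li", "br");

    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Only a '<' directly after an identifier (as in List<String>) is a candidate.
            int close = (c == '<' && i > 0 && isIdentifierChar(input.charAt(i - 1)))
                    ? findMatchingClose(input, i)
                    : -1;
            if (close > 0 && isTypeArgs(input.substring(i + 1, close))) {
                // Escape every angle bracket in the span, including nested ones.
                String span = input.substring(i, close + 1);
                out.append(span.replace("<", "&lt;").replace(">", "&gt;"));
                i = close + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isTypeArgs(String args) {
        return TYPE_ARGS.matcher(args).matches() && !HTML_TAGS.contains(args.trim());
    }

    private static boolean isIdentifierChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '$';
    }

    // Index of the '>' balancing the '<' at 'open', or -1 if none on this line.
    private static int findMatchingClose(String s, int open) {
        int depth = 0;
        for (int j = open; j < s.length(); j++) {
            char c = s.charAt(j);
            if (c == '<') {
                depth++;
            } else if (c == '>' && --depth == 0) {
                return j;
            } else if (c == '\n') {
                return -1;
            }
        }
        return -1;
    }
}
```

With this heuristic, `Map<String, List<Integer>>` becomes `Map&lt;String, List&lt;Integer&gt;&gt;`, while `<p>Hello</p>` and `Hello<b>bold</b>` pass through unchanged. A lowercase tag name missing from the list would be misread as a generic, which is the main weakness of any rule of this shape.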

What changed

New dependencies:

org.commonmark:commonmark:0.28.0 (BSD-2-Clause, no transitive deps) for Markdown to HTML
org.jsoup:jsoup:1.22.2 (MIT, no transitive deps) for HTML parsing and DOM manipulation

MarkdownToJavadoc (new) - Converts documentation trait values to Javadoc-compatible HTML:

First paragraph has no <p> tag (Javadoc convention)
Subsequent paragraphs separated by blank lines with <p> prefix
Block-level tags (<ul>, <li>, etc.) pretty-printed on their own lines with 2-space indentation
Blank line before top-level block tags for visual separation
HTML-aware line wrapping that never breaks inside tags, attributes, or {@literal @} blocks
Wrapping width is nesting-dependent (117 chars for class-level, 113 for member-level Javadoc)
@ escaped as {@literal @} to prevent Javadoc tag conflicts
<, >, & in text properly encoded as HTML entities
Java generics (List<String>) preserved as List&lt;String&gt; instead of being parsed as HTML
Invalid nesting cleaned up (e.g., <li><p>text</p></li> simplified to <li>text</li>)
Empty <p> elements removed
Markdown code blocks rendered as <pre>{@code ...}</pre>
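Two of the rules above, text escaping and paragraph layout, are small enough to sketch directly. This assumes the input is the already-decoded content of a DOM text node, so an `&`, or a `<` that came from an escaped generic, is encoded exactly once on output. `JavadocText`, `escapeText`, and `joinParagraphs` are hypothetical names, not the PR's API.

```java
import java.util.List;

final class JavadocText {
    // Escapes decoded text-node content: HTML-significant characters become
    // entities, and '@' becomes {@literal @} so Javadoc never mistakes it for
    // the start of a block tag such as @param.
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '@' -> out.append("{@literal @}");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    // First paragraph is bare (it becomes the summary); every later paragraph
    // is preceded by a blank line and opened with <p>.
    static String joinParagraphs(List<String> paragraphs) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < paragraphs.size(); i++) {
            if (i > 0) {
                out.append("\n\n<p>");
            }
            out.append(paragraphs.get(i));
        }
        return out.toString();
    }
}
```

Under this assumption, a generic escaped in the first pipeline step comes back from the parser as the text `List<String>` and is re-encoded by `escapeText` as `List&lt;String&gt;`, which is the behavior the list describes.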

JavadocFormatterInterceptor (simplified) - Reduced from 230 lines to 110. Now only handles wrapping content in /** ... */ delimiters,
prefixing lines with *, escaping *, and {@snippet} blocks. All HTML formatting and line wrapping moved to MarkdownToJavadoc.
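The interceptor's remaining job can be sketched as follows. Replacing `*/` with `*&#47;` is an assumed escaping strategy (the PR does not say which one it uses), `{@snippet}` handling is left out, and `JavadocComment.wrap` is a hypothetical helper.

```java
final class JavadocComment {
    // Wraps already-formatted Javadoc body text in comment delimiters.
    // 'indent' is the leading whitespace of the declaration being documented.
    static String wrap(String body, String indent) {
        StringBuilder out = new StringBuilder();
        out.append(indent).append("/**\n");
        // A literal "*/" would end the comment early; the HTML entity for '/'
        // renders identically in the generated documentation.
        String safe = body.replace("*/", "*&#47;");
        for (String line : safe.split("\n", -1)) {
            out.append(indent).append(" *");
            if (!line.isEmpty()) {
                out.append(' ').append(line);
            }
            out.append('\n');
        }
        out.append(indent).append(" */\n");
        return out.toString();
    }
}
```

Blank body lines become a bare ` *`, matching the paragraph separation in the After examples.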

DocumentationTraitInterceptor (updated) - Passes nesting-dependent max width to the converter based on whether the Javadoc is for a class or
a member.
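The widths 117 and 113 are consistent with a 120-column limit minus the ` * ` prefix (3 characters) and, for members, 4 spaces of indentation; that limit and indentation are inferred here, not stated in the PR. The sketch below computes the width and performs a greedy wrap in which whitespace inside `<...>` or `{@...}` is never used as a break point. It assumes text has already been entity-encoded, so any raw `<` opens a tag. `TagAwareWrapper` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

final class TagAwareWrapper {
    // Assumed: a 120-column line limit and a " * " comment prefix.
    static final int LINE_LIMIT = 120;
    static final int PREFIX_WIDTH = 3;

    // 117 for a class-level comment (no indent), 113 for a member indented 4 spaces.
    static int maxWidth(int indentSpaces) {
        return LINE_LIMIT - indentSpaces - PREFIX_WIDTH;
    }

    // Splits on whitespace, but keeps whitespace inside <...> or {@...}
    // so a tag or inline tag always stays within one token.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int angle = 0; // nesting inside <...>
        int brace = 0; // nesting inside {@...}
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') {
                angle++;
            } else if (c == '>' && angle > 0) {
                angle--;
            } else if (c == '{' && (brace > 0 || (i + 1 < text.length() && text.charAt(i + 1) == '@'))) {
                brace++;
            } else if (c == '}' && brace > 0) {
                brace--;
            }
            if (Character.isWhitespace(c) && angle == 0 && brace == 0) {
                if (cur.length() > 0) {
                    out.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    // Greedy fill; a token wider than 'width' still gets a line of its own.
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String token : tokens(text)) {
            if (line.length() > 0 && line.length() + 1 + token.length() > width) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(token);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

Because `<a href="...">Transcribing` forms a single token here, a break can fall inside link text but never inside the opening tag, which is the shape of the AudioStream After example below.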

Before / After

(Examples use the transcribestreaming model.)

Before (AudioStream.java):

/**
 * <p>An encoded stream of audio blobs. Audio streams are encoded as either HTTP/2 or WebSocket
 *       data frames.</p>
 * <p>For more information, see <a href="https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html">Transcribing streaming audio</a>.</p>
 */

After:

/**
 * An encoded stream of audio blobs. Audio streams are encoded as either HTTP/2 or WebSocket data frames.
 *
 * <p>For more information, see <a href="https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html">Transcribing
 * streaming audio</a>.
 */

Before (MedicalScribeInputStream.java):

/**
 * <p>An encoded stream of events.</p>
 * <ul>
 *             <li>
 *                <p>
 *                   <code>MedicalScribeConfigurationEvent</code>
 *                </p>
 *             </li>
 *          </ul>
 */

After:

/**
 * An encoded stream of events. The stream is encoded as HTTP/2 data frames.
 *
 * <ul>
 *   <li>
 *     <code>MedicalScribeConfigurationEvent</code>
 *   </li>
 * </ul>
 */

Testing

Added unit tests

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Properly format javadocs

2fb357b

sugmanue enabled auto-merge (squash) May 2, 2026 22:29

Filter out unknown tags

a0e89c9

mtdowling approved these changes May 4, 2026

sugmanue merged commit d1860cf into smithy-lang:main May 4, 2026
3 checks passed

sugmanue merged 2 commits into smithy-lang:main from sugmanue:sugmanue/javadocs-overhaul
