Properly format javadocs #1168

Merged
sugmanue merged 2 commits into smithy-lang:main from sugmanue:sugmanue/javadocs-overhaul
May 4, 2026
@sugmanue sugmanue commented May 2, 2026

Convert Smithy documentation traits to properly formatted Javadoc

Smithy's @documentation trait accepts Markdown, and AWS service models use raw HTML in their documentation traits. Previously, this content was written to Javadoc comments as-is, producing poorly formatted output with excessive whitespace, missing paragraph separation, invalid HTML nesting, and no line wrapping.

This change introduces a proper conversion pipeline that produces clean, well-formatted Javadoc from both Markdown and HTML input.

Pipeline

Input (Markdown or HTML)

  • Escape Java generics (List<String> -> List&lt;String&gt;)
  • commonmark HtmlRenderer (Markdown -> HTML; HTML passes through)
  • jsoup DOM parser (HTML string -> DOM tree)
  • DOM cleanup (unwrap <p> inside <li>, remove empty <p>)
  • Javadoc renderer (DOM -> formatted string with indentation and wrapping)
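As an illustration of the first pipeline step, here is a minimal sketch of escaping Java generics before the text reaches an HTML parser. The class name `GenericsEscaper`, its method names, and the detection heuristic (an identifier character immediately before `<`, and an upper-case letter or `?` immediately after it) are assumptions made for this example; the PR does not describe how MarkdownToJavadoc detects generics.

```java
// Hypothetical sketch: not the code from this PR.
final class GenericsEscaper {

    /** Replaces the angle brackets of things that look like Java type arguments with HTML entities. */
    static String escapeGenerics(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '<' && looksLikeTypeArguments(text, i)) {
                int end = findClosingBracket(text, i);
                if (end > i) {
                    // Escape the whole balanced <...> run, including nested type arguments.
                    out.append(text.substring(i, end).replace("<", "&lt;").replace(">", "&gt;"));
                    i = end;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Assumed heuristic: "List<String>" qualifies, " <b>" and "</b>" do not.
    private static boolean looksLikeTypeArguments(String text, int lt) {
        if (lt == 0 || lt + 1 >= text.length()) {
            return false;
        }
        char before = text.charAt(lt - 1);
        char after = text.charAt(lt + 1);
        return Character.isJavaIdentifierPart(before) && (Character.isUpperCase(after) || after == '?');
    }

    // Returns the index just past the matching '>', or -1 if the run is not a plain type-argument list.
    private static int findClosingBracket(String text, int lt) {
        int depth = 0;
        for (int j = lt; j < text.length(); j++) {
            char ch = text.charAt(j);
            if (ch == '<') {
                depth++;
            } else if (ch == '>') {
                depth--;
                if (depth == 0) {
                    return j + 1;
                }
            } else if (!(Character.isJavaIdentifierPart(ch) || ch == ',' || ch == ' '
                    || ch == '?' || ch == '.' || ch == '[' || ch == ']')) {
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(escapeGenerics("Returns a List<String> of <b>names</b>"));
    }
}
```

With this heuristic, real HTML tags such as `<b>` are left for the HTML parser, while `Map<String, List<Integer>>` is escaped as a single balanced run.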

What changed

New dependencies:

  • org.commonmark:commonmark:0.28.0 (BSD-2-Clause, no transitive deps) for Markdown to HTML
  • org.jsoup:jsoup:1.22.2 (MIT, no transitive deps) for HTML parsing and DOM manipulation

MarkdownToJavadoc (new) - Converts documentation trait values to Javadoc-compatible HTML:

  • First paragraph has no <p> tag (Javadoc convention)
  • Subsequent paragraphs separated by blank lines with <p> prefix
  • Block-level tags (<ul>, <li>, etc.) pretty-printed on their own lines with 2-space indentation
  • Blank line before top-level block tags for visual separation
  • HTML-aware line wrapping that never breaks inside tags, attributes, or {@literal @} blocks
  • Wrapping width is nesting-dependent (117 chars for class-level, 113 for member-level Javadoc)
  • @ escaped as {@literal @} to prevent Javadoc tag conflicts
  • <, >, & in text properly encoded as HTML entities
  • Java generics (List<String>) preserved as List&lt;String&gt; instead of being parsed as HTML
  • Invalid nesting cleaned up (e.g., <li><p>text</p></li> simplified to <li>text</li>)
  • Empty <p> elements removed
  • Markdown code blocks rendered as <pre>{@code ...}</pre>
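The HTML-aware wrapping rule in the list above can be sketched as a two-stage process: split the text only at spaces that lie outside `<...>` tags and `{@...}` inline blocks, then greedily pack the resulting tokens into lines. The class `HtmlAwareWrapper` and its methods are hypothetical names for this example, and the sketch assumes the text is already entity-encoded (so every raw `<` starts a tag) and that inline blocks contain no nested braces. It is not the renderer from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the code from this PR.
final class HtmlAwareWrapper {

    /** Splits on spaces, except spaces inside <...> tags or {@...} inline blocks. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inTag = false;
        boolean inInlineBlock = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (c == '{' && i + 1 < text.length() && text.charAt(i + 1) == '@') {
                inInlineBlock = true;
            } else if (c == '}') {
                inInlineBlock = false;
            }
            if (c == ' ' && !inTag && !inInlineBlock) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Greedy wrap: a token that is longer than maxWidth gets a line of its own rather than being split. */
    static List<String> wrap(String text, int maxWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String token : tokenize(text)) {
            if (line.length() > 0 && line.length() + 1 + token.length() > maxWidth) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(token);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "See <a href=\"https://example.com/x y\">the guide</a> for {@literal @} usage";
        wrap(text, 20).forEach(System.out::println);
    }
}
```

Because the opening tag and its attributes form one token, a line break can fall inside the link text but never inside the tag itself.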

JavadocFormatterInterceptor (simplified) - Reduced from 230 lines to 110. Now only handles wrapping content in /** ... */ delimiters,
prefixing lines with *, escaping *, and {@snippet} blocks. All HTML formatting and line wrapping moved to MarkdownToJavadoc.
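A minimal sketch of the remaining delimiter-and-prefix responsibility might look like the following. The class `JavadocCommentWriter` is a hypothetical name, and two details are assumptions rather than facts from the PR: that blank lines are emitted as ` *` with no trailing space, and that a literal `*/` in the body is neutralized by encoding the slash as `&#47;`. Handling of {@snippet} blocks is omitted.

```java
// Hypothetical sketch: not the interceptor from this PR.
final class JavadocCommentWriter {

    /** Wraps already-formatted Javadoc body text in comment delimiters. */
    static String toComment(String body) {
        // Assumed escaping strategy: prevent "*/" in the body from closing the comment early.
        String safe = body.replace("*/", "*&#47;");
        StringBuilder sb = new StringBuilder("/**\n");
        for (String line : safe.split("\n", -1)) {
            // Blank lines get a bare " *" to avoid trailing whitespace.
            sb.append(line.isEmpty() ? " *" : " * " + line).append('\n');
        }
        return sb.append(" */").toString();
    }

    public static void main(String[] args) {
        System.out.println(toComment("An encoded stream of events.\n\n<p>For more information, see the guide."));
    }
}
```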

DocumentationTraitInterceptor (updated) - Passes nesting-dependent max width to the converter based on whether the Javadoc is for a class or
a member.
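The two widths quoted earlier (117 and 113) are consistent with a 120-column line limit minus the indentation of the enclosing declaration and the three-character ` * ` prefix. That derivation is an inference, not something the PR states, and the class `JavadocWidth`, its constants, and the 4-space indent are assumptions for this sketch.

```java
// Hypothetical sketch: the constants below are assumptions, not values taken from the PR's code.
final class JavadocWidth {
    static final int LINE_LIMIT = 120;              // assumed overall column limit
    static final int INDENT_SIZE = 4;               // assumed indentation per nesting level
    static final int PREFIX_LENGTH = " * ".length();

    /** Nesting level 0 is a top-level class; level 1 is a member of that class. */
    static int maxWidth(int nestingLevel) {
        return LINE_LIMIT - nestingLevel * INDENT_SIZE - PREFIX_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(maxWidth(0)); // 117
        System.out.println(maxWidth(1)); // 113
    }
}
```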

Before / After

(Examples use the transcribestreaming model)

Before (AudioStream.java):

/**
 * <p>An encoded stream of audio blobs. Audio streams are encoded as either HTTP/2 or WebSocket
 *       data frames.</p>
 * <p>For more information, see <a href="https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html">Transcribing streaming audio</a>.</p>
 */

After:

/**
 * An encoded stream of audio blobs. Audio streams are encoded as either HTTP/2 or WebSocket data frames.
 *
 * <p>For more information, see <a href="https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html">Transcribing
 * streaming audio</a>.
 */
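The paragraph convention visible in the output above (no <p> on the first paragraph, then a blank line and a <p> prefix for each later one) can be sketched in a few lines. `ParagraphJoiner` is a hypothetical helper for illustration; the PR's renderer works on a DOM tree rather than a list of strings.

```java
import java.util.List;

// Hypothetical sketch: not the renderer from this PR.
final class ParagraphJoiner {

    /** Joins paragraph texts using the Javadoc convention described in this PR. */
    static String join(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < paragraphs.size(); i++) {
            if (i > 0) {
                // Later paragraphs: blank separator line, then an unclosed <p>.
                sb.append("\n\n<p>");
            }
            sb.append(paragraphs.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("An encoded stream of audio blobs.", "For more information, see the guide.")));
    }
}
```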

Before (MedicalScribeInputStream.java):

/**
 * <p>An encoded stream of events.</p>
 * <ul>
 *             <li>
 *                <p>
 *                   <code>MedicalScribeConfigurationEvent</code>
 *                </p>
 *             </li>
 *          </ul>
 */

After:

/**
 * An encoded stream of events. The stream is encoded as HTTP/2 data frames.
 *
 * <ul>
 *   <li>
 *     <code>MedicalScribeConfigurationEvent</code>
 *   </li>
 * </ul>
 */

Testing

Added unit tests


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sugmanue sugmanue enabled auto-merge (squash) May 2, 2026 22:29
@sugmanue sugmanue merged commit d1860cf into smithy-lang:main May 4, 2026
3 checks passed