Add to_markdown to Action Text, mirroring to_plain_text #56858

Merged
dhh merged 1 commit into rails:main from flavorjones:action-text-to-markdown
Feb 24, 2026

@flavorjones
Member

@flavorjones flavorjones commented Feb 23, 2026

Motivation / Background

Today, Rails supports two formats for "exporting" rich text content:

  • HTML
  • Plain text

HTML is very verbose, and not always handled well (or cheaply) by LLM agents. Plain text format is
pretty good for both human and agent consumption, but loses some of the nuance and formatting of the
original (for example: headers, strong, italic, strikethrough).

This pull request was created to allow rich text to be exported as Markdown, keeping as much of the
formatting intact as possible.

Detail

Introduce markdown conversion across the Action Text stack:

  • Content#to_markdown renders attachments then converts the fragment
  • Fragment#to_markdown delegates to the new MarkdownConversion module
  • RichText#to_markdown delegates through Content
  • Attachment#to_markdown delegates to attachable_markdown_representation
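A rough sketch of how that delegation chain fits together, using plain Ruby stand-ins rather than the real Action Text classes. `RichTextSketch`, `ContentSketch`, `FragmentSketch`, `MarkdownConversionSketch`, and the `[[attachment:ID]]` placeholder are all hypothetical, and the converter stub only swaps `<strong>`/`<em>` tags by string substitution. The point is the call order (rich text, then content, then attachments rendered, then fragment converted), not the conversion itself.

```ruby
require "forwardable"

# Hypothetical stand-in for MarkdownConversion: only <strong>/<em> are handled here.
module MarkdownConversionSketch
  def self.node_to_markdown(html)
    html.gsub(%r{</?strong>}, "**").gsub(%r{</?em>}, "*")
  end
end

# Fragment#to_markdown hands its markup to the converter module.
class FragmentSketch
  def initialize(html)
    @html = html
  end

  def to_markdown
    MarkdownConversionSketch.node_to_markdown(@html)
  end
end

# Content#to_markdown first swaps attachment placeholders for their Markdown,
# then converts the resulting fragment.
class ContentSketch
  PLACEHOLDER = /\[\[attachment:(\w+)\]\]/ # hypothetical placeholder syntax

  def initialize(html, attachments = {})
    @html = html
    @attachments = attachments
  end

  def to_markdown
    rendered = @html.gsub(PLACEHOLDER) { @attachments.fetch(Regexp.last_match(1)) }
    FragmentSketch.new(rendered).to_markdown
  end
end

# RichText#to_markdown simply forwards to its Content.
class RichTextSketch
  extend Forwardable
  def_delegators :@content, :to_markdown

  def initialize(content)
    @content = content
  end
end

content = ContentSketch.new("<strong>Report</strong> attached: [[attachment:a1]]", { "a1" => "[report.pdf]" })
puts RichTextSketch.new(content).to_markdown # prints "**Report** attached: [report.pdf]"
```

In an application, the entry point would be the rich text record itself, e.g. `post.body.to_markdown` for a hypothetical `Post` model declaring `has_rich_text :body`.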

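All attachable types implement attachable_markdown_representation:

  • RemoteImage renders ![caption](url)
  • ContentAttachment converts its embedded HTML to markdown
  • ActiveStorage::Blob renders [caption || filename]
  • MissingAttachable renders ☒ (see related rails#56854)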

MarkdownConversion is a bottom-up tree reducer (like PlainTextConversion) that converts HTML nodes to Markdown. It handles inline formatting (bold, italic, strikethrough, code), block elements (paragraphs, headings, blockquotes, code blocks, horizontal rules), lists (ordered, unordered, nested), links, tables, and details/summary. This implementation was manually tested against the markup generated by both Trix and Lexxy.
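To illustrate what "bottom-up" means here, the following sketch converts a tiny hand-built tree: every element's children are converted first, and the element then wraps the joined result (`**text**` for `<strong>`, `## text` for `<h2>`, and so on). `MdNode` and `MarkdownRulesSketch` are hypothetical stand-ins; the real module walks Nokogiri nodes and covers many more cases (nested lists, tables, details/summary). This sketch also skips escaping of Markdown special characters in text.

```ruby
# Hypothetical minimal DOM node; the real converter walks Nokogiri nodes.
MdNode = Struct.new(:name, :children, :attrs, :text) do
  def self.el(name, *children, **attrs)
    new(name, children, attrs, nil)
  end

  def self.txt(text)
    new("#text", [], {}, text)
  end
end

module MarkdownRulesSketch
  HEADINGS = %w[h1 h2 h3 h4 h5 h6].freeze

  # Post-order: convert the children first, then wrap their output for this element.
  def self.convert(node)
    return node.text if node.name == "#text"

    inner = node.children.map { |child| convert(child) }.join
    case node.name
    when "strong", "b" then "**#{inner}**"
    when "em", "i" then "*#{inner}*"
    when "del", "s" then "~~#{inner}~~"
    when "code" then "`#{inner}`"
    when "a" then "[#{inner}](#{node.attrs[:href]})"
    when *HEADINGS then "#{"#" * node.name[1].to_i} #{inner}\n\n"
    when "p" then "#{inner}\n\n"
    when "blockquote" then inner.strip.lines.map { |line| "> #{line}" }.join + "\n\n"
    when "li" then "- #{inner.strip}\n"
    when "ul" then "#{inner}\n"
    when "hr" then "---\n\n"
    else inner # unknown wrappers (div, span, ...) pass their children through
    end
  end

  def self.to_markdown(root)
    convert(root).strip
  end
end

doc = MdNode.el("div",
  MdNode.el("h2", MdNode.txt("Release notes")),
  MdNode.el("p", MdNode.txt("Now with "), MdNode.el("strong", MdNode.txt("bold")),
            MdNode.txt(" and "), MdNode.el("a", MdNode.txt("links"), href: "https://rubyonrails.org")),
  MdNode.el("blockquote", MdNode.el("p", MdNode.txt("Quoted"))),
  MdNode.el("ul", MdNode.el("li", MdNode.txt("one")), MdNode.el("li", MdNode.txt("two"))))

puts MarkdownRulesSketch.to_markdown(doc)
```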

This commit also promotes the BottomUpReducer tree-walking class from ActionText::PlainTextConversion into its own file as ActionText::BottomUpReducer for use by both conversion modules.
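The PR doesn't show `BottomUpReducer`'s internals, so the following is only one plausible shape for such a class: a post-order walk that reduces each node after all of its children and hands a caller-supplied block the node plus its children's results. Because the node-specific logic lives in the block, one walker can drive both a plain-text and a Markdown conversion. `TreeNode` and `BottomUpReducerSketch` are hypothetical, and the explicit stack (instead of recursion) is a choice made for this sketch, not a claim about the Rails class.

```ruby
# Hypothetical tree node; the real reducer walks Nokogiri nodes.
TreeNode = Struct.new(:name, :children, :text)

# Generic bottom-up reducer: each node is reduced only after all of its children,
# and the block receives the node plus its children's reduced values. An explicit
# stack replaces recursion, so deeply nested markup cannot overflow the call stack.
class BottomUpReducerSketch
  def initialize(root)
    @root = root
  end

  def reduce
    results = {}.compare_by_identity # node object => reduced value
    stack = [[@root, false]]

    until stack.empty?
      node, children_done = stack.pop
      if children_done
        child_values = node.children.map { |child| results.delete(child) }
        results[node] = yield(node, child_values)
      else
        stack.push([node, true]) # revisit this node once its children are reduced
        node.children.reverse_each { |child| stack.push([child, false]) }
      end
    end

    results[@root]
  end
end

tree = TreeNode.new("p", [
  TreeNode.new("#text", [], "Hello "),
  TreeNode.new("em", [TreeNode.new("#text", [], "world")], nil)
], nil)

# The same walker drives two different conversions.
plain = BottomUpReducerSketch.new(tree).reduce { |node, kids| node.name == "#text" ? node.text : kids.join }
markdown = BottomUpReducerSketch.new(tree).reduce do |node, kids|
  case node.name
  when "#text" then node.text
  when "em" then "*#{kids.join}*"
  else kids.join
  end
end
puts plain, markdown
```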

Additional information

Security note: Markdown links are checked against Loofah's allowed URI protocols to prevent unsafe schemes like javascript: from appearing in the output.
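A minimal sketch of that kind of scheme check, assuming a small hypothetical allowlist (`ALLOWED_SCHEMES`) in place of Loofah's actual protocol list and a hypothetical `safe_markdown_link` helper. Links whose scheme is not allowed, or whose href cannot be parsed at all, degrade to their bare text; relative URLs are kept.

```ruby
require "uri"

# Hypothetical allowlist standing in for Loofah's allowed URI protocols (the real list is longer).
ALLOWED_SCHEMES = %w[http https mailto].freeze

# Returns a Markdown link when the href's scheme is allowed, otherwise only the link text.
def safe_markdown_link(text, href)
  href = href.to_s.strip
  return text if href.empty?

  scheme = begin
    URI.parse(href).scheme
  rescue URI::InvalidURIError
    return text # unparseable hrefs (such as ones containing spaces) are treated as unsafe
  end

  # Scheme-less (relative) URLs are kept; anything else must be on the allowlist.
  if scheme.nil? || ALLOWED_SCHEMES.include?(scheme.downcase)
    "[#{text}](#{href})"
  else
    text
  end
end

puts safe_markdown_link("Rails", "https://rubyonrails.org") # prints "[Rails](https://rubyonrails.org)"
puts safe_markdown_link("click me", "javascript:alert(1)")  # prints "click me"
```

Dropping only the URL, rather than the whole link, keeps the reader-visible text intact while making sure no unsafe scheme reaches the Markdown output.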

Performance note: This implementation benchmarks ~35% faster than to_plain_text on my development machine for a ~42k HTML document generated by cmark-gfm from a real-world markdown file (but I'll note it's likely that a little effort can bring plain text conversion in line with the markdown performance, should someone choose to work on it).


Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Unrelated changes should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

Introduce markdown conversion across the Action Text stack:

- `Content#to_markdown` renders attachments then converts the fragment
- `Fragment#to_markdown` delegates to the new `MarkdownConversion` module
- `RichText#to_markdown` delegates through `Content`
- `Attachment#to_markdown` delegates to `attachable_markdown_representation`

All attachable types implement `attachable_markdown_representation`:

- `RemoteImage` renders `![caption](url)`
- `ContentAttachment` converts its embedded HTML to markdown
- `ActiveStorage::Blob` renders `[caption || filename]`
- `MissingAttachable` renders `☒` (see related rails#56854)
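Because every attachable answers the same method, callers can render any attachment without checking its type. The sketch below uses hypothetical stand-in classes (`RemoteImageSketch`, `BlobSketch`, `MissingAttachableSketch`) and a hypothetical `attachments_to_markdown` helper, reads the `[caption || filename]` rule above literally, and omits `ContentAttachment`, which would need a full HTML-to-Markdown converter.

```ruby
# Hypothetical stand-ins: each one answers attachable_markdown_representation.
RemoteImageSketch = Struct.new(:url, :caption) do
  def attachable_markdown_representation
    "![#{caption}](#{url})"
  end
end

BlobSketch = Struct.new(:filename, :caption) do
  # Caption when present, otherwise the filename.
  def attachable_markdown_representation
    "[#{caption || filename}]"
  end
end

class MissingAttachableSketch
  def attachable_markdown_representation
    "☒"
  end
end

# A caller relies only on the shared method name (duck typing).
def attachments_to_markdown(attachables)
  attachables.map(&:attachable_markdown_representation).join("\n")
end

puts attachments_to_markdown([
  RemoteImageSketch.new("https://example.com/cat.png", "A cat"),
  BlobSketch.new("report.pdf"),
  MissingAttachableSketch.new
])
```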

`MarkdownConversion` is a bottom-up tree reducer (like `PlainTextConversion`)
that converts HTML nodes to Markdown. It handles inline formatting (bold,
italic, strikethrough, code), block elements (paragraphs, headings,
blockquotes, code blocks, horizontal rules), lists (ordered, unordered,
nested), links, tables, and details/summary. This implementation was
manually tested against the markup generated by both Trix and Lexxy.
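Nested lists are the fiddliest of these cases, because CommonMark only treats a list as nested when its items are indented to at least the parent item's content column: 2 spaces under `- `, 3 under `1. `, 4 under `10. `. A minimal sketch of that indentation rule over a hypothetical `ListSketch` structure rather than HTML nodes (not necessarily how `MarkdownConversion` does it):

```ruby
# Hypothetical list structure: items are [text] or [text, nested_list] pairs.
ListSketch = Struct.new(:ordered, :items)

# Renders a (possibly nested) list, indenting each nested list to its parent
# item's content column: the marker width plus the single space after it.
def list_to_markdown(list, indent = "")
  list.items.each_with_index.map do |(text, sublist), index|
    marker = list.ordered ? "#{index + 1}." : "-"
    line = "#{indent}#{marker} #{text}\n"
    line += list_to_markdown(sublist, indent + " " * (marker.length + 1)) if sublist
    line
  end.join
end

outline = ListSketch.new(true, [
  ["Install", ListSketch.new(false, [["bundle add"], ["run the generator"]])],
  ["Configure"]
])
puts list_to_markdown(outline)
```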

This commit also promotes the `BottomUpReducer` tree-walking class
from `ActionText::PlainTextConversion` into its own file as
`ActionText::BottomUpReducer` for use by both conversion modules.

Security note: Markdown links are checked against Loofah's allowed URI
protocols to prevent unsafe schemes like `javascript:` from appearing
in the output.

Performance note: This implementation benchmarks ~35% faster than
`to_plain_text` on my development machine for a ~42k HTML document
generated by cmark-gfm from a real-world markdown file:

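```ruby
#!/usr/bin/env ruby
# Benchmarks Markdown vs. plain-text conversion of the same parsed HTML fragment.

require "bundler/inline"
gemfile do
  source "https://rubygems.org"
  gem "benchmark-ips"
end

# Boot the Rails application environment so Action Text (and Nokogiri) are loaded.
require_relative "../config/environment"

# Render a real-world Markdown guide to HTML with the cmark-gfm CLI.
html = `cmark-gfm --to html #{File.expand_path("~/code/github.com/basecamp/activerecord-tenanted/GUIDE.md")}`

puts "HTML: #{html.bytesize} bytes, #{html.lines.count} lines"

# Parse once, outside the timed blocks, so only the conversions are measured.
html_doc = Nokogiri::HTML5.fragment(html)

Benchmark.ips do |x|
  x.report("to_markdown") { ActionText::MarkdownConversion.node_to_markdown(html_doc) }
  x.report("to_plain_text") { ActionText::PlainTextConversion.node_to_plain_text(html_doc) }
  x.compare!
end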
```

```
HTML: 42282 bytes, 932 lines
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]
Warming up --------------------------------------
         to_markdown     7.000 i/100ms
       to_plain_text     4.000 i/100ms
Calculating -------------------------------------
         to_markdown     70.931 (± 7.0%) i/s   (14.10 ms/i) -    357.000 in   5.066019s
       to_plain_text     51.811 (± 5.8%) i/s   (19.30 ms/i) -    260.000 in   5.033790s

Comparison:
         to_markdown:       70.9 i/s
       to_plain_text:       51.8 i/s - 1.37x  slower

```
@flavorjones flavorjones force-pushed the action-text-to-markdown branch from a852bfc to c3dd09b February 23, 2026 03:43
@dhh dhh merged commit e63f384 into rails:main Feb 24, 2026
4 checks passed
@flavorjones flavorjones deleted the action-text-to-markdown branch February 24, 2026 13:37
@yshmarov

yshmarov commented Mar 2, 2026

had the same idea :) be83ff0
