Skip to content

feat: support <caption> element in tables#21

Open
mitsuru wants to merge 1 commit intomainfrom
feature/table-caption
Open

feat: support <caption> element in tables#21
mitsuru wants to merge 1 commit intomainfrom
feature/table-caption

Conversation

@mitsuru
Copy link
Copy Markdown
Owner

@mitsuru mitsuru commented Mar 22, 2026

Summary

  • テーブルの<caption>要素をPDF出力に反映する機能を追加
  • Blitz/Taffyが<caption>のレイアウトを計算しない(0×0を返す)制約を、HTML前処理による書き換えで回避
  • <caption>inline-blockラッパー内の<div>に変換し、既存のBlockPageable/TablePageableのページ分割・ヘッダー繰り返し機構をそのまま活用
  • caption-side: top(デフォルト)とbottomに対応
  • ネストされたテーブルを正しく処理

Approach

WeasyPrintのTableWrapperBoxパターンを参考に、テーブルとキャプションをinline-blockのラッパーdivで包む方式を採用。キャプションの幅がテーブル幅に連動する。

<!-- Before -->
<table><caption>Title</caption>...</table>

<!-- After (preprocessed) -->
<div style="display: inline-block">
  <div>Title</div>
  <table>...</table>
</div>

Test plan

  • cargo test --lib (54 tests pass)
  • cargo test -p fulgur --test table_header_test -- --test-threads=1 (6 tests pass)
  • cargo fmt --check clean
  • cargo clippy clean
  • caption-side: top のテーブルでcaptionがテーブル上部に表示されることを目視確認
  • caption-side: bottom のテーブルでcaptionがテーブル下部に表示されることを目視確認
  • 長いテーブル(50行)でページ分割時にcaptionが1ページ目のみに表示されることを確認

Closes fulgur-78o


Open with Devin

Summary by CodeRabbit

リリースノート

  • 新機能

    • HTMLテーブルのキャプション処理を改善しました。キャプション要素が自動的に変換され、caption-side: bottomスタイルに対応する形で配置されます。
  • テスト

    • キャプション表示位置、複数行テーブル、ネストされたテーブルなど、さまざまなシナリオのテストを追加しました。

Blitz/Taffy does not compute layout for <caption> (returns 0x0).
Work around this by rewriting <caption> into an inline-block wrapper
structure before passing HTML to Blitz:

  <div style="display: inline-block">
    <div>caption text</div>  <!-- top or bottom based on caption-side -->
    <table>...</table>
  </div>

This preserves table page splitting and header repetition since the
wrapper becomes a BlockPageable containing the TablePageable.
Supports caption-side: top (default) and bottom.
Handles nested tables correctly.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 22, 2026

📝 Walkthrough

Walkthrough

HTMLのテーブル要素を処理する新機能が追加されました。キャプション要素を検出してインラインブロックのdivラッパーに書き換え、ネストされたテーブルを正しく処理し、caption-sideスタイルに応じてキャプション位置を制御する機能を実装しています。

Changes

Cohort / File(s) Summary
テーブルキャプション書き換え機能
crates/fulgur/src/engine.rs
rewrite_table_captions()find_matching_close_table()detect_caption_side()を実装。HTMLのテーブル要素を走査し、キャプションをdiv要素に変換。ネストされたテーブルのマッチング処理と、caption-side: bottomスタイルに基づいた配置制御を含む。Engine::render_htmlに統合。ユニットテスト追加(236行)
テーブルキャプション動作テスト
crates/fulgur/tests/table_header_test.rs
デフォルトキャプション、caption-side: bottom、長いテーブル(50行生成)を含む3つの新規テストケースを追加。PDF出力の確認処理を含む(42行)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

キャプション君が引っ越した
テーブルの中でくるくる回転
ネストも深くても大丈夫
divに変身、スタイル美しく
🐰 ✨ キャプション処理、完了です!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed) 要素のサポート追加という主要な変更内容を正確に説明しており、明確で簡潔です。
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed プルリクエストのタイトルは、HTMLテーブルの
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/table-caption

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/fulgur/src/engine.rs`:
- Around line 111-119: The current extract_caption function uses
table_inner.find("<caption") which can incorrectly match tags like
"<captionxyz>"; change the detection to mirror find_matching_close_table by only
accepting a true caption start when it is either "<caption>" or "<caption "
(i.e. check the substring at cap_start to ensure the following character is '>'
or whitespace) before proceeding with caption extraction and the nested-table
check; update references around cap_start and the nested_table check in
extract_caption to use this stricter check.
- Around line 145-151: The current parsing around cap_tag only looks for
style="..." (find("style=\"")) and returns CaptionSideHint::Top when
double-quoted style isn't found; update the logic in the code that computes
style_start, style_val and style_end so it supports both double- and
single-quoted attributes: locate the substring "style=" in cap_tag, then read
the next character to determine the quote char ('"' or '\''), slice style_val
starting after that quote, and search for the matching closing quote to extract
the attribute value before parsing caption-side; keep behavior identical
(fallback to CaptionSideHint::Top) on missing or malformed values and update
references to cap_tag, style_start, style_val and style_end accordingly.
- Around line 153-157: The detection for "caption-side: bottom" is too loose
because it only checks for the substrings "caption-side" and "bottom" in the
style string (the style variable used in this block) and can misfire for
unrelated "bottom" values; update the logic in the same block that returns
CaptionSideHint::Bottom/Top to perform stricter matching—either split the style
string into declarations and check that a declaration's property equals
"caption-side" and its value equals "bottom" (trimmed, case-insensitive), or use
a precise regex that matches "caption-side\s*:\s*bottom\b"—then return
CaptionSideHint::Bottom only when that exact property:value pair is present,
otherwise return CaptionSideHint::Top.
- Around line 77-79: The current prefix check if
html[pos..].starts_with("<table") can false-positive on tags like <tabledata>;
change the condition in the parsing logic that updates depth/pos (the block
referencing html, pos and depth) to require a stricter tag boundary: after
matching "<table" ensure the following character (peek at html[pos+6]) is either
whitespace, '>', '/', or the end of input (and consider case-insensitive
matching if needed) before incrementing depth and advancing pos.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 74eccbab-3ad1-4be0-b78b-cb2e959cbfd7

📥 Commits

Reviewing files that changed from the base of the PR and between 7b9aaf5 and 85b3ef6.

📒 Files selected for processing (2)
  • crates/fulgur/src/engine.rs
  • crates/fulgur/tests/table_header_test.rs

Comment on lines +77 to +79
if html[pos..].starts_with("<table") {
depth += 1;
pos += 6;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

<tableの誤マッチングの可能性

"<table"のプレフィックスチェックは、<tabledata><tablexyz>などの異なるタグにも誤ってマッチする可能性があります。より厳密なマッチングを検討してください。

🛡️ 提案する修正
     while pos < html.len() {
-        if html[pos..].starts_with("<table") {
+        let rest = &html[pos..];
+        if rest.starts_with("<table>") || rest.starts_with("<table ") {
             depth += 1;
-            pos += 6;
+            pos += 6; // "<table" の長さ分進める
         } else if html[pos..].starts_with("</table>") {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/fulgur/src/engine.rs` around lines 77 - 79, The current prefix check
if html[pos..].starts_with("<table") can false-positive on tags like
<tabledata>; change the condition in the parsing logic that updates depth/pos
(the block referencing html, pos and depth) to require a stricter tag boundary:
after matching "<table" ensure the following character (peek at html[pos+6]) is
either whitespace, '>', '/', or the end of input (and consider case-insensitive
matching if needed) before incrementing depth and advancing pos.

Comment on lines +111 to +119
fn extract_caption(table_inner: &str) -> Option<CaptionInfo> {
let cap_start = table_inner.find("<caption")?;

// Ensure this caption appears before any nested <table> tag
if let Some(nested_table) = table_inner.find("<table") {
if cap_start > nested_table {
return None;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

<captionのマッチングも同様の考慮が必要

find_matching_close_tableと同様に、"<caption"<captionxyz>のようなタグにもマッチする可能性があります。一貫性のため、"<caption>"または"<caption "のチェックを検討してください。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/fulgur/src/engine.rs` around lines 111 - 119, The current
extract_caption function uses table_inner.find("<caption") which can incorrectly
match tags like "<captionxyz>"; change the detection to mirror
find_matching_close_table by only accepting a true caption start when it is
either "<caption>" or "<caption " (i.e. check the substring at cap_start to
ensure the following character is '>' or whitespace) before proceeding with
caption extraction and the nested-table check; update references around
cap_start and the nested_table check in extract_caption to use this stricter
check.

Comment on lines +145 to +151
let Some(style_start) = cap_tag.find("style=\"") else {
return CaptionSideHint::Top;
};
let style_val = &cap_tag[style_start + 7..];
let Some(style_end) = style_val.find('"') else {
return CaptionSideHint::Top;
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

シングルクォートのstyle属性が未サポート

style='caption-side: bottom'のようなシングルクォート形式のstyle属性は現在処理されません。実用上は稀ですが、堅牢性のために対応を検討してください。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/fulgur/src/engine.rs` around lines 145 - 151, The current parsing
around cap_tag only looks for style="..." (find("style=\"")) and returns
CaptionSideHint::Top when double-quoted style isn't found; update the logic in
the code that computes style_start, style_val and style_end so it supports both
double- and single-quoted attributes: locate the substring "style=" in cap_tag,
then read the next character to determine the quote char ('"' or '\''), slice
style_val starting after that quote, and search for the matching closing quote
to extract the attribute value before parsing caption-side; keep behavior
identical (fallback to CaptionSideHint::Top) on missing or malformed values and
update references to cap_tag, style_start, style_val and style_end accordingly.

Comment on lines +153 to +157
if style.contains("caption-side") && style.contains("bottom") {
CaptionSideHint::Bottom
} else {
CaptionSideHint::Top
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

caption-side: bottomの検出ロジックが緩すぎる可能性

現在の実装は"caption-side""bottom"が両方存在するかをチェックしていますが、style="caption-side: top; margin-bottom: 10px"のようなケースで誤検出する可能性があります。

🛡️ より厳密なパターンマッチング
     let style = &style_val[..style_end];
-    if style.contains("caption-side") && style.contains("bottom") {
+    // caption-side: bottom または caption-side:bottom を検出
+    let normalized = style.replace(' ', "");
+    if normalized.contains("caption-side:bottom") {
         CaptionSideHint::Bottom
     } else {
         CaptionSideHint::Top
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if style.contains("caption-side") && style.contains("bottom") {
CaptionSideHint::Bottom
} else {
CaptionSideHint::Top
}
let style = &style_val[..style_end];
// caption-side: bottom または caption-side:bottom を検出
let normalized = style.replace(' ', "");
if normalized.contains("caption-side:bottom") {
CaptionSideHint::Bottom
} else {
CaptionSideHint::Top
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/fulgur/src/engine.rs` around lines 153 - 157, The detection for
"caption-side: bottom" is too loose because it only checks for the substrings
"caption-side" and "bottom" in the style string (the style variable used in this
block) and can misfire for unrelated "bottom" values; update the logic in the
same block that returns CaptionSideHint::Bottom/Top to perform stricter
matching—either split the style string into declarations and check that a
declaration's property equals "caption-side" and its value equals "bottom"
(trimmed, case-insensitive), or use a precise regex that matches
"caption-side\s*:\s*bottom\b"—then return CaptionSideHint::Bottom only when that
exact property:value pair is present, otherwise return CaptionSideHint::Top.

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.

Open in Devin Review

}
pos += 8;
} else {
pos += 1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Panic on non-ASCII HTML content due to byte-level pos += 1 in find_matching_close_table

In find_matching_close_table, the else branch increments pos by 1 byte (pos += 1 at line 87). If the character at pos is a multi-byte UTF-8 character (e.g., ö, , emoji), this lands pos in the middle of a character. The next iteration's html[pos..] slice at line 77 panics because the index is not on a UTF-8 character boundary. This is triggered by any non-ASCII content between <table> and </table>, which is extremely common (any non-English text). Since rewrite_table_captions is called unconditionally for all render_html invocations at crates/fulgur/src/engine.rs:193, this will crash on real-world HTML like <table><caption>Ärzte</caption>...</table>.

Prompt for agents
In crates/fulgur/src/engine.rs, function find_matching_close_table (line 73-91), the else branch at line 87 uses pos += 1 which breaks on multi-byte UTF-8 characters. Replace the byte-level scanning with a UTF-8-safe approach. One option: change the else branch to advance by the current character's byte length:

  } else {
      pos += html[pos..].chars().next().map_or(1, |c| c.len_utf8());
  }

Alternatively, since we only care about ASCII tags (< character), you can search for the next '<' to skip over non-ASCII content efficiently:

  } else {
      match html[pos..].find('<') {
          Some(offset) => pos += offset,
          None => break,
      }
  }
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +153 to +154
if style.contains("caption-side") && style.contains("bottom") {
CaptionSideHint::Bottom
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 detect_caption_side false positive when style contains other properties with "bottom" substring

The check style.contains("caption-side") && style.contains("bottom") at line 153 is too loose. It will incorrectly detect CaptionSideHint::Bottom when the style attribute contains caption-side: top alongside any other property containing the substring "bottom", such as margin-bottom, padding-bottom, or border-bottom. For example, <caption style="caption-side: top; margin-bottom: 10px"> would be misidentified as bottom-side, causing the caption to be placed after the table instead of before it.

Suggested change
if style.contains("caption-side") && style.contains("bottom") {
CaptionSideHint::Bottom
if style.contains("caption-side") && style.contains("caption-side: bottom") {
CaptionSideHint::Bottom
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@mitsuru
Copy link
Copy Markdown
Owner Author

mitsuru commented Mar 22, 2026

パイプライン設計(fulgur-1f9)の後にDOM処理ベースで再実装予定のため一旦保留。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant