Conversation
Blitz/Taffy does not compute layout for <caption> (returns 0x0).
Work around this by rewriting <caption> into an inline-block wrapper
structure before passing HTML to Blitz:
<div style="display: inline-block">
<div>caption text</div> <!-- top or bottom based on caption-side -->
<table>...</table>
</div>
This preserves table page splitting and header repetition since the
wrapper becomes a BlockPageable containing the TablePageable.
Supports caption-side: top (default) and bottom.
Handles nested tables correctly.
📝 WalkthroughWalkthroughHTMLのテーブル要素を処理する新機能が追加されました。キャプション要素を検出してインラインブロックの Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/fulgur/src/engine.rs`:
- Around line 111-119: The current extract_caption function uses
table_inner.find("<caption") which can incorrectly match tags like
"<captionxyz>"; change the detection to mirror find_matching_close_table by only
accepting a true caption start when it is either "<caption>" or "<caption "
(i.e. check the substring at cap_start to ensure the following character is '>'
or whitespace) before proceeding with caption extraction and the nested-table
check; update references around cap_start and the nested_table check in
extract_caption to use this stricter check.
- Around line 145-151: The current parsing around cap_tag only looks for
style="..." (find("style=\"")) and returns CaptionSideHint::Top when
double-quoted style isn't found; update the logic in the code that computes
style_start, style_val and style_end so it supports both double- and
single-quoted attributes: locate the substring "style=" in cap_tag, then read
the next character to determine the quote char ('"' or '\''), slice style_val
starting after that quote, and search for the matching closing quote to extract
the attribute value before parsing caption-side; keep behavior identical
(fallback to CaptionSideHint::Top) on missing or malformed values and update
references to cap_tag, style_start, style_val and style_end accordingly.
- Around line 153-157: The detection for "caption-side: bottom" is too loose
because it only checks for the substrings "caption-side" and "bottom" in the
style string (the style variable used in this block) and can misfire for
unrelated "bottom" values; update the logic in the same block that returns
CaptionSideHint::Bottom/Top to perform stricter matching—either split the style
string into declarations and check that a declaration's property equals
"caption-side" and its value equals "bottom" (trimmed, case-insensitive), or use
a precise regex that matches "caption-side\s*:\s*bottom\b"—then return
CaptionSideHint::Bottom only when that exact property:value pair is present,
otherwise return CaptionSideHint::Top.
- Around line 77-79: The current prefix check if
html[pos..].starts_with("<table") can false-positive on tags like <tabledata>;
change the condition in the parsing logic that updates depth/pos (the block
referencing html, pos and depth) to require a stricter tag boundary: after
matching "<table" ensure the following character (peek at html[pos+6]) is either
whitespace, '>', '/', or the end of input (and consider case-insensitive
matching if needed) before incrementing depth and advancing pos.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 74eccbab-3ad1-4be0-b78b-cb2e959cbfd7
📒 Files selected for processing (2)
crates/fulgur/src/engine.rscrates/fulgur/tests/table_header_test.rs
| if html[pos..].starts_with("<table") { | ||
| depth += 1; | ||
| pos += 6; |
There was a problem hiding this comment.
<tableの誤マッチングの可能性
"<table"のプレフィックスチェックは、<tabledata>や<tablexyz>などの異なるタグにも誤ってマッチする可能性があります。より厳密なマッチングを検討してください。
🛡️ 提案する修正
while pos < html.len() {
- if html[pos..].starts_with("<table") {
+ let rest = &html[pos..];
+ if rest.starts_with("<table>") || rest.starts_with("<table ") {
depth += 1;
- pos += 6;
+ pos += 6; // "<table" の長さ分進める
} else if html[pos..].starts_with("</table>") {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/fulgur/src/engine.rs` around lines 77 - 79, The current prefix check
if html[pos..].starts_with("<table") can false-positive on tags like
<tabledata>; change the condition in the parsing logic that updates depth/pos
(the block referencing html, pos and depth) to require a stricter tag boundary:
after matching "<table" ensure the following character (peek at html[pos+6]) is
either whitespace, '>', '/', or the end of input (and consider case-insensitive
matching if needed) before incrementing depth and advancing pos.
| fn extract_caption(table_inner: &str) -> Option<CaptionInfo> { | ||
| let cap_start = table_inner.find("<caption")?; | ||
|
|
||
| // Ensure this caption appears before any nested <table> tag | ||
| if let Some(nested_table) = table_inner.find("<table") { | ||
| if cap_start > nested_table { | ||
| return None; | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
<captionのマッチングも同様の考慮が必要
find_matching_close_tableと同様に、"<caption"は<captionxyz>のようなタグにもマッチする可能性があります。一貫性のため、"<caption>"または"<caption "のチェックを検討してください。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/fulgur/src/engine.rs` around lines 111 - 119, The current
extract_caption function uses table_inner.find("<caption") which can incorrectly
match tags like "<captionxyz>"; change the detection to mirror
find_matching_close_table by only accepting a true caption start when it is
either "<caption>" or "<caption " (i.e. check the substring at cap_start to
ensure the following character is '>' or whitespace) before proceeding with
caption extraction and the nested-table check; update references around
cap_start and the nested_table check in extract_caption to use this stricter
check.
| let Some(style_start) = cap_tag.find("style=\"") else { | ||
| return CaptionSideHint::Top; | ||
| }; | ||
| let style_val = &cap_tag[style_start + 7..]; | ||
| let Some(style_end) = style_val.find('"') else { | ||
| return CaptionSideHint::Top; | ||
| }; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
シングルクォートのstyle属性が未サポート
style='caption-side: bottom'のようなシングルクォート形式のstyle属性は現在処理されません。実用上は稀ですが、堅牢性のために対応を検討してください。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/fulgur/src/engine.rs` around lines 145 - 151, The current parsing
around cap_tag only looks for style="..." (find("style=\"")) and returns
CaptionSideHint::Top when double-quoted style isn't found; update the logic in
the code that computes style_start, style_val and style_end so it supports both
double- and single-quoted attributes: locate the substring "style=" in cap_tag,
then read the next character to determine the quote char ('"' or '\''), slice
style_val starting after that quote, and search for the matching closing quote
to extract the attribute value before parsing caption-side; keep behavior
identical (fallback to CaptionSideHint::Top) on missing or malformed values and
update references to cap_tag, style_start, style_val and style_end accordingly.
| if style.contains("caption-side") && style.contains("bottom") { | ||
| CaptionSideHint::Bottom | ||
| } else { | ||
| CaptionSideHint::Top | ||
| } |
There was a problem hiding this comment.
caption-side: bottomの検出ロジックが緩すぎる可能性
現在の実装は"caption-side"と"bottom"が両方存在するかをチェックしていますが、style="caption-side: top; margin-bottom: 10px"のようなケースで誤検出する可能性があります。
🛡️ より厳密なパターンマッチング
let style = &style_val[..style_end];
- if style.contains("caption-side") && style.contains("bottom") {
+ // caption-side: bottom または caption-side:bottom を検出
+ let normalized = style.replace(' ', "");
+ if normalized.contains("caption-side:bottom") {
CaptionSideHint::Bottom
} else {
CaptionSideHint::Top
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if style.contains("caption-side") && style.contains("bottom") { | |
| CaptionSideHint::Bottom | |
| } else { | |
| CaptionSideHint::Top | |
| } | |
| let style = &style_val[..style_end]; | |
| // caption-side: bottom または caption-side:bottom を検出 | |
| let normalized = style.replace(' ', ""); | |
| if normalized.contains("caption-side:bottom") { | |
| CaptionSideHint::Bottom | |
| } else { | |
| CaptionSideHint::Top | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/fulgur/src/engine.rs` around lines 153 - 157, The detection for
"caption-side: bottom" is too loose because it only checks for the substrings
"caption-side" and "bottom" in the style string (the style variable used in this
block) and can misfire for unrelated "bottom" values; update the logic in the
same block that returns CaptionSideHint::Bottom/Top to perform stricter
matching—either split the style string into declarations and check that a
declaration's property equals "caption-side" and its value equals "bottom"
(trimmed, case-insensitive), or use a precise regex that matches
"caption-side\s*:\s*bottom\b"—then return CaptionSideHint::Bottom only when that
exact property:value pair is present, otherwise return CaptionSideHint::Top.
| } | ||
| pos += 8; | ||
| } else { | ||
| pos += 1; |
There was a problem hiding this comment.
🔴 Panic on non-ASCII HTML content due to byte-level pos += 1 in find_matching_close_table
In find_matching_close_table, the else branch increments pos by 1 byte (pos += 1 at line 87). If the character at pos is a multi-byte UTF-8 character (e.g., ö, 日, emoji), this lands pos in the middle of a character. The next iteration's html[pos..] slice at line 77 panics because the index is not on a UTF-8 character boundary. This is triggered by any non-ASCII content between <table> and </table>, which is extremely common (any non-English text). Since rewrite_table_captions is called unconditionally for all render_html invocations at crates/fulgur/src/engine.rs:193, this will crash on real-world HTML like <table><caption>Ärzte</caption>...</table>.
Prompt for agents
In crates/fulgur/src/engine.rs, function find_matching_close_table (line 73-91), the else branch at line 87 uses pos += 1 which breaks on multi-byte UTF-8 characters. Replace the byte-level scanning with a UTF-8-safe approach. One option: change the else branch to advance by the current character's byte length:
} else {
pos += html[pos..].chars().next().map_or(1, |c| c.len_utf8());
}
Alternatively, since we only care about ASCII tags (< character), you can search for the next '<' to skip over non-ASCII content efficiently:
} else {
match html[pos..].find('<') {
Some(offset) => pos += offset,
None => break,
}
}
Was this helpful? React with 👍 or 👎 to provide feedback.
| if style.contains("caption-side") && style.contains("bottom") { | ||
| CaptionSideHint::Bottom |
There was a problem hiding this comment.
🟡 detect_caption_side false positive when style contains other properties with "bottom" substring
The check style.contains("caption-side") && style.contains("bottom") at line 153 is too loose. It will incorrectly detect CaptionSideHint::Bottom when the style attribute contains caption-side: top alongside any other property containing the substring "bottom", such as margin-bottom, padding-bottom, or border-bottom. For example, <caption style="caption-side: top; margin-bottom: 10px"> would be misidentified as bottom-side, causing the caption to be placed after the table instead of before it.
| if style.contains("caption-side") && style.contains("bottom") { | |
| CaptionSideHint::Bottom | |
| if style.contains("caption-side") && style.contains("caption-side: bottom") { | |
| CaptionSideHint::Bottom |
Was this helpful? React with 👍 or 👎 to provide feedback.
|
パイプライン設計(fulgur-1f9)の後にDOM処理ベースで再実装予定のため一旦保留。 |
Summary
<caption>要素をPDF出力に反映する機能を追加<caption>のレイアウトを計算しない(0×0を返す)制約を、HTML前処理による書き換えで回避<caption>をinline-blockラッパー内の<div>に変換し、既存のBlockPageable/TablePageableのページ分割・ヘッダー繰り返し機構をそのまま活用caption-side: top(デフォルト)とbottomに対応Approach
WeasyPrintの
TableWrapperBoxパターンを参考に、テーブルとキャプションをinline-blockのラッパーdivで包む方式を採用。キャプションの幅がテーブル幅に連動する。Test plan
cargo test --lib(54 tests pass)cargo test -p fulgur --test table_header_test -- --test-threads=1(6 tests pass)cargo fmt --checkcleancargo clippycleanCloses fulgur-78o
Summary by CodeRabbit
リリースノート
新機能
caption-side: bottomスタイルに対応する形で配置されます。テスト