Skip to content

feat(drive): pre-flight 10000-rune total cap for +add-comment reply_elements#605

Merged
fangshuyu-768 merged 3 commits intomainfrom
feat/add-comment-length-precheck
Apr 30, 2026
Merged

feat(drive): pre-flight 10000-rune total cap for +add-comment reply_elements#605
fangshuyu-768 merged 3 commits intomainfrom
feat/add-comment-length-precheck

Conversation

@herbertliu
Copy link
Copy Markdown
Collaborator

@herbertliu herbertliu commented Apr 22, 2026

Summary

The open-platform comment endpoint /open-apis/drive/v1/files/{token}/new_comments rejects oversized payloads with an opaque [1069302] Invalid or missing parameters — no field name, no length information, no indication that length is even the cause. Callers (especially agents) end up binary-searching the content to figure out what hit the cap.

This PR adds a pre-flight check inside parseCommentReplyElements so drive +add-comment rejects over-cap input client-side with a structured, actionable error before any network round-trip.

The first version of this PR set a 300-byte per-text-element limit, justified by the empirical pattern "~80 Chinese chars OK, ~130 Chinese chars fail." A subsequent round of probing the live API showed that pattern doesn't reproduce and the actual contract is materially different (cap is 100× higher, on the total across elements, and rune-based not byte-based). The current implementation reflects the corrected findings; see "Empirical basis" below.

Behavior

  • Cap: combined character (rune) count across all type=text elements in --content must be ≤ 10000.
  • Counting: server counts raw runes; byte width and HTML escape state are irrelevant. 10000 ASCII = 10000 Chinese = 10000 < chars all sit at the limit, even though they encode to 10000 / 30000 / 40000 bytes respectively.
  • Multi-element: the cap is on the sum, not per element. Splitting one long element into multiple smaller ones does not bypass the cap; the hint explicitly says so.
  • Non-text elements: mention_user (open_id) and link (URL) don't count toward the budget.
  • Failure shape: structured text_too_long ExitError naming the offending element index and the running total at that point. Hint walks the caller through the fix.
$ lark-cli drive +add-comment --doc "$DOCX" --type docx --full-comment --content '[{"type":"text","text":"…10001 chars…"}]'

[validation] --content reply_elements text totals 10001 characters at element #1 (this element: 10001); the server caps the combined length at 10000 characters across ALL reply_elements

hint: shorten the comment so the combined text across all reply_elements fits within 10000 characters. The server enforces this cap on the TOTAL — splitting one long element into multiple smaller text elements does NOT help (they all add up against the same 10000-rune budget). Server returns an opaque [1069302] on overflow, so this check is pre-flight; no escape transform changes the count (server reads raw runes).

Empirical basis

Probed via drive file.comments create_v2 against a fresh docx, bypassing the shortcut so the server's actual responses are visible:

Input Result
10000 ASCII chars (10000 B) ✅ OK
10001 ASCII chars (10001 B) [1069302]
10000 Chinese chars (30000 B) ✅ OK
10001 Chinese chars (30003 B) [1069302]
10000 < chars (would-be 40000 B if escaped) ✅ OK
10000 & chars ✅ OK
5000 a + 5000 b across two elements (total 10000) ✅ OK
5000 a + 5001 b across two elements (total 10001) [1069302]
9999 a + 1 b across two elements (total 10000) ✅ OK
4000 + 4000 + 4000 across three elements (total 12000) [1069302]

Two takeaways: (1) the cap is on total runes, not per-element bytes; (2) splitting doesn't bypass the cap, so the previous "split into multiple elements" advice was actively misleading and is no longer in the code, the tests, or the skill md.

The schema doc currently advertises "1-1000 characters" for text elements. The live API accepts 10× that. The discrepancy is documented in a comment on maxCommentTotalRunes; if the schema is later corrected to 10000 (or any other value), re-probe with drive file.comments create_v2 and update the constant.

Changes

  • shortcuts/drive/drive_add_comment.goparseCommentReplyElements now tracks a running rune total across reply_elements and rejects at the first element pushing the cumulative count past 10000. Counts the raw input via utf8.RuneCountInString; escape (<&lt;, >&gt;) still happens for rendering but no longer participates in the length check. The maxCommentTotalRunes = 10000 constant carries an inline doc comment with the probe matrix.
  • shortcuts/drive/drive_add_comment_test.goTestParseCommentReplyElementsTextLength rewritten with 11 cases: single-element boundaries (ASCII / Chinese / angle brackets at exactly 10000 and 10001), multi-element total-cap (5000+5000 OK, 5000+5001 rejected with the rejection index pointing at element readme 优化建议 #2), early-element overshoot (first element at 10001 reports index Can I join the meeting via CLI? #1, not the trailing element), and mention_user not double-counting. New TestParseCommentReplyElementsHintForbidsSplitAdvice watchdog fails CI if anyone re-introduces the discredited "split into multiple text elements" advice into the hint.
  • skills/lark-drive/references/lark-drive-add-comment.md — replaces the old "300-byte per-element / split as workaround" bullets with two short ones describing the 10000-rune total cap and the mention_user / link exemption.

Test Plan

  • go build ./... clean
  • go vet ./... clean
  • go test ./shortcuts/drive/... -count=1 passes; race-mode pass clean
  • golangci-lint run --new-from-rev=origin/main — 0 issues
  • node scripts/skill-format-check/index.js skills passes
  • Full unit suite (go test $(go list ./... | grep -v cli_e2e) -count=1) — all packages green
  • Live API smoke against a real docx with the new binary: 100 / 5000 / 10000-char single elements accepted, 10001 / 10100 single rejected pre-flight, 5000+5000 split accepted, 5000+5001 split rejected pre-flight (matches the empirical matrix above)

@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths labels Apr 22, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Enforces a server-aligned global rune budget for comment replies: all type:"text" elements’ raw rune counts are summed and must not exceed 10000 runes. If the cumulative total exceeds the limit, validation fails with an error that reports the offending element index, running total, current element rune count, and a hint that splitting into more text elements does NOT avoid the total-cap.

Changes

Cohort / File(s) Summary
Comment Validation Logic
shortcuts/drive/drive_add_comment.go
Rework parseCommentReplyElements to accumulate utf8.RuneCountInString across reply_elements[].text and enforce a combined cap maxCommentTotalRunes = 10000. On overflow, return output.ErrWithHint (text_too_long) including the failing element index, running total, current element rune count, and an explicit hint that splitting into multiple text elements does NOT bypass the cap. Adds a documented constant explaining server behavior and opaque error mapping.
Validation Tests
shortcuts/drive/drive_add_comment_test.go
Add comprehensive unit tests for total-rune budgeting: boundary acceptance/rejection at 10000 vs 10001 runes, multi-element accumulation, correct offending-index reporting (including early-element overshoot), non-text element exclusion (e.g., mention_user), and hint content requirements.
Documentation
skills/lark-drive/references/lark-drive-add-comment.md
Document that reply_elements type=text uses raw rune counting with a combined limit ≤10000 (splitting into multiple text elements does not bypass limit), explain that exceeding triggers error [1069302] with an index pointing to the cumulative overflow, and note that non-text elements are excluded. Also update locate-doc operational notes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • fangshuyu-768

Poem

🐇 I tally runes beneath the moon,
Ten thousand hops, then stop too soon.
If letters spill and totals climb,
I point the hop that ran out of time.
Split your thoughts — but not these lines, dear friend of mine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'feat(drive): pre-flight 10000-rune total cap for +add-comment reply_elements' accurately and concisely describes the main change: adding a client-side pre-flight validation for a 10000-rune total cap on reply_elements text.
Description check ✅ Passed The PR description is comprehensive and follows the template structure with Summary, Changes, Test Plan, and Related Issues sections. All required information is provided with detailed rationale, empirical basis, and implementation details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/add-comment-length-precheck

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.60%. Comparing base (4d68e09) to head (0ef4749).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #605      +/-   ##
==========================================
+ Coverage   64.19%   64.60%   +0.41%     
==========================================
  Files         507      516       +9     
  Lines       44869    45791     +922     
==========================================
+ Hits        28803    29583     +780     
- Misses      13524    13624     +100     
- Partials     2542     2584      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@0ef47490b50c886cf92df00188950d0705e7b77b

🧩 Skill update

npx skills add larksuite/cli#feat/add-comment-length-precheck -y -g

@herbertliu herbertliu force-pushed the feat/add-comment-length-precheck branch from 4740361 to 1216876 Compare April 22, 2026 08:18
Copy link
Copy Markdown
Collaborator

@fangshuyu-768 fangshuyu-768 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two concerns on the byte-limit pre-flight.

1. Byte check happens before HTML escaping (shortcuts/drive/drive_add_comment.go, lines ~579 and ~590). len(input.Text) is measured before escapeCommentText runs, but each < / > becomes 4 bytes (&lt; / &gt;) after escaping. A payload with even one < at exactly 300 bytes raw becomes 303 bytes after escaping, which the server may still reject with the same opaque [1069302]. Suggest measuring the escaped output:
```go
escaped := escapeCommentText(input.Text)
if byteLen := len(escaped); byteLen > maxCommentTextElementBytes {
// ...
}
```
and storing the already-escaped value to avoid double work later.

2. mention_user and link elements bypass the byte limit (lines ~569–586). If the server enforces a per-element limit regardless of type, those should be validated too. Worth confirming whether the limit is text-only, and either adding the same check or documenting the exemption.

Copy link
Copy Markdown
Collaborator

@fangshuyu-768 fangshuyu-768 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline note; expands on the earlier summary review.

Comment thread shortcuts/drive/drive_add_comment.go Outdated
@herbertliu herbertliu force-pushed the feat/add-comment-length-precheck branch from 1216876 to 8aa340e Compare April 28, 2026 08:46
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
shortcuts/drive/drive_add_comment_test.go (1)

352-360: Optional: add a symmetric escaped-byte overflow case for >

You already cover < escaped-length overflow; adding > would harden both escape branches equally.

Optional diff
 		{
 			// Regression: byte check must measure the post-escape length so
 			// '<' / '>' (which expand to '&lt;' / '&gt;', 4 bytes each)
 			// don't slip through the limit into the opaque server-side
 			// [1069302]. 99 ASCII chars + 75 '<' = raw 174 bytes, escaped
 			// 99 + 75*4 = 399 bytes (over 300).
 			name:    "raw under but escaped over byte limit is rejected",
 			input:   `[{"type":"text","text":"` + strings.Repeat("a", 99) + strings.Repeat("<", 75) + `"}]`,
 			wantErr: "(399 bytes)",
 		},
+		{
+			name:    "greater-than chars also overflow by escaped byte length",
+			input:   `[{"type":"text","text":"` + strings.Repeat("a", 99) + strings.Repeat(">", 75) + `"}]`,
+			wantErr: "(399 bytes)",
+		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/drive/drive_add_comment_test.go` around lines 352 - 360, Add a
symmetric test case to drive_add_comment_test.go that mirrors the existing "raw
under but escaped over byte limit is rejected" case but uses '>' characters
instead of '<' to exercise the other escape branch; duplicate the entry with
name like "raw under but escaped over byte limit is rejected (gt)" and set input
to strings.Repeat("a", 99) + strings.Repeat(">", 75) and wantErr "(399 bytes)"
so the test validates that '>'-to-'&gt;' escaping also triggers the byte-limit
rejection in the same way the existing case using '<' does.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@shortcuts/drive/drive_add_comment_test.go`:
- Around line 352-360: Add a symmetric test case to drive_add_comment_test.go
that mirrors the existing "raw under but escaped over byte limit is rejected"
case but uses '>' characters instead of '<' to exercise the other escape branch;
duplicate the entry with name like "raw under but escaped over byte limit is
rejected (gt)" and set input to strings.Repeat("a", 99) + strings.Repeat(">",
75) and wantErr "(399 bytes)" so the test validates that '>'-to-'&gt;' escaping
also triggers the byte-limit rejection in the same way the existing case using
'<' does.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d83495a6-acb4-44f1-b634-e1ef20a067fd

📥 Commits

Reviewing files that changed from the base of the PR and between 1216876 and 8aa340e.

📒 Files selected for processing (2)
  • shortcuts/drive/drive_add_comment.go
  • shortcuts/drive/drive_add_comment_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • shortcuts/drive/drive_add_comment.go

herbertliu and others added 2 commits April 30, 2026 18:05
The open-platform comment API returns an opaque [1069302] Invalid or
missing parameters whenever a single reply_elements[i] text exceeds
its implicit byte budget. The error does not name which element failed
or that length is the cause, so callers resort to binary-search
debugging.

Empirically: Chinese text up to ~80 chars (~240 bytes) lands; ~130
chars (~390 bytes) fails. Set the pre-flight limit to 300 bytes which
sits safely inside the known-good zone.

- parseCommentReplyElements now rejects any text element whose UTF-8
  byte length exceeds 300, with an ExitError naming the element index
  (#N, 1-based) and both the rune and byte counts, plus an ErrWithHint
  recommending the correct remediation (split into multiple text
  elements — the comment UI renders them as one contiguous comment).
- The previous 1000-rune check is removed: it was too lenient (a
  Chinese text under that cap would still fail server-side).
- skills/lark-drive/references/lark-drive-add-comment.md documents
  the per-element limit and the correct split pattern so agents
  avoid constructing oversized single elements upstream.

Addresses Case 12 in the 踩坑列表 doc.
`escapeCommentText` only expands `<` and `>` (each → 4 bytes via
`&lt;` / `&gt;`); `&` is intentionally left as-is. Both the over-limit
hint and the inline comment in `parseCommentReplyElements` previously
claimed `&` was also escaped, with a "4-5 bytes each" range that
implicitly assumed `&amp;` (5 bytes) — a string of 300 `&` chars
would actually fit in the budget, but a user reading the hint would
think otherwise and pre-emptively split it.

Code:
- Hint string ends with `Note: '<' and '>' are HTML-escaped and
  counted in their escaped form (4 bytes each).` (was: included `&`
  and "4-5 bytes")
- Inline comment above the budget check now matches:
  `escapeCommentText only expands '<' and '>' (each becomes 4 bytes:
  &lt; / &gt;); '&' is intentionally left as-is.`

Tests (regression):
- New `300 ampersands accepted (escapeCommentText leaves '&' as-is)`
  subtest pins that 300 `&` chars stay within budget. Without the fix
  this also passed (function was always correct), but the hint was
  lying — the test pins the budget contract loud and clear.
- New `TestParseCommentReplyElementsHintMatchesEscape` asserts the
  hint string itself: must mention `'<' and '>'` / `4 bytes`, must NOT
  mention `'&'` / `&amp;` / `4-5 bytes`. Catches a future drift if
  `escapeCommentText` is changed without updating the hint, or
  vice-versa.

The skill md (`skills/lark-drive/references/lark-drive-add-comment.md`)
already had the right wording (`每个 < 或 > 占 4 字节`), so it was the
in-Go strings that drifted; this commit aligns code with doc.
@fangshuyu-768 fangshuyu-768 force-pushed the feat/add-comment-length-precheck branch from 8aa340e to 8069a39 Compare April 30, 2026 10:09
…vior

The original PR set a 300-byte per-element pre-flight check, justified
by the empirical pattern "~80 Chinese chars succeeds, ~130 fails". A
fresh round of probing the live `/open-apis/drive/v1/files/{token}/
new_comments` endpoint with a real docx shows that pattern does not
reproduce, and the actual contract is very different:

  - 10000 ASCII / 10000 Chinese / 10000 '<' (escaped to 40000 bytes)
    in a single text element: all OK
  - 10001 of any of the above in a single text element: [1069302]
  - 5000 + 5000 across two text elements (total 10000): OK
  - 5000 + 5001 across two text elements (total 10001): [1069302]
  - 4000 + 4000 + 4000 across three (total 12000): [1069302]

Two consequences:

1. The cap is *10000 runes total across all reply_elements text*, not
   300 bytes per element. The old check rejected legitimate input
   anywhere from ~100 to 10000 Chinese chars (≈100x too aggressive).

2. The hint that recommended "split the content across multiple
   {\"type\":\"text\",\"text\":\"...\"} elements" was actively wrong —
   splitting doesn't bypass a total cap. A user told to split a
   10001-char message into 5000+5001 hits the same opaque [1069302].

This commit:

- Replaces `maxCommentTextElementBytes = 300` with
  `maxCommentTotalRunes = 10000`. The constant's doc comment records
  the probe matrix above so future maintainers know how it was
  derived.
- Switches the measurement from `len(escapeCommentText(input.Text))`
  to `utf8.RuneCountInString(input.Text)`. Server counts raw runes;
  byte width and post-escape form are irrelevant. The escape itself
  still happens — `<` and `>` still get rendered literally — but it
  no longer participates in the length check.
- Tracks a running `totalRunes` across the whole reply_elements array
  and bails at the first element that pushes the cumulative total
  over the 10000-rune budget, with index reporting that points at the
  offending element.
- Rewrites the over-cap hint to (a) name the actual 10000-rune budget,
  (b) explicitly say splitting does NOT help, (c) drop the wrong
  "comment UI still renders them as one contiguous comment" framing
  that implied splitting was a workaround.
- Adds a `TestParseCommentReplyElementsHintForbidsSplitAdvice`
  watchdog that fails if any future drift puts the discredited split
  advice back into the hint.

Tests: 11 cases on TestParseCommentReplyElementsTextLength covering
single-element boundary (ASCII / Chinese / angle brackets at exactly
10000 and at 10001), multi-element total cap (5000+5000 OK, 5000+5001
rejected with index pointing at element #2), early-element-overshoot
indexing (first element at 10001 reports index #1, not the trailing
element), and mention_user not double-counting toward the cap.

Skill md updated: removes the 300-byte / "split into multiple
elements" advice; documents the 10000-rune total cap with a note that
the schema currently advertises 1-1000 chars and is out of date,
plus a procedure for re-probing if the server-side limit ever moves.

Manual API verification: rebuilt binary and posted comments at
boundary lengths — all OK cases (100 / 5000 / 10000 chars, 5000+5000
split) accepted by server; over-cap cases (10001 / 10100 single, and
5000+5001 split) rejected by the new pre-flight before reaching the
network.
@fangshuyu-768 fangshuyu-768 force-pushed the feat/add-comment-length-precheck branch from 87fed6f to 0ef4749 Compare April 30, 2026 10:44
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/drive/drive_add_comment.go`:
- Around line 653-658: The error and hint returned from the validation branch
using output.ErrWithHint("text_too_long", ...) incorrectly state the cap applies
across "ALL reply_elements" even though the validator only sums elements with
type:"text"; update both the main message (the fmt.Sprintf that mentions "across
ALL reply_elements") and the hint (the fmt.Sprintf that tells the user about the
cap and splitting) to explicitly state the limit applies to text elements (e.g.,
"type=text" or "text elements") and reference the same maxCommentTotalRunes
variable and existing context variables (totalRunes, index, runes) so the
message remains accurate and consistent with the validator behavior in
drive_add_comment.go.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f0816e71-9e7e-49b2-95f5-92530af28db3

📥 Commits

Reviewing files that changed from the base of the PR and between 8069a39 and 87fed6f.

📒 Files selected for processing (3)
  • shortcuts/drive/drive_add_comment.go
  • shortcuts/drive/drive_add_comment_test.go
  • skills/lark-drive/references/lark-drive-add-comment.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • skills/lark-drive/references/lark-drive-add-comment.md

Comment thread shortcuts/drive/drive_add_comment.go
@fangshuyu-768 fangshuyu-768 changed the title feat(drive): pre-flight per-text-element byte limit for +add-comment feat(drive): pre-flight 10000-rune total cap for +add-comment reply_elements Apr 30, 2026
@fangshuyu-768 fangshuyu-768 merged commit 020aeb8 into main Apr 30, 2026
22 checks passed
@fangshuyu-768 fangshuyu-768 deleted the feat/add-comment-length-precheck branch April 30, 2026 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants