fix(doc): align word statistics compound tokens by fangshuyu-768 · Pull Request #1706 · larksuite/cli

fangshuyu-768 · 2026-07-01T11:21:41Z

Summary

Treat URLs and path-like/code ASCII compounds as one semantic word while preserving per-character counting.
Count single standalone ASCII symbol runs and visible Han-adjacent slash separators consistently with Lark GUI behavior.
Update the lark-doc word statistics reference to describe the script-aligned counting behavior.
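The first rule above (one semantic word per URL, with characters still counted individually) can be illustrated with a minimal sketch. This is not the code from `doc_word_stat.py`: the patterns `URL_RE` and `WORD_RE` and the function `count` are hypothetical names chosen for illustration, and the sketch handles ASCII text only.

```python
import re

# Hypothetical patterns for illustration; NOT the script's actual URL_TOKEN_RE.
URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def count(text: str) -> tuple[int, int]:
    """Return (word_count, char_count) for ASCII text under simplified rules."""
    words = 0
    chars = 0
    i = 0
    while i < len(text):
        # Try the URL pattern first so a URL is not split into its parts.
        m = URL_RE.match(text, i) or WORD_RE.match(text, i)
        if m:
            words += 1               # the whole token is one word
            chars += len(m.group())  # every character in it still counts
            i = m.end()
        else:
            if not text[i].isspace():
                chars += 1           # stray symbol: a character, not a word here
            i += 1
    return words, chars


print(count("see https://example.com/a/b now"))
```

Without the URL pattern, the same input would yield seven words (`see`, `https`, `example`, `com`, `a`, `b`, `now`) under this simplified tokenizer; with it, the URL contributes one word and 23 characters.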

Verification

python3 -m py_compile skills/lark-doc/scripts/doc_word_stat.py
Target cached XML JSON fixture: script output 2172 / 4219, matching GUI 2172 / 4219
git diff --check

Summary by CodeRabbit

Bug Fixes
- Improved word and character counting for document statistics, including URL-like text, mixed ASCII tokens, and slash-separated Chinese text.
- Refined symbol counting so single visible symbols are handled more consistently.
Documentation
- Clarified the rules for reading word and character statistics, with more precise counting scope and exceptions.

coderabbitai · 2026-07-01T11:21:57Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5f99b502-8868-4436-ba54-a6364b185cd5

📥 Commits

Reviewing files that changed from the base of the PR and between a6797ac and bfb2e7f.

📒 Files selected for processing (2)

skills/lark-doc/references/lark-doc-word-stat.md
skills/lark-doc/scripts/doc_word_stat.py

📝 Walkthrough

Walkthrough

Modifies the word/character counting script so that ASCII compound tokens (URLs and alphanumeric-separator patterns) and Han/Han slash separators are recognized as single counted units rather than processed character by character. The symbol-run word threshold is lowered from 2 to 1, and the related documentation is updated to describe the new counting rules.

Changes

Compound Token Counting Logic

| Layer / File(s) | Summary |
| --- | --- |
| Token detection constants `skills/lark-doc/scripts/doc_word_stat.py` | New compiled regex constants `URL_TOKEN_RE` and `ASCII_COMPOUND_TOKEN_RE` added to detect URL-like and compound alphanumeric-separator tokens. |
| Index-based write loop and token handling `skills/lark-doc/scripts/doc_word_stat.py` | `Counter.write()` reworked to an index-based loop attempting compound token and Han slash separator matches before falling back to `_write_char`; new helper methods `_write_visible_ascii_separator`, `_write_ascii_compound_token`, `_match_ascii_compound_token` classify matched characters and update word/char stats; `_end_symbol_run` threshold for counting a symbol run as a word lowered from length >= 2 to >= 1. |
| Counting rule documentation `skills/lark-doc/references/lark-doc-word-stat.md` | Documentation for `word_count` and `char_count` updated to describe refined counting rules, including exceptions for English punctuation attached to words and visible symbol handling. |
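The effect of lowering the symbol-run threshold from 2 to 1 can be sketched as follows. This is an illustration under stated assumptions, not the body of `_end_symbol_run`: the function `count_symbol_run_words` and its `min_run` parameter are hypothetical, and "symbol" is simplified here to any ASCII character that is neither alphanumeric nor whitespace.

```python
def count_symbol_run_words(text: str, min_run: int = 1) -> int:
    """Count runs of consecutive ASCII symbols whose length is >= min_run."""
    words = 0
    run = 0
    for ch in text + " ":  # the trailing space flushes a run at end of text
        is_symbol = ch.isascii() and not ch.isalnum() and not ch.isspace()
        if is_symbol:
            run += 1
        else:
            if run and run >= min_run:
                words += 1
            run = 0
    return words


sample = "a + b -> c"
print(count_symbol_run_words(sample, min_run=2))  # old threshold
print(count_symbol_run_words(sample, min_run=1))  # new threshold
```

With `min_run=2` only the `->` run counts as a word; with `min_run=1` the standalone `+` counts as well.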

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant CounterWrite as Counter.write
  participant TokenMatcher as _match_ascii_compound_token
  participant SeparatorHandler as _write_visible_ascii_separator
  participant CharWriter as _write_char

  Caller->>CounterWrite: write(text)
  loop for each index in text
    CounterWrite->>TokenMatcher: try match ASCII compound token
    alt token matched
      TokenMatcher->>CounterWrite: consume token, classify chars
    else Han slash separator
      CounterWrite->>SeparatorHandler: detect Han/Han "/" separator
      SeparatorHandler->>CounterWrite: end unit, count separator as word
    else
      CounterWrite->>CharWriter: _write_char(char)
    end
  end
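
The dispatch order in the diagram above (compound token first, then Han/Han slash, then the per-character fallback) can be sketched as an index-based loop. The names `COMPOUND_RE`, `is_han`, and `classify` are hypothetical stand-ins and the regex is an assumption; the real `ASCII_COMPOUND_TOKEN_RE` and `Counter.write()` may differ. The sketch only labels each step and does not update word or character counts.

```python
import re

# Hypothetical stand-in for ASCII_COMPOUND_TOKEN_RE; the real pattern may differ.
COMPOUND_RE = re.compile(r"[A-Za-z0-9]+(?:[./_-][A-Za-z0-9]+)+")


def is_han(ch: str) -> bool:
    """Rough Han check covering only the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def classify(text: str) -> list[tuple[str, str]]:
    """Label each consumed span as 'compound', 'han_slash', or 'char'."""
    events = []
    i = 0
    while i < len(text):
        m = COMPOUND_RE.match(text, i)
        if m:
            # A compound token consumes several characters in one step.
            events.append(("compound", m.group()))
            i = m.end()
        elif (text[i] == "/" and 0 < i < len(text) - 1
              and is_han(text[i - 1]) and is_han(text[i + 1])):
            # A slash with Han characters on both sides is its own unit.
            events.append(("han_slash", "/"))
            i += 1
        else:
            # Fallback: handle one character at a time.
            events.append(("char", text[i]))
            i += 1
    return events


print(classify("中/文 a/b"))
```

An index-based loop is what makes the first branch possible: a plain `for ch in text` loop cannot skip ahead by the length of a matched token.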

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR has Summary and verification, but it omits the required Changes section and Related Issues, and the Test Plan is not in the requested format. | Add a Changes section, rewrite Test Plan using the template's checkbox format, and include a Related Issues entry. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: updating doc word statistics for compound tokens. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


codecov · 2026-07-01T11:26:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.51%. Comparing base (a6797ac) to head (bfb2e7f).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1706   +/-   ##
=======================================
  Coverage   74.51%   74.51%           
=======================================
  Files         850      850           
  Lines       87070    87070           
=======================================
  Hits        64879    64879           
  Misses      17223    17223           
  Partials     4968     4968


github-actions · 2026-07-01T11:27:37Z

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@bfb2e7feb30d9776a49ec685becc52311f5c9ece

🧩 Skill update

npx skills add larksuite/cli#feat/doc-word-stat-compound-token-counting -y -g

fix(doc): align word statistics compound tokens

bfb2e7f

github-actions Bot added the labels domain/ccm (PR touches the ccm domain) and size/M (Single-domain feat or fix with limited business impact) on Jul 1, 2026

SunPeiYang996 approved these changes Jul 2, 2026


fangshuyu-768 merged commit 3788405 into main Jul 2, 2026
38 checks passed

fangshuyu-768 deleted the feat/doc-word-stat-compound-token-counting branch July 2, 2026 03:43

liangshuo-1 mentioned this pull request Jul 2, 2026

chore: release v1.0.64 #1725

Merged
