fix(doc): align word statistics compound tokens #1706
Walkthrough

Modifies the word/character counting script to recognize ASCII compound tokens (URLs, alphanumeric-separator patterns) and Han/Han slash separators as single counted units rather than counting them character by character, lowers the symbol-run word threshold from 2 to 1, and updates the related documentation describing the counting rules.

Changes
Compound Token Counting Logic
Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
participant Caller
participant CounterWrite as Counter.write
participant TokenMatcher as _match_ascii_compound_token
participant SeparatorHandler as _write_visible_ascii_separator
participant CharWriter as _write_char
Caller->>CounterWrite: write(text)
loop for each index in text
CounterWrite->>TokenMatcher: try match ASCII compound token
alt token matched
TokenMatcher->>CounterWrite: consume token, classify chars
else Han slash separator
CounterWrite->>SeparatorHandler: detect Han/Han "/" separator
SeparatorHandler->>CounterWrite: end unit, count separator as word
else
CounterWrite->>CharWriter: _write_char(char)
end
end
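
The dispatch order in the diagram above can be sketched in Python as follows. This is only an illustrative sketch, not the code from `doc_word_stat.py`: the method names `write`, `_match_ascii_compound_token`, and `_write_char` are taken from the diagram, but the regular expression, the Han range check, the `finish` method, and the counting rules for plain symbols are assumptions made for the example. The Han/Han slash branch is inlined rather than delegated to `_write_visible_ascii_separator`, and the symbol-run threshold is not modeled.

```python
import re

# Assumed pattern (hypothetical): a URL, or alphanumeric runs joined by
# single ASCII separators such as "-", "_", ".", "/", ":" or "@".
_COMPOUND = re.compile(
    r"https?://[^\s\u4e00-\u9fff]+"
    r"|[A-Za-z0-9]+(?:[-_./:@][A-Za-z0-9]+)+"
)


def _is_han(ch):
    # Simplification: only the basic CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


class Counter:
    def __init__(self):
        self.words = 0
        self.chars = 0
        self._in_word = False  # currently inside an ASCII alphanumeric run

    def _end_unit(self):
        # Close a pending ASCII word, if any.
        if self._in_word:
            self.words += 1
            self._in_word = False

    def _match_ascii_compound_token(self, text, i):
        # Only try a match at the start of a unit, not in the middle of a word.
        if self._in_word:
            return None
        m = _COMPOUND.match(text, i)
        return m.group(0) if m else None

    def _write_char(self, ch):
        if ch.isspace():
            self._end_unit()
            return
        self.chars += 1
        if ch.isascii() and ch.isalnum():
            self._in_word = True
        elif _is_han(ch):
            self._end_unit()
            self.words += 1  # each Han character is its own word
        else:
            self._end_unit()  # plain symbol: a character, not a word (assumed)

    def write(self, text):
        i = 0
        while i < len(text):
            # 1. Try an ASCII compound token first.
            token = self._match_ascii_compound_token(text, i)
            if token:
                self.words += 1
                self.chars += len(token)
                i += len(token)
                continue
            # 2. A "/" between two Han characters counts as one word.
            ch = text[i]
            if (ch == "/" and 0 < i < len(text) - 1
                    and _is_han(text[i - 1]) and _is_han(text[i + 1])):
                self._end_unit()
                self.words += 1
                self.chars += 1
                i += 1
                continue
            # 3. Fall back to per-character handling.
            self._write_char(ch)
            i += 1
        return self

    def finish(self):
        self._end_unit()
        return self.words, self.chars


print(Counter().write("see https://example.com/a-b now").finish())  # (3, 29)
print(Counter().write("中/文").finish())  # (3, 3)
```

Under these assumed rules the URL contributes one word and 23 characters, and the slash between two Han characters is counted as its own word, which mirrors the two special branches in the diagram.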
Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1706   +/-   ##
=======================================
  Coverage   74.51%   74.51%
=======================================
  Files         850      850
  Lines       87070    87070
=======================================
  Hits        64879    64879
  Misses      17223    17223
  Partials     4968     4968
PR Preview Install Guide

CLI update
npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@bfb2e7feb30d9776a49ec685becc52311f5c9ece

Skill update
npx skills add larksuite/cli#feat/doc-word-stat-compound-token-counting -y -g
Summary
Verification
python3 -m py_compile skills/lark-doc/scripts/doc_word_stat.py
2172 / 4219, matching GUI 2172 / 4219
git diff --check