fix(taxonomy): use utf8.DecodeRuneInString in GroupByLetter to handle non-ASCII names #96
…r Function/Class/Type

Slug collision: when a base slug (e.g. "fn-handler-go-run") was already taken and a "-2" suffix was appended, the new slug was never recorded in usedSlugs. A later node that naturally produced "fn-handler-go-run-2" would pass the uniqueness check and get the same slug, so one output file silently overwrote the other. Fixed by marking the resolved slug as used.

line_count: Function, Class, and Type frontmatter computed endLine - startLine + 1 without guarding against startLine == 0 (the sentinel for "not provided"). This produced line_count = endLine + 1 instead of endLine. File nodes already defaulted startLine to 1; the three symbol node types now do the same.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
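The `line_count` guard described above can be sketched as a small Go function. `lineCount` is a hypothetical helper, not the project's code; it assumes only what the commit message states: a `startLine` of 0 means "not provided", and the line range is inclusive.

```go
package main

import "fmt"

// lineCount is a hypothetical helper mirroring the frontmatter calculation.
// A startLine of 0 is the "not provided" sentinel, so it is defaulted to 1
// before computing the length of the inclusive range startLine..endLine.
func lineCount(startLine, endLine int) int {
	if startLine == 0 {
		startLine = 1
	}
	return endLine - startLine + 1
}

func main() {
	// Start line missing: counts lines 1..10.
	fmt.Println(lineCount(0, 10))
	// Start line present: counts lines 5..10.
	fmt.Println(lineCount(5, 10))
}
```

Without the guard, the first call would compute 10 - 0 + 1 = 11, which is the endLine + 1 bug the commit describes.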
… non-ASCII names

rune(entry.Name[0]) reads only the first byte of the entry name and misinterprets it as a Unicode code point. For any name starting with a multi-byte UTF-8 character (é, ñ, ü, etc.), the first byte (e.g. 0xC3) is decoded as a different Latin-1 character (Ã), so those entries land in the wrong letter group in the A–Z navigation. Replace it with utf8.DecodeRuneInString to correctly extract the first rune.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
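A simplified sketch of the fixed grouping, assuming a hypothetical `Entry` type with a `Name` field; the real `GroupByLetter` signature and any case folding it performs are not shown in this PR. Entries are bucketed by their first decoded rune, and empty names are skipped because `utf8.DecodeRuneInString` reports a size of 0 for them.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Entry is a hypothetical stand-in for the taxonomy entry type.
type Entry struct {
	Name string
}

// groupByLetter is a simplified sketch, not the project's GroupByLetter.
// It buckets entries by their first decoded rune instead of their first byte.
func groupByLetter(entries []Entry) map[rune][]Entry {
	groups := make(map[rune][]Entry)
	for _, e := range entries {
		// DecodeRuneInString returns size 0 for an empty string and
		// (utf8.RuneError, 1) for an invalid leading byte.
		r, size := utf8.DecodeRuneInString(e.Name)
		if size == 0 {
			continue // skip empty names
		}
		groups[r] = append(groups[r], e)
	}
	return groups
}

func main() {
	groups := groupByLetter([]Entry{
		{Name: "Étoile"}, {Name: "Ñoño"}, {Name: "Über"}, {Name: "English"}, {Name: ""},
	})
	for _, letter := range []rune{'É', 'Ñ', 'Ü', 'E', 'Ã'} {
		fmt.Printf("%c: %d\n", letter, len(groups[letter]))
	}
}
```

An invalid leading byte decodes to `utf8.RuneError` with size 1, so such names would share one bucket in this sketch; the PR does not say how the real code treats that case.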
Walkthrough
This PR fixes three distinct bugs in documentation generation: slug collisions when multiple nodes resolve to the same slug, incorrect line_count values when a symbol node has no start line, and wrong first-letter detection for non-ASCII names in taxonomy grouping. Test coverage is added for all changes.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Remove the extra spaces used to manually align inline comments. gofmt computes comment alignment itself (tabs for indentation, spaces for alignment), so the hand-padded spacing did not match its output and failed linting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
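A minimal sketch using the standard library's `go/format` package, which applies the standard Go formatting that gofmt uses. The helper name `gofmtSource` is hypothetical; the point is that hand-typed padding before a trailing comment is not preserved, because the formatter recomputes it.

```go
package main

import (
	"fmt"
	"go/format"
)

// gofmtSource is a hypothetical wrapper around go/format.Source.
func gofmtSource(src string) (string, error) {
	out, err := format.Source([]byte(src))
	return string(out), err
}

func main() {
	// A trailing comment pushed out with hand-typed padding.
	src := "package p\n\nvar a = 1        // one\n"
	out, err := gofmtSource(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("changed: %v\n%s", out != src, out)
}
```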
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/archdocs/graph2md/graph2md_test.go`:
- Around lines 124-127: The test currently calls os.ReadDir(outDir) and discards
the error; update the test in graph2md_test.go to capture and check the error
returned by os.ReadDir (e.g., entries, err := os.ReadDir(outDir)) and call
t.Fatalf (or an equivalent assertion) if err != nil, including the err text; keep
the subsequent length check (len(entries) != 1) but only after ensuring err is
nil so failures show the real ReadDir error rather than hiding it.
In `@internal/archdocs/graph2md/graph2md.go`:
- Around line 373-377: The current disambiguation only appends "-{n+1}" once and
can collide if that suffixed slug already exists; update the logic around the
usedSlugs map and slug variable so that when a clash is detected you loop:
compute a candidate = fmt.Sprintf("%s-%d", baseSlug, next), increment next until
candidate is not present in usedSlugs, then register that candidate in
usedSlugs; ensure you preserve/increment the counter for the original base key
so future collisions continue from the correct next value. Reference the
usedSlugs map and the slug/baseSlug variables in the collision branch and
replace the single-assignment change with a for loop that finds a free
slug before assigning usedSlugs[candidate]=1.
In `@internal/archdocs/pssg/taxonomy/taxonomy_test.go`:
- Around line 43-46: Run goimports (or gofmt -w followed by goimports) on the
test file to fix formatting/import ordering so golangci-lint passes;
specifically reformat the block containing the struct literals with Name/Slug
(the entries "Étoile", "Ñoño", "Über", "English") so spacing/indentation and
import grouping conform to goimports, then stage the updated file and push the
change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7bc30a93-b826-4596-b798-bf5c5ac85690
📒 Files selected for processing (4)
internal/archdocs/graph2md/graph2md.go
internal/archdocs/graph2md/graph2md_test.go
internal/archdocs/pssg/taxonomy/taxonomy.go
internal/archdocs/pssg/taxonomy/taxonomy_test.go
entries, _ := os.ReadDir(outDir)
if len(entries) != 1 {
	t.Fatalf("expected 1 output file, got %d", len(entries))
}
Don’t ignore ReadDir errors in the test.
At line 124, swallowing the error can hide the real failure and make the test hard to debug.
Proposed fix
-	entries, _ := os.ReadDir(outDir)
+	entries, err := os.ReadDir(outDir)
+	if err != nil {
+		t.Fatalf("ReadDir: %v", err)
+	}
 	if len(entries) != 1 {
 		t.Fatalf("expected 1 output file, got %d", len(entries))
 	}
if n, ok := usedSlugs[slug]; ok {
	usedSlugs[slug] = n + 1
	slug = fmt.Sprintf("%s-%d", slug, n+1)
	usedSlugs[slug] = 1 // register the disambiguated slug so a later node can't claim it
} else {
Slug disambiguation can still collide with pre-existing suffixed slugs.
Nice improvement, but there's still a correctness gap at lines 373–377: if fn-x-3 already exists naturally and a later collision on fn-x generates fn-x-3, you'll still get duplicate output paths.
Proposed fix (keep incrementing until an unused slug is found)
-	// Handle slug collisions
-	if n, ok := usedSlugs[slug]; ok {
-		usedSlugs[slug] = n + 1
-		slug = fmt.Sprintf("%s-%d", slug, n+1)
-		usedSlugs[slug] = 1 // register the disambiguated slug so a later node can't claim it
-	} else {
-		usedSlugs[slug] = 1
-	}
+	// Handle slug collisions
+	baseSlug := slug
+	if _, exists := usedSlugs[baseSlug]; !exists {
+		usedSlugs[baseSlug] = 1
+	} else {
+		next := usedSlugs[baseSlug] + 1
+		for {
+			candidate := fmt.Sprintf("%s-%d", baseSlug, next)
+			if _, taken := usedSlugs[candidate]; !taken {
+				slug = candidate
+				usedSlugs[baseSlug] = next
+				usedSlugs[slug] = 1
+				break
+			}
+			next++
+		}
+	}
Summary
`GroupByLetter` was calling `rune(entry.Name[0])` to get the first character for alphabetical grouping. `entry.Name[0]` is a `byte`, not a rune: for any name that starts with a multi-byte UTF-8 character (é, ñ, ü, ā, etc.), the first byte is misread as a different Latin-1 character. For example, "Étoile", "Ñoño", and "Über" all begin with the byte 0xC3, which `rune()` turns into U+00C3 'Ã'. All three entries would be silently grouped under 'Ã' instead of their correct letters, corrupting the A–Z navigation pages.
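A small sketch contrasting the two conversions on the three test names. `buggyFirst` and `fixedFirst` are illustrative helpers, not functions from the codebase.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// buggyFirst mirrors the old rune(entry.Name[0]) conversion: it turns only the
// first byte into a rune. It panics on an empty string.
func buggyFirst(s string) rune { return rune(s[0]) }

// fixedFirst decodes the whole first code point, however many bytes it spans.
func fixedFirst(s string) rune {
	r, _ := utf8.DecodeRuneInString(s)
	return r
}

func main() {
	for _, name := range []string{"Étoile", "Ñoño", "Über"} {
		fmt.Printf("%s: first byte 0x%X, buggy %q, fixed %q\n",
			name, name[0], buggyFirst(name), fixedFirst(name))
	}
}
```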
Fix: replace `rune(entry.Name[0])` with `utf8.DecodeRuneInString(entry.Name)`, which correctly decodes the first Unicode code point regardless of its byte length.

Test plan
- `TestGroupByLetterNonASCII`: entries starting with É, Ñ, and Ü are grouped under their correct Unicode letters, not under 'Ã'.
- `TestGroupByLetterASCII`: ASCII entries still group correctly.
- `TestGroupByLetterEmpty`: empty names are skipped.
- `TestTopEntries`: `TopEntries` returns the correct order without mutating the original.
- `go test ./...`: all existing tests pass.

🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests