test: add 36 edge case tests for all tokenizer variants #16

Merged
AmitMY merged 1 commit into main from test/edge-cases on Apr 8, 2026

@AmitMY AmitMY commented Apr 8, 2026

Summary

  • 36 parametrized tests across all 4 tokenizer variants
  • Empty text, single char, repeated chars, whitespace, emoji, mixed scripts, newlines
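
The PR's test code is not shown on this page, so the following is only a rough sketch of what a variant-by-edge-case test matrix could look like. Everything in it is an assumption made for illustration: the `ByteTokenizer` stand-in, the `encode`/`decode` method names, the sample strings, and the round-trip property being checked. It uses plain `itertools.product` so that it runs without dependencies; in a pytest suite each (variant, case) pair would more likely be a `pytest.mark.parametrize` case.

```python
from itertools import product


# Hypothetical stand-in: the real BPE / BNE / Boundless BPE / Super BPE
# classes and their method names are not shown in this PR.
class ByteTokenizer:
    """Toy tokenizer that maps text to its UTF-8 byte values."""

    def encode(self, text: str) -> list[int]:
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        return bytes(ids).decode("utf-8")


# All four names point at the same toy class here, purely for illustration.
VARIANTS = {
    name: ByteTokenizer
    for name in ("BPE", "BNE", "Boundless BPE", "Super BPE")
}

# Illustrative inputs for the categories listed in the summary.
EDGE_CASES = {
    "empty": "",
    "single_char": "a",
    "repeated_chars": "aaaaaaaa",
    "whitespace_only": "   \t ",
    "emoji": "👋🌍",
    "mixed_scripts": "hello שלום こんにちは",
    "newlines": "line one\nline two\n",
}


def run_edge_cases() -> list[tuple[str, str]]:
    """Check every (variant, case) pair and return the pairs that passed."""
    passed = []
    for (variant, factory), (case, text) in product(
        VARIANTS.items(), EDGE_CASES.items()
    ):
        tokenizer = factory()
        ids = tokenizer.encode(text)
        # Assumed property: encoding then decoding returns the original text.
        assert tokenizer.decode(ids) == text, (variant, case)
        passed.append((variant, case))
    return passed
```

Crossing the variants with the inputs keeps each edge case defined once while still exercising it against every tokenizer.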

Stacked on #15.

What improved

  • Each listed edge case is now exercised against all 4 tokenizer variants
  • Test count grows by 36

Test plan

  • All 36 new tests pass
  • ruff check . passes

🤖 Generated with Claude Code

@AmitMY AmitMY force-pushed the test/edge-cases branch 14 times, most recently from ad8048b to 7f0df82 on April 8, 2026 17:24
- Test empty text, single char, all same chars, whitespace-only,
  multiple empty texts for all 4 tokenizer variants
- Test emoji, mixed scripts, newlines for all variants
- Parametrized across BPE, BNE, Boundless BPE, Super BPE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@AmitMY AmitMY force-pushed the test/edge-cases branch from 7f0df82 to f3b55cb on April 8, 2026 17:25
@AmitMY AmitMY merged commit b76fb88 into main Apr 8, 2026
2 checks passed