
tokcount: replace bytes/3.4 stub with tiktoken-rs BPE (ILO-413) #716

Merged
danieljohnmorris merged 1 commit into main from feature/tokcount-real-bpe
May 22, 2026


@danieljohnmorris
Collaborator

Summary

  • Adds tiktoken-rs 0.11 as a native-only dependency, gated via [target.'cfg(not(target_arch = "wasm32"))'.dependencies] so the WASM build is unaffected
  • Replaces the bytes/3.4 approximation in tokcount_impl with cl100k_base()?.encode_with_special_tokens(text).len() for exact BPE counts on native targets
  • wasm32 retains the bytes/3.4 fallback — tiktoken-rs uses fancy-regex which is not wasm32-compatible
  • Updates examples/tokcount-basic.ilo expected outputs ("hello world" changes from 4 to 2, matching real BPE)
  • Re-measures and tightens all per-module token caps in scripts/check-skill-tokens.ilo using the real tokeniser (e.g. ilo-language 1475→1720, measured at 1654)

WASM caveat

tiktoken-rs → fancy-regex → regex crate is not wasm32-compatible. The #[cfg(target_arch = "wasm32")] fallback keeps the bytes/3.4 approximation for the npm/WASM package. This is documented in builtins.rs and verify.rs.
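For reference, the bytes/3.4 fallback retained on wasm32 can be sketched roughly as below. This is a minimal illustration that assumes the estimate is computed as ceil(bytes / 3.4) over the UTF-8 byte length; `approx_tokcount` is a hypothetical name, not the function in this PR.

```rust
/// Approximate a cl100k_base token count as ceil(bytes / 3.4).
/// `approx_tokcount` is a hypothetical name used only for this sketch.
fn approx_tokcount(text: &str) -> usize {
    // `len()` on a &str is the UTF-8 byte length, not the character count.
    let bytes = text.len() as f64;
    (bytes / 3.4).ceil() as usize
}

fn main() {
    // "hello world" is 11 bytes; 11 / 3.4 is about 3.24, which rounds up to 4.
    println!("{}", approx_tokcount("hello world"));
    // Empty input has 0 bytes, so the estimate is 0.
    println!("{}", approx_tokcount(""));
}
```

The estimate of 4 for "hello world" is the old expected value that the real BPE count of 2 replaces on native targets.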

Test plan

  • cargo test — 3342 + 397 + all other suites pass, 0 failures
  • ilo examples/tokcount-basic.ilo hello → 2
  • ilo examples/tokcount-basic.ilo empty → 0
  • ilo examples/tokcount-basic.ilo single → 1
  • ilo run scripts/check-skill-tokens.ilo → all modules within cap, total 11742
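The per-module cap check amounts to comparing each measured count against its cap and summing the measured counts. A rough Rust sketch of that logic follows; the real check is written in ilo, and `ModuleBudget` and `check_budgets` are hypothetical names. Only the ilo-language figures (measured 1654, cap 1720) come from this PR.

```rust
/// One module's measured token count and its budget (hypothetical type).
struct ModuleBudget {
    name: &'static str,
    measured: usize,
    cap: usize,
}

/// Returns the names of modules whose measured count exceeds the cap,
/// together with the total measured tokens across all modules.
fn check_budgets(modules: &[ModuleBudget]) -> (Vec<&'static str>, usize) {
    let over: Vec<&'static str> = modules
        .iter()
        .filter(|m| m.measured > m.cap)
        .map(|m| m.name)
        .collect();
    let total: usize = modules.iter().map(|m| m.measured).sum();
    (over, total)
}

fn main() {
    // Figures for ilo-language are taken from the PR summary.
    let modules = [ModuleBudget { name: "ilo-language", measured: 1654, cap: 1720 }];
    let (over, total) = check_budgets(&modules);
    if over.is_empty() {
        println!("all modules within cap, total {}", total);
    } else {
        println!("over cap: {:?}", over);
    }
}
```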

Closes ILO-413. Follow-up to ILO-47 (#705).

🤖 Generated with Claude Code

@danieljohnmorris danieljohnmorris added the mini Created by mini PC autonomous workflow label May 22, 2026
@danieljohnmorris
Collaborator Author

needs manual rebase (conflicts in: src/builtins.rs)

@danieljohnmorris danieljohnmorris force-pushed the feature/tokcount-real-bpe branch from 13de76d to b347fb8 Compare May 22, 2026 09:10
@codecov

codecov Bot commented May 22, 2026

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

@danieljohnmorris danieljohnmorris force-pushed the feature/tokcount-real-bpe branch from 2a66ddd to 7a3f442 Compare May 22, 2026 10:33
@danieljohnmorris
Collaborator Author

needs manual rebase — touches src/builtins.rs

Adds a `tokcount s > n` builtin (bytes/3.4 approximation of cl100k_base
token count) so skill-file budget checks can be written in ilo itself,
removing the last Python file from the build chain.

- New Builtin::Tokcount in builtins.rs, verify.rs, and vm.rs (tree-bridge
  eligible, appended to ALL to preserve on-wire tags)
- tokcount_impl in interpreter/mod.rs: ceil(bytes / 3.4)
- scripts/check-skill-tokens.ilo: ilo port of the deleted Python script,
  matching output format; caps set for the bytes/3.4 approximation
- scripts/check-skill-tokens.py: deleted
- .github/workflows/rust.yml: replace tiktoken/python step with
  `cargo run -- run scripts/check-skill-tokens.ilo`
- examples/tokcount-basic.ilo: cross-engine regression test
- SPEC.md / ai.txt / skills/ilo/ilo-builtins-text.md: doc touch-points

Deferred (ILO-47 follow-up): replace bytes/3.4 stub with tiktoken-rs BPE
once crate WASM and licence questions are resolved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@danieljohnmorris danieljohnmorris force-pushed the feature/tokcount-real-bpe branch from 7a3f442 to 7f03a85 Compare May 22, 2026 17:19
@danieljohnmorris
Collaborator Author

mini pc is reviewing this

@danieljohnmorris danieljohnmorris merged commit 5309d9e into main May 22, 2026
7 of 10 checks passed
@danieljohnmorris danieljohnmorris deleted the feature/tokcount-real-bpe branch May 22, 2026 17:19