tokcount: replace bytes/3.4 stub with tiktoken-rs BPE (ILO-413) #716
Merged
needs manual rebase (conflicts in: src/builtins.rs)
13de76d to b347fb8
2a66ddd to 7a3f442
needs manual rebase (touches src/builtins.rs)
Adds a `tokcount s > n` builtin (bytes/3.4 approximation of cl100k_base token count) so skill-file budget checks can be written in ilo itself, removing the last Python file from the build chain.

- New Builtin::Tokcount in builtins.rs, verify.rs, and vm.rs (tree-bridge eligible, appended to ALL to preserve on-wire tags)
- tokcount_impl in interpreter/mod.rs: ceil(bytes / 3.4)
- scripts/check-skill-tokens.ilo: ilo port of the deleted Python script, matching output format; caps set for the bytes/3.4 approximation
- scripts/check-skill-tokens.py: deleted
- .github/workflows/rust.yml: replace tiktoken/python step with `cargo run -- run scripts/check-skill-tokens.ilo`
- examples/tokcount-basic.ilo: cross-engine regression test
- SPEC.md / ai.txt / skills/ilo/ilo-builtins-text.md: doc touch-points

Deferred (ILO-47 follow-up): replace bytes/3.4 stub with tiktoken-rs BPE once crate WASM and licence questions are resolved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
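
For readers unfamiliar with the stub, the `ceil(bytes / 3.4)` rule can be sketched in a few lines of Rust. This is an illustration only: the function name `approx_tokcount` is hypothetical, and the integer arithmetic is an assumption chosen to avoid floating-point rounding; the real `tokcount_impl` in interpreter/mod.rs may be written differently.

```rust
/// Hypothetical stand-in for the bytes/3.4 stub (not the project's actual code).
/// Estimates a token count as ceil(byte_length / 3.4).
fn approx_tokcount(text: &str) -> usize {
    // `str::len` returns the UTF-8 byte length, not the character count.
    let bytes = text.len();
    // ceil(bytes / 3.4) == ceil(bytes * 10 / 34); computed with integers
    // so that no floating-point rounding is involved.
    (bytes * 10 + 33) / 34
}
```

With this sketch, `approx_tokcount("hello world")` is 4 (11 bytes) and `approx_tokcount("")` is 0. Because the estimate depends only on byte length, it cannot track how a BPE vocabulary actually splits a given string.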
7a3f442 to 7f03a85
mini pc is reviewing this
Summary
- Add `tiktoken-rs 0.11` as a native-only dependency, gated via `[target.'cfg(not(target_arch = "wasm32"))'.dependencies]` so the WASM build is unaffected
- Replace the `bytes/3.4` approximation in `tokcount_impl` with `cl100k_base()?.encode_with_special_tokens(text).len()` for exact BPE counts on native targets
- Update `examples/tokcount-basic.ilo` expected outputs (`"hello world"` changes from `4` → `2`, matching real BPE)
- Recalibrate the caps in `scripts/check-skill-tokens.ilo` using the real tokeniser (e.g. `ilo-language` `1475` → `1720`, measured at 1654)

WASM caveat
`tiktoken-rs` → `fancy-regex` → `regex` crate is not wasm32-compatible. The `#[cfg(target_arch = "wasm32")]` fallback keeps the bytes/3.4 approximation for the npm/WASM package. This is documented in builtins.rs and verify.rs.

Test plan
- `cargo test`: 3342 + 397 + all other suites pass, 0 failures
- `ilo examples/tokcount-basic.ilo hello` → `2`
- `ilo examples/tokcount-basic.ilo empty` → `0`
- `ilo examples/tokcount-basic.ilo single` → `1`
- `ilo run scripts/check-skill-tokens.ilo` → all modules within cap, total 11742

Closes ILO-413. Follow-up to ILO-47 (#705).
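
To make the cap recalibration concrete, here is a rough Rust sketch of the kind of per-module budget check that `scripts/check-skill-tokens.ilo` is described as performing. The actual script is written in ilo and is not shown here; the `ModuleBudget` struct and `check_budgets` function are hypothetical names, and the sketch assumes the figures quoted for `ilo-language` read as old cap 1475, new cap 1720, measured count 1654.

```rust
/// Hypothetical record: one skill module's measured token count and its cap.
struct ModuleBudget<'a> {
    name: &'a str,
    tokens: usize,
    cap: usize,
}

/// Returns the total token count if every module is within its cap,
/// otherwise the names of the modules that exceed it.
fn check_budgets(modules: &[ModuleBudget<'_>]) -> Result<usize, Vec<String>> {
    let over: Vec<String> = modules
        .iter()
        .filter(|m| m.tokens > m.cap)
        .map(|m| m.name.to_string())
        .collect();
    if over.is_empty() {
        Ok(modules.iter().map(|m| m.tokens).sum())
    } else {
        Err(over)
    }
}
```

Under those assumptions, a module measured at 1654 tokens passes against a cap of 1720 but would be reported as over budget against the earlier cap of 1475, which is consistent with the caps needing recalibration once the exact tokeniser replaced the approximation.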
🤖 Generated with Claude Code