[Feat] Allow tokenizer_name override #321
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request introduces the ability to override the default tokenizer by adding a tokenizer_name field to the ModelParams configuration. The benchmark execution logic now checks for this override, and corresponding updates have been made to the configuration templates and unit tests. Feedback suggests adding a CLI alias for the new field to maintain consistency with other parameters and improve command-line usability.
Pull request overview
Adds support for configuring a tokenizer identifier independently of the configured model name, enabling token-metrics/ISL token counting to use a HuggingFace tokenizer repo/path even when the endpoint’s model naming differs (e.g., OpenRouter/SambaNova naming conventions).
Changes:
- Extend the ModelParams schema with an optional tokenizer_name override.
- Update benchmark setup to prefer tokenizer_name when probing for tokenizer availability.
- Add unit coverage for the new schema field and update full YAML templates to include it.
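The override preference described in the changes above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ModelParamsSketch`, its `name` field, and `tokenizer_source` are hypothetical stand-ins, and the real ModelParams schema in schema.py may be defined differently. Only the optional tokenizer_name field comes from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelParamsSketch:
    """Hypothetical stand-in for ModelParams (not the real schema)."""

    name: str  # model name sent to the endpoint (field name is an assumption)
    tokenizer_name: Optional[str] = None  # optional override added by this PR


def tokenizer_source(params: ModelParamsSketch) -> str:
    """Return the identifier to probe: the override if set, else the model name."""
    if params.tokenizer_name is not None:
        return params.tokenizer_name
    return params.name


default = ModelParamsSketch(name="Qwen/Qwen3.6-35B-A3B")
overridden = ModelParamsSketch(
    name="qwen/qwen3.6-35b", tokenizer_name="Qwen/Qwen3.6-35B-A3B"
)
print(tokenizer_source(default))
print(tokenizer_source(overridden))
```

Leaving the default as None keeps existing configs unchanged: when no override is given, the model name is still what gets probed.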
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unit/config/test_schema.py | Verifies tokenizer_name default and explicit override behavior in ModelParams. |
| src/inference_endpoint/config/templates/online_template_full.yaml | Documents the new model_params.tokenizer_name config option in the online full template. |
| src/inference_endpoint/config/templates/offline_template_full.yaml | Documents the new model_params.tokenizer_name config option in the offline full template. |
| src/inference_endpoint/config/templates/concurrency_template_full.yaml | Documents the new model_params.tokenizer_name config option in the concurrency full template. |
| src/inference_endpoint/config/schema.py | Adds tokenizer_name to ModelParams. |
| src/inference_endpoint/commands/benchmark/execute.py | Uses the override (when set) as the source for tokenizer existence probing and subsequent token-metrics enablement. |
Review Council — Multi-AI Code Review
Reviewed by: Claude | Depth: quick | Codex: unavailable (branch not on remote)
No new issues found. The one concern identified (silent discard of an invalid user-supplied
@tianmu-li can you update this so that an incorrect tokenizer override is a hard failure? Silently falling back to the default tokenizer is hard to diagnose when the user expects a different tokenizer to be used.
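One way to read the requested behavior, as a sketch only: an explicit override that cannot be found raises an error, while a missing default tokenizer simply leaves token metrics off. The names `select_tokenizer`, `TokenizerOverrideError`, and the injected `tokenizer_exists` probe are hypothetical and do not come from the PR; the soft handling of the default case is also an assumption based on the review summary.

```python
from typing import Callable, Optional


class TokenizerOverrideError(ValueError):
    """Raised when a user-supplied tokenizer override cannot be found."""


def select_tokenizer(
    model_name: str,
    tokenizer_name: Optional[str],
    tokenizer_exists: Callable[[str], bool],
) -> Optional[str]:
    """Pick the tokenizer identifier, failing hard on a bad override.

    Returns None when no override is given and the model name has no
    tokenizer, which a caller could treat as "token metrics disabled".
    """
    if tokenizer_name is not None:
        if not tokenizer_exists(tokenizer_name):
            # Explicit user intent: never fall back silently.
            raise TokenizerOverrideError(
                f"tokenizer_name {tokenizer_name!r} could not be found"
            )
        return tokenizer_name
    return model_name if tokenizer_exists(model_name) else None


# Toy probe standing in for a real availability check.
known = {"Qwen/Qwen3.6-35B-A3B"}
probe = known.__contains__

print(select_tokenizer("qwen/qwen3.6-35b", "Qwen/Qwen3.6-35B-A3B", probe))
print(select_tokenizer("qwen/qwen3.6-35b", None, probe))
```

Passing the probe in as a callable keeps the sketch testable without network access; a regression test can then assert that a misspelled override raises instead of returning the default.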
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Added. Also added regression testing.
arekay-nv left a comment
Review Council — follow-up
Reviewing resolved status of prior comments. 2 items remain unresolved (posted inline). 1 item (smaller templates missing tokenizer_name) cannot be posted inline since those files aren't in this diff — see the existing comment on online_template_full.yaml for context.
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
What does this PR do?
Allows specifying a tokenizer name in the YAML file that differs from the model name.
Example: OpenRouter keeps all model names lowercase, so Qwen/Qwen3.6-35B-A3B becomes qwen/qwen3.6-35b. SambaNova strips the org name, so it becomes Qwen3.6-35B-A3B. The tokenizer_name override makes it possible to use those endpoints while still loading the tokenizer by its original name.
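As an illustration of the example above, the two endpoint spellings could share one tokenizer override. The dicts below mirror a model_params section in Python rather than YAML; the `name` key is an assumption about the config layout, and only tokenizer_name is taken from this PR.

```python
# HuggingFace-style identifier from the PR description.
TOKENIZER = "Qwen/Qwen3.6-35B-A3B"

# Hypothetical model_params sections, one per provider.
# "name" is an assumed key; "tokenizer_name" is the field this PR adds.
openrouter_params = {"name": "qwen/qwen3.6-35b", "tokenizer_name": TOKENIZER}
sambanova_params = {"name": "Qwen3.6-35B-A3B", "tokenizer_name": TOKENIZER}

for params in (openrouter_params, sambanova_params):
    # The endpoint receives its own spelling; token counting uses the override.
    print(f"request model={params['name']!r}, tokenizer={params['tokenizer_name']!r}")
```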
Type of change
Related issues
Testing
Checklist