Problem
`apr pull` hard-requires `tokenizer.json` and fails validation when the download returns 404. Several HuggingFace models ship alternative tokenizer formats instead: `tokenizer.model` (SentencePiece), or only `tokenizer_config.json` with a slow tokenizer.
Affected models (from QA campaign):
| Model | Weights | Tokenizer Issue |
|---|---|---|
| internlm/internlm2_5-7b-chat | 8 shards, all cached ✓ | tokenizer.json 404 |
| teknium/OpenHermes-2.5-Mistral-7B | 2 shards, all cached ✓ | tokenizer.json 404 |
| microsoft/Phi-3-small-8k-instruct | 4 shards, all cached ✓ | tokenizer.json 404 |
Error:

```
error: Validation failed: tokenizer.json is required for inference but download failed:
Network error: Download failed: .../tokenizer.json: status code 404
```
All three models download weights successfully — only the tokenizer validation step fails.
Expected Behavior
`apr pull` should support a tokenizer fallback chain:
- Try `tokenizer.json` (fast tokenizer, preferred)
- Fall back to `tokenizer.model` (SentencePiece)
- Fall back to `tokenizer_config.json` (slow tokenizer, reconstruct at runtime)
If none are available, then fail with a clear error.
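The fallback chain above could be sketched as follows. This is a minimal illustration, not apr's actual code: `fetch` is a hypothetical stand-in for whatever download primitive `apr pull` uses, and the candidate order matches the list above.

```python
# Hypothetical sketch of the proposed tokenizer fallback chain.
# `fetch(filename)` stands in for apr's download primitive; it is assumed
# to raise FileNotFoundError on a 404.
TOKENIZER_CANDIDATES = [
    "tokenizer.json",         # fast tokenizer, preferred
    "tokenizer.model",        # SentencePiece
    "tokenizer_config.json",  # slow tokenizer, reconstructed at runtime
]

def resolve_tokenizer(fetch):
    """Return (filename, payload) for the first tokenizer file that exists."""
    attempts = []
    for name in TOKENIZER_CANDIDATES:
        try:
            return name, fetch(name)
        except FileNotFoundError as exc:
            attempts.append(f"{name}: {exc}")
    # Only fail once every candidate has been tried, with a clear summary.
    raise RuntimeError("no tokenizer found; tried " + "; ".join(attempts))
```

With this shape, a repo missing `tokenizer.json` but carrying `tokenizer.model` (the InternLM/OpenHermes/Phi-3-small case) would resolve to the SentencePiece file instead of failing validation.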
Workaround
These models can be used if the tokenizer file is manually downloaded, or converted from tokenizer.model with the transformers Python library:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat")
tok.save_pretrained("./")  # writes tokenizer.json
```

Impact
3 of 86 models in the QA campaign fail at the pull stage despite weights being fully available. These are otherwise functional models (InternLM 2.5, OpenHermes 2.5, Phi-3-small).
References
- Discovered during model QA campaign (PMAT-034)