apr pull: handle models without tokenizer.json (sentencepiece/tokenizer.model) #356

@noahgift

Problem

apr pull hard-requires tokenizer.json and fails validation when the request for that file returns 404. Several Hugging Face models ship an alternative tokenizer format instead (a SentencePiece tokenizer.model, or only tokenizer_config.json with a slow tokenizer).

Affected models (from QA campaign):

| Model | Weights | Tokenizer Issue |
|---|---|---|
| internlm/internlm2_5-7b-chat | 8 shards, all cached ✓ | tokenizer.json 404 |
| teknium/OpenHermes-2.5-Mistral-7B | 2 shards, all cached ✓ | tokenizer.json 404 |
| microsoft/Phi-3-small-8k-instruct | 4 shards, all cached ✓ | tokenizer.json 404 |

Error:

error: Validation failed: tokenizer.json is required for inference but download failed:
  Network error: Download failed: .../tokenizer.json: status code 404

All three models download their weights successfully; only the tokenizer validation step fails.

Expected Behavior

apr pull should support a tokenizer fallback chain:

  1. Try tokenizer.json (fast tokenizer, preferred)
  2. Fall back to tokenizer.model (SentencePiece)
  3. Fall back to tokenizer_config.json (slow tokenizer, reconstruct at runtime)

If none are available, then fail with a clear error.
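
The fallback chain above could be sketched as follows. This is only an illustration in Python, not apr's actual implementation: `resolve_tokenizer`, `TokenizerNotFound`, and the `fetch` callback are hypothetical names, and `fetch` is assumed to return the file contents, or `None` when the file is missing (for example on a 404).

```python
from typing import Callable, Optional, Tuple

# Candidate files in the priority order proposed in this issue.
TOKENIZER_CANDIDATES = (
    ("tokenizer.json", "fast"),
    ("tokenizer.model", "sentencepiece"),
    ("tokenizer_config.json", "slow"),
)


class TokenizerNotFound(Exception):
    """Raised when none of the candidate tokenizer files is available."""


def resolve_tokenizer(
    fetch: Callable[[str], Optional[bytes]],
) -> Tuple[str, str, bytes]:
    """Return (filename, kind, payload) for the first candidate that exists.

    `fetch` is a hypothetical callback: it takes a filename and returns the
    file contents, or None if the file is missing (e.g. HTTP 404).
    """
    tried = []
    for filename, kind in TOKENIZER_CANDIDATES:
        payload = fetch(filename)
        if payload is not None:
            return filename, kind, payload
        tried.append(filename)
    # Nothing matched: fail with an error that names every file attempted.
    raise TokenizerNotFound(
        "no tokenizer file found; tried: " + ", ".join(tried)
    )
```

For a quick check without network access, a plain dict can stand in for the repository: `resolve_tokenizer({"tokenizer.model": b"..."}.get)` selects `tokenizer.model` and labels it `sentencepiece`. Passing the fetcher in as a callback keeps the ordering logic separate from the download code, so a 404 on one file does not abort the whole pull.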

Workaround

These models can be used if tokenizer.json is downloaded manually or generated from tokenizer.model using the transformers Python library:

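from transformers import AutoTokenizer

# InternLM ships custom tokenizer code, so trust_remote_code=True is needed for this repo.
tok = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
# Writes tokenizer.json only if a fast tokenizer was loaded.
tok.save_pretrained("./")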
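
Because `save_pretrained` only emits tokenizer.json when a fast tokenizer was loaded, it may be worth confirming that the file exists before retrying apr pull. A minimal sketch using only the standard library; `has_fast_tokenizer` is a hypothetical helper, and the check for a top-level `"model"` key assumes the usual serialization layout of the tokenizers library:

```python
import json
from pathlib import Path


def has_fast_tokenizer(model_dir: str) -> bool:
    """Return True if model_dir contains a parseable tokenizer.json.

    Assumption: a fast-tokenizer file is a JSON object with a top-level
    "model" key, as written by the tokenizers library.
    """
    path = Path(model_dir) / "tokenizer.json"
    if not path.is_file():
        return False
    try:
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except (json.JSONDecodeError, UnicodeDecodeError):
        # The file exists but is not valid JSON text.
        return False
    return isinstance(data, dict) and "model" in data
```

Calling `has_fast_tokenizer("./")` after the workaround indicates whether the conversion produced a usable file or whether only slow-tokenizer artifacts were written.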

Impact

3 of 86 models in the QA campaign fail at the pull stage despite weights being fully available. These are otherwise functional models (InternLM 2.5, OpenHermes 2.5, Phi-3-small).

References

  • Discovered during model QA campaign (PMAT-034)

Labels: P1 (High priority), bug (Something isn't working)
