Problem
`apr pull` hard-requires `tokenizer.json` and fails validation when the download returns 404. Several HuggingFace models ship alternative tokenizer formats instead: `tokenizer.model` (SentencePiece), or only `tokenizer_config.json` with a slow tokenizer.
Affected models (from QA campaign):
| Model | Weights | Tokenizer Issue |
|---|---|---|
| internlm/internlm2_5-7b-chat | 8 shards, all cached ✓ | tokenizer.json 404 |
| teknium/OpenHermes-2.5-Mistral-7B | 2 shards, all cached ✓ | tokenizer.json 404 |
| microsoft/Phi-3-small-8k-instruct | 4 shards, all cached ✓ | tokenizer.json 404 |
Error:

```
error: Validation failed: tokenizer.json is required for inference but download failed:
Network error: Download failed: .../tokenizer.json: status code 404
```
All three models download weights successfully — only the tokenizer validation step fails.
Expected Behavior
`apr pull` should support a tokenizer fallback chain:
- Try `tokenizer.json` (fast tokenizer, preferred)
- Fall back to `tokenizer.model` (SentencePiece)
- Fall back to `tokenizer_config.json` (slow tokenizer, reconstruct at runtime)
If none are available, then fail with a clear error.
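The fallback chain above could be sketched as follows. This is a minimal illustration, not apr's actual code: `fetch` is a hypothetical stand-in for whatever download primitive `apr pull` uses, and the candidate order matches the list above.

```python
# Hypothetical sketch of the proposed tokenizer fallback chain.
# `fetch(filename)` stands in for apr's download primitive; it is assumed
# to raise FileNotFoundError on a 404.
TOKENIZER_CANDIDATES = [
    "tokenizer.json",         # fast tokenizer, preferred
    "tokenizer.model",        # SentencePiece
    "tokenizer_config.json",  # slow tokenizer, reconstructed at runtime
]

def resolve_tokenizer(fetch):
    """Return (filename, payload) for the first tokenizer file that exists."""
    attempts = []
    for name in TOKENIZER_CANDIDATES:
        try:
            return name, fetch(name)
        except FileNotFoundError as exc:
            attempts.append(f"{name}: {exc}")
    # Only fail once every candidate has been tried, with a clear summary.
    raise RuntimeError("no tokenizer found; tried " + "; ".join(attempts))
```

With this shape, a repo missing `tokenizer.json` but carrying `tokenizer.model` (the InternLM/OpenHermes/Phi-3-small case) would resolve to the SentencePiece file instead of failing validation.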
Workaround
These models can be used if the tokenizer file is manually downloaded, or converted from tokenizer.model with the transformers Python library:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat")
tok.save_pretrained("./")  # writes tokenizer.json
```

Impact
3 of 86 models in the QA campaign fail at the pull stage despite weights being fully available. These are otherwise functional models (InternLM 2.5, OpenHermes 2.5, Phi-3-small).
References
- Discovered during model QA campaign (PMAT-034)