CLI tool that fetches and displays HuggingFace model information at a glance.
Given a HuggingFace repo, model-scout shows you: architecture, context length, recommended system prompt, sampling parameters, available GGUF quantizations, special flags, and license info.
pip install model-scout
# Or install from source
pip install -e .

# Basic model lookup
model-scout Qwen/Qwen3-8B
# List GGUF files with sizes and quantization types
model-scout bartowski/Qwen3-8B-GGUF --files
# Gated models (requires HuggingFace token)
HF_TOKEN=hf_xxx model-scout meta-llama/Llama-3-8B

+-- Qwen/Qwen3-8B ----------------------------------------+
| Type: instruct |
| Architecture: qwen3 |
| Parameters: 8B |
| Context Length: 32,768 |
| License: apache-2.0 |
| |
| System Prompt: "You are Qwen, created by Alibaba..." |
| |
| Special Flags: (none) |
| GGUF Variants: See Qwen/Qwen3-8B-GGUF |
+----------------------------------------------------------+
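Box output like the example above is straightforward to produce; here is a minimal sketch (field names and widths are illustrative, not model-scout's actual rendering code):

```python
def render_box(title, fields, width=58):
    """Render model fields in an ASCII box like the example above."""
    # Top border carries the repo name: "+-- title ----+"
    lines = [f"+-- {title} ".ljust(width - 1, "-") + "+"]
    for key, value in fields.items():
        # Pad each row so the right-hand border lines up.
        lines.append(f"| {key}: {value}".ljust(width - 1) + "|")
    lines.append("+" + "-" * (width - 2) + "+")
    return "\n".join(lines)

print(render_box("Qwen/Qwen3-8B", {
    "Type": "instruct",
    "Architecture": "qwen3",
    "Context Length": "32,768",
}))
```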
- Type: one of instruct, coding, embedding, reasoning, ocr, vision, or reranker
- Architecture: model architecture from config.json
- Parameters: parameter count (e.g. 8B)
- Context Length: from GGUF metadata, config.json, or base model config
- License: from model card metadata
- System Prompt: extracted from tokenizer_config.json, model card sections, or code examples
- Sampling Parameters: temperature, top_p, etc. if documented
- Special Flags: llama-server flags needed (--embedding, --jinja, etc.)
- GGUF Variants: links to GGUF repos or lists files with --files
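As a rough illustration of the config.json-derived fields, here is a sketch of pulling architecture and context length out of a parsed config. The keys (`model_type`, `architectures`, `max_position_embeddings`) follow common HuggingFace conventions; model-scout's actual extraction logic may differ:

```python
def summarize_config(config):
    """Extract a few display fields from a parsed config.json dict."""
    return {
        # Short architecture name, e.g. "qwen3"
        "architecture": config.get("model_type", "unknown"),
        # Full class name, e.g. "Qwen3ForCausalLM"
        "architecture_class": (config.get("architectures") or ["unknown"])[0],
        # Context length when present in config.json
        "context_length": config.get("max_position_embeddings"),
    }

# Example config.json contents (abridged)
cfg = {
    "model_type": "qwen3",
    "architectures": ["Qwen3ForCausalLM"],
    "max_position_embeddings": 32768,
}
print(summarize_config(cfg))
```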
Set HF_TOKEN environment variable to access gated models or avoid rate limits:
export HF_TOKEN=hf_your_token_here

- Python 3.9+
- requests
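Under the hood, token handling could look like this sketch: read HF_TOKEN from the environment and attach it as a Bearer header on API requests. This is an assumed implementation, not model-scout's actual code:

```python
import os

def auth_headers(env=os.environ):
    """Build the Authorization header for HuggingFace API requests.

    Returns an empty dict when no HF_TOKEN is set, so unauthenticated
    lookups of public models still work.
    """
    token = env.get("HF_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}

print(auth_headers({"HF_TOKEN": "hf_xxx"}))
```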
MIT