An AI/LLM workbench for running, configuring, and fine-tuning open-source language models on consumer GPUs.
This repository collects everything needed to go from a stock GPU to a productive local AI setup: model selection guidance, Ollama configuration, quantization strategies, and a full training plan for distilling domain-specialized models.
The Ollama Guide covers everything needed to run models locally:
- Model selection principles (parameter sizing by VRAM, context minimums, tool calling requirements)
- Quantization formats (Q4_K_M, Q5_K_M) and KV cache quantization (q8_0, q4_0)
- Environment variables and Windows setup
- Context size configuration and limits
- GPU offload verification
- API usage (REST and Python)
| Guide | VRAM | Example GPUs | Dense Model Range |
|---|---|---|---|
| 24GB GPU Guide | 24 GB | RTX 4090, RTX 3090, RTX A5000 | 14-25B params |
| 32GB GPU Guide | 32 GB | RTX 5090 | 20-32B params |
| Guide | Description |
|---|---|
| Creating Ollama Models from Hugging Face | Download GGUFs from Hugging Face and create custom Ollama models with your own configuration |
The Training directory contains a complete plan for multi-teacher distillation into a domain-specialized local model.
Goal: Distill coding capabilities from three cloud-scale teacher models into Qwen3.5-27B (dense) for local inference on 32GB VRAM with 204K token context. The resulting model is specialized for full-stack web development with F#, Svelte, TypeScript, .NET, Docker, and Kubernetes.
Teacher models: Kimi K2.5, MiniMax M2.7, DeepSeek V3.2 -- each assigned to the domains where they are strongest.
| Document | Contents |
|---|---|
| 00-overview.md | Project overview, multi-teacher strategy, student model specs |
| 01-teacher-models.md | Teacher model specs, strengths/weaknesses, comparison matrix |
| 02-domain-specialization.md | Training data composition, F# libraries, training topics |
| 03-distillation-pipeline.md | Data generation, F# compiler verification, doc sources, training steps |
| 04-training-config.md | LoRA config, cloud GPU providers, cost estimates |
| 05-resolved-questions.md | Resolved and open questions |
models/
├── README.md # This file -- repository overview
├── OLLAMA-GUIDE.md # Ollama setup, quantization, KV cache, API reference
├── 24GB-GPU.md # Models and config for 24GB GPUs
├── 32GB-GPU.md # Models and config for 32GB GPUs
├── creating-ollama-models.md # Guide: custom Ollama models from HF
├── Training/
│ ├── 00-overview.md # Distillation project overview
│ ├── 01-teacher-models.md # Teacher model analysis
│ ├── 02-domain-specialization.md# Training data and domains
│ ├── 03-distillation-pipeline.md# Pipeline and verification
│ ├── 04-training-config.md # LoRA, hardware, costs
│ └── 05-resolved-questions.md # Q&A log
└── LICENSE