Models

An AI/LLM workbench for running, configuring, and fine-tuning open-source language models on consumer GPUs.

This repository collects everything needed to go from a stock GPU to a productive local AI setup: model selection guidance, Ollama configuration, quantization strategies, and a full training plan for distilling domain-specialized models.

Local Inference with Ollama

The Ollama Guide covers everything needed to run models locally:

Model selection principles (parameter sizing by VRAM, context minimums, tool calling requirements)
Quantization formats (Q4_K_M, Q5_K_M) and KV cache quantization (q8_0, q4_0)
Environment variables and Windows setup
Context size configuration and limits
GPU offload verification
API usage (REST and Python)
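
As a rough illustration of the REST usage the guide covers, the sketch below sends a single non-streaming request to a local Ollama server with only the Python standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`); the model name in the commented call is a placeholder, and the helper names `build_generate_request` and `generate` are made up for this example rather than taken from the guide.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumption: not overridden via OLLAMA_HOST).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, num_ctx: int = 8192) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of a chunk stream
        "options": {"num_ctx": num_ctx},  # context window requested for this call
    }


def generate(model: str, prompt: str, num_ctx: int = 8192, timeout: float = 120.0) -> str:
    """Send the request and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example call (requires a running server and a pulled model; the name is a placeholder):
# print(generate("my-local-model", "Say hello in one sentence."))
```

Setting `num_ctx` per request is one way to control context size; the guide's sections on environment variables and context limits describe the server-side options.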

GPU-Specific Guides

| Guide | VRAM | Example GPUs | Dense Model Range |
|---|---|---|---|
| 24GB GPU Guide | 24 GB | RTX 4090, RTX 3090, RTX A5000 | 14-25B params |
| 32GB GPU Guide | 32 GB | RTX 5090 | 20-32B params |
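
The parameter ranges above follow from simple arithmetic: weight size is roughly parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate, not the sizing method used in the guides. The bits-per-weight figures are approximate averages for these GGUF formats, and `reserve_gib` (headroom for KV cache, activations, and the desktop) is an arbitrary placeholder.

```python
# Approximate average bits per weight (assumption: real GGUF files vary by architecture).
APPROX_BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7}


def weight_gib(params_billion: float, quant: str) -> float:
    """Estimated size of the quantized weights in GiB."""
    total_bits = params_billion * 1e9 * APPROX_BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30


def fits(params_billion: float, quant: str, vram_gib: float, reserve_gib: float = 4.0) -> bool:
    """True if the weights plus a fixed reserve fit in the given VRAM."""
    return weight_gib(params_billion, quant) + reserve_gib <= vram_gib


for size in (14, 25, 32):
    print(f"{size}B  Q4_K_M ~{weight_gib(size, 'Q4_K_M'):.1f} GiB  "
          f"Q5_K_M ~{weight_gib(size, 'Q5_K_M'):.1f} GiB")
```

Long contexts need a much larger reserve than the placeholder used here, which is why the practical model range for a card is narrower than the weights alone suggest.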

Custom Models

| Guide | Description |
|---|---|
| Creating Ollama Models from Hugging Face | Download GGUFs from Hugging Face and create custom Ollama models with your own configuration |
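
To make the custom-model step concrete, here is a minimal sketch that renders an Ollama Modelfile for an already-downloaded GGUF. The file path, parameter values, and the helper names `render_modelfile` and `write_modelfile` are illustrative assumptions; the linked guide may use different settings.

```python
from pathlib import Path


def render_modelfile(gguf_path: str, num_ctx: int = 8192,
                     temperature: float = 0.7, system: str = "") -> str:
    """Return Modelfile text that points Ollama at a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",                      # local weights to import
        f"PARAMETER num_ctx {num_ctx}",           # default context window
        f"PARAMETER temperature {temperature}",   # default sampling temperature
    ]
    if system:
        lines.append(f'SYSTEM """{system}"""')    # optional system prompt
    return "\n".join(lines) + "\n"


def write_modelfile(directory: str, text: str) -> Path:
    """Write the text to <directory>/Modelfile and return the path."""
    path = Path(directory) / "Modelfile"
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical file name; substitute the GGUF you downloaded.
print(render_modelfile("./model.Q4_K_M.gguf", num_ctx=16384))
```

With the Modelfile written, `ollama create my-model -f Modelfile` registers the model and `ollama run my-model` starts it (`my-model` is a placeholder name).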

Training & Distillation

The Training directory contains a complete plan for multi-teacher distillation into a domain-specialized local model.

Goal: Distill coding capabilities from three cloud-scale teacher models into Qwen3.5-27B (dense) for local inference on 32 GB of VRAM with a 204K-token context. The resulting model is specialized for full-stack web development with F#, Svelte, TypeScript, .NET, Docker, and Kubernetes.

Teacher models: Kimi K2.5, MiniMax M2.7, and DeepSeek V3.2, each assigned to the domains where it is strongest.
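
Fitting a long context into a fixed VRAM budget depends heavily on KV cache size, which is why the Ollama guide covers KV cache quantization. The sketch below estimates cache size for standard attention, where every layer stores keys and values for every token. The per-element byte sizes follow from the GGML block layouts (`q8_0`: 34 bytes per 32 values; `q4_0`: 18 bytes per 32 values). The layer and head counts are placeholders, not the student model's actual architecture, and models with sliding-window or linear-attention layers need less than this formula predicts.

```python
# Bytes stored per cached value for each KV cache type.
BYTES_PER_VALUE = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_tokens: int, cache_type: str = "f16") -> float:
    """Estimated KV cache size in GiB for full attention on every layer."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_tokens  # 2 = keys and values
    return values * BYTES_PER_VALUE[cache_type] / 2**30


# Placeholder architecture (NOT the real model specs) at roughly 204K tokens.
for cache_type in ("f16", "q8_0", "q4_0"):
    size = kv_cache_gib(n_layers=64, n_kv_heads=8, head_dim=128,
                        n_tokens=204_000, cache_type=cache_type)
    print(f"{cache_type}: ~{size:.1f} GiB")
```

Adding this figure to the quantized weight size gives a first-order check on whether a given context length is realistic for a card.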

| Document | Contents |
|---|---|
| 00-overview.md | Project overview, multi-teacher strategy, student model specs |
| 01-teacher-models.md | Teacher model specs, strengths/weaknesses, comparison matrix |
| 02-domain-specialization.md | Training data composition, F# libraries, training topics |
| 03-distillation-pipeline.md | Data generation, F# compiler verification, doc sources, training steps |
| 04-training-config.md | LoRA config, cloud GPU providers, cost estimates |
| 05-resolved-questions.md | Resolved and open questions |
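
For readers new to LoRA, the arithmetic below shows why it keeps training cost manageable: adapting a `d_out x d_in` weight matrix at rank `r` adds only `r * (d_in + d_out)` trainable parameters. The matrix shapes, rank, and layer count are illustrative assumptions; the actual values are specified in 04-training-config.md.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


def total_lora_params(shapes: list[tuple[int, int]], rank: int, n_layers: int) -> int:
    """Total adapter parameters when the same matrices are adapted in every layer."""
    return n_layers * sum(lora_params(d_in, d_out, rank) for d_in, d_out in shapes)


# Hypothetical per-layer (d_in, d_out) shapes for two adapted projection matrices.
example_shapes = [(4096, 4096), (4096, 1024)]
adapter = total_lora_params(example_shapes, rank=16, n_layers=32)
full = 32 * sum(d_in * d_out for d_in, d_out in example_shapes)
print(f"adapter params: {adapter:,} ({adapter / full:.2%} of the adapted matrices)")
```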

Repository Structure

models/
├── README.md                      # This file -- repository overview
├── OLLAMA-GUIDE.md                # Ollama setup, quantization, KV cache, API reference
├── 24GB-GPU.md                    # Models and config for 24GB GPUs
├── 32GB-GPU.md                    # Models and config for 32GB GPUs
├── creating-ollama-models.md      # Guide: custom Ollama models from HF
├── Training/
│   ├── 00-overview.md              # Distillation project overview
│   ├── 01-teacher-models.md        # Teacher model analysis
│   ├── 02-domain-specialization.md # Training data and domains
│   ├── 03-distillation-pipeline.md # Pipeline and verification
│   ├── 04-training-config.md       # LoRA, hardware, costs
│   └── 05-resolved-questions.md    # Q&A log
└── LICENSE

