A small fine-tuned language model (Qwen3.5-4B + LoRA) with a FAISS+BM25 retrieval layer, deployed locally via Ollama. Helps users write microdata.no scripts and look up SSB variable metadata.
The RAG index is pre-built and shipped in this repo. The model weights live on Hugging Face. No training, no scraping, no Docker required.
```bash
git clone https://github.com/forlop/microdata-no-copilot
cd microdata-no-copilot
pip install -r requirements.txt streamlit

# Install Ollama (one-time, OS-specific):
# Linux/WSL: curl -fsSL https://ollama.com/install.sh | sh
# macOS: brew install ollama (or download from ollama.com)
# Windows: download OllamaSetup.exe from ollama.com

ollama pull hf.co/forlop/microdata-copilot-v2:Q4_K_M
ollama create microdata-copilot -f deploy/Modelfile
streamlit run rag/app.py
```

The pull grabs the GGUF (~2.7 GB) from Hugging Face; `ollama create` then applies the SYSTEM prompt, refusal few-shots, and stop-token parameters from `deploy/Modelfile` — without this step the model bleeds `<|endoftext|>` tokens and loops.
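For orientation, the Modelfile wires those pieces together roughly like this. This is an illustrative sketch only: the `FROM` target matches the pull above, but the stop tokens, SYSTEM text, and refusal few-shot here are placeholders; the authoritative version is `deploy/Modelfile`.

```
# Illustrative Modelfile sketch -- deploy/Modelfile is the authoritative version.
FROM hf.co/forlop/microdata-copilot-v2:Q4_K_M

# Stop tokens keep generation from bleeding past end-of-turn (placeholder values).
PARAMETER stop "<|endoftext|>"
PARAMETER stop "<|im_end|>"

# Placeholder system prompt.
SYSTEM """You are a microdata.no scripting assistant. Answer questions about
microdata.no scripts and SSB variable metadata; decline anything else."""

# Refusal few-shot (placeholder) so the model declines out-of-scope requests.
MESSAGE user Write me a poem about winter.
MESSAGE assistant I can only help with microdata.no scripts and SSB variable metadata.
```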
Streamlit prints a local URL (http://localhost:8501); open it in your browser and ask a microdata.no question. On CPU expect ~10-15 s per response; on a recent GPU, ~1-2 s.
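You can also hit the served model directly over Ollama's standard REST API, bypassing Streamlit; note this skips the retrieval layer in `rag/`, so no RAG context is injected. A minimal sketch (the question is just an example):

```python
# Query the model served by Ollama directly (no Streamlit, no RAG context).
# Assumes `ollama create microdata-copilot ...` has already been run.
import json
import urllib.request

payload = {
    "model": "microdata-copilot",
    "prompt": "How do I look up metadata for an SSB income variable?",
    "stream": False,  # one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```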
Model weights: huggingface.co/forlop/microdata-copilot-v2 (q4_k_m GGUF, 2.7 GB).
| Document | What it is | For whom |
|---|---|---|
| `TECHNICAL_NOTE.md` | Comprehensive technical record — architecture, design choices, evaluation, lessons, deployment story. Has a reader's preamble with a glossary for non-ML-expert readers. | SSB partners, future maintainers, anyone evaluating the project |
| `IMPLEMENTATION_PLAN.md` | The original forward-looking plan (mostly historical now) | Reference |
| `*/README.md` | Per-phase technical detail (`scrape/`, `cards/`, `train/`, `eval/`, `rag/`, `deploy/`) | Reproducers, contributors |
If you read only one file, make it `TECHNICAL_NOTE.md`; its §F has a reading guide that points to the right sections by audience and goal.
| Folder | Purpose |
|---|---|
| `scrape/` | Variable / example / manual scrapers (re-runnable, version-aware) |
| `cards/` | Training-card generation from scraped sources |
| `train/` | Unsloth QLoRA training + merge + GGUF export |
| `eval/` | Eval sets (v1 iteration + v2 held-out + adversarial) + three scorers (substring, LLM-judge, syntax-validator) |
| `rag/` | FAISS + BM25 index build + retrieval + Ollama wrapper (see the sketch after this table) |
| `deploy/` | Ollama `Modelfile` |
| `configs/` | YAML configs for scrape and train |
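The `rag/` layer pairs dense (FAISS) and lexical (BM25) retrieval over the cards. A minimal sketch of that hybrid pattern, assuming sentence-transformers embeddings, the `rank_bm25` package, a 50/50 score blend, and toy documents (none of which is guaranteed to match the actual `rag/` code):

```python
# Hybrid dense + lexical retrieval sketch. Library choices (sentence-transformers,
# rank_bm25), the 50/50 blend, and the toy documents are assumptions, not the
# repo's exact setup.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "WLONN: annual wage income in NOK (variable card)",     # toy card
    "KJOENN: sex of person, categorical (variable card)",   # toy card
    "Manual section: importing variables into a dataset",   # toy manual chunk
]

# Dense index: normalized embeddings, so inner product equals cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(docs, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

# Lexical index over lowercase whitespace tokens.
bm25 = BM25Okapi([d.lower().split() for d in docs])

def minmax(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = encoder.encode([query], normalize_embeddings=True).astype("float32")
    dense_scores, ids = index.search(q, len(docs))  # rank the whole toy corpus
    dense = np.zeros(len(docs))
    dense[ids[0]] = dense_scores[0]
    lexical = np.asarray(bm25.get_scores(query.lower().split()))
    # Min-max normalize both score sets so the two scales are comparable.
    fused = 0.5 * minmax(dense) + 0.5 * minmax(lexical)
    return [docs[i] for i in np.argsort(fused)[::-1][:k]]

print(retrieve("wage income variable"))
```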
| Path | Holds |
|---|---|
| `D:\Work\microdata_LoRA\repo\v2\` | This codebase (~3,500 lines Python + Markdown) |
| `D:\Work\microdata_LoRA\data_raw\` | Scraped JSON, manual text, PDF (~12 MB) |
| `D:\Work\microdata_LoRA\data_processed\` | Cards JSONL, FAISS index, embeddings (~80 MB) |
| `D:\Work\microdata_LoRA\models\` | LoRA adapters, merged safetensors, GGUFs (~12 GB total) |
| `D:\Work\microdata_LoRA\logs\` | Training + eval logs |
| Dropbox `microdata_LoRA\archive_v1\` | Frozen v1 reference (read-only) |
- v2.0 LoRA adapter trained on Qwen3.5-4B base, quantized to q4_k_m GGUF (2.7 GB)
- Served via Ollama as `microdata-copilot` on `localhost:11434`
- FAISS + BM25 indexes built over 729 variables + ~100 manual sections + 40 examples
- Internal iteration eval: 82.6% pass rate (lenient substring scoring; sketched below)
- Strict held-out + LLM-judge eval: 53.8% pass rate — this is the honest measurement
- 100% jailbreak resistance, 80% RAG-class on held-out, 0% on stale-fact probes (calibration weakness)
See TECHNICAL_NOTE.md §17 and §21 for the full results breakdown.
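For intuition on why the two pass rates differ: lenient substring scoring counts an answer as a pass if it contains the expected key strings. A minimal sketch of that idea (the actual `eval/` scorers, including the LLM judge and syntax validator, are more involved):

```python
# Lenient substring scorer sketch; the actual eval/ scorers are more involved.
def substring_pass(answer: str, expected: list[str]) -> bool:
    """Pass if every expected key string appears in the answer (case-insensitive)."""
    answer_lower = answer.lower()
    return all(s.lower() in answer_lower for s in expected)

# A check like this rewards answers that merely mention the right identifiers,
# which inflates pass rates relative to the strict held-out + LLM-judge eval.
print(substring_pass("You probably want the WLONN variable.", ["wlonn"]))  # True
```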
End-to-end on a 16 GB GPU machine: ~3 hours of compute + ~30 min human attention. See per-phase READMEs:
- `scrape/README.md` — Phase 1 (~25 min)
- `cards/` (no README; see `generate_cards_v22.py` source comments) — Phase 2 (~1 min)
- `train/README.md` — Phase 3 (~1.5 h train + ~5 min export)
- `eval/README.md` — Phase 4
- `rag/README.md` — Phase 5 (~2 min after Phase 1)
- `deploy/README.md` — Phase 6 (~1 min)