# Model_Finetuning

Fine-tune coding language models on domain-specific data using QLoRA.
This repository provides a complete pipeline for fine-tuning the Qwen2.5-Coder-3B-Instruct model on custom datasets. Train models that understand your specific codebase, coding patterns, and domain knowledge using memory-efficient QLoRA on accessible GPU hardware.
## Features

- **Memory-efficient training** - 8-bit QLoRA quantization requires only 11GB VRAM
- **Multi-source data processing** - Combine conversation logs, git history, and code completion examples
- **Production-ready tools** - Comprehensive monitoring, evaluation, and comparison utilities
- **Automated workflows** - One-command training with integrated monitoring via tmux
## Requirements

- **Hardware**: NVIDIA RTX 2080 Ti or better (11GB+ VRAM)
- **Software**: Python 3.12+, CUDA 11.8+
- **Storage**: 50GB+ available space
- **Memory**: 16GB+ RAM recommended
## Installation

```bash
git clone https://github.com/your-username/Model_Finetuning.git
cd Model_Finetuning
uv sync && source .venv/bin/activate

# Verify installation
# (No verification script available)
```
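Since no verification script ships with the repo, a quick hand-rolled check can confirm the environment is usable. A minimal sketch, assuming the standard Hugging Face training stack installed by `uv sync`:

```python
# Quick environment check (the repo has no verification script).
# Confirms the core libraries import and PyTorch can see the GPU.
import torch
import transformers, peft, bitsandbytes  # noqa: F401

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```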
## Project Structure

```
Model_Finetuning/
├── data/                  # Training datasets (gitignored)
│   ├── claude_logs/       # Processed Claude conversation logs
│   ├── git_history/       # Git commit → SFT examples
│   ├── actual_code/       # FIM examples from codebases
│   └── staging/           # Final train/eval datasets
├── scripts/
│   ├── data_prep/         # Data processing pipeline
│   │   ├── claude_logs/   # Claude log → SFT conversion
│   │   └── actual_code/   # Code → FIM generation
│   ├── training/          # QLoRA training scripts
│   └── testing/           # Model evaluation tools
├── outputs/               # Model checkpoints (gitignored)
├── runs/                  # TensorBoard logs (gitignored)
└── logs/                  # Training logs (gitignored)
```
## Quick Start

```bash
# 1. Prepare training data
python scripts/data_prep/claude_logs/claude2sft.py --src data/claude_logs/ --out data/train.clean.jsonl
python scripts/data_prep/actual_code/repo2fim.py --repo /path/to/repo --out data/train.fim.jsonl
python scripts/data_prep/merge_and_split.py data/*.jsonl --name full --eval_frac 0.15

# 2. Start training with monitoring
./scripts.sh tmux-train

# 3. Evaluate results
python scripts/testing/compare_baseline_vs_lora.py \
    --prompt "How do I fix a KeyError in Python?" \
    --ckpt outputs/qwen3b_lora_8bit/checkpoint-650
```

For detailed workflows and options, see the scripts documentation.
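After merging, it can help to sanity-check the split before training. A minimal sketch, assuming `merge_and_split.py` writes JSONL train/eval files under `data/staging/` (the exact filenames are an assumption; adjust to what the script actually produces):

```python
# Minimal sanity check for a JSONL train/eval split.
# Paths are assumptions; point them at merge_and_split.py's real outputs.
import json
from pathlib import Path

for split in ("train", "eval"):
    path = Path(f"data/staging/full.{split}.jsonl")  # hypothetical filename
    records = [json.loads(line) for line in path.open()]
    print(f"{split}: {len(records)} examples, keys: {sorted(records[0])}")
```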
## Training Configuration

| Parameter | Value | Description |
|---|---|---|
| Base Model | Qwen/Qwen2.5-Coder-3B-Instruct | Pre-trained coding language model |
| Method | 8-bit QLoRA | Memory-efficient fine-tuning |
| LoRA Config | r=32, α=64, dropout=0.05 | Low-rank adaptation parameters |
| Learning Rate | 2e-5 (cosine schedule) | Training rate with warm-up |
| Batch Size | 24 (via gradient accumulation) | Effective batch size |
| Max Length | 2048 tokens | Maximum sequence length |
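These values map onto standard PEFT/Transformers objects roughly as follows. A minimal sketch, not the repo's actual `sft_qlora_8bit.py`: the per-device batch size, accumulation split, warm-up ratio, and target modules are assumptions chosen to illustrate the effective batch of 24.

```python
# Illustrative mapping of the table above onto peft/transformers configs.
# Assumed (not read from the training script): target_modules, warmup_ratio,
# and the 2 x 12 split that yields the effective batch size of 24.
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# 8-bit base weights; passed to AutoModelForCausalLM.from_pretrained(
#     ..., quantization_config=quant_config)
quant_config = BitsAndBytesConfig(load_in_8bit=True)

lora_config = LoraConfig(
    r=32,               # LoRA rank
    lora_alpha=64,      # α from the table
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="outputs/qwen3b_lora_8bit",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # assumed warm-up fraction
    per_device_train_batch_size=2,   # 2 x 12 accumulation = 24 effective
    gradient_accumulation_steps=12,
    logging_dir="runs/qwen3b_8bit",
)
```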
## Monitoring

Training progress is monitored via TensorBoard and tmux sessions:
```bash
# Start training with integrated monitoring
./scripts.sh tmux-train

# Manual monitoring commands
./scripts.sh gpu    # GPU utilization
./scripts.sh logs   # Training logs
tensorboard --logdir runs/qwen3b_8bit --port 6006
```

## Serving

Start the local API server (OpenAI-compatible via vLLM) and WebUI (defaults tuned for 11GB GPUs):
```bash
# Start vLLM + WebUI with defaults (ctx=4096, max-seqs=2, gpu-mem=0.92)
scripts/serving/setup_webui.sh

# Only vLLM (headless)
scripts/serving/setup_webui.sh start_vllm_only

# Configure context and soft caps
scripts/serving/setup_webui.sh \
    --ctx=8192 \
    --max-seqs=8 \
    --rope=type=linear,factor=2.0 \
    --max-output-tokens=256 \
    --max-completions=1
```
Notes:

- `--ctx` maps to vLLM's `--max-model-len`. You can enable RoPE scaling with `--rope`, but quality may degrade.
- `--max-output-tokens` and `--max-completions` are passed to the WebUI as defaults; vLLM does not enforce these without a proxy.
- To avoid 400 errors from overlong prompts, prefer truncation in your client (e.g., Continue) and keep added context lean.
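Once vLLM is up, any OpenAI-compatible client can talk to it. A minimal sketch, assuming vLLM's default port 8000 and a served-model name of `qwen3b-lora` (both assumptions; the setup script prints the actual values at startup):

```python
# Query the local vLLM server through its OpenAI-compatible API.
# Port 8000 and the model name are assumptions; check vLLM's startup logs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="qwen3b-lora",  # hypothetical served-model name
    messages=[{"role": "user", "content": "How do I fix a KeyError in Python?"}],
    max_tokens=256,  # mirrors --max-output-tokens above
    n=1,             # mirrors --max-completions above
)
print(response.choices[0].message.content)
```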
Continue (VS Code) quick tips to stay within context:

- Set a model-level input cap and truncation in `~/.continue/config.json` (names vary by version: look for `truncateToFit`, `maxInputTokens`, `maxPromptTokens`, or similar).
- Limit context providers (disable repo-wide or diff providers by default) and unpin large contexts.
- Keep `n` (completions) at 1 and set `maxTokens` to a sane default (e.g., 256–512). A hypothetical config sketch follows this list.
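As noted above, option names vary by Continue version, so treat this as a shape to adapt rather than a working config; every key here should be checked against your version's schema:

```json
{
  "models": [
    {
      "title": "qwen3b-lora (local vLLM)",
      "provider": "openai",
      "apiBase": "http://localhost:8000/v1",
      "model": "qwen3b-lora",
      "completionOptions": { "maxTokens": 256 }
    }
  ]
}
```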
## Use Cases

- **Code completion** - Generate contextually relevant code suggestions
- **Debugging assistance** - Provide domain-specific error analysis and fixes
- **Code documentation** - Generate documentation matching project style
- **Codebase Q&A** - Answer questions about specific codebases and patterns
## Scripts

### Data preparation

- `scripts/data_prep/claude_logs/claude2sft.py` - Convert conversation logs to training format
- `scripts/data_prep/actual_code/repo2fim.py` - Generate fill-in-the-middle examples from code
- `scripts/data_prep/merge_and_split.py` - Combine datasets and create train/eval splits
- `scripts/data_prep/qc_and_dedupe.py` - Quality control and deduplication
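For reference, Qwen2.5-Coder's fill-in-the-middle prompting uses dedicated special tokens; the sketch below shows that format, though the exact JSONL schema `repo2fim.py` emits is an assumption:

```python
# Qwen2.5-Coder FIM format: the model sees a prefix and a suffix and must
# generate the missing middle after <|fim_middle|>.
prefix = "def add(a, b):\n    return "
suffix = "\n"
middle = "a + b"  # the target span during training

fim_prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
example = {"text": fim_prompt + middle}  # hypothetical training-record shape
print(example["text"])
```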
### Training

- `scripts/training/sft_qlora_8bit.py` - Production 8-bit QLoRA training
- `scripts/training/sft_qlora_4bit.py` - Development 4-bit QLoRA training
### Testing

- `scripts/testing/baseline.py` - Test baseline model performance
- `scripts/testing/compare_baseline_vs_lora.py` - Compare baseline vs fine-tuned models
- `scripts/testing/check_lengths.py` - Analyze dataset token length distribution
### Utilities

- `scripts.sh` - Convenience wrapper with tmux session management
- Run `./scripts.sh help` for all available commands
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- **Qwen Team** for the base Qwen2.5-Coder model
- **Hugging Face** for transformers, datasets, and PEFT libraries
- **QLoRA authors** for memory-efficient fine-tuning methodology