
Model Fine-Tuning Pipeline

Python 3.12+ | License: MIT | GPU: RTX 2080 Ti+

Fine-tune coding language models on domain-specific data using QLoRA

This repository provides a complete pipeline for fine-tuning the Qwen2.5-Coder-3B-Instruct model on custom datasets. Train models that understand your specific codebase, coding patterns, and domain knowledge using memory-efficient QLoRA on accessible GPU hardware.

Features

  • Memory-efficient training - 8-bit QLoRA quantization needs only ~11GB of VRAM
  • Multi-source data processing - Combine conversation logs, git history, and code completion examples
  • Production-ready tools - Comprehensive monitoring, evaluation, and comparison utilities
  • Automated workflows - One-command training with integrated monitoring via tmux
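All three data sources are normalized into chat-style SFT records before training. A hypothetical JSONL record is sketched below; the actual schema produced by the data_prep scripts may differ.

```python
import json

# Hypothetical chat-style SFT record; the field names ("messages", "role",
# "content") are an assumption about the pipeline's output format.
record = {
    "messages": [
        {"role": "user", "content": "Why does this raise KeyError?"},
        {"role": "assistant", "content": "The dict lacks that key; use .get() with a default."},
    ]
}
line = json.dumps(record)  # one JSON object per line in the .jsonl files
```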

Getting Started

Prerequisites

  • Hardware: NVIDIA RTX 2080 Ti or better (11GB+ VRAM)
  • Software: Python 3.12+, CUDA 11.8+
  • Storage: 50GB+ available space
  • Memory: 16GB+ RAM recommended

Installation

git clone https://github.com/your-username/Model_Finetuning.git
cd Model_Finetuning
uv sync && source .venv/bin/activate

# Verify the GPU is visible to PyTorch
python -c "import torch; print(torch.cuda.is_available())"

Project Structure

Model_Finetuning/
├── data/                           # Training datasets (gitignored)
│   ├── claude_logs/               # Processed Claude conversation logs
│   ├── git_history/               # Git commit → SFT examples
│   ├── actual_code/               # FIM examples from codebases
│   └── staging/                   # Final train/eval datasets
├── scripts/
│   ├── data_prep/                 # Data processing pipeline
│   │   ├── claude_logs/          # Claude log → SFT conversion
│   │   └── actual_code/          # Code → FIM generation
│   ├── training/                  # QLoRA training scripts
│   └── testing/                   # Model evaluation tools
├── outputs/                       # Model checkpoints (gitignored)
├── runs/                          # TensorBoard logs (gitignored)
└── logs/                          # Training logs (gitignored)

Quick Start

# 1. Prepare training data
python scripts/data_prep/claude_logs/claude2sft.py --src data/claude_logs/ --out data/train.clean.jsonl
python scripts/data_prep/actual_code/repo2fim.py --repo /path/to/repo --out data/train.fim.jsonl
python scripts/data_prep/merge_and_split.py data/*.jsonl --name full --eval_frac 0.15

# 2. Start training with monitoring
./scripts.sh tmux-train

# 3. Evaluate results
python scripts/testing/compare_baseline_vs_lora.py \
  --prompt "How do I fix a KeyError in Python?" \
  --ckpt outputs/qwen3b_lora_8bit/checkpoint-650

For detailed workflows and options, see the Scripts Reference below.
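The `--eval_frac 0.15` flag in step 1 holds out 15% of the merged records for evaluation. A minimal sketch of such a split (the real merge_and_split.py may shuffle and dedupe differently) looks like:

```python
import random

# Minimal 85/15 train/eval split sketch; merge_and_split.py's exact
# behavior (seeding, dedup order) is an assumption here.
def split_jsonl(records, eval_frac=0.15, seed=0):
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_frac)
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, eval)

train, eval_set = split_jsonl([{"id": i} for i in range(100)])
```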

Training Configuration

| Parameter | Value | Description |
|---|---|---|
| Base Model | Qwen/Qwen2.5-Coder-3B-Instruct | Pre-trained coding language model |
| Method | 8-bit QLoRA | Memory-efficient fine-tuning |
| LoRA Config | r=32, α=64, dropout=0.05 | Low-rank adaptation parameters |
| Learning Rate | 2e-5 (cosine schedule) | Training rate with warm-up |
| Batch Size | 24 (via gradient accumulation) | Effective batch size |
| Max Length | 2048 tokens | Maximum sequence length |
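The LoRA row of the table maps directly onto the keyword arguments that Hugging Face `peft`'s `LoraConfig` accepts. A sketch as a plain dict (the target modules are an assumption; check scripts/training/sft_qlora_8bit.py for the real values):

```python
# LoRA hyperparameters from the table above, as peft-style kwargs.
lora_cfg = dict(
    r=32,                # rank of the low-rank update matrices
    lora_alpha=64,       # scaling factor; effective scale = alpha / r = 2.0
    lora_dropout=0.05,   # dropout applied to the adapter layers
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)
```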

Monitoring

Training progress is monitored via TensorBoard and tmux sessions:

# Start training with integrated monitoring
./scripts.sh tmux-train

# Manual monitoring commands
./scripts.sh gpu                    # GPU utilization
./scripts.sh logs                   # Training logs
tensorboard --logdir runs/qwen3b_8bit --port 6006

Serving (vLLM + WebUI) and Limits

Start the local API server (OpenAI-compatible via vLLM) and WebUI (defaults tuned for 11GB GPUs):

# Start vLLM + WebUI with defaults (ctx=4096, max-seqs=2, gpu-mem=0.92)
scripts/serving/setup_webui.sh

# Only vLLM (headless)
scripts/serving/setup_webui.sh start_vllm_only

# Configure context and soft caps
scripts/serving/setup_webui.sh \
  --ctx=8192 \
  --max-seqs=8 \
  --rope=type=linear,factor=2.0 \
  --max-output-tokens=256 \
  --max-completions=1

Notes:

  • --ctx maps to vLLM --max-model-len. You can enable RoPE scaling with --rope, but quality may degrade.
  • --max-output-tokens and --max-completions are passed to the WebUI as defaults; vLLM does not enforce these without a proxy.
  • To avoid 400 errors from overlong prompts, prefer truncation in your client (e.g., Continue) and keep added context lean.
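Since vLLM exposes an OpenAI-compatible API, any OpenAI-style client works. The sketch below builds a request payload that stays within the soft caps above; the model name, port, and endpoint path are assumptions — check the output of setup_webui.sh for the actual values.

```python
import json

BASE_URL = "http://localhost:8000/v1"  # assumed vLLM port

def build_chat_request(prompt: str, max_tokens: int = 256, n: int = 1) -> dict:
    """Build a /chat/completions payload that respects the soft caps above."""
    return {
        "model": "qwen3b-lora",  # assumed served-model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # stay within --max-output-tokens
        "n": n,                    # keep completions at 1
    }

payload = build_chat_request("How do I fix a KeyError in Python?")
# Send with e.g. requests.post(f"{BASE_URL}/chat/completions", json=payload)
print(json.dumps(payload, indent=2))
```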

Continue (VS Code) quick tips to stay within context:

  • Set a model-level input cap and truncation in ~/.continue/config.json (names vary by version: look for truncateToFit, maxInputTokens, maxPromptTokens, or similar).
  • Limit context providers (disable repo-wide or diff providers by default) and unpin large contexts.
  • Keep n (the number of completions) at 1 and set maxTokens to a sane default (e.g., 256–512).
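As an illustration only (field names differ across Continue versions, as noted above), a model entry in ~/.continue/config.json pointing at the local server with an output cap might look like:

```json
{
  "models": [
    {
      "title": "Qwen3B LoRA (vLLM)",
      "provider": "openai",
      "apiBase": "http://localhost:8000/v1",
      "model": "qwen3b-lora",
      "completionOptions": { "maxTokens": 256 }
    }
  ]
}
```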

Use Cases

  • Code completion - Generate contextually relevant code suggestions
  • Debugging assistance - Provide domain-specific error analysis and fixes
  • Code documentation - Generate documentation matching project style
  • Codebase Q&A - Answer questions about specific codebases and patterns

Scripts Reference

Data Processing

  • scripts/data_prep/claude_logs/claude2sft.py - Convert conversation logs to training format
  • scripts/data_prep/actual_code/repo2fim.py - Generate fill-in-the-middle examples from code
  • scripts/data_prep/merge_and_split.py - Combine datasets and create train/eval splits
  • scripts/data_prep/qc_and_dedupe.py - Quality control and deduplication
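For orientation, a fill-in-the-middle (FIM) record of the kind repo2fim.py generates can be sketched as below, assuming Qwen2.5-Coder's FIM special tokens; the script's exact output schema is an assumption here.

```python
# Hedged sketch of a FIM training example using Qwen2.5-Coder-style tokens.
def make_fim_example(prefix: str, middle: str, suffix: str) -> dict:
    # The model is prompted with prefix + suffix and trained to emit the middle.
    prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
    return {"prompt": prompt, "completion": middle}

ex = make_fim_example("def add(a, b):\n    ", "return a + b", "\n")
```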

Training

  • scripts/training/sft_qlora_8bit.py - Production 8-bit QLoRA training
  • scripts/training/sft_qlora_4bit.py - Development 4-bit QLoRA training

Evaluation

  • scripts/testing/baseline.py - Test baseline model performance
  • scripts/testing/compare_baseline_vs_lora.py - Compare baseline vs fine-tuned models
  • scripts/testing/check_lengths.py - Analyze dataset token length distribution
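A length analysis like check_lengths.py's boils down to tokenizing each record and summarizing the counts. A hedged, tokenizer-agnostic sketch (the "text" field name is an assumption; the real script likely uses the model's tokenizer):

```python
import json

def length_stats(jsonl_lines, tokenize):
    """Return (min, max, mean) token counts over JSONL records with a 'text' field."""
    lengths = [len(tokenize(json.loads(line)["text"])) for line in jsonl_lines]
    return min(lengths), max(lengths), sum(lengths) / len(lengths)

rows = [
    json.dumps({"text": "def add(a, b): return a + b"}),
    json.dumps({"text": "print('hello')"}),
]
lo, hi, mean = length_stats(rows, str.split)  # str.split as a stand-in tokenizer
```

Records near the 2048-token Max Length cap are candidates for truncation or exclusion before training.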

Utilities

  • scripts.sh - Convenience wrapper with tmux session management
  • Run ./scripts.sh help for all available commands

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
