title: "Feature Request: Add Model Fine-tuning Support via Unsloth Backend"
labels: ["enhancement", "roadmap", "backends"]
Summary
This feature request proposes adding native model fine-tuning support to LocalAI by integrating Unsloth as a backend. This would enable users to fine-tune models directly through LocalAI's API and UI, with support for background jobs and progress tracking.
Motivation
Currently, LocalAI does not support fine-tuning endpoints. Users must manually fine-tune models using external tools (like Axolotl, as documented in docs/content/advanced/fine-tuning.md), then convert and import the resulting models. This workflow is:
- Complex and error-prone
- Dependent on knowledge of external tooling
- Not integrated with LocalAI's model management UI
- Lacking progress tracking and job management
Integrating Unsloth would provide:
- Native fine-tuning API accessible via HTTP/gRPC
- UI integration with background job support
- Efficient fine-tuning using Unsloth's optimized implementations (up to 2x faster, 60% less memory)
- Seamless workflow from fine-tuning to model deployment
Proposed Implementation
1. Backend Protocol Updates (backend/backend.proto)
Add a new gRPC service/method for fine-tuning operations:
```proto
// Fine-tuning service
service FineTuning {
  // Start a fine-tuning job
  rpc StartFineTuning(FineTuningRequest) returns (FineTuningJob) {}
  // Get fine-tuning job status
  rpc GetFineTuningJobStatus(FineTuningJobStatusRequest) returns (FineTuningJob) {}
  // List all fine-tuning jobs
  rpc ListFineTuningJobs(ListFineTuningJobsRequest) returns (ListFineTuningJobsResponse) {}
  // Cancel a fine-tuning job
  rpc CancelFineTuningJob(CancelFineTuningJobRequest) returns (Result) {}
}

// Fine-tuning request message
message FineTuningRequest {
  string base_model = 1;       // Base model to fine-tune (e.g., "llama-3-8b")
  string dataset_path = 2;     // Path to training dataset
  string dataset_format = 3;   // Dataset format (e.g., "alpaca", "conversational", "completion")
  string output_path = 4;      // Output directory for fine-tuned model
  FineTuningConfig config = 5; // Fine-tuning configuration
}

message FineTuningConfig {
  string technique = 1;     // Fine-tuning technique: "qlora", "lora", "full"
  int32 epochs = 2;         // Number of training epochs
  float learning_rate = 3;  // Learning rate
  int32 batch_size = 4;     // Training batch size
  int32 gradient_accumulation = 5; // Gradient accumulation steps
  string quantization = 6;  // "4bit", "8bit", "none"
  map<string, string> extra_params = 7; // Additional Unsloth parameters
}

message FineTuningJob {
  string job_id = 1;
  string status = 2;        // "pending", "running", "completed", "failed", "cancelled"
  string base_model = 3;
  string output_model = 4;  // Path to fine-tuned model (when completed)
  double progress = 5;      // 0.0 to 1.0
  string error_message = 6; // Error details if failed
  int64 created_at = 7;
  int64 completed_at = 8;
}

message FineTuningJobStatusRequest {
  string job_id = 1;
}

message ListFineTuningJobsRequest {
  int32 limit = 1;
  string status_filter = 2; // Optional: filter by status
}

message ListFineTuningJobsResponse {
  repeated FineTuningJob jobs = 1;
  int32 total = 2;
}

message CancelFineTuningJobRequest {
  string job_id = 1;
}
```

2. Python Backend Implementation (backend/python/unsloth/)
Create a new Unsloth backend following the existing Python backend pattern:
Directory structure:
backend/python/unsloth/
├── backend.py # gRPC server implementing FineTuning service
├── Makefile
├── install.sh
├── protogen.sh
├── requirements.txt # unsloth, torch, accelerate, etc.
├── run.sh
└── test.py
Key implementation details:
- Use Unsloth's FastLanguageModel and trainer integration for efficient fine-tuning
- Support QLoRA, LoRA, and full fine-tuning techniques
- Integrate with LocalAI's gRPC infrastructure
- Support hardware detection (CUDA, MLX, CPU) similar to other Python backends
- Implement streaming progress updates during training
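To make the lifecycle concrete, here is a minimal sketch of the job bookkeeping such a backend could use. All names (`FineTuningJob`, `JobManager`, `train_step`) are hypothetical, and the actual training call (Unsloth's model loading and trainer loop) is stubbed out so the threading and progress-reporting logic can be shown in isolation:

```python
import threading
import time
import uuid
from dataclasses import dataclass

@dataclass
class FineTuningJob:
    job_id: str
    base_model: str
    status: str = "pending"   # pending|running|completed|failed|cancelled
    progress: float = 0.0
    error_message: str = ""

class JobManager:
    """Runs fine-tuning jobs on background threads and tracks their status."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def start(self, base_model: str, train_step, total_steps: int) -> str:
        job = FineTuningJob(job_id=uuid.uuid4().hex, base_model=base_model)
        with self._lock:
            self._jobs[job.job_id] = job
        threading.Thread(
            target=self._run, args=(job, train_step, total_steps), daemon=True
        ).start()
        return job.job_id

    def _run(self, job, train_step, total_steps):
        job.status = "running"
        try:
            for step in range(total_steps):
                if job.status == "cancelled":
                    return
                train_step(step)  # real backend: one Unsloth trainer step
                job.progress = (step + 1) / total_steps
            job.status = "completed"
        except Exception as e:
            job.status, job.error_message = "failed", str(e)

    def get(self, job_id: str) -> FineTuningJob:
        with self._lock:
            return self._jobs[job_id]
```

In the real backend, `train_step` would wrap a trainer iteration over the loaded model, and the gRPC service methods would map directly onto `start`/`get` plus a cancellation flag.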
3. HTTP API Endpoints
Add new HTTP endpoints in core/http/routes/localai.go:
```go
// POST /v1/fine-tuning/jobs                 - Start a fine-tuning job
// GET  /v1/fine-tuning/jobs                 - List fine-tuning jobs
// GET  /v1/fine-tuning/jobs/{job_id}        - Get job status
// POST /v1/fine-tuning/jobs/{job_id}/cancel - Cancel a job
```

These endpoints should:
- Validate input parameters
- Submit jobs to the backend via gRPC
- Return job IDs for tracking
- Support async operation with status polling
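The validation step could look roughly like the following. This is a sketch in Python for brevity (the actual handler would be Go); the defaults and error strings are assumptions, not LocalAI's actual behavior:

```python
# Hypothetical input validation for POST /v1/fine-tuning/jobs.
VALID_TECHNIQUES = {"qlora", "lora", "full"}
VALID_QUANT = {"4bit", "8bit", "none"}

def validate_fine_tuning_request(req: dict) -> list:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = []
    for field in ("base_model", "dataset_path"):
        if not req.get(field):
            errors.append(f"missing required field: {field}")
    cfg = req.get("config", {})
    if cfg.get("technique", "qlora") not in VALID_TECHNIQUES:
        errors.append("technique must be one of qlora, lora, full")
    if cfg.get("quantization", "4bit") not in VALID_QUANT:
        errors.append("quantization must be one of 4bit, 8bit, none")
    if cfg.get("learning_rate", 2e-4) <= 0:
        errors.append("learning_rate must be positive")
    if cfg.get("epochs", 1) < 1:
        errors.append("epochs must be >= 1")
    return errors
```

A valid request would then be handed to the backend over gRPC, and the returned job ID echoed to the client for polling.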
4. UI Integration (React UI)
Add fine-tuning UI components in core/http/react-ui/:
New pages/views:
- /fine-tuning - Main fine-tuning page with job listing
- /fine-tuning/new - Create new fine-tuning job form
- /fine-tuning/{job_id} - Job status and progress view
Features:
- Select base model from available models
- Upload or specify dataset path
- Configure fine-tuning parameters (epochs, learning rate, quantization, etc.)
- Real-time progress tracking (loss curve, ETA, current step)
- Job history with ability to download/use fine-tuned models
- Background job indicators in the UI
5. Background Job Service
Integrate with LocalAI's existing job management:
- Use the existing agent job service (/api/agent/jobs/*) or create a dedicated fine-tuning job service
- Support job persistence and recovery
- Provide webhooks or notifications for job completion
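One simple way to get persistence and recovery, sketched below under the assumption of an append-only JSONL event log (the file layout and field names are illustrative, not LocalAI's actual schema):

```python
import json
import os
import tempfile

def append_job_event(path: str, job: dict) -> None:
    """Append one job state change as a JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(job) + "\n")

def recover_jobs(path: str) -> dict:
    """Replay the event log on startup; the last record per job_id wins."""
    jobs = {}
    if not os.path.exists(path):
        return jobs
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            jobs[rec["job_id"]] = rec
    return jobs
```

On recovery, jobs still marked "running" would need to be flagged as failed (or resumed from a training checkpoint, if one exists).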
Technical Considerations
Unsloth Integration Benefits
- Memory efficiency: 60% less VRAM usage compared to standard training
- Speed: Up to 2x faster training with optimized kernels
- Compatibility: Supports popular models (Llama, Mistral, Gemma, Qwen, etc.)
- Quantization: Native 4-bit and 8-bit quantization support
Dataset Formats
Support common dataset formats:
- Alpaca/Instruction format
- Conversational format
- Completion format
- JSON/JSONL
- Hugging Face datasets
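The backend would normalize these formats into the single text field most trainers consume. A minimal sketch for the Alpaca case, with an illustrative prompt template (real templates are model-specific):

```python
# Hypothetical Alpaca-record -> training-text conversion.
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "{input_block}### Response:\n{output}"
)

def alpaca_to_text(record: dict) -> str:
    """Render one Alpaca-style record into a single prompt/response string."""
    input_block = ""
    if record.get("input"):
        input_block = f"### Input:\n{record['input']}\n\n"
    return ALPACA_TEMPLATE.format(
        instruction=record["instruction"],
        input_block=input_block,
        output=record["output"],
    )
```

Equivalent converters for the conversational and completion formats would feed the same downstream field, keeping the trainer format-agnostic.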
Model Export
- Output in GGUF format for direct LocalAI consumption
- Optional: Export in original format (Hugging Face)
- Automatic model registration after fine-tuning
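Automatic registration could amount to writing a model definition next to the exported GGUF so it shows up in LocalAI immediately. The fragment below follows LocalAI's model config YAML shape as an assumption; the exact schema and backend name should be verified against the current docs, and the file names are placeholders:

```yaml
# Hypothetical auto-generated model definition after a fine-tuning job.
name: my-finetuned-model
backend: llama-cpp
parameters:
  model: my-finetuned-model.Q4_K_M.gguf
```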
Resource Management
- GPU memory monitoring and warnings
- Support for multi-GPU training (via Unsloth's distributed training)
- Configurable resource limits
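A pre-flight VRAM check could use a rough heuristic like the one below. The constants are assumptions to be calibrated against real measurements, not Unsloth's actual numbers:

```python
# Rough heuristic: bytes per parameter by quantization, times an assumed
# overhead multiplier for adapters, gradients, and activations during
# LoRA-style training.
BYTES_PER_PARAM = {"4bit": 0.5, "8bit": 1.0, "none": 2.0}  # "none" = fp16
TRAINING_OVERHEAD = 1.5  # assumed multiplier, to be calibrated

def estimate_vram_gb(num_params_b: float, quantization: str) -> float:
    """Estimate training VRAM in GiB for a model with num_params_b billion parameters."""
    weights_gb = num_params_b * BYTES_PER_PARAM[quantization]
    return weights_gb * TRAINING_OVERHEAD

def check_fits(num_params_b: float, quantization: str, gpu_gb: float) -> bool:
    """Warn-or-proceed gate before submitting a job to the backend."""
    return estimate_vram_gb(num_params_b, quantization) <= gpu_gb
```

The UI could surface this as a warning ("estimated 6 GiB needed, 4 GiB available") rather than a hard block, since the estimate is coarse.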
Documentation Updates Required
- docs/content/advanced/fine-tuning.md - Update with native API usage
- docs/content/features/ - Add fine-tuning feature documentation
- API documentation (Swagger/OpenAPI)
- UI user guide for fine-tuning workflow
- Example datasets and use cases
References
- Unsloth GitHub
- Existing LocalAI Fine-tuning Guide
- Issue #596 - Previous discussion on fine-tuning support
- Python Backend README - Backend implementation guide
- Adding Backends Guide - Steps for adding new backends
Priority
This feature would significantly enhance LocalAI's capabilities, enabling a complete MLOps workflow from model fine-tuning to deployment. Given the growing demand for customizable models and Unsloth's efficiency gains, this is a high-value addition to the platform.
Next Steps
- Create prototype Unsloth backend
- Implement gRPC service definition
- Add HTTP API endpoints
- Build React UI components
- Write documentation and examples
- Add CI/CD build configurations
Labels: enhancement, roadmap, backends, fine-tuning
Priority: High (aligns with LocalAI's goal of being a complete local AI platform)