Feature Request: Add Model Fine-tuning Support via Unsloth Backend #9054

@localai-bot

Description

Summary

This feature request proposes adding native model fine-tuning support to LocalAI by integrating Unsloth as a backend. This would enable users to fine-tune models directly through LocalAI's API and UI, with support for background jobs and progress tracking.

Motivation

Currently, LocalAI does not support fine-tuning endpoints. Users must manually fine-tune models using external tools (like Axolotl, as documented in docs/content/advanced/fine-tuning.md), then convert and import the resulting models. This workflow is:

  • Complex and error-prone
  • Dependent on knowledge of external tooling
  • Not integrated with LocalAI's model management UI
  • Lacking progress tracking and job management

Integrating Unsloth would provide:

  • Native fine-tuning API accessible via HTTP/gRPC
  • UI integration with background job support
  • Efficient fine-tuning using Unsloth's optimized implementations (up to 2x faster, 60% less memory)
  • Seamless workflow from fine-tuning to model deployment

Proposed Implementation

1. Backend Protocol Updates (backend/backend.proto)

Add a new gRPC service/method for fine-tuning operations:

// Fine-tuning service
service FineTuning {
  // Start a fine-tuning job
  rpc StartFineTuning(FineTuningRequest) returns (FineTuningJob) {}
  
  // Get fine-tuning job status
  rpc GetFineTuningJobStatus(FineTuningJobStatusRequest) returns (FineTuningJob) {}
  
  // List all fine-tuning jobs
  rpc ListFineTuningJobs(ListFineTuningJobsRequest) returns (ListFineTuningJobsResponse) {}
  
  // Cancel a fine-tuning job
  rpc CancelFineTuningJob(CancelFineTuningJobRequest) returns (Result) {}
}

// Fine-tuning request message
message FineTuningRequest {
  string base_model = 1;      // Base model to fine-tune (e.g., "llama-3-8b")
  string dataset_path = 2;    // Path to training dataset
  string dataset_format = 3;  // Dataset format (e.g., "alpaca", "conversational", "completion")
  string output_path = 4;     // Output directory for fine-tuned model
  FineTuningConfig config = 5; // Fine-tuning configuration
}

message FineTuningConfig {
  string technique = 1;       // Fine-tuning technique: "qlora", "lora", "full"
  int32 epochs = 2;           // Number of training epochs
  float learning_rate = 3;    // Learning rate
  int32 batch_size = 4;       // Training batch size
  int32 gradient_accumulation = 5;
  string quantization = 6;    // "4bit", "8bit", "none"
  map<string, string> extra_params = 7; // Additional unsloth parameters
}

message FineTuningJob {
  string job_id = 1;
  string status = 2;          // "pending", "running", "completed", "failed", "cancelled"
  string base_model = 3;
  string output_model = 4;    // Path to fine-tuned model (when completed)
  double progress = 5;        // 0.0 to 1.0
  string error_message = 6;   // Error details if failed
  int64 created_at = 7;
  int64 completed_at = 8;
}

message FineTuningJobStatusRequest {
  string job_id = 1;
}

message ListFineTuningJobsRequest {
  int32 limit = 1;
  string status_filter = 2;   // Optional: filter by status
}

message ListFineTuningJobsResponse {
  repeated FineTuningJob jobs = 1;
  int32 total = 2;
}

message CancelFineTuningJobRequest {
  string job_id = 1;
}

2. Python Backend Implementation (backend/python/unsloth/)

Create a new Unsloth backend following the existing Python backend pattern:

Directory structure:

backend/python/unsloth/
├── backend.py          # gRPC server implementing FineTuning service
├── Makefile
├── install.sh
├── protogen.sh
├── requirements.txt    # unsloth, torch, accelerate, etc.
├── run.sh
└── test.py

Key implementation details:

  • Use Unsloth's FastLanguageModel together with a TRL/Unsloth trainer (e.g. SFTTrainer) for efficient fine-tuning
  • Support QLoRA, LoRA, and full fine-tuning techniques
  • Integrate with LocalAI's gRPC infrastructure
  • Support hardware detection (CUDA, MLX, CPU) similar to other Python backends
  • Implement streaming progress updates during training
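The background-job and progress-tracking pieces can be sketched independently of the Unsloth training call itself. Below is a minimal Python sketch of a job runner the gRPC servicer could delegate to; `FineTuningJobRunner` is an illustrative name, the training step is stubbed out (a real backend would drive a trainer and update `progress` from a trainer callback), and the gRPC plumbing is omitted:

```python
import threading
import time
import uuid


class FineTuningJobRunner:
    """Runs one fine-tuning job in a background thread with thread-safe progress.

    The actual training step is a placeholder; a real backend would invoke
    Unsloth/TRL here and update `progress` from a trainer callback.
    """

    def __init__(self, base_model: str, total_steps: int = 100):
        self.job_id = str(uuid.uuid4())
        self.base_model = base_model
        self.status = "pending"
        self.progress = 0.0
        self.error_message = ""
        self._total_steps = total_steps
        self._cancel = threading.Event()
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        with self._lock:
            self.status = "running"
        self._thread.start()

    def cancel(self) -> None:
        self._cancel.set()

    def snapshot(self) -> dict:
        """Return the fields a FineTuningJob status response would carry."""
        with self._lock:
            return {"job_id": self.job_id, "status": self.status,
                    "progress": self.progress,
                    "error_message": self.error_message}

    def _run(self) -> None:
        try:
            for step in range(1, self._total_steps + 1):
                if self._cancel.is_set():
                    with self._lock:
                        self.status = "cancelled"
                    return
                # placeholder for one training step
                with self._lock:
                    self.progress = step / self._total_steps
            with self._lock:
                self.status = "completed"
        except Exception as exc:  # surface failures in the job record
            with self._lock:
                self.status = "failed"
                self.error_message = str(exc)
```

The status strings deliberately mirror the `FineTuningJob.status` values in the proto above, so `snapshot()` maps directly onto a gRPC status response.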

3. HTTP API Endpoints

Add new HTTP endpoints in core/http/routes/localai.go:

// POST /v1/fine-tuning/jobs - Start a fine-tuning job
// GET /v1/fine-tuning/jobs - List fine-tuning jobs
// GET /v1/fine-tuning/jobs/{job_id} - Get job status
// POST /v1/fine-tuning/jobs/{job_id}/cancel - Cancel a job

These endpoints should:

  • Validate input parameters
  • Submit jobs to the backend via gRPC
  • Return job IDs for tracking
  • Support async operation with status polling
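The real handlers would live in Go in core/http/routes/localai.go; the validation logic they need can be sketched in Python for brevity. Everything here (field names, defaults, error strings) is illustrative, derived from the proto messages above:

```python
# Illustrative validation for a POST /v1/fine-tuning/jobs request body.
ALLOWED_TECHNIQUES = {"qlora", "lora", "full"}
ALLOWED_QUANTIZATION = {"4bit", "8bit", "none"}


def validate_fine_tuning_request(body: dict) -> list:
    """Return a list of validation errors; an empty list means the body is OK."""
    errors = []
    for field in ("base_model", "dataset_path", "output_path"):
        if not body.get(field):
            errors.append("missing required field: %s" % field)
    cfg = body.get("config", {})
    if cfg.get("technique", "qlora") not in ALLOWED_TECHNIQUES:
        errors.append("config.technique must be one of: qlora, lora, full")
    if cfg.get("quantization", "4bit") not in ALLOWED_QUANTIZATION:
        errors.append("config.quantization must be one of: 4bit, 8bit, none")
    if not 0 < cfg.get("learning_rate", 2e-4) < 1:
        errors.append("config.learning_rate must be in (0, 1)")
    if cfg.get("epochs", 1) < 1:
        errors.append("config.epochs must be >= 1")
    return errors
```

A valid request returns no errors and would then be forwarded to the backend over gRPC, returning the generated job ID to the caller.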

4. UI Integration (React UI)

Add fine-tuning UI components in core/http/react-ui/:

New pages/views:

  • /fine-tuning - Main fine-tuning page with job listing
  • /fine-tuning/new - Create new fine-tuning job form
  • /fine-tuning/{job_id} - Job status and progress view

Features:

  • Select base model from available models
  • Upload or specify dataset path
  • Configure fine-tuning parameters (epochs, learning rate, quantization, etc.)
  • Real-time progress tracking (loss curve, ETA, current step)
  • Job history with ability to download/use fine-tuned models
  • Background job indicators in the UI
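The UI itself would be React/TypeScript, but the ETA shown in the progress view reduces to a small calculation over the `progress` field (0.0 to 1.0) from the job status endpoint. A sketch of that calculation, with an illustrative function name:

```python
import time
from typing import Optional


def estimate_eta(progress: float, started_at: float,
                 now: Optional[float] = None) -> Optional[float]:
    """Estimate remaining seconds from fractional progress (0.0 to 1.0).

    Returns None before any progress has been reported, since there is
    no signal to extrapolate from yet.
    """
    if now is None:
        now = time.time()
    if progress <= 0.0:
        return None
    elapsed = now - started_at
    # remaining time scales with the fraction of work left per unit done
    return elapsed * (1.0 - progress) / progress
```

Polling the status endpoint every few seconds and recomputing this value gives the progress view a live ETA without any extra backend support.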

5. Background Job Service

Integrate with LocalAI's existing job management:

  • Use the existing agent job service (/api/agent/jobs/*) or create a dedicated fine-tuning job service
  • Support job persistence and recovery
  • Provide webhooks or notifications for job completion
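Job persistence and recovery can be as simple as one JSON file per job. A minimal sketch, assuming a file-per-job layout (the function names and the "mark interrupted jobs as failed" recovery policy are illustrative choices, not existing LocalAI behavior):

```python
import json
from pathlib import Path


def save_job(jobs_dir: Path, job: dict) -> None:
    """Persist one job record as <jobs_dir>/<job_id>.json."""
    jobs_dir.mkdir(parents=True, exist_ok=True)
    (jobs_dir / ("%s.json" % job["job_id"])).write_text(json.dumps(job))


def recover_jobs(jobs_dir: Path) -> list:
    """Reload persisted jobs on startup.

    Jobs that were mid-run when the server stopped cannot be resumed by
    this simple scheme, so they are marked failed with an explanation.
    """
    recovered = []
    for path in sorted(jobs_dir.glob("*.json")):
        job = json.loads(path.read_text())
        if job.get("status") == "running":
            job["status"] = "failed"
            job["error_message"] = "interrupted by server restart"
        recovered.append(job)
    return recovered
```

A more ambitious design could checkpoint the trainer state and resume interrupted jobs instead of failing them, at the cost of extra disk and complexity.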

Technical Considerations

Unsloth Integration Benefits

  • Memory efficiency: 60% less VRAM usage compared to standard training
  • Speed: Up to 2x faster training with optimized kernels
  • Compatibility: Supports popular models (Llama, Mistral, Gemma, Qwen, etc.)
  • Quantization: Native 4-bit and 8-bit quantization support

Dataset Formats

Support common dataset formats:

  • Alpaca/Instruction format
  • Conversational format
  • Completion format
  • JSON/JSONL
  • Hugging Face datasets
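Converting between these formats is mostly mechanical. As an example, one Alpaca-style record (fields `instruction`, optional `input`, `output`) maps to a conversational message list like so:

```python
def alpaca_to_conversational(record: dict) -> list:
    """Convert one Alpaca-style record into a chat-style message list.

    The optional `input` field is appended to the instruction to form
    the user turn; the `output` field becomes the assistant turn.
    """
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n\n" + record["input"]
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": record["output"]},
    ]
```

The backend could apply such a converter per line of a JSONL dataset before handing the data to the trainer, so users can supply any supported format via the single `dataset_format` field.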

Model Export

  • Output in GGUF format for direct LocalAI consumption
  • Optional: Export in original format (Hugging Face)
  • Automatic model registration after fine-tuning
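Automatic registration could amount to writing a model config next to the exported GGUF. An illustrative fragment in the style of LocalAI's model YAML (the model name, backend identifier, and file path here are made up for the example):

```yaml
# Config LocalAI could write after a fine-tuning job completes,
# registering the exported GGUF for immediate use.
name: my-finetuned-llama
backend: llama-cpp
parameters:
  model: fine-tunes/job-1234/model-q4_k_m.gguf
context_size: 4096
```

With this file in the models directory, the fine-tuned model shows up in the model list and can be served without any manual import step.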

Resource Management

  • GPU memory monitoring and warnings
  • Support for multi-GPU training (via Unsloth's distributed training)
  • Configurable resource limits

Documentation Updates Required

  • docs/content/advanced/fine-tuning.md - Update with native API usage
  • docs/content/features/ - Add fine-tuning feature documentation
  • API documentation (Swagger/OpenAPI)
  • UI user guide for fine-tuning workflow
  • Example datasets and use cases

Priority

This feature would significantly enhance LocalAI's capabilities, enabling a complete MLOps workflow from model fine-tuning to deployment. Given the growing demand for customizable models and Unsloth's efficiency gains, this is a high-value addition to the platform.

Next Steps

  1. Create prototype Unsloth backend
  2. Implement gRPC service definition
  3. Add HTTP API endpoints
  4. Build React UI components
  5. Write documentation and examples
  6. Add CI/CD build configurations

Labels: enhancement, roadmap, backends, fine-tuning
Priority: High (aligns with LocalAI's goal of being a complete local AI platform)
