QTinker - Distill & Quantize with TorchAO

A modern, production-ready application for distilling and quantizing language models using TorchAO, with intelligent GPU/CPU management and Pinokio launcher support. Built by Manat, this project is still in its early stages, so contributions are welcome.

Features

  • 🎯 Flexible Model Loading: Support for HuggingFace models, Stable Diffusion models, and PyTorch checkpoints
  • 🖼️ Stable Diffusion & Diffusers Support: Full support for all SD models (1.5, 2.x, SDXL) and other diffusers models
    • Complete pipeline loading (model_index.json)
    • Individual component loading (UNet, VAE, Text Encoder)
    • Automatic model architecture detection
    • Raw state_dict support with intelligent wrapping
  • 🤗 Comprehensive BERT Support: 15+ BERT model variants (Large, Base, Small, Multilingual) - NO HuggingFace token required
    • Automatic model downloading from Google Cloud Storage
    • BERT-Large variants for teacher models (uncased, cased, whole-word masking)
    • BERT-Small variants for student models (small, mini, tiny, medium)
    • Multilingual support (104 languages, Chinese)
    • DistilBERT variants documentation
    • Automatic MODEL_REGISTRY.md generation
  • 🧪 Advanced Distillation Strategies: Multiple knowledge distillation methods including:
    • Logit-based Knowledge Distillation (KD)
    • Patient Knowledge Distillation (matching specific layers)
    • Feature-based Distillation (intermediate layer matching)
    • AttentionDistillationKD: Matches attention maps between teacher and student models
    • Custom projection layers for dimension matching
    • Configurable temperature parameters
  • TorchAO Quantization: Professional-grade quantization with multiple options:
    • INT4 Weight-Only (group_size configurable)
    • INT8 Dynamic Quantization
    • SmoothQuant (INT4): Optimized for models with activation outliers
    • NormalFloat-4 (NF4): Support for QLoRA-style NormalFloat data types
    • GPTQ (4-bit): Post-training quantization with group-wise weight adjustments
    • FP8 and other advanced formats
    • Dynamic parameter UI (Alpha for SmoothQuant, Group Size for GPTQ)
    • Model-specific quantization configurations
  • 📂 Enhanced File Browser: Native file dialog integration for easy model selection with smart default paths
  • 🎨 Gradio Web UI: Beautiful, responsive web interface with real-time log streaming and custom dark theme
  • 📦 Modular Architecture: Clean separation of concerns with pluggable components
  • 🖥️ Smart GPU/CPU Management: Automatic device selection and switching with CUDA support for inference engines
  • 💾 Memory Efficient Processing: Intelligent VRAM monitoring and fallback strategies
  • 🚀 Pinokio Launcher Integration: One-click installation, start, update, and reset with robust cross-platform support
  • 🔧 Model Selection UI: Interactive file pickers for teacher, student, and target models
  • 📊 Registry System: Comprehensive model registry for tracking supported architectures
  • 🔗 Symbolic Linking: Automatic model linking for seamless integration

Project Structure

QTinker/
├── app/
│   ├── app.py              # Main entry point (Pinokio compatible)
│   ├── gradio_ui.py        # Full Gradio web interface
│   ├── distillation.py     # Advanced distillation strategies (KD, Patient-KD, etc.)
│   ├── distill_quant_app.py # Legacy desktop UI (Tkinter)
│   ├── model_loader.py     # Unified model loading utilities
│   ├── download_models.py  # Model download and management
│   ├── registry.py         # Model architecture registry
│   ├── run_distillation.py # Distillation pipeline executor
│   ├── bert_models/        # BERT model implementations
│   ├── distilled/          # Output: Distilled models
│   └── quantized/          # Output: Quantized models
├── config/
│   ├── paths.yaml          # Path configuration
│   ├── quant_presets.yaml  # Quantization presets
│   └── settings.yaml       # Application settings
├── configs/
│   └── torchao_configs.py  # TorchAO quantization configurations
├── core/
│   ├── device_manager.py   # GPU/CPU management
│   ├── distillation.py     # Core distillation logic
│   ├── local_llm.py        # Local LLM utilities
│   └── logic.py            # Main pipeline logic
├── settings/
│   └── app_settings.py     # Global application settings
├── data/
│   └── train_prompts.txt   # Training prompts for distillation
├── outputs/                # Output directories
│   ├── distilled/          # Distilled model artifacts
│   └── quantized/          # Quantized model artifacts
├── install.js             # Pinokio installation script
├── start.js               # Pinokio launcher script
├── update.js              # Pinokio update script
├── reset.js               # Pinokio reset script
├── pinokio.js             # Pinokio UI definition
├── select_teacher_model.js # Teacher model selector
├── select_student_model.js # Student model selector
├── select_quantize_model.js # Quantization target selector
├── distill_quantize.js    # Combined distill & quantize trigger
├── link.js                # Model symbolic linking
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Project configuration
├── pinokio_meta.json      # Metadata and state persistence
└── README.md              # This file

Installation

Automatic Installation (Recommended via Pinokio)

Simply open the project in Pinokio and click the "Install" button. The launcher will automatically:

  1. Create a Python virtual environment
  2. Install all dependencies using uv pip
  3. Set up PyTorch with CUDA support (if available)
  4. Configure the application

Manual Installation with pip

pip install -r requirements.txt

Manual Installation with uv (recommended for Pinokio)

uv pip install -r requirements.txt

CUDA Support

If you need specific CUDA-enabled PyTorch wheels:

# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
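
To verify that the CUDA-enabled build was actually installed, a quick check from Python (standard torch calls only) is:

import torch

print(torch.__version__)                  # e.g. 2.x.y+cu121
print(torch.cuda.is_available())          # True if a CUDA GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first GPU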

Supported Model Types

QTinker supports a wide variety of model formats and automatically detects the model type:

🤗 HuggingFace Models

  • Transformers: BERT, GPT-2, LLaMA, Mistral, Phi, any AutoModel
  • Sentence Transformers: Embedding models
  • Vision: ViT, CLIP, DINOv2
  • Audio: Whisper, Wav2Vec2
  • Any HuggingFace model with config.json

How to use: Provide path to HuggingFace model folder containing config.json

🖼️ Stable Diffusion & Diffusers

  • Stable Diffusion 1.5: All community checkpoints
  • Stable Diffusion 2.x: v2.0, v2.1, variations
  • SDXL: Stable Diffusion XL and variants
  • Other Diffusers: ControlNet, LoRA-compatible models, custom pipelines
  • Components: Individual UNet, VAE, or Text Encoder models

How to use:

  • Full pipeline (recommended): Provide folder with model_index.json
  • Components: Provide folder with unet/, vae/, or text_encoder/ subfolders
  • Raw state_dict: Provide .bin file - the app will auto-detect and wrap it

📦 PyTorch Checkpoints

  • Raw .pt or .bin files containing model weights (state_dict)
  • Custom model architectures
  • Fine-tuned model checkpoints

How to use: Provide path to .pt or .bin file - the app will intelligently wrap it

🔄 Auto-Detection

The app automatically detects the model type by examining:

  1. Directory structure and config files
  2. JSON metadata files
  3. File extensions and content
  4. State dict keys (for raw weights)

No need to manually select the model type - just provide the path!
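
For illustration, a simplified version of that detection logic could look like the sketch below (detect_model_type is a hypothetical helper written for this README, not necessarily what model_loader.py implements):

import os

def detect_model_type(path: str) -> str:
    """Guess the model format from a path, mirroring the checks listed above."""
    if os.path.isdir(path):
        if os.path.exists(os.path.join(path, "model_index.json")):
            return "Diffusers"                 # full Stable Diffusion pipeline
        if any(os.path.isdir(os.path.join(path, d)) for d in ("unet", "vae", "text_encoder")):
            return "Diffusers"                 # individual pipeline components
        if os.path.exists(os.path.join(path, "config.json")):
            return "HuggingFace folder"        # transformers-style model
    if path.endswith((".pt", ".bin")):
        return "PyTorch .pt/.bin file"         # raw checkpoint / state_dict
    return "HuggingFace folder"                # assume a Hub model id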

Usage

Using Pinokio Launcher (Easiest)

  1. Open QTinker in Pinokio
  2. Click "Install" (first time only) to set up dependencies
  3. Click "Start" to launch the Gradio web interface
  4. The interface will open automatically in your browser
  5. Use the web UI to select models and run distillation/quantization
  6. Use the model selector tools in the sidebar to configure your models:
    • Select Teacher Model: Choose the teacher model for knowledge distillation
    • Select Student Model: Choose the student model to be distilled
    • Select Quantize Model: Choose the model to quantize
    • Distill & Quantize: Run the complete pipeline

Running the Application Directly

python app/app.py

Or directly access the Gradio UI:

cd app
python gradio_ui.py

The Gradio interface will be available at http://localhost:7860
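
If you need a different host or port, a minimal Gradio launch looks roughly like this (a generic sketch, not necessarily how gradio_ui.py is written):

import gradio as gr

def echo(text):
    return text

demo = gr.Interface(fn=echo, inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0", server_port=7860)  # 7860 is Gradio's default port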

Using the Web Interface

  1. Model Selection: Use the sidebar buttons to select teacher, student, and quantization target models
  2. Model Path: Enter the path to your model (HuggingFace folder, Stable Diffusion folder, or PyTorch checkpoint)
  3. Model Type: The type is auto-detected, but you can override it from the dropdown:
    • HuggingFace folder
    • Diffusers (Stable Diffusion and other diffusers models)
    • PyTorch .pt/.bin file
  4. Quantization Type: Choose your quantization method:
    • INT4 (weight-only) - More aggressive compression
    • INT8 (dynamic) - Better accuracy with moderate compression
  5. Distillation Strategy (if applicable):
    • Logit KD - Match output logits
    • Patient KD - Match intermediate layers
  6. Run: Click "Run Distill + Quantize" to start the pipeline
  7. Monitor: Watch real-time log output for progress and debugging

Programmatic Usage

from core.logic import run_pipeline

# Run the complete pipeline
distilled_path, quantized_path = run_pipeline(
    model_path="microsoft/phi-2",
    model_type="HuggingFace folder",
    quant_type="INT8 (dynamic)",
    log_fn=print
)

For advanced usage with custom distillation strategies:

from app.distillation import LogitKD, PatientKD
from app.model_loader import load_model
from core.device_manager import DeviceManager

# Create device manager
device_manager = DeviceManager()
device = device_manager.get_device()

# Load teacher and student models
teacher_model = load_model("teacher_path")
student_model = load_model("student_path")

# Apply distillation strategy
strategy = LogitKD(teacher_model, student_model, temperature=3.0)
loss = strategy.compute_loss(student_outputs, teacher_outputs)

Configuration

TorchAO Quantization Configs

Edit configs/torchao_configs.py to customize quantization settings:

from torchao.quantization.configs import Int4WeightOnlyConfig, Int8DynamicConfig

# INT4 Configuration
Int4WeightOnlyConfig(
    group_size=128,          # Default: 128 (lower = more granular, slower)
    inner_k_tiles=8,         # Tiling for optimization
    padding_allowed=True     # Allow padding for performance
)

# INT8 Configuration  
Int8DynamicConfig(
    act_range_method="minmax"  # Range calculation method
)

Application Settings

Edit settings/app_settings.py to customize:

  • Output directories
  • Default model/quantization types
  • GPU/CPU management thresholds
  • Device switching behavior
  • Memory limits

Example:

# Device Management
MIN_VRAM_GB = 2.0              # Minimum VRAM to use GPU
VRAM_THRESHOLD = 0.9           # Use CPU if model > 90% of VRAM
AUTO_DEVICE_SWITCHING = True   # Enable automatic switching

# Output Directories
DISTILLED_OUTPUT_DIR = "outputs/distilled/"
QUANTIZED_OUTPUT_DIR = "outputs/quantized/"

# Model Defaults
DEFAULT_MODEL_TYPE = "HuggingFace folder"
DEFAULT_QUANT_TYPE = "INT8 (dynamic)"

Model Registry

The registry.py file maintains a comprehensive registry of supported model architectures with their optimal configurations:

SUPPORTED_MODELS = {
    "phi-2": {
        "type": "causal-lm",
        "default_quant": "INT4",
        "supports_distillation": True
    },
    "bert-base-uncased": {
        "type": "masked-lm",
        "default_quant": "INT8",
        "supports_distillation": True
    },
    # ... more models
}

Advanced Features

Knowledge Distillation Strategies

QTinker supports multiple knowledge distillation methods:

1. Logit-based Knowledge Distillation (LogitKD)

Matches the output logits between teacher and student models using KL divergence with temperature scaling.

from app.distillation import LogitKD

strategy = LogitKD(teacher_model, student_model, temperature=3.0)
loss = strategy.compute_loss(student_outputs, teacher_outputs)

Best for: General-purpose distillation, good baseline for most architectures
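
Conceptually, the loss is a temperature-scaled KL divergence between the softened teacher and student distributions. A minimal sketch of the idea (the actual LogitKD class may differ in details):

import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, temperature=3.0):
    """KL divergence between softened teacher and student output distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2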

2. Patient Knowledge Distillation (PatientKD)

Matches hidden states at specific layers between teacher and student models. Useful when student architecture differs significantly from teacher.

import torch.nn.functional as F

from app.distillation import PatientKD

strategy = PatientKD(
    teacher_model, 
    student_model,
    student_layers=[2, 4, 6],      # Layers to extract from student
    teacher_layers=[4, 8, 12],     # Corresponding teacher layers
    loss_fn=F.mse_loss
)
loss = strategy.compute_loss(student_outputs, teacher_outputs)

Best for: Custom architectures, fine-grained control, layer-specific matching
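
Under the hood, this style of distillation compares hidden states at the selected layer pairs, e.g. obtained with output_hidden_states=True. A rough sketch of the idea (the actual PatientKD class may differ):

import torch.nn.functional as F

def patient_kd_loss(student_hidden, teacher_hidden, student_layers, teacher_layers):
    """Average MSE between matched student/teacher hidden-state layers."""
    losses = [
        F.mse_loss(student_hidden[s], teacher_hidden[t])
        for s, t in zip(student_layers, teacher_layers)
    ]
    return sum(losses) / len(losses)

If the hidden sizes differ, a projection layer (see the next section) can map the student states into the teacher's dimension before the loss is computed.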

3. Projection Layer Matching

Automatically handles dimension mismatches between teacher and student hidden states:

from app.distillation import ProjectionLayer

projection = ProjectionLayer(student_dim=768, teacher_dim=1024)
projected_student = projection(student_hidden_states)

Best for: Distilling to significantly smaller models
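
In its simplest form, such a projection is just a learned linear map from the student's hidden size to the teacher's. A conceptual sketch (not necessarily the exact ProjectionLayer implementation):

import torch.nn as nn

class LinearProjection(nn.Module):
    """Map student hidden states (student_dim) into the teacher's space (teacher_dim)."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden_states):
        return self.proj(student_hidden_states)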

Device Management System

The intelligent device manager ensures optimal GPU/CPU utilization:

  • Automatic GPU Detection: Detects CUDA (NVIDIA), MPS (Apple Silicon), or CPU
  • VRAM Monitoring: Real-time GPU memory tracking
  • Automatic Fallback: Seamlessly switches to CPU when:
    • Less than 2GB VRAM available
    • Model size exceeds 90% of available VRAM
    • GPU runs out of memory during processing
  • Memory Efficiency: Models loaded on CPU first, then moved to GPU if appropriate
  • Cache Management: Automatic GPU cache clearing between operations

from core.device_manager import DeviceManager

device_manager = DeviceManager()
device = device_manager.get_device()
print(f"Using device: {device}")  # Outputs: cuda, mps, or cpu

Model Loading Utilities

Unified model loading with automatic format detection:

from app.model_loader import load_model

# Supports multiple formats
model = load_model("facebook/opt-350m")  # HuggingFace
model = load_model("./local_model.pt")   # Local PyTorch
model = load_model("./model/")           # Local folder

Model Registry System

Track and manage supported model architectures:

from app.registry import ModelRegistry

registry = ModelRegistry()
supported_models = registry.get_supported_models()
config = registry.get_model_config("phi-2")

Dependencies

  • torch>=2.0.0 - PyTorch deep learning framework
  • torchao>=0.1.0 - TorchAO quantization library
  • transformers>=4.30.0 - HuggingFace transformers for model loading
  • gradio>=4.0.0 - Web UI framework for interactive interface
  • accelerate>=0.20.0 - Model acceleration utilities
  • pyyaml - Configuration file handling
  • numpy - Numerical computing

Full dependency list available in requirements.txt

GPU/CPU Management

The application automatically manages device selection:

  • GPU Detection: Automatically detects and uses CUDA (NVIDIA) or MPS (Apple Silicon) when available
  • VRAM Monitoring: Monitors GPU memory usage and switches to CPU when VRAM is limited
  • Automatic Fallback: Falls back to CPU if:
    • Less than 2GB VRAM is available
    • Model size exceeds 90% of available VRAM
    • GPU runs out of memory during processing
  • Memory Efficient: Loads models on CPU first, then moves to GPU if appropriate
  • Cache Management: Automatically clears GPU cache between operations

Device Settings

You can adjust device management behavior in settings/app_settings.py:

MIN_VRAM_GB = 2.0  # Minimum VRAM required to use GPU
VRAM_THRESHOLD = 0.9  # Use CPU if model size > VRAM * threshold
AUTO_DEVICE_SWITCHING = True  # Enable automatic device switching
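
As an illustration, these thresholds could be applied with a check along the following lines (a rough sketch using torch.cuda.mem_get_info; the actual DeviceManager logic may differ):

import torch

def pick_device(model_size_gb: float, min_vram_gb: float = 2.0, vram_threshold: float = 0.9) -> str:
    """Return "cuda" only if enough free VRAM is available, otherwise "cpu"."""
    if not torch.cuda.is_available():
        return "cpu"
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    free_gb = free_bytes / 1024 ** 3
    if free_gb < min_vram_gb or model_size_gb > free_gb * vram_threshold:
        return "cpu"
    return "cuda"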

Output Directories

All output models are saved in standard HuggingFace format for easy reuse:

  • Distilled Models: outputs/distilled/ - Models after knowledge distillation
  • Quantized Models: outputs/quantized/ - Models after quantization

Each output includes:

  • Model weights and architecture
  • Tokenizers (when available)
  • Configuration files
  • Quantization metadata

GPU/CPU Management

Automatic Device Selection

The application automatically manages device selection based on available hardware:

GPU Detection:

  • NVIDIA CUDA GPUs
  • Apple Silicon (MPS)
  • CPU fallback

Memory Management:

  • Monitors GPU VRAM in real-time
  • Prevents out-of-memory errors
  • Switches to CPU when necessary

Threshold Settings (configurable in settings/app_settings.py):

  • MIN_VRAM_GB: Minimum VRAM required (default: 2.0)
  • VRAM_THRESHOLD: Use CPU if model > X% of VRAM (default: 0.9 = 90%)
  • AUTO_DEVICE_SWITCHING: Enable/disable automatic switching (default: True)

Device Switching Behavior

The system falls back to CPU if:

  • Less than 2GB VRAM available
  • Estimated model size exceeds 90% of available VRAM
  • GPU runs out of memory during processing

Manual Device Configuration

from core.device_manager import DeviceManager

device_manager = DeviceManager(
    min_vram_gb=2.0,
    vram_threshold=0.9,
    auto_switching=True
)

device = device_manager.get_device()

Troubleshooting

Stable Diffusion Model Loading Errors

Problem: "Loaded object is not a torch.nn.Module" when loading Stable Diffusion models Solution: The app now automatically detects and handles Stable Diffusion models in multiple formats:

  1. Full Pipeline (with model_index.json): Automatically loaded as StableDiffusionPipeline
  2. Component-based (separate UNet/VAE/TextEncoder folders): Loaded as individual components
  3. Raw state_dict files (.bin, .pt): Intelligently wrapped based on component type
  4. Different SD versions: Supports SD 1.5, 2.x, SDXL, and other diffusers models

The loader will:

  • Detect model architecture from directory structure
  • Load UNet, VAE, Text Encoder as appropriate components
  • Handle raw state_dict files by analyzing keys and wrapping them
  • Fall back to alternative loading methods if primary method fails

Example: If loading a UNet component at path/to/unet/pytorch_model.bin (a loading sketch follows the list below):

  • The app detects it's a UNet component
  • Wraps it appropriately for distillation/quantization
  • Moves it to the correct device (GPU/CPU)
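
For reference, loading that same component directly with diffusers looks roughly like this (assuming diffusers is installed; path/to/model is the pipeline root that contains the unet/ subfolder):

import torch
from diffusers import UNet2DConditionModel

# Load only the UNet sub-module of a Stable Diffusion pipeline
unet = UNet2DConditionModel.from_pretrained("path/to/model", subfolder="unet")
unet.to("cuda" if torch.cuda.is_available() else "cpu")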

Out of Memory (OOM) Errors

Problem: Model loading fails with CUDA out of memory

Solution:

  • The app will automatically switch to CPU
  • Or reduce model size by using smaller teacher/student models
  • Or lower the VRAM threshold in settings so the app falls back to CPU earlier

Model Loading Issues

Problem: Model fails to load from HuggingFace or from file

Solution:

  1. Ensure you have internet connection (for HuggingFace models)
  2. Verify model name/path is correct
  3. Check HuggingFace authentication if using private models
  4. For Stable Diffusion models, ensure the folder structure is intact:
    • Either model_index.json exists (full pipeline)
    • Or subfolders like unet/, vae/, text_encoder/ exist (components)
  5. Try loading from a local model path instead of HuggingFace Hub

Distillation Not Starting

Problem: Distillation script fails to execute

Solution:

  1. Ensure both teacher and student models are loaded
  2. Check that training data is available in data/train_prompts.txt
  3. Verify CUDA/device availability in logs
  4. For Stable Diffusion models, note that distillation adapts to component type
  5. Check logs folder for detailed error messages

Performance Issues

Problem: Quantization/distillation is slow

Solution:

  • Reduce batch size in configuration
  • Use INT4 quantization for faster processing
  • Ensure GPU is available and not occupied by other processes
  • Use smaller models for testing
  • For Stable Diffusion UNet models, quantization may take longer due to component size

Output Models

Models are saved in the following structure:

outputs/
├── distilled/
│   └── model_name/
│       ├── config.json
│       ├── pytorch_model.bin
│       └── tokenizer.*
└── quantized/
    └── model_name_quantized/
        ├── config.json
        ├── pytorch_model.bin
        └── tokenizer.*

All saved models are compatible with HuggingFace transformers and can be loaded with:

from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("outputs/distilled/model_name")
tokenizer = AutoTokenizer.from_pretrained("outputs/distilled/model_name")

Pinokio Launcher Scripts

QTinker is fully integrated with Pinokio for easy one-click operations:

Available Commands

  • Install: Automatically sets up Python environment and installs dependencies
  • Start: Launches the Gradio web interface
  • Update: Updates QTinker and dependencies to the latest version
  • Reset: Clears virtual environment and cached files for a fresh start

Model Selection Tools

Sidebar buttons for easy model management:

  • Select Teacher Model: Pick teacher model for knowledge distillation
  • Select Student Model: Pick student model to be distilled
  • Select Quantize Model: Pick model to quantize
  • Link Models: Create symbolic links for model references

Metadata Persistence

Model selections and settings are saved in pinokio_meta.json:

{
  "teacher_model": "/path/to/teacher",
  "student_model": "/path/to/student",
  "quantize_model": "/path/to/model"
}

API Documentation

Python API

Run Complete Pipeline

from core.logic import run_pipeline

result = run_pipeline(
    model_path="microsoft/phi-2",
    model_type="HuggingFace folder",
    quant_type="INT8 (dynamic)",
    distill_type="LogitKD",
    temperature=3.0,
    log_fn=print
)

Load Model

from app.model_loader import load_model

model, tokenizer = load_model(
    model_path="facebook/opt-350m",
    model_type="HuggingFace folder",
    device="cuda"
)

Quantize Model

from torchao.quantization import quantize_
from torchao.quantization.configs import Int8DynamicConfig

quantize_(model, Int8DynamicConfig())
model.save_pretrained("outputs/quantized/model_name")

Apply Distillation

import torch

from app.distillation import LogitKD

strategy = LogitKD(teacher_model, student_model, temperature=3.0)
optimizer = torch.optim.AdamW(student_model.parameters(), lr=5e-5)

for batch in dataloader:
    optimizer.zero_grad()
    with torch.no_grad():                       # the teacher stays frozen
        teacher_outputs = teacher_model(**batch)
    loss = strategy.compute_loss(student_model(**batch), teacher_outputs)
    loss.backward()
    optimizer.step()
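
The dataloader above must yield tokenized batches. A minimal way to build one from data/train_prompts.txt (a rough sketch; run_distillation.py may prepare data differently, and "teacher_path" is a placeholder):

from torch.utils.data import DataLoader
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teacher_path")  # reuse the teacher's tokenizer

with open("data/train_prompts.txt") as f:
    prompts = [line.strip() for line in f if line.strip()]

def collate(batch):
    return tokenizer(batch, padding=True, truncation=True, return_tensors="pt")

dataloader = DataLoader(prompts, batch_size=8, shuffle=True, collate_fn=collate)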

REST API (via Gradio)

When running the app, a Gradio interface provides HTTP endpoints:

# Gradio automatically generates REST endpoints
# Example: POST to Gradio endpoint with model parameters
curl -X POST "http://localhost:7860/api/predict" \
  -H "Content-Type: application/json" \
  -d '{"model_path": "phi-2", "quant_type": "INT8"}'

CLI Usage

# Start the application
python app/app.py

# Direct Gradio launch
python app/gradio_ui.py

# Run distillation only
python app/run_distillation.py --teacher-path /path/to/teacher \
                                --student-path /path/to/student

# Download models
python app/download_models.py --model-name "phi-2" --output-dir "./models"
