ComfyUI nodes for adaptive FP8/BF16 mixed precision quantization.
- Automatic model quantization - Quantize any diffusion model to a target size
- Adaptive threshold selection - Automatically detects model "fragility" and adjusts quantization strategy
- SNR-based layer selection - Protects sensitive layers, quantizes robust ones
- Outlier protection - Layers with extreme weights are kept in BF16
- Progress bar - Shows quantization progress in ComfyUI console
- CSV report - Generates detailed report showing which layers were quantized
- Re-quantization safe - Already quantized FP8 layers are preserved (no size increase on re-run)
- Clone or copy this folder to `ComfyUI/custom_nodes/SNR-quant`
- Restart ComfyUI
Dependencies (usually already installed):
- torch>=2.1.0 (for FP8 support)
- safetensors>=0.4.0
- numpy>=1.24.0
Quantize a model to fit within a target size.
Inputs:
- `model_name` - Select from `ComfyUI/models/diffusion_models`
- `target_size_GB` - Target size in GB (e.g., 22.0)
Output:
- Saved directly to `ComfyUI/output/diffusion_models/quant_{model_name}_{target}GB.safetensors`
- CSV report: `ComfyUI/output/diffusion_models/quant_{model_name}_{target}GB_report.csv`
Example:
model_name: wan2.2_fp16.safetensors
target_size_GB: 22.0
→ output: quant_wan2.2_fp16_22.0GB.safetensors
→ report: quant_wan2.2_fp16_22.0GB_report.csv
Analyze a model's SNR and outlier distribution.
Input:
- `model_name` - Select from `ComfyUI/models/diffusion_models`
Outputs:
- `report` - Text report with statistics and quantization recommendations
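The SNR figure reported per layer can be understood as the standard signal-to-noise ratio in decibels: the power of the original weights divided by the power of the quantization error. The node's exact formula is not shown in this README, so the sketch below is an assumption using the textbook definition:

```python
import math

# Hedged sketch of a per-layer quantization SNR in dB (the node's exact
# formula is not documented here; this is the standard signal/noise ratio).
def snr_db(weights, quantized):
    signal = sum(w * w for w in weights)                        # signal power
    noise = sum((w - q) ** 2 for w, q in zip(weights, quantized))  # error power
    return 10 * math.log10(signal / noise)

# Toy example: small round-off error -> high SNR -> safe to quantize.
print(round(snr_db([1.0, 2.0], [1.1, 1.9]), 2))  # 23.98
```

Higher SNR means the FP8 round trip barely perturbs the layer, which is why the selector treats high-SNR layers as the safest candidates.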
```
Load model → Analyze layers → Detect fragility → Select layers → Quantize → Save
                  ↓                  ↓                               ↓
            SNR + Outlier     fragile/moderate/                FP8 or BF16
                                  robust                        per layer
```
| Fragility | SNR Range | SNR Median | Strategy |
|---|---|---|---|
| Fragile | < 2 dB | < 32 dB | Protect more (90th percentile) |
| Moderate | - | - | Balanced (75th percentile) |
| Robust | > 5 dB | > 34 dB | Quantize more (50th percentile) |
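The table above reads as a simple decision rule. A minimal sketch follows; the dB cutoffs are taken directly from the table, but combining the range and median tests with a logical AND is an assumption, and the returned percentile is the protection percentile named in the Strategy column:

```python
# Hedged sketch of the adaptive fragility rule from the table above.
# Cutoffs mirror the table; the AND-combination is an assumption.
def classify_fragility(snr_range_db: float, snr_median_db: float) -> tuple:
    """Return (fragility class, outlier-protection percentile)."""
    if snr_range_db < 2 and snr_median_db < 32:
        return ("fragile", 90)    # protect more layers
    if snr_range_db > 5 and snr_median_db > 34:
        return ("robust", 50)     # quantize more layers
    return ("moderate", 75)       # balanced default

print(classify_fragility(1.5, 30.0))  # ('fragile', 90)
```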
- Filter layers by outlier threshold (adaptive percentile)
- Sort by SNR (descending - highest SNR = safest to quantize)
- Select layers until target size is reached
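The three selection steps above can be sketched as a greedy loop. The layer-tuple layout, the threshold comparison, and the 2-to-1 byte accounting (BF16 → FP8 saves one byte per parameter) are simplifying assumptions, not the node's exact code:

```python
# Hedged sketch of the selection loop: filter by outlier threshold, sort by
# SNR descending, convert layers to FP8 until the target size is reached.
# Each layer is a (name, numel, snr_db, outlier) tuple.
def select_fp8_layers(layers, outlier_threshold, target_bytes):
    current = sum(numel for _, numel, _, _ in layers) * 2       # all-BF16 baseline
    candidates = [l for l in layers if l[3] <= outlier_threshold]
    candidates.sort(key=lambda l: l[2], reverse=True)           # highest SNR first
    chosen = []
    for name, numel, _snr, _outlier in candidates:
        if current <= target_bytes:
            break
        current -= numel            # BF16 (2 B/param) -> FP8 (1 B/param)
        chosen.append(name)
    return chosen, current

layers = [("a", 100, 40.0, 0.1), ("b", 100, 35.0, 0.2), ("c", 100, 10.0, 0.9)]
print(select_fp8_layers(layers, outlier_threshold=0.5, target_bytes=500))
# (['a'], 500): quantizing the single highest-SNR layer already meets the target
```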
- Baseline: All 2D layers as BF16 (2 bytes/param) + 1D layers as BF16
- Minimum: All 2D layers as FP8 (1 byte/param) + 1D layers as BF16
- If target < minimum: auto-adjust to minimum
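The size bounds above can be computed directly from parameter counts. In this sketch a layer dict maps name → (numel, ndim); the dict shape is an assumption for illustration:

```python
# Sketch of the size bounds: baseline keeps everything in BF16 (2 B/param);
# minimum puts every 2D layer in FP8 (1 B/param) and keeps 1D layers in BF16.
def size_bounds_bytes(layers: dict) -> tuple:
    baseline = sum(n * 2 for n, _ in layers.values())
    minimum = sum(n * (1 if d >= 2 else 2) for n, d in layers.values())
    return baseline, minimum

def clamp_target(target_bytes: int, layers: dict) -> int:
    """Auto-adjust the requested size up to the achievable minimum."""
    _, minimum = size_bounds_bytes(layers)
    return max(target_bytes, minimum)

layers = {"w": (1000, 2), "b": (10, 1)}
print(size_bounds_bytes(layers))   # (2020, 1020)
print(clamp_target(500, layers))   # 1020 — target below minimum is raised
```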
When quantizing an already quantized model:
- Existing FP8 layers are preserved (not converted back to BF16)
- Only remaining BF16 layers are considered for further quantization
- File size will not increase on re-run
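The re-quantization guard amounts to partitioning the checkpoint by stored dtype; a hedged sketch (the dtype-string keys are illustrative, not the node's internal representation):

```python
# Hedged sketch of the re-quantization guard: tensors already stored as FP8
# stay FP8; only BF16 tensors remain candidates for further quantization.
def split_by_precision(dtypes: dict) -> tuple:
    """dtypes maps tensor name -> dtype string read from the checkpoint."""
    preserved = [n for n, d in dtypes.items() if d == "float8_e4m3fn"]
    candidates = [n for n, d in dtypes.items() if d == "bfloat16"]
    return preserved, candidates

print(split_by_precision({"attn.weight": "float8_e4m3fn", "mlp.weight": "bfloat16"}))
# (['attn.weight'], ['mlp.weight'])
```

Because a second run only ever converts BF16 leftovers to FP8, the output file can shrink or stay the same size, never grow.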
Quantized models are saved as .safetensors with mixed precision:
- FP8 layers: `torch.float8_e4m3fn`
- BF16 layers: `torch.bfloat16`
- 1D layers (bias, norms): always BF16
The generated report includes:
- `layer_name` - Full layer path
- `precision` - Final precision (FP8, BF16, or FP8 (existing))
- `params_numel` - Number of parameters
- `SNR_dB` - Signal-to-noise ratio (higher = safer to quantize)
- `outlier` - Outlier index (lower = safer to quantize)
- `precision_change` - Whether the layer was quantized or preserved
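A report with these columns can be produced with the stdlib `csv` module; the row values below are made up for illustration:

```python
import csv
import io

# Illustrative rows matching the report columns listed above (values invented).
rows = [
    {"layer_name": "blocks.0.attn.qkv.weight", "precision": "FP8",
     "params_numel": 4194304, "SNR_dB": 41.2, "outlier": 0.8,
     "precision_change": "quantized"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])
# layer_name,precision,params_numel,SNR_dB,outlier,precision_change
```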
Apache-2.0