SwiftPixelUtils


High-performance Swift library for image preprocessing optimized for ML/AI inference pipelines on iOS/macOS

Installation · Quick Start · API Reference · Features · 📚 Docs


High-performance Swift library for image preprocessing optimized for ML/AI inference pipelines. Native implementations using Core Image, Accelerate, and Core ML for pixel extraction, tensor conversion, quantization, augmentation, and model-specific preprocessing (YOLO, MobileNet, etc.).

✨ Features

Core Preprocessing

  • 🚀 High Performance: Native implementations using Apple frameworks (Core Image, Accelerate, vImage, Core ML)
  • 🔢 Raw Pixel Data: Extract pixel values as typed arrays (Float, Float16, Int32, UInt8) ready for ML inference
  • 🎨 Multiple Color Formats: RGB, RGBA, BGR, BGRA, Grayscale, HSV, HSL, LAB, YUV, YCbCr
  • 📐 Flexible Resizing: Cover, contain, stretch, and letterbox strategies with automatic transform metadata
  • ✂️ ROI Pipeline: Crop → resize → normalize in a single call via ROI options
  • 🔢 ML-Ready Normalization: ImageNet, TensorFlow, custom presets
  • 📊 Multiple Data Layouts: HWC, CHW, NHWC, NCHW (PyTorch/TensorFlow compatible)
  • 🖼️ Multiple Sources: local file URLs, data, base64, assets, photo library
  • 📱 Orientation Handling: Opt-in UIImage/EXIF orientation normalization to fix silent rotation issues

ML Framework Integration

  • 🤖 Simplified ML APIs: One-line preprocessing (getModelInput) and postprocessing (ClassificationOutput, DetectionOutput, SegmentationOutput, DepthEstimationOutput) for all major frameworks
  • 🤖 Model Presets: Pre-configured settings for YOLO (v8/v9/v10), RT-DETR, MobileNet, EfficientNet, ResNet, ViT, CLIP, SAM/SAM2, DINO, DETR, Mask2Former, UNet, DeepLab, SegFormer, FCN, PSPNet
  • 🎯 Framework Targets: Automatic configuration for PyTorch, TensorFlow, TFLite, CoreML, ONNX Runtime, ExecuTorch, OpenCV
  • 🏷️ Label Database: Built-in labels for COCO, ImageNet, VOC, CIFAR, Places365, ADE20K, Open Images, LVIS, Objects365, Kinetics

ONNX Runtime Integration

  • 🔌 ONNX Helper: Streamlined tensor creation for ONNX Runtime with ONNXHelper.createTensorData()
  • 📊 ONNX Data Types: Support for Float32, Float16, UInt8, Int8, Int32, Int64 tensor types
  • 🎯 ONNX Model Configs: Pre-configured settings for YOLOv8, RT-DETR, ResNet, MobileNetV2, ViT, CLIP
  • 🔍 Output Parsing: Built-in parsers for YOLOv8, YOLOv5, RT-DETR, SSD detection outputs
  • 🧮 Segmentation Output: Parse ONNX segmentation model outputs with argmax

Quantization

  • 🎯 Native Quantization: Float→Int8/UInt8/Int16/INT4 with per-tensor and per-channel support (TFLite/ExecuTorch compatible)
  • 🔢 INT4 Quantization: 4-bit quantization (8× compression) for LLM weights and edge deployment
  • 📊 Per-Channel Quantization: Channel-wise scale/zeroPoint for higher accuracy (CNN, Transformer weights)
  • 🔄 Float16 Conversion: IEEE 754 half-precision ↔ Float32 utilities for CVPixelBuffer processing
  • 🎥 CVPixelBuffer Formats: BGRA/RGBA, NV12, and RGB565 conversion to tensor data

Detection & Segmentation

  • 📦 Bounding Box Utilities: Format conversion (xyxy/xywh/cxcywh), scaling, clipping, IoU, NMS
  • 🖼️ Letterbox Padding: YOLO-style letterbox preprocessing with automatic transform metadata for reverse coordinate mapping
  • 📏 Depth Estimation: Process MiDaS, DPT, ZoeDepth, Depth Anything outputs with scientific colormaps (Viridis, Plasma, Turbo) and custom colormaps

Data Augmentation

  • 🔄 Image Augmentation: Rotation, flip, brightness, contrast, saturation, blur
  • 🎨 Color Jitter: Granular brightness/contrast/saturation/hue control with range support and seeded randomness
  • ✂️ Cutout/Random Erasing: Mask random regions with constant/noise fill for robustness training
  • 🎲 Random Crop with Seed: Reproducible random crops for data augmentation pipelines

Tensor Operations

  • 🧮 Tensor Operations: Channel extraction, patch extraction, permutation, batch concatenation
  • 🔙 Tensor to Image: Convert processed tensors back to images
  • 🔲 Grid/Patch Extraction: Extract image patches in grid patterns for sliding window inference
  • ✅ Tensor Validation: Validate tensor shapes, dtypes, and value ranges before inference
  • 📦 Batch Assembly: Combine multiple images into NCHW/NHWC batch tensors
  • 📦 Batch Processing: Process multiple images with concurrency control

Visualization & Analysis

  • 🎨 Drawing/Visualization: Draw boxes, keypoints, masks, and heatmaps for debugging
  • 📈 Image Analysis: Statistics, metadata, validation, blur detection

📱 Example App

A comprehensive iOS example app is included in the Example/ directory, demonstrating all major features:

  • TensorFlow Lite Classification - MobileNetV2 with TopK results
  • ExecuTorch Classification - MobileNetV3 with TopK results
  • Object Detection - YOLOv8 with NMS and bounding box visualization
  • Semantic Segmentation - DeepLabV3 with colored mask overlay
  • Depth Estimation - Depth Anything with colormaps and overlay visualization
  • Pixel Extraction - Model presets (YOLO, RT-DETR, MobileNet, ResNet, ViT, CLIP, SAM2, DeepLab, etc.) and custom options
  • Bounding Box Utilities - Format conversion, IoU calculation, NMS, scaling, clipping
  • Image Augmentation - Rotation, flip, brightness, contrast, saturation, blur
  • Tensor Operations - Channel extraction, permutation, batch assembly
  • Drawing & Visualization - Boxes, labels, masks, and overlays
  • Comprehensive UI Tests - 50+ UI tests covering all features

Example App Screenshot

To run the example app:

cd Example/SwiftPixelUtilsExampleApp
pod install
open SwiftPixelUtilsExampleApp.xcworkspace

To run UI tests, select the SwiftPixelUtilsExampleAppUITests target and press ⌘U.

📦 Installation

Swift Package Manager

Add SwiftPixelUtils to your Package.swift:

dependencies: [
    .package(url: "https://github.com/manishkumar03/SwiftPixelUtils.git", from: "1.0.0")
]
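
For multi-target packages, also add the product to your target's dependencies (a minimal sketch; the target name is hypothetical and the product name is assumed to match the package name):

.target(
    name: "MyApp",  // hypothetical target name
    dependencies: [
        .product(name: "SwiftPixelUtils", package: "SwiftPixelUtils")
    ]
)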

Or add it via Xcode:

  1. File → Add Package Dependencies
  2. Enter the repository URL
  3. Select version/branch

🚀 Quick Start

⚠️ Important: SwiftPixelUtils functions are synchronous and use throws (not async throws). Remote URLs (http, https) are not supported; download images first and use .data(Data) or .file(URL) instead.

Raw Pixel Data Extraction

import SwiftPixelUtils

// From local file
let result = try PixelExtractor.getPixelData(
    source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
    options: PixelDataOptions()
)

print(result.data) // Float array of pixel values
print(result.width) // Image width
print(result.height) // Image height
print(result.shape) // [height, width, channels]

// From downloaded data (for remote images)
let (data, _) = try await URLSession.shared.data(from: URL(string: "https://example.com/image.jpg")!)
let result2 = try PixelExtractor.getPixelData(
    source: .data(data),
    options: PixelDataOptions()
)

Using Model Presets

import SwiftPixelUtils

// Use pre-configured YOLO settings
let result = try PixelExtractor.getPixelData(
    source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
    options: ModelPresets.yolov8
)
// Automatically configured: 640x640, letterbox resize, RGB, scale normalization, NCHW layout

// Or MobileNet
let mobileNetResult = try PixelExtractor.getPixelData(
    source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
    options: ModelPresets.mobilenet
)
// Configured: 224x224, cover resize, RGB, ImageNet normalization, NHWC layout

Available Model Presets

Classification Models

| Preset | Size | Resize | Normalization | Layout |
|---|---|---|---|---|
| mobilenet / mobilenet_v2 / mobilenet_v3 | 224×224 | cover | ImageNet | NHWC |
| efficientnet | 224×224 | cover | ImageNet | NHWC |
| resnet / resnet50 | 224×224 | cover | ImageNet | NCHW |
| vit | 224×224 | cover | ImageNet | NCHW |
| clip | 224×224 | cover | CLIP-specific | NCHW |
| dino | 224×224 | cover | ImageNet | NCHW |

Detection Models

| Preset | Size | Resize | Normalization | Layout | Notes |
|---|---|---|---|---|---|
| yolo / yolov8 | 640×640 | letterbox | scale | NCHW | Standard YOLO |
| yolov9 | 640×640 | letterbox | scale | NCHW | PGI + GELAN |
| yolov10 / yolov10_n/s/m/l/x | 640×640 | letterbox | scale | NCHW | NMS-free |
| rtdetr / rtdetr_l / rtdetr_x | 640×640 | letterbox | scale | NCHW | Real-time DETR |
| detr | 800×800 | contain | ImageNet | NCHW | Transformer detection |

Segmentation Models

| Preset | Size | Resize | Normalization | Layout | Notes |
|---|---|---|---|---|---|
| sam | 1024×1024 | contain | ImageNet | NCHW | Segment Anything |
| sam2 / sam2_t/s/b_plus/l | 1024×1024 | contain | ImageNet | NCHW | SAM2 with video |
| mask2former / mask2former_swin_t/l | 512×512 | contain | ImageNet | NCHW | Universal segmentation |
| deeplab / deeplabv3 / deeplabv3_plus | 513×513 | contain | ImageNet | NCHW | ASPP module |
| deeplab_769 / deeplab_1025 | 769/1025 | contain | ImageNet | NCHW | Higher resolution |
| unet / unet_256 / unet_1024 | 512×512 | contain | scale | NCHW | Classic encoder-decoder |
| segformer / segformer_b0/b5 | 512×512 | contain | ImageNet | NCHW | Transformer + MLP |
| fcn | 512×512 | contain | ImageNet | NCHW | Fully convolutional |
| pspnet | 473×473 | contain | ImageNet | NCHW | Pyramid pooling |

ONNX Runtime Models

| Preset | Size | Resize | Normalization | Layout | Notes |
|---|---|---|---|---|---|
| onnx_yolov8 / onnx_yolov8n/s/m/l/x | 640×640 | letterbox | scale | NCHW | ONNX YOLOv8 detection |
| onnx_rtdetr | 640×640 | letterbox | ImageNet | NCHW | ONNX RT-DETR detection |
| onnx_resnet / onnx_resnet50 | 224×224 | cover | ImageNet | NCHW | ONNX ResNet classification |
| onnx_mobilenetv2 | 224×224 | cover | ImageNet | NCHW | ONNX MobileNetV2 |
| onnx_vit | 224×224 | cover | ImageNet | NCHW | ONNX Vision Transformer |
| onnx_clip | 224×224 | cover | CLIP-specific | NCHW | ONNX CLIP vision encoder |
| onnx_quantized_uint8 | 224×224 | cover | raw | NCHW | Quantized UInt8 models |
| onnx_quantized_int8 | 224×224 | cover | raw | NCHW | Quantized Int8 models |
| onnx_float16 | 224×224 | cover | ImageNet | NCHW | Float16 optimized models |

ExecuTorch Compatibility

All presets with NCHW layout work directly with ExecuTorch models exported from PyTorch. For quantized ExecuTorch models, use the output with Quantizer to convert to Int8:

// Preprocess for ExecuTorch quantized model
let pixels = try PixelExtractor.getPixelData(
    source: .file(imageURL),
    options: PixelDataOptions(
        resize: ResizeOptions(width: 224, height: 224, strategy: .cover),
        normalization: .imagenet,
        dataLayout: .nchw  // PyTorch/ExecuTorch convention
    )
)

// Quantize for Int8 ExecuTorch model
let quantized = try Quantizer.quantize(
    data: pixels.data,
    options: QuantizationOptions(
        mode: .perTensor,
        dtype: .int8,
        scale: [modelScale],  // scale from your model's quantization parameters
        zeroPoint: [0]  // Symmetric quantization
    )
)
// Pass quantized.int8Data to ExecuTorch tensor

ONNX Runtime Integration

SwiftPixelUtils provides comprehensive helpers for ONNX Runtime integration.

⚠️ Important: ONNX Runtime is not included in the example app due to XNNPACK symbol conflicts with TensorFlow Lite. To use ONNX Runtime features in your own project, you'll need to add the ONNX Runtime dependency separately. See the ONNX Runtime Integration Guide for setup instructions.

// Create tensor data using model config
let tensorInput = try ONNXHelper.createTensorData(
    from: .uiImage(image),
    config: .yolov8  // Pre-configured for YOLOv8 ONNX model
)

// Use with onnxruntime-objc
let inputTensor = try ORTValue(
    tensorData: NSMutableData(data: tensorInput.data),
    elementType: .float,
    shape: tensorInput.shape.map { NSNumber(value: $0) }
)

// Parse detection output
let detections = ONNXHelper.parseYOLOv8Output(
    data: outputData,
    numClasses: 80,
    confidenceThreshold: 0.25,
    nmsThreshold: 0.45
)

// Scale detections back to original image coordinates
let scaledDetections = ONNXHelper.scaleDetections(
    detections,
    modelSize: CGSize(width: 640, height: 640),
    originalSize: image.size
)

ONNX Framework Variants:

| Framework | Layout | Normalization | Output Type |
|---|---|---|---|
| .onnx | NCHW | ImageNet | Float32 |
| .onnxRaw | NCHW | [0,1] scale | Float32 |
| .onnxQuantizedUInt8 | NCHW | raw [0,255] | UInt8 |
| .onnxQuantizedInt8 | NCHW | raw [0,255] | Int8 |
| .onnxFloat16 | NCHW | ImageNet | Float16 |
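
These variants plug into the getModelInput() flow described below; a minimal sketch, assuming .onnxQuantizedUInt8 is accepted as a framework target as the table suggests:

// Sketch: preprocess for a quantized UInt8 ONNX model
let input = try PixelExtractor.getModelInput(
    source: .uiImage(image),
    framework: .onnxQuantizedUInt8,  // assumption: variant usable as a framework target
    width: 224,
    height: 224
)
// input.data holds raw [0, 255] UInt8 values in NCHW layout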

🤖 Simplified ML APIs

SwiftPixelUtils provides high-level APIs that handle all preprocessing and postprocessing decisions automatically based on your target ML framework.

getModelInput() - One-Line Preprocessing

Instead of manually configuring color format, normalization, layout, and output format, just specify your framework:

// TensorFlow Lite quantized model - one line!
let input = try PixelExtractor.getModelInput(
    source: .uiImage(image),
    framework: .tfliteQuantized,
    width: 224,
    height: 224
)
// input.data is raw Data containing UInt8 values in NHWC layout

// PyTorch model
let input = try PixelExtractor.getModelInput(
    source: .uiImage(image),
    framework: .pytorch,
    width: 224,
    height: 224
)
// input.data contains Float32 values in NCHW layout with ImageNet normalization

Available Frameworks

| Framework | Layout | Normalization | Output Type |
|---|---|---|---|
| .pytorch | NCHW | ImageNet | Float32 |
| .pytorchRaw | NCHW | [0,1] scale | Float32 |
| .tensorflow | NHWC | [-1,1] | Float32 |
| .tensorflowImageNet | NHWC | ImageNet | Float32 |
| .tfliteQuantized | NHWC | raw [0,255] | UInt8 |
| .tfliteFloat | NHWC | [0,1] scale | Float32 |
| .coreML | NHWC | [0,1] scale | Float32 |
| .coreMLImageNet | NHWC | ImageNet | Float32 |
| .onnx | NCHW | ImageNet | Float32 |
| .execuTorch | NCHW | ImageNet | Float32 |
| .execuTorchQuantized | NCHW | raw [0,255] | Int8 |
| .openCV | HWC/BGR | [0,255] | UInt8 |

ClassificationOutput.process() - One-Line Postprocessing

Process classification model output with automatic dequantization, softmax, and label mapping:

// Process TFLite quantized model output in one line
let result = try ClassificationOutput.process(
    outputData: outputTensor.data,
    quantization: .uint8(scale: quantParams.scale, zeroPoint: quantParams.zeroPoint),
    topK: 5,
    labels: .imagenet(hasBackgroundClass: true)
)

// Access results
for prediction in result.predictions {
    print("\(prediction.label): \(String(format: "%.1f%%", prediction.confidence * 100))")
}

// Or get the top prediction directly
if let top = result.topPrediction {
    print("Top: \(top.label) (\(top.confidence))")
}

Quantization Types

  • .none - Float32 model output
  • .uint8(scale:zeroPoint:) - TFLite quantized
  • .int8(scale:zeroPoint:) - ExecuTorch/ONNX quantized

Label Sources

  • .imagenet(hasBackgroundClass:) - ImageNet-1K (1000 classes)
  • .coco - COCO (80 classes)
  • .cifar10 - CIFAR-10 (10 classes)
  • .cifar100 - CIFAR-100 (100 classes)
  • .custom([String]) - Your own label array
  • .none - Returns "Class N" as labels
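
For example, a float-output model with custom classes; a minimal sketch combining the .none quantization type and .custom label source listed above (the three labels are placeholders):

let result = try ClassificationOutput.process(
    outputData: output.data,
    quantization: .none,                     // Float32 output, no dequantization
    topK: 3,
    labels: .custom(["cat", "dog", "bird"])  // placeholder label array
)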

DetectionOutput.process() - One-Line Detection Postprocessing

Process object detection model output (YOLO, SSD, etc.) with automatic parsing, NMS, and label mapping:

// Process YOLOv5 output in one line
let result = try DetectionOutput.process(
    floatOutput: outputArray,
    format: .yolov5(numClasses: 80),
    confidenceThreshold: 0.25,
    iouThreshold: 0.45,
    maxDetections: 20,
    labels: .coco,
    imageSize: originalImage.size,
    modelInputSize: CGSize(width: 320, height: 320),
    outputCoordinateSpace: .normalized  // TFLite YOLO outputs 0-1 coords
)

// Access detections
for detection in result.detections {
    print("\(detection.label): \(String(format: "%.1f%%", detection.confidence * 100))")
    print("  Box: \(detection.boundingBox)")        // Normalized 0-1
    print("  Pixels: \(detection.pixelBoundingBox!)") // Pixel coordinates
}

Supported Detection Formats

  • .yolov5(numClasses:) - YOLOv5 format: [1, N, 5+classes]
  • .yolov8(numClasses:) - YOLOv8 format: [1, 4+classes, N]
  • .yolov8Transposed(numClasses:) - YOLOv8 already transposed
  • .ssd(numClasses:) - SSD MobileNet format
  • .efficientDet(numClasses:) - EfficientDet format

Output Coordinate Space

Different models output coordinates in different spaces. Use outputCoordinateSpace to handle this:

// TFLite YOLO models typically output normalized (0-1) coordinates
outputCoordinateSpace: .normalized

// Original PyTorch YOLO exports output pixel coordinates (0-640)
outputCoordinateSpace: .pixelSpace  // default

Converting to Drawable Boxes

Use toDrawableBoxes() to convert detections directly to visualization format:

// One-line conversion to drawable boxes
let boxes = result.toDrawableBoxes(imageSize: image.size)

// Draw on image
let annotated = try Drawing.drawBoxes(
    on: .uiImage(image),
    boxes: boxes,
    options: BoxDrawingOptions(lineWidth: 3, drawLabels: true, drawScores: true)
)

The toDrawableBoxes() method:

  • Converts normalized coordinates to pixel coordinates
  • Applies DetectionColorPalette for per-class colors (20 distinct colors)
  • Returns [DrawableBox] ready for Drawing.drawBoxes()

Complete TFLite Classification Example

Here's a complete example using both simplified APIs for image classification:

import SwiftPixelUtils
import TensorFlowLite

func classifyImage(_ image: UIImage) throws -> [ClassificationPrediction] {
    // 1. Preprocess - one line
    let input = try PixelExtractor.getModelInput(
        source: .uiImage(image),
        framework: .tfliteQuantized,
        width: 224,
        height: 224
    )
    
    // 2. Run inference
    let interpreter = try Interpreter(modelPath: modelPath)
    try interpreter.allocateTensors()
    try interpreter.copy(input.data, toInputAt: 0)
    try interpreter.invoke()
    
    // 3. Postprocess - one line
    let output = try interpreter.output(at: 0)
    let quantParams = output.quantizationParameters!
    
    let result = try ClassificationOutput.process(
        outputData: output.data,
        quantization: .uint8(scale: quantParams.scale, zeroPoint: quantParams.zeroPoint),
        topK: 5,
        labels: .imagenet(hasBackgroundClass: true)
    )
    
    return result.predictions
}

Complete TFLite YOLO Detection Example

Here's a complete example for YOLOv5 object detection with visualization:

import SwiftPixelUtils
import TensorFlowLite

func detectObjects(_ image: UIImage) throws -> UIImage? {
    let modelWidth = 320
    let modelHeight = 320
    
    // 1. Preprocess - use .stretch for YOLO (direct coordinate mapping)
    let input = try PixelExtractor.getModelInput(
        source: .uiImage(image),
        framework: .tfliteFloat,
        width: modelWidth,
        height: modelHeight,
        resizeStrategy: .stretch
    )
    
    // 2. Run inference
    let interpreter = try Interpreter(modelPath: yoloModelPath)
    try interpreter.allocateTensors()
    try interpreter.copy(input.data, toInputAt: 0)
    try interpreter.invoke()
    
    // 3. Get output as float array
    let output = try interpreter.output(at: 0)
    let floatOutput = output.data.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
    
    // 4. Postprocess - one line with outputCoordinateSpace
    let result = try DetectionOutput.process(
        floatOutput: floatOutput,
        format: .yolov5(numClasses: 80),
        confidenceThreshold: 0.25,
        iouThreshold: 0.45,
        maxDetections: 20,
        labels: .coco,
        imageSize: image.size,
        modelInputSize: CGSize(width: modelWidth, height: modelHeight),
        outputCoordinateSpace: .normalized  // TFLite YOLO outputs 0-1 coords
    )
    
    // 5. Draw boxes - one line conversion to drawable format
    let boxes = result.toDrawableBoxes(imageSize: image.size)
    
    let drawingResult = try Drawing.drawBoxes(
        on: .uiImage(image),
        boxes: boxes,
        options: BoxDrawingOptions(lineWidth: 4, drawLabels: true, drawScores: true)
    )
    
    return UIImage(cgImage: drawingResult.cgImage, scale: image.scale, orientation: image.imageOrientation)
}

SegmentationOutput.process() - One-Line Segmentation Postprocessing

Process semantic segmentation model output (DeepLabV3, etc.) with automatic parsing and visualization:

// Process DeepLabV3 output in one line
let result = try SegmentationOutput.process(
    floatOutput: outputArray,
    format: .logits(height: 257, width: 257, numClasses: 21),
    labels: .voc
)

// Get detected classes with coverage percentages
for item in result.classSummary {
    print("\(item.label): \(String(format: "%.1f%%", item.percentage))")
}

// Access the class mask
let classAtCenter = result.classAt(x: 128, y: 128)
print("Class at center: \(result.labels?[classAtCenter] ?? "unknown")")

Supported Segmentation Formats

  • .logits(height:width:numClasses:) - Raw logits in NHWC format (DeepLabV3)
  • .logitsNCHW(height:width:numClasses:) - Raw logits in NCHW format (PyTorch)
  • .probabilities(height:width:numClasses:) - Softmax probabilities (NHWC)
  • .probabilitiesNCHW(height:width:numClasses:) - Softmax probabilities (NCHW)
  • .argmax(height:width:numClasses:) - Pre-computed class indices

Label Sources for Segmentation

  • .voc - Pascal VOC (21 classes with background)
  • .ade20k - ADE20K (150 classes)
  • .cityscapes - Cityscapes (19 classes)
  • .custom([String]) - Your own label array
  • .none - Returns "class_N" as labels

Visualizing Segmentation Results

Overlay colored segmentation mask on the original image:

// Overlay segmentation on image
let overlay = try Drawing.overlaySegmentation(
    on: .uiImage(image),
    segmentation: result,
    palette: .voc,           // Pascal VOC colors
    alpha: 0.5,              // 50% opacity
    excludeBackground: true  // Don't color background pixels
)

// Or create a standalone colored mask
let coloredMask = result.toColoredCGImage(palette: .voc)

Color Palettes

Built-in color palettes for common datasets:

  • SegmentationColorPalette.voc - Pascal VOC (21 colors)
  • SegmentationColorPalette.ade20k - ADE20K (150 colors)
  • SegmentationColorPalette.cityscapes - Cityscapes (19 colors)
  • SegmentationColorPalette.rainbow(numClasses:) - Generate custom palette
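
For models trained on other datasets, a generated palette can stand in for the built-ins; a minimal sketch, assuming the generated palette is accepted wherever .voc is (the class count of 10 is a placeholder):

// Generate a palette for a hypothetical 10-class model
let palette = SegmentationColorPalette.rainbow(numClasses: 10)

let overlay = try Drawing.overlaySegmentation(
    on: .uiImage(image),
    segmentation: result,
    palette: palette,
    alpha: 0.5,
    excludeBackground: true
)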

Complete TFLite Segmentation Example

Here's a complete example for DeepLabV3 semantic segmentation:

import SwiftPixelUtils
import TensorFlowLite

func segmentImage(_ image: UIImage) throws -> UIImage? {
    let modelSize = 257
    
    // 1. Preprocess
    let input = try PixelExtractor.getModelInput(
        source: .uiImage(image),
        framework: .tfliteFloat,
        width: modelSize,
        height: modelSize
    )
    
    // 2. Run inference
    let interpreter = try Interpreter(modelPath: deeplabModelPath)
    try interpreter.allocateTensors()
    try interpreter.copy(input.data, toInputAt: 0)
    try interpreter.invoke()
    
    // 3. Get output
    let output = try interpreter.output(at: 0)
    let floatOutput = output.data.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
    
    // 4. Postprocess - one line
    let result = try SegmentationOutput.process(
        floatOutput: floatOutput,
        format: .logits(height: modelSize, width: modelSize, numClasses: 21),
        labels: .voc
    )
    
    // 5. Visualize - overlay on original image
    let overlay = try Drawing.overlaySegmentation(
        on: .uiImage(image),
        segmentation: result,
        palette: .voc,
        alpha: 0.5,
        excludeBackground: true
    )
    
    return UIImage(cgImage: overlay.cgImage, scale: image.scale, orientation: image.imageOrientation)
}

📖 API Reference

🔧 Core Functions

getPixelData(source:options:)

Extract pixel data from a single image.

let result = try PixelExtractor.getPixelData(
    source: .file(fileURL),
    options: PixelDataOptions(
        resize: ResizeOptions(width: 224, height: 224, strategy: .cover),
        colorFormat: .rgb,
        normalization: .imagenet,
        dataLayout: .nchw,
        outputFormat: .float32Array,
        normalizeOrientation: true  // Fix UIImage EXIF rotation issues
    )
)

// Access results
result.data           // [Float] - normalized pixel data
result.float16Data    // [UInt16]? - Float16 as bit patterns (when outputFormat is .float16Array)
result.letterboxInfo  // LetterboxInfo? - transform metadata (when using .letterbox resize)

Output Formats

// Float16 output for Core ML / Metal efficiency
let result = try PixelExtractor.getPixelData(
    source: .uiImage(image),
    options: PixelDataOptions(
        resize: ResizeOptions(width: 224, height: 224, strategy: .cover),
        outputFormat: .float16Array  // Efficient for Apple Silicon
    )
)
// result.float16Data contains UInt16 bit patterns
// Convert back: Float16(bitPattern: result.float16Data![i])

Letterbox with Transform Metadata

// Letterbox resize automatically captures transform info
let result = try PixelExtractor.getPixelData(
    source: .uiImage(image),
    options: PixelDataOptions(
        resize: ResizeOptions(width: 640, height: 640, strategy: .letterbox),
        colorFormat: .rgb,
        normalization: .scale,
        dataLayout: .nchw
    )
)

// Use letterboxInfo to reverse-transform detection coordinates
if let info = result.letterboxInfo {
    print("Scale: \(info.scale)")
    print("Offset: \(info.offset)")
    print("Original size: \(info.originalSize)")
    
    // Reverse transform a detection box from model output to original image
    let originalX = (modelX - Float(info.offset.x)) / info.scale
    let originalY = (modelY - Float(info.offset.y)) / info.scale
}

batchGetPixelData(sources:options:concurrency:)

Process multiple images with concurrency control.

// For local files
let results = try PixelExtractor.batchGetPixelData(
    sources: [
        .file(URL(fileURLWithPath: "/path/to/1.jpg")),
        .file(URL(fileURLWithPath: "/path/to/2.jpg")),
        .file(URL(fileURLWithPath: "/path/to/3.jpg"))
    ],
    options: ModelPresets.mobilenet,
    concurrency: 4
)

// For remote images, download first then process
let urls = ["https://example.com/1.jpg", "https://example.com/2.jpg", "https://example.com/3.jpg"]
var sources: [ImageSource] = []
for url in urls {
    let (data, _) = try await URLSession.shared.data(from: URL(string: url)!)
    sources.append(.data(data))
}
let results2 = try PixelExtractor.batchGetPixelData(
    sources: sources,
    options: ModelPresets.mobilenet,
    concurrency: 4
)

🔍 Image Analysis

ImageAnalyzer.getStatistics(source:)

Calculate image statistics for analysis and preprocessing decisions.

let stats = try ImageAnalyzer.getStatistics(source: .file(fileURL))
print(stats.mean) // [r, g, b] mean values (0-1)
print(stats.std) // [r, g, b] standard deviations
print(stats.min) // [r, g, b] minimum values
print(stats.max) // [r, g, b] maximum values
print(stats.histogram) // RGB histograms

ImageAnalyzer.getMetadata(source:)

Get image metadata without loading full pixel data.

let metadata = try ImageAnalyzer.getMetadata(source: .file(fileURL))
print(metadata.width)
print(metadata.height)
print(metadata.channels)
print(metadata.colorSpace)
print(metadata.hasAlpha)

ImageAnalyzer.validate(source:options:)

Validate an image against specified criteria.

let validation = try ImageAnalyzer.validate(
    source: .file(fileURL),
    options: ValidationOptions(
        minWidth: 224,
        minHeight: 224,
        maxWidth: 4096,
        maxHeight: 4096,
        requiredAspectRatio: 1.0,
        aspectRatioTolerance: 0.1
    )
)

print(validation.isValid)   // true if passes all checks
print(validation.issues)    // Array of validation issues

ImageAnalyzer.detectBlur(source:threshold:downsampleSize:)

Detect if an image is blurry using Laplacian variance analysis.

let result = try ImageAnalyzer.detectBlur(
    source: .file(fileURL),
    threshold: 100,
    downsampleSize: 500
)

print(result.isBlurry) // true if blurry
print(result.score) // Laplacian variance score

🎨 Image Augmentation

ImageAugmentor.applyAugmentations(to:options:)

Apply image augmentations for data augmentation pipelines.

let augmented = try ImageAugmentor.applyAugmentations(
    to: .file(fileURL),
    options: AugmentationOptions(
        rotation: 15, // Degrees
        horizontalFlip: true,
        verticalFlip: false,
        brightness: 1.2, // 1.0 = no change
        contrast: 1.1,
        saturation: 0.9,
        blur: BlurOptions(type: .gaussian, radius: 2)
    )
)

ImageAugmentor.colorJitter(source:options:)

Apply color jitter augmentation with granular control.

let result = try ImageAugmentor.colorJitter(
    source: .file(fileURL),
    options: ColorJitterOptions(
        brightness: 0.2,     // Random in [-0.2, +0.2]
        contrast: 0.2,       // Random in [0.8, 1.2]
        saturation: 0.3,     // Random in [0.7, 1.3]
        hue: 0.1,            // Random in [-0.1, +0.1]
        seed: 42             // For reproducibility
    )
)

ImageAugmentor.cutout(source:options:)

Apply cutout (random erasing) augmentation.

let result = try ImageAugmentor.cutout(
    source: .file(fileURL),
    options: CutoutOptions(
        numCutouts: 1,
        minSize: 0.02,
        maxSize: 0.33,
        fillMode: .constant,
        fillValue: [0, 0, 0],
        seed: 42
    )
)

📦 Bounding Box Utilities

BoundingBox.convertFormat(_:from:to:)

Convert between bounding box formats (xyxy, xywh, cxcywh).

let boxes = [[320.0, 240.0, 100.0, 80.0]] // [cx, cy, w, h]
let converted = BoundingBox.convertFormat(
    boxes,
    from: .cxcywh,
    to: .xyxy
)
// [[270, 200, 370, 280]] - [x1, y1, x2, y2]

BoundingBox.scale(_:from:to:format:)

Scale bounding boxes between different image dimensions.

let scaled = BoundingBox.scale(
    [[100, 100, 200, 200]],
    from: CGSize(width: 640, height: 640),
    to: CGSize(width: 1920, height: 1080),
    format: .xyxy
)

BoundingBox.clip(_:imageSize:format:)

Clip bounding boxes to image boundaries.

let clipped = BoundingBox.clip(
    [[-10, 50, 700, 500]],
    imageSize: CGSize(width: 640, height: 480),
    format: .xyxy
)

BoundingBox.calculateIoU(_:_:format:)

Calculate Intersection over Union between two boxes.

let iou = BoundingBox.calculateIoU(
    [100, 100, 200, 200],
    [150, 150, 250, 250],
    format: .xyxy
)
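// iou ≈ 0.143 (intersection 50×50 = 2500, union 17500)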

BoundingBox.nonMaxSuppression(detections:iouThreshold:scoreThreshold:maxDetections:)

Apply Non-Maximum Suppression to filter overlapping detections.

let detections = [
    Detection(box: [100, 100, 200, 200], score: 0.9, classIndex: 0),
    Detection(box: [110, 110, 210, 210], score: 0.8, classIndex: 0),
    Detection(box: [300, 300, 400, 400], score: 0.7, classIndex: 1)
]

let filtered = BoundingBox.nonMaxSuppression(
    detections: detections,
    iouThreshold: 0.5,
    scoreThreshold: 0.3,
    maxDetections: 100  // Optional limit
)

🧮 Tensor Operations

TensorOperations.extractChannel(data:width:height:channels:channelIndex:dataLayout:)

Extract a single channel from pixel data.

let redChannel = TensorOperations.extractChannel(
    data: result.data,
    width: result.width,
    height: result.height,
    channels: result.channels,
    channelIndex: 0, // 0=R, 1=G, 2=B
    dataLayout: result.dataLayout
)

TensorOperations.permute(data:shape:order:)

Transpose/permute tensor dimensions.

// Convert HWC to CHW
let permuted = TensorOperations.permute(
    data: result.data,
    shape: result.shape,
    order: [2, 0, 1] // new order: C, H, W
)

TensorOperations.assembleBatch(results:layout:padToSize:)

Assemble multiple results into a batch tensor.

let batch = TensorOperations.assembleBatch(
    results: [result1, result2, result3],
    layout: .nchw,
    padToSize: 4
)
print(batch.shape) // [4, 3, 224, 224] for NCHW (batch padded from 3 results to 4)

🎯 Quantization

Quantizer.quantize(data:options:)

Quantize float data to int8/uint8/int16/int4 format.

// Per-tensor quantization (standard)
let quantized = try Quantizer.quantize(
    data: result.data,
    options: QuantizationOptions(
        mode: .perTensor,
        dtype: .uint8,
        scale: [0.0078125],
        zeroPoint: [128]
    )
)

Per-Channel Quantization

Quantize with per-channel scale/zeroPoint for higher accuracy (ideal for CNN and Transformer weights):

// Per-channel quantization with calibration
let (scales, zeroPoints) = Quantizer.calibratePerChannel(
    data: weightsData,
    numChannels: 64,
    dtype: .int8,
    layout: .chw  // or .hwc
)

let quantized = try Quantizer.quantize(
    data: weightsData,
    options: QuantizationOptions(
        mode: .perChannel(axis: 0),  // Channel axis
        dtype: .int8,
        scale: scales,
        zeroPoint: zeroPoints
    )
)
// Result: int8Data with per-channel parameters for TFLite/ExecuTorch

INT4 Quantization (LLM/Edge Deployment)

4-bit quantization for 8× compression, ideal for LLM weights and edge devices:

// INT4 quantization (2 values packed per byte)
let int4Result = try Quantizer.quantize(
    data: llmWeights,
    options: QuantizationOptions(
        mode: .perTensor,
        dtype: .int4,  // or .uint4
        scale: [scale],
        zeroPoint: [zeroPoint]
    )
)

// Access packed data (50% size of INT8)
let packedBytes = int4Result.packedInt4Data!  // [UInt8] - 2 values per byte
let originalCount = int4Result.originalCount!  // Original element count
let compression = int4Result.compressionRatio  // 8.0 vs Float32

// Dequantize back to float
let restored = try Quantizer.dequantize(
    packedInt4Data: packedBytes,
    originalCount: originalCount,
    scale: [scale],
    zeroPoint: [zeroPoint],
    dtype: .int4
)

Quantization Types Comparison

| Type | Range | Compression (vs Float32) | Use Case |
|---|---|---|---|
| uint8 | [0, 255] | 4× | TFLite, general inference |
| int8 | [-128, 127] | 4× | ExecuTorch, symmetric |
| int16 | [-32768, 32767] | 2× | High precision |
| int4 | [-8, 7] | 8× | LLM weights, edge |
| uint4 | [0, 15] | 8× | Activation quantization |

Quantizer.dequantize(uint8Data:scale:zeroPoint:) / dequantize(int8Data:...)

Convert quantized data back to float.

let dequantized = try Quantizer.dequantize(
    uint8Data: quantizedArray,
    scale: [0.0078125],
    zeroPoint: [128]
)

🖼️ Letterbox Padding

Letterbox.apply(to:options:)

Apply letterbox padding to an image.

let options = LetterboxOptions(
    targetWidth: 640,
    targetHeight: 640,
    fillColor: (114, 114, 114),  // YOLO gray
    scaleUp: true,
    center: true
)

let result = try Letterbox.apply(
    to: .file(fileURL),
    options: options
)

print(result.cgImage)      // Letterboxed CGImage
print(result.scale)        // Scale factor applied
print(result.offsetX)      // X padding offset
print(result.offsetY)      // Y padding offset

Letterbox.reverseTransformBoxes(boxes:scale:offsetX:offsetY:)

Transform detection boxes back to original image coordinates.

let originalBoxes = Letterbox.reverseTransformBoxes(
    boxes: detectedBoxes,
    scale: letterboxResult.scale,
    offsetX: letterboxResult.offsetX,
    offsetY: letterboxResult.offsetY
)

🎨 Drawing & Visualization

Drawing.drawBoxes(on:boxes:options:)

Draw bounding boxes with labels on an image.

let result = try Drawing.drawBoxes(
    on: .uiImage(image),
    boxes: [
        DrawableBox(box: [100, 100, 200, 200], label: "person", score: 0.95, color: (255, 0, 0, 255)),
        DrawableBox(box: [300, 150, 400, 350], label: "dog", score: 0.87, color: (0, 255, 0, 255))
    ],
    options: BoxDrawingOptions(
        lineWidth: 3,
        fontSize: 14,
        drawLabels: true,
        drawScores: true
    )
)

let annotatedImage = result.cgImage

🏷️ Label Database

Built-in label databases for common ML classification and detection models.

LabelDatabase.getLabel(_:dataset:)

Get a label by its class index.

let label = LabelDatabase.getLabel(0, dataset: .coco) // Returns "person"

LabelDatabase.getTopLabels(scores:dataset:k:minConfidence:)

Get top-K labels from prediction scores.

let topLabels = LabelDatabase.getTopLabels(
    scores: modelOutput,
    dataset: .coco,
    k: 5,
    minConfidence: 0.1
)

Available Datasets

| Dataset | Classes | Description |
|---|---|---|
| coco | 80 | COCO 2017 object detection labels |
| coco91 | 91 | COCO original labels with background |
| imagenet | 1000 | ImageNet ILSVRC 2012 classification |
| imagenet21k | 21841 | ImageNet-21K full classification |
| voc | 21 | PASCAL VOC with background |
| cifar10 | 10 | CIFAR-10 classification |
| cifar100 | 100 | CIFAR-100 classification |
| places365 | 365 | Places365 scene recognition |
| ade20k | 150 | ADE20K semantic segmentation |

📝 Type Reference

Image Source Types

Note: Remote URLs (http, https, ftp) are not supported. Download images first and use .data(Data) instead.

public enum ImageSource {
    case file(URL)                   // Local file path only
    case data(Data)                  // Raw image data (use for downloaded images)
    case base64(String)              // Base64 encoded image
    case cgImage(CGImage)            // CGImage instance
    case uiImage(UIImage)            // UIImage (iOS)
    case nsImage(NSImage)            // NSImage (macOS)
}

Color Formats

public enum ColorFormat {
    case rgb        // 3 channels
    case rgba       // 4 channels
    case bgr        // 3 channels (OpenCV style)
    case bgra       // 4 channels
    case grayscale  // 1 channel
    case hsv        // 3 channels (Hue, Saturation, Value)
    case hsl        // 3 channels (Hue, Saturation, Lightness)
    case lab        // 3 channels (CIE LAB)
    case yuv        // 3 channels
    case ycbcr      // 3 channels
}

Resize Strategies

public enum ResizeStrategy {
    case cover      // Fill target, crop excess (default)
    case contain    // Fit within target, padding
    case stretch    // Stretch to fill (may distort)
    case letterbox  // Fit within target, letterbox padding (YOLO-style)
}

Normalization

public enum NormalizationPreset {
    case scale      // [0, 1] range
    case imagenet   // ImageNet mean/std
    case tensorflow // [-1, 1] range
    case raw        // No normalization (0-255)
    case custom(mean: [Float], std: [Float]) // Custom mean/std
}

Data Layouts

public enum DataLayout {
    case hwc   // Height × Width × Channels (default)
    case chw   // Channels × Height × Width (PyTorch)
    case nhwc  // Batch × Height × Width × Channels (TensorFlow/TFLite)
    case nchw  // Batch × Channels × Height × Width (PyTorch/ExecuTorch)
}

Output Formats

public enum OutputFormat {
    case array        // Default float array (result.data)
    case float32Array // Float array (result.data) - same as array
    case float16Array // Float16 bit patterns (result.float16Data) - for Core ML / Metal
    case uint8Array   // UInt8 array (result.uint8Data) - for quantized models
    case int32Array   // Int32 array (result.int32Data) - for integer models
}

Usage:

  • result.data - Always populated with [Float] values
  • result.float16Data - Populated when outputFormat: .float16Array
  • result.uint8Data - Populated when outputFormat: .uint8Array or normalization: .raw
  • result.int32Data - Populated when outputFormat: .int32Array
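
For instance, extracting raw bytes for a quantized model; a minimal sketch using the documented options:

let result = try PixelExtractor.getPixelData(
    source: .uiImage(image),
    options: PixelDataOptions(
        resize: ResizeOptions(width: 224, height: 224, strategy: .cover),
        normalization: .raw,        // keep raw 0-255 values
        outputFormat: .uint8Array
    )
)
let bytes = result.uint8Data    // [UInt8]? populated for .uint8Array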

⚠️ Error Handling

All functions throw typed errors:

public enum PixelUtilsError: Error {
    case invalidSource(String)
    case loadFailed(String)
    case invalidROI(String)
    case processingFailed(String)
    case invalidOptions(String)
    case invalidChannel(String)
    case invalidPatch(String)
    case dimensionMismatch(String)
    case emptyBatch(String)
    case unknown(String)
}
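
A minimal sketch of catching and switching on these errors:

do {
    let result = try PixelExtractor.getPixelData(
        source: .file(imageURL),
        options: PixelDataOptions()
    )
    print(result.shape)
} catch let error as PixelUtilsError {
    switch error {
    case .loadFailed(let message):
        print("Could not load image: \(message)")
    case .invalidOptions(let message):
        print("Invalid options: \(message)")
    default:
        print("Pixel processing failed: \(error)")
    }
}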

⚡ Performance Tips

| Tip | Description |
|---|---|
| 🤖 Simplified APIs | Use getModelInput(), ClassificationOutput.process(), DetectionOutput.process(), and SegmentationOutput.process() for the easiest integration |
| 🎯 Resize Strategies | Use letterbox for YOLO, cover for classification models |
| 📦 Batch Processing | Process multiple images concurrently for better performance |
| ⚙️ Model Presets | Pre-configured settings are optimized for each model |
| 🔄 Data Layout | Choose the right dataLayout upfront to avoid conversions |
| 🚀 Accelerate Framework | Leverage Apple's Accelerate for optimal performance |
| 🤖 ExecuTorch | Use NCHW layout + Int8 quantization for PyTorch-exported models |

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details.


ℹ️ AI Transparency

This project was built with AI assistance.
