Installation • Quick Start • API Reference • Features • 📚 Docs
High-performance Swift library for image preprocessing optimized for ML/AI inference pipelines on iOS and macOS. Native implementations using Core Image, Accelerate, and Core ML for pixel extraction, tensor conversion, quantization, augmentation, and model-specific preprocessing (YOLO, MobileNet, etc.).
- 🚀 High Performance: Native implementations using Apple frameworks (Core Image, Accelerate, vImage, Core ML)
- 🔢 Raw Pixel Data: Extract pixel values as typed arrays (Float, Float16, Int32, UInt8) ready for ML inference
- 🎨 Multiple Color Formats: RGB, RGBA, BGR, BGRA, Grayscale, HSV, HSL, LAB, YUV, YCbCr
- 📐 Flexible Resizing: Cover, contain, stretch, and letterbox strategies with automatic transform metadata
- ✂️ ROI Pipeline: Crop → resize → normalize in a single call via ROI options
- 🔢 ML-Ready Normalization: ImageNet, TensorFlow, custom presets
- 📊 Multiple Data Layouts: HWC, CHW, NHWC, NCHW (PyTorch/TensorFlow compatible)
- 🖼️ Multiple Sources: local file URLs, data, base64, assets, photo library
- 📱 Orientation Handling: Opt-in UIImage/EXIF orientation normalization to fix silent rotation issues
- 🤖 Simplified ML APIs: One-line preprocessing (getModelInput) and postprocessing (ClassificationOutput, DetectionOutput, SegmentationOutput, DepthEstimationOutput) for all major frameworks
- 🤖 Model Presets: Pre-configured settings for YOLO (v8/v9/v10), RT-DETR, MobileNet, EfficientNet, ResNet, ViT, CLIP, SAM/SAM2, DINO, DETR, Mask2Former, UNet, DeepLab, SegFormer, FCN, PSPNet
- 🎯 Framework Targets: Automatic configuration for PyTorch, TensorFlow, TFLite, CoreML, ONNX Runtime, ExecuTorch, OpenCV
- 🏷️ Label Database: Built-in labels for COCO, ImageNet, VOC, CIFAR, Places365, ADE20K, Open Images, LVIS, Objects365, Kinetics
- 🔌 ONNX Helper: Streamlined tensor creation for ONNX Runtime with ONNXHelper.createTensorData()
- 📊 ONNX Data Types: Support for Float32, Float16, UInt8, Int8, Int32, Int64 tensor types
- 🎯 ONNX Model Configs: Pre-configured settings for YOLOv8, RT-DETR, ResNet, MobileNetV2, ViT, CLIP
- 🔍 Output Parsing: Built-in parsers for YOLOv8, YOLOv5, RT-DETR, SSD detection outputs
- 🧮 Segmentation Output: Parse ONNX segmentation model outputs with argmax
- 🎯 Native Quantization: Float→Int8/UInt8/Int16/INT4 with per-tensor and per-channel support (TFLite/ExecuTorch compatible)
- 🔢 INT4 Quantization: 4-bit quantization (8× compression) for LLM weights and edge deployment
- 📊 Per-Channel Quantization: Channel-wise scale/zeroPoint for higher accuracy (CNN, Transformer weights)
- 🔄 Float16 Conversion: IEEE 754 half-precision ↔ Float32 utilities for CVPixelBuffer processing
- 🎥 CVPixelBuffer Formats: BGRA/RGBA, NV12, and RGB565 conversion to tensor data
- 📦 Bounding Box Utilities: Format conversion (xyxy/xywh/cxcywh), scaling, clipping, IoU, NMS
- 🖼️ Letterbox Padding: YOLO-style letterbox preprocessing with automatic transform metadata for reverse coordinate mapping
- 📏 Depth Estimation: Process MiDaS, DPT, ZoeDepth, Depth Anything outputs with scientific colormaps (Viridis, Plasma, Turbo) and custom colormaps
- 🔄 Image Augmentation: Rotation, flip, brightness, contrast, saturation, blur
- 🎨 Color Jitter: Granular brightness/contrast/saturation/hue control with range support and seeded randomness
- ✂️ Cutout/Random Erasing: Mask random regions with constant/noise fill for robustness training
- 🎲 Random Crop with Seed: Reproducible random crops for data augmentation pipelines
- 🧮 Tensor Operations: Channel extraction, patch extraction, permutation, batch concatenation
- 🔙 Tensor to Image: Convert processed tensors back to images
- 🔲 Grid/Patch Extraction: Extract image patches in grid patterns for sliding window inference
- ✅ Tensor Validation: Validate tensor shapes, dtypes, and value ranges before inference
- 📦 Batch Assembly: Combine multiple images into NCHW/NHWC batch tensors
- 📦 Batch Processing: Process multiple images with concurrency control
- 🎨 Drawing/Visualization: Draw boxes, keypoints, masks, and heatmaps for debugging
- 📈 Image Analysis: Statistics, metadata, validation, blur detection
A comprehensive iOS example app is included in the Example/ directory, demonstrating all major features:
- TensorFlow Lite Classification - MobileNetV2 with TopK results
- ExecuTorch Classification - MobileNetV3 with TopK results
- Object Detection - YOLOv8 with NMS and bounding box visualization
- Semantic Segmentation - DeepLabV3 with colored mask overlay
- Depth Estimation - Depth Anything with colormaps and overlay visualization
- Pixel Extraction - Model presets (YOLO, RT-DETR, MobileNet, ResNet, ViT, CLIP, SAM2, DeepLab, etc.) and custom options
- Bounding Box Utilities - Format conversion, IoU calculation, NMS, scaling, clipping
- Image Augmentation - Rotation, flip, brightness, contrast, saturation, blur
- Tensor Operations - Channel extraction, permutation, batch assembly
- Drawing & Visualization - Boxes, labels, masks, and overlays
- Comprehensive UI Tests - 50+ UI tests covering all features
To run the example app:
cd Example/SwiftPixelUtilsExampleApp
pod install
open SwiftPixelUtilsExampleApp.xcworkspace
To run UI tests, select the SwiftPixelUtilsExampleAppUITests target and press ⌘U.
Add SwiftPixelUtils to your Package.swift:
dependencies: [
.package(url: "https://github.com/manishkumar03/SwiftPixelUtils.git", from: "1.0.0")
]
Or add it via Xcode:
- File → Add Package Dependencies
- Enter the repository URL
- Select version/branch
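If you declare your targets manually in Package.swift, also list the library as a target dependency. A minimal sketch (the product name is assumed to match the package name):
targets: [
    .target(
        name: "MyApp", // your app or library target
        dependencies: [
            .product(name: "SwiftPixelUtils", package: "SwiftPixelUtils")
        ]
    )
]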
⚠️ Important: SwiftPixelUtils functions are synchronous and use throws (not async throws). Remote URLs (http, https) are not supported - download images first and use .data(Data) or .file(URL) instead.
import SwiftPixelUtils
// From local file
let result = try PixelExtractor.getPixelData(
source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
options: PixelDataOptions()
)
print(result.data) // Float array of pixel values
print(result.width) // Image width
print(result.height) // Image height
print(result.shape) // [height, width, channels]
// From downloaded data (for remote images)
let (data, _) = try await URLSession.shared.data(from: URL(string: "https://example.com/image.jpg")!)
let result2 = try PixelExtractor.getPixelData(
source: .data(data),
options: PixelDataOptions()
)
import SwiftPixelUtils
// Use pre-configured YOLO settings
let result = try PixelExtractor.getPixelData(
source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
options: ModelPresets.yolov8
)
// Automatically configured: 640x640, letterbox resize, RGB, scale normalization, NCHW layout
// Or MobileNet
let mobileNetResult = try PixelExtractor.getPixelData(
source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
options: ModelPresets.mobilenet
)
// Configured: 224x224, cover resize, RGB, ImageNet normalization, NHWC layout
Classification Models
| Preset | Size | Resize | Normalization | Layout |
|---|---|---|---|---|
| mobilenet / mobilenet_v2 / mobilenet_v3 | 224×224 | cover | ImageNet | NHWC |
| efficientnet | 224×224 | cover | ImageNet | NHWC |
| resnet / resnet50 | 224×224 | cover | ImageNet | NCHW |
| vit | 224×224 | cover | ImageNet | NCHW |
| clip | 224×224 | cover | CLIP-specific | NCHW |
| dino | 224×224 | cover | ImageNet | NCHW |
Detection Models
| Preset | Size | Resize | Normalization | Layout | Notes |
|---|---|---|---|---|---|
| yolo / yolov8 | 640×640 | letterbox | scale | NCHW | Standard YOLO |
| yolov9 | 640×640 | letterbox | scale | NCHW | PGI + GELAN |
| yolov10 / yolov10_n/s/m/l/x | 640×640 | letterbox | scale | NCHW | NMS-free |
| rtdetr / rtdetr_l / rtdetr_x | 640×640 | letterbox | scale | NCHW | Real-time DETR |
| detr | 800×800 | contain | ImageNet | NCHW | Transformer detection |
Segmentation Models
| Preset | Size | Resize | Normalization | Layout | Notes |
|---|---|---|---|---|---|
| sam | 1024×1024 | contain | ImageNet | NCHW | Segment Anything |
| sam2 / sam2_t/s/b_plus/l | 1024×1024 | contain | ImageNet | NCHW | SAM2 with video |
| mask2former / mask2former_swin_t/l | 512×512 | contain | ImageNet | NCHW | Universal segmentation |
| deeplab / deeplabv3 / deeplabv3_plus | 513×513 | contain | ImageNet | NCHW | ASPP module |
| deeplab_769 / deeplab_1025 | 769/1025 | contain | ImageNet | NCHW | Higher resolution |
| unet / unet_256 / unet_1024 | 512×512 | contain | scale | NCHW | Classic encoder-decoder |
| segformer / segformer_b0/b5 | 512×512 | contain | ImageNet | NCHW | Transformer + MLP |
| fcn | 512×512 | contain | ImageNet | NCHW | Fully convolutional |
| pspnet | 473×473 | contain | ImageNet | NCHW | Pyramid pooling |
ONNX Runtime Models
| Preset | Size | Resize | Normalization | Layout | Notes |
|---|---|---|---|---|---|
| onnx_yolov8 / onnx_yolov8n/s/m/l/x | 640×640 | letterbox | scale | NCHW | ONNX YOLOv8 detection |
| onnx_rtdetr | 640×640 | letterbox | ImageNet | NCHW | ONNX RT-DETR detection |
| onnx_resnet / onnx_resnet50 | 224×224 | cover | ImageNet | NCHW | ONNX ResNet classification |
| onnx_mobilenetv2 | 224×224 | cover | ImageNet | NCHW | ONNX MobileNetV2 |
| onnx_vit | 224×224 | cover | ImageNet | NCHW | ONNX Vision Transformer |
| onnx_clip | 224×224 | cover | CLIP-specific | NCHW | ONNX CLIP vision encoder |
| onnx_quantized_uint8 | 224×224 | cover | raw | NCHW | Quantized UInt8 models |
| onnx_quantized_int8 | 224×224 | cover | raw | NCHW | Quantized Int8 models |
| onnx_float16 | 224×224 | cover | ImageNet | NCHW | Float16 optimized models |
All presets with NCHW layout work directly with ExecuTorch models exported from PyTorch. For quantized ExecuTorch models, use the output with Quantizer to convert to Int8:
// Preprocess for ExecuTorch quantized model
let pixels = try PixelExtractor.getPixelData(
source: .file(imageURL),
options: PixelDataOptions(
resize: .fit(width: 224, height: 224),
normalization: .imagenet,
dataLayout: .nchw // PyTorch/ExecuTorch convention
)
)
// Quantize for Int8 ExecuTorch model
let quantized = try Quantizer.quantize(
data: pixels.data,
options: QuantizationOptions(
mode: .perTensor,
dtype: .int8,
scale: [model_scale],
zeroPoint: [0] // Symmetric quantization
)
)
// Pass quantized.int8Data to ExecuTorch tensor
SwiftPixelUtils provides comprehensive helpers for ONNX Runtime integration.
⚠️ Important: ONNX Runtime is not included in the example app due to XNNPACK symbol conflicts with TensorFlow Lite. To use ONNX Runtime features in your own project, you'll need to add the ONNX Runtime dependency separately. See the ONNX Runtime Integration Guide for setup instructions.
// Create tensor data using model config
let tensorInput = try ONNXHelper.createTensorData(
from: .uiImage(image),
config: .yolov8 // Pre-configured for YOLOv8 ONNX model
)
// Use with onnxruntime-objc
let inputTensor = try ORTValue(
tensorData: NSMutableData(data: tensorInput.data),
elementType: .float,
shape: tensorInput.shape.map { NSNumber(value: $0) }
)
// Parse detection output
let detections = ONNXHelper.parseYOLOv8Output(
data: outputData,
numClasses: 80,
confidenceThreshold: 0.25,
nmsThreshold: 0.45
)
// Scale detections back to original image coordinates
let scaledDetections = ONNXHelper.scaleDetections(
detections,
modelSize: CGSize(width: 640, height: 640),
originalSize: image.size
)
ONNX Framework Variants:
| Framework | Layout | Normalization | Output Type |
|---|---|---|---|
| .onnx | NCHW | ImageNet | Float32 |
| .onnxRaw | NCHW | [0,1] scale | Float32 |
| .onnxQuantizedUInt8 | NCHW | raw [0,255] | UInt8 |
| .onnxQuantizedInt8 | NCHW | raw [0,255] | Int8 |
| .onnxFloat16 | NCHW | ImageNet | Float16 |
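As a sketch of how these variants fit the simplified getModelInput API described below (assuming the ONNX variants are accepted anywhere a framework target is), a quantized UInt8 ONNX model could be fed like this:
// Sketch: preprocess for a quantized UInt8 ONNX model
let quantizedInput = try PixelExtractor.getModelInput(
    source: .uiImage(image),
    framework: .onnxQuantizedUInt8, // raw [0,255], UInt8, NCHW (see table above)
    width: 224,
    height: 224
)
// quantizedInput.data holds raw UInt8 bytes ready to wrap in an ORTValue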
SwiftPixelUtils provides high-level APIs that handle all preprocessing and postprocessing decisions automatically based on your target ML framework.
Instead of manually configuring color format, normalization, layout, and output format, just specify your framework:
// TensorFlow Lite quantized model - one line!
let input = try PixelExtractor.getModelInput(
source: .uiImage(image),
framework: .tfliteQuantized,
width: 224,
height: 224
)
// input.data is raw Data containing UInt8 values in NHWC layout
// PyTorch model
let input = try PixelExtractor.getModelInput(
source: .uiImage(image),
framework: .pytorch,
width: 224,
height: 224
)
// input.data contains Float32 values in NCHW layout with ImageNet normalization
| Framework | Layout | Normalization | Output Type |
|---|---|---|---|
| .pytorch | NCHW | ImageNet | Float32 |
| .pytorchRaw | NCHW | [0,1] scale | Float32 |
| .tensorflow | NHWC | [-1,1] | Float32 |
| .tensorflowImageNet | NHWC | ImageNet | Float32 |
| .tfliteQuantized | NHWC | raw [0,255] | UInt8 |
| .tfliteFloat | NHWC | [0,1] scale | Float32 |
| .coreML | NHWC | [0,1] scale | Float32 |
| .coreMLImageNet | NHWC | ImageNet | Float32 |
| .onnx | NCHW | ImageNet | Float32 |
| .execuTorch | NCHW | ImageNet | Float32 |
| .execuTorchQuantized | NCHW | raw [0,255] | Int8 |
| .openCV | HWC/BGR | [0,255] | UInt8 |
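The ExecuTorch targets follow the same pattern; for a quantized export, the call is a one-liner as well (a sketch, with a 224×224 input assumed):
// Sketch: preprocess for a quantized Int8 ExecuTorch model
let etInput = try PixelExtractor.getModelInput(
    source: .uiImage(image),
    framework: .execuTorchQuantized,
    width: 224,
    height: 224
)
// etInput.data contains Int8 values in NCHW layout (raw [0,255] range, per the table above)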
Process classification model output with automatic dequantization, softmax, and label mapping:
// Process TFLite quantized model output in one line
let result = try ClassificationOutput.process(
outputData: outputTensor.data,
quantization: .uint8(scale: quantParams.scale, zeroPoint: quantParams.zeroPoint),
topK: 5,
labels: .imagenet(hasBackgroundClass: true)
)
// Access results
for prediction in result.predictions {
print("\(prediction.label): \(String(format: "%.1f%%", prediction.confidence * 100))")
}
// Or get the top prediction directly
if let top = result.topPrediction {
print("Top: \(top.label) (\(top.confidence))")
}
Quantization options:
- .none - Float32 model output
- .uint8(scale:zeroPoint:) - TFLite quantized
- .int8(scale:zeroPoint:) - ExecuTorch/ONNX quantized
Label options:
- .imagenet(hasBackgroundClass:) - ImageNet-1K (1000 classes)
- .coco - COCO (80 classes)
- .cifar10 - CIFAR-10 (10 classes)
- .cifar100 - CIFAR-100 (100 classes)
- .custom([String]) - Your own label array
- .none - Returns "Class N" as labels
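For a float (non-quantized) classifier with its own label set, the same call works with .none quantization and custom labels (a sketch; the label strings are placeholders):
// Sketch: Float32 output with custom labels
let floatResult = try ClassificationOutput.process(
    outputData: outputTensor.data,
    quantization: .none, // model already outputs Float32 scores
    topK: 3,
    labels: .custom(["cat", "dog", "bird"]) // placeholder label array
)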
Process object detection model output (YOLO, SSD, etc.) with automatic parsing, NMS, and label mapping:
// Process YOLOv5 output in one line
let result = try DetectionOutput.process(
floatOutput: outputArray,
format: .yolov5(numClasses: 80),
confidenceThreshold: 0.25,
iouThreshold: 0.45,
maxDetections: 20,
labels: .coco,
imageSize: originalImage.size,
modelInputSize: CGSize(width: 320, height: 320),
outputCoordinateSpace: .normalized // TFLite YOLO outputs 0-1 coords
)
// Access detections
for detection in result.detections {
print("\(detection.label): \(String(format: "%.1f%%", detection.confidence * 100))")
print(" Box: \(detection.boundingBox)") // Normalized 0-1
print(" Pixels: \(detection.pixelBoundingBox!)") // Pixel coordinates
}
Supported output formats:
- .yolov5(numClasses:) - YOLOv5 format: [1, N, 5+classes]
- .yolov8(numClasses:) - YOLOv8 format: [1, 4+classes, N]
- .yolov8Transposed(numClasses:) - YOLOv8 already transposed
- .ssd(numClasses:) - SSD MobileNet format
- .efficientDet(numClasses:) - EfficientDet format
Different models output coordinates in different spaces. Use outputCoordinateSpace to handle this:
// TFLite YOLO models typically output normalized (0-1) coordinates
outputCoordinateSpace: .normalized
// Original PyTorch YOLO exports output pixel coordinates (0-640)
outputCoordinateSpace: .pixelSpace // default
Use toDrawableBoxes() to convert detections directly to visualization format:
// One-line conversion to drawable boxes
let boxes = result.toDrawableBoxes(imageSize: image.size)
// Draw on image
let annotated = try Drawing.drawBoxes(
on: .uiImage(image),
boxes: boxes,
options: BoxDrawingOptions(lineWidth: 3, drawLabels: true, drawScores: true)
)
The toDrawableBoxes() method:
- Converts normalized coordinates to pixel coordinates
- Applies DetectionColorPalette for per-class colors (20 distinct colors)
- Returns [DrawableBox] ready for Drawing.drawBoxes()
Here's a complete example using both simplified APIs for image classification:
import SwiftPixelUtils
import TensorFlowLite
func classifyImage(_ image: UIImage) throws -> [ClassificationPrediction] {
// 1. Preprocess - one line
let input = try PixelExtractor.getModelInput(
source: .uiImage(image),
framework: .tfliteQuantized,
width: 224,
height: 224
)
// 2. Run inference
let interpreter = try Interpreter(modelPath: modelPath)
try interpreter.allocateTensors()
try interpreter.copy(input.data, toInputAt: 0)
try interpreter.invoke()
// 3. Postprocess - one line
let output = try interpreter.output(at: 0)
let quantParams = output.quantizationParameters!
let result = try ClassificationOutput.process(
outputData: output.data,
quantization: .uint8(scale: quantParams.scale, zeroPoint: quantParams.zeroPoint),
topK: 5,
labels: .imagenet(hasBackgroundClass: true)
)
return result.predictions
}
Here's a complete example for YOLOv5 object detection with visualization:
import SwiftPixelUtils
import TensorFlowLite
func detectObjects(_ image: UIImage) throws -> UIImage? {
let modelWidth = 320
let modelHeight = 320
// 1. Preprocess - use .stretch for YOLO (direct coordinate mapping)
let input = try PixelExtractor.getModelInput(
source: .uiImage(image),
framework: .tfliteFloat,
width: modelWidth,
height: modelHeight,
resizeStrategy: .stretch
)
// 2. Run inference
let interpreter = try Interpreter(modelPath: yoloModelPath)
try interpreter.allocateTensors()
try interpreter.copy(input.data, toInputAt: 0)
try interpreter.invoke()
// 3. Get output as float array
let output = try interpreter.output(at: 0)
let floatOutput = output.data.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
// 4. Postprocess - one line with outputCoordinateSpace
let result = try DetectionOutput.process(
floatOutput: floatOutput,
format: .yolov5(numClasses: 80),
confidenceThreshold: 0.25,
iouThreshold: 0.45,
maxDetections: 20,
labels: .coco,
imageSize: image.size,
modelInputSize: CGSize(width: modelWidth, height: modelHeight),
outputCoordinateSpace: .normalized // TFLite YOLO outputs 0-1 coords
)
// 5. Draw boxes - one line conversion to drawable format
let boxes = result.toDrawableBoxes(imageSize: image.size)
let drawingResult = try Drawing.drawBoxes(
on: .uiImage(image),
boxes: boxes,
options: BoxDrawingOptions(lineWidth: 4, drawLabels: true, drawScores: true)
)
return UIImage(cgImage: drawingResult.cgImage, scale: image.scale, orientation: image.imageOrientation)
}
Process semantic segmentation model output (DeepLabV3, etc.) with automatic parsing and visualization:
// Process DeepLabV3 output in one line
let result = try SegmentationOutput.process(
floatOutput: outputArray,
format: .logits(height: 257, width: 257, numClasses: 21),
labels: .voc
)
// Get detected classes with coverage percentages
for item in result.classSummary {
print("\(item.label): \(String(format: "%.1f%%", item.percentage))")
}
// Access the class mask
let classAtCenter = result.classAt(x: 128, y: 128)
print("Class at center: \(result.labels?[classAtCenter] ?? "unknown")")
Supported output formats:
- .logits(height:width:numClasses:) - Raw logits in NHWC format (DeepLabV3)
- .logitsNCHW(height:width:numClasses:) - Raw logits in NCHW format (PyTorch)
- .probabilities(height:width:numClasses:) - Softmax probabilities (NHWC)
- .probabilitiesNCHW(height:width:numClasses:) - Softmax probabilities (NCHW)
- .argmax(height:width:numClasses:) - Pre-computed class indices
Label options:
- .voc - Pascal VOC (21 classes with background)
- .ade20k - ADE20K (150 classes)
- .cityscapes - Cityscapes (19 classes)
- .custom([String]) - Your own label array
- .none - Returns "class_N" as labels
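For a PyTorch/ExecuTorch-style model that emits NCHW logits over Cityscapes classes, the call might look like this (a sketch; the output resolution depends on your model):
// Sketch: NCHW logits from a Cityscapes-trained model
let cityResult = try SegmentationOutput.process(
    floatOutput: outputArray,
    format: .logitsNCHW(height: 512, width: 512, numClasses: 19),
    labels: .cityscapes
)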
Overlay colored segmentation mask on the original image:
// Overlay segmentation on image
let overlay = try Drawing.overlaySegmentation(
on: .uiImage(image),
segmentation: result,
palette: .voc, // Pascal VOC colors
alpha: 0.5, // 50% opacity
excludeBackground: true // Don't color background pixels
)
// Or create a standalone colored mask
let coloredMask = result.toColoredCGImage(palette: .voc)
Built-in color palettes for common datasets:
- SegmentationColorPalette.voc - Pascal VOC (21 colors)
- SegmentationColorPalette.ade20k - ADE20K (150 colors)
- SegmentationColorPalette.cityscapes - Cityscapes (19 colors)
- SegmentationColorPalette.rainbow(numClasses:) - Generate custom palette
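For a custom model whose classes are not covered by the built-in palettes, a generated palette can be used the same way (a sketch, assuming the palette parameter accepts any SegmentationColorPalette value):
// Sketch: generated palette for a custom 10-class model
let customOverlay = try Drawing.overlaySegmentation(
    on: .uiImage(image),
    segmentation: result,
    palette: .rainbow(numClasses: 10), // match your model's class count
    alpha: 0.5,
    excludeBackground: true
)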
Here's a complete example for DeepLabV3 semantic segmentation:
import SwiftPixelUtils
import TensorFlowLite
func segmentImage(_ image: UIImage) throws -> UIImage? {
let modelSize = 257
// 1. Preprocess
let input = try PixelExtractor.getModelInput(
source: .uiImage(image),
framework: .tfliteFloat,
width: modelSize,
height: modelSize
)
// 2. Run inference
let interpreter = try Interpreter(modelPath: deeplabModelPath)
try interpreter.allocateTensors()
try interpreter.copy(input.data, toInputAt: 0)
try interpreter.invoke()
// 3. Get output
let output = try interpreter.output(at: 0)
let floatOutput = output.data.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
// 4. Postprocess - one line
let result = try SegmentationOutput.process(
floatOutput: floatOutput,
format: .logits(height: modelSize, width: modelSize, numClasses: 21),
labels: .voc
)
// 5. Visualize - overlay on original image
let overlay = try Drawing.overlaySegmentation(
on: .uiImage(image),
segmentation: result,
palette: .voc,
alpha: 0.5,
excludeBackground: true
)
return UIImage(cgImage: overlay.cgImage, scale: image.scale, orientation: image.imageOrientation)
}
Extract pixel data from a single image.
let result = try PixelExtractor.getPixelData(
source: .file(fileURL),
options: PixelDataOptions(
resize: ResizeOptions(width: 224, height: 224, strategy: .cover),
colorFormat: .rgb,
normalization: .imagenet,
dataLayout: .nchw,
outputFormat: .float32Array,
normalizeOrientation: true // Fix UIImage EXIF rotation issues
)
)
// Access results
result.data // [Float] - normalized pixel data
result.float16Data // [UInt16]? - Float16 as bit patterns (when outputFormat is .float16Array)
result.letterboxInfo // LetterboxInfo? - transform metadata (when using .letterbox resize)
// Float16 output for Core ML / Metal efficiency
let result = try PixelExtractor.getPixelData(
source: .uiImage(image),
options: PixelDataOptions(
resize: ResizeOptions(width: 224, height: 224, strategy: .cover),
outputFormat: .float16Array // Efficient for Apple Silicon
)
)
// result.float16Data contains UInt16 bit patterns
// Convert back: Float16(bitPattern: result.float16Data![i])
// Letterbox resize automatically captures transform info
let result = try PixelExtractor.getPixelData(
source: .uiImage(image),
options: PixelDataOptions(
resize: ResizeOptions(width: 640, height: 640, strategy: .letterbox),
colorFormat: .rgb,
normalization: .scale,
dataLayout: .nchw
)
)
// Use letterboxInfo to reverse-transform detection coordinates
if let info = result.letterboxInfo {
print("Scale: \(info.scale)")
print("Offset: \(info.offset)")
print("Original size: \(info.originalSize)")
// Reverse transform a detection box from model output to original image
let originalX = (modelX - Float(info.offset.x)) / info.scale
let originalY = (modelY - Float(info.offset.y)) / info.scale
}
Process multiple images with concurrency control.
// For local files
let results = try PixelExtractor.batchGetPixelData(
sources: [
.file(URL(fileURLWithPath: "/path/to/1.jpg")),
.file(URL(fileURLWithPath: "/path/to/2.jpg")),
.file(URL(fileURLWithPath: "/path/to/3.jpg"))
],
options: ModelPresets.mobilenet,
concurrency: 4
)
// For remote images, download first then process
let urls = ["https://example.com/1.jpg", "https://example.com/2.jpg", "https://example.com/3.jpg"]
var sources: [ImageSource] = []
for url in urls {
let (data, _) = try await URLSession.shared.data(from: URL(string: url)!)
sources.append(.data(data))
}
let results2 = try PixelExtractor.batchGetPixelData(
sources: sources,
options: ModelPresets.mobilenet,
concurrency: 4
)
Calculate image statistics for analysis and preprocessing decisions.
let stats = try ImageAnalyzer.getStatistics(source: .file(fileURL))
print(stats.mean) // [r, g, b] mean values (0-1)
print(stats.std) // [r, g, b] standard deviations
print(stats.min) // [r, g, b] minimum values
print(stats.max) // [r, g, b] maximum values
print(stats.histogram) // RGB histograms
Get image metadata without loading full pixel data.
let metadata = try ImageAnalyzer.getMetadata(source: .file(fileURL))
print(metadata.width)
print(metadata.height)
print(metadata.channels)
print(metadata.colorSpace)
print(metadata.hasAlpha)
Validate an image against specified criteria.
let validation = try ImageAnalyzer.validate(
source: .file(fileURL),
options: ValidationOptions(
minWidth: 224,
minHeight: 224,
maxWidth: 4096,
maxHeight: 4096,
requiredAspectRatio: 1.0,
aspectRatioTolerance: 0.1
)
)
print(validation.isValid) // true if passes all checks
print(validation.issues) // Array of validation issues
Detect if an image is blurry using Laplacian variance analysis.
let result = try ImageAnalyzer.detectBlur(
source: .file(fileURL),
threshold: 100,
downsampleSize: 500
)
print(result.isBlurry) // true if blurry
print(result.score) // Laplacian variance score
Apply image augmentations for data augmentation pipelines.
let augmented = try ImageAugmentor.applyAugmentations(
to: .file(fileURL),
options: AugmentationOptions(
rotation: 15, // Degrees
horizontalFlip: true,
verticalFlip: false,
brightness: 1.2, // 1.0 = no change
contrast: 1.1,
saturation: 0.9,
blur: BlurOptions(type: .gaussian, radius: 2)
)
)
Apply color jitter augmentation with granular control.
let result = try ImageAugmentor.colorJitter(
source: .file(fileURL),
options: ColorJitterOptions(
brightness: 0.2, // Random in [-0.2, +0.2]
contrast: 0.2, // Random in [0.8, 1.2]
saturation: 0.3, // Random in [0.7, 1.3]
hue: 0.1, // Random in [-0.1, +0.1]
seed: 42 // For reproducibility
)
)
Apply cutout (random erasing) augmentation.
let result = try ImageAugmentor.cutout(
source: .file(fileURL),
options: CutoutOptions(
numCutouts: 1,
minSize: 0.02,
maxSize: 0.33,
fillMode: .constant,
fillValue: [0, 0, 0],
seed: 42
)
)
Convert between bounding box formats (xyxy, xywh, cxcywh).
let boxes = [[320.0, 240.0, 100.0, 80.0]] // [cx, cy, w, h]
let converted = BoundingBox.convertFormat(
boxes,
from: .cxcywh,
to: .xyxy
)
// [[270, 200, 370, 280]] - [x1, y1, x2, y2]
Scale bounding boxes between different image dimensions.
let scaled = BoundingBox.scale(
[[100, 100, 200, 200]],
from: CGSize(width: 640, height: 640),
to: CGSize(width: 1920, height: 1080),
format: .xyxy
)
Clip bounding boxes to image boundaries.
let clipped = BoundingBox.clip(
[[-10, 50, 700, 500]],
imageSize: CGSize(width: 640, height: 480),
format: .xyxy
)
Calculate Intersection over Union between two boxes.
let iou = BoundingBox.calculateIoU(
[100, 100, 200, 200],
[150, 150, 250, 250],
format: .xyxy
)
Apply Non-Maximum Suppression to filter overlapping detections.
let detections = [
Detection(box: [100, 100, 200, 200], score: 0.9, classIndex: 0),
Detection(box: [110, 110, 210, 210], score: 0.8, classIndex: 0),
Detection(box: [300, 300, 400, 400], score: 0.7, classIndex: 1)
]
let filtered = BoundingBox.nonMaxSuppression(
detections: detections,
iouThreshold: 0.5,
scoreThreshold: 0.3,
maxDetections: 100 // Optional limit
)
Extract a single channel from pixel data.
let redChannel = TensorOperations.extractChannel(
data: result.data,
width: result.width,
height: result.height,
channels: result.channels,
channelIndex: 0, // 0=R, 1=G, 2=B
dataLayout: result.dataLayout
)
Transpose/permute tensor dimensions.
// Convert HWC to CHW
let permuted = TensorOperations.permute(
data: result.data,
shape: result.shape,
order: [2, 0, 1] // new order: C, H, W
)
Assemble multiple results into a batch tensor.
let batch = TensorOperations.assembleBatch(
results: [result1, result2, result3],
layout: .nchw,
padToSize: 4
)
print(batch.shape) // [3, 3, 224, 224] for NCHW
Quantize float data to int8/uint8/int16/int4 format.
// Per-tensor quantization (standard)
let quantized = try Quantizer.quantize(
data: result.data,
options: QuantizationOptions(
mode: .perTensor,
dtype: .uint8,
scale: [0.0078125],
zeroPoint: [128]
)
)
Quantize with per-channel scale/zeroPoint for higher accuracy (ideal for CNN and Transformer weights):
// Per-channel quantization with calibration
let (scales, zeroPoints) = Quantizer.calibratePerChannel(
data: weightsData,
numChannels: 64,
dtype: .int8,
layout: .chw // or .hwc
)
let quantized = try Quantizer.quantize(
data: weightsData,
options: QuantizationOptions(
mode: .perChannel(axis: 0), // Channel axis
dtype: .int8,
scale: scales,
zeroPoint: zeroPoints
)
)
// Result: int8Data with per-channel parameters for TFLite/ExecuTorch
4-bit quantization for 8× compression, ideal for LLM weights and edge devices:
// INT4 quantization (2 values packed per byte)
let int4Result = try Quantizer.quantize(
data: llmWeights,
options: QuantizationOptions(
mode: .perTensor,
dtype: .int4, // or .uint4
scale: [scale],
zeroPoint: [zeroPoint]
)
)
// Access packed data (50% size of INT8)
let packedBytes = int4Result.packedInt4Data! // [UInt8] - 2 values per byte
let originalCount = int4Result.originalCount! // Original element count
let compression = int4Result.compressionRatio // 8.0 vs Float32
// Dequantize back to float
let restored = try Quantizer.dequantize(
packedInt4Data: packedBytes,
originalCount: originalCount,
scale: [scale],
zeroPoint: [zeroPoint],
dtype: .int4
)
| Type | Range | Compression | Use Case |
|---|---|---|---|
| uint8 | [0, 255] | 4× | TFLite, general inference |
| int8 | [-128, 127] | 4× | ExecuTorch, symmetric |
| int16 | [-32768, 32767] | 2× | High precision |
| int4 | [-8, 7] | 8× | LLM weights, edge |
| uint4 | [0, 15] | 8× | Activation quantization |
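If your model does not ship quantization parameters, per-tensor values can be derived from the float range before calling Quantizer.quantize. A minimal sketch of that arithmetic for asymmetric UInt8 (the calibrateUInt8 helper is illustrative, not library API):
// Sketch: derive asymmetric UInt8 scale/zeroPoint from the data range
func calibrateUInt8(_ values: [Float]) -> (scale: Float, zeroPoint: Int) {
    let minVal = min(values.min() ?? 0, 0) // include 0 so it stays representable
    let maxVal = max(values.max() ?? 0, 0)
    let scale = maxVal > minVal ? (maxVal - minVal) / 255 : 1
    let zeroPoint = Int((-minVal / scale).rounded()) // maps minVal -> 0, maxVal -> 255
    return (scale, zeroPoint)
}
let (scale, zeroPoint) = calibrateUInt8(result.data)
let calibratedQuant = try Quantizer.quantize(
    data: result.data,
    options: QuantizationOptions(mode: .perTensor, dtype: .uint8, scale: [scale], zeroPoint: [zeroPoint])
)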
Convert quantized data back to float.
let dequantized = try Quantizer.dequantize(
uint8Data: quantizedArray,
scale: [0.0078125],
zeroPoint: [128]
)
Apply letterbox padding to an image.
let options = LetterboxOptions(
targetWidth: 640,
targetHeight: 640,
fillColor: (114, 114, 114), // YOLO gray
scaleUp: true,
center: true
)
let result = try Letterbox.apply(
to: .file(fileURL),
options: options
)
print(result.cgImage) // Letterboxed CGImage
print(result.scale) // Scale factor applied
print(result.offsetX) // X padding offset
print(result.offsetY) // Y padding offset
Transform detection boxes back to original image coordinates.
let originalBoxes = Letterbox.reverseTransformBoxes(
boxes: detectedBoxes,
scale: letterboxResult.scale,
offsetX: letterboxResult.offsetX,
offsetY: letterboxResult.offsetY
)
Draw bounding boxes with labels on an image.
let result = try Drawing.drawBoxes(
on: .uiImage(image),
boxes: [
DrawableBox(box: [100, 100, 200, 200], label: "person", score: 0.95, color: (255, 0, 0, 255)),
DrawableBox(box: [300, 150, 400, 350], label: "dog", score: 0.87, color: (0, 255, 0, 255))
],
options: BoxDrawingOptions(
lineWidth: 3,
fontSize: 14,
drawLabels: true,
drawScores: true
)
)
let annotatedImage = result.cgImage
Built-in label databases for common ML classification and detection models.
Get a label by its class index.
let label = LabelDatabase.getLabel(0, dataset: .coco) // Returns "person"
Get top-K labels from prediction scores.
let topLabels = LabelDatabase.getTopLabels(
scores: modelOutput,
dataset: .coco,
k: 5,
minConfidence: 0.1
)
| Dataset | Classes | Description |
|---|---|---|
| coco | 80 | COCO 2017 object detection labels |
| coco91 | 91 | COCO original labels with background |
| imagenet | 1000 | ImageNet ILSVRC 2012 classification |
| imagenet21k | 21841 | ImageNet-21K full classification |
| voc | 21 | PASCAL VOC with background |
| cifar10 | 10 | CIFAR-10 classification |
| cifar100 | 100 | CIFAR-100 classification |
| places365 | 365 | Places365 scene recognition |
| ade20k | 150 | ADE20K semantic segmentation |
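The database pairs naturally with the NMS output shown earlier; a short sketch mapping filtered detections to COCO labels (assuming the Detection values expose the classIndex and score fields used above):
// Sketch: label the NMS-filtered detections from the bounding box utilities
for detection in filtered {
    let name = LabelDatabase.getLabel(detection.classIndex, dataset: .coco)
    print("\(name): \(detection.score)")
}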
Note: Remote URLs (http, https, ftp) are not supported. Download images first and use .data(Data) instead.
public enum ImageSource {
case file(URL) // Local file path only
case data(Data) // Raw image data (use for downloaded images)
case base64(String) // Base64 encoded image
case cgImage(CGImage) // CGImage instance
case uiImage(UIImage) // UIImage (iOS)
case nsImage(NSImage) // NSImage (macOS)
}
public enum ColorFormat {
case rgb // 3 channels
case rgba // 4 channels
case bgr // 3 channels (OpenCV style)
case bgra // 4 channels
case grayscale // 1 channel
case hsv // 3 channels (Hue, Saturation, Value)
case hsl // 3 channels (Hue, Saturation, Lightness)
case lab // 3 channels (CIE LAB)
case yuv // 3 channels
case ycbcr // 3 channels
}
public enum ResizeStrategy {
case cover // Fill target, crop excess (default)
case contain // Fit within target, padding
case stretch // Stretch to fill (may distort)
case letterbox // Fit within target, letterbox padding (YOLO-style)
}
public enum NormalizationPreset {
case scale // [0, 1] range
case imagenet // ImageNet mean/std
case tensorflow // [-1, 1] range
case raw // No normalization (0-255)
case custom(mean: [Float], std: [Float]) // Custom mean/std
}
public enum DataLayout {
case hwc // Height × Width × Channels (default)
case chw // Channels × Height × Width (PyTorch)
case nhwc // Batch × Height × Width × Channels (TensorFlow/TFLite)
case nchw // Batch × Channels × Height × Width (PyTorch/ExecuTorch)
}
public enum OutputFormat {
case array // Default float array (result.data)
case float32Array // Float array (result.data) - same as array
case uint8Array // UInt8 array (result.uint8Data) - for quantized models
case int32Array // Int32 array (result.int32Data) - for integer models
}
Usage:
- result.data - Always populated with [Float] values
- result.uint8Data - Populated when outputFormat: .uint8Array or normalization: .raw
- result.int32Data - Populated when outputFormat: .int32Array
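For example, a quantized model that expects raw bytes can be fed by combining the options above (a sketch):
// Sketch: request raw UInt8 output for a quantized model
let rawResult = try PixelExtractor.getPixelData(
    source: .uiImage(image),
    options: PixelDataOptions(
        resize: ResizeOptions(width: 224, height: 224, strategy: .cover),
        colorFormat: .rgb,
        normalization: .raw, // keep 0-255 values
        dataLayout: .nhwc,
        outputFormat: .uint8Array
    )
)
// rawResult.uint8Data now holds the [UInt8] values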
All functions throw typed errors:
public enum PixelUtilsError: Error {
case invalidSource(String)
case loadFailed(String)
case invalidROI(String)
case processingFailed(String)
case invalidOptions(String)
case invalidChannel(String)
case invalidPatch(String)
case dimensionMismatch(String)
case emptyBatch(String)
case unknown(String)
}
| Tip | Description |
|---|---|
| 🤖 Simplified APIs | Use getModelInput(), ClassificationOutput.process(), DetectionOutput.process(), and SegmentationOutput.process() for the easiest integration |
| 🎯 Resize Strategies | Use letterbox for YOLO, cover for classification models |
| 📦 Batch Processing | Process multiple images concurrently for better performance |
| ⚙️ Model Presets | Pre-configured settings are optimized for each model |
| 🔄 Data Layout | Choose the right dataLayout upfront to avoid conversions |
| 🚀 Accelerate Framework | Leverage Apple's Accelerate for optimal performance |
| 🤖 ExecuTorch | Use NCHW layout + Int8 quantization for PyTorch-exported models |
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details.
This project was built with AI assistance.