
react-native-vision-utils

High-performance React Native library for image preprocessing optimized for ML/AI inference pipelines

Installation · Quick Start · API Reference · Features · Contributing

Provides comprehensive tools for pixel data extraction, tensor manipulation, image augmentation, and model-specific preprocessing with native implementations in Swift (iOS) and Kotlin (Android).

✨ Features

  • 🚀 High Performance: Native implementations in Swift (iOS) and Kotlin (Android)
  • 🔢 Raw Pixel Data: Extract pixel values as typed arrays (Float32, Int32, Uint8) ready for ML inference
  • 🎨 Multiple Color Formats: RGB, RGBA, BGR, BGRA, Grayscale, HSV, HSL, LAB, YUV, YCbCr
  • 📐 Flexible Resizing: Cover, contain, stretch, and letterbox strategies
  • 🔢 ML-Ready Normalization: ImageNet, TensorFlow, custom presets
  • 📊 Multiple Data Layouts: HWC, CHW, NHWC, NCHW (PyTorch/TensorFlow compatible)
  • 📦 Batch Processing: Process multiple images with concurrency control
  • 🖼️ Multiple Sources: URL, file, base64, assets, photo library
  • 🤖 Model Presets: Pre-configured settings for YOLO, MobileNet, EfficientNet, ResNet, ViT, CLIP, SAM, DINO, DETR
  • 🔄 Image Augmentation: Rotation, flip, brightness, contrast, saturation, blur
  • 🎨 Color Jitter: Granular brightness/contrast/saturation/hue control with range support and seeded randomness
  • ✂️ Cutout/Random Erasing: Mask random regions with constant/noise fill for robustness training
  • 📈 Image Analysis: Statistics, metadata, validation, blur detection
  • 🧮 Tensor Operations: Channel extraction, patch extraction, permutation, batch concatenation
  • 🔙 Tensor to Image: Convert processed tensors back to images
  • 🎯 Native Quantization: Float→Int8/Uint8/Int16 with per-tensor and per-channel support (TFLite compatible)
  • 🏷️ Label Database: Built-in labels for COCO, ImageNet, VOC, CIFAR, Places365, ADE20K
  • 📹 Camera Frame Utils: Direct YUV/NV12/BGRA→tensor conversion for vision-camera integration
  • 📦 Bounding Box Utilities: Format conversion (xyxy/xywh/cxcywh), scaling, clipping, IoU, NMS
  • 🖼️ Letterbox Padding: YOLO-style letterbox preprocessing with reverse coordinate transform
  • 🎨 Drawing/Visualization: Draw boxes, keypoints, masks, and heatmaps for debugging
  • 🎬 Video Frame Extraction: Extract frames from videos at timestamps, intervals, or evenly-spaced for temporal ML models
  • 🔲 Grid/Patch Extraction: Extract image patches in grid patterns for sliding window inference
  • 🎲 Random Crop with Seed: Reproducible random crops for data augmentation pipelines
  • ✅ Tensor Validation: Validate tensor shapes, dtypes, and value ranges before inference
  • 📦 Batch Assembly: Combine multiple images into NCHW/NHWC batch tensors

📦 Installation

npm install react-native-vision-utils

For iOS, run:

cd ios && pod install

🚀 Quick Start

Basic Usage

import { getPixelData } from 'react-native-vision-utils';

const result = await getPixelData({
  source: { type: 'url', value: 'https://example.com/image.jpg' },
});

console.log(result.data); // Float array of pixel values
console.log(result.width); // Image width
console.log(result.height); // Image height
console.log(result.shape); // [height, width, channels]

Using Model Presets

import { getPixelData, MODEL_PRESETS } from 'react-native-vision-utils';

// Use pre-configured YOLO settings
const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  ...MODEL_PRESETS.yolov8,
});
// Automatically configured: 640x640, letterbox resize, RGB, scale normalization, NCHW layout

// Or MobileNet
const mobileNetResult = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  ...MODEL_PRESETS.mobilenet,
});
// Configured: 224x224, cover resize, RGB, ImageNet normalization, NHWC layout

Available Model Presets

| Preset | Size | Resize | Normalization | Layout |
| --- | --- | --- | --- | --- |
| yolo / yolov8 | 640×640 | letterbox | scale | NCHW |
| mobilenet | 224×224 | cover | ImageNet | NHWC |
| mobilenet_v2 | 224×224 | cover | ImageNet | NHWC |
| mobilenet_v3 | 224×224 | cover | ImageNet | NHWC |
| efficientnet | 224×224 | cover | ImageNet | NHWC |
| resnet | 224×224 | cover | ImageNet | NCHW |
| resnet50 | 224×224 | cover | ImageNet | NCHW |
| vit | 224×224 | cover | ImageNet | NCHW |
| clip | 224×224 | cover | CLIP-specific | NCHW |
| sam | 1024×1024 | contain | ImageNet | NCHW |
| dino | 224×224 | cover | ImageNet | NCHW |
| detr | 800×800 | contain | ImageNet | NCHW |
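
Because presets are plain option objects, you can spread one and then override individual fields. A minimal sketch (assumes outputFormat behaves as documented in the getPixelData options below):

import { getPixelData, MODEL_PRESETS } from 'react-native-vision-utils';

// Start from the ResNet-50 preset but request a typed-array output
const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  ...MODEL_PRESETS.resnet50,
  outputFormat: 'float32Array', // overrides the preset's default output format
});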

📖 API Reference

🔧 Core Functions

getPixelData(options)

Extract pixel data from a single image.

const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  resize: { width: 224, height: 224, strategy: 'cover' },
  colorFormat: 'rgb',
  normalization: { preset: 'imagenet' },
  dataLayout: 'nchw',
  outputFormat: 'float32Array',
});

Options:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| source | ImageSource | required | Image source specification |
| colorFormat | ColorFormat | 'rgb' | Output color format |
| resize | ResizeOptions | - | Resize options |
| roi | Roi | - | Region of interest to extract |
| normalization | Normalization | { preset: 'scale' } | Normalization settings |
| dataLayout | DataLayout | 'hwc' | Data layout format |
| outputFormat | OutputFormat | 'array' | Output format |

Result:

interface PixelDataResult {
  data: number[] | Float32Array | Uint8Array;
  width: number;
  height: number;
  channels: number;
  colorFormat: ColorFormat;
  dataLayout: DataLayout;
  shape: number[];
  processingTimeMs: number;
}

batchGetPixelData(optionsArray, batchOptions)

Process multiple images with concurrency control.

const results = await batchGetPixelData(
  [
    {
      source: { type: 'url', value: 'https://example.com/1.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
    {
      source: { type: 'url', value: 'https://example.com/2.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
    {
      source: { type: 'url', value: 'https://example.com/3.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
  ],
  { concurrency: 4 }
);

console.log(results.totalTimeMs);
results.results.forEach((result, index) => {
  if ('error' in result) {
    console.log(`Image ${index} failed: ${result.message}`);
  } else {
    console.log(`Image ${index}: ${result.width}x${result.height}`);
  }
});

🔍 Image Analysis

getImageStatistics(source)

Calculate image statistics for analysis and preprocessing decisions.

import { getImageStatistics } from 'react-native-vision-utils';

const stats = await getImageStatistics({
  type: 'file',
  value: '/path/to/image.jpg',
});

console.log(stats.mean); // [r, g, b] mean values (0-1)
console.log(stats.std); // [r, g, b] standard deviations
console.log(stats.min); // [r, g, b] minimum values
console.log(stats.max); // [r, g, b] maximum values
console.log(stats.histogram); // { r: number[], g: number[], b: number[] }

getImageMetadata(source)

Get image metadata without loading full pixel data.

import { getImageMetadata } from 'react-native-vision-utils';

const metadata = await getImageMetadata({
  type: 'file',
  value: '/path/to/image.jpg',
});

console.log(metadata.width); // Image width
console.log(metadata.height); // Image height
console.log(metadata.channels); // Number of channels
console.log(metadata.colorSpace); // Color space (sRGB, etc.)
console.log(metadata.hasAlpha); // Has alpha channel
console.log(metadata.aspectRatio); // Width / height ratio

validateImage(source, options)

Validate an image against specified criteria.

import { validateImage } from 'react-native-vision-utils';

const validation = await validateImage(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    minWidth: 224,
    minHeight: 224,
    maxWidth: 4096,
    maxHeight: 4096,
    requiredAspectRatio: 1.0,
    aspectRatioTolerance: 0.1,
  }
);

if (validation.isValid) {
  console.log('Image meets all requirements');
} else {
  console.log('Issues:', validation.issues);
}

detectBlur(source, options?)

Detect if an image is blurry using Laplacian variance analysis.

import { detectBlur } from 'react-native-vision-utils';

const result = await detectBlur(
  { type: 'file', value: '/path/to/photo.jpg' },
  {
    threshold: 100, // Higher = stricter blur detection
    downsampleSize: 500, // Downsample for faster processing
  }
);

console.log(result.isBlurry); // true if blurry
console.log(result.score); // Laplacian variance score
console.log(result.threshold); // Threshold used
console.log(result.processingTimeMs); // Processing time

Options:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| threshold | number | 100 | Variance threshold; images scoring below this are considered blurry |
| downsampleSize | number | 500 | Max dimension for downsampling (improves performance on large images) |

Result:

| Property | Type | Description |
| --- | --- | --- |
| isBlurry | boolean | Whether the image is considered blurry |
| score | number | Laplacian variance score (higher = sharper) |
| threshold | number | The threshold that was used |
| processingTimeMs | number | Processing time in milliseconds |
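
A typical quality gate before inference, built from the fields above (a sketch; the downstream inference call is left as a comment):

import { detectBlur, getPixelData, MODEL_PRESETS } from 'react-native-vision-utils';

const source = { type: 'file' as const, value: '/path/to/photo.jpg' };
const blur = await detectBlur(source, { threshold: 100 });

if (blur.isBlurry) {
  // Low Laplacian variance means few sharp edges; skip or re-capture
  console.warn(`Too blurry: score ${blur.score.toFixed(1)} < ${blur.threshold}`);
} else {
  const pixels = await getPixelData({ source, ...MODEL_PRESETS.mobilenet });
  // ... run inference on pixels.data
}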

extractVideoFrames(source, options?)

Extract frames from video files for video analysis, action recognition, and temporal ML models.

import { extractVideoFrames } from 'react-native-vision-utils';

// Extract 10 evenly-spaced frames
const result = await extractVideoFrames(
  { type: 'file', value: '/path/to/video.mp4' },
  {
    count: 10, // Extract 10 evenly-spaced frames
    resize: { width: 224, height: 224 },
    outputFormat: 'base64', // or 'pixelData' for ML-ready arrays
  }
);

console.log(result.frameCount); // Number of frames extracted
console.log(result.videoDuration); // Video duration in seconds
console.log(result.videoWidth); // Video width
console.log(result.videoHeight); // Video height
console.log(result.frames); // Array of extracted frames

Extraction Modes:

// Mode 1: Specific timestamps
const result = await extractVideoFrames(source, {
  timestamps: [0.5, 1.0, 2.5, 5.0], // Extract at these exact times
});

// Mode 2: Regular intervals
const result = await extractVideoFrames(source, {
  interval: 1.0, // Extract every 1 second
  startTime: 5.0, // Start at 5 seconds
  endTime: 30.0, // End at 30 seconds
  maxFrames: 25, // Limit to 25 frames
});

// Mode 3: Count-based (evenly spaced)
const result = await extractVideoFrames(source, {
  count: 16, // Extract exactly 16 evenly-spaced frames
});

ML-Ready Output:

// Get pixel arrays ready for inference
const result = await extractVideoFrames(
  { type: 'file', value: '/path/to/video.mp4' },
  {
    count: 16,
    resize: { width: 224, height: 224 },
    outputFormat: 'pixelData',
    colorFormat: 'rgb',
    normalization: { preset: 'imagenet' },
  }
);

// Each frame has normalized pixel data ready for models
result.frames.forEach((frame) => {
  const tensor = frame.data; // Float32 array
  // Use with your ML model
});
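
For temporal models you may want a single clip tensor rather than separate frames. A minimal sketch, assuming each frame's data is a flat HWC float array as in the example above; framesToClipTensor is a hypothetical helper, not part of the library:

// Hypothetical helper: stack T frames into one Float32Array of shape [T, H, W, C]
function framesToClipTensor(
  frames: { data: number[] }[],
  height: number,
  width: number,
  channels: number
) {
  const frameSize = height * width * channels;
  const clip = new Float32Array(frames.length * frameSize);
  frames.forEach((frame, t) => clip.set(frame.data, t * frameSize));
  return { data: clip, shape: [frames.length, height, width, channels] };
}

const clip = framesToClipTensor(result.frames, 224, 224, 3);
console.log(clip.shape); // [16, 224, 224, 3]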

Options:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| timestamps | number[] | - | Specific timestamps in seconds to extract frames at |
| interval | number | - | Extract frames at regular intervals (seconds) |
| count | number | - | Number of evenly-spaced frames to extract |
| startTime | number | 0 | Start time in seconds (for interval/count modes) |
| endTime | number | duration | End time in seconds (for interval/count modes) |
| maxFrames | number | 100 | Maximum frames for interval mode |
| resize | object | - | Resize frames to { width, height } |
| outputFormat | string | 'base64' | 'base64' or 'pixelData' for ML arrays |
| quality | number | 90 | JPEG quality for base64 output (0-100) |
| colorFormat | string | 'rgb' | Color format for pixelData ('rgb', 'rgba', 'bgr', 'grayscale') |
| normalization | object | scale | Normalization preset for pixelData ('scale', 'imagenet', 'tensorflow') |

Note: If no extraction mode is specified (no timestamps, interval, or count), a single frame at t=0 is extracted by default.

Platform Note: The asset source type is only supported on iOS. Android supports file and url sources.

PixelData Note: colorFormat and normalization are applied only when outputFormat === 'pixelData'.

Result:

| Property | Type | Description |
| --- | --- | --- |
| frames | ExtractedFrame[] | Array of extracted frames |
| frameCount | number | Number of frames extracted |
| videoDuration | number | Total video duration in seconds |
| videoWidth | number | Video width in pixels |
| videoHeight | number | Video height in pixels |
| frameRate | number | Video frame rate (fps) |
| processingTimeMs | number | Processing time in milliseconds |

ExtractedFrame:

| Property | Type | Description |
| --- | --- | --- |
| timestamp | number | Actual frame timestamp in seconds |
| requestedTimestamp | number | Originally requested timestamp |
| width | number | Frame width |
| height | number | Frame height |
| data | string \| number[] | Base64 string (when outputFormat='base64') or pixel array (when outputFormat='pixelData') |
| channels | number | Number of channels (only present when outputFormat='pixelData') |
| error | string | Error message if frame extraction failed (frame may still be present with partial data) |

🎨 Image Augmentation

applyAugmentations(source, augmentations)

Apply image augmentations for data augmentation pipelines.

import { applyAugmentations } from 'react-native-vision-utils';

const augmented = await applyAugmentations(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    rotation: 15, // Degrees
    horizontalFlip: true,
    verticalFlip: false,
    brightness: 1.2, // 1.0 = no change
    contrast: 1.1, // 1.0 = no change
    saturation: 0.9, // 1.0 = no change
    blur: { type: 'gaussian', radius: 2 }, // Blur options
  }
);

console.log(augmented.base64); // Base64 encoded result
console.log(augmented.width);
console.log(augmented.height);

colorJitter(source, options)

Apply color jitter augmentation with granular control over brightness, contrast, saturation, and hue. More flexible than applyAugmentations: each property accepts either a single value or a [min, max] range for random sampling, with an optional seed for reproducibility.

import { colorJitter } from 'react-native-vision-utils';

// Basic usage with symmetric ranges
const result = await colorJitter(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    brightness: 0.2,     // Random in [-0.2, +0.2]
    contrast: 0.2,       // Random in [0.8, 1.2]
    saturation: 0.3,     // Random in [0.7, 1.3]
    hue: 0.1,            // Random in [-0.1, +0.1] (fraction of 360°)
  }
);

console.log(result.base64);              // Augmented image
console.log(result.appliedBrightness);   // Actual value applied
console.log(result.appliedContrast);
console.log(result.appliedSaturation);
console.log(result.appliedHue);

// Asymmetric ranges with seed for reproducibility
const reproducible = await colorJitter(
  { type: 'url', value: 'https://example.com/image.jpg' },
  {
    brightness: [-0.1, 0.3],  // More brightening than darkening
    contrast: [0.8, 1.5],     // Allow up to 50% more contrast
    seed: 42,                  // Same seed = same random values
  }
);

Options:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| brightness | number \| [min, max] | 0 | Additive brightness adjustment. A single value creates the symmetric range [-v, +v] |
| contrast | number \| [min, max] | 1 | Contrast multiplier. A single value creates the range [max(0, 1-v), 1+v] |
| saturation | number \| [min, max] | 1 | Saturation multiplier. A single value creates the range [max(0, 1-v), 1+v] |
| hue | number \| [min, max] | 0 | Hue shift as a fraction of the color wheel (0-1 = 0-360°). A single value creates the symmetric range [-v, +v] |
| seed | number | random | Seed for reproducible random sampling |
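
For reference, the scalar-to-range expansion described in the table amounts to the following (an illustration of the documented behavior, not the library's internal code):

// Additive parameters (brightness, hue): v expands to [-v, +v]
const additiveRange = (v: number | [number, number]) =>
  Array.isArray(v) ? v : [-v, v];

// Multiplicative parameters (contrast, saturation): v expands to [max(0, 1 - v), 1 + v]
const multiplicativeRange = (v: number | [number, number]) =>
  Array.isArray(v) ? v : [Math.max(0, 1 - v), 1 + v];

additiveRange(0.2); // [-0.2, 0.2]
multiplicativeRange(0.3); // [0.7, 1.3]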

Result:

| Property | Type | Description |
| --- | --- | --- |
| base64 | string | Augmented image as base64 PNG |
| width | number | Output image width |
| height | number | Output image height |
| appliedBrightness | number | Actual brightness value applied |
| appliedContrast | number | Actual contrast value applied |
| appliedSaturation | number | Actual saturation value applied |
| appliedHue | number | Actual hue shift value applied |
| seed | number | Seed used for random generation |
| processingTimeMs | number | Processing time in milliseconds |

cutout(source, options)

Apply cutout (random erasing) augmentation to mask random rectangular regions. Improves model robustness by forcing it to rely on diverse features.

import { cutout } from 'react-native-vision-utils';

// Basic cutout with black fill
const result = await cutout(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    numCutouts: 1,
    minSize: 0.02,      // 2% of image area min
    maxSize: 0.33,      // 33% of image area max
  }
);

console.log(result.base64);       // Augmented image
console.log(result.numCutouts);   // Number of cutouts applied
console.log(result.regions);      // Details of each cutout

// Multiple cutouts with noise fill
const noisy = await cutout(
  { type: 'url', value: 'https://example.com/image.jpg' },
  {
    numCutouts: 3,
    minSize: 0.01,
    maxSize: 0.1,
    fillMode: 'noise',
    seed: 42,  // Reproducible
  }
);

// Cutout with custom gray fill and probability
const probabilistic = await cutout(
  { type: 'base64', value: base64Data },
  {
    numCutouts: 2,
    fillMode: 'constant',
    fillValue: [128, 128, 128],  // Gray
    probability: 0.5,  // 50% chance of applying
  }
);

Options:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| numCutouts | number | 1 | Number of cutout regions to apply |
| minSize | number | 0.02 | Minimum cutout size as fraction of image area (0-1) |
| maxSize | number | 0.33 | Maximum cutout size as fraction of image area (0-1) |
| minAspect | number | 0.3 | Minimum aspect ratio (width/height) |
| maxAspect | number | 3.3 | Maximum aspect ratio (width/height) |
| fillMode | string | 'constant' | Fill mode: 'constant', 'noise', or 'random' |
| fillValue | [R, G, B] | [0, 0, 0] | Fill color for 'constant' mode (0-255) |
| probability | number | 1.0 | Probability of applying cutout (0-1) |
| seed | number | random | Seed for reproducible cutouts |

Result:

| Property | Type | Description |
| --- | --- | --- |
| base64 | string | Augmented image as base64 PNG |
| width | number | Output image width |
| height | number | Output image height |
| applied | boolean | Whether cutout was applied (based on probability) |
| numCutouts | number | Number of cutout regions applied |
| regions | array | Details of each cutout: {x, y, width, height, fill} |
| seed | number | Seed used for random generation |
| processingTimeMs | number | Processing time in milliseconds |

✂️ Multi-Crop Operations

fiveCrop(source, options, pixelOptions)

Extract 5 crops: 4 corners + center. Useful for test-time augmentation.

import { fiveCrop, MODEL_PRESETS } from 'react-native-vision-utils';

const crops = await fiveCrop(
  { type: 'file', value: '/path/to/image.jpg' },
  { width: 224, height: 224 },
  MODEL_PRESETS.mobilenet
);

console.log(crops.cropCount); // 5
crops.results.forEach((result, i) => {
  console.log(`Crop ${i}: ${result.width}x${result.height}`);
});
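
A common test-time-augmentation pattern averages model scores over the five crops. A sketch, where runModel is a hypothetical stand-in for your own inference call:

// Hypothetical: returns per-class scores for one crop
declare function runModel(data: number[] | Float32Array): Promise<number[]>;

const allScores = await Promise.all(
  crops.results.map((crop) => runModel(crop.data))
);

// Average each class score across the 5 crops
const avgScores = allScores[0].map(
  (_, c) => allScores.reduce((sum, s) => sum + s[c], 0) / allScores.length
);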

tenCrop(source, options, pixelOptions)

Extract 10 crops: 5 crops + their horizontal flips.

import { tenCrop, MODEL_PRESETS } from 'react-native-vision-utils';

const crops = await tenCrop(
  { type: 'file', value: '/path/to/image.jpg' },
  { width: 224, height: 224 },
  MODEL_PRESETS.mobilenet
);

console.log(crops.cropCount); // 10

extractGrid(source, gridOptions, pixelOptions?)

Extract image patches in a grid pattern. Useful for sliding window detection, image tiling, and patch-based models.

import { extractGrid } from 'react-native-vision-utils';

// Extract 3x3 grid of patches
const result = await extractGrid(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    columns: 3,      // Number of columns
    rows: 3,         // Number of rows
    overlap: 0,      // Overlap in pixels (optional)
    overlapPercent: 0.1, // Or overlap as percentage (optional)
    includePartial: false, // Include edge patches smaller than full size
  },
  {
    colorFormat: 'rgb',
    normalization: { preset: 'scale' },
  }
);

console.log(result.patchCount); // 9
console.log(result.patchWidth);  // Width of each patch
console.log(result.patchHeight); // Height of each patch
result.patches.forEach((patch) => {
  console.log(`Patch [${patch.row},${patch.column}] at (${patch.x},${patch.y}): ${patch.width}x${patch.height}`);
  console.log(patch.data); // Pixel data for this patch
});

Options:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| columns | number | required | Number of columns in the grid |
| rows | number | required | Number of rows in the grid |
| overlap | number | 0 | Overlap between patches in pixels |
| overlapPercent | number | - | Overlap as percentage (0-1), alternative to overlap |
| includePartial | boolean | false | Include edge patches that are smaller than full size |

randomCrop(source, cropOptions, pixelOptions?)

Extract random crops from an image with optional seed for reproducibility. Essential for data augmentation in training pipelines.

import { randomCrop } from 'react-native-vision-utils';

// Extract 5 random 64x64 crops with a fixed seed
const result = await randomCrop(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    width: 64,
    height: 64,
    count: 5,
    seed: 42, // For reproducibility (optional)
  },
  {
    colorFormat: 'rgb',
    normalization: { preset: 'imagenet' },
  }
);

console.log(result.cropCount); // 5
result.crops.forEach((crop, i) => {
  console.log(`Crop ${i}: position (${crop.x},${crop.y}), size ${crop.width}x${crop.height}`);
  console.log(`Seed: ${crop.seed}`); // Seed used for this specific crop
  console.log(crop.data); // Pixel data for this crop
});

Options:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| width | number | required | Width of each crop |
| height | number | required | Height of each crop |
| count | number | 1 | Number of random crops to extract |
| seed | number | random | Seed for reproducible random positions |

validateTensor(data, shape, spec?)

Validate tensor data against a specification. Pure TypeScript function for checking tensor shapes, dtypes, and value ranges before inference.

import { validateTensor } from 'react-native-vision-utils';

const tensorData = new Float32Array(1 * 3 * 224 * 224);
// ... fill with data

const validation = validateTensor(
  tensorData,
  [1, 3, 224, 224], // Actual shape of your data
  {
    shape: [1, 3, 224, 224], // Expected shape (-1 as wildcard)
    dtype: 'float32',
    minValue: 0,
    maxValue: 1,
  }
);

if (validation.isValid) {
  console.log('Tensor is valid for inference');
} else {
  console.log('Validation issues:', validation.issues);
}

// Also provides statistics
console.log(validation.actualMin);
console.log(validation.actualMax);
console.log(validation.actualMean);
console.log(validation.actualShape);

TensorSpec Options:

| Parameter | Type | Description |
| --- | --- | --- |
| shape | number[] | Expected shape (use -1 as a wildcard for any dimension) |
| dtype | string | Expected dtype ('float32', 'int32', 'uint8', etc.) |
| minValue | number | Minimum allowed value (optional) |
| maxValue | number | Maximum allowed value (optional) |
| channels | number | Expected number of channels (optional) |

Result:

| Property | Type | Description |
| --- | --- | --- |
| isValid | boolean | Whether the tensor passes all validations |
| issues | string[] | List of validation issues found |
| actualShape | number[] | Actual shape of the tensor |
| actualMin | number | Minimum value in tensor |
| actualMax | number | Maximum value in tensor |
| actualMean | number | Mean value of tensor |

assembleBatch(pixelResults, options?)

Assemble multiple PixelDataResults into a single batch tensor. Pure TypeScript function for preparing batched inference.

import { getPixelData, assembleBatch, MODEL_PRESETS } from 'react-native-vision-utils';

// Process multiple images
const images = ['/path/to/1.jpg', '/path/to/2.jpg', '/path/to/3.jpg'];
const results = await Promise.all(
  images.map((path) =>
    getPixelData({
      source: { type: 'file', value: path },
      ...MODEL_PRESETS.mobilenet,
    })
  )
);

// Assemble into batch
const batch = assembleBatch(results, {
  layout: 'nchw', // or 'nhwc'
  padToSize: 4,   // Pad batch to size 4 (optional)
});

console.log(batch.shape);     // [4, 3, 224, 224] for NCHW (padded from 3 images)
console.log(batch.batchSize); // 4 with padToSize: 4 (3 without padding)
console.log(batch.data);      // Combined Float32Array

Options:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| layout | 'nchw' \| 'nhwc' | 'nchw' | Output batch layout |
| padToSize | number | - | Pad batch to this size with zeros (optional) |

🧮 Tensor Operations

extractChannel(data, width, height, channels, channelIndex, dataLayout)

Extract a single channel from pixel data.

import { getPixelData, extractChannel } from 'react-native-vision-utils';

const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  colorFormat: 'rgb',
});

// Extract red channel
const redChannel = await extractChannel(
  result.data,
  result.width,
  result.height,
  result.channels,
  0, // channel index (0=R, 1=G, 2=B)
  result.dataLayout
);

extractPatch(data, width, height, channels, patchOptions, dataLayout)

Extract a rectangular patch from pixel data.

import { getPixelData, extractPatch } from 'react-native-vision-utils';

const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
});

const patch = await extractPatch(
  result.data,
  result.width,
  result.height,
  result.channels,
  { x: 100, y: 100, width: 64, height: 64 },
  result.dataLayout
);

concatenateToBatch(results)

Combine multiple results into a batch tensor.

import {
  batchGetPixelData,
  concatenateToBatch,
  MODEL_PRESETS,
} from 'react-native-vision-utils';

const batch = await batchGetPixelData(
  [
    {
      source: { type: 'file', value: '/path/to/1.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
    {
      source: { type: 'file', value: '/path/to/2.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
  ],
  { concurrency: 2 }
);

const batchTensor = await concatenateToBatch(batch.results);
console.log(batchTensor.shape); // [2, 3, 224, 224]
console.log(batchTensor.batchSize); // 2

permute(data, shape, order)

Transpose/permute tensor dimensions.

import { getPixelData, permute } from 'react-native-vision-utils';

const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  dataLayout: 'hwc', // [H, W, C]
});

// Convert HWC to CHW
const permuted = await permute(
  result.data,
  result.shape,
  [2, 0, 1] // new order: C, H, W
);
console.log(permuted.shape); // [channels, height, width]

🖼️ Tensor to Image

tensorToImage(data, width, height, options)

Convert tensor data back to an image.

import {
  getPixelData,
  tensorToImage,
  MODEL_PRESETS,
} from 'react-native-vision-utils';

// Process image
const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  ...MODEL_PRESETS.mobilenet,
});

// Convert back to image (with denormalization)
const image = await tensorToImage(result.data, result.width, result.height, {
  channels: 3,
  dataLayout: 'chw',
  format: 'png',
  denormalize: true,
  mean: [0.485, 0.456, 0.406],
  std: [0.229, 0.224, 0.225],
});

console.log(image.base64); // Base64 encoded PNG

💾 Cache Management

clearCache()

Clear the internal image cache.

import { clearCache } from 'react-native-vision-utils';

await clearCache();

getCacheStats()

Get cache statistics.

import { getCacheStats } from 'react-native-vision-utils';

const stats = await getCacheStats();
console.log(stats.hitCount);
console.log(stats.missCount);
console.log(stats.size);
console.log(stats.maxSize);

🏷️ Label Database

Built-in label databases for common ML classification and detection models. No external files needed.

getLabel(index, options)

Get a label by its class index.

import { getLabel } from 'react-native-vision-utils';

// Simple lookup
const label = await getLabel(0); // Returns 'person' (COCO default)

// With metadata
const labelInfo = await getLabel(0, {
  dataset: 'coco',
  includeMetadata: true,
});
console.log(labelInfo.name); // 'person'
console.log(labelInfo.displayName); // 'Person'
console.log(labelInfo.supercategory); // 'person'

getTopLabels(scores, options)

Get top-K labels from prediction scores (e.g., from softmax output).

import { getTopLabels } from 'react-native-vision-utils';

// After running inference, get your output scores
const scores = modelOutput; // Array of 80 confidence scores for COCO

const topLabels = await getTopLabels(scores, {
  dataset: 'coco',
  k: 5,
  minConfidence: 0.1,
  includeMetadata: true,
});

topLabels.forEach((result) => {
  console.log(`${result.label}: ${(result.confidence * 100).toFixed(1)}%`);
});
// Output:
// person: 92.3%
// car: 45.2%
// dog: 23.1%

getAllLabels(dataset)

Get all labels for a dataset.

import { getAllLabels } from 'react-native-vision-utils';

const cocoLabels = await getAllLabels('coco');
console.log(cocoLabels.length); // 80
console.log(cocoLabels[0]); // 'person'

getDatasetInfo(dataset)

Get information about a dataset.

import { getDatasetInfo } from 'react-native-vision-utils';

const info = await getDatasetInfo('imagenet');
console.log(info.name); // 'imagenet'
console.log(info.numClasses); // 1000
console.log(info.description); // 'ImageNet ILSVRC 2012 classification labels'

getAvailableDatasets()

List all available label datasets.

import { getAvailableDatasets } from 'react-native-vision-utils';

const datasets = await getAvailableDatasets();
// ['coco', 'coco91', 'imagenet', 'voc', 'cifar10', 'cifar100', 'places365', 'ade20k']

Available Datasets

| Dataset | Classes | Description |
| --- | --- | --- |
| coco | 80 | COCO 2017 object detection labels |
| coco91 | 91 | COCO original labels with background |
| imagenet | 1000 | ImageNet ILSVRC 2012 classification |
| imagenet21k | 21841 | ImageNet-21K full classification |
| voc | 21 | PASCAL VOC with background |
| cifar10 | 10 | CIFAR-10 classification |
| cifar100 | 100 | CIFAR-100 classification |
| places365 | 365 | Places365 scene recognition |
| ade20k | 150 | ADE20K semantic segmentation |

📹 Camera Frame Utilities

High-performance utilities for processing camera frames directly, optimized for integration with react-native-vision-camera.

processCameraFrame(source, options)

Convert camera frame buffer to ML-ready tensor.

import { processCameraFrame } from 'react-native-vision-utils';

// In a vision-camera frame processor
const result = await processCameraFrame(
  {
    width: 1920,
    height: 1080,
    pixelFormat: 'yuv420', // Camera format
    bytesPerRow: 1920,
    dataBase64: frameDataBase64, // Base64-encoded frame data
    orientation: 'right', // Device orientation
    timestamp: Date.now(),
  },
  {
    outputWidth: 224, // Resize for model
    outputHeight: 224,
    normalize: true,
    outputFormat: 'rgb',
    mean: [0.485, 0.456, 0.406], // ImageNet normalization
    std: [0.229, 0.224, 0.225],
  }
);

console.log(result.tensor); // Normalized float array
console.log(result.shape); // [224, 224, 3]
console.log(result.processingTimeMs); // Processing time

convertYUVToRGB(options)

Direct YUV to RGB conversion for camera frames.

import { convertYUVToRGB } from 'react-native-vision-utils';

const rgbData = await convertYUVToRGB({
  width: 640,
  height: 480,
  pixelFormat: 'yuv420',
  yPlaneBase64: yPlaneData,
  uPlaneBase64: uPlaneData,
  vPlaneBase64: vPlaneData,
  outputFormat: 'rgb', // or 'base64'
});

console.log(rgbData.data); // RGB pixel values
console.log(rgbData.channels); // 3

Supported Pixel Formats

| Format | Description |
| --- | --- |
| yuv420 | YUV 4:2:0 planar |
| yuv422 | YUV 4:2:2 planar |
| nv12 | NV12 (Y plane + interleaved UV) |
| nv21 | NV21 (Y plane + interleaved VU) |
| bgra | BGRA 8-bit |
| rgba | RGBA 8-bit |
| rgb | RGB 8-bit |

Frame Orientations

| Orientation | Description |
| --- | --- |
| up | No rotation (default) |
| down | 180° rotation |
| left | 90° counter-clockwise |
| right | 90° clockwise |
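
For intuition, a common BT.601 full-range YUV to RGB mapping is sketched below; the native implementation's exact coefficients and rounding may differ slightly:

// BT.601 full-range YUV -> RGB for a single pixel (all values 0-255)
function yuvToRgb(y: number, u: number, v: number): [number, number, number] {
  const clamp = (x: number) => Math.min(255, Math.max(0, Math.round(x)));
  return [
    clamp(y + 1.402 * (v - 128)), // R
    clamp(y - 0.344136 * (u - 128) - 0.714136 * (v - 128)), // G
    clamp(y + 1.772 * (u - 128)), // B
  ];
}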

📊 Quantization (for TFLite and Other Quantized Models)

Native high-performance quantization for deploying to quantized ML models like TFLite int8.

quantize(data, options)

Quantize float data to int8/uint8/int16 format.

import {
  getPixelData,
  quantize,
  MODEL_PRESETS,
} from 'react-native-vision-utils';

// Get float pixel data
const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  ...MODEL_PRESETS.mobilenet,
});

// Per-tensor quantization (single scale/zeroPoint)
const quantized = await quantize(result.data, {
  mode: 'per-tensor',
  dtype: 'uint8',
  scale: 0.0078125, // From TFLite model
  zeroPoint: 128, // From TFLite model
});

console.log(quantized.data); // Uint8Array
console.log(quantized.scale); // 0.0078125
console.log(quantized.zeroPoint); // 128
console.log(quantized.dtype); // 'uint8'
console.log(quantized.mode); // 'per-tensor'

Per-Channel Quantization

For models with per-channel quantization (common in TFLite):

// Per-channel quantization (different scale/zeroPoint per channel)
const quantized = await quantize(result.data, {
  mode: 'per-channel',
  dtype: 'int8',
  scale: [0.0123, 0.0156, 0.0189], // Per-channel scales
  zeroPoint: [0, 0, 0], // Per-channel zero points
  channels: 3,
  dataLayout: 'chw', // Specify layout for per-channel
});

Automatic Parameter Calculation

Calculate optimal quantization parameters from your data:

import {
  calculateQuantizationParams,
  quantize,
} from 'react-native-vision-utils';

// Calculate optimal params for your data range
const params = await calculateQuantizationParams(result.data, {
  mode: 'per-tensor',
  dtype: 'uint8',
});

console.log(params.scale); // Calculated scale
console.log(params.zeroPoint); // Calculated zero point
console.log(params.min); // Data min value
console.log(params.max); // Data max value

// Use calculated params for quantization
const quantized = await quantize(result.data, {
  mode: 'per-tensor',
  dtype: 'uint8',
  scale: params.scale as number,
  zeroPoint: params.zeroPoint as number,
});

dequantize(data, options)

Convert quantized data back to float32.

import { dequantize } from 'react-native-vision-utils';

// Dequantize int8 data back to float
const dequantized = await dequantize(quantized.data, {
  mode: 'per-tensor',
  dtype: 'int8',
  scale: 0.0078125,
  zeroPoint: 0,
});

console.log(dequantized.data); // Float32Array

Quantization Options

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| mode | 'per-tensor' \| 'per-channel' | Yes | Quantization mode |
| dtype | 'int8' \| 'uint8' \| 'int16' | Yes | Output data type |
| scale | number \| number[] | Yes | Scale factor(s); array for per-channel |
| zeroPoint | number \| number[] | Yes | Zero point(s); array for per-channel |
| channels | number | Per-channel | Number of channels (required for per-channel) |
| dataLayout | 'hwc' \| 'chw' | Per-channel | Data layout (default: 'hwc') |

Quantization Formulas
Quantize:   q = round(value / scale + zeroPoint)
Dequantize: value = (q - zeroPoint) * scale
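
A worked example using the uint8 parameters from the per-tensor example above:

const scale = 0.0078125;
const zeroPoint = 128;

// Quantize: q = round(0.25 / 0.0078125 + 128) = round(32 + 128) = 160
const q = Math.round(0.25 / scale + zeroPoint);

// Dequantize: (160 - 128) * 0.0078125 = 0.25 (round-trips exactly here)
const value = (q - zeroPoint) * scale;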

📦 Bounding Box Utilities

High-performance utilities for working with object detection outputs.

convertBoxFormat(boxes, options)

Convert between bounding box formats (xyxy, xywh, cxcywh).

import { convertBoxFormat } from 'react-native-vision-utils';

// Convert YOLO format (center x, center y, width, height) to corners
const result = await convertBoxFormat(
  [[320, 240, 100, 80]], // [cx, cy, w, h]
  { fromFormat: 'cxcywh', toFormat: 'xyxy' }
);
console.log(result.boxes); // [[270, 200, 370, 280]] - [x1, y1, x2, y2]

Box Formats

| Format | Description | Values |
| --- | --- | --- |
| xyxy | Two corners | [x1, y1, x2, y2] |
| xywh | Top-left corner + dimensions | [x, y, width, height] |
| cxcywh | Center + dimensions (YOLO style) | [cx, cy, width, height] |
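
The formats relate by simple arithmetic. For reference, the conversion behind the example above (an illustration, not the native implementation):

// cxcywh -> xyxy
const cxcywhToXyxy = ([cx, cy, w, h]: number[]) =>
  [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2];

cxcywhToXyxy([320, 240, 100, 80]); // [270, 200, 370, 280]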

scaleBoxes(boxes, options)

Scale bounding boxes between different image dimensions.

import { scaleBoxes } from 'react-native-vision-utils';

// Scale boxes from 640x640 model input back to 1920x1080 original
const result = await scaleBoxes(
  [[100, 100, 200, 200]], // detections in 640x640 space
  {
    fromWidth: 640,
    fromHeight: 640,
    toWidth: 1920,
    toHeight: 1080,
    format: 'xyxy',
  }
);
// Boxes are now in original image coordinates

clipBoxes(boxes, options)

Clip bounding boxes to image boundaries.

import { clipBoxes } from 'react-native-vision-utils';

const result = await clipBoxes(
  [[-10, 50, 700, 500]], // box extends beyond image
  { width: 640, height: 480, format: 'xyxy' }
);
console.log(result.boxes); // [[0, 50, 640, 480]]

calculateIoU(box1, box2, format)

Calculate Intersection over Union between two boxes.

import { calculateIoU } from 'react-native-vision-utils';

const result = await calculateIoU(
  [100, 100, 200, 200],
  [150, 150, 250, 250],
  'xyxy'
);
console.log(result.iou); // ~0.142 (14% overlap)
console.log(result.intersection); // Area of overlap
console.log(result.union); // Total covered area
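
For intuition, IoU for xyxy boxes reduces to the following arithmetic (a reference illustration, not the native implementation):

function iouXyxy(a: number[], b: number[]): number {
  const interW = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const interH = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = interW * interH;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}

iouXyxy([100, 100, 200, 200], [150, 150, 250, 250]); // 2500 / 17500 ≈ 0.142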

nonMaxSuppression(detections, options)

Apply Non-Maximum Suppression to filter overlapping detections.

import { nonMaxSuppression } from 'react-native-vision-utils';

const detections = [
  { box: [100, 100, 200, 200], score: 0.9, classIndex: 0 },
  { box: [110, 110, 210, 210], score: 0.8, classIndex: 0 }, // overlaps
  { box: [300, 300, 400, 400], score: 0.7, classIndex: 1 },
];

const result = await nonMaxSuppression(detections, {
  iouThreshold: 0.5,
  scoreThreshold: 0.3,
  maxDetections: 100,
});
// Keeps first and third detection; second is suppressed due to overlap

📐 Letterbox Padding

YOLO-style letterbox preprocessing for preserving aspect ratio.

letterbox(source, options)

Apply letterbox padding to an image.

import { letterbox, getPixelData } from 'react-native-vision-utils';

// Letterbox a 1920x1080 image to 640x640 for YOLO
const result = await letterbox(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    targetWidth: 640,
    targetHeight: 640,
    fillColor: [114, 114, 114], // YOLO gray
  }
);

console.log(result.imageBase64); // Letterboxed image
console.log(result.letterboxInfo); // Transform info for reverse mapping
// letterboxInfo: { scale, offset, originalSize, letterboxedSize }

// Get pixel data from letterboxed image
const pixels = await getPixelData({
  source: { type: 'base64', value: result.imageBase64 },
  colorFormat: 'rgb',
  normalization: { preset: 'scale' },
  dataLayout: 'nchw',
});

reverseLetterbox(boxes, options)

Transform detection boxes back to original image coordinates.

import { letterbox, reverseLetterbox } from 'react-native-vision-utils';

// First letterbox the image
const lb = await letterbox(source, { targetWidth: 640, targetHeight: 640 });

// Run detection on letterboxed image...
const detections = [
  { box: [100, 150, 200, 250], score: 0.9 }, // in 640x640 space
];

// Reverse transform to original coordinates
const result = await reverseLetterbox(
  detections.map((d) => d.box),
  {
    scale: lb.letterboxInfo.scale,
    offset: lb.letterboxInfo.offset,
    originalSize: lb.letterboxInfo.originalSize,
    format: 'xyxy',
  }
);
// Boxes are now in original 1920x1080 coordinates

🎨 Drawing & Visualization

Utilities for visualizing detection and segmentation results.

drawBoxes(source, boxes, options)

Draw bounding boxes with labels on an image.

import { drawBoxes } from 'react-native-vision-utils';

const result = await drawBoxes(
  { type: 'base64', value: imageBase64 },
  [
    { box: [100, 100, 200, 200], label: 'person', score: 0.95, classIndex: 0 },
    { box: [300, 150, 400, 350], label: 'dog', score: 0.87, classIndex: 16 },
    { box: [500, 200, 600, 300], color: [255, 0, 0] }, // custom red color
  ],
  {
    lineWidth: 3,
    fontSize: 14,
    drawLabels: true,
    labelBackgroundAlpha: 0.7,
  }
);

// Display result.imageBase64
console.log(result.width, result.height);
console.log(result.processingTimeMs);

drawKeypoints(source, keypoints, options)

Draw pose keypoints with skeleton connections.

import { drawKeypoints } from 'react-native-vision-utils';

// COCO pose keypoints
const keypoints = [
  { x: 320, y: 100, confidence: 0.9 }, // nose
  { x: 310, y: 90, confidence: 0.85 }, // left_eye
  { x: 330, y: 90, confidence: 0.87 }, // right_eye
  // ... 17 keypoints total
];

const result = await drawKeypoints(
  { type: 'base64', value: imageBase64 },
  keypoints,
  {
    pointRadius: 5,
    minConfidence: 0.5,
    lineWidth: 2,
    skeleton: [
      { from: 0, to: 1 }, // nose to left_eye
      { from: 0, to: 2 }, // nose to right_eye
      { from: 5, to: 6 }, // left_shoulder to right_shoulder
      // ... more connections
    ],
  }
);

overlayMask(source, mask, options)

Overlay a segmentation mask on an image.

import { overlayMask } from 'react-native-vision-utils';

// Segmentation output (class indices for each pixel)
const segmentationMask = new Array(160 * 160).fill(0); // 160x160 mask
segmentationMask.fill(1, 0, 5000); // Some pixels class 1
segmentationMask.fill(15, 5000, 10000); // Some pixels class 15

const result = await overlayMask(
  { type: 'base64', value: imageBase64 },
  segmentationMask,
  {
    maskWidth: 160,
    maskHeight: 160,
    alpha: 0.5,
    isClassMask: true, // Each value is a class index
  }
);

overlayHeatmap(source, heatmap, options)

Overlay an attention/saliency heatmap on an image.

import { overlayHeatmap } from 'react-native-vision-utils';

// Attention weights from a ViT model (14x14 patches)
const attentionWeights = new Array(14 * 14).fill(0).map(() => Math.random());

const result = await overlayHeatmap(
  { type: 'base64', value: imageBase64 },
  attentionWeights,
  {
    heatmapWidth: 14,
    heatmapHeight: 14,
    alpha: 0.6,
    colorScheme: 'jet', // 'jet', 'hot', or 'viridis'
  }
);

Heatmap Color Schemes

| Scheme | Description |
| --- | --- |
| jet | Blue → Cyan → Green → Yellow → Red (default) |
| hot | Black → Red → Yellow → White |
| viridis | Purple → Blue → Green → Yellow |

📝 Type Reference

Image Source Types

type ImageSourceType = 'url' | 'file' | 'base64' | 'asset' | 'photoLibrary';

interface ImageSource {
  type: ImageSourceType;
  value: string;
}

// Examples:
{ type: 'url', value: 'https://example.com/image.jpg' }
{ type: 'file', value: '/path/to/image.jpg' }
{ type: 'base64', value: 'data:image/png;base64,...' }
{ type: 'asset', value: 'image_name' }
{ type: 'photoLibrary', value: 'identifier' }

Color Formats

type ColorFormat =
  | 'rgb' // 3 channels
  | 'rgba' // 4 channels
  | 'bgr' // 3 channels (OpenCV style)
  | 'bgra' // 4 channels
  | 'grayscale' // 1 channel
  | 'hsv' // 3 channels (Hue, Saturation, Value)
  | 'hsl' // 3 channels (Hue, Saturation, Lightness)
  | 'lab' // 3 channels (CIE LAB)
  | 'yuv' // 3 channels
  | 'ycbcr'; // 3 channels

Resize Strategies

type ResizeStrategy =
  | 'cover' // Fill target, crop excess (default)
  | 'contain' // Fit within target, pad with padColor
  | 'stretch' // Stretch to fill (may distort)
  | 'letterbox'; // Fit within target, pad with letterboxColor (YOLO-style)

interface ResizeOptions {
  width: number;
  height: number;
  strategy?: ResizeStrategy;
  padColor?: [number, number, number, number]; // RGBA for 'contain' (default: black)
  letterboxColor?: [number, number, number]; // RGB for 'letterbox' (default: [114, 114, 114])
}

Normalization

type NormalizationPreset =
  | 'scale' // [0, 1] range
  | 'imagenet' // ImageNet mean/std
  | 'tensorflow' // [-1, 1] range
  | 'raw' // No normalization (0-255)
  | 'custom'; // Custom mean/std

interface Normalization {
  preset?: NormalizationPreset;
  mean?: number[]; // Per-channel mean
  std?: number[]; // Per-channel std
}
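
For reference, the presets conventionally correspond to the following per-channel math, where v is a raw 0-255 channel value; the ImageNet constants match those used in the tensorToImage example above:

const scale = (v: number) => v / 255; // 'scale' -> [0, 1]
const tensorflow = (v: number) => v / 127.5 - 1; // 'tensorflow' -> [-1, 1]

const mean = [0.485, 0.456, 0.406];
const std = [0.229, 0.224, 0.225];
const imagenet = (v: number, c: number) => (v / 255 - mean[c]) / std[c]; // 'imagenet'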

Data Layouts

type DataLayout =
  | 'hwc' // Height × Width × Channels (default)
  | 'chw' // Channels × Height × Width (PyTorch)
  | 'nhwc' // Batch × Height × Width × Channels (TensorFlow)
  | 'nchw'; // Batch × Channels × Height × Width (PyTorch batched)
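
The layouts differ only in how a pixel at (y, x) with channel c maps to a flat index. For an H×W image with C channels:

// HWC (interleaved): all channels of one pixel are adjacent
const hwcIndex = (y: number, x: number, c: number, W: number, C: number) =>
  (y * W + x) * C + c;

// CHW (planar): each channel is a contiguous H*W plane
const chwIndex = (y: number, x: number, c: number, W: number, H: number) =>
  c * H * W + y * W + x;

// NHWC/NCHW additionally offset by n * (H * W * C) for batch item n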

Utility Functions

// Get channel count for a color format
getChannelCount('rgb'); // 3
getChannelCount('rgba'); // 4
getChannelCount('grayscale'); // 1

// Check if error is a VisionUtilsError
isVisionUtilsError(error); // boolean

⚠️ Error Handling

All functions throw VisionUtilsError on failure:

import {
  getPixelData,
  isVisionUtilsError,
  VisionUtilsErrorCode,
} from 'react-native-vision-utils';

try {
  const result = await getPixelData({
    source: { type: 'file', value: '/nonexistent.jpg' },
  });
} catch (error) {
  if (isVisionUtilsError(error)) {
    console.log(error.code); // 'LOAD_FAILED'
    console.log(error.message); // Human-readable message
  }
}

Error Codes

| Code | Description |
| --- | --- |
| INVALID_SOURCE | Invalid image source |
| LOAD_FAILED | Failed to load image |
| INVALID_ROI | Invalid region of interest |
| PROCESSING_FAILED | Processing error |
| INVALID_OPTIONS | Invalid options provided |
| INVALID_CHANNEL | Invalid channel index |
| INVALID_PATCH | Invalid patch dimensions |
| DIMENSION_MISMATCH | Tensor dimension mismatch |
| EMPTY_BATCH | Empty batch provided |
| UNKNOWN | Unknown error |

🔄 Platform Considerations

⚠️ Cross-Platform Variance

When processing identical images on iOS and Android, you may observe slight differences in pixel values (typically 5-15%). This is expected behavior due to:

| Source | Description |
| --- | --- |
| 🖼️ Image Decoders | iOS uses ImageIO and Android uses Skia; JPEG/PNG decoding differs slightly |
| 📐 Resize Algorithms | Bilinear/bicubic interpolation implementations vary between platforms |
| 🎨 Color Space Handling | sRGB gamma curves may be applied differently |
| 🔢 Floating-Point Precision | Minor rounding differences in native math operations |

Impact on ML Models:

  • ✅ Relative comparisons work identically across platforms
  • ✅ Classification/detection results are consistent
  • ✅ Threshold-based logic (value > 0, value < 1) works the same
  • ⚠️ Exact numerical equality between platforms is not guaranteed

Best Practices:

// ✅ Good - threshold-based comparison
const isValid = result.actualMin >= 0 && result.actualMax <= 1;

// ✅ Good - relative comparison
const isBlurry = blurScore < 100;

// ⚠️ Avoid - exact value comparison across platforms
const isExact = result.actualMin === 0.0431; // May differ on iOS

This variance is standard across cross-platform ML libraries (TensorFlow Lite, ONNX Runtime, PyTorch Mobile).


⚡ Performance Tips

💡 Pro Tips for optimal performance

| Tip | Description |
| --- | --- |
| 🎯 Resize Strategies | Use letterbox for YOLO, cover for classification models |
| 📦 Batch Processing | Use batchGetPixelData with appropriate concurrency for multiple images |
| 💾 Cache Management | Call clearCache() when memory is constrained |
| ⚙️ Model Presets | Pre-configured settings are optimized for each model |
| 🔄 Data Layout | Choose the right dataLayout upfront to avoid unnecessary conversions |

🤝 Contributing

We welcome contributions! Please see our guides:

| Resource | Description |
| --- | --- |
| 📋 Development workflow | How to set up your dev environment |
| 🔀 Sending a pull request | Guidelines for submitting PRs |
| 📜 Code of conduct | Community guidelines |

📄 License

MIT


Made with ❤️ using create-react-native-library
