SAMKit

Segment Anything, right on your iPhone.

Install · Quick Start · Demo App · Download Models


SAMKit brings Meta's Segment Anything Model to iOS as a native Swift package. Tap, draw, or describe any object to instantly segment it — all inference runs on-device with Core ML, no server required.

Features

  • Point & Box — Tap a point or drag a bounding box to segment any object
  • Text Prompt — Type "dog" or "red cup" to find and segment objects, powered by YOLO-World + CLIP
  • Subject Lift — Long-press to lift the segmented object from the scene, then copy, save, or share as a transparent PNG
  • Three Backbones — MobileSAM (fast, 23 MB), SAM2 Tiny (accurate, 76 MB), and FastSAM (YOLOv8-seg "segment everything", real-time)
  • Drop-in UI — Pre-built SwiftUI views for shipping a segmentation feature in minutes
  • Fully On-Device — Neural Engine / GPU acceleration, FP16, zero network calls

Requirements

  • iOS 15.0+
  • Xcode 14.0+
  • Swift 5.7+

Installation

1. Add the Swift Package

dependencies: [
    .package(url: "https://github.com/john-rocky/SamKit.git", from: "1.0.0")
]

| Product | What it does |
| --- | --- |
| SAMKit | Core segmentation engine (point / box) |
| SAMKitGrounding | Open-vocabulary text detection (YOLO-World + CLIP) |
| SAMKitUI | Ready-made SwiftUI views |
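
For a package manifest rather than an Xcode project, a target also has to list the products it uses. The sketch below assumes a hypothetical package and target named `MyApp`, and assumes SwiftPM derives the package identity `SamKit` from the repository URL. The product names are the ones in the table above.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp",                       // hypothetical package name
    platforms: [.iOS(.v15)],             // matches the iOS 15.0+ requirement
    dependencies: [
        .package(url: "https://github.com/john-rocky/SamKit.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",               // hypothetical target name
            dependencies: [
                // Assumption: the package identity is "SamKit", taken from the URL.
                .product(name: "SAMKit", package: "SamKit"),
                .product(name: "SAMKitGrounding", package: "SamKit"),
                .product(name: "SAMKitUI", package: "SamKit"),
            ]
        )
    ]
)
```

Drop `SAMKitGrounding` or `SAMKitUI` from the list if the target does not need text prompts or the pre-built views.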

2. Download Models

Grab the .mlpackage files and their accompanying .json files from Releases and drag them into your Xcode project.

MobileSAM: 23 MB (required)

| File | Size |
| --- | --- |
| mobile_sam_encoder.mlpackage | 13 MB |
| mobile_sam_decoder.mlpackage | 9.8 MB |
| mobile_sam_prompt_encoder_weights.json | 40 KB |

SAM2 Tiny: 76 MB (optional)

| File | Size |
| --- | --- |
| SAM2TinyImageEncoderFLOAT16.mlpackage | 64 MB |
| SAM2TinyPromptEncoderFLOAT16.mlpackage | 2.0 MB |
| SAM2TinyMaskDecoderFLOAT16.mlpackage | 9.8 MB |

FastSAM: 23 MB (s) / 138 MB (x) (optional)

| File | Size |
| --- | --- |
| FastSAM_s_<size>.mlpackage (size: 320, 512, or 640) | ~23 MB each |
| FastSAM_x_<size>.mlpackage | ~138 MB each |

FastSAM is a YOLOv8-seg "segment everything" model (ImageType input, exported once per resolution). Use FastSAM_s for real-time, on-device use and FastSAM_x for quality.

Grounding (YOLO-World + CLIP): 148 MB (optional)

| File | Size |
| --- | --- |
| clip_text_encoder.mlpackage | 121 MB |
| yoloworld_detector.mlpackage | 25 MB |
| clip_vocab.json | 1.6 MB |
| cv4_params.json | 4 KB |
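
A model that was not added to the app target usually only shows up as a failure at session creation. The sketch below is one way to check the bundle up front. It relies on the fact that Xcode compiles each .mlpackage into a .mlmodelc directory inside the app bundle. The helper `missingResources` and the constant `requiredMobileSamResources` are hypothetical names, and whether SAMKit looks the files up under exactly these names is an assumption based on the MobileSAM table above.

```swift
import Foundation

/// Resource names from the MobileSAM table. Assumption: the compiled models
/// keep the same base names with an .mlmodelc extension in the app bundle.
let requiredMobileSamResources: [(name: String, ext: String)] = [
    ("mobile_sam_encoder", "mlmodelc"),
    ("mobile_sam_decoder", "mlmodelc"),
    ("mobile_sam_prompt_encoder_weights", "json"),
]

/// Returns "name.ext" for every resource that cannot be found in `bundle`.
func missingResources(_ resources: [(name: String, ext: String)],
                      in bundle: Bundle = .main) -> [String] {
    resources.compactMap { resource in
        bundle.url(forResource: resource.name, withExtension: resource.ext) == nil
            ? "\(resource.name).\(resource.ext)"
            : nil
    }
}
```

Calling `missingResources(requiredMobileSamResources)` at launch and logging a non-empty result gives a clearer diagnostic than a thrown error later on.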

Quick Start

Point & Box Segmentation

import SAMKit

let session = try SamSession(
    model: .bundled(.mobileSam),
    config: .bestAvailable
)

try session.setImage(cgImage)

let result = try session.predict(
    points: [SamPoint(x: 100, y: 200, label: .positive)]
)

// Avoid force-unwrapping in case no mask is returned.
if let mask = result.masks.first {
    // Use mask.cgImage, mask.alpha, mask.score
}
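
The point prompt above uses literal coordinates. In an app, the tap usually arrives in view coordinates and has to be mapped into the image first. The sketch below assumes the image is shown with aspect-fit scaling and assumes `SamPoint` expects image pixel coordinates, which this README does not state. The function `imagePoint(forTap:viewSize:imageSize:)` is a hypothetical helper, not part of SAMKit.

```swift
import Foundation

/// Maps a tap inside a view that shows the image with aspect-fit scaling
/// to pixel coordinates in the image. Returns nil if the tap falls in the
/// letterbox area outside the displayed image.
func imagePoint(forTap tap: CGPoint, viewSize: CGSize, imageSize: CGSize) -> CGPoint? {
    guard viewSize.width > 0, viewSize.height > 0,
          imageSize.width > 0, imageSize.height > 0 else { return nil }

    // Aspect-fit uses the smaller of the two axis scales.
    let scale = min(viewSize.width / imageSize.width,
                    viewSize.height / imageSize.height)
    let displayed = CGSize(width: imageSize.width * scale,
                           height: imageSize.height * scale)
    // The displayed image is centred, leaving equal bars on the other axis.
    let origin = CGPoint(x: (viewSize.width - displayed.width) / 2,
                         y: (viewSize.height - displayed.height) / 2)

    let x = (tap.x - origin.x) / scale
    let y = (tap.y - origin.y) / scale
    guard x >= 0, y >= 0, x <= imageSize.width, y <= imageSize.height else { return nil }
    return CGPoint(x: x, y: y)
}
```

Pass the resulting `x` and `y` to `SamPoint`, converting to whatever numeric type its initializer expects.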

SAM2 Tiny

import SAMKit

let session = try Sam2Session(
    modelName: "SAM2Tiny",
    config: .bestAvailable
)

try session.setImage(cgImage)
let result = try session.predict(
    points: [SamPoint(x: 100, y: 200, label: .positive)]
)

FastSAM — Segment Everything

FastSAM is a YOLOv8-seg model: one forward pass segments every object, and a tap just selects one. The detector runs once in setImage, so taps and per-frame (real-time) use are cheap — ~30 fps on-device. Masks are assembled with a batched sgemm at proto resolution; FP16 model outputs (Float16) are bulk-converted, and an optional IoU tracker keeps colours stable on video.
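
The mask assembly step can be pictured as a single matrix product. In YOLOv8-seg style models, each detection carries a vector of mask coefficients, and the network also emits a small stack of prototype masks. Multiplying the two yields one logit map per detection. The sketch below writes that product as plain loops for readability, where the paragraph above says SAMKit uses a batched sgemm. The function name, the argument layout, and the zero threshold (equivalent to sigmoid > 0.5) are assumptions for illustration, not taken from SAMKit's source.

```swift
/// Assembles per-instance masks at prototype resolution.
/// - coefficients: n rows of k mask coefficients, row-major (n * k values).
/// - prototypes:   k prototype maps of p pixels each, row-major (k * p values).
/// Returns n masks of p booleans. A pixel is set when its logit is above 0,
/// which is the same test as sigmoid(logit) > 0.5.
func assembleMasks(coefficients: [Float], prototypes: [Float],
                   instanceCount n: Int, channelCount k: Int,
                   pixelCount p: Int) -> [[Bool]] {
    precondition(coefficients.count == n * k, "coefficients must hold n * k values")
    precondition(prototypes.count == k * p, "prototypes must hold k * p values")

    // logits = coefficients (n x k) * prototypes (k x p)
    var logits = [Float](repeating: 0, count: n * p)
    for i in 0..<n {
        for c in 0..<k {
            let weight = coefficients[i * k + c]
            for j in 0..<p {
                logits[i * p + j] += weight * prototypes[c * p + j]
            }
        }
    }
    return (0..<n).map { i in (0..<p).map { j in logits[i * p + j] > 0 } }
}
```

Because the product is computed at prototype resolution, its cost scales with the number of detections and prototype pixels rather than with the full input image.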

import SAMKit

// ImageType models, exported per input size: "FastSAM_s_320" / "_512" / "_640" (or "_x_…").
let session = try FastSamSession(modelName: "FastSAM_s_512")
session.trackColors = true                 // stable per-object colours across frames (video/live)

// Real-time: feed the camera's pixel buffer directly (no CGImage round-trip)
try session.setImage(cvPixelBuffer)
let overlay = try session.segmentEverythingMask()          // CGImage? — composited overlay

// Photos: feed a CGImage
try session.setImage(cgImage)
let instances = try session.segmentEverything()            // [FastSamSession.Instance]
let picked    = try session.segment(at: CGPoint(x: 100, y: 200))   // tap to isolate one

The same API drives real-time camera, photo tap-to-pick, and offline video segmentation; see john-rocky/CoreML-Models → FastSAMDemo.
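
The README does not describe how the optional IoU tracker behind `trackColors` matches objects between frames. As an illustration of the general idea only, the sketch below greedily matches each new bounding box to the previous frame's box with the highest IoU and reuses its colour ID when the overlap clears a threshold. `Box`, `iou`, and `ColorTracker` are hypothetical names, and the 0.5 threshold is an assumption.

```swift
struct Box {
    var x, y, width, height: Float
}

/// Intersection over union of two axis-aligned boxes.
func iou(_ a: Box, _ b: Box) -> Float {
    let overlapW = max(0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    let overlapH = max(0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    let intersection = overlapW * overlapH
    let union = a.width * a.height + b.width * b.height - intersection
    return union > 0 ? intersection / union : 0
}

struct ColorTracker {
    private var tracks: [(id: Int, box: Box)] = []
    private var nextID = 0
    var threshold: Float = 0.5   // assumed minimum IoU to count as the same object

    init() {}

    /// Returns one colour ID per input box, reusing the ID of the best-overlapping
    /// box from the previous call when its IoU is at least `threshold`.
    mutating func update(with boxes: [Box]) -> [Int] {
        var unused = tracks
        var updated: [(id: Int, box: Box)] = []
        var ids: [Int] = []
        for box in boxes {
            var bestIndex: Int? = nil
            var bestIoU = threshold
            for (index, track) in unused.enumerated() {
                let overlap = iou(track.box, box)
                if overlap >= bestIoU {
                    bestIoU = overlap
                    bestIndex = index
                }
            }
            let id: Int
            if let index = bestIndex {
                id = unused.remove(at: index).id   // each old track is reused at most once
            } else {
                id = nextID                        // unseen object: allocate a new ID
                nextID += 1
            }
            updated.append((id: id, box: box))
            ids.append(id)
        }
        tracks = updated
        return ids
    }
}
```

Mapping each ID to a fixed palette entry then keeps an object's colour stable as long as it overlaps its previous position.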

Text-Prompted Segmentation

import SAMKit
import SAMKitGrounding

let session = try TextSegmentationSession(
    groundingModel: .bundled(),
    samModel: .bundled(.mobileSam)
)

try session.setImage(cgImage)
let result = try session.segment(query: "dog, cat")
// result.masks      — segmentation masks
// result.detections — bounding boxes + labels

Subject Lifting

import SAMKit

// After segmentation, extract the object with transparency
let extracted = SamMask.extractObject(from: cgImage, masks: result.masks)
// Returns a CGImage with transparent background — ready for copy/save/share
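
To save the lifted subject as a transparent PNG, the extracted `CGImage` can be written with ImageIO. This is a minimal sketch using standard Apple APIs. `writePNG` and `PNGExportError` are hypothetical names and are not part of SAMKit.

```swift
import CoreGraphics
import Foundation
import ImageIO
import UniformTypeIdentifiers

enum PNGExportError: Error {
    case cannotCreateDestination
    case cannotFinalize
}

/// Writes `image` to `url` as a PNG. PNG keeps the alpha channel, so a
/// transparent background stays transparent in the saved file.
func writePNG(_ image: CGImage, to url: URL) throws {
    guard let destination = CGImageDestinationCreateWithURL(
        url as CFURL, UTType.png.identifier as CFString, 1, nil
    ) else {
        throw PNGExportError.cannotCreateDestination
    }
    CGImageDestinationAddImage(destination, image, nil)
    guard CGImageDestinationFinalize(destination) else {
        throw PNGExportError.cannotFinalize
    }
}
```

Pass the `extracted` image from the snippet above, unwrapping it first if the call returns an optional.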

Architecture

SAMKit/
├── runtime/apple/
│   ├── SAMKit/            # Core inference engine
│   ├── SAMKitGrounding/   # YOLO-World + CLIP text detection
│   └── SAMKitUI/          # SwiftUI components
├── models/converters/     # PyTorch -> Core ML conversion scripts
├── samples/ios-sample/    # Full demo app
└── CLAUDE.md

Sample App

git clone https://github.com/john-rocky/SamKit.git
cd SamKit
open samples/ios-sample/SAMKitDemo.xcodeproj

Download the models from Releases, add them to the project, and run on a physical device.

Model Conversion

Convert from PyTorch checkpoints yourself:

cd models/converters
pip install -r requirements.txt

# MobileSAM
python convert_to_coreml.py --model mobile_sam

# SAM2 Tiny
python convert_sam2_to_coreml.py

# YOLO-World (S/M/L/X)
python convert_yoloworld_to_coreml.py --size s

# FastSAM (s and x)
python convert_fastsam_to_coreml.py

License

Apache 2.0 — see LICENSE for details.

Acknowledgments
