Segment Anything, right on your iPhone.
SAMKit brings Meta's Segment Anything Model to iOS as a native Swift package. Tap, draw, or describe any object to instantly segment it — all inference runs on-device with Core ML, no server required.
- Point & Box — Tap a point or drag a bounding box to segment any object
- Text Prompt — Type "dog" or "red cup" to find and segment objects, powered by YOLO-World + CLIP
- Subject Lift — Long-press to lift the segmented object from the scene, then copy, save, or share as a transparent PNG
- Three Backbones — MobileSAM (fast, 23 MB), SAM2 Tiny (accurate, 76 MB), and FastSAM (YOLOv8-seg "segment everything", real-time)
- Drop-in UI — Pre-built SwiftUI views for shipping a segmentation feature in minutes
- Fully On-Device — Neural Engine / GPU acceleration, FP16, zero network calls
- iOS 15.0+
- Xcode 14.0+
- Swift 5.7+
```swift
dependencies: [
    .package(url: "https://github.com/john-rocky/SamKit.git", from: "1.0.0")
]
```

| Product | What it does |
|---|---|
| `SAMKit` | Core segmentation engine (point / box) |
| `SAMKitGrounding` | Open-vocabulary text detection (YOLO-World + CLIP) |
| `SAMKitUI` | Ready-made SwiftUI views |

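If you consume the products above from your own Swift package rather than an Xcode app target, a manifest might look like the fragment below. This is a sketch under assumptions: `MyApp` is a placeholder name, the `package:` argument assumes SwiftPM resolves `"SamKit"` to the identity it derives from the repository URL, and you would list only the products you actually use.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp", // placeholder package name
    platforms: [.iOS(.v15)],
    dependencies: [
        .package(url: "https://github.com/john-rocky/SamKit.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp", // placeholder target name
            dependencies: [
                .product(name: "SAMKit", package: "SamKit"),          // point / box
                .product(name: "SAMKitGrounding", package: "SamKit"), // text prompts
                .product(name: "SAMKitUI", package: "SamKit"),        // SwiftUI views
            ]
        )
    ]
)
```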
Grab the .mlpackage files (and the accompanying .json files) from Releases and drag them into your Xcode project.
MobileSAM — 23 MB (required)

| File | Size |
|---|---|
| `mobile_sam_encoder.mlpackage` | 13 MB |
| `mobile_sam_decoder.mlpackage` | 9.8 MB |
| `mobile_sam_prompt_encoder_weights.json` | 40 KB |

SAM2 Tiny — 76 MB (optional)

| File | Size |
|---|---|
| `SAM2TinyImageEncoderFLOAT16.mlpackage` | 64 MB |
| `SAM2TinyPromptEncoderFLOAT16.mlpackage` | 2.0 MB |
| `SAM2TinyMaskDecoderFLOAT16.mlpackage` | 9.8 MB |

FastSAM — 23 MB (s) / 138 MB (x) (optional)

| File | Size |
|---|---|
| `FastSAM_s_<320\|512\|640>.mlpackage` | ~23 MB each |
| `FastSAM_x_<size>.mlpackage` | ~138 MB each |

YOLOv8-seg "segment everything" models with ImageType input, exported once per input resolution. Use FastSAM_s for real-time, on-device use and FastSAM_x when quality matters more.
Grounding (YOLO-World + CLIP) — 148 MB (optional)

| File | Size |
|---|---|
| `clip_text_encoder.mlpackage` | 121 MB |
| `yoloworld_detector.mlpackage` | 25 MB |
| `clip_vocab.json` | 1.6 MB |
| `cv4_params.json` | 4 KB |

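After you drag the files in, Xcode compiles each .mlpackage into a .mlmodelc directory inside the app bundle at build time, while JSON files are copied unchanged. A small pre-flight check can turn a missing file into a clear message before any session is created. This is a hypothetical helper, not part of SAMKit, and it assumes the models are added to the app target (the main bundle).

```swift
import Foundation

/// Hypothetical helper (not part of SAMKit): lists the model resources that are
/// missing from `bundle`. Xcode compiles each .mlpackage into a .mlmodelc
/// directory at build time, so compiled models are looked up by that extension;
/// JSON side files are copied unchanged and keep their .json extension.
func missingModelResources(
    models: [String],
    jsonFiles: [String],
    in bundle: Bundle = .main
) -> [String] {
    let missingModels = models
        .filter { bundle.url(forResource: $0, withExtension: "mlmodelc") == nil }
        .map { $0 + ".mlmodelc" }
    let missingJSON = jsonFiles
        .filter { bundle.url(forResource: $0, withExtension: "json") == nil }
        .map { $0 + ".json" }
    return missingModels + missingJSON
}

// Required MobileSAM resources, named as in the MobileSAM table.
let missing = missingModelResources(
    models: ["mobile_sam_encoder", "mobile_sam_decoder"],
    jsonFiles: ["mobile_sam_prompt_encoder_weights"]
)
if !missing.isEmpty {
    print("Add these to the app target: \(missing.joined(separator: ", "))")
}
```

The same call works for the optional sets by passing their file names (without extensions) from the corresponding tables.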
Point prompt with MobileSAM (`cgImage` is the image you want to segment):

```swift
import SAMKit

let session = try SamSession(
    model: .bundled(.mobileSam),
    config: .bestAvailable
)
try session.setImage(cgImage)
let result = try session.predict(
    points: [SamPoint(x: 100, y: 200, label: .positive)]
)
let mask = result.masks.first! // .cgImage, .alpha, .score
```

SAM2 Tiny uses the same prompt API through `Sam2Session`:

```swift
import SAMKit

let session = try Sam2Session(
    modelName: "SAM2Tiny",
    config: .bestAvailable
)
try session.setImage(cgImage)
let result = try session.predict(
    points: [SamPoint(x: 100, y: 200, label: .positive)]
)
```

FastSAM is a YOLOv8-seg model: a single forward pass segments every object, and a tap just selects one of them. The detector runs once in `setImage`, so taps and per-frame (real-time) use are cheap, at roughly 30 fps on-device. Masks are assembled with a batched sgemm at prototype resolution; FP16 model outputs (Float16) are bulk-converted to Float32, and an optional IoU tracker keeps colours stable on video.
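The mask assembly described above follows the usual YOLOv8-seg recipe: each detection carries K mask coefficients, the model also emits K prototype masks, and an instance mask is sigmoid(coefficients · prototypes). Stacking all N detections turns this into a single (N×K)·(K×H·W) matrix product, which is what a batched sgemm computes. The sketch below is not SAMKit's code; `assembleMasks` is a hypothetical name, and a naive loop stands in for the BLAS call.

```swift
import Foundation

/// Hypothetical sketch of YOLOv8-seg mask assembly (not SAMKit's implementation).
/// - coefficients: N×K, row-major; row i holds the K mask coefficients of detection i.
/// - prototypes:   K×P, row-major; row j is prototype mask j flattened to P = H*W pixels.
/// Returns N binary masks of P pixels each, at prototype resolution.
func assembleMasks(
    coefficients: [Float], count n: Int,
    prototypes: [Float], protoCount k: Int, pixels p: Int,
    threshold: Double = 0.5
) -> [[Bool]] {
    precondition(coefficients.count == n * k, "coefficients must be N×K")
    precondition(prototypes.count == k * p, "prototypes must be K×P")

    // logits = coefficients (N×K) · prototypes (K×P): one matrix product covers
    // every instance at once. This naive loop stands in for a BLAS sgemm call.
    var logits = [Float](repeating: 0, count: n * p)
    for i in 0..<n {
        for j in 0..<k {
            let c = coefficients[i * k + j]
            for x in 0..<p {
                logits[i * p + x] += c * prototypes[j * p + x]
            }
        }
    }

    // Per-pixel sigmoid, then binarize against the threshold.
    return (0..<n).map { i in
        (0..<p).map { x in
            1 / (1 + exp(-Double(logits[i * p + x]))) > threshold
        }
    }
}
```

Because sigmoid(x) > 0.5 exactly when x > 0, a production version can skip the exponential and threshold the logits directly. Cropping each mask to its box and upscaling from prototype resolution (typically a quarter of the input size in YOLOv8-seg) are left out here.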
```swift
import SAMKit

// ImageType models, exported per input size: "FastSAM_s_320" / "_512" / "_640" (or "_x_…").
let session = try FastSamSession(modelName: "FastSAM_s_512")
session.trackColors = true // stable per-object colours across frames (video/live)

// Real-time: feed the camera's pixel buffer directly (no CGImage round-trip)
try session.setImage(cvPixelBuffer)
let overlay = try session.segmentEverythingMask() // CGImage? (composited overlay)

// Photos: feed a CGImage
try session.setImage(cgImage)
let instances = try session.segmentEverything() // [FastSamSession.Instance]
let picked = try session.segment(at: CGPoint(x: 100, y: 200)) // tap to isolate one
```

For a demo that drives real-time camera, photo tap-to-pick, and offline video segmentation, see john-rocky/CoreML-Models → FastSAMDemo.
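For the video path, colour stability comes from the optional IoU tracker enabled by `trackColors`. The README does not say how it matches objects, so the following is only a minimal sketch of the common greedy approach using box IoU: each detection inherits the colour index of the best-overlapping box from the previous frame, and unmatched detections get new indices. `Box`, `iou`, and `ColorTracker` are hypothetical names, and the 0.3 threshold is an assumption.

```swift
import Foundation

/// Axis-aligned box in pixel coordinates (hypothetical type for this sketch).
struct Box {
    var minX, minY, maxX, maxY: Double
    var area: Double { max(0, maxX - minX) * max(0, maxY - minY) }
}

/// Intersection over union of two boxes (0 when they do not overlap).
func iou(_ a: Box, _ b: Box) -> Double {
    let overlap = Box(minX: max(a.minX, b.minX), minY: max(a.minY, b.minY),
                      maxX: min(a.maxX, b.maxX), maxY: min(a.maxY, b.maxY))
    let inter = overlap.area
    let union = a.area + b.area - inter
    return union > 0 ? inter / union : 0
}

/// Greedy IoU tracker (hypothetical sketch, not SAMKit's implementation).
final class ColorTracker {
    private var previous: [(box: Box, color: Int)] = []
    private var nextColor = 0
    let threshold: Double

    init(threshold: Double = 0.3) { self.threshold = threshold }

    /// Returns one colour index per box, reusing indices from the previous frame.
    func assignColors(_ boxes: [Box]) -> [Int] {
        // Every (detection, previous box) pair above the threshold, best first.
        var pairs: [(det: Int, track: Int, score: Double)] = []
        for (d, box) in boxes.enumerated() {
            for (t, track) in previous.enumerated() {
                let score = iou(box, track.box)
                if score >= threshold { pairs.append((det: d, track: t, score: score)) }
            }
        }
        pairs.sort { $0.score > $1.score }

        // One-to-one matching: the highest-IoU pair claims its colour first.
        var colors = [Int?](repeating: nil, count: boxes.count)
        var usedTracks = Set<Int>()
        for pair in pairs where colors[pair.det] == nil && !usedTracks.contains(pair.track) {
            colors[pair.det] = previous[pair.track].color
            usedTracks.insert(pair.track)
        }

        // Unmatched detections are treated as new objects with fresh indices.
        var result: [Int] = []
        for color in colors {
            if let color {
                result.append(color)
            } else {
                result.append(nextColor)
                nextColor += 1
            }
        }
        previous = zip(boxes, result).map { pair in (box: pair.0, color: pair.1) }
        return result
    }
}
```

Call `assignColors` once per frame with that frame's boxes and use the returned indices to pick from a fixed palette. Matching on mask IoU instead of box IoU is a drop-in change to the `iou` function if boxes of neighbouring objects overlap heavily.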
Text prompts with SAMKitGrounding:

```swift
import SAMKit
import SAMKitGrounding

let session = try TextSegmentationSession(
    groundingModel: .bundled(),
    samModel: .bundled(.mobileSam)
)
try session.setImage(cgImage)
let result = try session.segment(query: "dog, cat")
// result.masks: segmentation masks
// result.detections: bounding boxes + labels
```

Subject lift:

```swift
import SAMKit

// After segmentation, extract the object with transparency
let extracted = SamMask.extractObject(from: cgImage, masks: result.masks)
// Returns a CGImage with a transparent background, ready for copy/save/share
```

Project layout:

```
SAMKit/
├── runtime/apple/
│   ├── SAMKit/           # Core inference engine
│   ├── SAMKitGrounding/  # YOLO-World + CLIP text detection
│   └── SAMKitUI/         # SwiftUI components
├── models/converters/    # PyTorch -> Core ML conversion scripts
├── samples/ios-sample/   # Full demo app
└── CLAUDE.md
```
```bash
git clone https://github.com/john-rocky/SamKit.git
cd SamKit
open samples/ios-sample/SAMKitDemo.xcodeproj
```

Download the models from Releases, add them to the project, and run on a physical device.
Convert from PyTorch checkpoints yourself:
```bash
cd models/converters
pip install -r requirements.txt

# MobileSAM
python convert_to_coreml.py --model mobile_sam

# SAM2 Tiny
python convert_sam2_to_coreml.py

# YOLO-World (S/M/L/X)
python convert_yoloworld_to_coreml.py --size s

# FastSAM (s and x)
python convert_fastsam_to_coreml.py
```

Apache 2.0. See LICENSE for details.
- Segment Anything & SAM 2 — Meta AI
- MobileSAM — Chaoning Zhang et al.
- YOLO-World — Tencent AILab
- OpenAI CLIP

