Local, offline object redaction for video, with one command.
demo.mp4
> **Note:** The demo above is sped up. Actual runtime depends on hardware: this clip was processed on an M1 Pro MacBook Pro (16 GB RAM); expect faster results on better hardware. See the Performance table for details.
## Quickstart

- Install uv (if you don't have it):

  macOS/Linux:

  ```sh
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

  Windows:

  ```powershell
  powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
  ```

- Clone and install:
  ```sh
  git clone https://github.com/ssrajadh/sentryblur.git
  cd sentryblur

  # Faces only (base)
  uv tool install .

  # Or, all detectors (faces + plates + natural-language prompt) in one command:
  uv tool install '.[plates,prompt]' --with git+https://github.com/facebookresearch/sam2.git
  ```

  `uv tool install` re-resolves on each invocation rather than merging, so multiple sequential installs would silently drop earlier extras. Use a single command with all the extras you want, plus `--with` for SAM 2 (which isn't on PyPI). To upgrade later, re-run the same command.
- Run:

  ```sh
  sentryblur faces video.mp4                            # → video_blurred.mp4
  sentryblur plates video.mp4                           # → video_blurred.mp4
  sentryblur prompt video.mp4 "road signs"              # → video_blurred.mp4
  sentryblur faces video.mp4 --preview                  # → video_preview.jpg (3x3 contact sheet)
  sentryblur prompt video.mp4 "phone screen" --preview
  ```

  ffmpeg must be on PATH (`brew install ffmpeg` on macOS, `apt install ffmpeg` on Ubuntu).
- **Local, offline.** No API keys, no cloud upload. Footage never leaves your machine. Detection runs on CPU by default; CUDA optional.
- **Preview-first.** `--preview` renders a 3x3 contact sheet of detector output across keyframes so you can verify quality before committing to a full render of sensitive footage.
- **Composes with SentrySearch.** Search footage with SentrySearch, then redact the matching clip with `sentryblur faces --last`, `sentryblur plates --last`, or `sentryblur prompt --last "..."`. `--last` picks up the most recent clip SentrySearch saved, no path-passing required.
## Installation

From a local clone. Pick one: `uv tool install` re-resolves on each invocation, so running multiple installs sequentially drops earlier extras. Always pass every extra you want in a single command.
```sh
# Faces only (~50 MB of deps)
uv tool install .

# Faces + plates
uv tool install '.[plates]'

# Faces + plates + prompt (natural-language redaction).
# SAM 2 is not on PyPI, so pass it via --with so it lands in the same uv-managed
# venv as sentryblur itself. A bare `pip install git+...sam2.git` writes to
# whatever Python is first on your PATH and won't be visible to `sentryblur`.
uv tool install '.[plates,prompt]' --with git+https://github.com/facebookresearch/sam2.git
```

Hardware note: `prompt` requires an NVIDIA GPU or Apple Silicon (16 GB+ unified memory); CPU is not supported.

To upgrade after pulling new commits, re-run the same command.
For development (editable install with test deps):

```sh
uv sync --group test
uv run pytest
```

System prerequisites:

- Python 3.11+
- `ffmpeg` on PATH
- uv (the install snippet in Quickstart gets you this)
First run of each detector downloads weights:
| Detector | Size | Cache location |
|---|---|---|
| SCRFD (faces) | ~16 MB | ~/.insightface/ |
| YOLOv9-T (plates) | ~5 MB | ~/.cache/sentryblur/ |
## Usage

```console
$ sentryblur faces video.mp4
Loading face detector...
Blurring video.mp4 -> video_blurred.mp4
Detecting [####################################] 100%
Done. 720 frames, coverage 31.5%, 18.2s (0.6x realtime)
Output: /path/to/video_blurred.mp4
```

```console
$ sentryblur plates video.mp4
Loading plate detector...
sentryblur: loading yolo-v9-t-384-license-plates-end2end (downloads ~5 MB on first run)...
Blurring video.mp4 -> video_blurred.mp4
Detecting [####################################] 100%
Done. 720 frames, coverage 84.2%, 22.7s (0.8x realtime)
```

The `prompt` subcommand provides natural-language redaction for objects outside the closed faces/plates vocabulary: phone screens, monitors, name tags, specific people. Grounding DINO finds the target boxes from your text prompt on frame 0; SAM 2's video predictor then propagates pixel-precise masks across every frame.
```console
$ sentryblur prompt video.mp4 "license plate"
This 30-second clip will take approximately 12 minutes on Apple M1 Pro / M1 Max.
Continue? [y/N]: y
Blurring video.mp4 -> video_blurred.mp4
Done. 900 frames, coverage 18.6%, 758.4s (25.3x realtime)
Output: /home/user/video_blurred.mp4
```

`--preview` skips SAM 2 entirely and runs DINO on only 9 keyframes. It is fast (~5 s after model load) and gives you a contact sheet to verify the prompt before committing to the full propagation:

```console
$ sentryblur prompt video.mp4 "phone screen" --preview
Rendering preview video.mp4 -> video_preview.jpg
Preview saved to video_preview.jpg. Review detections, then re-run without --preview to render the full video.
```

`--last` picks up the most recent clip SentrySearch saved, so you can search-then-redact without retyping paths:
```console
$ sentryblur prompt --last "license plate"
Last clip: /home/user/match_2026-04-15_14-30_02m15s-02m45s.mp4 (saved 2m ago by sentrysearch)
Process this clip? [Y/n]: y
This 30-second clip will take approximately 12 minutes on Apple M1 Pro / M1 Max.
Continue? [y/N]: y
Blurring match_2026-04-15_14-30_02m15s-02m45s.mp4 -> match_2026-04-15_14-30_02m15s-02m45s_blurred.mp4
Done. 900 frames, coverage 22.1%, 762.7s (25.4x realtime)
Output: /home/user/match_2026-04-15_14-30_02m15s-02m45s_blurred.mp4
```

Pass `-y`/`--yes` to skip both the `--last` and duration confirmation prompts (useful for scripts).
## Performance

`prompt` is dominated by per-frame inference (Grounding DINO + SAM 2 video propagation), so processing time scales with frame count, not just clip duration. A 60 fps clip takes roughly 2x longer than a 30 fps clip of the same duration.
On Apple M1 Pro / M1 Max (validated): a 10-second 30 fps clip (300 frames) processes in approximately 4 minutes.
| Hardware | Per-frame cost | 10s @ 30fps |
|---|---|---|
| NVIDIA GPU (16GB+) | ~50-150 ms | ~1-2 min |
| NVIDIA GPU | ~150-400 ms | ~1-3 min |
| Apple Silicon M2 Pro+ | ~400-800 ms | ~2-4 min |
| Apple M1 Pro / M1 Max | ~0.8 s | ~4 min (validated) |
| Apple M1 (base) | ~1.5-2.5 s | ~8-13 min |
| CPU | not supported | — |
Only the M1 Pro / M1 Max row is from direct measurement; other rows are estimates and will be revised as users contribute real numbers.
## Flags

Shared by `faces`, `plates`, and `prompt` unless noted.

| Flag | Default | Purpose |
|---|---|---|
| `INPUT_PATH` | — | Source video (positional, required unless `--last`). `prompt` also takes a positional `TEXT_PROMPT` after the video. |
| `--last` | off | Use the most recent clip saved by SentrySearch (`sentrysearch search --save-top`). Mutually exclusive with `INPUT_PATH`. |
| `-y, --yes` | off | Skip the `--last` and (for `prompt`) duration-confirmation prompts. |
| `-o, --output PATH` | `<input>_blurred.<ext>` | Output path. With `--preview`, defaults to `<input>_preview.jpg`. |
| `--dilation N` | 15 | Pixels to grow each detected box/mask. Larger = safer margin around the target. |
| `--window N` | 3 | Temporal smoothing window in frames. The mask for frame i is the union across [i-N, i+N], so a single-frame detection miss gets filled by its neighbors. |
| `--blur-mode MODE` | `pixelate` | Redaction style: `pixelate` (mosaic) or `gaussian`. Pixelate is the standard for redaction and harder to see through; gaussian can look weak on small targets. |
| `--pixel-size N` | 16 | Mosaic block size in pixels (pixelate mode only). Smaller = stronger redaction. |
| `--blur-strength N` | 51 | Gaussian kernel size (gaussian mode only). Must be odd; even values are bumped up. |
| `--conf F` | 0.25 | Detector confidence threshold. Lower = more recall, more false positives. (`faces`, `plates` only) |
| `--gpu` | off | Use CUDA for detection. Apple MPS not yet wired. (`faces`, `plates` only; `prompt` always uses GPU/MPS via torch auto-detect) |
| `--preview` | off | Render a 3x3 keyframe contact sheet with bounding boxes instead of blurring the full video. |
| `-v, --verbose` | off | Print progress (tqdm) and timing info to stderr. |
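For example, a run that trades speed for a wider safety margin (every flag here is from the table above; the values are illustrative, not recommendations):

```sh
sentryblur faces video.mp4 --dilation 25 --window 5 \
  --blur-mode gaussian --blur-strength 71 -o video_safe.mp4
```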
## Preview

Sensitive footage warrants a sanity check before committing to a long render:

```console
$ sentryblur faces important.mp4 --preview
Loading face detector...
Rendering preview important.mp4 -> important_preview.jpg
Preview saved to important_preview.jpg. Review detections, then re-run without --preview to render the full video.
```

Open the JPG, verify the boxes land where you expect, then re-run without `--preview`.
## How it works

1. **Detect.** Each frame is fed to a per-target model: SCRFD (insightface) for faces, YOLOv9-T (open-image-models) for plates. The detector returns axis-aligned boxes plus confidences. No tracking, no Re-ID, just per-frame inference.
2. **Mask and dilate.** Each box becomes a binary mask the size of the frame. Masks are dilated by `--dilation` pixels with an elliptical kernel, so the blur region extends slightly past the detected box. This catches the case where the detector's bounding box clips the edge of a face or plate.
3. **Temporal smooth.** For frame i, the final mask is the union of dilated masks across [i-window, i+window]. A single-frame detection miss, the most common and most damaging failure mode for redaction, gets filled in by its neighbors as long as either side detected. `--window 3` covers a 7-frame span (~230 ms at 30 fps), wide enough for a transient miss but narrow enough not to over-blur during fast motion.
4. **Redact and reassemble.** The masked region of each frame is replaced, by default with a pixel mosaic (`--pixel-size`), or with a Gaussian blur (`--blur-strength`) if `--blur-mode gaussian`. Pixelation is the default because Gaussian blur on small targets (faces in dashcams are often 30-60 px) tends to collapse to a flat blob that looks weak; a mosaic preserves the visual signal "this region is redacted." The unmasked region is kept untouched, and ffmpeg reassembles to H.264 at CRF 18. The output is written atomically, to a tempfile then moved into place, so a crash mid-render never overwrites your intended output with a half-finished file. (Steps 1-4 are sketched in code below.)
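A minimal sketch of that pipeline, assuming OpenCV and NumPy (illustrative only, not SentryBlur's actual source; `detect_boxes` assumes an already-initialized insightface `FaceAnalysis` app passed in as `app`):

```python
import cv2
import numpy as np

def detect_boxes(frame_bgr, app, conf=0.25):
    """Step 1 (faces): per-frame SCRFD detection via an insightface FaceAnalysis app."""
    return [f.bbox for f in app.get(frame_bgr) if f.det_score >= conf]

def boxes_to_mask(boxes, frame_shape, dilation=15):
    """Step 2: rasterize boxes into a binary mask, then grow it with an elliptical kernel."""
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        mask[int(y1):int(y2), int(x1):int(x2)] = 255
    size = 2 * dilation + 1
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (size, size))
    return cv2.dilate(mask, kernel)

def smooth(masks, window=3):
    """Step 3: the mask for frame i is the union of masks across [i-window, i+window]."""
    return [np.bitwise_or.reduce(masks[max(0, i - window):i + window + 1])
            for i in range(len(masks))]

def pixelate(frame, mask, block=16):
    """Step 4: replace masked pixels with a mosaic (downscale, then nearest-neighbor upscale)."""
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (max(1, w // block), max(1, h // block)),
                       interpolation=cv2.INTER_AREA)
    mosaic = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    out = frame.copy()
    out[mask > 0] = mosaic[mask > 0]
    return out
```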
**`prompt`.** Grounding DINO runs on frame 0 with the user's text prompt, producing bounding boxes. The boxes are fed to SAM 2's video predictor, which propagates pixel-precise masks across all frames using its built-in tracking. Masks then go through the same dilation and temporal smoothing as faces/plates before blur and reassembly. SAM 2 is invoked once per video (not per frame) and uses CPU offload for the video and state buffers; without these flags, MPS unified memory is exceeded on clips longer than ~20 seconds.
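In terms of the two projects' public APIs, the flow looks roughly like this (a hedged sketch, not SentryBlur's code; the checkpoint/config paths, thresholds, and the extracted-frames directory are assumptions):

```python
import torch
from PIL import Image
from transformers import AutoModelForZeroShotObjectDetection, AutoProcessor
from sam2.build_sam import build_sam2_video_predictor

device = "mps" if torch.backends.mps.is_available() else "cuda"

# Grounding DINO on frame 0: text prompt -> boxes.
processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
dino = AutoModelForZeroShotObjectDetection.from_pretrained(
    "IDEA-Research/grounding-dino-tiny").to(device)
frame0 = Image.open("frames/00000.jpg")  # hypothetical extracted first frame
inputs = processor(images=frame0, text="phone screen.", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = dino(**inputs)
boxes = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids, target_sizes=[frame0.size[::-1]])[0]["boxes"]

# SAM 2 video predictor: seed with the frame-0 boxes, propagate across all frames.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml", "sam2.1_hiera_tiny.pt", device=device)
state = predictor.init_state(
    video_path="frames",          # directory of extracted JPEG frames
    offload_video_to_cpu=True,    # the CPU-offload flags described above;
    offload_state_to_cpu=True)    # without them MPS runs out of unified memory
for obj_id, box in enumerate(boxes):
    predictor.add_new_points_or_box(state, frame_idx=0, obj_id=obj_id,
                                    box=box.cpu().numpy())
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    masks = (mask_logits > 0.0).cpu().numpy()  # per-object masks for this frame
```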
## Limitations

This section is honest, not aspirational. Read it before trusting SentryBlur with anything sensitive.

- **Detection misses** small or distant targets, faces in profile, plates in low light, and partially occluded objects. Temporal smoothing and dilation reduce the miss rate but do not eliminate it. Always run `--preview` on sensitive footage and visually verify before committing to the full render.
- **Audio is not redacted.** Voices stating license plate numbers, addresses, or names will pass through untouched. SentryBlur is a video-frame redaction tool. Use a separate audio editor if your footage has identifying speech.
- **One missed frame defeats the purpose.** A leak of even a single unblurred frame in a published clip can be screenshotted and zoomed. SentryBlur reduces miss probability but cannot guarantee zero misses on arbitrary footage. For high-stakes redaction, watch the output at 0.25× speed before publishing.
- **Closed vocabulary.** The `faces` and `plates` detectors only know their respective targets. Other identifying objects (visible monitor contents, name tags, building signage) need the `prompt` subcommand or manual masking.
- **Frame-0 dependency.** `prompt` depends on Grounding DINO finding the target in the first frame of the clip. If the target is occluded, off-screen, or too small/distant in frame 0, the run fails immediately with `Could not find <prompt>`. Use a clip where the target is visible at the start, or trim with `ffmpeg` first.
- **GPU/Apple Silicon only.** `prompt` requires CUDA or MPS; CPU is rejected up front. `faces` and `plates` still run on CPU.
- **Slow.** `prompt` is one to two orders of magnitude slower than `faces`/`plates` (see the Performance table). Suitable for one-off tasks, not batch workflows.
## Roadmap

- **Faster `prompt` mode.** Per-N-frame DINO detection plus a lightweight tracker between detections, instead of full SAM 2 propagation on every frame. Should bring `prompt` closer to `faces`/`plates` throughput on long clips. (A rough sketch follows below.)
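Purely illustrative, not implemented: one possible shape for that mode, using an off-the-shelf OpenCV tracker between detector calls (`TrackerCSRT` requires opencv-contrib; `detect` is a hypothetical callable wrapping a detector such as Grounding DINO):

```python
import cv2

def track_between_detections(frames, detect, every_n=30):
    """Re-detect every N frames; use a cheap tracker in between.

    `detect` returns an (x, y, w, h) box for the target in a frame.
    Hypothetical sketch of the roadmap item, not SentryBlur code.
    """
    tracker, boxes = None, []
    for i, frame in enumerate(frames):
        if tracker is None or i % every_n == 0:
            box = detect(frame)                  # expensive, every N frames only
            tracker = cv2.TrackerCSRT_create()
            tracker.init(frame, tuple(map(int, box)))
        else:
            ok, box = tracker.update(frame)      # cheap per-frame tracking
            if not ok:
                box = boxes[-1]                  # hold the last box on tracker loss
        boxes.append(tuple(box))
    return boxes
```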
## Acknowledgements

- insightface — SCRFD face detector.
- open-image-models — MIT-licensed YOLOv9-T license plate detector by ankandrew. Chosen specifically because the weights are MIT-licensed (most YOLOv8 plate weights on HuggingFace inherit Ultralytics's AGPL).
- ffmpeg — frame extraction and video reassembly.
- Grounding DINO — IDEA-Research, accessed via HuggingFace `transformers` (`IDEA-Research/grounding-dino-tiny`).
- SAM 2 — Meta AI Research.