Tator is a single-machine annotation workflow that pairs a clean, simple, web-based frontend with a FastAPI backend to deliver fast bounding-box annotation for images, plus optional automation: class suggestions powered by CLIP and box cleanup / auto-suggestion using Segment Anything (SAM). The UI now bundles labeling, CLIP training, and model management in one place so you can iterate on datasets without leaving the browser.
Drop any tentative label and let CLIP clean it up instantly. Tator double-checks every box and snaps it to the class with the highest confidence so you can move through image stacks at warp speed.
Rough sketches are enough—SAM reshapes your loose bounding boxes into pixel-perfect rectangles while CLIP verifies the class. It feels like spell-check for geometry.
Tap once anywhere on the object and SAM conjures a tight box while CLIP names it. Perfect for those “I just need this labeled now” marathons.
When objects are tricky, sprinkle a few positive/negative points and let SAM sculpt the exact mask. Queue up new selections immediately—no waiting for the previous mask to land.
Enable preloading to keep the next image warmed up inside SAM. You’ll see progress ticks in the side rail and enjoy near-zero latency when you start pointing.
- One-click assists – auto class, SAM box/point refinements, and multi-point prompts with live progress indicators.
- SAM 1 & SAM 2 – switch backends at runtime, optionally preload images into SAM to minimise round-trips.
- Embedded CLIP trainer – start training jobs from the UI, watch convergence metrics, and reuse cached embeddings across runs.
- Model switcher – activate new CLIP + regression pairs without restarting the server; metadata keeps backbone/labelmap in sync.
- Prometheus metrics – enable `/metrics` via `.env` for operational visibility.
- `app/`, `localinferenceapi.py` – FastAPI app, SAM/CLIP orchestration, training endpoints.
- `ybat-master/` – browser UI (`ybat.html`, CSS/JS, assets).
- `tools/` – reusable training helpers and CLI scripts.
- `uploads/`, `crops/`, `corrected_labels/` – runtime artifacts, embedding cache, and exported crops (ignored by git).
- `AGENTS.md` – contributor handbook and project conventions.
- Python 3.10 or newer (3.11+ recommended).
- Optional GPU with CUDA for faster CLIP/SAM inference.
- Model weights: `sam_vit_h_4b8939.pth` (SAM1) plus any desired SAM2 checkpoints/configs.
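If you plan on GPU inference, a quick sanity check (run inside the project virtual environment once the requirements below are installed) confirms that your Torch build can actually see CUDA. This is a generic Torch snippet, not part of Tator itself:

```python
# Generic Torch check: verifies that the installed wheel can see a CUDA device.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```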
- Create an environment
  ```bash
  python3 -m venv .venv
  source .venv/bin/activate   # Windows: .\.venv\Scripts\activate
  ```
- Install runtime deps
  ```bash
  pip install -r requirements.txt
  ```
  Torch wheels are hardware-specific; replace `torch`/`torchvision` with the build matching your CUDA/cuDNN stack if needed.
- Install dev tooling (optional)
  ```bash
  pip install -r requirements-dev.txt
  pre-commit install
  ```
- Fetch model weights
  - Place `sam_vit_h_4b8939.pth` in the repo root.
  - For SAM2, download a config + checkpoint pair (e.g. `sam2_hiera_large.yaml`, `sam2_hiera_large.pt`). Keep absolute paths handy. A download sketch for the SAM1 checkpoint appears after these setup steps.
- Configure the backend
  ```bash
  cp .env.example .env
  ```
  Update `.env`:
  ```
  LOGREG_PATH=./my_logreg_model.pkl
  LABELMAP_PATH=./my_label_list.pkl
  CLIP_MODEL_NAME=ViT-B/32
  SAM_VARIANT=sam1                     # or sam2
  SAM_CHECKPOINT_PATH=./sam_vit_h_4b8939.pth
  SAM2_CONFIG_PATH=/abs/path/to/sam2_config.yaml
  SAM2_CHECKPOINT_PATH=/abs/path/to/sam2_weights.pt
  ENABLE_METRICS=true                  # optional Prometheus
  ```
  You can also point `CLIP_EMBED_CACHE` to customise where training caches embeddings (`./uploads/clip_embeddings` by default).
- Run the API
  ```bash
  python -m uvicorn app:app --host 0.0.0.0 --port 8000
  ```
  Watch the logs for confirmation that CLIP, SAM, and the logistic regression model loaded correctly.
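If you still need the SAM1 checkpoint from the "Fetch model weights" step, the minimal sketch below downloads it into the repo root. The URL is Meta's public release location for `sam_vit_h_4b8939.pth` at the time of writing; verify it (and expect a multi-gigabyte download) before relying on it:

```python
# Sketch: download the SAM1 ViT-H checkpoint into the repo root if it is missing.
# The URL is Meta's public release location for this file; double-check before use.
import urllib.request
from pathlib import Path

URL = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"
DEST = Path("sam_vit_h_4b8939.pth")

if not DEST.exists():
    print(f"Downloading {URL} ...")
    urllib.request.urlretrieve(URL, DEST)  # large file; this can take a while
print("Checkpoint present at", DEST.resolve())
```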
- Open the UI – load `ybat-master/ybat.html` (locally renamed “Tator 🥔”) in your browser.
- Load images via the folder picker; per-image CLIP/SAM helpers live in the left rail.
- Toggle Preload SAM to stream the next image into memory; the side progress bar shows status and cancels stale tasks when you move to another image.
- Auto class, SAM box/point modes, and multi-point masks share a top progress indicator and support keyboard shortcuts documented in the panel footer.
- Choose Image folder and Label folder via native directory pickers. Only files matching YOLO expectations are enumerated.
- (Optional) Provide a labelmap so class ordering matches the labeling tab.
- Configure solver, class weights, max iterations, batch size, convergence tolerance, and hard-example mining (with adjustable weights/thresholds) plus Cache & reuse embeddings (enabled by default).
- Select an output directory; training writes `{model,labelmap,meta}.pkl` plus JSON metrics.
- Click Start Training. Progress logs stream live, including per-iteration convergence and per-class precision/recall/F1. Completed runs appear in the summary panel with download links.
Cached embeddings live under `uploads/clip_embeddings/<signature>/` and are keyed by dataset paths + CLIP backbone, independent of batch size. With cache reuse enabled, training hits the store whenever those inputs match.
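The exact signature format is internal to the trainer, but the idea is that the cache key depends only on the inputs that change the embeddings. Purely as an illustration (this is not Tator's actual implementation), such a signature could be derived like this:

```python
# Illustration only: a cache key that varies with dataset paths and CLIP backbone
# but not with training settings such as batch size or solver.
import hashlib

def cache_signature(images_path: str, labels_path: str, clip_backbone: str) -> str:
    key = "|".join((images_path, labels_path, clip_backbone))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

print(cache_signature("./images", "./labels", "ViT-B/32"))
```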
- Activate a classifier by picking its `.pkl` artifacts or by selecting a completed training run; metadata auto-selects the correct CLIP backbone and labelmap.
- Guidance text explains backbone auto-detection when a `.meta.pkl` file accompanies the classifier.
The UI shares its engine with `tools/train_clip_regression_from_YOLO.py`:
```bash
python tools/train_clip_regression_from_YOLO.py \
  --images_path ./images \
  --labels_path ./labels \
  --labelmap_path my_label_list.pkl \
  --model_output my_logreg_model.pkl \
  --labelmap_output my_label_list.pkl \
  --solver saga --max_iter 1000 --device_override cuda
```
Use `--resume-cache` to reuse embeddings and `--hard-example-mining` to emphasise frequently misclassified classes.
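To sanity-check the exported artifacts, you can classify a single crop by hand. The sketch below assumes the model pickle is a scikit-learn estimator trained on raw CLIP image embeddings and that the labelmap pickle is a plain list of class names; those are assumptions, so adjust to match your actual artifacts (for example, if the pipeline normalises embeddings):

```python
# Sketch: run one crop through CLIP and the trained classifier.
# Assumes my_logreg_model.pkl is a scikit-learn estimator and my_label_list.pkl is a
# list of class names; the real artifact layout may differ (e.g. normalised embeddings).
import pickle

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # match CLIP_MODEL_NAME in .env

with open("my_logreg_model.pkl", "rb") as f:
    classifier = pickle.load(f)
with open("my_label_list.pkl", "rb") as f:
    labelmap = pickle.load(f)

image = preprocess(Image.open("crop.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    embedding = model.encode_image(image).float().cpu().numpy()

pred = classifier.predict(embedding)[0]
# Depending on how the classifier was trained, pred may be an index or a class name.
print("Prediction:", labelmap[int(pred)] if str(pred).isdigit() else pred)
```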
- Run unit tests: `pytest`
- Static checks: `ruff check .`, `black --check .`, `mypy .`
- See `AGENTS.md` for coding conventions, PR expectations, and manual verification steps.
- Torch install errors – install the wheel that matches your platform (`pip install torch==<version>+cu118 ...`).
- SAM weights missing – confirm paths in `.env`. SAM2 requires both config and checkpoint.
- Large datasets – enable caching (default) to avoid recomputing embeddings; caches are safe to prune manually.
- Prometheus scraper fails – ensure `/metrics` is enabled and FastAPI is reachable; the endpoint now serves plaintext output compatible with Prometheus.
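As a quick reachability check (assuming the backend from the setup steps is running locally on port 8000 with `ENABLE_METRICS=true`), you can fetch the endpoint directly:

```python
# Fetch the Prometheus endpoint and print the first few plaintext metric lines.
# Assumes the API is running on localhost:8000 with ENABLE_METRICS=true.
import urllib.request

with urllib.request.urlopen("http://localhost:8000/metrics", timeout=5) as resp:
    print("Status:", resp.status)
    for line in resp.read().decode("utf-8", errors="replace").splitlines()[:5]:
        print(line)
```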
Built on top of YBAT, OpenAI CLIP, and Meta’s SAM / SAM2. Novel code is released under the MIT License (see below). GIF assets in this README showcase the Auto Class workflows.
- [planned] SAM2 implementation is not properly tested yet - it's likely there are some issues to be cleaned up!
- [planned] CLIP regression / training is in early stages - it works but it's likely we can develop some better default recipes
- [up for grabs] We should add OBB (oriented bounding box) support; it would be pretty simple in terms of UX and could really leverage SAM refinement.
- [up for grabs] Tracking / video sequence-annotation would be a cool longer-term objective.
- [planned] Docs should be improved, especially around explaining how to run the backend on a remote GPU-enabled server for bigger labeling jobs.
- [up for grabs] Clean multi-user support would be nice in the future, using a single backend with some UX / UI to deconflict and distribute work packages.
- [planned] Faster, faster! Everything should be made faster to keep the UX enjoyable.
- [up for grabs] The logic of running the training from a remote server (transferring images in base64) is untested, and most likely buggy. MRs welcome!
Copyright (c) 2025 Aircortex.com — released under the MIT License. Third-party assets retain their original licenses.