A sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — point it at an image, type what you want to find in plain language, and get bounding boxes back. Object detection, phrase grounding, OCR/text localization, document layout, GUI element grounding, and pointing — all from one prompt box.
Run it on your own NVIDIA GPU with a single docker compose up.
Important
Model license is non-commercial. LocateAnything-3B is released under the NVIDIA license for academic / research / non-commercial use only. This UI is a convenience wrapper — using it does not grant any commercial rights to the model. The UI code itself is Apache-2.0 (see LICENSE).
- One prompt, many tasks — detection, phrase grounding, text detection, document layout, GUI grounding, pointing.
- Speed/quality toggle — `fast` / `hybrid` / `slow` Parallel Box Decoding.
- Search history — every search (image + prompt + results) is saved and re-runnable.
- Mobile-first — works on your phone, including the camera.
- GPU preflight — tells you up front whether your card is supported.

Screenshots:
- Home: upload, prompt, task presets, decode mode.
- Detection: labeled reticle boxes show what was found.
- System: GPU/health readout, GPU picker, and the configurable backend URL.

Demo video (detection-final.mp4): detecting chickens in a live scene, from prompt to reticle boxes.

- An NVIDIA GPU: Ampere / Lovelace / Hopper / Blackwell (RTX 30 / 40 / 50-series, A100, H100). ~12GB+ VRAM recommended. (The image ships CUDA 12.8 PyTorch with native kernels through Blackwell `sm_120`, so 50-series cards work out of the box; pre-Ampere cards aren't supported by the model.)
- Docker with the NVIDIA Container Toolkit (`--gpus all`). Works on native Linux, WSL2, and Windows via Docker Desktop's WSL2 backend.
- Linux / WSL2 host (the model is Linux + CUDA + BF16 only).

Rule of thumb: any NVIDIA GPU with CUDA compute capability ≥ 8.0 (Ampere or newer). The model uses BF16, which rules out pre-Ampere cards. ~12GB+ VRAM is recommended — 8GB cards can work but may run out of memory on large images.
| Architecture | Compute | Consumer (GeForce) | Workstation / Data center |
|---|---|---|---|
| Blackwell | 10.0 / 12.0 | RTX 5090, 5080, 5070 Ti, 5070, 5060 Ti, 5060 | RTX PRO 6000 Blackwell, B100, B200 |
| Hopper | 9.0 | — | H100, H200, GH200 |
| Ada Lovelace | 8.9 | RTX 4090, 4080 (Super), 4070 Ti/Super, 4070 (Super), 4060 Ti, 4060 | RTX 6000 Ada, L40/L40S, L4 |
| Ampere | 8.0 / 8.6 | RTX 3090 Ti, 3090, 3080 Ti, 3080, 3070 Ti, 3070, 3060 Ti, 3060 (12GB) | A100, A40, A30, A10, RTX A6000/A5000/A4000 |
Laptop/mobile and "Super"/"Ti" variants of the above are also fine. Not supported: GTX 10-series (Pascal), RTX 20-series / GTX 16-series (Turing) — no BF16. Run bash scripts/check-gpu.sh to confirm your card.
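
The rule of thumb above can also be checked by hand. The following is a minimal sketch and not the contents of scripts/check-gpu.sh: `cc_supported` is a hypothetical helper, and the `compute_cap` query field assumes a reasonably recent NVIDIA driver.

```shell
# Hypothetical helper (not part of this repo): succeeds when a CUDA compute
# capability string such as "8.6" meets the >= 8.0 (Ampere or newer) rule.
cc_supported() {
  cc_major="${1%%.*}"                  # "8.6" -> "8"
  [ "$cc_major" -ge 8 ] 2>/dev/null    # non-numeric input counts as unsupported
}

# Report every visible GPU. Assumes nvidia-smi is on PATH and new enough to
# support the compute_cap query field; does nothing when nvidia-smi is missing.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap,memory.total --format=csv,noheader |
  while IFS=, read -r gpu_name gpu_cc gpu_mem; do
    gpu_cc="${gpu_cc# }"               # csv output pads fields with a space
    if cc_supported "$gpu_cc"; then
      echo "supported:   $gpu_name (compute $gpu_cc,$gpu_mem)"
    else
      echo "unsupported: $gpu_name (compute $gpu_cc)"
    fi
  done
fi
```

The check only looks at the major version, which is enough for the 8.0 threshold; it says nothing about VRAM, so an 8GB card still passes even though large images may run out of memory.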
On Windows, first install Docker Desktop (choose the WSL 2 engine and enable WSL Integration). Optionally confirm Docker can see your GPU:
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi

Then, in an empty folder, grab the end-user compose file and start it.
Linux / macOS / WSL:
mkdir la-app && cd la-app
curl -O https://raw.githubusercontent.com/gammahazard/locate-anything/main/docker-compose.ghcr.yml
mv docker-compose.ghcr.yml docker-compose.yml
docker compose pull
docker compose up

Windows (PowerShell):
mkdir la-app; cd la-app
curl.exe -O https://raw.githubusercontent.com/gammahazard/locate-anything/main/docker-compose.ghcr.yml
Move-Item -Force docker-compose.ghcr.yml docker-compose.yml
docker compose pull
docker compose up

Open http://localhost:9080. On the first run the backend downloads the ~6GB model into a cached volume; the UI shows "loading model…" until it's ready (a minute or two), and later runs start in seconds. Pin a version with LA_VERSION=0.1.1 in a .env file (the image tag has no leading v); omit it for latest.
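
Pinning can be scripted. A minimal sketch, assuming the .env file sits next to docker-compose.yml; `pin_la_version` is a hypothetical helper, not something this project ships. It accepts a release name with or without the leading v and stores the bare number.

```shell
# Hypothetical helper (not part of this repo): write LA_VERSION to a .env file.
# Accepts "v0.1.1" or "0.1.1" and stores the bare number, since the image tag
# has no leading "v".
pin_la_version() {
  pin_version="${1#v}"                 # drop a single leading "v", if any
  pin_env_file="${2:-.env}"            # default to .env in the current folder
  touch "$pin_env_file"
  # Keep every other setting; replace any existing LA_VERSION line.
  grep -v '^LA_VERSION=' "$pin_env_file" > "$pin_env_file.tmp" || true
  printf 'LA_VERSION=%s\n' "$pin_version" >> "$pin_env_file.tmp"
  mv "$pin_env_file.tmp" "$pin_env_file"
}
```

After `pin_la_version v0.1.1`, run `docker compose pull` and `docker compose up` so the pinned images are fetched and started.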

Build from source:
git clone git@github.com:gammahazard/locate-anything.git
cd locate-anything
cp .env.example .env
docker compose up # builds the images locally

Then open http://localhost:9080. The first run downloads the ~6GB model into a cached volume; later runs are fast.
Not sure your card is supported? Run the preflight check first:
bash scripts/check-gpu.sh

Only the web UI port (9080) needs to be free; the backend isn't published by default. If 9080 is taken, either set FRONTEND_PORT in .env, or let the helper pick the next free port automatically:
scripts/run.sh # finds a free UI port and prints the URL
scripts/run.sh --ghcr # prebuilt images · --mock for no-GPU

Mock mode needs no GPU and no model download; it returns deterministic fake boxes so you can explore the whole UI on any machine:
docker compose -f docker-compose.yml -f docker-compose.mock.yml up

No compatible local card, or on a Mac? Run the stack on any Linux GPU box and expose the backend port, then open the UI anywhere and set the backend URL under System → backend url. Your phone or laptop then drives a remote GPU over HTTP:
docker compose -f docker-compose.yml -f docker-compose.expose-backend.yml up

Already running it? Don't re-create the folder or re-download the compose file; just pull the new release and restart from the same folder you first ran it in (that folder name is your Compose project, which keeps your cached model and history).
Prebuilt images (GHCR):
docker compose pull # fetch the new release
docker compose up # recreate containers with the new images

Built from source:
git pull
docker compose up --build

Then hard-refresh your browser (Ctrl/Cmd+Shift+R) so it loads the new UI.

- `docker compose pull` is required; `up` alone keeps the old cached `latest` image.
- Running from a different folder creates a new Compose project with empty volumes, so it re-downloads the ~6GB model. Use the same folder to reuse the cache.
- Pinned a version with `LA_VERSION`? Bump it (e.g. `LA_VERSION=0.1.3`) or remove it to track `latest`.

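
To see why the folder matters, it can help to work out which Compose project a folder maps to. The sketch below is an approximation of Compose's default naming (lowercased folder name, with characters outside a-z, 0-9, `_` and `-` removed), assuming the compose file does not set its own `name:`. `compose_project_name` is a hypothetical helper, not part of this repo.

```shell
# Hypothetical helper: approximate the default Compose project name for a
# folder (lowercased basename, keeping only a-z, 0-9, "_" and "-").
compose_project_name() {
  basename "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9_-'
}

# Example (left commented out so the snippet has no side effects):
# list the volumes that belong to the current folder's project.
# docker volume ls --filter "label=com.docker.compose.project=$(compose_project_name "$PWD")"
```

If the volumes listed for the old folder's project hold the cached model, running from a new folder with `docker compose -p <old-project-name> up` (or setting `COMPOSE_PROJECT_NAME`) should reuse them instead of downloading again.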
See CONTRIBUTING.md for local dev (without Docker), commit conventions, and the test suite.
Built on NVIDIA's LocateAnything-3B. This project is not affiliated with or endorsed by NVIDIA.


