segment-3gs

A Gaussian-splat segmentation playground, powered by SpAItial AI worlds

Generate a 3D world on SpAItial AI, view its Gaussian splat in the browser, segment objects in it on demand with SAM3, lift the 2D masks onto the 3D Gaussians, then recolor or isolate those Gaussians: "segment once, reuse forever".

SpAItial AI is a world-model company: describe a scene or upload an image and it generates a full 3D world you can explore and export. This demo showcases what you can build on top of those worlds: it loads a .ply Gaussian splat exported straight from SpAItial AI.

This is a starting point to fork, not a product. It wires up a PlayCanvas splat viewer, a local Python SAM server, and a full 2D→3D lift + multi-view voting + true gsplat recolor pipeline so you can build your own splat-understanding tools on top.

What this is

The scene is static, so segmentation is "segment once, reuse forever" rather than per-frame. You point the camera (or run a batch sweep), the server runs a single SAM forward on the rendered frame, and the client lifts each 2D mask onto the splats it covers and stores the per-Gaussian index set. From then on the object lives in 3D — selectable, re-framable, recolorable — with no further inference.

See **[docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md)** for the full write-up: component + sequence + lift + batch-sweep + recolor + coordinate diagrams, the /segment API contract, the 3D lift algorithm, per-Gaussian voting merge, camera/coordinate spaces, persistence, and model tiers.

Quickstart

Prerequisites

Node 18+ (recent LTS) and npm.
Python 3.11+ for the segmentation server (only needed for real masks — the viewer runs against a built-in mock without it).
macOS Apple Silicon (MPS) is the reference target; CUDA and CPU also work. SAM3 takes roughly 13 s per call on MPS, so segmentation here is on-demand, not real-time.
A Hugging Face account with SAM3 license acceptance if you want the gated open-vocab sam3 tier (see below). The mock and mobile_sam tiers need no HF account.

1. Install

npm install

2. Generate your world with SpAItial AI

The repo intentionally ships without any splat assets (they're heavy and gitignored). The recommended way to get one is to generate a world with SpAItial AI and export it:

Create a world at app.spaitial.ai — describe a scene or upload an image, and SpAItial AI generates a full 3D world (generation takes a few minutes).
Download the world as a .ply Gaussian splat — the app exports a ready-to-use, PlayCanvas-compatible .ply directly (no conversion needed) — and drop it at public/world.ply. The viewer derives the pivot, initial zoom, and spatial-grid bounds from the splat centers at runtime — no per-scene config or manifest.
Export the world's simplified collision mesh (optional) and drop it at public/mesh_simplified.ply. It is used only at startup to ground the initial camera on the floor. If it's missing, the app still boots and grounds the spawn on the splat bounds instead.
The splat path lives in one place — SPLAT_URL at the top of [src/main.ts](./src/main.ts) — change it for a different filename.

So the full workflow is: create a world on SpAItial AI → export the splat .ply → optionally export the collision mesh → drop the files into public/.

Doing it programmatically? The SpAItial developer API (POST /v1/worlds) creates worlds from text or image input; download the splat from /v1/worlds/requests/:id/splat and start the simplified-mesh export with POST /v1/worlds/requests/:id/exports/mesh-simplified. See About SpAItial AI.
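
As a minimal sketch of the programmatic path, the helper below only assembles the three endpoint URLs named above. The base URL and the request id are placeholders (assumptions, not documented values), and authentication, request bodies, and response shapes are deliberately left out because they are not described here; check the SpAItial developer API documentation for those.

```typescript
// Paths are taken from the paragraph above; everything else is illustrative.
interface WorldEndpoints {
  create: string;     // POST here to create a world from text or image input
  splat: string;      // download the generated splat (.spz) from here
  meshExport: string; // POST here to start the simplified-mesh export
}

// `baseUrl` and `requestId` are hypothetical inputs supplied by the caller.
function worldEndpoints(baseUrl: string, requestId: string): WorldEndpoints {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const id = encodeURIComponent(requestId); // keep the id path-safe
  return {
    create: `${root}/v1/worlds`,
    splat: `${root}/v1/worlds/requests/${id}/splat`,
    meshExport: `${root}/v1/worlds/requests/${id}/exports/mesh-simplified`,
  };
}
```

For example, `worldEndpoints("https://api.example.invalid", "req_123").splat` yields the splat download URL for that (made-up) request id.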

Conversion is only needed for the developer API path, which returns .spz (the app export is already a .ply). To convert a raw .spz to a PlayCanvas-compatible .ply:

python3 -m venv .venv-spz
.venv-spz/bin/pip install "spz @ git+https://github.com/nianticlabs/spz" numpy
.venv-spz/bin/python scripts/convert-spz-to-playcanvas-ply.py <input>.spz public/world.ply --max-splats 0

3. Run the SAM server

Pick a tier (details + latencies in [server/README.md](./server/README.md)):

cd server

# Tier 1 — mock: deterministic ellipses, ZERO ML deps. Instant, great for frontend work.
SAM_MODEL=mock ./run.sh            # http://localhost:8765

# Tier 2 — mobile_sam / fallback: real masks, NO gated weights (MobileSAM points/boxes +
# FastSAM text via CLIP). Sub-second per segment on MPS.
SAM_MODEL=mobile_sam ./run.sh

# Tier 3 — sam3: gated, open-vocab "segment all <noun>" text segmentation.
#   1) Accept the license at https://huggingface.co/facebook/sam3
#   2) Authenticate, then run:
hf auth login                      # paste a Hugging Face token
SAM_MODEL=sam3 ./run.sh            # ~3.4 GB download on first run; ~13 s/call on MPS

run.sh creates server/.venv, installs requirements.txt, and launches uvicorn on :8765. The viewer falls back to the built-in mock mask if the server is unreachable, so npm run dev always shows something.
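
The fallback behavior can be pictured roughly as follows. This is a sketch, not the viewer's actual code: the `Mask` shape, the `SegmentFn` type, and `segmentOrMock` are hypothetical names, and the real /segment payload is specified in docs/ARCHITECTURE.md rather than here.

```typescript
// Hypothetical, simplified mask record (the real contract lives in the docs).
interface Mask {
  label: string;
  score: number;
}

// Anything that turns a concept into masks: a server call or a local mock.
type SegmentFn = (concept: string) => Promise<Mask[]>;

interface SegmentResult {
  masks: Mask[];
  source: "server" | "mock";
}

// Try the server first; if the call rejects (for example because the
// server is unreachable), fall back to the mock so the UI still shows something.
async function segmentOrMock(
  server: SegmentFn,
  mock: SegmentFn,
  concept: string,
): Promise<SegmentResult> {
  try {
    return { masks: await server(concept), source: "server" };
  } catch {
    return { masks: await mock(concept), source: "mock" };
  }
}
```

In practice `server` would wrap a POST to the local server on port 8765, and `mock` would return a canned mask; tagging the result with its `source` makes it easy to show in the UI which path produced the masks.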

4. Run the viewer

npm run dev

Open the printed Vite URL.

Features

Orbit / zoom / pan viewer around a movable pivot (no physics, no Rapier).
Floor-grounded spawn — robust percentile-trimmed splat bounds + an optional collision mesh place the initial camera at standing eye height.
Segment View (G) — type a concept (e.g. sofa), press G, and the current frame is segmented and lifted into the splats. Concept-only (open-vocab text) — no point/click prompts.
Batch Segment — a one-time 16-view look-around sweep ("stand in the center and turn around") over an editable concept list, with a live progress bar, cost estimate, and cancel.
Confidence-voted multi-view merge — masks of the same object from different views merge via per-Gaussian voting (vote counts + accumulated scores), pruning splats only one noisy view grabbed.
Multi-select — toggle any number of objects on; each lights up in its own color.
3D point-cloud highlight + tracking labels — selected objects render as an on-top point cloud with a DOM label that follows the object's centroid as the camera moves.
"Gaussian view" modes — beyond the point overlay, recolor the actual Gaussians: Point highlight (overlay only), Recolor + dim rest, or Isolate selected (hide everything else).
localStorage persistence — segmented objects, the selection, and the concept list all survive a page reload.
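
To make the confidence-voted merge concrete, here is a small sketch of per-Gaussian voting. It is an illustration under assumptions, not the repo's implementation: the class name, the default threshold of two votes, and the rule that a single view keeps everything are all choices made for this example.

```typescript
// Per-splat tally: how many views claimed it and the summed mask scores.
interface Vote {
  count: number;
  score: number;
}

class VotedObject {
  private votes = new Map<number, Vote>();
  private views = 0;

  // Record one lifted view: the splat indices it covered and its mask score.
  addView(indices: number[], score: number): void {
    this.views += 1;
    const unique = new Set<number>(indices); // one vote per splat per view
    unique.forEach((i) => {
      const v = this.votes.get(i) ?? { count: 0, score: 0 };
      v.count += 1;
      v.score += score;
      this.votes.set(i, v);
    });
  }

  // Splats with enough votes. With fewer views than `minVotes` there is
  // nothing to vote against yet, so the threshold is capped at the view count.
  members(minVotes = 2): number[] {
    const threshold = Math.min(minVotes, this.views);
    const out: number[] = [];
    this.votes.forEach((v, i) => {
      if (v.count >= threshold) out.push(i);
    });
    return out.sort((a, b) => a - b);
  }
}
```

After two views covering splats `[1, 2, 3]` and `[2, 3, 4]`, `members()` keeps only `[2, 3]`: splats 1 and 4 were each claimed by a single view and are pruned. The accumulated `score` is stored so a stricter rule (for example a minimum mean score) could be layered on top.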

How it works

Render a Gaussian splat in PlayCanvas; build a uniform spatial grid over the world-space splat centers.
Capture the rendered frame (downscaled to ≤1024 px JPEG via OffscreenCanvas) plus the camera pose at trigger time.
Segment — POST the frame + concept text to the local SAM server; get back instance masks.
Lift each mask into 3D: cull candidate splats to the mask's frustum slab via the grid, project them with the captured pose, keep those inside the mask within a depth band.
Merge + vote the lifted indices into the registry (per-Gaussian voting across views).
Visualize — point-cloud highlight + tracking label, and optionally recolor / isolate the underlying Gaussians via a gsplatModifyVS shader-chunk override and a per-splat id texture.
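
The lift step (project, mask test, depth band) can be sketched as below. This is a simplified illustration with assumptions stated up front: it skips the grid culling and tests every splat, it expects a column-major 4x4 view-projection matrix, it treats clip-space w as the depth (true for a standard perspective projection), and the depth band is passed in rather than derived. The `LiftInput` and `liftMask` names are hypothetical.

```typescript
interface LiftInput {
  centers: Float32Array;  // xyz per splat, world space
  viewProj: Float32Array; // 4x4 column-major matrix of the captured pose
  mask: Uint8Array;       // width * height, nonzero = inside the mask
  width: number;
  height: number;
  minDepth: number;       // depth band, in clip-space w units
  maxDepth: number;
}

// Returns the indices of splats whose centers project inside the mask
// and fall within the depth band.
function liftMask(input: LiftInput): number[] {
  const { centers, viewProj: m, mask, width, height, minDepth, maxDepth } = input;
  const hits: number[] = [];
  for (let i = 0; i < centers.length / 3; i++) {
    const x = centers[3 * i];
    const y = centers[3 * i + 1];
    const z = centers[3 * i + 2];
    // Clip-space x, y, w (clip z is not needed for this test).
    const cx = m[0] * x + m[4] * y + m[8] * z + m[12];
    const cy = m[1] * x + m[5] * y + m[9] * z + m[13];
    const cw = m[3] * x + m[7] * y + m[11] * z + m[15];
    if (cw <= 0) continue;                         // behind the camera
    if (cw < minDepth || cw > maxDepth) continue;  // outside the depth band
    // NDC [-1, 1] to pixel coordinates; image rows grow downward.
    const u = Math.floor((cx / cw * 0.5 + 0.5) * width);
    const v = Math.floor((1 - (cy / cw * 0.5 + 0.5)) * height);
    if (u < 0 || u >= width || v < 0 || v >= height) continue;
    if (mask[v * width + u] !== 0) hits.push(i);
  }
  return hits;
}
```

The depth band is what stops a mask from grabbing splats far behind the object along the same rays; how that band is chosen is covered in docs/ARCHITECTURE.md.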

Full detail, diagrams, and the API contract: **[docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md)**.

Controls

| Input | Action |
| --- | --- |
| Drag | orbit (yaw/pitch) around the pivot |
| Wheel | zoom |
| W A S D / arrows | pan the pivot |
| Q / E | lower / raise the pivot |
| Type a concept + G | segment the current view (open-vocab text prompt) |
| Batch Segment | one-time 16-view look-around sweep over the concept list |
| Gaussian view select | point highlight / recolor + dim / isolate |
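
For a sense of what the Batch Segment sweep visits, the sketch below generates evenly spaced yaw angles for a full turn. It assumes the sweep varies yaw only, which is one reading of "stand in the center and turn around"; the `sweepYaws` name and the start angle are hypothetical.

```typescript
// Evenly spaced yaw angles (degrees) covering one full turn.
function sweepYaws(views = 16, startDeg = 0): number[] {
  const step = 360 / views; // 22.5 degrees for 16 views
  const yaws: number[] = [];
  for (let i = 0; i < views; i++) {
    yaws.push((startDeg + i * step) % 360);
  }
  return yaws;
}
```

With the default of 16 views this gives 0, 22.5, 45, and so on up to 337.5 degrees, one captured frame per angle.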

Requirements & limitations

Not real-time. SAM3 is ~13 s/call warm (~20 s+ cold) per concept on Apple MPS, so a full batch (16 views × number of concepts, one call each) is a leave-it-running job (e.g. 8 concepts ≈ 28 min on SAM3). This is why segmentation is on-demand and lifted-once, not a per-frame loop. mobile_sam is sub-second per segment.
SAM3 weights are gated (Hugging Face license acceptance + hf auth login), ~3.4 GB.
mobile_sam text needs CLIP weights (FastSAM + CLIP); on SSL-restricted networks the CLIP download can fail, but point/box prompts still work without it.
"Segment everything" is unsupported by SAM3 (it requires a prompt); only FastSAM does it.
Main-thread lift. The lift projects on the main thread; grid culling keeps it cheap, but a Web Worker is the natural next step.
SAM3 video/tracking is CUDA-only and unused here.
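
The batch-time figure above is simple arithmetic, sketched here so you can plug in your own numbers. The defaults (16 views, 13 s per warm call) come from the figures in this section; the function name is hypothetical, and cold-start time is ignored.

```typescript
interface BatchEstimate {
  calls: number;   // total SAM calls in the sweep
  minutes: number; // estimated wall-clock time, warm calls only
}

// One SAM call per (view, concept) pair.
function estimateBatch(concepts: number, views = 16, secondsPerCall = 13): BatchEstimate {
  const calls = views * concepts;
  const minutes = (calls * secondsPerCall) / 60;
  return { calls, minutes };
}
```

For 8 concepts this gives 128 calls and about 27.7 minutes, which matches the rounded 28 minutes quoted above.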

License & attribution

This project's code is MIT licensed — fork it freely.

Important: the SAM3 model weights are not MIT. They are distributed under Meta's SAM license and are gated on Hugging Face — you must accept that license yourself to download and use them. This repo ships no weights.

Credits:

SpAItial AI — 3D world generation; the Gaussian-splat worlds (world.ply) and collision meshes (mesh_simplified.ply) this demo loads are created and exported with SpAItial AI. This project exists to showcase those worlds.
PlayCanvas — WebGL engine + Gaussian-splat rendering.
Ultralytics — MobileSAM / FastSAM (the no-gated-weights fallback tier).
Meta AI — Segment Anything 3 — open-vocabulary concept segmentation (the gated sam3 tier).
