An open, hackable take on connected-warfare-style perception — running on a $500 dev kit.
Multi-sensor fusion · Cross-camera tracking · Tactical AR HUD · Edge inference on Jetson Orin Nano
Inspired by Anduril Connected Warfare & the Lattice OS concept —
built as a community reference implementation, not affiliated with Anduril Industries.
Demo · Inspiration · Features · Architecture · Quick Start · Deployment · Testing · API · Troubleshooting
Watch on YouTube — prototype iteration 1.
OVERWATCH is a publicly available reference implementation of the multi-sensor situational-awareness concept popularised by Anduril's Connected Warfare and its Lattice software platform — the idea that a network of low-cost, heterogeneous sensors can be fused at the edge into a single, AI-driven view of the battlespace.
This project takes that idea and runs with it on commodity hardware:
- A $500 NVIDIA Jetson Orin Nano instead of a hardened tactical server
- IP webcams + mobile phone cameras instead of dedicated military-grade sensors
- YOLOv8 + Kalman + homography instead of classified perception stacks
- A FastAPI/React stack instead of proprietary tactical software
The visual language — diamond IFF markers, compass ribbon, threat rings, ghost predictions — is inspired by Anduril's EagleEye HUD aesthetic (one of the publicly-shown UI surfaces of Lattice). It is not a clone, not affiliated with or endorsed by Anduril Industries, and not a substitute for their products. Trademarks belong to their respective owners.
Scope honesty: this is a research/educational project. It demonstrates the principles of connected sensing — sensor fusion, cross-camera re-ID, edge inference, real-time broadcast — at a scale that fits in a backpack. It is not military-grade, not C2-system-grade, and not certified for any operational use.
| | Anduril Lattice / Connected Warfare | OVERWATCH (this repo) |
|---|---|---|
| Goal | Unified situational awareness across heterogeneous sensors | Same — at hobbyist scale |
| Sensor mix | Cameras, radar, RF, sonar, drones, ground vehicles, … | IP cameras + phone cameras (extensible) |
| Fusion | Proprietary, classified | Open: Kalman + Hungarian + homography |
| Edge compute | Hardened tactical hardware | Jetson Orin Nano dev kit |
| HUD style | EagleEye tactical UI | EagleEye-inspired canvas overlay |
| Autonomy | Multi-asset autonomous teaming | Single-pipeline perception only |
| Use | Defense / national security | Research, learning, civilian situational awareness |
| Source | Closed | Public on GitHub (license: see LICENSE) |
If you're building something in this space — researchers, students, civilian defense-tech tinkerers, public-safety folks — this repo is meant to be a starting point you can fork, hack on, and learn from.
OVERWATCH is a real-time multi-camera situational awareness platform built for edge deployment on NVIDIA Jetson Orin Nano. It fuses video from IP cameras and mobile phones into a unified world model using YOLOv8 detection, Hungarian-assignment tracking, adaptive Kalman filtering, and cross-camera appearance re-identification — all at TensorRT FP16 speeds.
The system runs a singleton perception pipeline: detection, tracking, and fusion execute once per tick regardless of how many viewers are connected, then broadcast pre-serialized snapshots to all clients over binary WebSocket.
1 camera + 10 viewers = 1 GPU inference, not 10.
| Capability | Implementation |
|---|---|
| Person detection | YOLOv8n with NMS-level class filter (classes=[0]) — person-only |
| TensorRT FP16 | .engine export on Jetson — ~8 MiB, sub-10 ms inference |
| Hungarian tracking | scipy.optimize.linear_sum_assignment — 0.6 × IoU + 0.4 × cosine appearance cost |
| Tracker fallback chain | DeepSORT (MobileNet) → Hungarian (scipy) → Centroid |
| Adaptive Kalman filter | 6-state [x, y, z, vx, vy, vz] — measurement noise scales by confidence, bbox area, sensor trust |
| Cross-camera re-ID | 64-dim HSV histogram descriptors, L2-normalized, EMA-smoothed (α = 0.3) |
| Sensor trust scoring | Per-sensor trust ∈ [0.1, 1.0] — increases for consistent measurements, decays for innovation outliers |
| Cross-camera homography | Self-calibrating ground-plane H from shared foot-point observations via cv2.findHomography + RANSAC |
| 3-path ghost predictions | (A) homography projection from any source camera (green), (B) pixel extrapolation with adaptive budget (red), (C) world-coordinate pinhole projection fallback (orange) |
| Capability | Implementation |
|---|---|
| Multi-camera | Up to 4 concurrent streams (physical MJPEG/RTSP + mobile virtual cameras) |
| Mobile streaming | Phone browsers → getUserMedia → binary JPEG over WebSocket → VirtualCamera |
| GPS + IMU fusion | Mobile geolocation → equirectangular projection; DeviceOrientationEvent → camera rotation |
| AR overlays | Canvas-based: cyan detection brackets, amber track boxes, green/orange/red ghost predictions |
| Binary protocol | msgpack-serialized snapshots — zero-copy broadcast to all viewers |
| SSL/TLS | Self-signed certificates with SAN for LAN IP access (required for getUserMedia) |
| Optional JWT auth | Default-off; enable with AUTH_ENABLED=true. Token issuance via POST /api/token; WS endpoints accept ?token=... query param |
| Edge deployment | Automated SSH/SFTP deployment to Jetson Orin Nano via paramiko, with atomic staging swap and --rollback |
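The GPS + IMU row above mentions an equirectangular projection of mobile geolocation. A minimal sketch of that idea, assuming a flat-earth projection around a reference origin (the function name is hypothetical, not the repo's actual API):

```python
import math

# Equirectangular (flat-earth) projection. Accurate enough at LAN scale,
# where Earth curvature over tens of metres is negligible.
EARTH_RADIUS_M = 6_371_000.0

def gps_to_local_xy(lat, lon, ref_lat, ref_lon):
    """Project a GPS fix to (east, north) metres relative to a reference point."""
    d_lat = math.radians(lat - ref_lat)
    d_lon = math.radians(lon - ref_lon)
    # Longitude degrees shrink with the cosine of latitude.
    x_east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(ref_lat))
    y_north = EARTH_RADIUS_M * d_lat
    return x_east, y_north
```

One degree of latitude is ~111 km everywhere, so 0.0001° north is roughly 11 m — a quick sanity check for the projection.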
┌─────────────────────────────────┐
│ OVERWATCH v2.0.0 │
└─────────────────────────────────┘
╔═══════════════╗ ╔═══════════════════════════════════════════════════╗
║ DATA SOURCES ║ ║ JETSON ORIN NANO (backend :8000) ║
╠═══════════════╣ ╠═══════════════════════════════════════════════════╣
║ ║ ║ ║
║ 📷 IP Camera ─────────► CameraCapture (OpenCV, MJPEG/RTSP) ║
║ ║ ║ │ ║
║ 📱 Mobile ─────────► VirtualCamera (binary JPEG push) ║
║ Phone ║ws/cam ║ │ + GPS/IMU sensor data ║
║ ║ ║ ▼ ║
║ ║ ║ ┌──────────────────────────────────────────┐ ║
║ ║ ║ │ PerceptionPipeline (singleton) │ ║
║ ║ ║ │ │ ║
║ ║ ║ │ 1. DETECT YOLOv8n TensorRT FP16 │ ║
║ ║ ║ │ + HSV appearance features │ ║
║ ║ ║ │ │ │ ║
║ ║ ║ │ 2. TRACK Hungarian assignment │ ║
║ ║ ║ │ IoU + cosine appearance │ ║
║ ║ ║ │ │ │ ║
║ ║ ║ │ 3. FUSE Adaptive Kalman 6-state │ ║
║ ║ ║ │ Cross-camera matching │ ║
║ ║ ║ │ Sensor trust scoring │ ║
║ ║ ║ │ │ │ ║
║ ║ ║ │ 4. SNAPSHOT Pre-serialized msgpack │ ║
║ ║ ║ └──────────────┬───────────────────────────┘ ║
║ ║ ║ │ ║
╚═══════════════╝ ║ ▼ broadcast ║
║ WebSocketManager (/ws, msgpack binary) ║
║ │ │ │ ║
╚═════════╪═══════════╪═══════════╪═════════════════╝
│ │ │
┌─────────▼──┐ ┌─────▼──┐ ┌─────▼─────┐
│ Viewer 1 │ │Viewer 2│ │ Viewer N │
│ React │ │ React │ │ React │
│ AR Canvas │ │ ... │ │ ... │
└────────────┘ └────────┘ └───────────┘
OVERWATCH runs a single shared pipeline rather than per-viewer. The PerceptionPipeline singleton executes detect → track → fuse once per tick, produces a PerceptionSnapshot with pre-serialized msgpack packets, and all connected viewers read from the latest snapshot.
- 1 camera + 10 viewers = 1 GPU inference
- Zero-copy broadcast via pre-serialized binary packets
- Slow viewers gracefully skip intermediate frames (per-client 2 s send timeout)
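The serialize-once, fan-out pattern can be sketched with stdlib-only Python. This is a hedged sketch, not the repo's `WebSocketManager`: `json` stands in for msgpack, and `SnapshotBroadcaster` plus its client objects are hypothetical names.

```python
import asyncio
import json

SEND_TIMEOUT_S = 2.0  # per-client budget; stalled viewers are dropped

class SnapshotBroadcaster:
    """Serialize one perception snapshot, then fan the same bytes out."""

    def __init__(self):
        self.clients = set()  # objects exposing an async send(bytes) method

    async def broadcast(self, snapshot):
        # Serialize ONCE per tick -- every viewer reads the same buffer.
        packet = json.dumps(snapshot).encode()  # msgpack in the real system
        sent = 0
        for client in list(self.clients):
            try:
                await asyncio.wait_for(client.send(packet), SEND_TIMEOUT_S)
                sent += 1
            except (asyncio.TimeoutError, ConnectionError):
                self.clients.discard(client)  # drop slow or dead viewers
        return sent
```

The key property: adding a viewer adds one `send()` of pre-built bytes, never another serialization or another GPU inference.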
OVERWATCH/
├── backend/ # FastAPI + perception engine
│ ├── main.py # App entry, lifespan, REST + WS endpoints
│ ├── requirements.txt # Python deps (CPU/Windows dev)
│ ├── requirements-jetson.txt # Jetson Orin Nano deps (pinned)
│ ├── .env.example # Config template
│ └── app/
│ ├── domain/entities.py # Detection, Track, WorldObject, ...
│ ├── application/
│ │ ├── ports.py # Repository interfaces
│ │ └── services.py # PerceptionPipelineService
│ └── infrastructure/
│ ├── auth.py # Optional JWT verify/issue
│ ├── camera_adapter.py # OpenCV capture + virtual cameras
│ ├── config_adapter.py # Pydantic settings
│ ├── container.py # DI container
│ ├── detection_adapter.py # YOLO wrapper
│ ├── frame_encoder_adapter.py # JPEG encode
│ ├── tracking_adapter.py # Hungarian + DeepSORT
│ ├── websocket_adapter.py # msgpack broadcast
│ └── world_model_adapter.py # Kalman fusion + homography
│ └── tests/
│ ├── conftest.py # Shared fixtures
│ └── unit/ # 57 unit tests
│
├── frontend/ # React 18 admin dashboard
│ ├── package.json
│ └── src/
│ ├── pages/
│ │ ├── AdminDashboard.jsx # Main camera grid
│ │ └── MobileCamera.jsx # Phone camera streaming UI
│ ├── components/
│ │ ├── CameraDisplay.jsx # Canvas AR overlay renderer
│ │ ├── ErrorBoundary.jsx # Top-level error fallback
│ │ ├── StatsPanel.jsx
│ │ └── ConnectionStatus.jsx
│ ├── application/hooks/ # useCameraData, useWebSocket, useSystemStats
│ └── infrastructure/ # websocketAdapter, cameraStreamAdapter, apiAdapter
│
├── scripts/ # Deployment & ops
│ ├── _jetson_common.py # Shared SSH/SFTP helper (env-driven creds)
│ ├── deploy_jetson.py # Atomic deploy with --rollback
│ ├── restart_jetson.py # Quick backend restart
│ ├── check_logs.py / check_status.py
│ ├── ws_test.py
│ └── archive/ # Retired/duplicate scripts (reference only)
│
├── certs/ # SSL certificates (gitignored)
├── .github/workflows/ci.yml # GitHub Actions test runner
├── pyproject.toml # pytest + project metadata
└── README.md
- Python 3.10+ with pip
- Node.js 18+ with npm
- NVIDIA Jetson Orin Nano for production, or any machine with CUDA for development
```bash
git clone https://github.com/mandarwagh9/overwatch.git
cd overwatch
```

```bash
# Backend
cd backend
pip install -r requirements.txt
python main.py
```

```bash
# Frontend (new terminal)
cd frontend
npm install
npm start
```

Open https://localhost:3000 — accept the self-signed certificate warning.
```bash
cd frontend && npm install && npm run build && cd ..
cd backend && python main.py
```

Backend serves both the React app and the API at https://localhost:8000.
Credentials are read from environment — never hardcoded:
```bash
export JETSON_HOST=192.168.1.10   # default if unset
export JETSON_USER=mandar         # default if unset
export JETSON_PASS=...            # or use JETSON_KEY=/path/to/id_rsa

python scripts/deploy_jetson.py
```

The script:

- Connects via SSH/SFTP using paramiko
- Uploads backend, frontend build, and certs to `<remote>.new/` (staging)
- Atomically swaps `<remote>.new` → `<remote>`, keeping the previous version at `<remote>.bak`
- Generates a fresh `JWT_SECRET` and writes a `chmod 600` `.env`
- Installs Python dependencies
- Starts the backend
```bash
python scripts/deploy_jetson.py --rollback
```

Swaps the last `.bak` directory back into place. Use after a bad deploy.
```bash
# Restart backend without redeploying
python scripts/restart_jetson.py

# Tail logs
python scripts/check_logs.py

# Quick status
python scripts/check_status.py
```

| Service | URL |
|---|---|
| Admin Dashboard | https://<jetson-host>:8000 |
| Mobile Camera (standalone) | https://<jetson-host>:8000/mobile |
The backend has 57 unit tests covering domain primitives, Kalman filtering, coordinate transforms, tracking, and configuration. CI runs them on every push and PR.
```bash
python -m pytest backend/tests/unit -v
```

Tests are pure-Python and do not require CUDA, ultralytics, or torch. They use `pytest.importorskip("cv2")` where OpenCV is needed.
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/` | Serves the React app (or returns API status if no build present) |
| `GET` | `/health` | Detailed health status |
| `GET` | `/status` | System status (cameras, clients, detection engine, pipeline metrics) |
| `GET` | `/cameras` | Active camera list |
| `POST` | `/cameras/{id}/start` | Start a physical camera |
| `POST` | `/cameras/{id}/stop` | Stop a camera |
| `POST` | `/api/token` | Issue a JWT (only when `AUTH_ENABLED=true`) |
| Endpoint | Direction | Format | Purpose |
|---|---|---|---|
| `/ws` | Server → client | msgpack binary | Viewer stream (frames + detections + tracks + predictions) |
| `/ws/camera` | Client → server | Binary JPEG + JSON | Mobile camera source |
When AUTH_ENABLED=true, both endpoints require a ?token=<jwt> query parameter; unauthorized connections close with 1008.
```
Client → { "type": "register", "role": "camera_source", "camera_id": null }
Server → { "type": "registered", "camera_id": 0, "target_fps": 15 }
Client → [binary JPEG frames at target FPS]
Client → { "type": "sensor_data", "gps": {...}, "orientation": {...} }
```
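The registration exchange above can be built and parsed in a few lines of stdlib Python. This is a hedged sketch with hypothetical helper names — the actual client lives in the frontend's MobileCamera page:

```python
import json

def make_register(role="camera_source", camera_id=None):
    """Build the first text frame a mobile source sends on /ws/camera."""
    return json.dumps({"type": "register", "role": role, "camera_id": camera_id})

def parse_registered(text):
    """Parse the server's reply into (camera_id, target_fps)."""
    msg = json.loads(text)
    if msg.get("type") != "registered":
        raise ValueError(f"unexpected message type: {msg.get('type')}")
    return msg["camera_id"], msg["target_fps"]
```

After a successful handshake, the client switches to binary JPEG frames at the `target_fps` the server assigned.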
| Variable | Default | Description |
|---|---|---|
| `MODEL_PATH` | `yolov8n.pt` | Model file — `.pt`, `.engine` (TensorRT), or `.onnx` |
| `DEVICE` | `auto` | Compute device — `auto`, `cpu`, `cuda:0` |
| `HALF_PRECISION` | `false` | FP16 inference (set `true` on Jetson with `.engine`) |
| `DETECTION_CLASSES` | `[0]` | COCO class IDs to detect (0 = person) |
| `CONFIDENCE_THRESHOLD` | `0.5` | Detection confidence threshold |
| `IOU_THRESHOLD` | `0.45` | NMS IoU threshold |
| `TARGET_FPS` | `24` | Processing framerate target |
| `MAX_CAMERAS` | `4` | Maximum concurrent camera streams |
| `TRACKING_MAX_AGE` | `30` | Max frames to keep lost tracks |
| `TRACKING_MIN_HITS` | `3` | Min hits to confirm a track |
| `TRACKING_IOU_THRESHOLD` | `0.25` | IoU threshold for tracking |
| `MOBILE_CAMERA_FPS` | `15` | Mobile camera target FPS |
| `MOBILE_CAMERA_MAX_WIDTH` | `640` | Mobile camera max width |
| `SSL_ENABLED` | `true` | Enable HTTPS/WSS |
| `SSL_CERTFILE` | `certs/cert.pem` | SSL certificate path |
| `SSL_KEYFILE` | `certs/key.pem` | SSL private key path |
| `HOST` | `0.0.0.0` | Bind address |
| `PORT` | `8000` | Bind port |
| `AUTH_ENABLED` | `false` | Require JWT on WS + REST when `true` |
| `JWT_SECRET` | (empty) | HS256 signing key — required when `AUTH_ENABLED=true` |
| `CORS_ORIGINS` | `["*"]` | JSON list of allowed origins |
| `MAX_WS_CLIENTS` | `100` | Hard cap on concurrent viewer WebSocket connections |
| Variable | Default | Description |
|---|---|---|
| `REACT_APP_BACKEND_HOST` | `window.location.hostname` | Backend IP or hostname |
| `REACT_APP_BACKEND_PORT` | `8000` | Backend port |
| `REACT_APP_BACKEND_PROTOCOL` | `wss` (https) / `ws` (http) | WebSocket protocol |
| `REACT_APP_MAX_CAMERAS` | `4` | Maximum cameras to display |
| `REACT_APP_CAMERA_INACTIVITY_TIMEOUT` | `3000` | ms before marking a camera offline |
| `REACT_APP_MOBILE_TARGET_FPS` | `15` | Mobile streaming FPS |
| `REACT_APP_MOBILE_JPEG_QUALITY` | `0.5` | Mobile JPEG quality (0–1) |
| `REACT_APP_MOBILE_MAX_WIDTH` | `640` | Mobile frame width |
| Variable | Default | Description |
|---|---|---|
| `JETSON_HOST` | `192.168.1.10` | Jetson IP or hostname |
| `JETSON_USER` | `mandar` | SSH user |
| `JETSON_PASS` | (prompt) | SSH password — falls back to `getpass` if unset |
| `JETSON_KEY` | (unset) | Path to private key (preferred over password) |
The frontend renders a tactical HUD inspired by Anduril's EagleEye UI — diamond IFF markers, compass ribbon, threat rings — implemented entirely in HTML5 Canvas. (Visual style only; rendered from open code, no Anduril assets used.)
| Layer | Color | Elements |
|---|---|---|
| Detections | Slate-blue `#64b5f6` | Diamond markers, corner brackets, `PERSON` confidence pill |
| Tracks | Amber `#ffd740` | Diamond/chevron markers, velocity vector arrows, track ID callouts |
| Predictions (H-PROJ) | Green `#00ff82` solid | Homography-projected ghost — accurate, real-time cross-camera |
| Predictions (EXTRAP) | Red `#ff5050` dashed | Pixel-extrapolated ghost — time-decaying dead-reckoning |
| Predictions (WORLD) | Orange `#ff9800` dashed | World-coordinate projection — pinhole-model fallback |
| Compass ribbon | — | Heading ribbon with N/E/S/W and bearing tick marks |
| Threat ring | Per-IFF color | Inner ring around feed showing bearing to off-screen predictions |
Detection overlays show what the model sees right now. Track overlays show persistent identity across frames. Predictions show cross-camera projections — green for homography (most accurate), orange for world-model fallback (rough but always available), red for pixel extrapolation (last resort).
Each fused world object maintains a 6-state Kalman filter [x, y, z, vx, vy, vz] with constant-velocity dynamics. Measurement noise R adapts per-update based on detection confidence, bounding box area, and sensor trust — higher-quality observations tighten the filter, while noisy or untrusted sensors widen it. dt is clamped to ≥ 0 to defend against cross-camera clock skew.
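The adaptive-noise idea can be sketched as follows. The scaling formulas and constants here are illustrative assumptions, not the repo's actual values — the point is the direction of each effect:

```python
def adaptive_measurement_noise(base_r, confidence, bbox_area_frac, trust):
    """Scale measurement noise R per update: confident, large, trusted
    detections tighten the filter; weak ones widen it.
    All scale factors below are illustrative, not the repo's constants."""
    conf_scale = 1.0 / max(confidence, 0.1)               # low confidence -> noisier
    area_scale = (1.0 / max(bbox_area_frac, 0.01)) ** 0.5  # tiny boxes -> noisier
    trust_scale = 1.0 / max(trust, 0.1)                   # untrusted sensor -> noisier
    return base_r * conf_scale * area_scale * trust_scale

def clamp_dt(t_measurement, t_last_update):
    """Guard against cross-camera clock skew: never predict backwards in time."""
    return max(0.0, t_measurement - t_last_update)
```

A perfect observation (confidence 1.0, full-frame box, fully trusted sensor) leaves `R` at its base value; degrading any one factor inflates it.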
Objects from different cameras are matched when:
- Euclidean distance < 2 m
- Same `class_id`
- Appearance cosine similarity > 0.5 (when feature vectors are available)
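The three gates can be sketched as a single predicate. The signature is hypothetical; it assumes L2-normalized feature vectors, so cosine similarity reduces to a dot product:

```python
import math

def same_world_object(pos_a, pos_b, class_a, class_b,
                      feat_a=None, feat_b=None,
                      max_dist_m=2.0, min_cos_sim=0.5):
    """Cross-camera match gate: class AND distance AND (optional) appearance."""
    if class_a != class_b:
        return False
    if math.dist(pos_a, pos_b) >= max_dist_m:
        return False
    if feat_a is not None and feat_b is not None:
        # L2-normalized descriptors: cosine similarity == dot product.
        cos_sim = sum(a * b for a, b in zip(feat_a, feat_b))
        if cos_sim <= min_cos_sim:
            return False
    return True
```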
Each camera/sensor earns trust through consistency:
- Consistent measurements → trust increases (capped at 1.0)
- Innovation outliers → trust decays (floored at 0.1)
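A sketch of the trust update; the gain and decay rates are illustrative assumptions, only the cap and floor come from the description above:

```python
TRUST_MIN, TRUST_MAX = 0.1, 1.0

def update_trust(trust, innovation_is_outlier, gain=0.02, decay=0.1):
    """Consistent measurements nudge trust up; innovation outliers knock
    it down. Asymmetric rates mean trust is slow to earn, quick to lose.
    (gain/decay values are illustrative, not the repo's constants.)"""
    if innovation_is_outlier:
        trust -= decay
    else:
        trust += gain
    return min(TRUST_MAX, max(TRUST_MIN, trust))
```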
- 64-dimensional HSV histogram descriptors computed per detection (~0.1 ms each)
- L2-normalized for cosine similarity
- Exponential moving average (α = 0.3) for descriptor stability across frames
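A pure-Python sketch of such a descriptor: a 4×4×4 joint histogram gives exactly 64 bins, assuming OpenCV-style HSV ranges (H in [0, 180), S and V in [0, 256)). The real pipeline computes this from image crops with OpenCV; this stand-in just makes the math concrete:

```python
import math

def hsv_descriptor(pixels):
    """64-dim appearance descriptor: 4x4x4 joint HSV histogram, L2-normalized.
    `pixels` is an iterable of (h, s, v) tuples in OpenCV ranges."""
    hist = [0.0] * 64
    for h, s, v in pixels:
        hb = min(int(h * 4 / 180), 3)   # 4 hue bins over [0, 180)
        sb = min(int(s * 4 / 256), 3)   # 4 saturation bins over [0, 256)
        vb = min(int(v * 4 / 256), 3)   # 4 value bins over [0, 256)
        hist[hb * 16 + sb * 4 + vb] += 1.0
    norm = math.sqrt(sum(x * x for x in hist)) or 1.0
    return [x / norm for x in hist]     # unit length -> cosine == dot product

def ema_update(prev, new, alpha=0.3):
    """EMA smoothing keeps a track's descriptor stable across frames."""
    return [(1 - alpha) * p + alpha * n for p, n in zip(prev, new)]
```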
The signature feature is ghost prediction: when Camera 0 can't see a person but Camera 1 can, the system renders a ghost overlay on Camera 0's feed showing where that person is.
Sliding a person's last-known pixel position forward in time fails within seconds because:
- Different cameras have completely different pixel coordinate systems
- The mapping between camera views is a projective transformation, not a linear offset
- A person at pixel `(400, 300)` in Camera 1 might correspond to `(800, 500)` in Camera 0
When both cameras simultaneously observe the same person (matched via appearance re-ID), the system records foot-point correspondence pairs — the bottom-center of the bounding box in each view. These foot points project to the same physical ground-plane location.
With ≥ 4 such pairs, `cv2.findHomography()` + RANSAC computes a 3×3 homography matrix.

- **Collect**: when re-ID matches a person across Camera 0 and Camera 1, record the `(foot_cam0, foot_cam1)` pair
- **Estimate**: after 4+ pairs, compute $H_{0\to 1}$ and $H_{1\to 0}$ via RANSAC (re-estimated every 5 new pairs)
- **Project**: when Camera 0 loses a person but Camera 1 still sees them, apply $H_{1\to 0}$ to Camera 1's current foot point → position on Camera 0's feed
- **Validate**: monitor reprojection error; if it spikes (camera moved), flush and re-learn
- Homography estimation: < 0.1 ms (called every 5 new pairs, not every frame)
- Per-prediction projection: < 0.001 ms (one 3×3 matrix multiply)
- Total overhead per frame: effectively zero on Jetson Orin Nano
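The per-prediction cost quoted above is a single homogeneous multiply plus a perspective divide. A pure-Python sketch of what `cv2.perspectiveTransform` does for one point, with `H` as a 3×3 nested list:

```python
def project_foot_point(H, pt):
    """Apply a 3x3 homography to a 2D foot point: lift to homogeneous
    coordinates, multiply, then divide by w. Returns None if the point
    maps to infinity (degenerate H or point on the horizon line)."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-9:
        return None  # treat as projection failure -> fall through to Path B/C
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return (u, v)
```

The divide by `w` is what makes this a projective transform rather than a linear offset — exactly why naive pixel sliding between cameras fails.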
| Ghost color | Tag | Source | Meaning |
|---|---|---|---|
| 🟢 Green solid | `H-PROJ` | Path A — homography | Cross-camera ground-plane projection. Tries all source cameras with a valid `H`. |
| 🟠 Orange dashed | `WORLD` | Path C — world projection | Fused 3D world position → pinhole camera model. Rough, but works even when no homography exists and the target camera has never seen the person. |
| 🔴 Red dashed | `EXTRAP` | Path B — pixel extrapolation | Slides the last-known pixel position by velocity × time. Adaptive budget: `min(250 px, 80 + 40 × t)`. Only works if the target camera previously saw the person. |
Any phone on the same LAN can become a camera source:
- Via the React app: `https://<frontend-ip>:3000/mobile`
- Standalone page: `https://<jetson-ip>:8000/mobile`
The mobile client:
- Opens the rear camera via `getUserMedia` (1280×720)
- Renders to an offscreen canvas and extracts a JPEG blob
- Sends binary frames over WebSocket to `/ws/camera`
- Captures GPS (`watchPosition`, high-accuracy) and IMU (`DeviceOrientationEvent`) at 2 Hz
- Sends sensor data as JSON for camera calibration fusion

`getUserMedia` requires HTTPS — this is why SSL certificates are mandatory even on LAN.
| Edge case | Behavior | Mitigation |
|---|---|---|
| No homography learned yet | Path A fails silently and falls through to Path B (extrap) or Path C (world projection); the ghost appears orange instead of green. | Walk through overlapping camera FOVs to collect ≥ 4 foot-point pairs. The homography auto-learns within ~5 s of co-visibility. |
| Camera moved after calibration | Reprojection error spikes; the stale `H` mis-projects ghosts. | The system monitors error and flushes the homography when it exceeds 50 px. Walk through the overlap again to re-learn. |
| Person only ever seen by one camera | Path A has no source to project from; Path B has no pixel history for the target. Path C is the only option. | Path C accuracy depends on calibrated camera positions in the `CoordinateTransformer` (set via the `CAMERA_POSITIONS` env var). |
| Cameras with no overlapping FOV | No co-visible observations → no foot-point pairs → no homography. Path A never activates between these cameras. | Path C still works. For better accuracy, set `CAMERA_POSITIONS` to your physical camera extrinsics. |
| Edge case | Behavior | Mitigation |
|---|---|---|
| Identical clothing | HSV histograms are nearly identical; re-ID may merge two people into one world object. | The system requires both spatial distance (< 2 m) and appearance similarity (> 0.5 cosine). If two people are spatially separated, they stay separate even with identical appearance. |
| Person temporarily fully occluded | Track coasts for `prediction_horizon` seconds (default 5 s); confidence decays linearly. After timeout, the track is pruned. | Increase `prediction_horizon` if longer persistence is needed. Kalman velocity keeps the ghost moving during occlusion. |
| Crowded scenes (> 10 people) | Hungarian cost matrix grows as O(n × m); appearance feature extraction adds ~0.1 ms per detection. | Throughput may drop below target FPS. YOLOv8n NMS already limits detections. |
| Person enters from off-screen | No pixel history, no world object yet. First detection creates a new track with high measurement noise. | Kalman initializes with large uncertainty; trust builds over 5–10 consistent frames. |
| Edge case | Behavior | Mitigation |
|---|---|---|
| Mobile GPS jitter indoors | GPS accuracy can be 10–50 m indoors; the Kalman filter receives noisy position updates. | Sensor trust scoring down-weights high-innovation sources. The trust floor (0.1) prevents complete rejection. |
| Mobile phone loses WebSocket | The virtual camera stream stops; existing tracks coast via Kalman prediction. | Tracks persist for `prediction_horizon`. The phone auto-reconnects (with intentional-close handling) and gets a new camera ID. |
| Clock drift between cameras | Frame timestamps may not be synchronized. Co-visibility matching uses a 0.5 s window. | The 0.5 s window is generous for typical LAN latency. NTP sync is recommended for sub-100 ms accuracy. The Kalman `dt` is clamped to ≥ 0 so negative skew can't corrupt covariance. |
| Edge case | Behavior | Mitigation |
|---|---|---|
| Self-signed cert rejected by browser | WebSocket connection fails silently; the frontend shows no feeds. | Visit `https://<jetson-ip>:8000` directly and accept the certificate (once per browser session). |
| Jetson runs out of GPU memory | The TensorRT engine uses ~30 MiB. With 4 cameras at 640×640, CUDA memory totals ≈ 200 MiB. The Orin Nano has 8 GB shared. | Monitor with `tegrastats`. Reduce `MAX_CAMERAS` or input resolution if tight. |
| Backend crash | The deploy script runs the backend via `nohup`; there is no auto-restart. | Add a systemd unit with `Restart=always` for production. `python scripts/restart_jetson.py` brings it back manually. |
| Many viewers cause lag | The singleton pipeline runs once per tick regardless of viewers, but msgpack serialize + send scales linearly. The per-client send timeout is 2 s. | Pre-serialized snapshots minimize per-viewer cost. For > 10 viewers, consider a pub/sub layer (Redis, NATS). |
| Mid-deploy network drop | Atomic SFTP staging means the previous version stays at `<remote>` until the swap. | If a deploy partially fails, run `python scripts/deploy_jetson.py --rollback` to restore the last `.bak`. |
WebSocket won't connect
- Visit `https://<jetson-ip>:8000` in your browser and accept the self-signed certificate
- Verify `REACT_APP_BACKEND_HOST` in `frontend/.env` matches the backend IP
- Check the backend is running: `curl -sk https://<jetson-ip>:8000/health`
- If `AUTH_ENABLED=true`, ensure the client supplies `?token=<jwt>` from `POST /api/token`
Mobile camera shows black screen
- HTTPS is required for `getUserMedia` — ensure `SSL_ENABLED=true`
- Phone must be on the same LAN as the backend
- Allow camera permission when the browser prompts
- Try the standalone page: `https://<jetson-ip>:8000/mobile`
Port already in use on Jetson
```bash
python scripts/restart_jetson.py

# Or manually over SSH (ssh will prompt for the password):
export JETSON_HOST=192.168.1.10 JETSON_USER=mandar
ssh "$JETSON_USER@$JETSON_HOST" \
  'pkill -9 -f "python3 main.py"; sleep 2; cd /home/$USER/overwatch/backend && nohup python3 main.py > /tmp/overwatch.log 2>&1 &'
```

Checking Jetson logs

```bash
python scripts/check_logs.py    # Tails the last 50 lines
python scripts/check_status.py  # Backend status snapshot
```

Bad deploy — roll back

```bash
python scripts/deploy_jetson.py --rollback
```

Swaps the last `<remote>.bak` directory back into place.
Ghost predictions not appearing on a camera
- Check homography status — look for `H learned: cam0→cam1` in the logs. If missing, walk through both camera FOVs simultaneously to collect correspondence pairs.
- Check world projection — Path C (orange ghost) should always work if camera positions are configured. If missing, verify the `CoordinateTransformer` calibration matches your physical setup.
- Check the prediction horizon — if `time_since_seen > prediction_horizon` (default 5 s), the object is pruned. The person must be actively tracked by at least one camera.
- Check `source_tracks` — if Camera 1 is currently tracking the person, no prediction is generated for it (live track, not a ghost).
Ghosts flicker between green and orange
The homography is borderline — sometimes projection succeeds (green), sometimes it fails and falls through to Path C (orange):
- Homography learned from too few correspondence pairs (minimum 4, but 8+ is more stable)
- Person is near the edge of the overlap zone where reprojection error is highest
- Walk more paths through the camera overlap to improve $H$ stability
Two people merged into one ghost
Cross-camera re-ID matched two different people as the same world object. This happens with:
- Identical clothing (same HSV histogram)
- People standing < 2 m apart in world coordinates
- Temporary occlusion causing track ID swap
The system self-corrects once the people separate spatially. The appearance descriptor EMA (α = 0.3) gradually diverges.
Pydantic Config error
Use only the `model_config = SettingsConfigDict(...)` dict pattern — do not define an inner `class Config`. This is the Pydantic v2 convention.
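A minimal sketch of the v2 pattern; the field names below are illustrative, mirroring a few env vars from the configuration table, not the repo's full settings class:

```python
# Pydantic v2 settings: configure via the model_config dict,
# never via an inner `class Config` (that is the v1 pattern).
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # Field names map case-insensitively to env vars (MODEL_PATH, etc.).
    model_path: str = "yolov8n.pt"
    target_fps: int = 24
    auth_enabled: bool = False
```

Mixing the two styles (a `model_config` dict plus an inner `Config` class) is exactly what triggers the error this section describes.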
| Layer | Technology |
|---|---|
| Detection | Ultralytics YOLOv8 (nano) |
| Inference | NVIDIA TensorRT FP16 / ONNX Runtime / PyTorch |
| Tracking | DeepSORT / Hungarian (scipy) / Centroid |
| Fusion | Custom 6-state Kalman filter with adaptive noise |
| Cross-camera | Ground-plane homography via OpenCV findHomography + RANSAC |
| Backend | FastAPI + Uvicorn (ASGI) |
| Protocol | msgpack binary over WebSocket |
| Frontend | React 18 + Canvas 2D API |
| Auth (optional) | PyJWT (HS256) |
| Hardware | NVIDIA Jetson Orin Nano (JetPack 6.x, R36) |
| Deployment | paramiko SSH/SFTP automation |
| Tests + CI | pytest + GitHub Actions |
The cross-camera homography system is built on established multi-view geometry principles and inspired by several academic works and open-source implementations.
| Source | Relevance |
|---|---|
| Hartley, R. & Zisserman, A. (2004). Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press. | Chapter 13: ground-plane homography between uncalibrated camera pairs. |
| Faugeras, O. (1993). Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press. | Projective geometry fundamentals used in the homography estimation pipeline. |
| Paper | Venue | Contribution |
|---|---|---|
| Hou, Y., Zheng, L., & Gould, S. (2020). Multiview Detection with Feature Perspective Transformation | ECCV 2020 | Ground-plane projection of CNN feature maps via homography for multi-view pedestrian detection. 88.2% MODA on Wildtrack. |
| Hou, Y. & Zheng, L. (2021). MVDeTr: Multiview Detection with Shadow Transformer | ACM MM 2021 | Deformable transformer extension of MVDet. 91.5% MODA on Wildtrack. |
| Psaltis, A. et al. (2021). Tracking Grow-Finish Pigs Across Large Pens Using Multiple Cameras | CVPR 2021 Workshop | Production homography-based cross-camera tracking with DeepSORT + YOLOv4. |
| Ristani, E. et al. (2016). Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking | ECCV 2016 Workshop | Defined IDF1, IDP, IDR metrics. Established the DukeMTMC benchmark. |
| Jeon, Y. et al. (2023). Leveraging Future Trajectory Prediction for Multi-Camera People Tracking | CVPR 2023 Workshop | Spatial-temporal cross-camera graph for MCMT. |
| Chen, C. et al. (2023). ReST: A Reconfigurable Spatial-Temporal Graph Model for MCMT | ICCV 2023 | Graph-based cross-camera association that learns spatial topology from observations. |
| Fischler, M.A. & Bolles, R.C. (1981). Random Sample Consensus | CACM 24(6) | The RANSAC algorithm used in cv2.findHomography. |
| Repository | Usage |
|---|---|
| hou-yz/MVDet | Reference for get_worldcoord_from_imgcoord() and multi-view feature fusion. |
| hou-yz/MVDeTr | Reference for deformable transformer attention across multi-view projected features. |
| AIFARMS/multi-camera-pig-tracking | Direct inspiration for the homography-based cross-camera approach. |
| yuntaeJ/SCIT-MCMT-Tracking | Reference for spatial-temporal cross-camera association graphs. |
| chengche6230/ReST | Reference for reconfigurable spatial-temporal graphs in MCMT. |
| ultralytics/ultralytics | YOLOv8 detection model. |
| levan92/deep_sort_realtime | DeepSORT tracker implementation. |
| Dataset | Citation |
|---|---|
| Wildtrack | Chavdarova, T. et al. (2018). CVPR. |
| MultiviewX | Hou, Y. et al. (2020). Synthetic multi-view pedestrian dataset introduced with MVDet. |
| DukeMTMC | Ristani, E. et al. (2016). |
OVERWATCH is released under the MIT License — copyright © 2024–2026 Mandar Wagh. You're free to use, copy, modify, merge, publish, distribute, sublicense, and sell copies of the software, subject to the conditions in the license file. If you build something cool with it, a link back is appreciated but not required.
"Anduril," "Lattice," "Connected Warfare," and "EagleEye" are trademarks of Anduril Industries, Inc. This project is an independent community implementation inspired by publicly-shown concepts of those products. It is not affiliated with, endorsed by, or sponsored by Anduril Industries. No proprietary information, code, or assets from Anduril are used.
All other third-party trademarks (NVIDIA, Jetson, TensorRT, React, FastAPI, etc.) belong to their respective owners.
Connected sensing, at hackathon scale. 🎯
Inspired by Anduril Connected Warfare · Built on open tools.