Skip to content

movensys/movensys-intelligence

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

263 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Movensys Intelligence

Vision-language and speech intelligence layer for the movensys-manipulator stack. Adds a FastAPI VLM service, a Whisper speech endpoint, a Qdrant vector memory, optional Phoenix tracing, and sample applications that drive the manipulator from natural language.

Overview

This repository sits on top of the WMX ROS 2 manipulator stack and gives it a higher-level reasoning layer:

  • VLM service — FastAPI server wrapping a vLLM-hosted Gemma 4 model with image input, exposed as REST + WebSocket. It bridges to ROS 2 so it can call manipulator services (MovePose, MoveJoints, GetEefPose, etc.) directly.
  • Whisper service — streaming speech-to-text used to issue commands by voice.
  • Vector memory — Qdrant-backed long-term memory for the VLM agent.
  • Sample appsmovensys_robopoly, a board-game demo where the robot picks and places pieces under VLM control, with a YOLO + AprilTag perception pipeline and a dry-run mode that exercises the full stack without moving the arm.

The entire stack runs as a set of Docker compose services and supports NVIDIA desktop GPUs, Jetson Thor, and Intel B60 / Panther Lake XPU.

Repository Layout

.
├── movensys_vlm/
│   ├── main.py / router.py / ros2_node.py   # FastAPI app + ROS 2 bridge
│   ├── vlm_client.py / whisper_client.py    # vLLM + Whisper clients
│   ├── memory_client.py                     # Qdrant vector memory client
│   ├── models/                              # Local model assets (Gemma, Whisper, embeddings)
│   ├── docker/                              # Compose files: vllm, whisper, vectordb, vlm
│   └── doc/running.md                       # Step-by-step bring-up
└── movensys_sample/
    └── movensys_robopoly/                   # Board-game demo (FastAPI + adapters)
        ├── main.py / router.py
        ├── pick_and_place.py
        ├── adapters/                        # robot, ros_image, stt, vlm
        ├── game/                            # rules, manager, decks, boards
        ├── scripts/                         # auto_play_dry_run, render helpers
        └── docker/                          # Compose stack

Services and Ports

Service Default port Purpose
movensys_vlm (FastAPI) 8000 VLM REST/WebSocket API + ROS 2 bridge
vllm 9000 vLLM OpenAI-compatible inference server
whisper 9010 Speech-to-text server
vectordb (Qdrant) 6333 Long-term vector memory
movensys_robopoly 7999 Robopoly demo UI/API
phoenix (optional) 6006 OpenTelemetry/LLM traces UI

Requirements

  • Ubuntu 22.04 or 24.04
  • Docker with docker compose
  • Hardware: NVIDIA GPU (desktop or Jetson Thor) or Intel XPU (B60 / Panther Lake)
  • Local model weights placed under movensys_vlm/models/ (Gemma 4 E2B/E4B, Whisper large-v3, embedding model)
  • The movensys-manipulator stack running (the VLM publishes/calls its ROS 2 services)

Quick Start

1. Configure the host environment

Add the following to your ~/.bashrc:

export XPU_CORE=nvidia-gpu        # {nvidia-gpu, intel-xpu}
export CPU_ARCH=amd64             # {amd64, arm64}
source ~/.bashrc

2. Clone the repository

mkdir -p ~/workspaces
cd ~/workspaces
git clone https://github.com/movensys/movensys-intelligence.git

3. Start the VLM stack

For Nvidia desktop, Jetson Thor, or Intel B60:

cd ~/workspaces/movensys-intelligence/movensys_vlm/docker
COMPOSE_PROFILES=$XPU_CORE docker compose -f vllm.yaml up -d --build
COMPOSE_PROFILES=$CPU_ARCH docker compose -f vectordb.yaml up -d --build
COMPOSE_PROFILES=$XPU_CORE docker compose -f whisper.yaml up -d --build
COMPOSE_PROFILES=$XPU_CORE docker compose -f movensys_vlm.yaml up -d --build

For Intel Panther Lake (vLLM uses a separate build path):

cd ~/workspaces/movensys-intelligence/movensys_vlm/docker
./vllm-intel-build.sh
./vllm-intel-run.sh

Wait for application startup complete in the vLLM logs before continuing.

On Jetson Thor or Intel Panther Lake, drop kernel caches between restarts if memory pressure builds up: sync && sudo sysctl vm.drop_caches=3

Full bring-up, teardown, and Phoenix-tracing options are documented in movensys_vlm/doc/running.md.

4. Run a sample application

The Robopoly board-game demo drives the manipulator via the VLM stack. With the movensys-manipulator YOLO simulation example running (see movensys-manipulator/doc/6a_yolo_simulation.md):

export MOVENSYS_PNP_DRY_RUN=0     # set to 1 to skip arm motion
cd ~/workspaces/movensys-intelligence/movensys_sample/movensys_robopoly/docker
docker compose up -d --build

Open the UI on http://localhost:7999/, toggle is_YOLO on, and start a game. Dry-run mode and the auto-play test script are described in movensys_sample/doc/1a_robopoly_simulation.md.

Pick-and-place from the command line

cd ~/workspaces/movensys-intelligence
python3 movensys_sample/movensys_robopoly/pick_and_place.py red_cube GO true 2>&1 | tee baseline.log
grep '\[timing\]' baseline.log

Related Repositories

License

Released under the MIT License.

About

Vision-language and speech intelligence layer

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors