# ROS Integration


## Goals

* Run a local VLM as a service node using the ROS2 Python client library (`rclpy`)

## References

* [Understanding Services](https://docs.ros.org/en/kilted/Tutorials/Beginner-CLI-Tools/Understanding-ROS2-Services/Understanding-ROS2-Services.html)

## Setup

This notebook demonstrates integrating a Vision Language Model (VLM) with ROS2 using llama.cpp as the inference backend.

### Prerequisites

1. **llama.cpp server** running with a VLM (e.g., LLaVA, BakLLaVA, SmolVLM)
2. **ROS2** environment sourced
3. **vlm_ros package** built and sourced

### Architecture

- **llama.cpp server**: Runs the VLM and exposes an OpenAI-compatible HTTP API (port 8080)
- **vlm_service node**: ROS2 node that subscribes to camera images on `camera/image`, queries the VLM, and publishes the responses on `vlm/response`
- **Test script**: Python script to publish test images and receive VLM responses
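
To make the data flow concrete, here is a rough sketch of what a node like `vlm_service` could look like. It is not the actual `vlm_ros` source: the node name `vlm_service_sketch`, the default prompt, the `max_tokens` value, and the helper names `build_payload` and `query_vlm` are all assumptions made for illustration. Only the topic names and the endpoint path come from this notebook.

```python
import base64
import json
import urllib.request

# Assumed defaults; the real vlm_service node may use different values.
LLAMA_URL = "http://localhost:8080/v1/chat/completions"
PROMPT = "Describe this image in one sentence."


def build_payload(jpeg_bytes: bytes, prompt: str = PROMPT) -> dict:
    """Wrap a JPEG image and a text prompt in an OpenAI-style chat request."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 128,
    }


def query_vlm(jpeg_bytes: bytes, url: str = LLAMA_URL) -> str:
    """POST the request to llama-server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(jpeg_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


def main() -> None:
    # ROS and OpenCV imports stay local so the helpers above can be
    # imported on a machine without ROS2 installed.
    import cv2
    import rclpy
    from cv_bridge import CvBridge
    from rclpy.node import Node
    from sensor_msgs.msg import Image
    from std_msgs.msg import String

    class VLMServiceSketch(Node):
        def __init__(self):
            super().__init__("vlm_service_sketch")
            self.bridge = CvBridge()
            self.pub = self.create_publisher(String, "vlm/response", 10)
            self.create_subscription(Image, "camera/image", self.on_image, 10)

        def on_image(self, msg):
            # Convert the ROS image to JPEG bytes, query the VLM, publish the text.
            frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
            ok, buf = cv2.imencode(".jpg", frame)
            if not ok:
                self.get_logger().warn("JPEG encoding failed; dropping frame")
                return
            self.pub.publish(String(data=query_vlm(buf.tobytes())))

    rclpy.init()
    node = VLMServiceSketch()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()
```

Calling `main()` in a sourced ROS2 environment would start the sketch node. The HTTP call blocks the subscription callback, which is acceptable for a demo but would stall other callbacks on a single-threaded executor.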

## Start llama.cpp Server

**In a separate terminal**, start the llama.cpp server with a VLM:

```bash
# Example: Download and run a VLM model
export PATH=/ryzers/llamacpp/build/bin/:$PATH
llama-server -hf ggml-org/SmolVLM-500M-Instruct-GGUF \
  --host 0.0.0.0 \
  --port 8080
```

Wait until the server log shows "HTTP server listening" before proceeding. The first run also downloads the model, so this can take a while.
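
Instead of watching the log, readiness can also be checked programmatically. The sketch below polls the server's `/health` endpoint, which llama-server is expected to answer with HTTP 200 once the model is loaded. The helper names `http_ok` and `wait_until_ready`, the timeouts, and the injectable `probe` parameter are illustrative assumptions, not part of llama.cpp or `vlm_ros`.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status such as 503.
        return False


def wait_until_ready(url: str = "http://localhost:8080/health",
                     timeout_s: float = 120.0,
                     interval_s: float = 1.0,
                     probe=http_ok) -> bool:
    """Poll `probe(url)` until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval_s)
    return False
```

A notebook cell could then call `wait_until_ready()` and stop early if it returns `False`.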

## Try the OpenAI-Compatible API Directly

```python
import base64
from pathlib import Path

import requests

# Connect via localhost; 0.0.0.0 is a bind address, not a reliable client target.
LLAMA_SERVER_URL = "http://localhost:8080/v1/chat/completions"

def b64_image(image_path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    data = Path(image_path).read_bytes()
    return base64.b64encode(data).decode("utf-8")

def ask_with_image(image_path: str, question: str):
    """Send a question plus an image to the chat completions endpoint and print the answer."""
    payload = {
        "model": "smolvlm",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    # The image travels inline as a base64 data URL.
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64_image(image_path)}"}}
                ]
            }
        ],
        "temperature": 0.2,
        "max_tokens": 256
    }

    r = requests.post(LLAMA_SERVER_URL, json=payload, timeout=120)
    r.raise_for_status()
    resp = r.json()
    print(resp["choices"][0]["message"]["content"])

ask_with_image("lena.jpg", "What's in this image? Be concise.")
```

Output:

```
 A black and white portrait of a woman wearing a wide brimmed hat with a feather sticking out of the right side.
```
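
The request above hard-codes `image/jpeg` in the data URL, which only matches JPEG files. As a minimal sketch, assuming you also want to send PNG or other formats, the MIME type can be guessed from the file extension with the standard library. The helper name `image_data_url` is hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(image_path: str) -> str:
    """Build a base64 data URL whose MIME type matches the file extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string could replace the f-string passed as `"url"` in the payload.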


## Build the ROS Package

```bash
source /opt/ros/kilted/setup.bash

# Build the vlm_ros package
cd /ryzers/notebooks/vlm_ros
colcon build --symlink-install
```

Output:

```
Starting >>> vlm_ros
Finished <<< vlm_ros [0.59s]

Summary: 1 package finished [0.67s]
```


## Start the VLM Service Node

Run the VLM service node in a separate terminal so that it keeps running while you execute the cells below:

```bash
source /opt/ros/kilted/setup.bash
source /ryzers/notebooks/vlm_ros/install/setup.bash
ros2 run vlm_ros vlm_service
```

You should see this output:

```
[INFO] [1760575146.024196594] [vlm_service]: VLM Service started, connecting to http://localhost:8080
```

## Test with a Sample Image

Create a test script to publish an image and receive VLM responses:

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge
import cv2

class VLMTester(Node):
    def __init__(self):
        super().__init__('vlm_tester')
        self.image_pub = self.create_publisher(Image, 'camera/image', 10)
        self.response_sub = self.create_subscription(String, 'vlm/response', self.response_callback, 10)
        self.bridge = CvBridge()
        self.latest_response = None

    def response_callback(self, msg):
        # Keep the most recent answer so later cells can inspect it
        self.latest_response = msg.data
        print(f"\nVLM Response: {msg.data}\n")

    def publish_test_image(self, image_path):
        # Load and publish image
        img = cv2.imread(image_path)
        if img is None:
            print(f"Could not load image: {image_path}")
            return

        msg = self.bridge.cv2_to_imgmsg(img, encoding='bgr8')
        self.image_pub.publish(msg)
        print(f"Published image: {image_path}")

# Initialize ROS
rclpy.init()
tester = VLMTester()
print("VLM tester node initialized")
```

Output:

```
VLM tester node initialized
```


### Publish image

```python
# Publish a test image
tester.publish_test_image('/ryzers/notebooks/images/toucan.jpg')
# Process one callback, waiting up to 5 seconds for the VLM response
rclpy.spin_once(tester, timeout_sec=5)
```

Output:

```
Published image: /ryzers/notebooks/images/toucan.jpg

VLM Response:  A bird with a blue patch on its head sits on a branch in a forest.
```
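
A single `rclpy.spin_once` call handles at most one callback and gives up after its timeout, so a slow model can leave the cell without a response. One possible workaround, sketched here with a hypothetical helper named `spin_until`, is to keep spinning in short steps until a condition holds or a deadline passes. The helper is plain Python and takes the spin call as a parameter.

```python
import time


def spin_until(step, done, timeout_s: float = 30.0) -> bool:
    """Call `step()` repeatedly until `done()` is true or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while not done():
        if time.monotonic() >= deadline:
            return False
        step()
    return True


# Assumed usage with the tester node defined earlier in this notebook:
# tester.latest_response = None
# tester.publish_test_image('/ryzers/notebooks/images/toucan.jpg')
# got_reply = spin_until(
#     step=lambda: rclpy.spin_once(tester, timeout_sec=0.1),
#     done=lambda: tester.latest_response is not None,
# )
```

Resetting `latest_response` before publishing ensures that the loop waits for a fresh answer rather than returning on a stale one.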



## Next steps

* Try launching llama-server with different models. Compare how a smaller model like `SmolVLM-256M-Instruct-GGUF` and a much larger model like `gemma-3-4b-it-GGUF` describe the same image.

```bash
# Run one at a time; both commands use the same default port
llama-server -hf ggml-org/gemma-3-4b-it-GGUF
llama-server -hf ggml-org/SmolVLM-256M-Instruct-GGUF
```



---
Copyright © 2025 AMD, Inc SPDX-License-Identifier: MIT