# Imports

- cv2: To capture and manipulate video frames (e.g., from a webcam) and prepare them for processing.
- mediapipe: To analyze video frames and extract landmarks (e.g., hand, face, or body pose tracking). It provides the machine-learning model used in this project.
- python-osc: To send processed data (e.g., landmark positions) via the OSC protocol to external software or devices. In this case, it sends to Processing 4.


Make sure the following packages are installed in the environment:
- `mediapipe`: For landmark detection.
- `python-osc`: To send data via OSC (imported as `pythonosc`).
- `opencv-python` (OpenCV): To capture and handle video frames (imported as `cv2`).
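
A quick way to confirm the environment is ready is to check that each module can be found before running the rest of the notebook. The sketch below is only an illustration: the helper name `find_missing_modules` is hypothetical, and the `pip` line in the comment assumes the usual PyPI package names.

```python
import importlib.util

# Import names differ from the pip package names. A typical install would be:
#   pip install opencv-python mediapipe python-osc
REQUIRED_MODULES = ["cv2", "mediapipe", "pythonosc"]


def find_missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported as missing, install it in the same environment that the notebook kernel uses.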

In [1]:
import cv2
import mediapipe as mp
import os
from pythonosc.udp_client import SimpleUDPClient

# Setting Up OSC and MediaPipe

In this section, we will configure two key components of our project:

1. **MediaPipe**: A library for real-time perception tasks such as hand tracking, face detection, and pose estimation. It processes video frames to extract landmarks (in this project, body pose keypoints) which we can use creatively.
   
2. **OSC (Open Sound Control)**: A protocol used to send data between different software or hardware. We'll use the Python-OSC library to transmit the data extracted by MediaPipe to other applications, such as Processing or Max/MSP.

The code below initializes these components and prepares them to work together.


In [2]:
# Setup OSC
OSC_IP = "127.0.0.1"  # Local address
OSC_PORT = 12000       # Port for Processing
client = SimpleUDPClient(OSC_IP, OSC_PORT)

# Setup MediaPipe
mp_pose = mp.solutions.pose
pose = mp_pose.Pose()
mp_drawing = mp.solutions.drawing_utils

I0000 00:00:1734086521.898584  240593 gl_context.cc:357] GL version: 2.1 (2.1 Metal - 88.1), renderer: Apple M3


INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1734086521.960781  243742 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1734086521.972310  243748 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
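
To make the OSC side less of a black box, here is a minimal sketch of how an OSC message that carries only floats is laid out on the wire, following the OSC 1.0 conventions (null-terminated strings padded to a multiple of 4 bytes, big-endian 32-bit floats). It uses only the standard library and is not how python-osc is implemented internally; the function names `osc_string` and `encode_osc_floats` are hypothetical.

```python
import struct


def osc_string(text):
    """Encode an OSC string: ASCII bytes, null-terminated, padded to a multiple of 4."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc_floats(address, values):
    """Build a raw OSC message whose arguments are all float32 values."""
    type_tags = "," + "f" * len(values)  # one 'f' tag per float argument
    payload = b"".join(struct.pack(">f", value) for value in values)
    return osc_string(address) + osc_string(type_tags) + payload


packet = encode_osc_floats("/pose", [0.5, 0.25])
print(len(packet), packet)
```

A receiver reads the address first, then the type tags, then one 4-byte float per tag, which is why the order of values in the list matters.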


# Listing Available Camera Devices

Before starting the video capture, it's important to know which camera devices are available in the system. This step probes camera indices 0 to 5 with OpenCV and lists every index that opens. OpenCV generally does not expose device names, so the name usually reads "Unknown".

Use this information to select the correct camera for the application. If no cameras are detected, ensure your device is connected properly or check for driver issues.


In [3]:
def list_camera_devices():
    """Lists available camera devices with their names (if supported)."""
    print("Scanning for available camera devices...")
    available_cameras = []
    for index in range(6):  # Check up to 6 camera indices
        cap = cv2.VideoCapture(index)
        if cap.isOpened():
            # Try to fetch the camera's name. Standard OpenCV builds define no
            # CAP_PROP_DEVICE_NAME constant, so this normally falls back to "Unknown".
            camera_name = cap.get(cv2.CAP_PROP_DEVICE_NAME) if hasattr(cv2, 'CAP_PROP_DEVICE_NAME') else "Unknown"
            print(f"Index {index}: {camera_name}")
            available_cameras.append((index, camera_name))
            cap.release()
        else:
            print(f"Index {index}: No device detected.")
    if not available_cameras:
        print("No camera devices found.")
    return available_cameras

# Call the function to list devices before starting video capture
cameras = list_camera_devices()
if not cameras:
    print("No cameras available. Exiting.")
    exit()


Scanning for available camera devices...




Index 0: Unknown
Index 1: Unknown
Index 2: Unknown
Index 3: No device detected.
[12/13 11:42:07.783432][info][240593][Co

OpenCV: out device of bound (0-2): 3
OpenCV: camera failed to properly initialize!
[ WARN:0@8.537] global cap.cpp:323 open VIDEOIO(OBSENSOR): raised unknown C++ exception!


OpenCV: out device of bound (0-2): 4
OpenCV: camera failed to properly initialize!
[ WARN:0@8.553] global cap.cpp:323 open VIDEOIO(OBSENSOR): raised unknown C++ exception!


OpenCV: out device of bound (0-2): 5
OpenCV: camera failed to properly initialize!
[ WARN:0@8.566] global cap.cpp:323 open VIDEOIO(OBSENSOR): raised unknown C++ exception!


OpenCV: out device of bound (0-2): 6
OpenCV: camera failed to properly initialize!
[ WARN:0@8.578] global cap.cpp:323 open VIDEOIO(OBSENSOR): raised unknown C++ exception!


OpenCV: out device of bound (0-2): 7
OpenCV: camera failed to properly initialize!
[ WARN:0@8.590] global cap.cpp:323 open VIDEOIO(OBSENSOR): raised unknown C++ exception!


OpenCV: out device of bound (0-2): 8
OpenCV: camera failed to properly initialize!
[ WARN:0@8.601] global cap.cpp:323 open VIDEOIO
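
Once the scan has run, the index to use can be chosen from its result instead of being hard-coded. The sketch below assumes the list of `(index, name)` tuples returned by `list_camera_devices()`; the helper `pick_camera_index` and its `preferred` parameter are hypothetical names introduced for illustration.

```python
def pick_camera_index(cameras, preferred=None):
    """Return `preferred` if it was detected, otherwise the first detected index."""
    indices = [index for index, _name in cameras]
    if not indices:
        raise ValueError("No camera devices were detected.")
    if preferred in indices:
        return preferred
    return indices[0]


# Example data shaped like the scan result shown above.
detected = [(0, "Unknown"), (1, "Unknown"), (2, "Unknown")]
print(pick_camera_index(detected, preferred=2))
print(pick_camera_index(detected, preferred=5))
```

The returned value can then be passed to `cv2.VideoCapture(...)`.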

# Video Capture and Processing Loop

This part of the code is the heart of the program, where video frames are captured, processed, and visualized in real-time. Here's a breakdown of what each section does:

1. **Video Capture Initialization**:
   - `cv2.VideoCapture(0)`: Opens a connection to the camera at index `0`. Replace `0` with another index reported by the device scan (e.g., `1` or `2`) if that is not the camera you want.

2. **Frame Reading**:
   - `ret, frame = cap.read()`: Reads each frame from the webcam. If `ret` is `False`, no frame could be read (e.g., the camera was disconnected) and the loop stops.

3. **Image Conversion**:
   - `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`: Converts the frame from BGR to RGB, as MediaPipe expects RGB images for processing.

4. **MediaPipe Pose Processing**:
   - `pose.process(rgb_frame)`: Processes the frame to detect pose landmarks. These landmarks represent keypoints (e.g., joints) of a human pose.

5. **Visualization (Optional)**:
   - `mp_drawing.draw_landmarks`: Draws the detected pose landmarks and their connections on the video frame for visualization.

6. **Send Data via OSC**:
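   - Each landmark in `results.pose_landmarks` has `x` and `y` values normalized to the frame width and height (roughly 0 to 1; points outside the frame can fall outside that range). They are appended to a flat list `[x0, y0, x1, y1, ...]` (33 landmarks, so 66 floats) and sent via OSC to external applications, such as Processing, using the `/pose` OSC address.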

7. **Video Display (Optional)**:
   - `cv2.imshow('MediaPipe Pose', frame)`: Displays the video frame with landmarks overlaid. This helps verify the system's operation.

8. **Quit the Loop**:
   - `cv2.waitKey(1) & 0xFF == ord('q')`: Checks if the "q" key is pressed. If so, it breaks the loop and ends the program.

9. **Release Resources**:
   - `cap.release()`: Releases the webcam resource.
   - `cv2.destroyAllWindows()`: Closes any OpenCV-created windows.

This loop ensures continuous video capture and processing, making it possible to track pose landmarks in real-time and transmit their data seamlessly to other tools.
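
The flat list described in step 6 can be sketched without a camera. The example below uses a hypothetical `Landmark` stand-in for MediaPipe's landmark objects (only the `x` and `y` attributes are assumed) and shows how a receiver could turn the flat list back into pairs. The function names are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-in for a MediaPipe landmark; only x and y are used here.
Landmark = namedtuple("Landmark", ["x", "y"])


def flatten_keypoints(landmarks):
    """Build the flat [x0, y0, x1, y1, ...] list that is sent to /pose."""
    flat = []
    for landmark in landmarks:
        flat.extend([landmark.x, landmark.y])
    return flat


def unflatten_keypoints(flat):
    """Rebuild (x, y) pairs on the receiving side from the flat list."""
    if len(flat) % 2 != 0:
        raise ValueError("Expected an even number of values (x, y pairs).")
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


sample = [Landmark(0.5, 0.25), Landmark(0.75, 0.5)]
flat = flatten_keypoints(sample)
print(flat)
print(unflatten_keypoints(flat))
```

With the 33 pose landmarks, the flat list holds 66 values, so the pair at position `i` corresponds to landmark `i`.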


In [5]:
# Video capture
cap = cv2.VideoCapture(0)  # Put here the index of the camera device

try:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Convert image to RGB (OpenCV delivers BGR, MediaPipe expects RGB)
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = pose.process(rgb_frame)

        if results.pose_landmarks:
            # Draw points on the frame (optional, for visualization)
            mp_drawing.draw_landmarks(frame, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)

            # Send keypoints via OSC as a flat list [x0, y0, x1, y1, ...]
            keypoints = []
            for landmark in results.pose_landmarks.landmark:
                keypoints.extend([landmark.x, landmark.y])  # Normalized to the frame size
            client.send_message("/pose", keypoints)

        # Show video with points (optional)
        cv2.imshow('MediaPipe Pose', frame)

        # Press "q" in the video window to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    # Release the camera and close windows even if the kernel is interrupted
    cap.release()
    cv2.destroyAllWindows()

KeyboardInterrupt: 
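
Because the values sent to `/pose` are normalized, the receiver has to scale them to its own canvas before drawing. A minimal sketch of that mapping, written in Python for illustration (a Processing sketch would do the equivalent with its own `width` and `height`), might look like the following. The function name `to_pixel` and the clamping behavior are assumptions, not part of the original code.

```python
def to_pixel(x_norm, y_norm, width, height):
    """Map normalized coordinates to integer pixel coordinates, clamped to the canvas."""
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py


# A landmark in the middle of a 640x480 canvas.
print(to_pixel(0.5, 0.5, 640, 480))
# A landmark that left the frame is clamped to the nearest edge.
print(to_pixel(1.2, -0.1, 640, 480))
```

Clamping is one possible choice; a receiver could instead skip landmarks that fall outside the canvas.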