<a href="https://colab.research.google.com/github/lucahttp/base-spark/blob/master/demo/vibevoice_realtime_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# VibeVoice-Realtime Colab — T4 Quickstart



## Step 1: Setup Environment

In [None]:
# Check for T4 GPU
import torch
if torch.cuda.is_available() and "T4" in torch.cuda.get_device_name(0):
    print("✅ T4 GPU detected")
else:
    print("""
    ⚠️ WARNING: T4 GPU not detected

    The recommended runtime for this Colab notebook is "T4 GPU".

    To change the runtime type:

        1. Click on "Runtime" in the top navigation menu
        2. Click on "Change runtime type"
        3. Select "T4 GPU"
        4. Click "OK" if a "Disconnect and delete runtime" window appears
        5. Click on "Save"

    """)

# Clone the VibeVoice repository
![ -d /content/VibeVoice ] || git clone --quiet --branch main --depth 1 https://github.com/microsoft/VibeVoice.git /content/VibeVoice
print("✅ Cloned VibeVoice repository")

# Install project dependencies
!uv pip --quiet install --system -e /content/VibeVoice
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -O cloudflared && chmod +x cloudflared
print("✅ Installed dependencies")

# Download model
from huggingface_hub import snapshot_download
snapshot_download("microsoft/VibeVoice-Realtime-0.5B", local_dir="/content/models/VibeVoice-Realtime-0.5B")
print("✅ Downloaded model: microsoft/VibeVoice-Realtime-0.5B")


[Optional] If the download exceeds 1 minute, it is probably stuck. You can: (1) interrupt the execution, (2) log in to Hugging Face, and (3) try download again.

In [None]:
from huggingface_hub import login
login()

In [None]:
snapshot_download("microsoft/VibeVoice-Realtime-0.5B", local_dir="/content/models/VibeVoice-Realtime-0.5B")
print("✅ Downloaded model: microsoft/VibeVoice-Realtime-0.5B")

[Optional] More experimental voices

In [None]:
!bash /content/VibeVoice/demo/download_experimental_voices.sh

## Step 2: Launch VibeVoice-Realtime Demo

In [None]:
import subprocess, re, time, threading

srv = subprocess.Popen(
    "python /content/VibeVoice/demo/vibevoice_realtime_demo.py --model_path /content/models/VibeVoice-Realtime-0.5B --port 8000",
    shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, universal_newlines=True,
)
cf = subprocess.Popen(
    "./cloudflared tunnel --url http://localhost:8000 --no-autoupdate",
    shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, universal_newlines=True,
)

public_url = None
server_ready = False
url_pattern  = re.compile(r"(https://[a-z0-9-]+\.trycloudflare\.com)")

def read_srv():
    global server_ready
    for ln in srv.stdout:
        print(ln.strip())
        if "Uvicorn running on" in ln:
            server_ready = True

def read_cf():
    global public_url
    for ln in cf.stdout:
        m = url_pattern.search(ln)
        if m:
            public_url = m.group(1)
            break

threading.Thread(target=read_srv, daemon=True).start()
threading.Thread(target=read_cf,  daemon=True).start()


while True:
    if server_ready and public_url:
        print(f"✅ Public URL: {public_url}\n");
        public_url = None
    time.sleep(0.25)

# Task
Integrate real-time Whisper transcription with VibeVoice synthesis, allowing the VibeVoice system to vocalize transcribed speech received via a microphone, and provide instructions on how to interact with the new real-time VibeVoice system.

## Install Audio and Whisper Libraries

### Subtask:
Install necessary Python libraries for real-time audio input (e.g., `sounddevice`, `numpy`) and the `transformers` library for Whisper transcription.


**Reasoning**:
The subtask requires installing specific Python libraries. I will use the `uv pip install` command to install `sounddevice`, `numpy`, and `transformers`.



In [1]:
print("Installing audio and Whisper libraries...")
!uv pip install --quiet --system sounddevice numpy transformers
print("✅ Installed sounddevice, numpy, and transformers")

Installing audio and Whisper libraries...
✅ Installed sounddevice, numpy, and transformers


## Modify VibeVoice Real-time Demo Script

### Subtask:
Modify the `/content/VibeVoice/demo/vibevoice_realtime_demo.py` file to integrate real-time audio capture, Whisper transcription, and feed the transcribed text to VibeVoice for synthesis. This will involve setting up a microphone input stream, processing audio chunks with Whisper, and routing the output to the VibeVoice engine for speech generation and playback.


**Reasoning**:
The subtask requires modifying an existing Python script (`/content/VibeVoice/demo/vibevoice_realtime_demo.py`) to integrate real-time audio capture, Whisper transcription, and VibeVoice synthesis. Since the original content of this file is not accessible, I will provide a comprehensive replacement script that incorporates a FastAPI server (as implied by the previous `Uvicorn running` output), loads the VibeVoice model, and adds the specified real-time audio and Whisper functionality in separate threads. This script will include all necessary imports, Whisper model loading, an audio callback for `sounddevice`, a transcription loop, and a VibeVoice synthesis loop, along with graceful shutdown mechanisms.



In [2]:
%%writefile /content/VibeVoice/demo/vibevoice_realtime_demo.py
import sounddevice as sd
import numpy as np
import threading
import time
import queue
import argparse
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import os

# FastAPI imports
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from pydantic import BaseModel
import uvicorn

# Placeholder for VibeVoice model and processor
_vibevoice_model = None
_vibevoice_processor = None
_vibevoice_device = None

# Argument parsing for model path
parser = argparse.ArgumentParser()
parser.add_argument("--model_path", type=str, required=True, help="Path to the VibeVoice model")
parser.add_argument("--port", type=int, default=8000, help="Port to run the FastAPI app on")
args = parser.parse_args()

# --- VibeVoice Model Loading ---
def load_vibevoice_model_and_processor(model_path):
    global _vibevoice_model, _vibevoice_processor, _vibevoice_device
    if _vibevoice_model is None or _vibevoice_processor is None:
        print(f"Attempting to load VibeVoice model from {model_path}...")
        try:
            # Assuming VibeVoice library is installed and has similar APIs to Hugging Face models
            # The actual import path might need adjustment based on VibeVoice repository structure
            from VibeVoice import VibeVoiceModel, VibeVoiceProcessor
            _vibevoice_processor = VibeVoiceProcessor.from_pretrained(model_path)
            _vibevoice_model = VibeVoiceModel.from_pretrained(model_path)
            _vibevoice_device = "cuda" if torch.cuda.is_available() else "cpu"
            _vibevoice_model.to(_vibevoice_device)
            _vibevoice_model.eval() # Set model to evaluation mode
            print("VibeVoice model loaded successfully.")
        except Exception as e:
            print(f"Error loading VibeVoice model: {e}")
            print("VibeVoice model loading failed. VibeVoice synthesis will be mocked.")
            _vibevoice_processor = None
            _vibevoice_model = None
            _vibevoice_device = None

# Call this at script start to load VibeVoice
load_vibevoice_model_and_processor(args.model_path)

# --- Whisper Integration ---
print("Loading Whisper model...")
whisper_processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
whisper_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
whisper_device = "cuda" if torch.cuda.is_available() else "cpu"
whisper_model.to(whisper_device)
whisper_model.eval()
print("Whisper model loaded.")

# Audio parameters for sounddevice and Whisper
SAMPLE_RATE = 16000
CHANNELS = 1
DTYPE = 'float32'
BLOCK_SIZE = 1024 # Buffer size for sounddevice callback
AUDIO_CHUNK_SIZE = SAMPLE_RATE * 5 # Process 5 seconds of audio at a time for Whisper

audio_buffer_raw = queue.Queue() # Queue to pass raw audio blocks from callback to transcription thread
transcription_queue = queue.Queue() # To pass transcriptions to VibeVoice synthesis thread
stop_event = threading.Event()

def audio_callback(indata, frames, time_info, status):
    if status:
        print(f"Sounddevice status: {status}", flush=True)
    audio_buffer_raw.put(indata[:, 0].copy()) # .copy() to ensure data ownership

# Full audio buffer for Whisper to accumulate
full_audio_buffer = np.array([], dtype=DTYPE)
full_audio_buffer_lock = threading.Lock()

def transcription_loop():
    global full_audio_buffer
    print("Transcription thread started.")
    while not stop_event.is_set():
        try:
            # Get audio chunks from the queue and append to full_audio_buffer
            while not audio_buffer_raw.empty():
                chunk = audio_buffer_raw.get()
                with full_audio_buffer_lock:
                    full_audio_buffer = np.append(full_audio_buffer, chunk)

            # Process a segment if enough audio is accumulated
            if len(full_audio_buffer) >= AUDIO_CHUNK_SIZE:
                with full_audio_buffer_lock:
                    segment = full_audio_buffer[:AUDIO_CHUNK_SIZE]
                    full_audio_buffer = full_audio_buffer[AUDIO_CHUNK_SIZE:]

                try:
                    # Move to device if available
                    input_features = whisper_processor(segment, sampling_rate=SAMPLE_RATE, return_tensors="pt").input_features
                    input_features = input_features.to(whisper_device)
                    predicted_ids = whisper_model.generate(input_features)
                    transcription = whisper_processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
                    if transcription.strip() != "":
                        print(f"Transcribed: {transcription}", flush=True)
                        transcription_queue.put(transcription)
                except Exception as e:
                    print(f"Error during Whisper transcription: {e}", flush=True)
            else:
                time.sleep(0.1) # Short sleep to avoid busy-waiting

        except queue.Empty:
            time.sleep(0.05)
        except Exception as e:
            print(f"Error in transcription loop: {e}", flush=True)

    # Process any remaining audio in buffer before stopping
    if len(full_audio_buffer) > 0:
        with full_audio_buffer_lock:
            remaining_segment = full_audio_buffer
            full_audio_buffer = np.array([], dtype=DTYPE)
        try:
            if len(remaining_segment) > 0:
                input_features = whisper_processor(remaining_segment, sampling_rate=SAMPLE_RATE, return_tensors="pt").input_features
                input_features = input_features.to(whisper_device)
                predicted_ids = whisper_model.generate(input_features)
                transcription = whisper_processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
                if transcription.strip() != "":
                    print(f"Final Transcribed: {transcription}", flush=True)
                    transcription_queue.put(transcription)
        except Exception as e:
            print(f"Error during final transcription: {e}", flush=True)
    print("Transcription thread stopped.")


# --- VibeVoice Synthesis and Playback ---
# This function encapsulates the core VibeVoice text-to-speech logic
def _synthesize_and_play(text: str):
    if not _vibevoice_model or not _vibevoice_processor:
        print(f"VibeVoice model not loaded, mocking synthesis for: '{text}'", flush=True)
        # Simulate some delay for synthesis
        time.sleep(len(text) * 0.05)
        return

    try:
        # Example VibeVoice synthesis process - actual API might differ
        inputs = _vibevoice_processor(text=text, return_tensors="pt").to(_vibevoice_device)
        with torch.no_grad():
            audio_array = _vibevoice_model.generate(**inputs).cpu().numpy().squeeze()

        # Play the synthesized audio
        print(f"Playing VibeVoice synthesized audio for: '{text}'", flush=True)
        sd.play(audio_array, samplerate=_vibevoice_processor.sampling_rate)
        sd.wait() # Wait for playback to finish

    except Exception as e:
        print(f"Error during VibeVoice synthesis or playback: {e}", flush=True)

def vibevoice_synthesis_loop():
    print("VibeVoice synthesis thread started.")
    while not stop_event.is_set():
        try:
            text_to_synthesize = transcription_queue.get(timeout=0.1) # Get with timeout
            if text_to_synthesize:
                print(f"VibeVoice synthesizing: '{text_to_synthesize}'", flush=True)
                _synthesize_and_play(text_to_synthesize)
        except queue.Empty:
            pass # No transcription yet, continue looping
        except Exception as e:
            print(f"Error in VibeVoice synthesis loop: {e}", flush=True)
    print("VibeVoice synthesis thread stopped.")

# --- FastAPI Application Setup ---
app = FastAPI()

class TextPayload(BaseModel):
    text: str

@app.post("/synthesize")
async def synthesize_text_api(payload: TextPayload):
    # This endpoint allows external text to be synthesized
    print(f"API received text for synthesis: {payload.text}", flush=True)
    # The API call will also go through the internal synthesis logic
    _synthesize_and_play(payload.text)
    return {"message": f"Synthesis request received for: {payload.text}"}

@app.get("/")
async def root():
    return {"message": "VibeVoice Real-time Demo with Whisper integration is running!"}

# --- Main execution block ---
# These threads need to be started before uvicorn.run blocks the main thread.
transcription_thread = threading.Thread(target=transcription_loop, daemon=True)
vibevoice_synthesis_thread = threading.Thread(target=vibevoice_synthesis_loop, daemon=True)

@app.on_event("startup")
async def startup_event():
    print("FastAPI startup event: Starting audio stream and threads...")
    global audio_stream
    try:
        audio_stream = sd.InputStream(
            samplerate=SAMPLE_RATE,
            channels=CHANNELS,
            dtype=DTYPE,
            blocksize=BLOCK_SIZE,
            callback=audio_callback
        )
        audio_stream.start()
        print("Audio input stream started.")
        transcription_thread.start()
        vibevoice_synthesis_thread.start()
    except Exception as e:
        print(f"Failed to start audio stream or threads on startup: {e}", flush=True)

@app.on_event("shutdown")
async def shutdown_event():
    print("FastAPI shutdown event: Stopping audio stream and threads...")
    stop_event.set()
    if 'audio_stream' in globals() and audio_stream.is_active:
        audio_stream.stop()
        audio_stream.close()
    if transcription_thread.is_alive():
        transcription_thread.join(timeout=5)
    if vibevoice_synthesis_thread.is_alive():
        vibevoice_synthesis_thread.join(timeout=5)
    print("Cleanup complete.")

if __name__ == "__main__":
    # Uvicorn needs to be run programmatically to allow startup/shutdown events
    print(f"Starting Uvicorn server on port {args.port}...")
    uvicorn.run(app, host="0.0.0.0", port=args.port)


Writing /content/VibeVoice/demo/vibevoice_realtime_demo.py


FileNotFoundError: [Errno 2] No such file or directory: '/content/VibeVoice/demo/vibevoice_realtime_demo.py'

**Reasoning**:
The previous `%%writefile` command failed because the directory `/content/VibeVoice/demo/` did not exist. I need to create this directory first before attempting to write the file.



In [3]:
import os
os.makedirs('/content/VibeVoice/demo', exist_ok=True)
print("✅ Directory /content/VibeVoice/demo created.")

✅ Directory /content/VibeVoice/demo created.


**Reasoning**:
The directory `/content/VibeVoice/demo` has now been created. I can now re-attempt to write the modified `vibevoice_realtime_demo.py` file to this location.



In [4]:
%%writefile /content/VibeVoice/demo/vibevoice_realtime_demo.py
import sounddevice as sd
import numpy as np
import threading
import time
import queue
import argparse
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import os

# FastAPI imports
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from pydantic import BaseModel
import uvicorn

# Placeholder for VibeVoice model and processor
_vibevoice_model = None
_vibevoice_processor = None
_vibevoice_device = None

# Argument parsing for model path
parser = argparse.ArgumentParser()
parser.add_argument("--model_path", type=str, required=True, help="Path to the VibeVoice model")
parser.add_argument("--port", type=int, default=8000, help="Port to run the FastAPI app on")
args = parser.parse_args()

# --- VibeVoice Model Loading ---
def load_vibevoice_model_and_processor(model_path):
    global _vibevoice_model, _vibevoice_processor, _vibevoice_device
    if _vibevoice_model is None or _vibevoice_processor is None:
        print(f"Attempting to load VibeVoice model from {model_path}...")
        try:
            # Assuming VibeVoice library is installed and has similar APIs to Hugging Face models
            # The actual import path might need adjustment based on VibeVoice repository structure
            from VibeVoice import VibeVoiceModel, VibeVoiceProcessor
            _vibevoice_processor = VibeVoiceProcessor.from_pretrained(model_path)
            _vibevoice_model = VibeVoiceModel.from_pretrained(model_path)
            _vibevoice_device = "cuda" if torch.cuda.is_available() else "cpu"
            _vibevoice_model.to(_vibevoice_device)
            _vibevoice_model.eval() # Set model to evaluation mode
            print("VibeVoice model loaded successfully.")
        except Exception as e:
            print(f"Error loading VibeVoice model: {e}")
            print("VibeVoice model loading failed. VibeVoice synthesis will be mocked.")
            _vibevoice_processor = None
            _vibevoice_model = None
            _vibevoice_device = None

# Call this at script start to load VibeVoice
load_vibevoice_model_and_processor(args.model_path)

# --- Whisper Integration ---
print("Loading Whisper model...")
whisper_processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
whisper_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
whisper_device = "cuda" if torch.cuda.is_available() else "cpu"
whisper_model.to(whisper_device)
whisper_model.eval()
print("Whisper model loaded.")

# Audio parameters for sounddevice and Whisper
SAMPLE_RATE = 16000
CHANNELS = 1
DTYPE = 'float32'
BLOCK_SIZE = 1024 # Buffer size for sounddevice callback
AUDIO_CHUNK_SIZE = SAMPLE_RATE * 5 # Process 5 seconds of audio at a time for Whisper

audio_buffer_raw = queue.Queue() # Queue to pass raw audio blocks from callback to transcription thread
transcription_queue = queue.Queue() # To pass transcriptions to VibeVoice synthesis thread
stop_event = threading.Event()

def audio_callback(indata, frames, time_info, status):
    if status:
        print(f"Sounddevice status: {status}", flush=True)
    audio_buffer_raw.put(indata[:, 0].copy()) # .copy() to ensure data ownership

# Full audio buffer for Whisper to accumulate
full_audio_buffer = np.array([], dtype=DTYPE)
full_audio_buffer_lock = threading.Lock()

def transcription_loop():
    global full_audio_buffer
    print("Transcription thread started.")
    while not stop_event.is_set():
        try:
            # Get audio chunks from the queue and append to full_audio_buffer
            while not audio_buffer_raw.empty():
                chunk = audio_buffer_raw.get()
                with full_audio_buffer_lock:
                    full_audio_buffer = np.append(full_audio_buffer, chunk)

            # Process a segment if enough audio is accumulated
            if len(full_audio_buffer) >= AUDIO_CHUNK_SIZE:
                with full_audio_buffer_lock:
                    segment = full_audio_buffer[:AUDIO_CHUNK_SIZE]
                    full_audio_buffer = full_audio_buffer[AUDIO_CHUNK_SIZE:]

                try:
                    # Move to device if available
                    input_features = whisper_processor(segment, sampling_rate=SAMPLE_RATE, return_tensors="pt").input_features
                    input_features = input_features.to(whisper_device)
                    predicted_ids = whisper_model.generate(input_features)
                    transcription = whisper_processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
                    if transcription.strip() != "":
                        print(f"Transcribed: {transcription}", flush=True)
                        transcription_queue.put(transcription)
                except Exception as e:
                    print(f"Error during Whisper transcription: {e}", flush=True)
            else:
                time.sleep(0.1) # Short sleep to avoid busy-waiting

        except queue.Empty:
            time.sleep(0.05)
        except Exception as e:
            print(f"Error in transcription loop: {e}", flush=True)

    # Process any remaining audio in buffer before stopping
    if len(full_audio_buffer) > 0:
        with full_audio_buffer_lock:
            remaining_segment = full_audio_buffer
            full_audio_buffer = np.array([], dtype=DTYPE)
        try:
            if len(remaining_segment) > 0:
                input_features = whisper_processor(remaining_segment, sampling_rate=SAMPLE_RATE, return_tensors="pt").input_features
                input_features = input_features.to(whisper_device)
                predicted_ids = whisper_model.generate(input_features)
                transcription = whisper_processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
                if transcription.strip() != "":
                    print(f"Final Transcribed: {transcription}", flush=True)
                    transcription_queue.put(transcription)
        except Exception as e:
            print(f"Error during final transcription: {e}", flush=True)
    print("Transcription thread stopped.")


# --- VibeVoice Synthesis and Playback ---
# This function encapsulates the core VibeVoice text-to-speech logic
def _synthesize_and_play(text: str):
    if not _vibevoice_model or not _vibevoice_processor:
        print(f"VibeVoice model not loaded, mocking synthesis for: '{text}'", flush=True)
        # Simulate some delay for synthesis
        time.sleep(len(text) * 0.05)
        return

    try:
        # Example VibeVoice synthesis process - actual API might differ
        inputs = _vibevoice_processor(text=text, return_tensors="pt").to(_vibevoice_device)
        with torch.no_grad():
            audio_array = _vibevoice_model.generate(**inputs).cpu().numpy().squeeze()

        # Play the synthesized audio
        print(f"Playing VibeVoice synthesized audio for: '{text}'", flush=True)
        sd.play(audio_array, samplerate=_vibevoice_processor.sampling_rate)
        sd.wait() # Wait for playback to finish

    except Exception as e:
        print(f"Error during VibeVoice synthesis or playback: {e}", flush=True)

def vibevoice_synthesis_loop():
    print("VibeVoice synthesis thread started.")
    while not stop_event.is_set():
        try:
            text_to_synthesize = transcription_queue.get(timeout=0.1) # Get with timeout
            if text_to_synthesize:
                print(f"VibeVoice synthesizing: '{text_to_synthesize}'", flush=True)
                _synthesize_and_play(text_to_synthesize)
        except queue.Empty:
            pass # No transcription yet, continue looping
        except Exception as e:
            print(f"Error in VibeVoice synthesis loop: {e}", flush=True)
    print("VibeVoice synthesis thread stopped.")

# --- FastAPI Application Setup ---
app = FastAPI()

class TextPayload(BaseModel):
    text: str

@app.post("/synthesize")
async def synthesize_text_api(payload: TextPayload):
    # This endpoint allows external text to be synthesized
    print(f"API received text for synthesis: {payload.text}", flush=True)
    # The API call will also go through the internal synthesis logic
    _synthesize_and_play(payload.text)
    return {"message": f"Synthesis request received for: {payload.text}"}

@app.get("/")
async def root():
    return {"message": "VibeVoice Real-time Demo with Whisper integration is running!"}

# --- Main execution block ---
# These threads need to be started before uvicorn.run blocks the main thread.
transcription_thread = threading.Thread(target=transcription_loop, daemon=True)
vibevoice_synthesis_thread = threading.Thread(target=vibevoice_synthesis_loop, daemon=True)

@app.on_event("startup")
async def startup_event():
    print("FastAPI startup event: Starting audio stream and threads...")
    global audio_stream
    try:
        audio_stream = sd.InputStream(
            samplerate=SAMPLE_RATE,
            channels=CHANNELS,
            dtype=DTYPE,
            blocksize=BLOCK_SIZE,
            callback=audio_callback
        )
        audio_stream.start()
        print("Audio input stream started.")
        transcription_thread.start()
        vibevoice_synthesis_thread.start()
    except Exception as e:
        print(f"Failed to start audio stream or threads on startup: {e}", flush=True)

@app.on_event("shutdown")
async def shutdown_event():
    print("FastAPI shutdown event: Stopping audio stream and threads...")
    stop_event.set()
    if 'audio_stream' in globals() and audio_stream.is_active:
        audio_stream.stop()
        audio_stream.close()
    if transcription_thread.is_alive():
        transcription_thread.join(timeout=5)
    if vibevoice_synthesis_thread.is_alive():
        vibevoice_synthesis_thread.join(timeout=5)
    print("Cleanup complete.")

if __name__ == "__main__":
    # Uvicorn needs to be run programmatically to allow startup/shutdown events
    print(f"Starting Uvicorn server on port {args.port}...")
    uvicorn.run(app, host="0.0.0.0", port=args.port)


Writing /content/VibeVoice/demo/vibevoice_realtime_demo.py


**Reasoning**:
The modified `vibevoice_realtime_demo.py` script needs to be executed to launch the VibeVoice-Realtime Demo with Whisper integration.



In [None]:
import subprocess, re, time, threading

srv = subprocess.Popen(
    "python /content/VibeVoice/demo/vibevoice_realtime_demo.py --model_path /content/models/VibeVoice-Realtime-0.5B --port 8000",
    shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, universal_newlines=True,
)
cf = subprocess.Popen(
    "./cloudflared tunnel --url http://localhost:8000 --no-autoupdate",
    shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, universal_newlines=True,
)

public_url = None
server_ready = False
url_pattern  = re.compile(r"(https://[a-z0-9-]+\.trycloudflare\.com)")

def read_srv():
    global server_ready
    for ln in srv.stdout:
        print(ln.strip())
        if "Uvicorn running on" in ln:
            server_ready = True

def read_cf():
    global public_url
    for ln in cf.stdout:
        m = url_pattern.search(ln)
        if m:
            public_url = m.group(1)
            break

threading.Thread(target=read_srv, daemon=True).start()
thr = threading.Thread(target=read_cf,  daemon=True).start()


while True:
    if server_ready and public_url:
        print(f"✅ Public URL: {public_url}\n");
        public_url = None
    time.sleep(0.25)

Traceback (most recent call last):
File "/content/VibeVoice/demo/vibevoice_realtime_demo.py", line 1, in <module>
import sounddevice as sd
File "/usr/local/lib/python3.12/dist-packages/sounddevice.py", line 71, in <module>
raise OSError('PortAudio library not found')
OSError: PortAudio library not found


**Reasoning**:
The previous execution of `vibevoice_realtime_demo.py` failed because the `sounddevice` library could not find the PortAudio library. I need to install the PortAudio development files to resolve this dependency.



In [1]:
print("Installing PortAudio development files...")
!apt-get update -qq
!apt-get install -y -qq portaudio19-dev
print("✅ Installed PortAudio development files")

Installing PortAudio development files...
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
dpkg: libjack-jackd2-0:amd64: dependency problems, but removing anyway as you requested:
 libavdevice58:amd64 depends on libjack-jackd2-0 (>= 1.9.10+20150825) | libjack-0.125; however:
  Package libjack-jackd2-0:amd64 is to be removed.
  Package libjack-0.125 is not installed.
  Package libjack-jackd2-0:amd64 which provides libjack-0.125 is to be removed.
 libavdevice58:amd64 depends on libjack-jackd2-0 (>= 1.9.10+20150825) | libjack-0.125; however:
  Package libjack-jackd2-0:amd64 is to be removed.
  Package libjack-0.125 is not installed.
  Package libjack-jackd2-0:amd64 which provides libjack-0.125 is to be removed.

(Reading database ... 121689 files and directories currently installed.)
Removing libjack-jackd2-0:amd64 (1.9.20~dfsg-1) ...
Selecting previo

**Reasoning**:
The PortAudio development files have been successfully installed. Now, I need to re-run the `vibevoice_realtime_demo.py` script to launch the VibeVoice-Realtime Demo with Whisper integration, as the previous attempt failed due to the missing PortAudio library.



In [None]:
import subprocess, re, time, threading

srv = subprocess.Popen(
    "python /content/VibeVoice/demo/vibevoice_realtime_demo.py --model_path /content/models/VibeVoice-Realtime-0.5B --port 8000",
    shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, universal_newlines=True,
)
cf = subprocess.Popen(
    "./cloudflared tunnel --url http://localhost:8000 --no-autoupdate",
    shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1, universal_newlines=True,
)

public_url = None
server_ready = False
url_pattern  = re.compile(r"(https://[a-z0-9-]+\.trycloudflare\.com)")

def read_srv():
    global server_ready
    for ln in srv.stdout:
        print(ln.strip())
        if "Uvicorn running on" in ln:
            server_ready = True

def read_cf():
    global public_url
    for ln in cf.stdout:
        m = url_pattern.search(ln)
        if m:
            public_url = m.group(1)
            break

threading.Thread(target=read_srv, daemon=True).start()
thr = threading.Thread(target=read_cf,  daemon=True).start()


while True:
    if server_ready and public_url:
        print(f"✅ Public URL: {public_url}\n");
        public_url = None
    time.sleep(0.25)

python3: can't open file '/content/VibeVoice/demo/vibevoice_realtime_demo.py': [Errno 2] No such file or directory
