# MuseTalk Lip-Sync Service on Kaggle

This notebook sets up MuseTalk for real-time lip-sync video generation.

**Requirements:**
- GPU enabled (Settings → Accelerator → GPU T4 x2)
- Internet enabled (Settings → Internet → On)

**What this does:**
1. Installs MuseTalk
2. Connects to LiveKit
3. Listens for agent audio
4. Generates lip-synced video
5. Streams back to your React app

## Step 1: Install Dependencies

In [None]:
# Check GPU is available
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ NO GPU! Go to Settings → Accelerator → GPU T4 x2")

In [None]:
# Install required packages
!pip install -q livekit livekit-api python-dotenv
!pip install -q diffusers transformers accelerate omegaconf einops
!pip install -q opencv-python librosa soundfile av pydub
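
A quick import check can catch a failed install before the later steps depend on it. This is a minimal sketch; the import names in `REQUIRED_MODULES` are assumed to match the packages installed above (for example, `opencv-python` is imported as `cv2`), so adjust the list if your environment differs.

```python
import importlib.util

# Import names assumed to correspond to the pip packages installed above.
REQUIRED_MODULES = ["livekit", "diffusers", "transformers", "omegaconf",
                    "einops", "cv2", "librosa", "soundfile", "av", "pydub"]

def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing

missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, re-run the install cell and check its output for errors.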

## Step 2: Clone MuseTalk Repository

In [None]:
# Clone MuseTalk
!git clone https://github.com/TMElyralab/MuseTalk.git
%cd MuseTalk

# Download pretrained models
!mkdir -p models/musetalk
!wget -O models/musetalk/pytorch_model.bin https://huggingface.co/TMElyralab/MuseTalk/resolve/main/musetalk/pytorch_model.bin

# Download VAE
!git clone https://huggingface.co/stabilityai/sd-vae-ft-mse models/sd-vae-ft-mse

print("✅ MuseTalk installed!")
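
The success message above prints even if a download failed, and `wget -O` creates the output file even when the request fails. The sketch below checks that the paths used in the download commands exist and are non-empty. The helper name `check_model_paths` is hypothetical, and the check says nothing about whether the file contents are valid.

```python
from pathlib import Path

# Paths taken from the download commands above, relative to the MuseTalk repo root.
EXPECTED_PATHS = ["models/musetalk/pytorch_model.bin", "models/sd-vae-ft-mse"]

def check_model_paths(paths, base="."):
    """Map each expected path to True if it exists and is non-empty."""
    status = {}
    for rel in paths:
        p = Path(base) / rel
        if p.is_dir():
            # A directory counts as present only if it contains something.
            status[rel] = any(p.iterdir())
        else:
            status[rel] = p.is_file() and p.stat().st_size > 0
    return status

for rel, ok in check_model_paths(EXPECTED_PATHS).items():
    print(f"{'OK     ' if ok else 'MISSING'} {rel}")
```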

## Step 3: Upload Your Idle Video

**Action needed:**
1. Click 'Add Data' button (top right)
2. Upload your `idle-avatar.mp4`
3. Run the cell below to verify

In [None]:
# List uploaded files
import os
print("Files in /kaggle/input:")
for root, dirs, files in os.walk('/kaggle/input'):
    for file in files:
        print(f"  {os.path.join(root, file)}")

# Set path to your idle video
IDLE_VIDEO_PATH = '/kaggle/input/idle-avatar/idle-avatar.mp4'  # Adjust this path
print(f"\nUsing video: {IDLE_VIDEO_PATH}")
print(f"Exists: {os.path.exists(IDLE_VIDEO_PATH)}")
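
If `Exists` prints `False`, the dataset folder name probably differs from the one assumed in `IDLE_VIDEO_PATH`. As a rough sketch, the hypothetical helper `find_videos` below searches `/kaggle/input` for video files so you can copy the right path; the extension list is an assumption.

```python
import os

def find_videos(root, extensions=(".mp4", ".mov", ".webm")):
    """Return sorted paths of video files found anywhere under root."""
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)

# On Kaggle this lists uploaded videos; elsewhere it simply prints an empty list.
candidates = find_videos("/kaggle/input")
print("Candidate videos:", candidates)
```

If exactly one candidate is listed, setting `IDLE_VIDEO_PATH = candidates[0]` is a reasonable choice.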

## Step 4: Test MuseTalk Standalone

In [None]:
# Create a test audio file (or upload your own)
!pip install gTTS
from gtts import gTTS

# Generate test audio
tts = gTTS("Hello, this is a test of the lip sync system.", lang='en')
tts.save('test_audio.mp3')

# Convert to WAV
!ffmpeg -i test_audio.mp3 -ar 16000 test_audio.wav -y
print("✅ Test audio created")
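
The `ffmpeg` command above asks for a 16 kHz WAV. To confirm the conversion worked, the file header can be read with Python's standard `wave` module. This is a small sketch, and the helper names `wav_info` and `is_16khz` are hypothetical.

```python
import os
import wave

def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getnchannels(), wf.getnframes() / rate

def is_16khz(path):
    """True if the WAV file's sample rate is exactly 16000 Hz."""
    return wav_info(path)[0] == 16000

# Only inspect the test file if the previous cell actually produced it.
if os.path.exists("test_audio.wav"):
    print("test_audio.wav:", wav_info("test_audio.wav"))
```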

In [None]:
# Run MuseTalk inference
!python inference.py \
    --video_path {IDLE_VIDEO_PATH} \
    --audio_path test_audio.wav \
    --result_dir ./output

print("\n✅ Video generated! Check ./output folder")
!ls -lh output/

## Step 5: Set Up LiveKit Connection

In [None]:
# LiveKit credentials (get these from your .env.local file)
LIVEKIT_URL = "wss://emotion-test-k1t69r4e.livekit.cloud"  # YOUR URL HERE
LIVEKIT_API_KEY = "YOUR_API_KEY_HERE"  # FROM .env.local
LIVEKIT_API_SECRET = "YOUR_API_SECRET_HERE"  # FROM .env.local

print("LiveKit Config:")
print(f"URL: {LIVEKIT_URL}")
print(f"API Key: {LIVEKIT_API_KEY[:10]}...")
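
Hard-coded credentials are easy to leak when a notebook is shared. One alternative, sketched below, is to read them from environment variables. The variable names `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` are an assumption that mirrors the Python names above, and both helpers are hypothetical.

```python
import os

def load_livekit_config(env=os.environ):
    """Read LiveKit settings from environment variables; fail if any is unset."""
    keys = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise KeyError(f"Missing LiveKit settings: {', '.join(missing)}")
    return {k: env[k] for k in keys}

def mask(value, visible=4):
    """Show only the first few characters of a secret."""
    return value[:visible] + "..." if len(value) > visible else "***"
```

Usage would look like `cfg = load_livekit_config()` followed by `print(mask(cfg["LIVEKIT_API_KEY"]))`, which avoids printing the full key in the cell output.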

## Step 6: Run MuseTalk LiveKit Service

In [None]:
import asyncio
import cv2
import numpy as np
from livekit import rtc, api
from livekit.rtc import VideoFrame, AudioFrame
import tempfile
import os

class MuseTalkService:
    """MuseTalk Lip-Sync Service for LiveKit"""
    
    def __init__(self, idle_video_path):
        self.idle_video_path = idle_video_path
        self.is_speaking = False
        
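    async def generate_lipsync(self, audio_path):
        """Generate lip-synced video from audio"""
        output_dir = tempfile.mkdtemp()
        
        # Run MuseTalk in a subprocess without blocking the event loop.
        # Passing arguments as a list avoids shell-quoting problems with paths.
        proc = await asyncio.create_subprocess_exec(
            "python", "inference.py",
            "--video_path", self.idle_video_path,
            "--audio_path", audio_path,
            "--result_dir", output_dir,
        )
        returncode = await proc.wait()
        if returncode != 0:
            print(f"MuseTalk inference failed (exit code {returncode})")
            return None
        
        # Find generated video
        videos = [f for f in os.listdir(output_dir) if f.endswith('.mp4')]
        if videos:
            return os.path.join(output_dir, videos[0])
        return None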
    
    async def stream_video(self, video_path, video_source):
        """Stream video frames to LiveKit"""
        cap = cv2.VideoCapture(video_path)
        # Some files report 0 FPS; fall back to 25 to avoid division by zero
        fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
        frame_duration = 1.0 / fps
        
        try:
            while cap.isOpened():
                ret, frame = cap.read()
                if not ret:
                    break
                
                # Convert BGR to RGBA so the 4-channel buffer matches
                # the VideoBufferType.RGBA declared below
                frame_rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
                
                # Create VideoFrame
                video_frame = VideoFrame(
                    width=frame_rgba.shape[1],
                    height=frame_rgba.shape[0],
                    type=rtc.VideoBufferType.RGBA,
                    data=frame_rgba.tobytes()
                )
                
                # Send to LiveKit
                video_source.capture_frame(video_frame)
                
                await asyncio.sleep(frame_duration)
        finally:
            # Release the capture even if the task is cancelled mid-stream
            cap.release()

print("✅ MuseTalk service ready")
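
`stream_video` sleeps for a full frame interval after every frame, so the time spent decoding and converting each frame is added on top and playback runs slightly slower than the source FPS. One way to avoid this drift, sketched below, is to sleep until each frame's deadline instead. The async generator `paced` is hypothetical and not part of MuseTalk or LiveKit; the `clock` and `sleep` parameters exist only so the timing can be tested.

```python
import asyncio
import time

async def paced(frames, fps, clock=time.monotonic, sleep=asyncio.sleep):
    """Yield frames at a steady rate by sleeping until each frame's deadline."""
    interval = 1.0 / fps
    start = clock()
    for index, frame in enumerate(frames):
        # Deadline for frame N is start + N * interval, regardless of how long
        # earlier frames took to process.
        delay = start + index * interval - clock()
        if delay > 0:
            await sleep(delay)
        yield frame
```

Inside `stream_video`, this could be used as `async for video_frame in paced(frame_iter, fps): video_source.capture_frame(video_frame)`, where `frame_iter` is any iterable of prepared frames.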

In [None]:
# Main LiveKit connection
async def run_musetalk_agent():
    """Connect to LiveKit and handle lip-sync"""
    
    # Generate access token
    token = api.AccessToken(LIVEKIT_API_KEY, LIVEKIT_API_SECRET)
    token.with_identity("musetalk-service")
    token.with_name("MuseTalk Video Generator")
    token.with_grants(api.VideoGrants(
        room_join=True,
        room="voice-chat-test",  # Adjust room name
        can_publish=True,
        can_subscribe=True,
    ))
    
    jwt = token.to_jwt()
    
    # Connect to room
    room = rtc.Room()
    
    @room.on("track_subscribed")
    def on_track_subscribed(track, publication, participant):
        print(f"Track subscribed: {track.kind} from {participant.identity}")
        
        # Listen for agent audio
        if track.kind == rtc.TrackKind.KIND_AUDIO and "agent" in participant.identity:
            print("📢 Detected agent audio! Generating lip-sync...")
            # TODO: Capture audio, generate video, stream back
    
    # Connect
    print(f"Connecting to {LIVEKIT_URL}...")
    await room.connect(LIVEKIT_URL, jwt)
    print(f"✅ Connected to room: {room.name}")
    
    # Publish video track
    video_source = rtc.VideoSource(1280, 720)
    video_track = rtc.LocalVideoTrack.create_video_track("musetalk-video", video_source)
    await room.local_participant.publish_track(video_track)
    print("✅ Video track published")
    
    # Stream idle video
    musetalk = MuseTalkService(IDLE_VIDEO_PATH)
    await musetalk.stream_video(IDLE_VIDEO_PATH, video_source)
    
# Run the agent
# await run_musetalk_agent()  # Uncomment when ready to connect
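
The TODO in `on_track_subscribed` needs the agent's audio saved to a file that `generate_lipsync` can read. The sketch below covers only that last step: it assumes the audio has already been collected as chunks of 16-bit little-endian PCM bytes with a known sample rate and channel count. How those chunks are obtained from the LiveKit track is not shown, no resampling is done, and `write_pcm_wav` is a hypothetical helper.

```python
import wave

def write_pcm_wav(chunks, path, sample_rate, channels=1):
    """Write an iterable of 16-bit PCM byte chunks to a WAV file.

    Returns the duration of the written audio in seconds.
    """
    total_bytes = 0
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        for chunk in chunks:
            wf.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes / (2 * channels * sample_rate)
```

The returned path can then be passed to `generate_lipsync`. Note that Step 4 fed MuseTalk a 16 kHz file, so audio captured at another rate may need converting first.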

## Step 7: Test Connection

Run this cell to start the MuseTalk service and connect to LiveKit:

In [None]:
# Start the service
await run_musetalk_agent()

# This streams the idle video once and returns when the video ends
# (or when you stop the cell).
# While it runs, you should see the video stream in your React app!
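
If you later change the service to loop or keep the room open, a time limit keeps a forgotten cell from running for the whole session. A minimal sketch using `asyncio.wait_for` follows; the wrapper name `run_with_timeout` is hypothetical.

```python
import asyncio

async def run_with_timeout(coro, seconds):
    """Await coro, cancelling it if it runs longer than the given time."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        print(f"Stopped after {seconds} s")
        return None
```

In a notebook cell this would be called as `await run_with_timeout(run_musetalk_agent(), 600)` to stop after ten minutes.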