# 2D to 3D Side-by-Side Video Converter (GPU Optimized)

This notebook converts standard 2D videos into stereoscopic 3D Side-by-Side (SBS) format for VR viewing. It uses MiDaS for depth estimation and creates a stereoscopic effect by synthesizing left and right eye views.

## Features
- GPU-accelerated depth estimation when CUDA is available (falls back to CPU), with a batch size chosen from the video resolution
- Video segment selection for processing specific portions of longer videos
- Upload videos (up to 500MB) or provide video URLs
- Adjustable depth parameters (intensity, convergence, eye separation)
- High-quality H.264 encoded MP4 output in SBS format with 16:9 overall aspect ratio
- **Preserves original audio track** in the output video
- Real-time preview and parameter adjustment
- Progress tracking and error handling

## Setup
Run the installation cell below to set up the required libraries.

In [8]:
# Install required packages
!pip install opencv-python-headless
!pip install numpy
!pip install gradio
!pip install torch torchvision
!pip install timm
!pip install yt-dlp
!pip install pytube
!pip install gdown
!apt-get update && apt-get install -y ffmpeg

# Clone MiDaS repository and install its dependencies
!git clone https://github.com/isl-org/MiDaS.git
!pip install -q -r MiDaS/requirements.txt

Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:7 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Reading package lists... Done
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
Reading package lists... Done
Building dependency tree... Done
Reading

## Import Libraries

Import all necessary libraries for video processing, depth estimation, and the user interface.
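
The helper functions in later cells shell out to `ffmpeg`, `ffprobe`, and `yt-dlp`, so besides the Python imports it may be worth confirming that those binaries are on `PATH`. The sketch below is one way to do that with the standard library; `missing_tools` is a hypothetical helper name and is not part of the notebook.

```python
import shutil

def missing_tools(tools=("ffmpeg", "ffprobe", "yt-dlp")):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print(f"Missing command-line tools: {', '.join(missing)}")
else:
    print("All required command-line tools were found.")
```

If anything is reported missing, re-running the installation cell is the likely fix.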

In [9]:
import os
import cv2
import numpy as np
import torch
import urllib.request
import gradio as gr
import tempfile
import time
import re
import shutil
import threading
import gc
from pathlib import Path
from tqdm.notebook import tqdm
import subprocess
import gdown
from google.colab import files

# Check if CUDA is available
if torch.cuda.is_available():
    print(f"CUDA is available. Using GPU: {torch.cuda.get_device_name(0)}")
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU Memory: {gpu_mem:.2f} GB")
    # Set CUDA device to GPU 0
    torch.cuda.set_device(0)
else:
    print("CUDA is not available. Using CPU.")

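# Make newly created tensors default to the GPU when CUDA is available.
# Note: torch.set_default_tensor_type is deprecated in recent PyTorch releases
# (torch.set_default_device / torch.set_default_dtype are the suggested replacements).
if torch.cuda.is_available():
    torch.set_default_tensor_type(torch.cuda.FloatTensor)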

# Optional: Set environment variable for PyTorch memory management
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'

CUDA is available. Using GPU: Tesla T4
GPU Memory: 15.83 GB


## MiDaS Setup

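Define the audio helper functions (stream detection, extraction, and remuxing with ffmpeg) and initialize the MiDaS depth estimation model, placing it on the GPU when one is available.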

In [10]:
# Audio processing functions
def check_audio_stream(file_path):
    """Check if the video file has an audio stream"""
    try:
        result = subprocess.run(
            ['ffprobe', '-v', 'error', '-select_streams', 'a:0',
             '-show_entries', 'stream=codec_name', '-of', 'default=noprint_wrappers=1:nokey=1',
             file_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True
        )

        # If there's output, an audio stream was found
        return bool(result.stdout.strip())
    except Exception as e:
        print(f"Error checking audio stream: {str(e)}")
        return False

def extract_audio(input_path, output_path):
    """Extract audio from a video file using ffmpeg"""
    try:
        os.makedirs(os.path.dirname(output_path), exist_ok=True)

        # Use ffmpeg to extract audio
        cmd = [
            'ffmpeg',
            '-i', input_path,        # Input file
            '-vn',                   # Disable video
            '-acodec', 'copy',       # Copy audio codec without re-encoding
            '-y',                    # Overwrite output file if it exists
            output_path
        ]

        print(f"Extracting audio from {input_path} to {output_path}...")
        result = subprocess.run(cmd, capture_output=True, text=True)

        if result.returncode != 0:
            print(f"Error extracting audio: {result.stderr}")
            return None

        if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
            print("Audio extraction failed - output file is empty or missing")
            return None

        print("Audio extracted successfully")
        return output_path
    except Exception as e:
        print(f"Error during audio extraction: {str(e)}")
        return None

def combine_video_audio(video_path, audio_path, output_path):
    """Combine video and audio files using ffmpeg"""
    try:
        # Use ffmpeg to merge video and audio
        cmd = [
            'ffmpeg',
            '-i', video_path,        # Video file
            '-i', audio_path,        # Audio file
            '-c:v', 'copy',          # Copy video without re-encoding
            '-c:a', 'aac',           # Use AAC for audio (better compatibility)
            '-b:a', '192k',          # Audio bitrate
            '-shortest',             # Match the duration of the shorter file
            '-y',                    # Overwrite output file if it exists
            output_path
        ]

        print(f"Combining video and audio into {output_path}...")
        result = subprocess.run(cmd, capture_output=True, text=True)

        if result.returncode != 0:
            print(f"Error combining video and audio: {result.stderr}")
            return False

        if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
            print("Combination failed - output file is empty or missing")
            return False

        print("Video and audio combined successfully")
        return True
    except Exception as e:
        print(f"Error during combination: {str(e)}")
        return False

def setup_midas():
    """Initialize and return the MiDaS model for depth estimation using torch.hub
    with optimizations for GPU usage"""
    print("Loading MiDaS depth estimation model...")

    # Clean up any existing GPU memory
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        gc.collect()

    # Select device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    if torch.cuda.is_available():
        # Print GPU info
        print(f"GPU: {torch.cuda.get_device_name(0)}")
        print(f"Memory Allocated: {torch.cuda.memory_allocated(0)/1e9:.2f} GB")
        print(f"Memory Reserved: {torch.cuda.memory_reserved(0)/1e9:.2f} GB")

    # Load model - using DPT Large for best quality
    try:
        # Point torch.hub at a fresh temporary directory so nothing is reused from
        # an earlier cache; note that this re-downloads the repo and weights on every call
        torch.hub.set_dir(tempfile.mkdtemp())
        midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
    except Exception as e:
        print(f"Error loading model: {e}")
        # Fallback method
        print("Trying alternate loading method...")
        midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large", trust_repo=True)

    midas.to(device)
    midas.eval()  # Set to evaluation mode

    # If using CUDA, optimize model for inference
    if device.type == 'cuda':
        # Enable cuDNN benchmark mode for best performance with fixed input sizes
        torch.backends.cudnn.benchmark = True

        # We'll skip TorchScript optimization as it's causing issues
        print("Skipping TorchScript optimization due to compatibility issues")

    # Load transforms
    try:
        midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    except Exception as e:
        print(f"Error loading transforms: {e}")
        # Fallback
        midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms", trust_repo=True)

    transform = midas_transforms.dpt_transform

    # Report GPU memory usage after model loading
    if torch.cuda.is_available():
        print(f"GPU Memory After Model Load: {torch.cuda.memory_allocated(0)/1e9:.2f} GB")

    print("MiDaS model loaded successfully!")
    return midas, transform, device
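
`extract_audio` stream-copies the audio into whatever output path it is given, and the pipeline later passes it a path ending in `.aac`. A raw `.aac` file can only hold AAC, so the copy would likely fail for sources with other codecs (for example Opus in a WebM file). One possible workaround, sketched below, is to choose the extension from the codec name that `ffprobe` reports. The helper name `audio_extension_for` and the mapping are assumptions made for illustration, and the mapping is not exhaustive.

```python
# Hypothetical mapping from ffprobe codec names to a file extension whose
# container can hold that codec when stream-copied. Not exhaustive.
_AUDIO_EXTENSIONS = {
    "aac": ".aac",
    "mp3": ".mp3",
    "opus": ".opus",
    "vorbis": ".ogg",
    "flac": ".flac",
    "ac3": ".ac3",
}

def audio_extension_for(codec_name, default=".mka"):
    """Pick an output extension suitable for `-acodec copy`.

    Unknown codecs fall back to Matroska audio (.mka), which accepts
    most audio codecs.
    """
    return _AUDIO_EXTENSIONS.get(codec_name.strip().lower(), default)

print(audio_extension_for("aac"))        # .aac
print(audio_extension_for("pcm_s16le"))  # .mka
```

The codec name could come from the same `ffprobe` call that `check_audio_stream` already makes, since that call prints the codec name of the first audio stream.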

## Video Processing Functions

Functions for video input validation, frame extraction, audio processing, and depth map generation.
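
Before a segment is handed to `extract_video_segment`, it can help to clamp the requested times to the actual duration (for instance the value returned by `get_video_duration`). The sketch below shows one way to do this; `clamp_segment` is a hypothetical helper and is not defined anywhere in the notebook.

```python
def clamp_segment(start, end, duration):
    """Clamp a requested (start, end) range in seconds to [0, duration].

    `end` may be None, meaning "until the end of the video".
    Returns (start, end), or None when no valid range remains.
    """
    start = max(0.0, float(start))
    end = duration if end is None else min(float(end), duration)
    if end <= start:
        return None
    return start, end

print(clamp_segment(-5, 20, 12.5))  # (0.0, 12.5)
print(clamp_segment(10, 5, 60))     # None
```

Returning `None` for an empty range lets the caller fall back to processing the whole video, which matches how the main pipeline treats an invalid segment.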

In [11]:
def validate_video(file_path):
    """Validate if the input video file is supported"""
    # Check if file exists
    if not os.path.exists(file_path):
        return False, "File does not exist"

    # Check file extension
    valid_extensions = [".mp4", ".avi", ".mov", ".webm", ".mkv"]
    file_ext = os.path.splitext(file_path)[1].lower()
    if file_ext not in valid_extensions:
        return False, f"Unsupported file format: {file_ext}. Supported formats: {', '.join(valid_extensions)}"

    # Check if OpenCV can open the file
    cap = cv2.VideoCapture(file_path)
    if not cap.isOpened():
        return False, "Cannot open video file with OpenCV"

    # Check resolution
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    if width > 3840 or height > 2160:
        cap.release()
        return False, f"Video resolution ({width}x{height}) exceeds maximum supported resolution (3840x2160)"

    # No file-size limit is enforced here; the size is only reported in the result.

    # Get video info
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    duration_sec = frame_count / fps if fps > 0 else 0
    file_size_mb = os.path.getsize(file_path) / (1024 * 1024)  # Informational only
    cap.release()

    return True, {"width": width, "height": height, "fps": fps, "frame_count": frame_count,
                  "size_mb": file_size_mb, "duration_sec": duration_sec}

def extract_video_segment(input_path, output_path, start_time, end_time):
    """Extract a segment from a video file using ffmpeg"""
    try:
        # Ensure the output directory exists
        os.makedirs(os.path.dirname(output_path), exist_ok=True)

        print(f"Extracting segment from {start_time:.2f}s to {end_time:.2f}s...")

        # Use ffmpeg to extract the segment with stream copy
        cmd = [
            'ffmpeg',
            '-i', input_path,        # Input file
            '-ss', str(start_time),  # Start time in seconds
            '-to', str(end_time),    # End time in seconds
            '-c:v', 'copy',          # Copy video stream without re-encoding
            '-c:a', 'copy',          # Copy audio stream without re-encoding
            '-avoid_negative_ts', '1',  # Avoid negative timestamps
            '-y',                    # Overwrite output file if it exists
            output_path
        ]

        # Run the command and capture output
        result = subprocess.run(cmd, capture_output=True, text=True)

        # Check if the command was successful
        if result.returncode != 0:
            print(f"Error extracting segment: {result.stderr}")
            raise Exception(f"ffmpeg error: {result.stderr}")

        # Verify the output file exists and has content
        if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
            raise ValueError("Segment extraction failed - output file is empty or missing")

        print(f"Segment extracted successfully: {output_path}")
        return output_path

    except Exception as e:
        print(f"Error extracting video segment: {str(e)}")
        raise

def ensure_h264_mp4(input_path, temp_dir="temp_videos"):
    """Convert video to H.264 MP4 format if needed - optimized for speed"""
    # Generate a new filename for the converted video
    output_path = os.path.join(temp_dir, f"h264_{int(time.time())}.mp4")

    # Use ffprobe to check if the video is already H.264 encoded
    try:
        print(f"Checking encoding of {input_path}...")
        # Get video codec information with a short timeout
        result = subprocess.run(
            ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
             '-show_entries', 'stream=codec_name', '-of', 'default=noprint_wrappers=1:nokey=1',
             input_path],
            capture_output=True, text=True, check=True, timeout=10
        )
        codec = result.stdout.strip()

        if codec.lower() in ['h264', 'avc1']:
            print(f"Video is already H.264 encoded (codec: {codec})")
            return input_path
        else:
            print(f"Video is not H.264 encoded (detected codec: {codec}). Converting with fast settings...")
    except subprocess.TimeoutExpired:
        print("Codec detection timed out. Proceeding with conversion...")
    except Exception as e:
        print(f"Error checking video codec: {str(e)}. Converting with fast settings...")

    # Check if input file exists and has content
    if not os.path.exists(input_path) or os.path.getsize(input_path) == 0:
        raise ValueError(f"Input file {input_path} does not exist or is empty")

    # Convert to H.264 MP4 with hardware acceleration if available
    try:
        print("Starting fast H.264 conversion...")

        # Try using hardware acceleration if available
        # NVIDIA GPU acceleration
        hw_accel_commands = [
            # NVIDIA NVENC (if available)
            [
                'ffmpeg',
                '-i', input_path,
                '-c:v', 'h264_nvenc',  # NVIDIA GPU acceleration
                '-preset', 'p1',  # Fast encoding preset
                '-tune', 'hq',  # High quality tuning
                '-rc:v', 'vbr',  # Variable bitrate
                '-cq:v', '23',  # Quality level
                '-b:v', '5M',  # Target bitrate
                '-maxrate:v', '10M',  # Maximum bitrate
                '-bufsize:v', '10M',  # Buffer size
                '-c:a', 'aac',  # Audio codec
                '-b:a', '128k',  # Audio bitrate
                '-y',  # Overwrite output if exists
                output_path
            ],
            # Fallback to CPU with ultrafast preset
            [
                'ffmpeg',
                '-i', input_path,
                '-c:v', 'libx264',  # CPU encoding
                '-preset', 'ultrafast',  # Fastest encoding
                '-tune', 'fastdecode',  # Fast decoding optimization
                '-crf', '28',  # Lower quality for speed
                '-g', '30',  # Keyframe every 30 frames
                '-bf', '0',  # No B-frames (faster)
                '-c:a', 'aac',  # Audio codec
                '-b:a', '128k',  # Low audio bitrate
                '-ac', '2',  # Stereo audio
                '-ar', '44100',  # Standard audio sample rate
                '-strict', 'experimental',
                '-y',  # Overwrite output
                output_path
            ]
        ]

        # Try each acceleration method in order
        success = False
        for i, command in enumerate(hw_accel_commands):
            try:
                print(f"Trying encoding method {i+1}...")

                # Run the command
                print(f"Running conversion command: {' '.join(command)}")
                process = subprocess.Popen(
                    command,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.PIPE,
                    universal_newlines=True
                )

                # Set timeout for conversion (5 minutes)
                timeout = 300  # seconds
                start_time = time.time()

                # Monitor progress
                while process.poll() is None:
                    # Check if timeout has been reached
                    if time.time() - start_time > timeout:
                        process.terminate()
                        raise TimeoutError(f"Conversion timed out after {timeout} seconds")

                    # Print progress indicator
                    print(".", end="", flush=True)
                    time.sleep(1)

                # Check if successful
                if process.returncode == 0 and os.path.exists(output_path) and os.path.getsize(output_path) > 0:
                    print(f"\nSuccessfully converted to H.264 MP4 using method {i+1}: {output_path}")
                    success = True
                    break
                else:
                    print(f"\nMethod {i+1} failed with error code {process.returncode}")
            except Exception as e:
                print(f"Error with method {i+1}: {str(e)}")

        if success:
            return output_path
        else:
            # Fallback to simple copy method (no re-encoding)
            try:
                print("Attempting direct copy method as fallback...")
                subprocess.run([
                    'ffmpeg',
                    '-i', input_path,
                    '-c', 'copy',  # Just copy streams without re-encoding
                    '-y',
                    output_path
                ], check=True, timeout=300)

                if os.path.exists(output_path) and os.path.getsize(output_path) > 0:
                    print(f"Successfully copied video to MP4 container: {output_path}")
                    return output_path
                else:
                    print("Copy method failed to produce a valid output file")
                    # If all conversion methods fail, return the original file path
                    return input_path
            except Exception as e2:
                print(f"All conversion methods failed: {str(e2)}")
                # If all conversion methods fail, return the original file path
                return input_path

    except Exception as e:
        print(f"Error during conversion: {str(e)}")
        # Return the original file path if all else fails
        return input_path

def get_video_duration(file_path):
    """Get the duration of a video file in seconds using ffprobe"""
    try:
        # Use ffprobe to get the duration
        result = subprocess.run(
            ['ffprobe', '-v', 'error', '-show_entries', 'format=duration',
             '-of', 'default=noprint_wrappers=1:nokey=1', file_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True
        )
        duration = float(result.stdout.strip())
        return duration
    except Exception as e:
        print(f"Error getting video duration: {str(e)}")
        # Fall back to OpenCV
        try:
            cap = cv2.VideoCapture(file_path)
            if not cap.isOpened():
                return 0
            fps = cap.get(cv2.CAP_PROP_FPS)
            frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            duration = frame_count / fps if fps > 0 else 0
            cap.release()
            return duration
        except Exception as e2:
            print(f"Error getting duration with OpenCV: {str(e2)}")
            return 0

def download_from_url(url):
    """Download video from URL and return local file path"""
    # Create temp directory if it doesn't exist
    temp_dir = "temp_videos"
    os.makedirs(temp_dir, exist_ok=True)

    # Generate a temporary filename
    timestamp = int(time.time())
    temp_file = os.path.join(temp_dir, f"downloaded_video_{timestamp}.mp4")

    try:
        # Check if it's a Google Drive URL
        if "drive.google.com" in url:
            print(f"Detected Google Drive URL. Downloading with gdown...")
            gdown.download(url, temp_file, quiet=False, fuzzy=True)

            if not os.path.exists(temp_file) or os.path.getsize(temp_file) == 0:
                raise Exception("gdown failed to download the file.")

            print(f"Successfully downloaded from Google Drive to {temp_file}")
            # Ensure the downloaded file is in a compatible format
            return ensure_h264_mp4(temp_file, temp_dir)

        # Check if it's a YouTube URL
        elif "youtube.com/" in url or "youtu.be/" in url:
            print(f"Attempting to download YouTube video from: {url}")
            # Use yt-dlp for YouTube downloads; passing an argument list (rather than
            # a shell string) keeps special characters in the URL from being interpreted
            subprocess.run(
                ['yt-dlp', '-f', 'best[ext=mp4]/best', '-o', temp_file, url],
                check=True
            )

            if not os.path.exists(temp_file) or os.path.getsize(temp_file) == 0:
                raise Exception("yt-dlp failed to download the video.")

            print(f"Successfully downloaded video with 'best' format to {temp_file}")
            return ensure_h264_mp4(temp_file, temp_dir)

        # Fallback for other direct URLs
        else:
            print(f"Downloading from direct URL: {url}")
            urllib.request.urlretrieve(url, temp_file)
            print(f"Successfully downloaded to {temp_file}")
            return ensure_h264_mp4(temp_file, temp_dir)

    except Exception as e:
        print(f"An error occurred during download: {str(e)}")
        # Check if a partial file was created and clean it up
        if os.path.exists(temp_file):
            os.remove(temp_file)
        return None

# Function to estimate depth for a single frame
def estimate_depth(frame, model, transform, device):
    """Estimate depth for a single frame using MiDaS"""
    # Preprocess image for MiDaS (using torch.hub transforms)
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_batch = transform(img).to(device)

    # Compute depth prediction
    with torch.no_grad():
        prediction = model(input_batch)
        # Resize prediction to original frame size
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=frame.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()

    depth = prediction.cpu().numpy()

    # Normalize depth map to 0-1 range
    depth_min = depth.min()
    depth_max = depth.max()
    if depth_max - depth_min > 0:
        depth = (depth - depth_min) / (depth_max - depth_min)
    else:
        depth = np.zeros(depth.shape, dtype=depth.dtype)

    return depth

# Process a batch of frames (each frame still gets its own forward pass)
def process_batch(frames, model, transform, device):
    """Process a batch of frames to get depth maps"""
    depth_maps = []

    # Process each frame in the batch separately
    # This is more compatible than trying to batch process
    for frame in frames:
        depth_map = estimate_depth(frame, model, transform, device)
        depth_maps.append(depth_map)

    return depth_maps
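
The min-max normalization at the end of `estimate_depth` can be tried in isolation on a small array. The sketch below mirrors that step, including the guard for a constant depth map; `normalize_depth` is a name chosen for this illustration only.

```python
import numpy as np

def normalize_depth(depth):
    """Min-max normalize a depth array to [0, 1]; constant input maps to zeros."""
    depth = np.asarray(depth, dtype=np.float32)
    depth_min, depth_max = depth.min(), depth.max()
    if depth_max - depth_min > 0:
        return (depth - depth_min) / (depth_max - depth_min)
    return np.zeros_like(depth)

raw = np.array([[2.0, 4.0], [6.0, 10.0]])
print(normalize_depth(raw))              # values 0.0, 0.25, 0.5, 1.0
print(normalize_depth(np.ones((2, 2))))  # all zeros (no depth variation)
```

Because the minimum and maximum are taken per frame, the same raw value can map to different normalized values in neighbouring frames, which is presumably one reason the main loop blends each depth map with the previous one.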

## Stereoscopic Conversion

Functions to create stereoscopic side-by-side views from original frames and depth maps.
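
To get a feel for how the three controls interact, the arithmetic used by `create_depth_based_disparity` and `generate_stereo_views` can be written out for a single pixel. This is a sketch of that calculation only; the default parameter values below are assumptions, not the notebook's UI defaults.

```python
def pixel_shift(depth, width, depth_intensity=1.0, convergence=1.0,
                eye_separation=1.0, max_shift_fraction=0.05):
    """Total horizontal shift in pixels for one normalized depth value.

    Mirrors the notebook's formula: the depth is inverted, scaled by
    intensity * eye_separation / convergence, then by 5% of the frame width.
    """
    disparity = (1.0 - depth) * depth_intensity * eye_separation / convergence
    return disparity * int(width * max_shift_fraction)

# For a 1920-pixel-wide frame the shift budget is int(1920 * 0.05) = 96 px.
print(pixel_shift(0.0, 1920))   # 96.0
print(pixel_shift(0.75, 1920))  # 24.0
print(pixel_shift(0.0, 1920, depth_intensity=0.5, convergence=2.0))  # 24.0
```

Each eye view moves a pixel by half of this value in opposite directions, so a total shift of 96 px means 48 px per view.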

In [12]:
def create_depth_based_disparity(depth_map, depth_intensity, convergence, eye_separation):
    """Create disparity map from depth map using the control parameters"""
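    # MiDaS predicts relative inverse depth, so after normalization larger values
    # are closer to the camera. Inverting makes the value grow with distance: the
    # nearest pixels get zero shift and farther pixels get a larger shift.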
    inverted_depth = 1.0 - depth_map

    # Apply intensity control
    disparity = inverted_depth * depth_intensity

    # Apply eye separation and convergence adjustment
    disparity = disparity * eye_separation / convergence

    return disparity

def generate_stereo_views(frame, depth_map, depth_intensity, convergence, eye_separation):
    """Generate left and right eye views for stereoscopic 3D"""
    h, w = frame.shape[:2]

    # Create disparity map from depth map
    disparity = create_depth_based_disparity(depth_map, depth_intensity, convergence, eye_separation)

    # Scale disparity to pixel displacement (max 5% of image width)
    max_shift = int(w * 0.05)
    disparity_scaled = disparity * max_shift

    # Create empty images for left and right views
    left_view = np.zeros_like(frame)
    right_view = np.zeros_like(frame)

    # For each row in the image
    for y in range(h):
        for x in range(w):
            # Calculate shift for this pixel
            shift = disparity_scaled[y, x]

            # Calculate left and right positions
            left_x = max(0, min(w-1, int(x - shift/2)))
            right_x = max(0, min(w-1, int(x + shift/2)))

            # Copy pixel values
            left_view[y, left_x] = frame[y, x]
            right_view[y, right_x] = frame[y, x]

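    # Fill holes using inpainting
    # Create masks for unfilled areas (note: pixels that are genuinely pure black
    # in the source are also flagged, since holes are detected by value)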
    left_mask = np.all(left_view == 0, axis=2).astype(np.uint8) * 255
    right_mask = np.all(right_view == 0, axis=2).astype(np.uint8) * 255

    # Inpainting
    if np.any(left_mask):
        left_view = cv2.inpaint(left_view, left_mask, 3, cv2.INPAINT_TELEA)
    if np.any(right_mask):
        right_view = cv2.inpaint(right_view, right_mask, 3, cv2.INPAINT_TELEA)

    return left_view, right_view

def create_side_by_side(left_view, right_view):
    """Combine left and right views into a side-by-side 3D format with 16:9 overall aspect ratio
    and 4:3 aspect ratio for each eye. Each eye view is embedded in a 16:9 frame with black bars."""

    # Fixed dimensions for the final 16:9 output
    total_width = 1920   # Total width for 16:9 aspect ratio
    total_height = 1080  # Total height for 16:9 aspect ratio

    # Each eye gets half the width
    eye_width = total_width // 2  # 960px per eye

    # Determine content height for 4:3 aspect ratio within each eye view
    content_height = int(eye_width * 3/4)  # 720px for 4:3 ratio at 960px width

    # Resize views to exact 4:3 dimensions for each eye
    left_resized = cv2.resize(left_view, (eye_width, content_height))
    right_resized = cv2.resize(right_view, (eye_width, content_height))

    # Create a black canvas with 16:9 aspect ratio
    sbs_frame = np.zeros((total_height, total_width, 3), dtype=np.uint8)

    # Calculate vertical offset to center content (black bars at top and bottom)
    vertical_offset = (total_height - content_height) // 2

    # Place the views side by side in the center of the frame with black bars
    sbs_frame[vertical_offset:vertical_offset+content_height, 0:eye_width] = left_resized
    sbs_frame[vertical_offset:vertical_offset+content_height, eye_width:total_width] = right_resized

    return sbs_frame
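
The nested Python loop in `generate_stereo_views` visits every pixel individually, which tends to dominate the run time at video resolutions. The sketch below shows how the same forward warp could be expressed with NumPy indexing. It is an illustration under stated assumptions, not a drop-in replacement: `warp_views` is a hypothetical name, it takes the disparity already scaled to pixels, and where several source pixels land on the same target, NumPy does not guarantee which one wins (the loop version keeps the last one written). It also returns explicit hole masks instead of detecting holes by pixel value.

```python
import numpy as np

def warp_views(frame, disparity_px):
    """Forward-warp `frame` into left/right views using per-pixel shifts.

    frame:        (H, W, 3) uint8 image
    disparity_px: (H, W) float array, total shift in pixels
    Returns (left, right, left_holes, right_holes); the hole masks are
    True where no source pixel landed.
    """
    h, w = frame.shape[:2]
    rows = np.arange(h)[:, None]   # (H, 1), broadcasts across columns
    cols = np.arange(w)[None, :]   # (1, W)
    half = disparity_px / 2.0
    # Truncate toward zero like int() in the loop version, then clamp to the frame
    left_x = np.clip((cols - half).astype(np.int64), 0, w - 1)
    right_x = np.clip((cols + half).astype(np.int64), 0, w - 1)

    left = np.zeros_like(frame)
    right = np.zeros_like(frame)
    left_filled = np.zeros((h, w), dtype=bool)
    right_filled = np.zeros((h, w), dtype=bool)

    # Scatter every source pixel to its shifted column in the same row
    left[rows, left_x] = frame
    right[rows, right_x] = frame
    left_filled[rows, left_x] = True
    right_filled[rows, right_x] = True
    return left, right, ~left_filled, ~right_filled
```

The hole masks could then be handed to `cv2.inpaint` as `mask.astype(np.uint8) * 255`, in the same way the notebook passes its value-based masks.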

## Main Video Processing Function

Implements the core processing pipeline that converts the 2D video to 3D SBS format with GPU acceleration.
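
One step of the pipeline that is easy to overlook is the temporal smoothing of depth maps: each new map is blended with the previous smoothed map using weights 0.8 and 0.2, which is an exponential moving average. The sketch below isolates that step in a small class. `DepthSmoother` and its `reset` method are hypothetical names introduced here; the notebook keeps the state in a local variable and never resets it.

```python
import numpy as np

class DepthSmoother:
    """Exponential moving average over successive depth maps."""

    def __init__(self, weight_current=0.8):
        self.weight_current = weight_current
        self.previous = None

    def update(self, depth_map):
        """Blend `depth_map` with the previous smoothed map and return the result."""
        if self.previous is not None:
            depth_map = (self.weight_current * depth_map
                         + (1.0 - self.weight_current) * self.previous)
        self.previous = depth_map.copy()
        return depth_map

    def reset(self):
        """Forget the history (an assumption: useful at a scene change)."""
        self.previous = None

smoother = DepthSmoother()
smoother.update(np.zeros((2, 2)))
print(smoother.update(np.ones((2, 2))))  # every value is 0.8
print(smoother.update(np.ones((2, 2))))  # every value is 0.96
```

A higher `weight_current` follows the new frame more closely, while a lower value reduces flicker at the cost of depth lagging behind motion.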

In [13]:
def process_video_to_3d_sbs(input_path, output_path, depth_intensity, convergence, eye_separation,
                           progress=None, use_segment=False, segment_start=0, segment_end=None):
    """Convert a 2D video to 3D SBS using MiDaS depth estimation with GPU optimization"""
    try:
        # Validate input video
        valid, result = validate_video(input_path)
        if not valid:
            raise ValueError(result)

        video_info = result

        # Create temporary directory for intermediate files
        temp_dir = "temp_videos"
        os.makedirs(temp_dir, exist_ok=True)

        # Base names for temp files
        timestamp = int(time.time())
        temp_base = os.path.join(temp_dir, f"temp_{timestamp}")
        temp_video_path = f"{temp_base}_video.mp4"  # For video without audio
        temp_audio_path = f"{temp_base}_audio.aac"  # For extracted audio

        # Track whether we're processing a segment, and keep the original path
        # so we can fall back to it if segment extraction fails
        is_segment = False
        original_input = input_path

        # If using a segment, extract it first so that the audio extracted below
        # covers the same time range as the processed frames
        segment_path = None
        if use_segment and segment_start is not None and segment_end is not None and segment_start < segment_end:
            try:
                # Create temporary segment file (temp_dir was created above)
                segment_path = os.path.join(temp_dir, f"segment_{int(time.time())}.mp4")

                # Extract the segment
                segment_path = extract_video_segment(input_path, segment_path, segment_start, segment_end)

                # Re-validate the segment before switching over to it
                valid, result = validate_video(segment_path)
                if not valid:
                    raise ValueError(f"Segment validation failed: {result}")

                # Use the segment file for processing
                input_path = segment_path
                video_info = result
                is_segment = True
                print(f"Using video segment from {segment_start}s to {segment_end}s")

            except Exception as e:
                print(f"Error extracting segment: {str(e)}. Processing entire video instead.")
                # Continue with the original file if segment extraction fails
                input_path = original_input

        # Extract audio from the file that will actually be processed (original or segment)
        has_audio = check_audio_stream(input_path)
        if has_audio:
            print("Detected audio stream in the video")
            if extract_audio(input_path, temp_audio_path):
                print(f"Audio extracted to {temp_audio_path}")
            else:
                print("Could not extract audio. Output will have no sound.")
                has_audio = False
        else:
            print("No audio stream detected in the video")

        width, height = video_info["width"], video_info["height"]
        fps = video_info["fps"]
        frame_count = int(video_info["frame_count"])

        # Clear GPU memory before starting
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            gc.collect()

        # Set up the MiDaS model (returns model, transform, device)
        model, transform, device = setup_midas()

        # Determine optimal batch size based on available GPU memory and resolution
        batch_size = 1  # Default
        if torch.cuda.is_available():
            # Calculate available memory
            available_mem = torch.cuda.get_device_properties(0).total_memory
            current_mem = torch.cuda.memory_allocated()
            free_mem = available_mem - current_mem

            # Heuristic for batch size based on resolution
            pixel_count = width * height
            if pixel_count <= 640 * 480:  # SD video
                batch_size = 8
            elif pixel_count <= 1280 * 720:  # HD video
                batch_size = 4
            elif pixel_count <= 1920 * 1080:  # Full HD
                batch_size = 2
            else:  # 4K
                batch_size = 1

            print(f"Using batch size: {batch_size} for {width}x{height} video")
            print(f"Available GPU memory: {free_mem/1e9:.2f}GB")

        # Open input video
        cap = cv2.VideoCapture(input_path)

        # Create output video writer
        target_width, target_height = 1920, 1080  # 16:9 overall aspect ratio
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # MPEG-4 Part 2 ('mp4v'), not H.264
        out = cv2.VideoWriter(temp_video_path, fourcc, fps, (target_width, target_height))

        # Process frames
        frame_index = 0
        prev_depth_map = None

        # Report initial memory usage
        if torch.cuda.is_available():
            print(f"Initial GPU Memory: {torch.cuda.memory_allocated(0)/1e9:.2f}GB / {torch.cuda.get_device_properties(0).total_memory/1e9:.2f}GB")

        # Process video in batches
        while True:
            # Read batch of frames
            frames = []
            for _ in range(batch_size):
                ret, frame = cap.read()
                if not ret:
                    break
                frames.append(frame)

            if not frames:
                break  # End of video

            # Process batch of frames to get depth maps
            depth_maps = process_batch(frames, model, transform, device)

            # Process each frame with its depth map
            for i in range(len(frames)):
                frame = frames[i]
                depth_map = depth_maps[i]

                # Apply temporal smoothing
                if prev_depth_map is not None:
                    depth_map = 0.8 * depth_map + 0.2 * prev_depth_map
                prev_depth_map = depth_map.copy()

                # Generate stereo views
                left_view, right_view = generate_stereo_views(frame, depth_map, depth_intensity, convergence, eye_separation)

                # Create side-by-side frame
                sbs_frame = create_side_by_side(left_view, right_view)

                # Write frame to output
                out.write(sbs_frame)

                # Update progress
                frame_index += 1
                if progress is not None:
                    progress(min(1.0, frame_index / frame_count))

                # Report GPU memory periodically
                if frame_index % 10 == 0 and torch.cuda.is_available():
                    memory_used_gb = torch.cuda.memory_allocated(0)/1e9
                    total_mem_gb = torch.cuda.get_device_properties(0).total_memory/1e9
                    usage_percent = (memory_used_gb / total_mem_gb) * 100
                    print(f"Frame {frame_index}/{frame_count} - GPU Memory: {memory_used_gb:.2f}GB / {total_mem_gb:.2f}GB ({usage_percent:.1f}%)")

            # Release cached GPU memory. This check runs once per batch, so it
            # fires only when the running frame count is a multiple of 50.
            if frame_index % 50 == 0 and torch.cuda.is_available():
                torch.cuda.empty_cache()
                gc.collect()

        # Ensure 100% progress at the end
        if progress is not None:
            progress(1.0)

        # Release resources
        cap.release()
        out.release()

        print("Processing complete!")

        # Now combine the processed video with the original audio
        if has_audio:
            print("Combining video with original audio...")
            if combine_video_audio(temp_video_path, temp_audio_path, output_path):
                print("Successfully combined video with audio")
            else:
                print("Audio combination failed. Using high quality encoding for video-only output...")
                # Fall back to just processing the video without audio
                if torch.cuda.is_available():
                    subprocess.run([
                        'ffmpeg',
                        '-i', temp_video_path,
                        '-c:v', 'h264_nvenc',  # NVIDIA hardware encoding
                        '-preset', 'p2',       # Fast NVENC preset (p1 = fastest, p7 = best quality)
                        '-b:v', '8M',          # Bitrate
                        '-y',                  # Overwrite output if exists
                        output_path
                    ], check=True, timeout=600)
                else:
                    subprocess.run([
                        'ffmpeg',
                        '-i', temp_video_path,
                        '-c:v', 'libx264',     # CPU encoding
                        '-preset', 'medium',   # Medium quality/speed
                        '-crf', '23',          # Quality level
                        '-y',                  # Overwrite output if exists
                        output_path
                    ], check=True, timeout=600)
        else:
            # No audio to add, just convert the video
            print("No audio to add. Finalizing video with high quality encoding...")
            if torch.cuda.is_available():
                subprocess.run([
                    'ffmpeg',
                    '-i', temp_video_path,
                    '-c:v', 'h264_nvenc',  # NVIDIA hardware encoding
                    '-preset', 'p2',       # Fast NVENC preset (p1 = fastest, p7 = best quality)
                    '-b:v', '8M',          # Bitrate
                    '-y',                  # Overwrite output if exists
                    output_path
                ], check=True, timeout=600)
            else:
                subprocess.run([
                    'ffmpeg',
                    '-i', temp_video_path,
                    '-c:v', 'libx264',     # CPU encoding
                    '-preset', 'medium',   # Medium quality/speed
                    '-crf', '23',          # Quality level
                    '-y',                  # Overwrite output if exists
                    output_path
                ], check=True, timeout=600)

        # Clean up GPU memory
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            gc.collect()

        # Clean up segment file if we created one
        if segment_path and os.path.exists(segment_path) and segment_path != input_path:
            try:
                os.remove(segment_path)
                print(f"Cleaned up temporary segment file: {segment_path}")
            except Exception as e:
                print(f"Warning: Could not remove temporary segment file: {str(e)}")

        return output_path

    except Exception as e:
        print(f"Error in process_video_to_3d_sbs: {str(e)}")
        raise

def generate_preview_frame(input_path, depth_intensity, convergence, eye_separation, frame_position=0.5):
    """Generate a preview frame for the given parameters"""
    try:
        # Validate input video
        valid, result = validate_video(input_path)
        if not valid:
            raise ValueError(result)

        # Setup MiDaS model
        model, transform, device = setup_midas()

        # Open input video
        cap = cv2.VideoCapture(input_path)

        # Get frame count and set position
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
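        # Clamp to the last valid frame index so frame_position=1.0 does not seek past the end
        target_frame = min(int(frame_count * frame_position), max(frame_count - 1, 0))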
        cap.set(cv2.CAP_PROP_POS_FRAMES, target_frame)

        # Read frame
        ret, frame = cap.read()
        if not ret:
            cap.release()
            raise ValueError("Could not read frame from video")

        # Estimate depth
        depth_map = estimate_depth(frame, model, transform, device)

        # Generate stereo views
        left_view, right_view = generate_stereo_views(frame, depth_map, depth_intensity, convergence, eye_separation)

        # Create side-by-side frame
        sbs_frame_full = create_side_by_side(left_view, right_view)

        # Create preview (smaller version)
        preview_height = 360
        preview_width = int(1920 * (preview_height / 1080))
        sbs_frame = cv2.resize(sbs_frame_full, (preview_width, preview_height))

        # Create comparison view with original frame
        h, w = frame.shape[:2]
        original_resized = cv2.resize(frame, (int(w * preview_height / h), preview_height))

        # Create final preview
        preview_width_total = original_resized.shape[1] + sbs_frame.shape[1] + 10
        preview = np.zeros((preview_height, preview_width_total, 3), dtype=np.uint8)

        # Add original frame
        preview[:, :original_resized.shape[1]] = original_resized
        # Add SBS frame
        preview[:, original_resized.shape[1]+10:] = sbs_frame

        # Add labels
        cv2.putText(preview, "Original", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
        cv2.putText(preview, "3D SBS (16:9)", (original_resized.shape[1]+20, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

        # Release resources
        cap.release()

        # Convert to RGB for display
        preview_rgb = cv2.cvtColor(preview, cv2.COLOR_BGR2RGB)

        # Clean up GPU memory
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            gc.collect()

        return preview_rgb

    except Exception as e:
        print(f"Error in generate_preview_frame: {str(e)}")
        raise
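
The resolution-based batch-size heuristic used in `process_video_to_3d_sbs` can also be written as a small pure function, which makes the tiers easy to test and gives the CPU path a defined value. The following is a minimal sketch rather than the notebook's actual code: `choose_batch_size` is a hypothetical helper, the thresholds are copied from the heuristic above, and the CPU fallback of 1 is an assumption.

```python
def choose_batch_size(width: int, height: int, has_gpu: bool = True) -> int:
    """Pick a depth-estimation batch size from the frame resolution.

    Hypothetical helper: the tiers mirror the heuristic in
    process_video_to_3d_sbs; the CPU fallback of 1 is an assumption.
    """
    if not has_gpu:
        return 1  # assumed: keep CPU batches minimal
    pixel_count = width * height
    # (maximum pixel count, batch size) pairs, smallest resolution first
    tiers = [
        (640 * 480, 8),     # SD
        (1280 * 720, 4),    # HD
        (1920 * 1080, 2),   # Full HD
    ]
    for max_pixels, batch_size in tiers:
        if pixel_count <= max_pixels:
            return batch_size
    return 1  # anything larger, e.g. 4K


print(choose_batch_size(3840, 2160))  # prints 1
```

Keeping the tiers in a list means a new resolution class can be added without touching the control flow.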

## Gradio Interface

Create an intuitive user interface for the 2D to 3D conversion with parameter controls and real-time preview.
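
One rule the interface has to enforce is that a segment's end time stays after its start time and within the video. As a sketch of that rule in isolation (the helper name `clamp_segment` and the 0.1-second minimum gap are assumptions chosen to match the slider step, not part of the notebook's code):

```python
def clamp_segment(start: float, end: float, duration: float, min_gap: float = 0.1):
    """Return a (start, end) pair with 0 <= start < end <= duration.

    Hypothetical helper; min_gap is an assumed minimum segment length.
    """
    if duration <= min_gap:
        # Video too short to segment: fall back to the whole clip
        return 0.0, max(duration, 0.0)
    # Keep the start inside the video, leaving room for at least min_gap
    start = min(max(start, 0.0), duration - min_gap)
    # Push the end past the start, but never beyond the video's duration
    end = min(max(end, start + min_gap), duration)
    return start, end


print(clamp_segment(50.0, 20.0, 120.0))
```

A function like this could run on the slider values before they are passed to the conversion step, so that an inverted or out-of-range selection never reaches ffmpeg.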

In [None]:
def create_gradio_interface():
    """Create and launch the Gradio interface for 3D SBS conversion with audio support"""
    # State shared by the nested handlers below (closure variables accessed via nonlocal)
    input_video_path = None
    output_video_path = None
    video_duration = 0  # Store video duration for segment selection

    def save_uploaded_file(file_obj):
        """Helper function to save an uploaded file to disk"""
        temp_dir = "temp_videos"
        os.makedirs(temp_dir, exist_ok=True)

        # Generate a filename with timestamp to avoid conflicts
        timestamp = int(time.time())
        file_name = f"uploaded_video_{timestamp}.mp4"
        file_path = os.path.join(temp_dir, file_name)

        print(f"Saving uploaded file to {file_path}")

        try:
            # Handle different file object types based on Gradio version
            if isinstance(file_obj, str):
                # It's a file path string, just copy the file
                shutil.copy(file_obj, file_path)
            elif hasattr(file_obj, 'name') and os.path.exists(file_obj.name):
                # It's an object with a name attribute that points to a real file
                shutil.copy(file_obj.name, file_path)
            else:
                # Try multiple approaches based on different versions of Gradio
                if hasattr(file_obj, 'read') and callable(file_obj.read):
                    # It's a file-like object, read and write it
                    with open(file_path, 'wb') as f:
                        f.write(file_obj.read())
                elif hasattr(file_obj, '_path') and os.path.exists(file_obj._path):
                    # Some versions of Gradio use a _path attribute
                    shutil.copy(file_obj._path, file_path)
                else:
                    # Last resort: accept raw bytes; anything else cannot be a video payload
                    if not isinstance(file_obj, bytes):
                        raise TypeError(f"Unsupported upload object type: {type(file_obj).__name__}")
                    with open(file_path, 'wb') as f:
                        f.write(file_obj)
        except Exception as e:
            print(f"Error saving file: {str(e)}")
            raise  # re-raise with the original traceback

        # Ensure the video is in H.264 format
        return ensure_h264_mp4(file_path, temp_dir)

    def upload_video(video_file):
        """Handle video upload"""
        nonlocal input_video_path, video_duration

        if video_file is None:
            return None, "Please upload a video file", gr.Slider(minimum=0, maximum=1), gr.Slider(minimum=0, maximum=1), False

        try:
            # Save the uploaded file to disk and ensure H.264 encoding
            input_video_path = save_uploaded_file(video_file)

            # Validate video
            valid, result = validate_video(input_video_path)
            if not valid:
                return None, result, gr.Slider(minimum=0, maximum=1), gr.Slider(minimum=0, maximum=1), False

            # Get video duration for segment selection
            video_duration = result.get("duration_sec", 0)
            if video_duration <= 0:
                video_duration = get_video_duration(input_video_path)

            # Update segment sliders
            start_slider = gr.Slider(minimum=0, maximum=video_duration, value=0, step=0.1, label="Start Time (seconds)")
            end_slider = gr.Slider(minimum=0, maximum=video_duration, value=video_duration, step=0.1, label="End Time (seconds)")

            # Generate a preview frame
            preview = generate_preview_frame(input_video_path, 0.5, 5.0, 2.5)

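            # Make the segment checkbox clickable only if the video is longer than 30 seconds
            # (returning a bare bool would set its checked state instead of enabling it)
            enable_segment = gr.Checkbox(value=False, interactive=video_duration > 30)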

            return preview, f"Video loaded successfully: {result['width']}x{result['height']}, {result['fps']:.2f} FPS, {result['frame_count']} frames, {result['size_mb']:.2f}MB, Duration: {video_duration:.2f}s", start_slider, end_slider, enable_segment
        except Exception as e:
            return None, f"Error processing video upload: {str(e)}", gr.Slider(minimum=0, maximum=1), gr.Slider(minimum=0, maximum=1), False

    def download_from_url_handler(url):
        """Handle video URL input"""
        nonlocal input_video_path, video_duration

        if not url or url.strip() == "":
            return None, "Please enter a valid URL", gr.Slider(minimum=0, maximum=1), gr.Slider(minimum=0, maximum=1), False

        try:
            # Download video and convert to H.264 if needed
            input_video_path = download_from_url(url)

            # Validate video
            valid, result = validate_video(input_video_path)
            if not valid:
                return None, result, gr.Slider(minimum=0, maximum=1), gr.Slider(minimum=0, maximum=1), False

            # Get video duration for segment selection
            video_duration = result.get("duration_sec", 0)
            if video_duration <= 0:
                video_duration = get_video_duration(input_video_path)

            # Update segment sliders
            start_slider = gr.Slider(minimum=0, maximum=video_duration, value=0, step=0.1, label="Start Time (seconds)")
            end_slider = gr.Slider(minimum=0, maximum=video_duration, value=video_duration, step=0.1, label="End Time (seconds)")

            # Generate a preview frame
            preview = generate_preview_frame(input_video_path, 0.5, 5.0, 2.5)

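            # Make the segment checkbox clickable only if the video is longer than 30 seconds
            # (returning a bare bool would set its checked state instead of enabling it)
            enable_segment = gr.Checkbox(value=False, interactive=video_duration > 30)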

            return preview, f"Video downloaded and converted successfully: {result['width']}x{result['height']}, {result['fps']:.2f} FPS, {result['frame_count']} frames, {result['size_mb']:.2f}MB, Duration: {video_duration:.2f}s", start_slider, end_slider, enable_segment

        except Exception as e:
            return None, f"Error downloading video: {str(e)}", gr.Slider(minimum=0, maximum=1), gr.Slider(minimum=0, maximum=1), False

    def update_preview(depth_intensity, convergence, eye_separation):
        """Update preview based on parameter changes"""
        nonlocal input_video_path

        if input_video_path is None or not os.path.exists(input_video_path):
            return None, "No video loaded"

        try:
            # Generate new preview with current parameters
            preview = generate_preview_frame(input_video_path, depth_intensity, convergence, eye_separation)
            return preview, "Preview updated with new parameters"
        except Exception as e:
            return None, f"Error updating preview: {str(e)}"

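    def update_end_time(start_time):
        """Keep the end-time slider's minimum above the start time (this also resets the end time to the end of the video)"""
        # Never let the minimum exceed the maximum when the start is at the very end
        new_minimum = min(start_time + 0.1, video_duration)
        return gr.Slider(minimum=new_minimum, maximum=video_duration, value=video_duration)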

    def sync_segment_values(use_segment, segment_start, segment_end):
        """Synchronize segment values between tabs"""
        # For Gradio compatibility, return a tuple of values instead of a dictionary
        return use_segment, segment_start, segment_end

    def process_video(depth_intensity, convergence, eye_separation, use_segment, segment_start, segment_end, progress=gr.Progress()):
        """Process the video with the given parameters"""
        nonlocal input_video_path, output_video_path, video_duration

        if input_video_path is None or not os.path.exists(input_video_path):
            return None, "No video loaded"

        try:
            # Create output directory
            output_dir = "output_videos"
            os.makedirs(output_dir, exist_ok=True)

            # Generate output filename
            base_name = os.path.basename(input_video_path)
            name, ext = os.path.splitext(base_name)

            # Add segment info to output filename if using segment
            if use_segment and segment_start is not None and segment_end is not None and segment_start < segment_end:
                output_video_path = os.path.join(output_dir, f"{name}_3D_SBS_{int(segment_start)}-{int(segment_end)}s.mp4")
            else:
                output_video_path = os.path.join(output_dir, f"{name}_3D_SBS.mp4")

            # Process video
            process_video_to_3d_sbs(
                input_path=input_video_path,
                output_path=output_video_path,
                depth_intensity=depth_intensity,
                convergence=convergence,
                eye_separation=eye_separation,
                progress=progress,
                use_segment=use_segment,
                segment_start=segment_start if use_segment else None,
                segment_end=segment_end if use_segment else None
            )

            segment_info = f" (segment {segment_start:.1f}s-{segment_end:.1f}s)" if use_segment else ""
            return output_video_path, f"Video processed successfully{segment_info}. Saved to {output_video_path} with 16:9 aspect ratio (1920x1080) as requested."

        except Exception as e:
            return None, f"Error processing video: {str(e)}"

    # Create the Gradio interface
    with gr.Blocks(title="2D to 3D SBS Video Converter (GPU Optimized)") as app:
        gr.Markdown("# 2D to 3D Side-by-Side Video Converter (GPU Optimized)")
        gr.Markdown("Convert standard 2D videos to stereoscopic 3D SBS format for VR viewing. Output has a 16:9 aspect ratio (1920x1080) with both eye views side by side. **Preserves original audio track** in the output video.")

        if torch.cuda.is_available():
            gpu_info = f"Using GPU: {torch.cuda.get_device_name(0)} with {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f}GB memory"
            gr.Markdown(f"**{gpu_info}**")
        else:
            gr.Markdown("**Running in CPU mode. Processing will be slower without GPU acceleration.**")

        with gr.Tab("Upload Video"):
            with gr.Row():
                with gr.Column(scale=1):
                    # Video upload widget
                    upload_input = gr.File(
                        label="Upload Video File (max 500MB)",
                        file_types=["video"],
                        file_count="single"
                    )
                    upload_button = gr.Button("Upload and Preview")

                with gr.Column(scale=2):
                    # Preview and status
                    preview = gr.Image(label="Preview")
                    status = gr.Textbox(label="Status", interactive=False)

                    # Segment selection (initially hidden/disabled)
                    use_segment = gr.Checkbox(label="Process a specific segment of the video", value=False, interactive=False)
                    segment_start = gr.Slider(minimum=0, maximum=1, value=0, step=0.1, label="Start Time (seconds)")
                    segment_end = gr.Slider(minimum=0, maximum=1, value=1, step=0.1, label="End Time (seconds)")

            # Connect upload button
            upload_button.click(
                upload_video,
                inputs=[upload_input],
                outputs=[preview, status, segment_start, segment_end, use_segment]
            )

            # Update end time slider when start time changes to maintain valid range
            segment_start.change(update_end_time, inputs=[segment_start], outputs=[segment_end])

        with gr.Tab("Video URL"):
            with gr.Row():
                with gr.Column(scale=1):
                    # URL input widget
                    url_input = gr.Textbox(label="Video URL (YouTube or direct link)")
                    url_button = gr.Button("Download and Preview")

                with gr.Column(scale=2):
                    # Preview and status for this tab (separate components from the Upload tab)
                    url_preview = gr.Image(label="Preview")
                    url_status = gr.Textbox(label="Status", interactive=False)

                    # Segment selection (initially hidden/disabled)
                    url_use_segment = gr.Checkbox(label="Process a specific segment of the video", value=False, interactive=False)
                    url_segment_start = gr.Slider(minimum=0, maximum=1, value=0, step=0.1, label="Start Time (seconds)")
                    url_segment_end = gr.Slider(minimum=0, maximum=1, value=1, step=0.1, label="End Time (seconds)")

            # Connect URL button
            url_button.click(
                download_from_url_handler,
                inputs=[url_input],
                outputs=[url_preview, url_status, url_segment_start, url_segment_end, url_use_segment]
            )

            # Update end time slider when start time changes to maintain valid range
            url_segment_start.change(update_end_time, inputs=[url_segment_start], outputs=[url_segment_end])

        with gr.Tab("Convert to 3D"):
            with gr.Row():
                with gr.Column(scale=1):
                    # Depth control parameters
                    depth_intensity = gr.Slider(
                        minimum=0.0, maximum=1.0, value=0.5, step=0.01,
                        label="Depth Intensity",
                        info="Controls the strength of the 3D effect (0.0-1.0)"
                    )

                    convergence = gr.Slider(
                        minimum=1.0, maximum=10.0, value=5.0, step=0.1,
                        label="Convergence Distance",
                        info="Adjusts the perceived distance of objects (1.0-10.0)"
                    )

                    eye_separation = gr.Slider(
                        minimum=0.1, maximum=5.0, value=2.5, step=0.1,
                        label="Eye Separation",
                        info="Controls the distance between virtual cameras (0.1-5.0)"
                    )

                    # Segment selection (duplicated for this tab for better UX)
                    conv_use_segment = gr.Checkbox(label="Process a specific segment of the video", value=False)
                    # video_duration is still 0 when the UI is built, so use the same 0-1 placeholder range as the other tabs
                    conv_segment_start = gr.Slider(minimum=0, maximum=1, value=0, step=0.1, label="Start Time (seconds)")
                    conv_segment_end = gr.Slider(minimum=0, maximum=1, value=1, step=0.1, label="End Time (seconds)")

                    # Update end time slider when start time changes
                    conv_segment_start.change(update_end_time, inputs=[conv_segment_start], outputs=[conv_segment_end])

                    # Update preview button
                    update_button = gr.Button("Update Preview")

                    # Process button
                    process_button = gr.Button("Process Video", variant="primary")

                with gr.Column(scale=2):
                    # Preview and status for this tab
                    convert_preview = gr.Image(label="Preview")
                    convert_status = gr.Textbox(label="Status", interactive=False)

                    # Output video
                    output_video = gr.Video(label="Converted 3D SBS Video (16:9 aspect ratio with audio)")

            # Connect update preview button
            update_button.click(
                update_preview,
                inputs=[depth_intensity, convergence, eye_separation],
                outputs=[convert_preview, convert_status]
            )

            # Connect process button
            process_button.click(
                process_video,
                inputs=[depth_intensity, convergence, eye_separation, conv_use_segment, conv_segment_start, conv_segment_end],
                outputs=[output_video, convert_status]
            )

            # Synchronize segment values between tabs
            # Connect the segment controls to the sync function
            use_segment.change(
                sync_segment_values,
                inputs=[use_segment, segment_start, segment_end],
                outputs=[conv_use_segment, conv_segment_start, conv_segment_end]
            )

            url_use_segment.change(
                sync_segment_values,
                inputs=[url_use_segment, url_segment_start, url_segment_end],
                outputs=[conv_use_segment, conv_segment_start, conv_segment_end]
            )

            # Sync back from Convert tab to others
            conv_use_segment.change(
                sync_segment_values,
                inputs=[conv_use_segment, conv_segment_start, conv_segment_end],
                outputs=[use_segment, segment_start, segment_end]
            )

            conv_use_segment.change(
                sync_segment_values,
                inputs=[conv_use_segment, conv_segment_start, conv_segment_end],
                outputs=[url_use_segment, url_segment_start, url_segment_end]
            )

        # Help tab
        with gr.Tab("Help"):
            gr.Markdown("""
            ## How to Use This Tool

            1. Upload a video file or provide a URL to a video (supports YouTube).
            2. For longer videos, you can choose to process only a specific segment to save time and memory:
               - Check the "Process a specific segment" box
               - Set the start and end times in seconds
            3. Adjust the depth parameters to control the 3D effect:
               - **Depth Intensity**: Controls the strength of the 3D effect. Higher values create more pronounced depth.
               - **Convergence Distance**: Adjusts where objects appear to be in relation to the screen plane.
               - **Eye Separation**: Controls the virtual camera separation. Higher values create more extreme 3D effects.
            4. Click "Update Preview" to see how your settings affect the 3D output.
            5. Click "Process Video" to convert the entire video (or selected segment) to 3D SBS format.
            6. Download the converted video for viewing in a VR headset or 3D display.

            ## Video Segmentation

            The video segment feature allows you to process only a portion of a longer video. This is useful for:
            - Testing different 3D settings on a small clip before processing the entire video
            - Processing very long videos in manageable chunks to avoid memory issues or timeouts
            - Creating highlights in 3D from specific parts of a longer video

            ## Output Format

            - The final video will have a 16:9 aspect ratio (1920x1080)
            - Each eye view is positioned side by side with appropriate proportions
            - Black bars are added as needed to maintain the proper 16:9 aspect ratio
            - H.264 encoded MP4 format for maximum compatibility
            - Maintains the original video's frame rate

            ## Supported Formats

            - Input: MP4, AVI, MOV, WebM, MKV (up to 4K resolution, max 500MB)
            - Output: H.264 encoded MP4 in Side-by-Side format (1920x1080)

            ## Viewing the 3D Video

            The output video is in Side-by-Side (SBS) format, which can be viewed in:
            - VR headsets using video players that support SBS format
            - 3D TVs with SBS viewing mode
            - Special 3D viewers like Google Cardboard with SBS-compatible apps

            ## GPU Optimization

            This version of the converter is optimized to take advantage of NVIDIA GPUs for faster processing:

            - Batch processing of multiple frames at once to maximize GPU utilization
            - GPU-accelerated depth map generation
            - Optimized memory management to handle larger videos
            - Hardware-accelerated video encoding when available

            ## Troubleshooting

            - If processing fails, try using a smaller segment of the video.
            - For best results, use videos with good lighting and clear objects.
            - If the 3D effect is too strong or causes discomfort, lower the Depth Intensity and Eye Separation values.
            - If you experience issues with YouTube downloads, try using a direct video URL instead.
            - If you experience any issues, check the status messages for error details.
            """)

    # Launch the app
    app.launch(debug=True, share=True)

# Initialize and launch the Gradio application
create_gradio_interface()

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://49d51384e9fa8adbe0.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Detected Google Drive URL. Downloading with gdown...


Downloading...
From (original): https://drive.google.com/uc?id=1t0YHqmC4ZImt8rp5BvqDkJcxzqHHUeaP
From (redirected): https://drive.google.com/uc?id=1t0YHqmC4ZImt8rp5BvqDkJcxzqHHUeaP&confirm=t&uuid=a90d5c50-ed67-4e77-a304-c2e67644ca4a
To: /content/temp_videos/downloaded_video_1751922101.mp4
100%|██████████| 4.13G/4.13G [00:51<00:00, 80.3MB/s]


Successfully downloaded from Google Drive to temp_videos/downloaded_video_1751922101.mp4
Checking encoding of temp_videos/downloaded_video_1751922101.mp4...
Video is already H.264 encoded (codec: h264)
Loading MiDaS depth estimation model...
Using device: cuda
GPU: Tesla T4
Memory Allocated: 0.00 GB
Memory Reserved: 0.00 GB


Downloading: "https://github.com/intel-isl/MiDaS/zipball/master" to /tmp/tmp1vli0p1b/master.zip
Downloading: "https://github.com/isl-org/MiDaS/releases/download/v3/dpt_large_384.pt" to /tmp/tmp1vli0p1b/checkpoints/dpt_large_384.pt
100%|██████████| 1.28G/1.28G [00:13<00:00, 100MB/s] 


Skipping TorchScript optimization due to compatibility issues


Using cache found in /tmp/tmp1vli0p1b/intel-isl_MiDaS_master


GPU Memory After Model Load: 1.39 GB
MiDaS model loaded successfully!
No audio stream detected in the video
Extracting segment from 0.00s to 1233.52s...
Segment extracted successfully: temp_videos/segment_1751922285.mp4
Using video segment from 0s to 1233.5249999999999s
Loading MiDaS depth estimation model...
Using device: cuda
GPU: Tesla T4
Memory Allocated: 0.03 GB
Memory Reserved: 0.05 GB


Downloading: "https://github.com/intel-isl/MiDaS/zipball/master" to /tmp/tmprk0e9_nk/master.zip
Downloading: "https://github.com/isl-org/MiDaS/releases/download/v3/dpt_large_384.pt" to /tmp/tmprk0e9_nk/checkpoints/dpt_large_384.pt
100%|██████████| 1.28G/1.28G [00:15<00:00, 90.8MB/s]


Skipping TorchScript optimization due to compatibility issues


Using cache found in /tmp/tmprk0e9_nk/intel-isl_MiDaS_master


GPU Memory After Model Load: 1.40 GB
MiDaS model loaded successfully!
Using batch size: 1 for 3840x2160 video
Available GPU memory: 14.43GB
Initial GPU Memory: 1.40GB / 15.83GB
Frame 10/73713 - GPU Memory: 1.40GB / 15.83GB (8.9%)
Frame 20/73713 - GPU Memory: 1.40GB / 15.83GB (8.9%)
Frame 30/73713 - GPU Memory: 1.40GB / 15.83GB (8.9%)


## Troubleshooting and Tips

If you experience issues with the application, here are some tips and solutions:

1. **GPU Memory Errors**: If you encounter CUDA out of memory errors:
   - Use the segment feature to process smaller portions of the video
   - Restart the runtime to clear memory
   - Process a shorter or lower resolution video

2. **Loading Time**: The MiDaS model checkpoint (`dpt_large_384.pt`, about 1.28 GB) is downloaded and loaded before the first preview, so expect a delay on first use.

3. **Quality Issues**: The quality of the 3D effect depends on the input video quality and the accuracy of the depth map. Videos with clear objects and good lighting work best.

4. **Processing Speed**: Even with GPU acceleration, depth estimation is computationally intensive. Processing time depends on video length, resolution, and available GPU resources.

5. **View Distance**: If objects appear too close or too far in the 3D output, adjust the Convergence Distance parameter.

6. **Eye Strain**: If the 3D effect causes discomfort, reduce the Depth Intensity and Eye Separation values for a more comfortable viewing experience.

7. **YouTube Downloads**: If a YouTube download fails, provide a direct video URL instead.

8. **Runtime Disconnections**: For long videos, Colab might disconnect. Use the segment feature to process the video in chunks.

9. **Video Conversion**: If video conversion seems stuck, try restarting the notebook and using a smaller video segment.

10. **Maximizing GPU Usage**: This version batches frames to raise GPU utilization, choosing the batch size from the video resolution (8 frames for SD down to 1 for 4K). To check GPU usage in Colab, run `!nvidia-smi` in a new cell.

11. **Processing Segments**: For videos longer than a few minutes, it's recommended to process them in 1-3 minute segments to avoid Colab timeouts and memory issues.

12. **Best Results**: For optimal 3D conversion, use videos with:
   - Clear foreground and background separation
   - Good lighting conditions
   - Minimal fast camera movement
   - Higher resolution (1080p or above)
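
Tips 8 and 11 suggest processing long videos in chunks. As a rough sketch of how the chunk boundaries could be planned (the helper `plan_segments` and the 180-second default are illustrative assumptions, not part of the notebook), each resulting pair can be entered as the start and end time in the segment sliders:

```python
def plan_segments(duration_sec: float, chunk_sec: float = 180.0):
    """Split [0, duration_sec] into consecutive (start, end) chunks.

    Hypothetical helper; the last chunk is shorter when the duration
    is not a multiple of chunk_sec.
    """
    if duration_sec <= 0 or chunk_sec <= 0:
        return []
    segments = []
    start = 0.0
    while start < duration_sec:
        end = min(start + chunk_sec, duration_sec)
        segments.append((start, end))
        start = end  # the next chunk begins where this one ended
    return segments


for start, end in plan_segments(600.0, chunk_sec=180.0):
    print(f"{start:.1f}s -> {end:.1f}s")
```

For a 10-minute video this yields three 3-minute chunks followed by a final 1-minute chunk.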