# Notebook Description  
This notebook demonstrates video processing using the Florence2 Large Vision Model (LVM). It includes steps for setting up the environment, importing essential libraries, and configuring logging for streamlined operations.

### **Usage Notes**

- **Supported Tasks**:  
  A total of **10 tasks** are supported for processing videos. Each task performs a specific function, and the output videos are saved with numbered filenames for easier identification. Below is the list of tasks:

  1. **Object Detection (`<OD>`)**:  
     Detects objects and draws bounding boxes around them.  
     **Example**:  
     ```python
     task = "<OD>"
     text = ""
     ```

  2. **Caption to Phrase Grounding (`<CAPTION_TO_PHRASE_GROUNDING>`)**:  
     Highlights specific objects (e.g., "person") using bounding boxes.  
     **Example**:  
     ```python
     task = "<CAPTION_TO_PHRASE_GROUNDING>"
     text = "person"
     ```

  3. **Region Proposal (`<REGION_PROPOSAL>`)**:  
     Identifies and marks object regions with bounding boxes, also displaying object counts.  
     **Example**:  
     ```python
     task = "<REGION_PROPOSAL>"
     text = ""
     ```

  4. **Open Vocabulary Detection (`<OPEN_VOCABULARY_DETECTION>`)**:  
     Detects objects, draws bounding boxes, and overlays captions describing the objects.  
     **Example**:  
     ```python
     task = "<OPEN_VOCABULARY_DETECTION>"
     text = "person"
     ```

  5. **Detailed Caption (`<DETAILED_CAPTION>`)**:  
     Adds detailed captions describing the scene in each frame of the video.  
     **Example**:  
     ```python
     task = "<DETAILED_CAPTION>"
     text = ""
     ```

  6. **More Detailed Caption (`<MORE_DETAILED_CAPTION>`)**:  
     Adds a more detailed caption, offering finer scene descriptions in the video.  
     **Example**:  
     ```python
     task = "<MORE_DETAILED_CAPTION>"
     text = ""
     ```

  7. **OCR with Region (`<OCR_WITH_REGION>`)**:  
     Extracts text regions and overlays OCR results on the video with bounding boxes.  
     **Example**:  
     ```python
     task = "<OCR_WITH_REGION>"
     text = ""
     ```

  8. **OCR (`<OCR>`)**:  
     Recognizes text in the video frames and overlays the recognized text directly.  
     **Example**:  
     ```python
     task = "<OCR>"
     text = ""
     ```

  9. **Region to Segmentation (`<REGION_TO_SEGMENTATION>`)**:  
     Segments regions of interest in the video, highlighting them minimally. This task primarily outputs subtle visual cues and is mainly experimental.  
     Still not sure why it is not showing any changes on the video.
     **Example**:  
     ```python
     task = "<REGION_TO_SEGMENTATION>"
     text = "person"
     ```

  10. **Referring Expression Segmentation (`<REFERRING_EXPRESSION_SEGMENTATION>`)**:  
      Segments a specified object (e.g., "person") and highlights it with a mask. This task can take significantly longer processing times.  
      **Example**:  
      ```python
      task = "<REFERRING_EXPRESSION_SEGMENTATION>"
      text = "person"
      ```

- **Processing Time**:  
  The estimated processing time for each task is included in the task list. The script logs the actual time taken for processing each task after completion.

- **Customizing Frame Processing**:  
  Modify the `frame_step` parameter in the `process_video` function to process fewer frames for faster results. For example:  
  ```python
  frame_step = 5  # Processes every 5th frame.
  ```

- **Output Filenames**:  
  The output videos are saved with a descriptive filename pattern:  
  `"<task_number>_<task_description>_<input_video_name>.mp4"`  
  Example:  
  For task 1 (`<OD>`), processing an input video named `hot_air_balloons.mp4` generates:  
  `01_object_detection_hot_air_balloons.mp4`

- **Viewing Output**:  
  Use the `display_video` function to view processed videos directly in the Jupyter notebook. For example:  
  ```python
  display_video(output_video_path)
  ```

## Code START: 
Start Time

In [1]:
import time

# Record the start time
start_time = time.time()
print("Notebook execution started.")


Notebook execution started.


In [2]:
# Import necessary libraries
import os
import cv2
import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image, ImageDraw, ImageFont
import supervision as sv
import numpy as np
from IPython.display import HTML
import base64
import logging
import warnings
from tqdm import tqdm  # For the progress bar

# Set HOME directory
HOME = os.getcwd()

warnings.filterwarnings("ignore")

# Configure logging to write to a file and log important information
logging.basicConfig(filename='video_processing.log', level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)


2024-12-20 03:13:16.087473: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-12-20 03:13:16.095359: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1734660796.105493  980309 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1734660796.108468  980309 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-20 03:13:16.118515: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instr

### Function: `initialize_model`  
Initializes the Florence-2 model and processor, automatically selecting GPU or CPU for execution.

In [3]:
def initialize_model(checkpoint="microsoft/Florence-2-large-ft", device=None):
    """
    Initialize the Florence-2 model and processor.

    Parameters:
    - checkpoint: The model checkpoint to use.
    - device: The device to run the model on.

    Returns:
    - model: The initialized model.
    - processor: The initialized processor.
    - device: The device used.
    """
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    logger.info(f"Using device: {device}")

    # Load the model and processor
    model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
    processor = AutoProcessor.from_pretrained(checkpoint, trust_remote_code=True)

    return model, processor, device


### Function: `process_frame`  
Processes a single video frame using the Florence-2 model, applying tasks such as object detection, captioning, segmentation, or OCR, and annotates the frame accordingly.

In [4]:
def process_frame(frame, model, processor, device, task, text):
    """
    Process a single video frame with the specified task.

    Parameters:
    - frame: The video frame to process.
    - model: The initialized model.
    - processor: The initialized processor.
    - device: The device to run the model on.
    - task: The task to perform (e.g., "<OD>", "<DETAILED_CAPTION>", etc.).
    - text: The text input for the task.

    Returns:
    - frame_bgr: The processed frame in BGR format.
    """
    # Convert the frame to RGB and then to a PIL Image
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    image = Image.fromarray(frame_rgb)

    # Preprocess the input for the model
    prompt = task + text
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

    # Generate results
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3
    )

    # Post-process the generated text
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    response = processor.post_process_generation(generated_text, task=task, image_size=image.size)

    # Handle different tasks based on the response
    if task in ["<OD>", "<OPEN_VOCABULARY_DETECTION>", "<CAPTION_TO_PHRASE_GROUNDING>", "<REGION_PROPOSAL>"]:
        # Tasks that output bounding boxes
        detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, response, resolution_wh=image.size)
        bounding_box_annotator = sv.BoundingBoxAnnotator(color_lookup=sv.ColorLookup.INDEX)
        label_annotator = sv.LabelAnnotator(color_lookup=sv.ColorLookup.INDEX)
        image = bounding_box_annotator.annotate(image, detections)
        image = label_annotator.annotate(image, detections)

    elif task == "<REFERRING_EXPRESSION_SEGMENTATION>":
        # Task that outputs segmentation masks
        detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, response, resolution_wh=image.size)
        mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
        label_annotator = sv.LabelAnnotator(color_lookup=sv.ColorLookup.INDEX)
        image = mask_annotator.annotate(image, detections)
        image = label_annotator.annotate(image, detections)

    elif task in ["<OCR_WITH_REGION>"]:
        # Task that outputs OCR results with regions
        detections = sv.Detections.from_lmm(sv.LMM.FLORENCE_2, response, resolution_wh=image.size)
        bounding_box_annotator = sv.BoundingBoxAnnotator(color_lookup=sv.ColorLookup.INDEX)
        label_annotator = sv.LabelAnnotator(color_lookup=sv.ColorLookup.INDEX, text_scale=1.5, text_thickness=2)
        image = bounding_box_annotator.annotate(image, detections)
        image = label_annotator.annotate(image, detections)

    elif task in ["<OCR>", "<DETAILED_CAPTION>", "<MORE_DETAILED_CAPTION>"]:
        # Tasks that output text captions
        caption = response.get(task, "")
        # Split the caption into lines with a maximum of 10 words per line
        words = caption.split()
        caption_with_line_breaks = '\n'.join([' '.join(words[i:i+10]) for i in range(0, len(words), 10)])
        draw = ImageDraw.Draw(image)
        font = ImageFont.load_default()
        text_position = (10, 10)
        draw.text(text_position, caption_with_line_breaks, fill="red", font=font)

    else:
        # For any other tasks, we can print the response or handle accordingly
        logger.warning(f"Unhandled task or output format for task: {task}")

    # Convert back to OpenCV format (BGR) for saving the video
    frame_bgr = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)

    return frame_bgr


### Function: `process_video`  
Processes a video frame by frame using the Florence-2 model, applying a specified task (e.g., object detection, captioning). Outputs an annotated video, with options to skip frames for faster processing.

In [5]:
def process_video(input_video_path, output_video_path, model, processor, device, task, text, frame_step=1):
    """
    Process a video with the specified task.

    Parameters:
    - input_video_path: Path to the input video file.
    - output_video_path: Path to save the processed video.
    - model: The initialized model.
    - processor: The initialized processor.
    - device: Device to run the model on.
    - task: The task to perform (e.g., "<OD>", "<DETAILED_CAPTION>", etc.).
    - text: The text input for the task.
    - frame_step: Process every Nth frame (default is 1, i.e., process every frame).
    """
    cap = cv2.VideoCapture(input_video_path)
    if not cap.isOpened():
        logger.error(f"Error opening video file {input_video_path}")
        return

    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    logger.info(f"Total number of frames: {frame_count}")

    # Get video properties
    fps = cap.get(cv2.CAP_PROP_FPS)
    width  = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Prepare to save the processed video
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec for MP4
    out = cv2.VideoWriter(output_video_path, fourcc, fps / frame_step, (width, height))

    # Process frames
    frame_idx = 0
    processed_frames = 0

    # Use tqdm to visualize progress
    with tqdm(total=frame_count, desc=f"Processing {task}", leave=True) as pbar:
        while True:
            ret, frame = cap.read()
            if not ret:
                break

            if frame_idx % frame_step == 0:
                logger.info(f"Processing frame {frame_idx+1}/{frame_count}")
                processed_frame = process_frame(frame, model, processor, device, task, text)
                processed_frames += 1
            else:
                processed_frame = frame  # Use the original frame if not processing

            # Write the frame to the output video
            out.write(processed_frame)
            frame_idx += 1

            # Update the tqdm progress bar
            pbar.update(1)

    # Release video capture and writer objects
    cap.release()
    out.release()
    logger.info(f"Processed video saved to: {output_video_path}")
    logger.info(f"Total processed frames: {processed_frames}")


### Function: `display_video`  
Displays a video directly in the notebook with adjustable width for visualization.

In [6]:
def display_video(video_path, video_width=600):
    """
    Display a video inside the notebook.

    Parameters:
    - video_path: Path to the video file.
    - video_width: Width of the video in the display.

    Returns:
    - HTML object to display the video.
    """
    video_file = open(video_path, "rb").read()
    data_url = "data:video/mp4;base64," + base64.b64encode(video_file).decode()
    return HTML(f"""
    <video width={video_width} controls>
          <source src="{data_url}" type="video/mp4">
    </video>
    """)


### Code: Initialize Model and Processor  
Initializes the Florence-2 model and processor and sets the execution device (GPU or CPU).

In [7]:
# Initialize model and processor
model, processor, device = initialize_model()


Florence2LanguageForConditionalGeneration has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.


### Code: Check and Download Video (Optional)
Checks if the video file already exists in the `data` directory. If not, downloads the video from the specified URL and logs the progress. Skips downloading if the file is already present.

In [8]:
import os
from urllib.parse import urlparse

# Define generic variables for data directory, video URL, and desired video name
videos_to_process_dir = "data/videos_to_process"
processed_dir = "data/processed_videos"
video_url = "https://videos.pexels.com/video-files/3015510/3015510-hd_1920_1080_24fps.mp4"
desired_video_name = "hot_air_balloons.mp4"

# Create data directory if it doesn't exist
data_dir = os.path.join(HOME, videos_to_process_dir)
os.makedirs(data_dir, exist_ok=True)

# Define the final path for the video after renaming
input_video_path = os.path.join(data_dir, desired_video_name)

# Check if the video already exists
if not os.path.exists(input_video_path):
    logger.info(f"Downloading the video from {video_url}...")
    temp_video_path = os.path.join(data_dir, os.path.basename(urlparse(video_url).path))
    
    # Download the video to a temporary path
    os.system(f"wget -q {video_url} -O {temp_video_path}")
    logger.info("Download complete.")
    
    # Rename to the desired name
    os.rename(temp_video_path, input_video_path)
    logger.info(f"Video renamed to {desired_video_name}.")
else:
    logger.info("Video already exists. Skipping download.")

print("Video path:", input_video_path)

Video path: /home/hamzaz/hamza/apw/Large-Vision-Models/data/videos_to_process/hot_air_balloons.mp4


### Code: Process Video for Multiple Tasks  
Iterates through a predefined list of tasks, processes the video for each task using the Florence-2 model, and saves the output with task-specific filenames. Allows optional display of processed videos within the notebook.

In [9]:
# Define a list of tasks and corresponding texts
# Supported tasks list
tasks = [
    # Object Detection and Proposal Tasks:
    # 1. Object Detection - works: good object detection and bounding boxes.
    {"task": "<OD>", "text": "", "time_minutes": 5, "description": "object_detection"},

    # 2. Caption to Phrase Grounding - works: detects the object and highlights using bounding box.
    {"task": "<CAPTION_TO_PHRASE_GROUNDING>", "text": "person", "time_minutes": 6, "description": "caption_to_phrase"},

    # 3. Region Proposal - works: draws bounding boxes around objects and puts count of objects.
    {"task": "<REGION_PROPOSAL>", "text": "", "time_minutes": 7, "description": "region_proposal"},

    # 4. Open Vocabulary Detection - works: draws bounding boxes and puts captions on the object.
    {"task": "<OPEN_VOCABULARY_DETECTION>", "text": "person", "time_minutes": 8, "description": "open_vocab_detection"},

    # Captioning Tasks:
    # 5. Detailed Caption - works: puts caption on the video.
    {"task": "<DETAILED_CAPTION>", "text": "", "time_minutes": 10, "description": "detailed_caption"},

    # 6. More Detailed Caption - works: puts a more detailed caption on the video.
    {"task": "<MORE_DETAILED_CAPTION>", "text": "", "time_minutes": 10, "description": "more_detailed_caption"},

    # Text Recognition (OCR) Tasks:
    # 7. OCR with Region - works: puts OCR text on the video with bounding boxes.
    {"task": "<OCR_WITH_REGION>", "text": "", "time_minutes": 4, "description": "ocr_with_region"},

    # 8. OCR - works: performs OCR without specific region marking.
    {"task": "<OCR>", "text": "", "time_minutes": 10, "description": "ocr"},

    # Segmentation and Highlighting Tasks:
    # 9. Region to Segmentation - works: does minimal segmentation on the video.
    # Also takes a long time to process.
    {"task": "<REGION_TO_SEGMENTATION>", "text": "person", "time_minutes": 10, "description": "region_to_segmentation"},

    # 10. Referring Expression Segmentation - works: takes a long time. 
    # Does segmentation of the object, highlighting it with a purple mask. not very accurate (50min for 19sec video).
    {"task": "<REFERRING_EXPRESSION_SEGMENTATION>", "text": "person", "time_minutes": 50, "description": "referring_expression_segmentation"}
]

# videos_to_process_dir = "data/UCF-101"
# processed_dir = "data/processed_videos"

# print("Videos to process directory:", videos_to_process_dir)
# print("Processed videos directory:", processed_dir)


In [10]:
import os
import time
import cv2  # For frame count and resolution
import csv  # For writing CSV files
import random  # To select random files from subdirectories

# Directories for input videos and processed videos
videos_to_process_dir = "data/UCF-101"
processed_dir = "data/processed_videos_ucf101"

print("Videos to process directory:", videos_to_process_dir)
print("Processed videos directory:", processed_dir)

# Define the CSV file for storing analytics
csv_file = "video_processing_analytics_ucf.csv"

# Initialize the CSV file with headers if it doesn't exist
if not os.path.exists(csv_file):
    with open(csv_file, mode="w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["Video Name", "Frame Count", "Resolution", "Task Index", "Task Name", "Time Taken (minutes)", "Total Time (minutes)"])

# Function to recursively find video files in UCF101 directory
def get_videos_from_ucf101(root_dir, max_videos_per_class=3):
    video_files = []
    for root, _, files in sorted(os.walk(root_dir)):  # Ensure directories are sorted
        # Filter for video files with specific extensions
        video_files_in_dir = sorted([os.path.join(root, f) for f in files if f.lower().endswith(('.mp4', '.avi', '.mov', '.mkv'))])
        
        if video_files_in_dir:
            # Randomly select a subset of videos from this directory
            selected_videos = random.sample(video_files_in_dir, min(len(video_files_in_dir), max_videos_per_class))
            video_files.extend(selected_videos)

    return video_files

# Get the list of all video files from UCF-101
video_files = get_videos_from_ucf101(videos_to_process_dir, max_videos_per_class=3)
total_videos = len(video_files)
processed_videos = 0

print(f"Total videos found: {total_videos}")
logging.info(f"Total videos found: {total_videos}")

# Loop through each video in the UCF-101 dataset
for video_file in video_files:
    try:
        input_video_path = video_file
        relative_path = os.path.relpath(input_video_path, videos_to_process_dir)  # Get relative path from root directory
        print(f"Processing video: {relative_path} ({processed_videos + 1}/{total_videos})")
        logging.info(f"Processing video: {relative_path} ({processed_videos + 1}/{total_videos})")

        # Gather video details (frame count and resolution)
        cap = cv2.VideoCapture(input_video_path)
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        resolution = f"{width}x{height}"
        cap.release()

        logging.info(f"Video details - Name: {relative_path}, Total Frames: {total_frames}, Resolution: {resolution}")

        # Create a directory for the current video inside the processed directory, mirroring the UCF-101 folder structure
        base_video_name = os.path.splitext(os.path.basename(video_file))[0]  # Strip the file extension
        video_output_dir = os.path.join(processed_dir, os.path.dirname(relative_path))
        os.makedirs(video_output_dir, exist_ok=True)  # Create directory if not exists

        # Track time taken for each task
        video_task_times = []

        # Loop through each task and process the video
        for idx, item in enumerate(tasks, start=1):
            task = item["task"]
            text = item["text"]
            time_minutes = item["time_minutes"]
            description = item["description"]

            # Output file path inside the video-specific directory
            output_video_path = os.path.join(
                video_output_dir, 
                f"{base_video_name}_{idx:02d}_{description}.mp4"
            )

            # Check if the output video already exists
            if os.path.exists(output_video_path):
                logging.info(f"Skipping Task {idx} for video {relative_path}: {task} as output video already exists at {output_video_path}")
                continue

            logging.info(f"Processing Task {idx} for video {relative_path}: {task} with text: '{text}' (Estimated time: {time_minutes} minutes)")

            # Process the video (assuming process_video is a working function)
            start_time = time.time()

            process_video(
                input_video_path=input_video_path,
                output_video_path=output_video_path,
                model=model,
                processor=processor,
                device=device,
                task=task,
                text=text,
                frame_step=1  # Process every frame; increase for faster processing
            )
            end_time = time.time()

            # Calculate and log the actual time taken
            elapsed_minutes = (end_time - start_time) / 60
            video_task_times.append((idx, task, elapsed_minutes))
            logging.info(f"Task {idx} for video {relative_path} completed in {elapsed_minutes:.2f} minutes. Output video: {output_video_path}")

        # Calculate total processing time for the video
        total_video_time = sum(task_time for _, _, task_time in video_task_times)

        # Write analytics to the CSV file
        try:
            logger.info(f"Attempting to write analytics to: {os.path.abspath(csv_file)}")
            
            if not video_task_times:
                logger.warning(f"No video task times to write for {relative_path}. Skipping CSV write.")
            
            with open(csv_file, mode="a", newline="") as file:
                writer = csv.writer(file)
                for task_idx, task_name, task_time in video_task_times:
                    row = [relative_path, total_frames, resolution, task_idx, task_name, task_time, total_video_time]
                    writer.writerow(row)
                    logger.info(f"Successfully wrote row: {row}")  # Log the data written
                file.flush()  # Ensure data is written immediately to disk
            
            logger.info(f"CSV write completed successfully for {relative_path}")

        except Exception as e:
            logger.error(f"Failed to write analytics for {relative_path}. Error: {e}")

        processed_videos += 1
        remaining_videos = total_videos - processed_videos
        logging.info(f"Completed processing video: {relative_path}. Remaining videos: {remaining_videos}/{total_videos}")
    except Exception as e:
        logging.error(f"Error processing video: {video_file}. Error: {e}")
        continue  # Continue processing next video, even if an error occurs

# Final message
logging.info(f"\nAll video analytics have been saved to {csv_file}. You can process it further in Excel or any data analysis tool.")
print(f"\nAll video analytics have been saved to {csv_file}. You can process it further in Excel or any data analysis tool.")


Videos to process directory: data/UCF-101
Processed videos directory: data/processed_videos_ucf101
Total videos found: 303
Processing video: ApplyEyeMakeup/v_ApplyEyeMakeup_g23_c05.avi (1/303)


Processing <OD>: 100%|██████████| 204/204 [00:38<00:00,  5.34it/s]
Processing <CAPTION_TO_PHRASE_GROUNDING>: 100%|██████████| 204/204 [00:26<00:00,  7.76it/s]
Processing <REGION_PROPOSAL>: 100%|██████████| 204/204 [00:33<00:00,  6.06it/s]
Processing <OPEN_VOCABULARY_DETECTION>: 100%|██████████| 204/204 [00:28<00:00,  7.08it/s]
Processing <DETAILED_CAPTION>: 100%|██████████| 204/204 [00:57<00:00,  3.54it/s]
Processing <MORE_DETAILED_CAPTION>:  60%|█████▉    | 122/204 [00:43<00:29,  2.80it/s]


KeyboardInterrupt: 

### Code: Display Processed Video  
Displays the processed video directly inside the notebook, allowing for immediate visualization of the results.

In [12]:
output_video_path = "data/videos_to_process/hot_air_balloons.mp4"

# Display the processed video inside the notebook
display_video(output_video_path)


Captions

End Time and Total Time Logging

In [19]:
# Record the end time
end_time = time.time()

# output: Notebook execution completed in 99.19 minutes.

# Calculate and log the total execution time
total_time = end_time - start_time
print(f"Notebook execution completed in {total_time/60:.2f} minutes.")


Notebook execution completed in 46.77 minutes.
