
In [7]:
!pwd

/content


In [17]:
import cv2

def get_keyframes(video_path, output_dir="keyframes", frame_interval=10):
    """
    Extracts frames at a fixed interval from a video file.
    Note: This is not true keyframe extraction (I-frame detection)
    but a workaround to get a subset of frames.

    Args:
        video_path (str): Path to the video file.
        output_dir (str): Directory to save the extracted frames.
    """
    # Create output directory if it doesn't exist
    import os
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: Could not open video file {video_path}")
        return

    frame_count = 0
    frame_interval = 30 # Bug: this hard-coded value overrides the frame_interval argument (fixed in the redefined function later)

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        if frame_count % frame_interval == 0:
            output_path = os.path.join(output_dir, f"frame_{frame_count:06d}.png")
            cv2.imwrite(output_path, frame)
            print(f"Saved frame: {output_path}")

        frame_count += 1

    cap.release()
    print("Frame extraction finished.")

# Example usage:
# Replace 'your_video.mp4' with the path to your video file
get_keyframes('/content/video/3195394-uhd_3840_2160_25fps.mp4', 'extracted_frames')

Saved frame: extracted_frames/frame_000000.png
Saved frame: extracted_frames/frame_000030.png
Saved frame: extracted_frames/frame_000060.png
Saved frame: extracted_frames/frame_000090.png
Saved frame: extracted_frames/frame_000120.png
Saved frame: extracted_frames/frame_000150.png
Saved frame: extracted_frames/frame_000180.png
Saved frame: extracted_frames/frame_000210.png
Saved frame: extracted_frames/frame_000240.png
Saved frame: extracted_frames/frame_000270.png
Saved frame: extracted_frames/frame_000300.png
Saved frame: extracted_frames/frame_000330.png
Frame extraction finished.


# Task
Extract keyframes from a video file using clustering and summarize the video based on the keyframes using the Gemini model.

## Extract frames

### Subtask:
Extract frames from the video file. We can reuse the existing `get_keyframes` function, but we might need to adjust the frame interval or extract more frames to have enough data for clustering.


**Reasoning**:
Review the existing `get_keyframes` function, determine a suitable `frame_interval`, and call the function again with the new parameters.
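Before re-running the extraction, it can help to estimate how many frames a given interval will produce. The following is a minimal sketch: `frames_saved` is a hypothetical helper, and the total of 350 frames is an assumed value used only for illustration. With OpenCV, the total could be read with `cap.get(cv2.CAP_PROP_FRAME_COUNT)`, although that value may be approximate for some containers.

```python
import math


def frames_saved(total_frames: int, frame_interval: int) -> int:
    """Number of frames kept when saving frames 0, interval, 2*interval, ..."""
    if frame_interval <= 0:
        raise ValueError("frame_interval must be positive")
    return math.ceil(total_frames / frame_interval)


# Assumed total frame count, for illustration only.
ASSUMED_TOTAL_FRAMES = 350

for interval in (30, 10, 5):
    print(f"interval={interval}: {frames_saved(ASSUMED_TOTAL_FRAMES, interval)} frames")
```

A smaller interval gives the clustering step more samples, at the cost of more disk space and longer feature extraction.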



In [19]:
# Reviewing the get_keyframes function:
# The function extracts frames at a fixed interval using frame_count % frame_interval == 0.
# The current frame_interval is set to 30.

# Determining a suitable frame_interval:
# To get more frames for clustering, we should decrease the frame_interval.
# Let's try a frame_interval of 10 to extract more frames.

# Calling the get_keyframes function again:
get_keyframes('/content/video/3195394-uhd_3840_2160_25fps.mp4', 'extracted_frames_for_clustering', frame_interval=10)

Saved frame: extracted_frames_for_clustering/frame_000000.png
Saved frame: extracted_frames_for_clustering/frame_000030.png
Saved frame: extracted_frames_for_clustering/frame_000060.png
Saved frame: extracted_frames_for_clustering/frame_000090.png
Saved frame: extracted_frames_for_clustering/frame_000120.png
Saved frame: extracted_frames_for_clustering/frame_000150.png
Saved frame: extracted_frames_for_clustering/frame_000180.png
Saved frame: extracted_frames_for_clustering/frame_000210.png
Saved frame: extracted_frames_for_clustering/frame_000240.png
Saved frame: extracted_frames_for_clustering/frame_000270.png
Saved frame: extracted_frames_for_clustering/frame_000300.png
Saved frame: extracted_frames_for_clustering/frame_000330.png
Frame extraction finished.


**Reasoning**:
The previous code block did not change the frame interval: the output still shows every 30th frame. The call did pass `frame_interval=10`, but the function body reassigns `frame_interval = 30` before the loop, which overrides the argument. Let's redefine the function so that it uses the parameter as passed (with a default of 30) and call it again.
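One way an argument can be silently ignored is when the function body reassigns the parameter. The sketch below reproduces that pattern in isolation; `extract_buggy` and `extract_fixed` are hypothetical names used only for this illustration.

```python
def extract_buggy(frame_interval=10):
    # Reassigning the parameter discards whatever the caller passed in.
    frame_interval = 30
    return frame_interval


def extract_fixed(frame_interval=30):
    # The parameter is used as passed; the default applies only when omitted.
    return frame_interval


print(extract_buggy(frame_interval=10))  # the argument is overridden
print(extract_fixed(frame_interval=10))  # the argument is respected
print(extract_fixed())                   # falls back to the default
```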



In [21]:
import cv2
import os

def get_keyframes(video_path, output_dir="keyframes", frame_interval=30):
    """
    Extracts frames at a fixed interval from a video file.
    Note: This is not true keyframe extraction (I-frame detection)
    but a workaround to get a subset of frames.

    Args:
        video_path (str): Path to the video file.
        output_dir (str): Directory to save the extracted frames.
        frame_interval (int): Interval between saved frames.
    """
    # Create output directory if it doesn't exist
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: Could not open video file {video_path}")
        return

    frame_count = 0

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        if frame_count % frame_interval == 0:
            output_path = os.path.join(output_dir, f"frame_{frame_count:06d}.png")
            cv2.imwrite(output_path, frame)
            print(f"Saved frame: {output_path}")

        frame_count += 1

    cap.release()
    print("Frame extraction finished.")

# Call the function with a smaller frame interval
get_keyframes('/content/video/3195394-uhd_3840_2160_25fps.mp4', 'extracted_frames_for_clustering', frame_interval=10)

Saved frame: extracted_frames_for_clustering/frame_000000.png
Saved frame: extracted_frames_for_clustering/frame_000010.png
Saved frame: extracted_frames_for_clustering/frame_000020.png
Saved frame: extracted_frames_for_clustering/frame_000030.png
Saved frame: extracted_frames_for_clustering/frame_000040.png
Saved frame: extracted_frames_for_clustering/frame_000050.png
Saved frame: extracted_frames_for_clustering/frame_000060.png
Saved frame: extracted_frames_for_clustering/frame_000070.png
Saved frame: extracted_frames_for_clustering/frame_000080.png
Saved frame: extracted_frames_for_clustering/frame_000090.png
Saved frame: extracted_frames_for_clustering/frame_000100.png
Saved frame: extracted_frames_for_clustering/frame_000110.png
Saved frame: extracted_frames_for_clustering/frame_000120.png
Saved frame: extracted_frames_for_clustering/frame_000130.png
Saved frame: extracted_frames_for_clustering/frame_000140.png
Saved frame: extracted_frames_for_clustering/frame_000150.png
Saved fr

## Feature extraction

### Subtask:
Extract features from the extracted frames. These features will be used for clustering. We can use pre-trained models like those available in libraries like TensorFlow or PyTorch to get meaningful representations of the images.


**Reasoning**:
Import necessary libraries from TensorFlow for loading a pre-trained model and processing images, and then list the extracted image paths.



In [24]:
import tensorflow as tf
import numpy as np
import os

# List of paths to the extracted image frames
image_dir = 'extracted_frames_for_clustering'
image_paths = [os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith('.png')]
image_paths.sort() # Ensure consistent order
print(f"Found {len(image_paths)} images in {image_dir}")

Found 35 images in extracted_frames_for_clustering


**Reasoning**:
Load a pre-trained CNN model, remove the classification layer, and then iterate through the image paths to load, preprocess, and extract features from each image.
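Calling the model once per image is simple, but it repeats per-call overhead for every frame. One possible alternative, sketched here under the assumption that several preprocessed images fit in memory at once, is to group the image paths into batches and run one prediction per batch. `batched` is a hypothetical helper, and the file names are made up for the example.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical frame file names, for illustration only.
paths = [f"frame_{i:06d}.png" for i in range(0, 70, 10)]

for batch in batched(paths, 3):
    # In the notebook, each batch would be loaded, preprocessed, stacked into an
    # array of shape (len(batch), 224, 224, 3), and passed to the model once.
    print(batch)
```

For a few dozen frames the difference is small, so the per-image loop remains a reasonable choice.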



In [26]:
# Load a pre-trained model (e.g., ResNet50)
# Set include_top=False to exclude the final classification layer;
# pooling='avg' reduces the last feature map to one vector per image
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Create a model that outputs the features
feature_extractor = tf.keras.Model(inputs=base_model.input, outputs=base_model.output)

# Store the extracted features
extracted_features = []

# Iterate through the list of image paths
for img_path in image_paths:
    # Load the image
    img = tf.keras.preprocessing.image.load_img(img_path, target_size=(224, 224)) # ResNet50 input size
    img_array = tf.keras.preprocessing.image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0) # Add batch dimension

    # Preprocess the image
    img_array = tf.keras.applications.resnet50.preprocess_input(img_array)

    # Extract features
    features = feature_extractor.predict(img_array)
    extracted_features.append(features[0]) # Remove batch dimension

# Convert the list of features to a NumPy array
extracted_features_array = np.array(extracted_features)

print(f"Extracted features shape: {extracted_features_array.shape}")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3s/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 217ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 205ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 201ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 209ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 211ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 503ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 204ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 222ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 205ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 204ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 214ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 206ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1

## Clustering

### Subtask:
Apply a clustering algorithm (e.g., K-Means) to the extracted features to group similar frames together.


**Reasoning**:
Apply the KMeans clustering algorithm to the extracted features to group similar frames.
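To make the grouping step concrete, here is a minimal pure-Python sketch of the assign-then-update loop (Lloyd's algorithm) that K-Means is based on. It is not scikit-learn's implementation: the initialization is a simple deterministic choice made for this sketch, whereas `KMeans` uses k-means++ seeding with `n_init` restarts. The toy 2-D points stand in for the 2048-dimensional frame features.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def mean(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def simple_kmeans(points: List[Point], k: int, max_iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialization for the sketch: k evenly spaced points.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # Converged: no assignment changed.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Toy data: two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = simple_kmeans(toy_points, k=2)
print(toy_labels)
print(toy_centroids)
```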



In [27]:
from sklearn.cluster import KMeans

# Determine a suitable number of clusters (e.g., 7)
n_clusters = 7

# Instantiate KMeans with the chosen number of clusters and a random state for reproducibility
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)

# Fit the KMeans model to the extracted features
kmeans.fit(extracted_features_array)

# Store the cluster labels
cluster_labels = kmeans.labels_

print(f"Cluster labels shape: {cluster_labels.shape}")
print(f"Sample cluster labels: {cluster_labels[:10]}")

Cluster labels shape: (35,)
Sample cluster labels: [2 2 5 5 5 5 5 5 0 0]


## Keyframe selection

### Subtask:
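Select representative frames (keyframes) from each cluster. A centroid is an average in feature space and generally does not coincide with any actual frame, so the keyframe for a cluster is the frame whose features are closest to its centroid.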


**Reasoning**:
Create a dictionary to group frame indices by cluster, find the frame closest to the centroid for each cluster using Euclidean distance, store the indices of these keyframes, and retrieve their file paths.
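The selection logic can be checked on toy data before running it on the real features. The sketch below uses only the standard library; `select_keyframes` is a hypothetical helper, and the features, labels, and centroids are made-up values. It also sorts the result so the keyframes come out in chronological order regardless of the order in which clusters are visited.

```python
import math
from collections import defaultdict


def select_keyframes(features, labels, centroids):
    """Return, in chronological order, the index of the frame nearest each centroid."""
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)

    keyframes = []
    for label, indices in by_cluster.items():
        # Compare only frames that belong to this cluster, but keep original indices.
        best = min(indices, key=lambda i: math.dist(features[i], centroids[label]))
        keyframes.append(best)
    return sorted(keyframes)


# Made-up 2-D features for five frames in two clusters.
toy_features = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.1), (9.0, 9.0), (10.0, 10.0)]
toy_labels = [0, 0, 0, 1, 1]
toy_centroids = [(0.5, 0.0), (9.8, 9.8)]

print(select_keyframes(toy_features, toy_labels, toy_centroids))
```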



In [29]:
import numpy as np
from scipy.spatial.distance import euclidean

# 1. Create a dictionary to store frame indices by cluster
cluster_frame_indices = {}
for i, label in enumerate(cluster_labels):
    if label not in cluster_frame_indices:
        cluster_frame_indices[label] = []
    cluster_frame_indices[label].append(i)

# 2. Find the index of the frame closest to the centroid for each cluster
keyframe_indices = []
for cluster_label, indices in cluster_frame_indices.items():
    cluster_features = extracted_features_array[indices]
    centroid = kmeans.cluster_centers_[cluster_label]

    # Calculate distances from each frame in the cluster to the centroid
    distances = [euclidean(features, centroid) for features in cluster_features]

    # Find the index of the frame with the minimum distance within the cluster
    closest_frame_index_in_cluster = np.argmin(distances)

    # Get the original index of the keyframe
    original_keyframe_index = indices[closest_frame_index_in_cluster]
    keyframe_indices.append(original_keyframe_index)

# 3. Store the indices of the selected keyframes (already done in the previous step)

# 4. Retrieve the file paths of the selected keyframes
keyframe_paths = [image_paths[i] for i in keyframe_indices]

print("Selected Keyframe Indices:", keyframe_indices)
print("Selected Keyframe Paths:", keyframe_frame_paths)

Selected Keyframe Indices: [0, 4, 10, 17, 21, 25, 32]


NameError: name 'keyframe_frame_paths' is not defined

**Reasoning**:
Correct the variable name typo in the print statement to display the `keyframe_paths`.



In [31]:
import numpy as np
from scipy.spatial.distance import euclidean

# 1. Create a dictionary to store frame indices by cluster
cluster_frame_indices = {}
for i, label in enumerate(cluster_labels):
    if label not in cluster_frame_indices:
        cluster_frame_indices[label] = []
    cluster_frame_indices[label].append(i)

# 2. Find the index of the frame closest to the centroid for each cluster
keyframe_indices = []
for cluster_label, indices in cluster_frame_indices.items():
    cluster_features = extracted_features_array[indices]
    centroid = kmeans.cluster_centers_[cluster_label]

    # Calculate distances from each frame in the cluster to the centroid
    distances = [euclidean(features, centroid) for features in cluster_features]

    # Find the index of the frame with the minimum distance within the cluster
    closest_frame_index_in_cluster = np.argmin(distances)

    # Get the original index of the keyframe
    original_keyframe_index = indices[closest_frame_index_in_cluster]
    keyframe_indices.append(original_keyframe_index)

# 3. Store the indices of the selected keyframes (already done in the previous step)

# 4. Retrieve the file paths of the selected keyframes
keyframe_paths = [image_paths[i] for i in keyframe_indices]

print("Selected Keyframe Indices:", keyframe_indices)
print("Selected Keyframe Paths:", keyframe_paths)

Selected Keyframe Indices: [0, 4, 10, 17, 21, 25, 32]
Selected Keyframe Paths: ['extracted_frames_for_clustering/frame_000000.png', 'extracted_frames_for_clustering/frame_000040.png', 'extracted_frames_for_clustering/frame_000100.png', 'extracted_frames_for_clustering/frame_000170.png', 'extracted_frames_for_clustering/frame_000210.png', 'extracted_frames_for_clustering/frame_000250.png', 'extracted_frames_for_clustering/frame_000320.png']


## Summarization with Gemini

### Subtask:
Use the selected keyframes as input for the Gemini model to generate a summary of the video. This might involve describing the content of the keyframes and inferring the overall narrative or key events.


**Reasoning**:
Import the necessary libraries for interacting with the Gemini model and authenticate and initialize the Gemini model.



In [36]:
import google.generativeai as genai
import os
from PIL import Image

# Load the API key from an environment variable or a secrets manager.
# In Colab, the key can instead be read from the Secrets panel:
# from google.colab import userdata
# GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')

# Never hard-code the key in the notebook: a committed key is exposed to
# anyone who can read the file and should be revoked.
GEMINI_API_KEY = os.environ.get('GEMINI_API_KEY')
if not GEMINI_API_KEY:
    print("Error loading API key: GEMINI_API_KEY not found in environment variables.")
    GEMINI_API_KEY = None # Treat an empty value the same as a missing key


if GEMINI_API_KEY:
    genai.configure(api_key=GEMINI_API_KEY)
    # Initialize a Gemini model that accepts multimodal input (text and images)
    try:
        model = genai.GenerativeModel('gemini-1.5-flash')
        print("Gemini model initialized successfully.")
    except Exception as e:
        print(f"Error initializing Gemini model: {e}")
        model = None
else:
    print("Gemini model initialization skipped due to missing API key.")
    model = None

Gemini model initialized successfully.


**Reasoning**:
Prepare the input for the Gemini model by loading the keyframe images and creating a text prompt, then call the model to generate the summary.
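The text prompt could also tell the model where each keyframe falls in the video. The following is a sketch only: `frame_timestamp` and `build_prompt` are hypothetical helpers, and the frame rate of 25 fps is an assumption taken from the video file name rather than read from the file itself.

```python
import os
import re


def frame_timestamp(path: str, fps: float) -> float:
    """Seconds into the video for a file named like frame_000170.png."""
    match = re.search(r"frame_(\d+)\.png$", os.path.basename(path))
    if match is None:
        raise ValueError(f"Unexpected frame file name: {path}")
    return int(match.group(1)) / fps


def build_prompt(paths, fps: float) -> str:
    """Build a text prompt that lists the timestamp of each keyframe."""
    lines = [
        "Based on these keyframes, please provide a summary of the video.",
        "The keyframes are in chronological order, taken at these times:",
    ]
    for number, path in enumerate(paths, start=1):
        lines.append(f"{number}. {frame_timestamp(path, fps):.2f} s")
    return "\n".join(lines)


ASSUMED_FPS = 25.0  # Assumption based on the "25fps" in the video file name.
example_paths = [
    "extracted_frames_for_clustering/frame_000000.png",
    "extracted_frames_for_clustering/frame_000170.png",
    "extracted_frames_for_clustering/frame_000320.png",
]
print(build_prompt(example_paths, ASSUMED_FPS))
```

The resulting string would replace the plain prompt and be followed by the images in the same order.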



In [38]:
if model:
    # Prepare the input for the Gemini model
    # Load the keyframe images
    keyframe_images = [Image.open(path) for path in keyframe_paths]

    # Create a text prompt
    prompt = "Based on these keyframes, please provide a summary of the video."

    # Combine prompt and images as input
    # The input format for multimodal models is typically a list containing text and image objects
    input_content = [prompt] + keyframe_images

    # Call the Gemini model to generate the summary
    try:
        response = model.generate_content(input_content)

        # Store the generated summary
        video_summary = response.text
        print("Video summary generated successfully.")
        print("\n--- Video Summary ---")
        print(video_summary)

    except Exception as e:
        print(f"Error generating video summary: {e}")
        video_summary = "Error generating summary."
else:
    video_summary = "Gemini model not initialized, summary generation skipped."
    print(video_summary)

Video summary generated successfully.

--- Video Summary ---
The video shows a scientist's gloved hand using a pipette to add a light green liquid to a series of small, clear test tubes contained in a plastic rack. The test tubes already contain a reddish-brown substance.  The focus is on the precise dispensing of the liquid into the tubes.  The video likely depicts a scientific experiment or process involving the precise mixing of chemicals or substances.



## Summary:

### Data Analysis Key Findings

*   Extracting every 10th frame produced 35 frames for analysis, compared with 12 frames at the original interval of 30.
*   A pre-trained ResNet50 model extracted features from all 35 frames, yielding a feature array of shape (35, 2048).
*   KMeans clustering with 7 clusters assigned a cluster label to each frame.
*   The keyframe for each cluster is the frame closest to the cluster centroid. The selected keyframe indices were 0, 4, 10, 17, 21, 25, and 32.
*   The Gemini model (`gemini-1.5-flash`) generated a video summary from the selected keyframe images and a text prompt.

### Insights or Next Steps

*   Clustering on visual features selected seven keyframes spread across the clip (frames 0 to 320). Whether they capture distinct stages was not evaluated, and the generated summary describes a single continuous scene.
*   Experiment with different numbers of clusters or alternative clustering algorithms to potentially capture different aspects of the video's content and improve the summary's focus.
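When comparing cluster counts, one common quantity to look at is the within-cluster sum of squared distances (scikit-learn exposes it on a fitted `KMeans` as `inertia_`). The sketch below computes it in pure Python for two hand-written labelings of toy points; `inertia` here is a hypothetical helper, and the data are made up. Inertia always decreases as clusters are added, so it is usually read by looking for the point where the improvement levels off rather than by picking the minimum.

```python
import math
from collections import defaultdict


def inertia(points, labels) -> float:
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        center = [sum(m[d] for m in members) / len(members) for d in range(dims)]
        total += sum(math.dist(m, center) ** 2 for m in members)
    return total


# Made-up points forming two tight pairs.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

print("1 cluster: ", inertia(toy_points, [0, 0, 0, 0]))
print("2 clusters:", inertia(toy_points, [0, 0, 1, 1]))
```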
