![](https://cdn.analyticsvidhya.com/wp-content/uploads/2023/12/final_keyword_header.width-1600.format-webp.webp)

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #17191e                                ;
    border-radius: 2px;
    color :  #FFFAEC  ;        
    border: 2px solid #FFF6D8;">
  
### Workflow :
1. Download a video from a link (it can also be loaded from a dataset folder)
2. Extract frames from the video
3. Extract key frames from the video based on structural similarity (adjustable threshold)
4. Pass the extracted key frames as images, along with instructions and prompts, to the Gemini Pro Vision model
5. Gemini returns a text response describing the key frames as the prompts ask

In [1]:
# Get the API key from here: https://ai.google.dev/tutorials/setup
# Create a new secret called "GEMINI_API_KEY" via Add-ons -> Secrets in the top menu, and attach it to this notebook.
from IPython.display import display
from IPython.display import Markdown

import pathlib
import textwrap

apiKey = ""

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF                                ;
    border-radius: 2px;      
    border: 2px solid #FFF6D8;">

- Gemini's API is currently free to use with limited capabilities; get your own API key from this link: https://ai.google.dev/tutorials/setup
- Store the key as a secret named `GEMINI_API_KEY` via `Add-ons` -> `Secrets` in the menu above, and attach it to this notebook.
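
A minimal sketch of how the key could be read at runtime instead of being typed into `apiKey`. It assumes the secret is named `GEMINI_API_KEY` (as in the comment of the first cell) and that the notebook runs on Kaggle, where the `kaggle_secrets` module is available. The helper name `load_api_key` and the environment-variable fallback are assumptions for illustration, not part of the original notebook.

```python
import os


def load_api_key(name="GEMINI_API_KEY"):
    """Return the API key from Kaggle secrets, else from the environment.

    `load_api_key` is a hypothetical helper, not part of any library.
    """
    try:
        # Only importable inside a Kaggle notebook.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(name)
    except Exception:
        # Assumed fallback for local runs: an environment variable
        # with the same name; empty string if it is not set.
        return os.environ.get(name, "")


apiKey = load_api_key()
```

Outside Kaggle the import fails and the function falls back to the environment variable, so the same cell can run locally.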

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #17191e                                ;
    border-radius: 2px;
    color :  #FFFAEC  ;        
    border: 2px solid #FFF6D8;">
    
    
- [Pytube](https://pytube.io/en/latest/) is used to download a YouTube video at its highest available resolution, as an mp4 file
- The link is passed to the `YouTube` object, which saves the video to the current working directory

In [2]:
!pip install pytube
from pytube import YouTube

YouTube('https://youtube.com/shorts/pEiri8rHpYs?si=gWkKd3tlaMqgSuY0').streams.get_highest_resolution().download()




'/workspaces/CodesWithGemini/HOW TO ORGANISE NOTES shorts.mp4'

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF                                ;
    border-radius: 2px;       
    border: 2px solid #FFF6D8;">

Importing Required Libraries

In [3]:
%pip install opencv-python

Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install pandas numpy opencv-python matplotlib tqdm


Note: you may need to restart the kernel to use updated packages.


In [5]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from glob import glob
import IPython.display as ipd
from tqdm import tqdm
import subprocess

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #17191e                                ;
    border-radius: 2px;
    color :  #FFFAEC  ;        
    border: 2px solid #FFF6D8;">
Checking the file name of the only file in the directory

In [6]:
# `!ls` is IPython shell syntax; it returns the directory listing as a list of names
file_name = !ls
file_name[0]

'HOW TO ORGANISE NOTES shorts.mp4'

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF                                ;
    border-radius: 2px;      
    border: 2px solid #FFF6D8;">
    
- If the downloaded YouTube video is not already in mp4 format, it can be converted to mp4 with an external tool called through the [subprocess](https://docs.python.org/3/library/subprocess.html) module before further processing
- The converted video is saved as `mp4_converted_video.mp4`
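
The conversion cell itself is not shown in this notebook, so the following is only a sketch of what such a step could look like. It assumes that `ffmpeg` is installed and on the `PATH`; the helper names `build_ffmpeg_command` and `convert_to_mp4` are hypothetical.

```python
import subprocess


def build_ffmpeg_command(src, dst="mp4_converted_video.mp4"):
    """Build an ffmpeg argument list that re-encodes `src` into `dst`."""
    # -y overwrites an existing output file,
    # -loglevel error keeps the notebook output short.
    return ["ffmpeg", "-y", "-loglevel", "error", "-i", src, dst]


def convert_to_mp4(src, dst="mp4_converted_video.mp4"):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

A call such as `convert_to_mp4('HOW TO ORGANISE NOTES shorts.mp4')` would then produce the file that the later cells read. Passing the arguments as a list avoids shell quoting problems with the spaces in the file name.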

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #17191e                                ;
    border-radius: 2px;
    color :  #FFFAEC  ;        
    border: 2px solid #FFF6D8;">
    
- Playing the video

In [7]:
ipd.Video('mp4_converted_video.mp4', width=200,embed= True)

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #17191e                                ;
    border-radius: 2px;
    color :  #FFFAEC  ;        
    border: 2px solid #FFF6D8;">

- Using [OpenCV](https://opencv.org/) in Python to create a `VideoCapture` object that reads frames from the downloaded video file
    

In [None]:
import cv2

In [None]:
cap = cv2.VideoCapture('HOW TO ORGANISE NOTES shorts.mp4')

In [None]:
# Total number of frames in video
cap.get(cv2.CAP_PROP_FRAME_COUNT)

1780.0

In [None]:
# Video height and width
height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
print(f'Height {height}, Width {width}')

Height 1280.0, Width 720.0


In [None]:
# Get frames per second
fps = cap.get(cv2.CAP_PROP_FPS)
print(f'FPS : {fps:0.2f}')  

FPS : 30.00
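
As a quick sanity check, the frame count and frame rate reported above give the clip length. This sketch hard-codes the two printed values; the printed FPS is rounded to two decimals, so the result is approximate. The function name is hypothetical.

```python
def video_duration_seconds(frame_count, fps):
    """Duration of a clip in seconds from its frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


# Values taken from the cell outputs above.
duration = video_duration_seconds(1780.0, 30.0)
print(f"Duration: {duration:0.2f} s")  # prints "Duration: 59.33 s"
```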


In [None]:
cap = cv2.VideoCapture('HOW TO ORGANISE NOTES shorts.mp4')
ret, img = cap.read()
print(f'Returned {ret} and img of shape {img.shape}')

Returned True and img of shape (1280, 720, 3)


<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #17191e                                ;
    border-radius: 2px;
    color :  #FFFAEC  ;        
    border: 2px solid #FFF6D8;">

- Plotting some of the frames to visualise the images
    
    

In [None]:
fig, axs = plt.subplots(3, 5, figsize=(30, 20))
axs = axs.flatten()

cap = cv2.VideoCapture("mp4_converted_video.mp4")
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

img_idx = 0
for frame in range(n_frames):
    ret, img = cap.read()
    # Stop if the frame could not be read or the 3 x 5 subplot grid is full
    if not ret or frame == 1500:
        break
    # Show every 100th frame
    if frame % 100 == 0:
        axs[img_idx].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        axs[img_idx].set_title(f'Frame: {frame}')
        axs[img_idx].axis('off')
        img_idx += 1

plt.tight_layout()
plt.show()
cap.release()

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF;
    border-radius: 2px;       
    border: 2px solid #FFF6D8;">
    
- [SSIM](https://en.wikipedia.org/wiki/Structural_similarity#:~:text=The%20structural%20similarity%20index%20measure,of%20digital%20images%20and%20videos.) measures the similarity between two images; the implementation used here comes from [scikit-image](https://scikit-image.org/docs/stable/auto_examples/transform/plot_ssim.html)
- A threshold on the similarity index decides which frames count as key frames
- This is not the only way, or necessarily the best way, to extract key frames. Try other approaches for better performance
- The selected frames are stored in an output directory named `selected_frames`
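
To make the similarity index concrete, here is a simplified pure-Python sketch of the SSIM formula computed over a whole image at once. This is an assumption-laden illustration: scikit-image's `structural_similarity` computes the index in local sliding windows and averages the result, so its values will differ. The function name `global_ssim` is hypothetical, and the constants follow the common choice of 0.01 and 0.03 times the data range.

```python
def global_ssim(x, y, data_range=255.0):
    """SSIM over two images given as equal-length flat lists of pixel values.

    Simplification: a single global window instead of local sliding windows.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small constants that stabilise the division.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


pixels = [0, 50, 100, 150, 200, 250]
same = global_ssim(pixels, pixels)            # identical images: close to 1.0
flipped = global_ssim(pixels, pixels[::-1])   # reversed gradient: negative
```

With a threshold of 0.5, the identical pair would count as "similar" and the flipped pair as "distinct".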

In [None]:
from skimage.metrics import structural_similarity as ssim
import os

# Opening the video file
cap = cv2.VideoCapture("mp4_converted_video.mp4")
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# Creating a directory to save the selected frames
output_directory = 'selected_frames'
os.makedirs(output_directory, exist_ok=True)

selected_frames = []
previous_frame = None
threshold = 0.5  # frames whose mean SSIM with the previous frame is below this are kept

for frame_idx in tqdm(range(n_frames), desc="Processing Frames"):
    ret, img = cap.read()

    if not ret:
        break

    # Splitting the frame into B, G, R channels (OpenCV stores frames in BGR order)
    b, g, r = cv2.split(img)

    if previous_frame is not None:
        # Structural similarity index (SSIM) for each channel
        ssim_b, _ = ssim(previous_frame[0], b, full=True)
        ssim_g, _ = ssim(previous_frame[1], g, full=True)
        ssim_r, _ = ssim(previous_frame[2], r, full=True)

        # Combining the SSIM scores from each channel
        similarity_index = (ssim_b + ssim_g + ssim_r) / 3

        # Keep the current frame only if it differs enough from the previous one
        if similarity_index < threshold:
            selected_frames.append(img)

            # Saving the selected frame to the output directory
            frame_filename = os.path.join(output_directory, f"frame_{frame_idx:04d}.png")
            cv2.imwrite(frame_filename, img)

    previous_frame = cv2.split(img)

# Releasing the video capture object to free its resources
cap.release()
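
The loop above compares each frame with the frame immediately before it. One possible variant, sketched below, compares each frame with the last frame that was kept, which can help when a scene changes gradually. This is an alternative under stated assumptions, not the notebook's method: `select_key_frames` is a hypothetical helper, `similarity` is any function returning a score where higher means more alike, and unlike the loop above it always keeps the first frame.

```python
def select_key_frames(frames, similarity, threshold=0.5):
    """Return indices of frames that differ enough from the last kept frame."""
    selected = []
    reference = None
    for idx, frame in enumerate(frames):
        # The first frame is always kept; later frames are kept only when
        # their similarity to the current reference drops below the threshold.
        if reference is None or similarity(reference, frame) < threshold:
            selected.append(idx)
            reference = frame
    return selected


# Toy usage: each "frame" is a single brightness value between 0 and 1.
toy_frames = [0.0, 0.1, 0.2, 0.9, 1.0, 0.3]
toy_similarity = lambda a, b: 1 - abs(a - b)
key_indices = select_key_frames(toy_frames, toy_similarity, threshold=0.5)
```

In the notebook, `similarity` could wrap the per-channel SSIM average computed in the cell above.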


In [None]:
print(f'Total key frames based on the threshold chosen : {len(selected_frames)}')

In [None]:
# Selected frames display
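# squeeze=False keeps axs two-dimensional, so indexing also works for a single frame
fig, axs = plt.subplots(1, max(len(selected_frames), 1), figsize=(30, 10), squeeze=False)

for i, frame in enumerate(selected_frames):
    axs[0, i].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    axs[0, i].set_title(f'Selected Frame {i}')
    axs[0, i].axis('off')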

plt.show()

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF                                ;
    border-radius: 2px;      
    border: 2px solid #FFF6D8;">
    
The selected frames are stored as colour images in the `selected_frames` directory in PNG format

In [None]:
os.listdir('selected_frames')

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #17191e                                ;
    border-radius: 2px;
    color :  #FFFAEC  ;        
    border: 2px solid #FFF6D8;">
    
- Configuring the API key for the `google.generativeai` (`genai`) module

In [None]:
import google.generativeai as genai

genai.configure(api_key = apiKey)

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF                                ;
    border-radius: 2px;       
    border: 2px solid #FFF6D8;">
    
Converting the images into PIL format for passing to the model

In [None]:
import PIL.Image
images = []
# Sorting keeps the frames in chronological order (file names are zero-padded)
for i in sorted(os.listdir('selected_frames')):
    img = PIL.Image.open(f'selected_frames/{i}')
    images.append(img)

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF;
    border-radius: 2px;       
    border: 2px solid #FFF6D8;">
    
The first image looks like this:

In [None]:
images[0]

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF                                ;
    border-radius: 2px;      
    border: 2px solid #FFF6D8;">
    
Text is generated from image and text prompts using `gemini-pro-vision`. Loading the model:

In [None]:
model = genai.GenerativeModel('gemini-pro-vision')

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #17191e                                ;
    border-radius: 2px;
    color :  #FFFAEC  ;        
    border: 2px solid #FFF6D8;">
    
- Passing the instructions, the prompts, and the first four saved images to the model for evaluation

In [None]:
instructions = "Instructions: Consider the following images:"
prompt1 = "What is shown in each of the images ?"
prompt2 = """
Answer the question through these steps:
Step 1: Identify if any text is written in the images
Step 2: Identify any doodles/pictures in the images
Step 3: Grasp the collective meaning of each of the images
Step 4: What do all the images tell as a whole about the personality of the person who wrote them?
Answer and describe the steps taken:
"""
images = images[0:4]

images.insert(0, prompt2)
images.insert(0, prompt1)
images.insert(0, instructions)
display(images)

responses = model.generate_content(images)

print("-------Prompt--------")
print(images)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")
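
The cell above builds the request by inserting the prompts into `images`, so after it runs the list no longer holds only images. A small sketch of an alternative that leaves `images` untouched is shown below; `build_contents` is a hypothetical helper, and the strings in the usage lines stand in for PIL images.

```python
def build_contents(instructions, prompts, images, max_images=4):
    """Return a new list: instructions, then prompts, then the first images."""
    return [instructions, *prompts, *images[:max_images]]


# Placeholder strings stand in for the PIL images used in the notebook.
demo_images = ["img0", "img1", "img2", "img3", "img4"]
contents = build_contents(
    "Instructions: Consider the following images:",
    ["What is shown in each of the images?"],
    demo_images,
)
```

The result could then be passed as `model.generate_content(contents)`, in the same order of instructions, prompts, and images as the cell above.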
    

<div class="anchor" id="top" style="
    margin-right: auto; 
    margin-left: auto;
    padding: 10px;
   font-size : 120%;
    background-color: #F9F1FF                              ;
    border-radius: 2px;      
    border: 2px solid #FFF6D8;">
    
References : 
    
- [Gemini API Starter Notebook](https://www.kaggle.com/code/prathameshbang/gemini-api-starter-notebook/notebook)
- [Working with video in Python by Rob Mulla](https://www.kaggle.com/code/robikscube/working-with-video-in-python-youtube-tutorial/notebook)
