# Search within videos with text

## Introduction
This notebook shows how to search a video with a text query and retrieve the matching video segment. It relies on the following libraries:
* clip: OpenAI's CLIP model, which embeds images and text in a shared vector space.
* PIL: Python Imaging Library for image processing.
* torch: The PyTorch library for deep learning.

## Prerequisites

Before diving into the implementation, install the required libraries by running the following command:

In [5]:
# !pip install pinnacledb
!pip install ipython opencv-python pillow openai-clip


## Connect to datastore 

First, we need to establish a connection to a MongoDB datastore via SuperDuperDB. Set the `MONGODB_URI` environment variable to match your setup; if it is unset, the code below falls back to `mongomock://test`.
Here are some examples of MongoDB URIs:

* For testing (default connection): `mongomock://test`
* Local MongoDB instance: `mongodb://localhost:27017`
* MongoDB with authentication: `mongodb://pinnacle:pinnacle@mongodb:27017/documents`
* MongoDB Atlas: `mongodb+srv://<username>:<password>@<atlas_cluster>/<database>`
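
The URI forms listed above differ mainly in scheme, host, and database path. As a minimal sketch using only the standard library, the following breaks a URI into those parts, which can help when checking a `MONGODB_URI` value before connecting. The helper name `describe_mongo_uri` is hypothetical and is not part of pinnacledb.

```python
from urllib.parse import urlsplit


def describe_mongo_uri(uri):
    """Split a MongoDB-style URI into its main parts.

    Illustrative helper only; it is not a pinnacledb API.
    """
    parts = urlsplit(uri)
    return {
        'scheme': parts.scheme,
        'host': parts.hostname,
        'port': parts.port,
        # The database name, if present, is the path without its leading slash.
        'database': parts.path.lstrip('/') or None,
        'has_credentials': parts.username is not None,
    }


for uri in (
    'mongomock://test',
    'mongodb://localhost:27017',
    'mongodb://pinnacle:pinnacle@mongodb:27017/documents',
):
    print(describe_mongo_uri(uri))
```

The Atlas form is left out of the loop because its `<username>` style placeholders must be replaced with real values first.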

In [1]:
from pinnacledb import pinnacle
from pinnacledb.backends.mongodb import Collection
from pinnacledb import CFG
import os

CFG.downloads.hybrid = True
CFG.downloads.root = './'

mongodb_uri = os.getenv("MONGODB_URI","mongomock://test")
db = pinnacle(mongodb_uri, artifact_store='filesystem://./data/')

video_collection = Collection('videos')

2023-Nov-14 13:53:39.83 | SUCCESS | Duncans-MacBook-Pro.local | pinnacledb.base.build:69 | Initializing DataBackend Client: mongomock.MongoClient('localhost', 27017)


## Load Dataset

We'll begin by configuring a video encoder.

In [2]:
from pinnacledb import Encoder

vid_enc = Encoder(
    identifier='video_on_file',
    load_hybrid=False,
)

db.add(vid_enc)

[]

Now, let's retrieve a sample video from the internet and insert it into our collection.

In [3]:
from pinnacledb.base.document import Document

db.execute(video_collection.insert_one(
        Document({'video': vid_enc(uri='https://pinnacledb-public.s3.eu-west-1.amazonaws.com/animals_excerpt.mp4')})
    )
)

# Display the list of videos in the collection
list(db.execute(Collection('videos').find()))

2023-Nov-14 13:53:42.29 | INFO | Duncans-MacBook-Pro.local | pinnacledb.misc.download:358 | found 1 uris
2023-Nov-14 13:53:42.54 | INFO | Duncans-MacBook-Pro.local | pinnacledb.misc.download:125 | number of workers 0

100%| 1/1 [00:00<00:00,  1.10it/s]


[Document({'video': Encodable(encoder=Encoder(identifier='video_on_file', decoder=<Artifact artifact=5383d16e618f4b51a6396fa628aaf710 serializer=dill>, encoder=<Artifact artifact=c635ff89bf0c422286e0f1fd0212e25d serializer=dill>, shape=None, version=0, load_hybrid=False), x=None, uri='https://pinnacledb-public.s3.eu-west-1.amazonaws.com/animals_excerpt.mp4'), '_fold': 'train', '_id': ObjectId('65536dd6b0e451df3a649bc4')})]

## Register Encoders

Next, we'll create a model that converts a video into individual frames. It keeps every 10th frame and records the timestamp, in whole seconds, at which each kept frame occurs.

In [6]:
import cv2
import tqdm
from PIL import Image
from pinnacledb.ext.pillow import pil_image
from pinnacledb import Model, Schema


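def video2images(video_file):
    """Sample every 10th frame of a video as a PIL image with its timestamp."""
    sample_freq = 10  # keep one frame out of every `sample_freq` frames
    cap = cv2.VideoCapture(video_file)

    frame_count = 0

    fps = cap.get(cv2.CAP_PROP_FPS)
    print(fps)
    extracted_frames = []
    progress = tqdm.tqdm()

    while True:
        ret, frame = cap.read()
        if not ret:
            # No more frames to read (end of video or read failure)
            break
        # Timestamp of the current frame, floored to whole seconds
        current_timestamp = frame_count // fps

        if frame_count % sample_freq == 0:
            extracted_frames.append({
                # OpenCV returns BGR; reverse the channel axis to get RGB for PIL
                'image': Image.fromarray(frame[:, :, ::-1]),
                'current_timestamp': current_timestamp,
            })
        frame_count += 1
        progress.update(1)

    progress.close()
    cap.release()
    return extracted_frames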


video2images = Model(
    identifier='video2images',
    object=video2images,
    flatten=True,
    model_update_kwargs={'document_embedded': False},
    output_schema=Schema(identifier='myschema', fields={'image': pil_image})
)
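
To make the sampling arithmetic in `video2images` concrete, here is a minimal sketch that reproduces only the index and timestamp logic, without reading any video. The function name `sampled_timestamps` is hypothetical, and the inputs (a 30 fps clip of 900 frames) are the values this notebook's run reports.

```python
def sampled_timestamps(n_frames, fps, sample_freq=10):
    """Return (frame_index, timestamp_in_seconds) for every sampled frame."""
    return [
        (i, i // fps)  # same floor division as in video2images
        for i in range(n_frames)
        if i % sample_freq == 0
    ]


samples = sampled_timestamps(n_frames=900, fps=30.0)
print(len(samples))   # number of frames kept
print(samples[:4])    # first few (index, timestamp) pairs
print(samples[-1])    # last sampled frame
```

With these inputs, 90 frames are kept, and every three consecutive samples share the same whole-second timestamp, so a search result can only be located to the nearest second.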

We'll also set up a listener that continuously applies `video2images` to the videos in the collection and saves the sampled frames into another collection.

In [7]:
from pinnacledb import Listener

db.add(
   Listener(
       model=video2images,
       select=video_collection.find(),
       key='video',
   )
)

db.execute(Collection('_outputs.video.video2images').find_one()).unpack()['_outputs']['video']['video2images']['image']

2023-Nov-14 13:54:27.17 | INFO | Duncans-MacBook-Pro.local | pinnacledb.components.model:207 | Adding model video2images to db
2023-Nov-14 13:54:27.17 | INFO | Duncans-MacBook-Pro.local | pinnacledb.components.model:210 | Done.


1it [00:00, 1916.08it/s]


30.0


900it [00:00, 1844.03it/s]


InvalidDocument: documents must have only string keys, key was 0

## Create CLIP model
Now, we'll wrap CLIP (Contrastive Language-Image Pre-training) as two models: a visual model that embeds frames and a text model that embeds queries. Both produce 1024-dimensional vectors, so text and images can be compared directly.

In [None]:
import clip
from pinnacledb import vector
from pinnacledb.ext.torch import TorchModel

model, preprocess = clip.load("RN50", device='cpu')
t = vector(shape=(1024,))

visual_model = TorchModel(
    identifier='clip_image',
    preprocess=preprocess,
    object=model.visual,
    encoder=t,
    postprocess=lambda x: x.tolist(),
)

text_model = TorchModel(
    identifier='clip_text',
    object=model,
    preprocess=lambda x: clip.tokenize(x)[0],
    forward_method='encode_text',
    encoder=t,
    device='cpu',
    preferred_devices=None,
    postprocess=lambda x: x.tolist(),
)

## Create VectorIndex

We will set up a VectorIndex to index the video frames and search them with text. The indexing listener embeds each saved frame with the visual model, and the compatible listener embeds text queries with the text model so that both end up in the same vector space.

In [None]:
from pinnacledb import Listener, VectorIndex
from pinnacledb.backends.mongodb import Collection

db.add(
    VectorIndex(
        identifier='video_search_index',
        indexing_listener=Listener(
            model=visual_model,
            key='_outputs.video.video2images.image',
            select=Collection('_outputs.video.video2images').find(),
        ),
        compatible_listener=Listener(
            model=text_model,
            key='text',
            select=None,
            active=False
        )
    )
)
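
Conceptually, a vector index ranks the stored frame vectors by their similarity to the query vector. The sketch below illustrates that idea in plain Python, assuming cosine similarity as the measure; the similarity measure actually used by the index, the helper names, and the 3-dimensional toy vectors are all assumptions made for illustration (the real vectors have 1024 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query, frame_vectors, n=1):
    """Return (frame_index, score) pairs for the n most similar frames."""
    scored = [
        (cosine_similarity(query, vec), idx)
        for idx, vec in enumerate(frame_vectors)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(idx, score) for score, idx in scored[:n]]


# Toy stand-ins for frame embeddings and a text-query embedding
frame_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vector = [0.6, 0.8, 0.0]

print(top_n(query_vector, frame_vectors, n=2))
```

Here the third frame ranks first because its direction is closest to the query's, followed by the second frame.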

## Query a text against saved frames

Now, let's search for something that happened during the video:

In [None]:
# Define the search parameters
search_term = 'Some ducks'
num_results = 1


# Embed the search term, find the closest frames, and take the first result
r = next(db.execute(
    Collection('_outputs.video.video2images')
    .like(Document({'text': search_term}), vector_index='video_search_index', n=num_results)
    .find()
))

search_timestamp = r['_outputs']['video']['video2images']['current_timestamp']

# Get the back reference to the original video
video = db.execute(Collection('videos').find_one({'_id': r['_source']}))
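
The timestamp lookup above walks several levels of nested keys and raises a `KeyError` if any level is missing. As a minimal sketch, a small helper can follow a dotted path and return a default instead. The name `get_path` and the sample `result` dictionary are hypothetical, and the sketch assumes plain dictionaries rather than the `Document` objects returned by the query.

```python
def get_path(doc, dotted_key, default=None):
    """Follow a dotted key path through nested dicts, or return default."""
    current = doc
    for part in dotted_key.split('.'):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Hypothetical result shaped like the query output used above
result = {
    '_source': 'hypothetical-video-id',
    '_outputs': {'video': {'video2images': {'current_timestamp': 12.0}}},
}

print(get_path(result, '_outputs.video.video2images.current_timestamp'))
print(get_path(result, '_outputs.video.missing', default=0.0))
```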

## Start the video from the resultant timestamp

Finally, we can display the video and start playback at the timestamp of the frame that best matches the search term.

In [None]:
from IPython.display import display, HTML

video_html = f"""
<video width="640" height="480" controls>
    <source src="{video['video'].uri}" type="video/mp4">
</video>
<script>
    var video = document.querySelector('video');
    video.currentTime = {search_timestamp};
    video.play();
</script>
"""

display(HTML(video_html))
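
As an alternative sketch, the start time can be carried in the video URL itself using a media fragment (`#t=<seconds>`), which avoids the `<script>` block. Support for media fragments depends on the browser, so this is an assumption to verify in your environment. The function name `video_html_at` and the `example.com` URL are hypothetical.

```python
from html import escape


def video_html_at(uri, start_seconds, width=640, height=480):
    """Build a <video> tag whose source starts at start_seconds via '#t='."""
    # Escape the URL so it is safe inside a double-quoted HTML attribute
    src = escape(f'{uri}#t={float(start_seconds):g}', quote=True)
    return (
        f'<video width="{int(width)}" height="{int(height)}" controls>\n'
        f'    <source src="{src}" type="video/mp4">\n'
        f'</video>'
    )


print(video_html_at('https://example.com/clip.mp4', 12.0))
```

In a notebook, the returned string could be passed to `display(HTML(...))` in the same way as above.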