# Video Frame Search System with BLIP & Pinecone

This notebook sets up a complete video semantic search engine that:
- Extracts frames from videos
- Generates captions using BLIP
- Stores embeddings in Pinecone
- Enables natural language search

---


## Step 1: Setup - Clone Repository & Install Dependencies



In [3]:
# Clone the repository
!git clone https://github.com/pranavacchu/capstone-BLIP.git
%cd capstone-BLIP

# Install dependencies
print("📦 Installing dependencies... This will take 3-5 minutes")
!pip install -q opencv-python-headless pillow numpy pandas tqdm python-dotenv
!pip install -q torch torchvision transformers sentence-transformers
!pip install -q pinecone FlagEmbedding

print("\n✅ Installation complete!")

# Check GPU availability
import torch
if torch.cuda.is_available():
    print(f"\n🚀 GPU detected: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("\n⚠️ No GPU detected. Using CPU (slower but works)")

Cloning into 'capstone-BLIP'...
remote: Enumerating objects: 123, done.
remote: Counting objects: 100% (123/123), done.
remote: Compressing objects: 100% (87/87), done.
remote: Total 123 (delta 74), reused 85 (delta 36), pack-reused 0 (from 0)
Receiving objects: 100% (123/123), 145.27 KiB | 1.18 MiB/s, done.
Resolving deltas: 100% (74/74), done.
/content/capstone-BLIP
📦 Installing dependencies... This will take 3-5 minutes

## 🚀 Step 1.5: Start API Server for React Integration (Optional)

**Run this cell ONLY if you want to use the React frontend.**

This cell:
- Authenticates with ngrok (requires free account at https://ngrok.com)
- Starts a FastAPI server that runs on Colab's GPU
- Creates a public URL you can use from your React app

**⚠️ Important:**
1. You need a free ngrok account: https://dashboard.ngrok.com/signup
2. Get your authtoken: https://dashboard.ngrok.com/get-started/your-authtoken
3. Replace the NGROK_AUTH_TOKEN below with your token
4. Keep this cell running while using the React app
5. Copy the ngrok URL that appears and paste it in your React app

**Note:** If you just want to use the notebook directly (without React), skip this cell and continue to the next steps.
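
Rather than pasting the authtoken into the notebook, it can be read from the environment or requested at run time. The following is a minimal sketch, assuming an environment variable named `NGROK_AUTH_TOKEN`; the `resolve_token` helper is hypothetical and not part of the repository.

```python
import os
from getpass import getpass


def resolve_token(env_var: str = "NGROK_AUTH_TOKEN", prompt: bool = True) -> str:
    """Return a token from the environment, falling back to a hidden prompt."""
    token = os.environ.get(env_var, "").strip()
    if not token and prompt:
        # getpass hides the input, so the token is not echoed into the cell output
        token = getpass(f"Enter {env_var}: ").strip()
    if not token:
        raise ValueError(f"{env_var} is not set")
    return token
```

With this in place, the cell could call `ngrok.set_auth_token(resolve_token())` so that no secret is stored in the notebook file.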

In [None]:
# Install FastAPI and dependencies
!pip install -q fastapi uvicorn pyngrok nest-asyncio python-multipart

# Authenticate ngrok
from pyngrok import ngrok

# Set your ngrok authtoken (get it from: https://dashboard.ngrok.com/get-started/your-authtoken)
# Replace the placeholder for this session only; never commit a real token.
NGROK_AUTH_TOKEN = "YOUR_NGROK_AUTH_TOKEN"
ngrok.set_auth_token(NGROK_AUTH_TOKEN)
print("✅ ngrok authenticated successfully!")

# Start the API server with ngrok tunnel
import nest_asyncio
import uvicorn
from colab_api_server import app

nest_asyncio.apply()

print("\n🌐 Starting ngrok tunnel on port 8000...")
print("=" * 80)

# Kill any existing tunnels
ngrok.kill()

# Create ngrok tunnel
public_url_obj = ngrok.connect(8000, bind_tls=True)
# NgrokTunnel exposes the URL as an attribute, so no string parsing is needed
public_url = public_url_obj.public_url

print("\n✅ COLAB API SERVER READY")
print("=" * 80)
print(f"\n🌍 Public URL: {public_url}")
print("\n📋 Copy this URL and paste it into your React app:")
print(f"   {public_url}")
print("\n" + "=" * 80)

# Initialize the video search engine
print("\n🔧 Initializing Video Search Engine...")
from video_search_engine import VideoSearchEngine

# Create a global engine instance for the app to use
import colab_api_server
if not colab_api_server.engine:
    colab_api_server.engine = VideoSearchEngine()
    print("✅ Engine initialized successfully!")
else:
    print("✅ Engine already initialized!")

# Display GPU status
import torch
if torch.cuda.is_available():
    print(f"\n🚀 GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("\n⚠️  Running on CPU (slower)")

print("\n🚀 Starting FastAPI server...")
print("   Keep this notebook running while using the React app")
print("   The server will process videos using Colab's GPU")
print("\n" + "=" * 80)
print("\n📡 Server is ready to accept requests!")
print("   Go to your React app and paste the URL above to connect")
print("\n" + "=" * 80 + "\n")

# Start the server (this will block and keep running)
# Python 3.12+ compatible approach - run server directly with asyncio
import asyncio

async def start_server():
    """Start the FastAPI server using uvicorn"""
    config = uvicorn.Config(
        app, 
        host="0.0.0.0", 
        port=8000, 
        log_level="info"
    )
    server = uvicorn.Server(config)
    await server.serve()

# Get or create event loop and run the server
try:
    loop = asyncio.get_event_loop()
except RuntimeError:
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

# Run the server
loop.run_until_complete(start_server())


✅ ngrok authenticated successfully!
🌐 Starting ngrok tunnel on port 8000...

✅ COLAB API SERVER READY

🌍 Public URL: NgrokTunnel: "https://c17157e4ab5d.ngrok-free.app" -> "http://localhost:8000"

📋 Copy this URL and paste it into your React app (.env file)
   REACT_APP_API_URL=NgrokTunnel: "https://c17157e4ab5d.ngrok-free.app" -> "http://localhost:8000"


🚀 Starting FastAPI server...
   Keep this notebook running while using the React app
   The server will process videos using Colab's GPU


❌ Error starting server: _patch_asyncio.<locals>.run() got an unexpected keyword argument 'loop_factory'


  return asyncio_run(self.serve(sockets=sockets), loop_factory=self.config.get_loop_factory())


## ✅ Server Started! Next Steps:

**If you see the public URL above:**

1. **Copy the ngrok URL** (looks like: `https://xxxx-xx-xxx-xxx-xx.ngrok-free.app`)

2. **Make sure your React app is running:**
   ```bash
   cd C:\Users\prana\OneDrive\Desktop\CAPSTONE\my-react-app
   npm run dev
   ```

3. **Navigate to:** `http://localhost:5173/dashboard`

4. **Click on the "🔍 Video Search" or "📤 Process Video" tab**

5. **Paste the ngrok URL** in the connection section and click "Connect"

6. **Start using the integrated dashboard!** You can now:
   - 📹 Use live webcam (original feature)
   - 🔍 Search videos using natural language
   - 📤 Upload and process new videos
   - Filter by date and category

**⚠️ Remember:** Keep this notebook running! If you stop this cell, the React app will lose connection.

---

**Continue with the cells below** if you want to use the notebook interface directly (without React).


## Step 1b: Install Object Detection Dependencies (Optional)

Install the additional dependencies for the object detection pipeline.

In [None]:
# Install Grounding DINO dependencies
print("Installing Grounding DINO and dependencies...")
print("This may take 2-3 minutes...")

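import subprocess
import sys

# Use the running interpreter's pip; check=True stops the cell if the install fails,
# so the success message below is only printed after a successful install.
subprocess.run([sys.executable, '-m', 'pip', 'install', '-q', 'timm', 'supervision'], check=True)

print("\nGrounding DINO dependencies installed!")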
print("Models will be downloaded automatically from Hugging Face on first use")

Installing Grounding DINO and dependencies...
This may take 2-3 minutes...

Grounding DINO dependencies installed!
Models will be downloaded automatically from Hugging Face on first use


## Step 2: Configure Pinecone API Key

Enter your Pinecone credentials below:
- **API Key**: Your Pinecone API key
- **Index Host**: Your index URL (from Pinecone dashboard)

Example settings (replace with your own values; never publish a real API key):
```
API Key: YOUR_PINECONE_API_KEY
Host: https://capstone-b5a0x4x.svc.aped-4627-b74a.pinecone.io
```

In [None]:
import os

# Set your Pinecone credentials (replace the placeholder; do not commit a real key)
PINECONE_API_KEY = "YOUR_PINECONE_API_KEY"
PINECONE_HOST = "https://capstone-b5a0x4x.svc.aped-4627-b74a.pinecone.io"
PINECONE_ENVIRONMENT = "us-east-1"

# Write to .env file
with open('.env', 'w') as f:
    f.write(f"PINECONE_API_KEY={PINECONE_API_KEY}\n")
    f.write(f"PINECONE_HOST={PINECONE_HOST}\n")
    f.write(f"PINECONE_ENVIRONMENT={PINECONE_ENVIRONMENT}\n")

print("✅ Configuration saved!")

✅ Configuration saved!
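
To confirm what was written without printing the key in full, the `.env` file can be read back and the secret masked. This is a stdlib-only sketch; `read_env_file` and `mask_secret` are hypothetical helpers, and the repository itself presumably loads the file through `python-dotenv`, which is installed in Step 1.

```python
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def mask_secret(secret: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

For example, `mask_secret(read_env_file()["PINECONE_API_KEY"])` shows only the final four characters of the stored key.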


## Step 3: Test Connection to Pinecone



In [None]:
from video_search_engine import VideoSearchEngine

print("🔌 Connecting to Pinecone...")
engine = VideoSearchEngine()

# Get database stats
stats = engine.get_index_stats()

print("\n✅ Successfully connected to Pinecone!")
print("\n📊 Database Statistics:")
print(f"   Index: {stats.get('index_name', 'N/A')}")
print(f"   Total vectors: {stats.get('total_vectors', 0):,}")
print(f"   Dimension: {stats.get('dimension', 'N/A')}")
print(f"   Capacity: {stats.get('capacity', 'N/A')}")

# Optional: Show namespace statistics if available
if stats.get('namespaces'):
    print("\n📁 Namespaces:")
    for namespace, ns_info in stats.get('namespaces', {}).items():
        # Handle NamespaceSummary object - access vector_count attribute
        vector_count = ns_info.vector_count if hasattr(ns_info, 'vector_count') else ns_info
        print(f"   - {namespace}: {vector_count:,} vectors")


🔌 Connecting to Pinecone...

✅ Successfully connected to Pinecone!

📊 Database Statistics:
   Index: test
   Total vectors: 244
   Dimension: 1024
   Capacity: Serverless

📁 Namespaces:
   - backpack: 2 vectors
   - coat_jacket: 3 vectors
   - duffel_bag: 2 vectors
   - : 235 vectors
   - folder: 2 vectors


## Step 4: Upload a Video File



## Step 4b: Choose Captioning Method

Select between standard BLIP or object detection + BLIP pipeline.

In [None]:
from google.colab import files
import os
import subprocess
from urllib.parse import urlparse, parse_qs

print("📤 Choose how to get your video:\n")
print("1. Upload from computer (recommended for small files < 100MB)")
print("2. Download from URL (direct video file)")
print("3. Download from YouTube URL\n")

choice = input("Enter choice (1/2/3): ").strip()
video_path = None

if choice == "1":
    print("\n📁 Please select your video file...")
    uploaded = files.upload()
    if uploaded:
        video_path = list(uploaded.keys())[0]
        print(f"✅ Uploaded: {video_path}")
    else:
        print("❌ No file uploaded")

elif choice == "2":
    video_url = input("\nEnter video URL (direct link to .mp4, .avi, etc.): ").strip()

    if not video_url:
        print("❌ No URL provided")
    else:
        # Extract filename from URL or use default
        parsed_url = urlparse(video_url)
        url_filename = os.path.basename(parsed_url.path)

        # Use URL filename if it has an extension, otherwise use default
        if url_filename and '.' in url_filename:
            video_filename = url_filename
        else:
            video_filename = "downloaded_video.mp4"

        print("⬇️ Downloading from URL...")
        print(f"   Target file: {video_filename}")

        try:
            # Use subprocess for better control
            result = subprocess.run(
                ['wget', '-O', video_filename, video_url, '--no-check-certificate', '-q', '--show-progress'],
                capture_output=True,
                text=True,
                timeout=300
            )

            if result.returncode == 0 and os.path.exists(video_filename):
                if os.path.getsize(video_filename) > 0:
                    video_path = video_filename
                    print(f"✅ Downloaded successfully: {video_filename}")
                else:
                    print("❌ Download failed: File is empty")
                    if os.path.exists(video_filename):
                        os.remove(video_filename)
            else:
                print(f"❌ Download failed: wget returned code {result.returncode}")
                # Try alternative method with curl
                print("\n🔄 Trying alternative download method (curl)...")
                result2 = subprocess.run(
                    ['curl', '-L', '-o', video_filename, video_url, '--silent', '--show-error'],
                    capture_output=True,
                    text=True,
                    timeout=300
                )

                if result2.returncode == 0 and os.path.exists(video_filename) and os.path.getsize(video_filename) > 0:
                    video_path = video_filename
                    print(f"✅ Downloaded successfully with curl: {video_filename}")
                else:
                    print("❌ Alternative download also failed")
                    print("   Please check if the URL is accessible and try again")

        except subprocess.TimeoutExpired:
            print("❌ Download timed out (>5 minutes). File may be too large.")
        except Exception as e:
            print(f"❌ Download error: {e}")

elif choice == "3":
    youtube_url = input("\nEnter YouTube URL (video or shorts): ").strip()

    if not youtube_url:
        print("❌ No URL provided")
    else:
        print("⬇️ Downloading from YouTube...")
        print("   Installing/updating yt-dlp...")

        # Install/update yt-dlp to latest version (fixes bot detection)
        subprocess.run(['pip', 'install', '-U', 'yt-dlp'], check=False)

        video_filename = "youtube_video.mp4"

        try:
            print(f"   Fetching video (this may take a minute)...")

            # Enhanced yt-dlp command with multiple bot detection workarounds
            result = subprocess.run(
                [
                    'yt-dlp',
                    # Format selection - prefer lower quality for faster download
                    '-f', 'best[height<=720][ext=mp4]/best[ext=mp4]/best',
                    '-o', video_filename,
                    '--no-playlist',
                    # Bot detection workarounds
                    '--extractor-args', 'youtube:player_client=android,web',
                    '--user-agent', 'Mozilla/5.0 (Linux; Android 11) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.120 Mobile Safari/537.36',
                    # Additional options
                    '--no-check-certificate',
                    '--no-warnings',
                    '--prefer-insecure',
                    # Progress and retries
                    '--progress',
                    '--retries', '3',
                    '--fragment-retries', '3',
                    youtube_url
                ],
                capture_output=True,
                text=True,
                timeout=600  # 10 minute timeout
            )

            if result.returncode == 0 and os.path.exists(video_filename):
                if os.path.getsize(video_filename) > 0:
                    video_path = video_filename
                    print(f"✅ Downloaded successfully: {video_filename}")
                else:
                    print("❌ Download failed: File is empty")
                    if os.path.exists(video_filename):
                        os.remove(video_filename)
            else:
                print("❌ YouTube download failed")
                if result.stderr:
                    error_msg = result.stderr[:500]
                    print(f"   Error details: {error_msg}")

                # Try alternative method with mobile client only
                print("\n🔄 Trying alternative method (mobile client)...")
                result2 = subprocess.run(
                    [
                        'yt-dlp',
                        '-f', 'worst[ext=mp4]/worst',  # Use lowest quality for testing
                        '-o', video_filename,
                        '--extractor-args', 'youtube:player_client=android',
                        '--user-agent', 'com.google.android.youtube/17.31.35 (Linux; U; Android 11) gzip',
                        '--no-check-certificate',
                        '--no-playlist',
                        youtube_url
                    ],
                    capture_output=True,
                    text=True,
                    timeout=300
                )

                if result2.returncode == 0 and os.path.exists(video_filename) and os.path.getsize(video_filename) > 0:
                    video_path = video_filename
                    print(f"✅ Downloaded successfully with alternative method: {video_filename}")
                else:
                    print("❌ Alternative method also failed")
                    print("\n💡 Troubleshooting tips:")
                    print("   1. Make sure the video is public and not age-restricted")
                    print("   2. Try using Option 1 to upload the video manually")
                    print("   3. Alternative: Download video on your computer first, then upload")
                    print("   4. Some YouTube videos may be restricted in certain regions")

        except subprocess.TimeoutExpired:
            print("❌ Download timed out (>10 minutes).")
            print("   Video may be too long or connection is slow.")
        except Exception as e:
            print(f"❌ Download error: {e}")

else:
    print("⚠️ Invalid choice. Please choose option 1, 2, or 3.")

# Validate the video file
if video_path:
    if os.path.exists(video_path):
        file_size = os.path.getsize(video_path) / (1024*1024)  # MB
        print(f"\n📹 Video ready: {video_path} ({file_size:.1f} MB)")

        # Verify it's a valid video file
        import cv2
        cap = cv2.VideoCapture(video_path)
        if cap.isOpened():
            fps = cap.get(cv2.CAP_PROP_FPS)
            frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            duration = frame_count / fps if fps > 0 else 0
            print(f"   Duration: {duration:.1f} seconds")
            print(f"   FPS: {fps:.1f}")
            print(f"   Total frames: {frame_count:,}")
            cap.release()
        else:
            print("\n⚠️ Warning: Unable to read video file. It may be corrupted.")
            print("   Please try a different video or URL.")
            video_path = None
    else:
        print(f"\n❌ Error: File not found at {video_path}")
        video_path = None

if not video_path:
    print("\n❌ No valid video file available. Please run this cell again.")

📤 Choose how to get your video:

1. Upload from computer (recommended for small files < 100MB)
2. Download from URL (direct video file)
3. Download from YouTube URL

Enter choice (1/2/3): 3

Enter YouTube URL (video or shorts): https://www.youtube.com/shorts/QhlroYnundk
⬇️ Downloading from YouTube...
   Installing/updating yt-dlp...
   Fetching video (this may take a minute)...
✅ Downloaded successfully: youtube_video.mp4

📹 Video ready: youtube_video.mp4 (0.7 MB)
   Duration: 7.6 seconds
   FPS: 30.0
   Total frames: 228


In [None]:
print("Choose your captioning method:\n")
print("1. Standard BLIP (faster, general scene captions)")
print("2. Object Detection + BLIP (slower, object-focused)")
print()

method_choice = input("Enter choice (1/2, default=1): ").strip() or "1"
use_object_detection = (method_choice == "2")

if use_object_detection:
    print("\nUsing Object Detection + BLIP pipeline")
    print("   Detects objects: bags, laptops, helmets, phones, etc.")
else:
    print("\nUsing Standard BLIP captioning")

Choose your captioning method:

1. Standard BLIP (faster, general scene captions)
2. Object Detection + BLIP (slower, object-focused)

Enter choice (1/2, default=1): 2

Using Object Detection + BLIP pipeline
   Detects objects: bags, laptops, helmets, phones, etc.


## Step 4c: Select Video Recording Date

Choose the date when this video was recorded. This enables efficient date-based filtering.
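
The upload log in Step 5 shows namespaces of the form `videos:2025-11-06:backpack`, which suggests the date is encoded in the namespace so that a date-filtered query only touches that day's vectors. A sketch of composing and splitting such names follows; the helper names are hypothetical and the repository's own implementation may differ.

```python
from datetime import datetime


def build_namespace(video_date: str, category: str, prefix: str = "videos") -> str:
    """Compose '<prefix>:<YYYY-MM-DD>:<category>' after validating the date."""
    datetime.strptime(video_date, "%Y-%m-%d")  # raises ValueError on a malformed date
    slug = category.strip().lower().replace(" ", "_")
    return f"{prefix}:{video_date}:{slug}"


def parse_namespace(namespace: str) -> tuple[str, str, str]:
    """Split a namespace back into (prefix, date, category)."""
    prefix, video_date, category = namespace.split(":", 2)
    return prefix, video_date, category
```

Keeping the date in a fixed position makes it straightforward to list every category recorded on a given day by matching on the `videos:<date>:` prefix.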


In [None]:
from datetime import date, datetime

# Manual date input (since ipywidgets may not work in all Colab versions)
print("📅 Select video recording date")
print("=" * 60)
print(f"Today's date: {date.today()}")
print()

use_today = input("Use today's date? (y/n, default=y): ").strip().lower()

if use_today == 'n':
    date_input = input("Enter date (YYYY-MM-DD): ").strip()
    try:
        # Validate date format
        datetime.strptime(date_input, "%Y-%m-%d")
        video_date = date_input
    except ValueError:
        print("⚠️ Invalid date format. Using today's date.")
        video_date = date.today().strftime("%Y-%m-%d")
else:
    video_date = date.today().strftime("%Y-%m-%d")

print(f"\n✅ Video date set to: {video_date}")
print("   This will be used for efficient date-based searching")


📅 Select video recording date
Today's date: 2025-11-06

Use today's date? (y/n, default=y): y

✅ Video date set to: 2025-11-06
   This will be used for efficient date-based searching


## Step 5: Process the Video

This will:
1. Extract frames from the video (removing redundant frames)
2. Generate captions using the BLIP model
3. Create embeddings for semantic search
4. Upload to Pinecone database

**Expected time:**
- 1 minute video: ~2-3 minutes with GPU
- 5 minute video: ~8-10 minutes with GPU
- CPU mode: 3-5x slower
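
Redundant-frame removal (step 1 above) is commonly done by keeping a frame only when it differs enough from the last frame that was kept. The sketch below illustrates that idea under stated assumptions: frames are equal-length flat sequences of grayscale pixel values, and the threshold of 10.0 is arbitrary. It is not necessarily the method used by `VideoSearchEngine`.

```python
from typing import Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: Sequence[Sequence[float]], threshold: float = 10.0) -> list[int]:
    """Return indices of frames that differ from the last kept frame by more than threshold."""
    kept: list[int] = []
    for i, frame in enumerate(frames):
        # Always keep the first frame; afterwards compare against the last kept one
        if not kept or mean_abs_diff(frames[kept[-1]], frame) > threshold:
            kept.append(i)
    return kept
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents slow gradual changes from being discarded entirely.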

In [None]:
import time
from datetime import datetime

if 'video_path' not in locals() or not video_path:
    print("❌ Please upload a video first (run the previous cell)")
else:
    # Set video name
    video_name = input("Enter a name for this video (or press Enter for auto-name): ").strip()
    if not video_name:
        video_name = f"video_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

    print(f"\n🎬 Processing video: {video_name}")
    print(f"📅 Video date: {video_date}")
    print("⏳ This will take a few minutes... Please wait.\n")
    print("=" * 80)

    start_time = time.time()

    try:
        # Process the video with improved logging
        stats = engine.process_video(
            video_path=video_path,
            video_name=video_name,
            video_date=video_date,  # Include video date
            save_frames=False,  # Set to True to save frames
            upload_to_pinecone=True,
            use_object_detection=use_object_detection  # Use selected method
        )

        processing_time = time.time() - start_time

        print("\n" + "=" * 80)
        print("\n✅ VIDEO PROCESSING COMPLETE!\n")
        print("📊 Processing Statistics:")
        print(f"   Video name: {video_name}")
        print(f"   Video date: {video_date}")
        print(f"   Frames extracted: {stats['total_frames_extracted']:,}")
        print(f"   Frames with captions: {stats['frames_with_captions']:,}")
        print(f"   Captions before dedupe: {stats.get('captions_before_dedupe', stats['frames_with_captions']):,}")
        print(f"   Unique embeddings: {stats.get('embeddings_generated', 0):,}")
        print(f"   ✅ Actually uploaded: {stats['embeddings_uploaded']:,}")
        print(f"   Processing time: {processing_time/60:.1f} minutes")
        print(f"\n   Frame reduction: {stats.get('frame_reduction_percent', 0):.1f}%")

        # Save video_name for next steps
        processed_video_name = video_name

    except Exception as e:
        print(f"\n❌ Error processing video: {e}")
        print("\nTroubleshooting tips:")
        print("- If GPU memory error: Restart runtime and try again")
        print("- If video format error: Convert video to MP4 format")
        import traceback
        traceback.print_exc()


Enter a name for this video (or press Enter for auto-name): test2


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.



🎬 Processing video: test2
📅 Video date: 2025-11-06
⏳ This will take a few minutes... Please wait.



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



`torch_dtype` is deprecated! Use `dtype` instead!



Extracting frames: 100%|██████████| 228/228 [00:01<00:00, 202.67it/s]

🎯 OBJECT DETECTION + CAPTIONING MODE



`torch_dtype` is deprecated! Use `dtype` instead!






📸 Processing Frame: e6d072b6ba7b (t=0.03s)
   Found 1 objects

Processing frames:  17%|█▋        | 1/6 [00:04<00:24,  4.82s/it]

   ├─ Object 1: Backpack
   │  Caption: Backpack: A The back of a backpack on a green field.
   │  Namespace: backpack
   │  Confidence: 73.55%
   └─ ✓ Generated 1 valid caption(s)

📸 Processing Frame: 6d8f1a3acd55 (t=1.50s)
   Found 2 objects
   ├─ Object 1: Backpack
   │  Caption: Backpack: An A man with a backpack walking down the street.
   │  Namespace: backpack
   │  Confidence: 66.31%

Processing frames:  33%|███▎      | 2/6 [00:05<00:10,  2.58s/it]

   ├─ Object 2: Duffel
   │  Caption: Duffel: An a red backpack sitting on top of a green field.
   │  Namespace: duffel_bag
   │  Confidence: 26.02%
   └─ ✓ Generated 2 valid caption(s)

📸 Processing Frame: 601addaeab1f (t=3.80s)
   Found 1 objects

Processing frames:  50%|█████     | 3/6 [00:06<00:05,  1.78s/it]

   ├─ Object 1: Backpack
   │  Caption: Backpack: An A person with a backpack on their back.
   │  Namespace: backpack
   │  Confidence: 69.57%
   └─ ✓ Generated 1 valid caption(s)

📸 Processing Frame: 37cc49bdf07d (t=4.80s)
   Found 2 objects
   ├─ Object 1: Backpack (skipped - no caption generated)

Processing frames:  67%|██████▋   | 4/6 [00:07<00:02,  1.47s/it]

   ├─ Object 2: Coat Jacket (skipped - no caption generated)

📸 Processing Frame: c91ef6f95a7b (t=6.43s)
   Found 2 objects
   ├─ Object 1: Backpack (skipped - duplicate)

Processing frames:  83%|████████▎ | 5/6 [00:08<00:01,  1.32s/it]

   ├─ Object 2: File
   │  Caption: File: An a white refrigerator sitting on top of a wooden floor.
   │  Namespace: folder
   │  Confidence: 29.89%
   └─ ✓ Generated 1 valid caption(s)

📸 Processing Frame: 810b80eb847b (t=7.23s)
   Found 4 objects
   ├─ Object 1: Backpack (skipped - no caption generated)
   ├─ Object 2: Suitcase Luggage (skipped - no caption generated)
   ├─ Object 3: File (skipped - duplicate)

Processing frames: 100%|██████████| 6/6 [00:10<00:00,  1.71s/it]

   ├─ Object 4: Coat Jacket
   │  Caption: Coat Jacket: An A man in a red jacket is walking down the street.
   │  Namespace: coat_jacket
   │  Confidence: 28.04%
   └─ ✓ Generated 1 valid caption(s)





Batches:   0%|          | 0/1 [00:00<?, ?it/s]


☁️  UPLOADING TO PINECONE VECTOR DATABASE

📁 Namespace: videos:2025-11-06:backpack
   Uploading 2 vectors...

Uploading to Pinecone: 100%|██████████| 1/1 [00:00<00:00,  3.03it/s]

   ✓ Uploaded 2 vectors
   Sample caption: Backpack: A The back of a backpack on a green field....

📁 Namespace: videos:2025-11-06:duffel_bag
   Uploading 2 vectors...

Uploading to Pinecone: 100%|██████████| 1/1 [00:00<00:00,  8.17it/s]

   ✓ Uploaded 2 vectors
   Sample caption: Backpack: An A man with a backpack walking down the street....

📁 Namespace: videos:2025-11-06:folder
   Uploading 1 vectors...

Uploading to Pinecone: 100%|██████████| 1/1 [00:00<00:00, 12.38it/s]

   ✓ Uploaded 1 vectors
   Sample caption: File: An a white refrigerator sitting on top of a wooden floor....

📁 Namespace: videos:2025-11-06:coat_jacket
   Uploading 1 vectors...

Uploading to Pinecone: 100%|██████████| 1/1 [00:00<00:00, 12.02it/s]

   ✓ Uploaded 1 vectors
   Sample caption: Coat Jacket: An A man in a red jacket is walking down the street....

✅ UPLOAD COMPLETE: 6 vectors uploaded for 'test2'
   Sample vector IDs: e6d072b6ba7b_obj0_emb, 6d8f1a3acd55_obj1_emb, 6d8f1a3acd55_obj2_emb...



✅ VIDEO PROCESSING COMPLETE!

📊 Processing Statistics:
   Video name: test2
   Video date: 2025-11-06
   Frames extracted: 6
   Frames with captions: 6
   Captions before dedupe: 6
   Unique embeddings: 6
   ✅ Actually uploaded: 6
   Processing time: 1.3 minutes

   Frame reduction: 0.0%


## Step 6: Search Your Video!

Now you can search for content using natural language queries.

**Example queries:**
- "person walking"
- "black bag"
- "someone talking on phone"
- "car driving"
- "red shirt"

The system will return timestamps where that content appears!
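
Conceptually, the query is turned into a vector and compared with the stored caption embeddings; the closest matches carry the timestamps that are returned. Pinecone performs this comparison server-side, so the pure-Python sketch below is only an illustration. It assumes cosine similarity as the metric and a hypothetical record layout with `timestamp`, `caption` and `embedding` fields.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, records, k=3):
    """Return the k best (score, timestamp, caption) tuples, highest score first."""
    scored = [
        (cosine_similarity(query_vec, r["embedding"]), r["timestamp"], r["caption"])
        for r in records
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

In the real pipeline the vectors have 1024 dimensions (see the index statistics in Step 3) rather than the handful a toy example would use.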

## Step 6b: View Namespace Statistics (Object Detection Mode)

If you used object detection mode, check how vectors are distributed across namespaces.

In [None]:
# View namespace distribution
if use_object_detection:
    stats = engine.get_index_stats()

    print("📊 NAMESPACE DISTRIBUTION")
    print("=" * 60)

    if 'namespaces' in stats and stats['namespaces']:
        print(f"\nTotal vectors: {stats['total_vectors']:,}")
        print(f"\nVectors by object category:")

        # Sort namespaces by vector count
        namespace_items = []
        for namespace, ns_info in stats['namespaces'].items():
            vector_count = ns_info.vector_count if hasattr(ns_info, 'vector_count') else ns_info
            namespace_items.append((namespace, vector_count))

        namespace_items.sort(key=lambda x: x[1], reverse=True)

        for namespace, count in namespace_items:
            # Format namespace name for display
            display_name = namespace.replace('_', ' ').title()
            percentage = (count / stats['total_vectors'] * 100) if stats['total_vectors'] > 0 else 0
            bar_length = int(percentage / 2)  # Scale to 50 chars max
            bar = "█" * bar_length

            print(f"   {display_name:20s} │ {bar:50s} {count:4d} ({percentage:5.1f}%)")
    else:
        print("No namespace data available yet. Process a video first.")

    print("=" * 60)
else:
    print("⚠️ Namespace statistics only available in object detection mode")
    print("   Re-run with use_object_detection = True to see namespace breakdown")

📊 NAMESPACE DISTRIBUTION

Total vectors: 250

Vectors by object category:
                        │ ███████████████████████████████████████████████     235 ( 94.0%)
   Coat Jacket          │                                                       3 (  1.2%)
   Videos:2025-11-06:Backpack │                                                       2 (  0.8%)
   Backpack             │                                                       2 (  0.8%)
   Folder               │                                                       2 (  0.8%)
   Videos:2025-11-06:Duffel Bag │                                                       2 (  0.8%)
   Duffel Bag           │                                                       2 (  0.8%)
   Videos:2025-11-06:Coat Jacket │                                                       1 (  0.4%)
   Videos:2025-11-06:Folder │                               
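
The distribution above mixes plain category namespaces with names that appear to follow a `videos:<date>:<category>` pattern, plus an unnamed row that is presumably the default (empty-string) namespace. The sketch below shows one way to merge those into per-category totals. The lowercase namespace strings and the pattern itself are assumptions inferred from the displayed output, and `namespace_count` and `counts_by_category` are hypothetical helpers, not part of the engine.

```python
from collections import defaultdict


def namespace_count(ns_info):
    """Read a vector count from an int, a dict, or an object exposing .vector_count."""
    if isinstance(ns_info, dict):
        return int(ns_info.get("vector_count", 0))
    return int(getattr(ns_info, "vector_count", ns_info))


def counts_by_category(namespaces):
    """Sum vector counts per category, ignoring any 'videos:<date>:' prefix (assumed layout)."""
    totals = defaultdict(int)
    for name, ns_info in namespaces.items():
        # The category is taken to be the last colon-separated field
        category = name.rsplit(":", 1)[-1] or "(default)"
        totals[category] += namespace_count(ns_info)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Illustrative subset of the counts shown in the output above
sample = {
    "": 235,
    "coat_jacket": 3,
    "videos:2025-11-06:backpack": 2,
    "backpack": 2,
    "videos:2025-11-06:coat_jacket": 1,
}
print(counts_by_category(sample))
```

Labelling the empty namespace explicitly also avoids the blank first row seen in the bar chart.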

In [None]:
# Single search query with optional namespace filtering
query = input("🔍 Enter your search query: ")

# Optional: Search in specific object namespace
if use_object_detection:
    print("\n📁 Available namespaces:")
    print("   - backpack, bag, duffel_bag")
    print("   - laptop, tablet")
    print("   - helmet, bottle, folder, umbrella")
    print("   - coat_jacket, suitcase_luggage")
    print("   - (leave empty to search all)")

    namespace_filter = input("\nFilter by namespace (optional): ").strip().lower()
else:
    namespace_filter = ""

print(f"\nSearching for: '{query}'...")
if namespace_filter:
    print(f"Namespace filter: {namespace_filter}")
print("=" * 60)

# Perform search with namespace if specified
if namespace_filter and use_object_detection:
    # Search in specific namespace using Pinecone directly
    query_embedding = engine.embedding_generator.encode_query(query)
    search_results = engine.pinecone_manager.query(
        query_vector=query_embedding,
        top_k=5,
        namespace=namespace_filter,
        include_metadata=True
    )

    # Format results
    results = []
    for result in search_results:
        formatted_result = {
            "timestamp": result.timestamp,
            "caption": result.caption,
            "similarity_score": result.score,
            "frame_id": result.frame_id,
            "video_name": result.video_name,
            "time_formatted": engine._format_timestamp(result.timestamp)
        }
        results.append(formatted_result)
else:
    # Standard search across all namespaces
    results = engine.search(
        query=query,
        top_k=5,
        similarity_threshold=0.5
    )

if results:
    print(f"\n✅ Found {len(results)} results:\n")

    for i, result in enumerate(results, 1):
        print(f"{i}. ⏱️ Timestamp: {result['time_formatted']}")
        print(f"   📝 Caption: {result['caption']}")
        print(f"   📊 Confidence: {result['similarity_score']:.1%}")
        print(f"   🎥 Video: {result['video_name']}")
        print()
else:
    print("\n❌ No results found. Try:")
    print("   - Different search terms")
    print("   - More general queries")
    print("   - Lowering the similarity threshold")
    if namespace_filter:
        print(f"   - Searching without namespace filter (currently: {namespace_filter})")

🔍 Enter your search query: bag

📁 Available namespaces:
   - backpack, bag, duffel_bag
   - laptop, tablet
   - helmet, bottle, folder, umbrella
   - coat_jacket, suitcase_luggage
   - (leave empty to search all)

Filter by namespace (optional): backpack

Searching for: 'bag'...
Namespace filter: backpack

✅ Found 2 results:

1. ⏱️ Timestamp: 00:03.80
   📝 Caption: Backpack: An A person with a backpack on their back.
   📊 Confidence: 81.1%
   🎥 Video: test

2. ⏱️ Timestamp: 00:00.03
   📝 Caption: Backpack: A The back of a backpack on a green field.
   📊 Confidence: 80.0%
   🎥 Video: test



## Step 6c: Date-Based Search

Search videos from specific dates or date ranges for faster, more targeted results.
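
Because the date filters are typed in by hand, it can help to validate them before querying. The following is a minimal sketch, assuming dates are exchanged as zero-padded `YYYY-MM-DD` strings as in the prompts of the next cell. `parse_date_range` is a hypothetical helper and is not part of the engine.

```python
from datetime import datetime


def parse_date_range(start_text: str, end_text: str) -> tuple[str, str]:
    """Validate two YYYY-MM-DD strings and return them in chronological order."""
    # strptime raises ValueError for malformed or impossible dates
    start = datetime.strptime(start_text.strip(), "%Y-%m-%d").date()
    end = datetime.strptime(end_text.strip(), "%Y-%m-%d").date()
    if start > end:
        start, end = end, start  # Tolerate a reversed range
    return start.isoformat(), end.isoformat()


print(parse_date_range("2025-11-06", "2025-11-01"))

try:
    parse_date_range("2025-13-01", "2025-11-06")
except ValueError as err:
    print(f"Rejected: {err}")
```

Returning ISO strings keeps the values in the same form that the prompts ask the user to type.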


In [None]:
# Date-based search
print("🔍 DATE-BASED VIDEO SEARCH")
print("=" * 60)

# Show available dates
available_dates = engine.get_available_dates() or []  # Guard against a None return

if available_dates:
    print(f"\n📅 Videos available for {len(available_dates)} dates:")
    for d in available_dates[:10]:  # Show first 10
        print(f"   - {d}")
    if len(available_dates) > 10:
        print(f"   ... and {len(available_dates) - 10} more dates")
else:
    print("\n⚠️ No videos with dates found in index")
    print("   Make sure you processed videos with date information")

# Get search query
print("\n" + "=" * 60)
query = input("Enter your search query: ")

# Date filtering options
print("\n📆 Date Filter Options:")
print("1. Search specific date (YYYY-MM-DD)")
print("2. Search date range (YYYY-MM-DD to YYYY-MM-DD)")
print("3. Search all dates (no filter)")

filter_option = input("\nSelect option (1/2/3, default=3): ").strip() or "3"

date_filter = None
start_date = None
end_date = None

if filter_option == "1":
    # Single date
    date_input = input("Enter date (YYYY-MM-DD): ").strip()
    if date_input in available_dates:
        date_filter = date_input
        print(f"✅ Searching only videos from {date_filter}")
    else:
        print(f"⚠️ Date {date_input} not found. Searching all dates.")

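elif filter_option == "2":
    # Date range (zero-padded YYYY-MM-DD strings sort chronologically)
    start_date = input("Start date (YYYY-MM-DD): ").strip()
    end_date = input("End date (YYYY-MM-DD): ").strip()
    if not (start_date and end_date):
        print("⚠️ Both dates are required. Searching all dates.")
        start_date = end_date = None
    else:
        if start_date > end_date:
            start_date, end_date = end_date, start_date  # Keep the range in order
        print(f"✅ Searching videos from {start_date} to {end_date}")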

# Optional category filter
if use_object_detection:
    print("\n📁 Category Filter (optional):")
    print("   backpack, bag, duffel_bag, laptop, tablet,")
    print("   helmet, bottle, folder, umbrella, coat_jacket, suitcase_luggage")
    namespace_filter = input("Filter by category (or press Enter to skip): ").strip().lower() or None
else:
    namespace_filter = None

print("\n" + "=" * 60)
print(f"Searching for: '{query}'")
if date_filter:
    print(f"Date: {date_filter}")
elif start_date and end_date:
    print(f"Date range: {start_date} to {end_date}")
if namespace_filter:
    print(f"Category: {namespace_filter}")
print("=" * 60)

# Perform search
try:
    if start_date and end_date:
        # Date range search
        results = engine.search_by_date_range(
            query=query,
            start_date=start_date,
            end_date=end_date,
            top_k=10,
            namespace_filter=namespace_filter
        )
    else:
        # Single date or no date filter
        results = engine.search(
            query=query,
            top_k=10,
            date_filter=date_filter,
            namespace_filter=namespace_filter
        )

    if results:
        print(f"\n✅ Found {len(results)} results:\n")

        for i, result in enumerate(results, 1):
            video_date_display = result.get('video_date', 'unknown')
            print(f"{i}. 📅 {video_date_display} | ⏱️ {result['time_formatted']}")
            print(f"   📝 {result['caption']}")
            print(f"   📊 Confidence: {result['similarity_score']:.1%}")
            print(f"   🎥 Video: {result['video_name']}")
            print()
    else:
        print("\n❌ No results found.")
        print("\nTry:")
        print("   - Different search terms")
        print("   - Broader date range")
        print("   - Removing filters")

except Exception as e:
    print(f"\n❌ Search error: {e}")
    import traceback
    traceback.print_exc()


🔍 DATE-BASED VIDEO SEARCH

📅 Videos available for 1 dates:
   - 2025-11-06



## Step 7: Batch Search (Multiple Queries)

Run several queries in one call and compare the top hits for each.
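
As a rough sketch of how batch output might be summarised, the helper below picks the highest-scoring hit for each query. It assumes the batch call returns a dict mapping each query to a list of result dicts with a `similarity_score` key, which is how the next cell consumes it. `best_hit_per_query` is a hypothetical helper and the sample data is illustrative only.

```python
def best_hit_per_query(batch_results):
    """Map each query to its highest-scoring result dict, or None if it had no hits."""
    return {
        query: max(results, key=lambda r: r["similarity_score"], default=None)
        for query, results in batch_results.items()
    }


# Illustrative data shaped like the search results printed in this notebook
sample_batch = {
    "bag": [
        {"time_formatted": "00:00.03", "similarity_score": 0.800},
        {"time_formatted": "00:03.80", "similarity_score": 0.811},
    ],
    "car driving": [],
}

for query, hit in best_hit_per_query(sample_batch).items():
    label = f"{hit['time_formatted']} ({hit['similarity_score']:.0%})" if hit else "no results"
    print(f"{query}: {label}")
```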

In [None]:
# Define multiple queries
queries = [
    "person walking",
    "someone sitting",
    "black bag",
    "outdoor scene",
    "person talking"
]

print("🔍 Running batch search...\n")
print("=" * 60)

batch_results = engine.batch_search(queries, top_k=3)

for query, results in batch_results.items():
    print(f"\n📌 Query: '{query}'")
    print(f"   Found {len(results)} results")

    if results:
        for result in results[:2]:  # Show top 2
            print(f"   └─ {result['time_formatted']} - {result['caption'][:50]}... ({result['similarity_score']:.0%})")
    else:
        print("   └─ No results")

print("\n" + "=" * 60)

## Step 8: Advanced Search with Filters

Search with additional filters:
- Filter by specific video
- Search within time range
- Adjust confidence threshold
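
To make the three filters concrete, here is a minimal sketch that applies them client-side to a list of result dicts. It assumes each result carries `timestamp` (seconds), `similarity_score`, and `video_name` keys, matching the result dicts built earlier in this notebook. `filter_results` is a hypothetical helper; the engine may apply these filters differently, for example inside the Pinecone query itself.

```python
def filter_results(results, min_score=0.0, time_window=None, video_name=None):
    """Keep results that pass the score, time-window, and video-name filters."""
    kept = []
    for r in results:
        if r["similarity_score"] < min_score:
            continue  # Below the confidence threshold
        if time_window is not None:
            start, end = time_window
            if not (start <= r["timestamp"] <= end):
                continue  # Outside the requested time range
        if video_name is not None and r["video_name"] != video_name:
            continue  # From a different video
        kept.append(r)
    return kept


# Illustrative data only
hits = [
    {"timestamp": 0.03, "similarity_score": 0.80, "video_name": "test"},
    {"timestamp": 3.80, "similarity_score": 0.81, "video_name": "test"},
    {"timestamp": 12.0, "similarity_score": 0.35, "video_name": "test2"},
]
print(filter_results(hits, min_score=0.4, time_window=(1.0, 10.0)))
```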

In [None]:
# Advanced search example
query = input("Enter search query: ")

# Optional: Filter by time window (in seconds)
use_time_filter = input("Filter by time range? (y/n): ").lower() == 'y'

time_window = None
if use_time_filter:
    start_time = float(input("Start time (seconds): "))
    end_time = float(input("End time (seconds): "))
    # Order the bounds so a reversed entry still yields a valid window
    time_window = (min(start_time, end_time), max(start_time, end_time))

# Optional: Filter by video name
video_filter = None
if 'processed_video_name' in locals():
    filter_video = input(f"Search only in '{processed_video_name}'? (y/n): ").lower() == 'y'
    if filter_video:
        video_filter = processed_video_name

# Perform search
print("\n🔍 Searching with filters...")
results = engine.search(
    query=query,
    top_k=10,
    similarity_threshold=0.4,  # Lower threshold for more results
    video_filter=video_filter,
    time_window=time_window
)

if results:
    print(f"\n✅ Found {len(results)} results:\n")
    for i, result in enumerate(results, 1):
        print(f"{i}. {result['time_formatted']} - {result['caption'][:60]}... ({result['similarity_score']:.1%})")
else:
    print("\n❌ No results found. Try a lower similarity threshold or fewer filters.")

## Step 9: Interactive Search Interface


In [None]:
print("🎯 INTERACTIVE VIDEO SEARCH")
print("=" * 60)
print("Enter your search queries (type 'quit' to exit)\n")

while True:
    query = input("\n🔍 Search: ").strip()

    if query.lower() in ['quit', 'exit', 'q']:
        print("\n👋 Goodbye!")
        break

    if not query:
        continue

    results = engine.search(query, top_k=5)

    if results:
        print(f"\n✅ Found {len(results)} results:")
        for i, result in enumerate(results, 1):
            score_emoji = "🟢" if result['similarity_score'] > 0.7 else "🟡" if result['similarity_score'] > 0.5 else "🟠"
            print(f"\n{i}. {score_emoji} {result['time_formatted']} ({result['similarity_score']:.0%})")
            print(f"   {result['caption']}")
    else:
        print("\n❌ No results found. Try a different query.")

## 🔧 Troubleshooting: Verify Pinecone Upload

If vectors aren't showing up in Pinecone, use this diagnostic cell.

In [None]:
# Diagnostic script to verify Pinecone upload
print("🔍 PINECONE UPLOAD DIAGNOSTICS")
print("=" * 80)

# 1. Check Pinecone connection
try:
    stats = engine.get_index_stats()
    print(f"\n✅ Connected to Pinecone index: {stats['index_name']}")
    print(f"   Total vectors: {stats['total_vectors']:,}")
    print(f"   Dimension: {stats['dimension']}")
    print(f"   Metric: {stats['metric']}")

    # 2. Check namespaces
    if 'namespaces' in stats and stats['namespaces']:
        print(f"\n📁 Namespaces found: {len(stats['namespaces'])}")
        for ns, info in stats['namespaces'].items():
            count = info.vector_count if hasattr(info, 'vector_count') else info
            print(f"   - {ns}: {count} vectors")
    else:
        print("\n⚠️  No namespaces found in index")
        print("   This means no vectors have been uploaded yet, OR")
        print("   vectors were uploaded to default namespace")

    # 3. Try querying default namespace
    print("\n🔎 Testing query on default namespace...")
    test_query = "backpack"
    query_emb = engine.embedding_generator.encode_query(test_query)
    results_default = engine.pinecone_manager.query(
        query_vector=query_emb,
        top_k=5,
        namespace=""  # Default namespace
    )
    print(f"   Default namespace results: {len(results_default)}")

    # 4. Try querying specific namespace (if using object detection)
    if use_object_detection:
        print("\n🔎 Testing query on 'backpack' namespace...")
        results_backpack = engine.pinecone_manager.query(
            query_vector=query_emb,
            top_k=5,
            namespace="backpack"
        )
        print(f"   Backpack namespace results: {len(results_backpack)}")

    # 5. Check if vectors were actually created
    if getattr(engine, 'processed_frames', None):
        sample = engine.processed_frames[0]
        print("\n📊 Last processing session:")
        print(f"   Embedded frames created: {len(engine.processed_frames)}")
        print(f"   Sample embedding ID: {sample.embedding_id}")
        print(f"   Sample caption: {sample.captioned_frame.caption[:70]}...")
        print(f"   Embedding dimension: {len(sample.embedding)}")
        print(f"   Expected dimension: {engine.embedding_generator.embedding_dim}")
    else:
        print("\n⚠️  No processed frames found in engine")

    # 6. Recommendations
    print("\n💡 RECOMMENDATIONS:")
    if stats['total_vectors'] == 0:
        print("   ❌ No vectors in Pinecone index")
        print("   → Check if upload actually succeeded (look for error messages above)")
        print("   → Verify Pinecone API key and host are correct")
        print("   → Try re-running the video processing step")
    else:
        print("   ✅ Vectors found in index!")
        print(f"   → You have {stats['total_vectors']} vectors")
        print("   → Try searching with the cells above")

except Exception as e:
    print(f"\n❌ Error during diagnostics: {e}")
    import traceback
    traceback.print_exc()

print("\n" + "=" * 80)