<a href="https://colab.research.google.com/github/karen-pal/notebooks/blob/master/videogrep_workshop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Videogrep

Here's a quick notebook about using [Videogrep](https://github.com/antiboredom/videogrep) from within Google Colab.

## Install dependencies

In [None]:
!pip install videogrep
!pip install yt-dlp
!pip install vosk

## Set up a preview function for videos

In [None]:
from IPython.display import HTML
from base64 import b64encode
 
def preview(video_path, video_width=600):
    """Preview a video in IPython. From:
    https://stackoverflow.com/questions/57377185/how-play-mp4-video-in-google-colab"""

    # Read the file as bytes and embed it in the page as a base64 data URL
    with open(video_path, "rb") as f:
        video_file = f.read()
    video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"
    return HTML(f"""<video width={video_width} controls><source src="{video_url}"></video>""")
 

## Download a video to work with

Use `yt-dlp` to download a YouTube video. The option `-f 22` selects YouTube's format 22, a 1280x720 MP4 that is smaller than the higher-resolution streams; `-o shell.mp4` saves the video as `shell.mp4`; and `--write-auto-sub` downloads the video's auto-generated subtitle file.

In [None]:
!yt-dlp "https://www.youtube.com/watch?v=32GQ0zIF8nY" -f 22 -o shell.mp4 --write-auto-sub

## Print out the most common words in the video

In [None]:
!videogrep --input shell.mp4 --ngrams 1
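
To make the idea of an n-gram count concrete, here is a minimal sketch in plain Python. It is not how videogrep itself is implemented; the `count_ngrams` helper and the sample text are hypothetical and only illustrate what counting 1-word and 2-word sequences means.

```python
from collections import Counter


def count_ngrams(words, n=1):
    """Count every run of n consecutive words in a list of words."""
    # zip over n shifted copies of the list to get sliding windows of size n
    ngrams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)


# Hypothetical transcript text, standing in for the downloaded subtitles
text = "energy is energy and more energy is more"
words = text.lower().split()

# Print the three most common single words with their counts
for ngram, count in count_ngrams(words, n=1).most_common(3):
    print(ngram, count)
```

Passing `n=2` to the helper counts two-word phrases instead, which is the same idea as running videogrep with `--ngrams 2`.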

## Create a supercut

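`--search-type fragment` tells videogrep to extract only the matching words or phrases, rather than the whole sentences that contain them.

`--search` tells videogrep what word to look for. Because no `--output` is given here, the result is saved as `supercut.mp4`.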

In [None]:
!videogrep --input shell.mp4 --search-type fragment --search energy
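
Conceptually, a fragment search looks through a transcript with word-level timestamps and keeps only the time spans of the matching words. The sketch below illustrates that idea in plain Python. It is not videogrep's own code: the `transcript` data, its field names, and the `find_fragments` helper are all assumptions made up for this example.

```python
import re

# Hypothetical word-level transcript: each entry has a word and its
# start/end time in seconds. The field names are assumptions for this sketch.
transcript = [
    {"word": "solar", "start": 0.0, "end": 0.4},
    {"word": "energy", "start": 0.4, "end": 0.9},
    {"word": "is", "start": 0.9, "end": 1.0},
    {"word": "cheap", "start": 1.0, "end": 1.5},
    {"word": "energy", "start": 4.2, "end": 4.8},
]


def find_fragments(words, query):
    """Return (start, end) time spans for each word matching the query regex."""
    pattern = re.compile(query, re.IGNORECASE)
    return [(w["start"], w["end"]) for w in words if pattern.search(w["word"])]


# Each span is a short clip; joining the clips end to end gives the supercut
clips = find_fragments(transcript, "energy")
print(clips)
```

A sentence-level search would instead return the span of the whole sentence around each match, which gives longer clips.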

## Preview the video

In [None]:
preview("supercut.mp4")

## Transcribe a video

You can use the `--transcribe` option to transcribe a video. Sometimes this yields better results than YouTube's auto-generated subtitles.

In [None]:
!videogrep --input shell.mp4 --transcribe

## Create another supercut

In [None]:
!videogrep --input shell.mp4 --search-type fragment --search billion --output billion.mp4

In [None]:
preview("billion.mp4")