Use OpenAI's CLIP neural network to search inside YouTube videos. You can try it by running the notebook on Google Colab.
- Integrated into Hugging Face Spaces with Gradio. See demo:
- Download the YouTube video
- Extract every N-th frame
- Encode all frames using CLIP
- Encode a natural language search query using CLIP
- Find the images that best match the search query
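The frame sampling and matching steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes that frames and the query have already been encoded into embedding vectors, and that matching is done by cosine similarity, which is the usual way to compare CLIP embeddings. The function names (`sample_frame_indices`, `top_matches`) and the toy three-dimensional vectors are hypothetical stand-ins; real CLIP embeddings are much larger.

```python
import math


def sample_frame_indices(total_frames, n):
    """Return the indices of every n-th frame, starting at frame 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    return list(range(0, total_frames, n))


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_matches(frame_embeddings, query_embedding, k=3):
    """Rank frames by cosine similarity to the query.

    Returns a list of (frame_position, score) pairs, best match first.
    """
    query = normalize(query_embedding)
    scored = []
    for position, embedding in enumerate(frame_embeddings):
        frame = normalize(embedding)
        score = sum(a * b for a, b in zip(frame, query))
        scored.append((position, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Which frames of a 10-frame clip are kept when taking every 4th frame.
sampled = sample_frame_indices(total_frames=10, n=4)

# Toy stand-ins for embeddings (assumption: produced elsewhere by CLIP).
frame_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_embedding = [0.9, 0.1, 0.0]

best = top_matches(frame_embeddings, query_embedding, k=2)
for position, score in best:
    print(f"frame {position}: similarity {score:.3f}")
```

Normalizing both sides first means the ranking depends only on the direction of the vectors, not their magnitude, so a frame is judged by how closely its content aligns with the query.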
For more details, see the notebook.
Here are some example searches from this YouTube video of a car driving around San Francisco.
You can also try my other project, which searches 2M photos on Unsplash using natural language queries: