layout | title | github_repo |
---|---|---|
project |
Chatbot for your favorite GIFs |
TBD |
PrepDB.ipynb
TV Show Name
│
└───Season 1
│ │───Episode 1.mp4
│ │───Episode 2.mp4
│ │ ...
│ │───Episode 24.mp4
│ │───Episode 1.srt
│ │───Episode 2.srt
│ │ ...
│ │───Episode 24.srt
│
└───Season 2
│ │ ...
│
└───Season 3
...
- No specific file naming scheme is needed. Within each season's folder, the video files and the subtitle files only have to sort alphabetically into the same episode order.
- Put all these season folders into one 'Sitcom Name' folder.
I've used .mp4 and .srt here, but other formats work too.
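As a rough illustration of the folder convention above, here is a minimal sketch of pairing each video with its subtitle file purely by alphabetical order. The function name pair_episodes and the extension sets are assumptions for this example and are not taken from PrepDB.ipynb.

```python
from pathlib import Path

# Assumed extension sets; extend them for whatever formats you actually use.
VIDEO_EXTS = {".mp4", ".mkv", ".avi"}
SUBTITLE_EXTS = {".srt"}


def pair_episodes(show_dir):
    """Return (season, video, subtitle) name tuples, paired by sorted order."""
    pairs = []
    seasons = sorted(p for p in Path(show_dir).iterdir() if p.is_dir())
    for season in seasons:
        files = sorted(f for f in season.iterdir() if f.is_file())
        videos = [f for f in files if f.suffix.lower() in VIDEO_EXTS]
        subs = [f for f in files if f.suffix.lower() in SUBTITLE_EXTS]
        # Pairing by position only works if both lists have the same length.
        if len(videos) != len(subs):
            raise ValueError(
                f"{season.name}: {len(videos)} videos but {len(subs)} subtitle files"
            )
        pairs.extend((season.name, v.name, s.name) for v, s in zip(videos, subs))
    return pairs
```

Plain alphabetical sorting puts "Episode 10" before "Episode 2", but the pairing still holds as long as videos and subtitles follow the same naming pattern, since both lists are sorted the same way.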
The first cell in PrepDB.ipynb simply reads the names of these files. The rest of the notebook reads each subtitle file, cleans it, and streams it into BigQuery line by line.
3 episodes in the GIF above were about 45,000 rows/subtitles.
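To give an idea of what "reads and cleans" can look like, here is a minimal sketch that turns the text of an .srt file into rows with start/end times in seconds. The helper names (to_seconds, parse_srt), the row layout, and the choice to strip markup tags are assumptions for this example. The cleaning in the actual notebook may differ, and the BigQuery upload is not shown.

```python
import re

# Matches SRT timestamps such as 00:01:02,250 (a dot is tolerated too).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds as a float."""
    h, m, s, ms = map(int, TIMESTAMP.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text):
    """Return a list of {'start', 'end', 'text'} rows, one per subtitle."""
    rows = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # The timing line is the one containing the '-->' separator.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->")
        # Join multi-line dialogue and strip tags such as <i>...</i>.
        dialogue = re.sub(r"<[^>]+>", "", " ".join(lines[timing + 1:])).strip()
        if dialogue:
            rows.append({
                "start": to_seconds(start),
                # Ignore any positioning hints after the end timestamp.
                "end": to_seconds(end.split()[0]),
                "text": dialogue,
            })
    return rows
```

Each row is one subtitle with its timestamps, which is the kind of record that would be streamed into a table.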
MakeGIF.ipynb
- Just set the variable selected_dialogue to the word/dialogue you want to search for.
- You will be asked which dialogue you want to base your GIF on.
- Based on your selection and its corresponding timestamps, a video is cut, the text is overlaid, and the GIF is saved.
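The first two steps above can be sketched roughly as below, using an in-memory list of rows instead of a BigQuery query. The row layout and the function names find_matches and pick_clip are assumptions for illustration. Cutting the video and overlaying the text are left out.

```python
def find_matches(rows, selected_dialogue):
    """Return the rows whose text contains the search phrase (case-insensitive)."""
    needle = selected_dialogue.casefold()
    return [row for row in rows if needle in row["text"].casefold()]


def pick_clip(matches, index):
    """Return (start, end, text) for the match the user picked."""
    row = matches[index]
    return row["start"], row["end"], row["text"]
```

The returned start and end times are what the video-cutting step would then use.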
Check out jeevz.py, where I made a chatbot to ask for GIFs!
I used this and this to make the chatbot. You should probably write your own; mine breaks easily.
These are easy enough to offer:
- You can make a GIF that includes subtitles adjacent to each other too
- Subtitles are sometimes a few seconds out of sync. Offer a simple +/- 3 seconds option for the video
- Edit the subtitle to a custom text?
Also, in BigQuery, implement a UDF for Levenshtein distance / cosine similarity so that the user need not remember the dialogue word for word.
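For reference, here is what the Levenshtein part of that idea computes, as a plain Python sketch rather than a BigQuery UDF. The helper closest and its case-insensitive comparison are assumptions for this example. A real UDF would have to be written in whatever language BigQuery accepts for UDFs.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    # Keep the shorter string as b so each row of the table stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def closest(query, candidates):
    """Return the candidate with the smallest edit distance to the query."""
    return min(candidates, key=lambda c: levenshtein(query.casefold(), c.casefold()))
```

With something like this, a half-remembered query can still be matched to the nearest stored subtitle.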
Any cloud-based solution will need the episodes to be available online so that they can be downloaded into an execution environment:
- Upload the seasons to GCS/ YouTube (Unlisted)
- Turn this into a Colab notebook (thus avoiding data cost/time in downloading the whole episode from where GIF is to be picked)
- Maybe even break each episode into small ~5MB chunks and upload them, so that a user running the notebook locally does not incur large upload/download costs.
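A minimal sketch of the chunking idea, splitting a file into fixed-size byte chunks. The function name split_file and the ".partNNNN" naming are assumptions for this example. Note that byte chunks like these are not playable on their own and must be joined back together in order. For chunks that play independently, the split would have to be done on time boundaries with a video tool instead.

```python
from pathlib import Path


def split_file(path, out_dir, chunk_size=5 * 1024 * 1024):
    """Split a file into numbered chunks of at most chunk_size bytes."""
    src = Path(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            # Zero-padded index keeps the parts in order when sorted by name.
            part = out / f"{src.name}.part{index:04d}"
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts
```

Concatenating the parts in name order reproduces the original file byte for byte.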