This project extracts subtitles from a video. First, key frames are extracted from the video; then the subtitle area of each frame image is cropped, and the text in it is recognized with OCR. Key frame extraction is based on work by Amanpreet Walia.
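The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`select_key_frames`, `crop_subtitle_area`, `recognize_text`), the frame-difference criterion for picking key frames, and the assumption that subtitles sit in the bottom quarter of the frame are all hypothetical. Only `pytesseract.image_to_string` is a real library call.

```python
import numpy as np


def select_key_frames(frames, num_frames):
    """Return indices of the frames that differ most from their predecessor.

    Assumption: "key frame" here means a large mean absolute pixel change;
    the project's own criterion may differ.
    """
    diffs = [
        float(np.mean(np.abs(frames[i].astype(np.int16) - frames[i - 1].astype(np.int16))))
        for i in range(1, len(frames))
    ]
    # Rank the differences from largest to smallest, keep the top ones,
    # and return the matching frame indices in time order.
    ranked = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return sorted(i + 1 for i in ranked[:num_frames])


def crop_subtitle_area(frame, bottom_fraction=0.25):
    """Keep only the bottom part of the frame (assumed subtitle position)."""
    height = frame.shape[0]
    top = int(height * (1.0 - bottom_fraction))
    return frame[top:, ...]


def recognize_text(image, lang="eng"):
    """Run Tesseract on a cropped image; needs pytesseract and Tesseract installed."""
    import pytesseract  # imported lazily so the helpers above work without it

    return pytesseract.image_to_string(image, lang=lang)
```

With frames read from a video (for example via OpenCV), the flow would be: pick indices with `select_key_frames`, crop each chosen frame with `crop_subtitle_area`, and pass the crop to `recognize_text`.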
- numpy (find here: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy)
- matplotlib
pip install matplotlib
- scipy
- opencv-python
- pytesseract
pip install pytesseract
Download and run the Tesseract-OCR installer, selecting the language support you want. Then edit ../site-packages/pytesseract/pytesseract.py and set tesseract_cmd to your install path:
tesseract_cmd = 'X:\\Tesseract-OCR\\tesseract.exe'
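As an alternative to editing the installed file, pytesseract also allows the path to be set from your own script through `pytesseract.pytesseract.tesseract_cmd`. The sketch below wraps that in a small helper; the name `configure_tesseract` and the existence check are illustrative additions, not part of this project.

```python
import os


def configure_tesseract(tesseract_path):
    """Point pytesseract at a Tesseract executable without editing site-packages."""
    # Fail early with a clear message if the path is wrong.
    if not os.path.isfile(tesseract_path):
        raise FileNotFoundError(f"Tesseract executable not found: {tesseract_path}")
    import pytesseract  # imported here so the path check works without it

    pytesseract.pytesseract.tesseract_cmd = tesseract_path
    return tesseract_path


# Example (replace with your install path):
# configure_tesseract(r"X:\Tesseract-OCR\tesseract.exe")
```

Setting the path at runtime survives a reinstall or upgrade of pytesseract, which would overwrite a manual change to pytesseract.py.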
λ python extract_subtitles.py <videopath> <number of frames to extract>
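For reference, a command line of this shape could be parsed as in the sketch below. How extract_subtitles.py actually reads its arguments is not shown here, so the argument names `video_path` and `num_frames` and the positive-integer check are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments: video path and number of frames."""
    parser = argparse.ArgumentParser(
        prog="extract_subtitles.py",
        description="Extract subtitles from a video using key frames and OCR.",
    )
    parser.add_argument("video_path", help="path to the input video file")
    parser.add_argument("num_frames", type=int, help="how many key frames to extract")
    args = parser.parse_args(argv)
    # Reject zero or negative frame counts with a usage error.
    if args.num_frames <= 0:
        parser.error("num_frames must be a positive integer")
    return args
```

For example, `parse_args(["movie.mp4", "20"])` yields `video_path` set to `"movie.mp4"` and `num_frames` set to `20`.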
This project is licensed under the MIT License - see the LICENSE.md file for details