Code to extract Chinese hard subs from the TV series 他来了请闭眼
files_for_blog_post_part_1 Add a README for files_for_blog_post_part_1 May 29, 2017
test_frames Fix 3 mislabeled test files May 29, 2017
.gitignore Initial commit Feb 25, 2017
LICENCE Create LICENCE Sep 4, 2017
README.md Add a README May 29, 2017
main.py Remove commented debug code Jun 5, 2017

README.md

extracting-chinese-subs

This repository contains code to extract Chinese hard subs from the TV series 他来了请闭眼 (Love Me If You Dare). For further information, please see this post on my blog.

To get started, install OpenCV, Tesseract, the chi_sim data pack for Tesseract, and PyOCR. The following commands will work on Arch Linux:

sudo pacman -S opencv python-numpy tesseract tesseract-data-chi_sim
sudo pip install pyocr
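
After installing, it can help to confirm that everything is visible from Python before running main.py. The following is a minimal sketch, not part of this repository: the function name check_dependencies is hypothetical, and it assumes the OpenCV bindings are importable as cv2 and that the Tesseract binary is named tesseract on your PATH.

```python
import importlib.util
import shutil

# Python modules the README asks you to install (cv2 is OpenCV's import name).
PYTHON_MODULES = ("cv2", "numpy", "pyocr")


def check_dependencies():
    """Return a dict mapping each dependency name to True if it was found.

    This only checks that the modules can be located and that a `tesseract`
    executable is on PATH; it does not import them or run any OCR.
    """
    status = {
        name: importlib.util.find_spec(name) is not None
        for name in PYTHON_MODULES
    }
    # Assumption: the Tesseract command-line binary is called "tesseract".
    status["tesseract"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This does not verify that the chi_sim data pack is installed. Running tesseract --list-langs in a shell should list chi_sim if it is.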

Then try running ./main.py --test-all to test the extraction algorithm on all test cases. To run it on a video file, you'll need to track down a 1280x720 video of one of the 他来了请闭眼 episodes with white hard subs at the bottom, similar to the test frames.
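
To illustrate why the video needs to be 1280x720 with white subs at the bottom, here is a rough sketch of isolating near-white pixels in the bottom strip of a frame. It is not the algorithm used in main.py (see the blog post for that). The function name white_subtitle_mask, the 20% strip height, and the brightness threshold of 220 are all assumptions chosen for illustration.

```python
import numpy as np

FRAME_WIDTH, FRAME_HEIGHT = 1280, 720  # resolution stated in the README
SUBTITLE_STRIP_FRACTION = 0.2  # assumption: subs sit in the bottom 20% of the frame
WHITE_THRESHOLD = 220          # assumption: channels at or above this count as white


def white_subtitle_mask(frame):
    """Return a boolean mask of near-white pixels in the bottom strip.

    `frame` is an H x W x 3 uint8 array, such as one read by OpenCV.
    The mask has the shape of the bottom strip, not of the whole frame.
    """
    height, width = frame.shape[:2]
    if (width, height) != (FRAME_WIDTH, FRAME_HEIGHT):
        raise ValueError(f"expected a 1280x720 frame, got {width}x{height}")
    strip_height = int(round(height * SUBTITLE_STRIP_FRACTION))
    strip = frame[height - strip_height:, :, :]
    # A pixel counts as white only if all three colour channels are bright.
    return np.all(strip >= WHITE_THRESHOLD, axis=2)


if __name__ == "__main__":
    # Synthetic frame: black background with a white block where subs would be.
    frame = np.zeros((FRAME_HEIGHT, FRAME_WIDTH, 3), dtype=np.uint8)
    frame[650:680, 400:800] = 255
    mask = white_subtitle_mask(frame)
    print(mask.shape, int(mask.sum()))
```

A mask like this would still need cleaning up before being passed to Tesseract, since bright background objects also pass a plain threshold.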