Many videos on bilibili.com have subtitles, but they are hardcoded into the video. I thought of grabbing those subtitles, converting them to text, and translating the result into English. The resulting tool has rough edges, but it successfully provides translations in most scenarios.
- Screen-capture-based character recognition
- Screen region select
- Translation via the DeepL API or locally with Argos/MarianMT
- Basic UI to select region, show translation
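The region-select step boils down to turning two corner clicks into the bounding box a capture library expects. Here is a minimal illustrative sketch (the function name and the `{"left", "top", "width", "height"}` dict layout follow the convention of screen-capture libraries such as mss; this is not the project's actual code):

```python
def clicks_to_region(x1, y1, x2, y2):
    """Convert two opposite corner clicks into a capture region.

    Returns a dict in the {"left", "top", "width", "height"} layout
    that screen-capture libraries such as mss expect.
    """
    # Normalize so the user can drag in any direction.
    left, top = min(x1, x2), min(y1, y2)
    width, height = abs(x2 - x1), abs(y2 - y1)
    if width == 0 or height == 0:
        raise ValueError("Selected region must have a non-zero area")
    return {"left": left, "top": top, "width": width, "height": height}

# Example: corners dragged from bottom-right to top-left still normalize.
region = clicks_to_region(900, 700, 300, 600)
# region == {"left": 300, "top": 600, "width": 600, "height": 100}
```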
- paddle 2.4.1
- paddleocr >= 2
- Python >= 3.9
- PyTorch >= 2 (for MarianMT)
- CUDA 11.7
- cuDNN 8.4
- NVIDIA GPU with CUDA support and at least 4 GB of VRAM
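The version constraints above can be sanity-checked at startup before loading the heavy dependencies. A small illustrative helper (not part of the repo) that compares dotted version strings numerically:

```python
import sys

def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, e.g. '2.10' >= '2.4.1'.

    Splits each string into integer components so that '2.10' is treated
    as newer than '2.4.1' (a plain string comparison would get this wrong).
    """
    def parse(version: str):
        return [int(part) for part in version.split(".")]
    return parse(installed) >= parse(minimum)

# Python itself must be at least 3.9 per the requirements list.
python_ok = sys.version_info >= (3, 9)

# Example checks against the versions listed above.
assert meets_minimum("2.4.1", "2.4.1")       # exact pin is acceptable
assert not meets_minimum("1.13.0", "2.0")    # too old for PyTorch >= 2
```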
Requirement setup
git clone https://github.com/pijuskri/bilibili-caption-project.git
pip install -r requirements.txt
To run
python capture.py
variables.py contains the API settings used for translation; argos, deepl and helsinki (MarianMT) are the possible options.
Keep in mind that DeepL requires an API token. Set it via the environment variable "deepl_token".
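A minimal sketch of how the backend setting and the `deepl_token` environment variable could be wired together. The backend names mirror the options listed above, but the function names and bodies here are placeholders, not the project's real implementation:

```python
import os

def get_deepl_token() -> str:
    """Read the DeepL API token from the environment, as described above."""
    token = os.environ.get("deepl_token")
    if not token:
        raise RuntimeError("deepl backend selected but env var 'deepl_token' is not set")
    return token

def make_translator(backend: str):
    """Return a translate(text) callable for the configured backend.

    The three backends mirror the options in variables.py; the lambdas
    below are stand-ins for the real API/model calls.
    """
    if backend == "deepl":
        token = get_deepl_token()          # fail fast if the token is missing
        return lambda text: f"[deepl] {text}"      # placeholder for the API call
    if backend == "argos":
        return lambda text: f"[argos] {text}"      # placeholder for Argos Translate
    if backend == "helsinki":              # MarianMT (Helsinki-NLP models)
        return lambda text: f"[helsinki] {text}"   # placeholder for MarianMT
    raise ValueError(f"Unknown backend: {backend}")
```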
- Make a proper UI
- Find the best local translation option, ideally matching DeepL quality
- Capture video/browser tab directly instead of screen
Spent a lot of time installing everything, as is usual for CUDA. It seems you have to copy the cuDNN install into the CUDA directory, and also put zlib into the CUDA bin directory. Below are links that were instrumental in fixing everything.
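The copy step described above might look roughly like this; all paths are illustrative and must be adjusted to where your CUDA 11.7 toolkit and cuDNN 8.4 archive actually live:

```shell
# Assumed paths -- substitute your own extracted cuDNN archive and CUDA root.
CUDNN_DIR=/path/to/cudnn-8.4
CUDA_DIR=/usr/local/cuda-11.7

# Copy the cuDNN headers and libraries into the CUDA install.
sudo cp "$CUDNN_DIR"/include/cudnn*.h "$CUDA_DIR"/include/
sudo cp "$CUDNN_DIR"/lib/libcudnn*    "$CUDA_DIR"/lib64/
sudo chmod a+r "$CUDA_DIR"/include/cudnn*.h "$CUDA_DIR"/lib64/libcudnn*

# On Windows the equivalent is copying the cuDNN bin/include/lib contents
# into the CUDA install folder, and dropping the zlib DLL into CUDA's bin.
```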