High-speed subtitle synchronization tool
GSoC 2019 | CCExtractor Development
co-oCCur is a high-speed subtitle synchronization tool.
It is being developed as a Google Summer of Code (GSoC) 2019 project with CCExtractor Development.
It consists of two tools:
Tool A
Use case: Synchronization of subtitles between two versions (for example, with and without commercials) of the same audiovisual content.
It takes as input the original audiovisual content, the edited audiovisual content, and the subtitle document of the original audiovisual content.
Tool B
Use case: Synchronization of subtitles between two versions of the same audiovisual content in the absence of the original content.
It takes as input the modified audiovisual content and the subtitle document of the original audiovisual content.
This project is in its early stages and is taking baby steps towards the end goal. The available functionality will be refactored over time.
- Clone the repository from GitHub:
git clone https://github.com/sypai/co-oCCur
- Navigate to
./co_oCCur -tool [tool options] <tool specific arguments>
The parameters to be passed to co-oCCur.
[NOTE: This list might change in the future]
| Parameter | Value | Description | Required |
|---|---|---|---|
| -t | NAME: A or B | Select the tool to be used for subtitle synchronization. | Yes |
| -o | FILE: /path/to/original/audio.wav | Original audio file | Tool A: yes; Tool B: no |
| -m | FILE: /path/to/modified/audio.wav | Modified audio file | Yes |
| -s | FILE: /path/to/original/subtitle.srt | Original subtitle file | Yes |
[Restriction: Audio files must be PCM mono sampled at 16000 Hz]
CMake minimum version 3.14 is required.
Building the blocks
Running build.sh can result in:
bash: ./build.sh: Permission denied
- Give it execute permission (only possible if the file system grants read/write access)
cd co-oCCur/install
chmod +x build.sh
./build.sh
- Use CMake to build it
# Root Directory
cmake ./
make
- Audio Files
Make sure the audio is uncompressed raw PCM (16-bit signed integer), mono, sampled at 16000 Hz (enough to cover the frequency range of human speech).
Using ffmpeg you can run:
ffmpeg -i inputVideo.ts -acodec pcm_s16le -ac 1 -ar 16000 audioName.wav
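Before passing a converted file to co-oCCur, it can help to confirm that it really matches the restriction above. The following is a minimal sketch using Python's standard `wave` module; it is not part of co-oCCur, and the function name `check_wav` is hypothetical.

```python
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the file looks usable.

    Hypothetical helper, not part of co-oCCur. It checks the three
    properties the README asks for: mono, 16-bit samples, 16000 Hz.
    """
    try:
        wav = wave.open(path, "rb")
    except wave.Error as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    with wav:
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
        if wav.getsampwidth() != 2:
            bits = 8 * wav.getsampwidth()
            problems.append(f"{bits}-bit samples, expected 16-bit")
        if wav.getframerate() != 16000:
            problems.append(f"{wav.getframerate()} Hz, expected 16000 Hz")
    return problems
```

Calling `check_wav("audioName.wav")` on the output of the ffmpeg command above should return an empty list; any entries in the list describe what needs to be fixed in the conversion.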
- Subtitle Files
The input subtitle file should be a clean and proper SubRip (SRT) file.
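What counts as "clean and proper" is not spelled out here, so the sketch below is only one possible reading: cues numbered consecutively from 1, a well-formed `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, and an end time after the start time. The helper `check_srt` is hypothetical and is not part of co-oCCur.

```python
import re

# One SRT timing line, e.g. "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"^(\d{2,}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2,}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields (as strings) to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def check_srt(text):
    """Return a list of problems found in SRT text; empty means none found.

    Assumed rules (not taken from co-oCCur): consecutive cue numbers,
    well-formed timing lines, and end time later than start time.
    """
    text = text.lstrip("\ufeff").replace("\r\n", "\n")
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    problems = []
    for position, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) < 3:
            problems.append(f"cue {position}: expected index, timing and text lines")
            continue
        if lines[0].strip() != str(position):
            problems.append(f"cue {position}: index is {lines[0].strip()!r}")
        match = TIMING.match(lines[1].strip())
        if not match:
            problems.append(f"cue {position}: malformed timing line")
            continue
        fields = match.groups()
        if to_ms(*fields[4:]) <= to_ms(*fields[:4]):
            problems.append(f"cue {position}: end time is not after start time")
    return problems
```

Reading the subtitle file as text and passing it to `check_srt` gives a quick list of cues to repair before running the tool.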
./co_oCCur -t A -o ./install/TestFiles/WavAudio/example.wav -m ./install/TestFiles/WavAudio/example1.wav -s ./install/TestFiles/Subtitles/example.srt
What will this trigger?
- Tool A is used for synchronization.
- Read "example.wav" as the original audio and extract audio fingerprints from it.
- Enrich the "example.srt" file with audio fingerprint anchors at the corresponding timestamps.
- Read "example1.wav" as the modified audio file and seek fingerprints at the offsets given by the fingerprint anchor timestamps in the enriched subtitle file.
- Compare the two sets of fingerprints and detect the constant temporal offset.
- Adjust "example.srt" using the delta obtained and create a subtitle file, "example_co_oCCur.srt".
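The last two steps can be illustrated with a small sketch. It is not the project's implementation: it assumes the fingerprint matching has already produced pairs of (original time, modified time) in milliseconds, takes the median difference as the constant offset (an assumed, outlier-tolerant choice), and shifts only the timing lines of the SRT text. The names `estimate_offset_ms` and `shift_srt` are hypothetical.

```python
import re
from statistics import median

TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def estimate_offset_ms(anchor_pairs):
    """Estimate a constant offset from (original_ms, modified_ms) pairs.

    Assumption: delta = modified - original, so a negative value means
    the content appears earlier in the modified version.
    """
    deltas = [modified - original for original, modified in anchor_pairs]
    if not deltas:
        raise ValueError("no matched anchors")
    return round(median(deltas))


def shift_srt(text, delta_ms):
    """Shift every timestamp on SRT timing lines by delta_ms milliseconds."""

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        # Clamp at zero so a large negative delta cannot produce
        # negative timestamps.
        total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + delta_ms)
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    # Only lines containing "-->" are timing lines; subtitle text that
    # happens to contain a timestamp is left untouched.
    lines = [
        TIMESTAMP.sub(shift, line) if "-->" in line else line
        for line in text.splitlines(keepends=True)
    ]
    return "".join(lines)
```

For example, if the matched anchors all sit about 300 ms earlier in the modified audio, `estimate_offset_ms` returns a delta near -300, and `shift_srt(original_text, delta)` produces the text that would be written to the new subtitle file.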
GNU General Public License 3.0 (GPL-v3.0)
You may reach the CCExtractor community through the Slack channel, where most CCExtractor developers hang out.
- CCExtractor channel on Slack
We foster a welcoming and respectful community.
Any contribution to the project would be highly appreciated!