Roadmap #1

Open
2 of 10 tasks
qgustavor opened this issue Aug 6, 2022 · 1 comment
Comments

qgustavor commented Aug 6, 2022

  • Decide between keeping Node or switching to a Deno codebase, which would allow easier porting to a browser environment

    • It will stay a Node codebase: I'm already using the programmatic API in two Node codebases, and I tested the dnt workflow and didn't like it.
    • On the other hand, it may be a good idea to port the code to TypeScript using esbuild or tsc.
    • Deno code will be kept since it includes the only available CLI at the moment.
  • Split API into multiple small functions instead of two huge async functions

    • It would allow using the code more flexibly, without requiring the current structure where each project has its own folder containing multiple .json and .ass files.
    • Allow overriding the default fingerprinter options
    • Allow the code to be executed outside Node.js (i.e. in Deno and in browsers) by moving I/O operations outside the main logic
    • Export a base library that handles everything but I/O and a higher-level library for Node?
    • Do not call console.log inside the library
  • Find a better fingerprint storage format

    • Use protocol buffers or MessagePack to compress all those integers, and Brotli or Gzip to compress repeated tcodes?
    • Move format from [[tcode, hcode], [tcode, hcode], ...] to {[hcode]: [tcodes, tcodes]} to compress repeated hcodes and make fingerprint matching faster?
    • Store metadata along the fingerprints? If so, which metadata? In the same file or in a separate file?
    • Create a container format to hold subtitles and synchronization data? Use an extension such as .douki for it?
  • Add some info about the section to the generated .ass file, like timings and the original file name

    • I was just making a subtitle and the timing of the section was wrong.
  • Create a function to merge the synchronized subtitles with existing ones that can handle overlapping with multiple modes

    1. Put the synchronized subtitle behind the existing one by layering (which is what MX Player and forked MPV do)
    2. Move the new subtitles above the existing ones when needed
    3. Use separate styles for overlapping cases and delete lines with non-existent styles
      • Example: if a subtitle has three styles (translation, translation-overlap and karaoke), then lines with the translation style would be switched to translation-overlap when they overlap the existing subtitles and, since no karaoke-overlap style is defined, lines with the karaoke style would be removed

    Most of what is needed, including some of the ideas above, was implemented here: https://github.com/qgustavor/subtitle-tools
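
    A minimal sketch of mode 3 in JavaScript. The line shape `{ style, start, end }`, the function name and the style list argument are all assumptions for illustration, not douki's or subtitle-tools' actual API:

    ```javascript
    // Mode 3 sketch: switch overlapping lines to "<style>-overlap" when that
    // style is defined, and drop them when it is not.
    function resolveOverlaps (newLines, existingLines, definedStyles) {
      const overlaps = (a, b) => a.start < b.end && b.start < a.end
      const result = []
      for (const line of newLines) {
        if (!existingLines.some(e => overlaps(line, e))) {
          result.push(line) // no overlap: keep the line unchanged
          continue
        }
        const overlapStyle = line.style + '-overlap'
        if (definedStyles.includes(overlapStyle)) {
          result.push({ ...line, style: overlapStyle })
        }
        // no "<style>-overlap" defined (e.g. karaoke): the line is removed
      }
      return result
    }
    ```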

  • Create a function to auto-detect sections based on an existing subtitle

    • It would detect which styles are only present in isolated parts of the subtitle
    • This would make it easier to reuse existing subtitles (like when a group releases just one or two episodes and then disappears)
  • Create a CLI

    • Like the experimental Deno CLI, with two commands: generate-sync-data and generate-subtitles
    • It should create a directory where it stores its data by default, and allow that location to be overridden
    • Use a prompt-based interface if possible
    • Allow inputting videos and make the code detect sections, instead of requiring audio, timing, subtitles, and fonts to be provided separately for each section
    • Allow outputting videos instead of just subtitles and a list of fonts
  • Create a GUI

    • Same functionality as the CLI
    • Implemented as a web app that talks with a local Node.js backend (because making a real GUI application with Qt or similar is quite hard, and I prefer not bundling a browser like Electron does)
  • Create a completely client-side web app version

    • It would use a custom build of ffmpeg.js with pcm_s16le support instead of ffmpeg as a drop-in replacement
    • An alternative which would improve performance and reduce build size is using something like ebml-stream or kontainer-js to mux/demux video files and relying on WebAudio for decoding. The issue with WebAudio is that, when testing the fingerprinter on Firefox, it currently fails with "Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported."
    • Try to process videos using streams to improve performance and reduce memory usage
    • Allow people hosting that version to let users upload and share subtitles with other users
  • Make a demo in the Web App where users can upload a video and download the same video but with that song subtitled

    • Check with a lawyer if it's legal to include subtitles and metadata for other songs in the repository or if some kind of "synchronization license" is needed
    • It's probably better to use a song under Creative Commons or some other free license (like this one or the ending of this video), even if it means using an instrumental song; the subtitle might just describe which instruments are being played as a means of demonstration
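
A quick sketch of the {[hcode]: [tcodes]} restructuring proposed in the storage-format item above, which deduplicates repeated hcodes and lets matching look up all tcodes for a hash in one step (the function name is illustrative, not an existing API):

```javascript
// Convert the current [[tcode, hcode], ...] pair list into a Map keyed by
// hcode, each entry holding every tcode where that hash occurs.
function groupByHcode (pairs) {
  const byHcode = new Map()
  for (const [tcode, hcode] of pairs) {
    if (!byHcode.has(hcode)) byHcode.set(hcode, [])
    byHcode.get(hcode).push(tcode)
  }
  return byHcode
}
```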
@qgustavor (Owner, Author) commented:
About finding a better fingerprint storage format, I did some tests:

  • I got all 95 files I made using this script.
  • Then I ran those through 210 combinations of preprocessing, encoding and compression functions;
  • The best option I found was encoding fingerprint data as Uint32 values, then compressing with LZMA, resulting in a 79.44% to 83.25% size reduction over the current implementation, averaging 81.87%.
  • This format would not be easily extensible. The best combination using MessagePack consists of separating time values from fingerprints, storing all of those in a single array, then encoding as MessagePack and compressing with LZMA. That results in reductions ranging from 75.51% to 81.46%, averaging 79.49%.
  • The average file from the first combination is 6 KB; the average file from the second is 6.7 KB. That seems too big a difference just to allow easy metadata storage.

Here's my proposal:

  1. Start from the current format of [...[tcode, hcode]]
  2. Run .flat(), so [tcode, hcode, tcode, hcode...]
  3. Encode as a Uint32Array
  4. Optionally prefix metadata encoded as MessagePack
  5. Prefix metadata size (which can be zero) encoded as Uint32
  6. Compress using LZMA
  7. Prefix with DOUKI

Decoding works as follows:

  1. Check if data starts with DOUKI, reject if not, then drop those bytes
  2. Decompress using LZMA
  3. Read metadata size as Uint32
  4. Read and decode metadata using MessagePack if it exists
  5. Read tcode and hcode values stored as Uint32

Why care about adding metadata: it allows versioning, it allows changing the fingerprinter parameters if someone finds better values than the current ones in the future, and it allows adding info about the file used to generate the fingerprints.
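
The encode/decode steps above can be sketched as a round trip in Node.js. Two loudly assumed stand-ins: gzip (node:zlib) in place of LZMA, and JSON in place of MessagePack; only the framing (DOUKI magic, Uint32 metadata size prefix, flat Uint32 tcode/hcode stream) follows the proposal:

```javascript
import { gzipSync, gunzipSync } from 'node:zlib' // gzip stands in for LZMA

const MAGIC = Buffer.from('DOUKI')

function encode (pairs, metadata) {
  // Steps 1-3: flatten [[tcode, hcode], ...] and encode as Uint32 values
  const data = Buffer.from(new Uint32Array(pairs.flat()).buffer)
  // Step 4: optional metadata (JSON stands in for MessagePack)
  const meta = metadata ? Buffer.from(JSON.stringify(metadata)) : Buffer.alloc(0)
  // Step 5: prefix the metadata size (zero when absent)
  const size = Buffer.alloc(4)
  size.writeUInt32LE(meta.length)
  // Steps 6-7: compress, then prefix the magic bytes
  return Buffer.concat([MAGIC, gzipSync(Buffer.concat([size, meta, data]))])
}

function decode (buffer) {
  // Step 1: check and drop the magic bytes
  if (!buffer.subarray(0, MAGIC.length).equals(MAGIC)) {
    throw new Error('Not a DOUKI file')
  }
  // Step 2: decompress
  const payload = gunzipSync(buffer.subarray(MAGIC.length))
  // Steps 3-4: read the metadata size, then the metadata if present
  const metaSize = payload.readUInt32LE(0)
  const metadata = metaSize
    ? JSON.parse(payload.subarray(4, 4 + metaSize).toString())
    : null
  // Step 5: read the remaining Uint32 tcode/hcode values
  // (copy first so the view is byte-aligned regardless of metadata length)
  const body = payload.subarray(4 + metaSize)
  const flat = new Uint32Array(new Uint8Array(body).buffer)
  const pairs = []
  for (let i = 0; i < flat.length; i += 2) pairs.push([flat[i], flat[i + 1]])
  return { metadata, pairs }
}
```

Note the sketch writes the size prefix little-endian and inherits the platform's (practically always little-endian) byte order for the Uint32 stream; a real format would pin the endianness explicitly.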
