Roadmap #1

Open
2 of 10 tasks
qgustavor opened this issue Aug 6, 2022 · 1 comment
Comments

qgustavor commented Aug 6, 2022

  • Decide between keeping Node or switching to a Deno codebase, which would allow easier porting to a browser environment

    • It will stay a Node codebase: I'm already using the programmatic API in two Node codebases, and I tested the dnt workflow and didn't like it.
    • On the other hand, it may be a good idea to port the code to TypeScript using esbuild or tsc.
    • Deno code will be kept since it includes the only available CLI at the moment.
  • Split API into multiple small functions instead of two huge async functions

    • It would allow using the code more flexibly, without requiring the current structure where each project has its own folder containing multiple .json and .ass files.
    • Allow overriding the default fingerprinter options
    • Allow the code to be executed outside Node.js (i.e. in Deno and in browsers) by moving I/O operations outside the main logic
    • Export a base library that handles everything but I/O and a higher-level library for Node?
    • Do not call console.log inside the library
  • Find a better fingerprint storage format

    • Use protocol buffers or MessagePack to compress all those integers, and Brotli or Gzip to compress repeated tcodes?
    • Move format from [[tcode, hcode], [tcode, hcode], ...] to {[hcode]: [tcodes, tcodes]} to compress repeated hcodes and make fingerprint matching faster?
    • Store metadata along the fingerprints? If so, which metadata? In the same file or in a separate file?
    • Create a container format to hold subtitles and synchronization data? Use an extension such as .douki for it?
  • Add some info about the section to the generated .ass file, like timings and the original file name

    • I was just making a subtitle and the timing of the section was wrong.
  • Create a function to merge the synchronized subtitles with existing ones that can handle overlapping with multiple modes

    1. Put the synchronized subtitle behind the existing one by layering (which is what MX Player and forked MPV do)
    2. Move the new subtitles above the existing ones when needed
    3. Use separate styles for overlapping cases and delete lines with non-existent styles
      • Example: if a subtitle has three styles (translation, translation-overlap and karaoke), then lines with the translation style would be switched to translation-overlap when they overlap the existing subtitles and, since no karaoke-overlap style is defined, lines with the karaoke style would be removed

    Most of what is needed, including some of the ideas above, was implemented here: https://github.com/qgustavor/subtitle-tools
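
    A minimal sketch of mode 3 in JavaScript. The line shape `{ style, start, end }`, the function name and the style list argument are all assumptions for illustration, not douki's or subtitle-tools' actual API:

    ```javascript
    // Mode 3 sketch: switch overlapping lines to "<style>-overlap" when that
    // style is defined, and drop them when it is not.
    function resolveOverlaps (newLines, existingLines, definedStyles) {
      const overlaps = (a, b) => a.start < b.end && b.start < a.end
      const result = []
      for (const line of newLines) {
        if (!existingLines.some(e => overlaps(line, e))) {
          result.push(line) // no overlap: keep the line unchanged
          continue
        }
        const overlapStyle = line.style + '-overlap'
        if (definedStyles.includes(overlapStyle)) {
          result.push({ ...line, style: overlapStyle })
        }
        // no "<style>-overlap" defined (e.g. karaoke): the line is removed
      }
      return result
    }
    ```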

  • Create a function to auto-detect sections based on an existing subtitle

    • It would detect which styles are only present in isolated parts of the subtitle
    • This would make it easier to reuse existing subtitles (like when a group releases just one or two episodes and then disappears)
  • Create a CLI

    • Like the experimental Deno CLI, with two commands: generate-sync-data and generate-subtitles
    • It should create a directory where it stores its data by default, and allow that location to be overridden
    • Use a prompt-based interface if possible
    • Allow inputting videos and make the code detect sections, instead of requiring audio, timing, subtitles, and fonts to be provided separately for each section
    • Allow outputting videos instead of just subtitles and a list of fonts
  • Create a GUI

    • Same functionality as the CLI
    • Implemented as a web app that talks with a local Node.js backend (because making a real GUI application with Qt or similar is quite hard, and I prefer not bundling a browser like Electron does)
  • Create a completely client-side web app version

    • It would use a custom build of ffmpeg.js with pcm_s16le support instead of ffmpeg as a drop-in replacement
    • An alternative which would improve performance and reduce build size is using something like ebml-stream or kontainer-js to mux/demux video files and relying on WebAudio for decoding. The issue with WebAudio is that, when testing the fingerprinter on Firefox, it currently fails with "Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported."
    • Try to process videos using streams to improve performance and reduce memory usage
    • Allow people hosting that version to let users upload and share subtitles with other users
  • Make a demo in the Web App where users can upload a video and download the same video but with that song subtitled

    • Check with a lawyer if it's legal to include subtitles and metadata for other songs in the repository or if some kind of "synchronization license" is needed
    • It's probably better to use a song under Creative Commons or some other free license (like this one or the ending of this video), even if it means using an instrumental song; the subtitle might just describe which instruments are being played as a means of demonstration
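
A quick sketch of the {[hcode]: [tcodes]} restructuring proposed in the storage-format item above, which deduplicates repeated hcodes and lets matching look up all tcodes for a hash in one step (the function name is illustrative, not an existing API):

```javascript
// Convert the current [[tcode, hcode], ...] pair list into a Map keyed by
// hcode, each entry holding every tcode where that hash occurs.
function groupByHcode (pairs) {
  const byHcode = new Map()
  for (const [tcode, hcode] of pairs) {
    if (!byHcode.has(hcode)) byHcode.set(hcode, [])
    byHcode.get(hcode).push(tcode)
  }
  return byHcode
}
```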
@qgustavor (Owner, Author) commented:
About finding a better fingerprint storage format, I did some tests:

  • I got all 95 files I made using this script.
  • Then I ran those through 210 combinations of preprocessing, encoding and compression functions;
  • The best option I found was encoding fingerprint data as Uint32 values, then compressing with LZMA, resulting in a 79.44% to 83.25% size reduction over the current implementation, averaging 81.87%.
  • This format would not be easily extensible. The best combination using MessagePack consists of separating time values from fingerprints, storing all of those in a single array, then encoding as MessagePack and compressing with LZMA. That results in reductions ranging from 75.51% to 81.46%, averaging 79.49%.
  • The average file from the first combination is 6 KB; the average file from the second is 6.7 KB. That seems too big a difference just to allow easy metadata storage.

Here's my proposal:

  1. Start from the current format of [...[tcode, hcode]]
  2. Run .flat(), so [tcode, hcode, tcode, hcode...]
  3. Encode as a Uint32Array
  4. Optionally prefix metadata encoded as MessagePack
  5. Prefix metadata size (which can be zero) encoded as Uint32
  6. Compress using LZMA
  7. Prefix with DOUKI

Decoding works as follows:

  1. Check if data starts with DOUKI, reject if not, then drop those bytes
  2. Decompress using LZMA
  3. Read metadata size as Uint32
  4. Read and decode metadata using MessagePack if it exists
  5. Read tcode and hcode values stored as Uint32

Why care about adding metadata: it allows versioning, it allows changing the fingerprinter parameters if someone finds better values than the current ones in the future, and it allows adding info about the file used to generate the fingerprints.
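
The encode/decode steps above can be sketched as a round trip in Node.js. Two loudly assumed stand-ins: gzip (node:zlib) in place of LZMA, and JSON in place of MessagePack; only the framing (DOUKI magic, Uint32 metadata size prefix, flat Uint32 tcode/hcode stream) follows the proposal:

```javascript
import { gzipSync, gunzipSync } from 'node:zlib' // gzip stands in for LZMA

const MAGIC = Buffer.from('DOUKI')

function encode (pairs, metadata) {
  // Steps 1-3: flatten [[tcode, hcode], ...] and encode as Uint32 values
  const data = Buffer.from(new Uint32Array(pairs.flat()).buffer)
  // Step 4: optional metadata (JSON stands in for MessagePack)
  const meta = metadata ? Buffer.from(JSON.stringify(metadata)) : Buffer.alloc(0)
  // Step 5: prefix the metadata size (zero when absent)
  const size = Buffer.alloc(4)
  size.writeUInt32LE(meta.length)
  // Steps 6-7: compress, then prefix the magic bytes
  return Buffer.concat([MAGIC, gzipSync(Buffer.concat([size, meta, data]))])
}

function decode (buffer) {
  // Step 1: check and drop the magic bytes
  if (!buffer.subarray(0, MAGIC.length).equals(MAGIC)) {
    throw new Error('Not a DOUKI file')
  }
  // Step 2: decompress
  const payload = gunzipSync(buffer.subarray(MAGIC.length))
  // Steps 3-4: read the metadata size, then the metadata if present
  const metaSize = payload.readUInt32LE(0)
  const metadata = metaSize
    ? JSON.parse(payload.subarray(4, 4 + metaSize).toString())
    : null
  // Step 5: read the remaining Uint32 tcode/hcode values
  // (copy first so the view is byte-aligned regardless of metadata length)
  const body = payload.subarray(4 + metaSize)
  const flat = new Uint32Array(new Uint8Array(body).buffer)
  const pairs = []
  for (let i = 0; i < flat.length; i += 2) pairs.push([flat[i], flat[i + 1]])
  return { metadata, pairs }
}
```

Note the sketch writes the size prefix little-endian and inherits the platform's (practically always little-endian) byte order for the Uint32 stream; a real format would pin the endianness explicitly.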
