Decide between keeping a Node codebase or switching to a Deno codebase, which would make porting to a browser environment easier
It will be kept as a Node codebase: I'm already using the programmatic API in two Node codebases, and I tested the dnt workflow and didn't like it.
On the other hand, porting the code to TypeScript (compiled with esbuild or tsc) might be a good idea.
The Deno code will be kept since it includes the only CLI available at the moment.
Split the API into multiple small functions instead of two huge async functions
This would allow using the code more flexibly, without requiring the current structure where each project has its own folder containing multiple .json and .ass files.
Allow the code to be executed outside Node.js (i.e. in Deno and in browsers) by moving I/O operations outside the main logic
Export a base library that handles everything but I/O, and a higher-level library for Node?
Do not call console.log inside the library
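One way to keep I/O and logging out of the core is to pass them in as parameters. This is a minimal sketch under assumptions: `FileStore`, `MemoryStore`, `Logger` and `loadFingerprint` are hypothetical names, not part of the current codebase, and the raw file layout (little-endian Uint32 [tcode, hcode] pairs) is assumed only for illustration.

```typescript
// Hypothetical I/O abstraction: the core only depends on this interface, so
// Node (fs), Deno and browsers can each provide their own implementation.
interface FileStore {
  read(path: string): Promise<Uint8Array>;
  write(path: string, data: Uint8Array): Promise<void>;
}

// Logging is injected by the caller instead of calling console.log directly.
type Logger = (message: string) => void;

// In-memory FileStore, usable in browsers and tests.
class MemoryStore implements FileStore {
  private files = new Map<string, Uint8Array>();

  async read(path: string): Promise<Uint8Array> {
    const data = this.files.get(path);
    if (!data) throw new Error(`File not found: ${path}`);
    return data;
  }

  async write(path: string, data: Uint8Array): Promise<void> {
    this.files.set(path, data);
  }
}

// Core logic: no fs, no console. Reads raw bytes through the FileStore and
// parses them as little-endian Uint32 [tcode, hcode] pairs (assumed layout).
async function loadFingerprint(
  store: FileStore,
  path: string,
  log: Logger = () => {},
): Promise<Array<[number, number]>> {
  const bytes = await store.read(path);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const pairs: Array<[number, number]> = [];
  for (let offset = 0; offset + 8 <= bytes.byteLength; offset += 8) {
    pairs.push([view.getUint32(offset, true), view.getUint32(offset + 4, true)]);
  }
  log(`Loaded ${pairs.length} pairs from ${path}`);
  return pairs;
}
```

A Node wrapper could then implement `FileStore` on top of `fs.promises.readFile` and `writeFile` (a `Buffer` is a `Uint8Array`), so only that thin layer would be Node-specific.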
Find a better fingerprint storage format
Use Protocol Buffers or MessagePack to compactly encode all those integers, and Brotli or Gzip to compress repeated tcodes?
Move the format from [[tcode, hcode], [tcode, hcode], ...] to {[hcode]: [tcodes, tcodes]} to compress repeated hcodes and make fingerprint matching faster?
Store metadata alongside the fingerprints? If so, which metadata? In the same file or in a separate file?
Create a container format to hold subtitles and synchronization data? Use an extension such as .douki for it?
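To illustrate why the {[hcode]: [tcodes]} layout could speed up matching, here is a sketch that groups pairs by hcode and then aligns two fingerprints by offset voting. Offset voting is a common fingerprint-alignment technique assumed here, not necessarily what the current matcher does; `groupByHcode` and `bestOffset` are hypothetical names, and a `Map` stands in for the serialized object in memory.

```typescript
// Current format: a list of [tcode, hcode] pairs.
type FingerprintPairs = Array<[number, number]>;

// Grouped format: each hcode maps to every tcode where it occurs.
// Serialized, this Map would become {[hcode]: [tcodes]}.
type GroupedFingerprint = Map<number, number[]>;

function groupByHcode(pairs: FingerprintPairs): GroupedFingerprint {
  const grouped: GroupedFingerprint = new Map();
  for (const [tcode, hcode] of pairs) {
    const tcodes = grouped.get(hcode);
    if (tcodes) tcodes.push(tcode);
    else grouped.set(hcode, [tcode]);
  }
  return grouped;
}

// Offset voting: for each hcode present in both fingerprints, every pair of
// matching tcodes votes for the time offset between them. Looking hcodes up in
// a Map avoids scanning the whole second fingerprint for every hash.
function bestOffset(
  a: GroupedFingerprint,
  b: GroupedFingerprint,
): { offset: number; votes: number } | null {
  const votes = new Map<number, number>();
  a.forEach((tcodesA, hcode) => {
    const tcodesB = b.get(hcode);
    if (!tcodesB) return;
    for (const ta of tcodesA) {
      for (const tb of tcodesB) {
        votes.set(tb - ta, (votes.get(tb - ta) ?? 0) + 1);
      }
    }
  });
  // The most voted offset is the most likely alignment.
  let best: { offset: number; votes: number } | null = null;
  votes.forEach((count, offset) => {
    if (best === null || count > best.votes) best = { offset, votes: count };
  });
  return best;
}
```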
Add some info about the section to the generated .ass file, like timings and the original file name
I was just making a subtitle and the timing of the section was wrong
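A sketch of how the section info could be stored, assuming it is written as `;` comment lines right after the `[Script Info]` header (lines starting with `;` are comments in that section of SSA/ASS files). `SectionInfo`, `formatAssTime` and `addSectionInfo` are hypothetical names, times are assumed to be in milliseconds, and `\n` line endings are assumed.

```typescript
interface SectionInfo {
  sourceFile: string; // original file name
  start: number; // section start in milliseconds
  end: number; // section end in milliseconds
}

// Formats milliseconds as an ASS timestamp (H:MM:SS.CC, centisecond precision).
function formatAssTime(ms: number): string {
  const totalCs = Math.round(ms / 10);
  const cs = totalCs % 100;
  const totalSeconds = Math.floor(totalCs / 100);
  const seconds = totalSeconds % 60;
  const minutes = Math.floor(totalSeconds / 60) % 60;
  const hours = Math.floor(totalSeconds / 3600);
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${hours}:${pad(minutes)}:${pad(seconds)}.${pad(cs)}`;
}

// Inserts ";" comment lines with the section info right after [Script Info].
function addSectionInfo(ass: string, info: SectionInfo): string {
  const header = "[Script Info]";
  const index = ass.indexOf(header);
  if (index === -1) throw new Error("Missing [Script Info] section");
  const insertAt = index + header.length;
  const comments = [
    `; Source file: ${info.sourceFile}`,
    `; Section: ${formatAssTime(info.start)} to ${formatAssTime(info.end)}`,
  ];
  return ass.slice(0, insertAt) + "\n" + comments.join("\n") + ass.slice(insertAt);
}
```

Keeping the info in comments means players ignore it, while a wrong section timing can still be spotted by opening the file.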
Create a function to merge the synchronized subtitles with existing ones, handling overlaps with multiple modes:
Put the synchronized subtitles behind the existing ones using layers (which is what MX Player and forked MPV do)
Move the new subtitles above the existing ones when needed
Use separate styles for overlapping cases and delete lines with non-existent styles
Example: if a subtitle has three styles, translation, translation-overlap and karaoke, then lines with the translation style would be switched to translation-overlap when they overlap with the existing subtitles and, since no karaoke-overlap style is defined, lines with the karaoke style would be removed
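A minimal sketch of the "separate styles" mode, assuming a simplified line representation (times in milliseconds) rather than a full ASS parser; `SubtitleLine` and `mergeWithOverlapStyles` are hypothetical names. Intervals are treated as half-open, so a line ending exactly when another starts does not count as an overlap.

```typescript
interface SubtitleLine {
  start: number; // milliseconds
  end: number;
  style: string;
  text: string;
}

// Half-open intervals [start, end) overlap when each starts before the other ends.
function overlaps(a: SubtitleLine, b: SubtitleLine): boolean {
  return a.start < b.end && b.start < a.end;
}

// A synchronized line overlapping any existing line is switched to
// "<style>-overlap" if that style is defined, otherwise it is dropped.
// Lines that do not overlap keep their original style.
function mergeWithOverlapStyles(
  existing: SubtitleLine[],
  synced: SubtitleLine[],
  definedStyles: Set<string>,
): SubtitleLine[] {
  const adjusted: SubtitleLine[] = [];
  for (const line of synced) {
    if (!existing.some((other) => overlaps(line, other))) {
      adjusted.push(line);
      continue;
    }
    const overlapStyle = `${line.style}-overlap`;
    if (definedStyles.has(overlapStyle)) {
      adjusted.push({ ...line, style: overlapStyle });
    }
    // No overlap style defined: the line is removed.
  }
  // Stable sort keeps the relative order of lines with equal start times.
  return [...existing, ...adjusted].sort((x, y) => x.start - y.start);
}
```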
Create a function to auto-detect sections based on an existing subtitle
It would detect which styles are only present in isolated parts of the subtitle
This would make it easier to reuse existing subtitles (like when some group releases just one or two episodes and then disappears)
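One possible heuristic for "styles only present in isolated parts", sketched under assumptions: a style whose lines all fall inside a short window of the whole subtitle is reported as a candidate section. `detectSections` and the `maxFraction` threshold are hypothetical; the 0.2 default is arbitrary, not a tuned value.

```typescript
interface SubtitleLine {
  start: number; // milliseconds
  end: number;
  style: string;
  text: string;
}

interface SectionCandidate {
  style: string;
  start: number;
  end: number;
}

// Reports styles whose lines span at most maxFraction of the subtitle's duration.
function detectSections(lines: SubtitleLine[], maxFraction = 0.2): SectionCandidate[] {
  if (lines.length === 0) return [];
  let first = Infinity;
  let last = -Infinity;
  const ranges = new Map<string, { start: number; end: number }>();
  for (const line of lines) {
    first = Math.min(first, line.start);
    last = Math.max(last, line.end);
    const range = ranges.get(line.style);
    if (range) {
      range.start = Math.min(range.start, line.start);
      range.end = Math.max(range.end, line.end);
    } else {
      ranges.set(line.style, { start: line.start, end: line.end });
    }
  }
  const total = last - first;
  const sections: SectionCandidate[] = [];
  ranges.forEach((range, style) => {
    if (total > 0 && (range.end - range.start) / total <= maxFraction) {
      sections.push({ style, start: range.start, end: range.end });
    }
  });
  return sections.sort((a, b) => a.start - b.start);
}
```

One limitation: a style shared by both the opening and the ending would span most of the episode and be missed; splitting each style's lines at large gaps would be needed to handle that.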
Create a CLI
Like the experimental Deno CLI, with two commands: generate-sync-data and generate-subtitles
By default it should create a directory to store its data, and allow that location to be overridden
Use a prompt-based interface if possible
Allow inputting videos and have the code detect sections, instead of requiring audio, timing, subtitles and fonts to be input separately for each section
Allow outputting videos instead of just subtitles and a list of fonts
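The default data directory with an override could be resolved by a small pure function like the one below. Everything here is hypothetical: the function name, the `--data-dir` flag, the `DOUKI_DATA_DIR` variable and the `.douki` directory name are placeholders, not existing options.

```typescript
// Where the data directory location can come from.
interface DataDirSources {
  cliValue?: string; // e.g. from a hypothetical --data-dir flag
  envValue?: string; // e.g. from a hypothetical DOUKI_DATA_DIR variable
  homeDir: string;
}

// Precedence: explicit flag, then environment variable, then a default
// directory inside the user's home folder (".douki" is a placeholder name).
function resolveDataDir({ cliValue, envValue, homeDir }: DataDirSources): string {
  if (cliValue) return cliValue;
  if (envValue) return envValue;
  // Strip trailing slashes so "/home/user/" does not produce "//".
  return `${homeDir.replace(/[\\/]+$/, "")}/.douki`;
}
```

In Node, the inputs could come from `util.parseArgs`, `process.env` and `os.homedir()`, with `fs.mkdirSync(dir, { recursive: true })` creating the directory on first run; `path.join` would be the more portable choice there than string concatenation.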
Create a GUI
Same functionality as the CLI
Implemented as a web app that talks to a local Node.js backend (because making a real GUI application with Qt or similar is quite hard, and I'd prefer not to bundle a browser like Electron does)
Create a completely client-side web app version
It would use a custom build of ffmpeg.js with pcm_s16le support as a drop-in replacement for ffmpeg
An alternative, which would improve performance and reduce build size, is using something like ebml-stream or kontainer-js to mux/demux video files and relying on WebAudio for decoding. The issue with WebAudio is that, when testing the fingerprinter on Firefox, it currently fails with "Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported."
Try to process videos using streams to improve performance and reduce memory usage
Allow people hosting that version to let users upload subtitles and share them with other users
Make a demo in the Web App where users can upload a video and download the same video but with that song subtitled
Check with a lawyer whether it's legal to include subtitles and metadata for other songs in the repository, or whether some kind of "synchronization license" is needed
It is probably better to use a song under Creative Commons or another free license (like this one or the ending of this video), even if that means using an instrumental song; the subtitle could just describe which instruments are being played, as a demonstration
About finding a better fingerprint storage format, I did some tests:
I collected all 95 files I had made using this script.
Then I ran them through 210 combinations of data preprocessing, encoding and compression functions.
The best option I found was encoding the fingerprint data as Uint32 values, then compressing with LZMA, resulting in a size reduction of 79.44% to 83.25% compared to the current implementation, 81.87% on average.
This format would not be easily extendable. The best combination using MessagePack consists of separating the time values from the fingerprints, storing all of them in a single array, then encoding as MessagePack and compressing with LZMA. That results in reductions ranging from 75.51% to 81.46%, 79.49% on average.
The average file from the first combination is 6 KB; the average file from the second is 6.7 KB. That looks like too big a difference just to allow easy metadata storage.
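For reference, "separating time values from fingerprints into a single array" amounts to a column split like the one below. The function names are hypothetical, and the MessagePack and LZMA steps are left out because they need third-party libraries. Keeping the tcodes next to each other presumably helps the compressor, since consecutive tcodes tend to have close values.

```typescript
type FingerprintPairs = Array<[number, number]>;

// [[t0, h0], [t1, h1], ...] -> [t0, t1, ..., h0, h1, ...]
function splitColumns(pairs: FingerprintPairs): number[] {
  const tcodes = pairs.map(([tcode]) => tcode);
  const hcodes = pairs.map(([, hcode]) => hcode);
  return tcodes.concat(hcodes);
}

// Inverse: the first half holds the tcodes, the second half the hcodes.
function joinColumns(values: number[]): FingerprintPairs {
  if (values.length % 2 !== 0) throw new Error("Expected an even number of values");
  const half = values.length / 2;
  const pairs: FingerprintPairs = [];
  for (let i = 0; i < half; i++) pairs.push([values[i], values[half + i]]);
  return pairs;
}
```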
Here's my proposal:
Start from the current format of [...[tcode, hcode]]
Run .flat(), so [tcode, hcode, tcode, hcode...]
Encode as a Uint32Array
Optionally prefix metadata encoded as MessagePack
Prefix metadata size (which can be zero) encoded as Uint32
Compress using LZMA
Prefix with DOUKI
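The encoding steps above can be sketched as follows, with two loudly labeled substitutions: metadata is serialized as UTF-8 JSON as a stand-in for MessagePack, and compression is passed in as a function because LZMA is not built into Node, Deno or browsers. Little-endian byte order is also an assumption, since the proposal does not fix one; `encodeDouki` is a hypothetical name.

```typescript
// Compression is injected; an LZMA library would be plugged in here.
type Compress = (data: Uint8Array) => Uint8Array;

function encodeDouki(
  pairs: Array<[number, number]>,
  compress: Compress,
  metadata?: object,
): Uint8Array {
  const encoder = new TextEncoder();
  // Stand-in for MessagePack: metadata serialized as UTF-8 JSON.
  const metaBytes =
    metadata === undefined ? new Uint8Array(0) : encoder.encode(JSON.stringify(metadata));
  // Layout before compression: [metadata size][metadata][tcode, hcode, ...]
  const body = new Uint8Array(4 + metaBytes.byteLength + pairs.length * 8);
  const view = new DataView(body.buffer);
  view.setUint32(0, metaBytes.byteLength, true); // little-endian (assumed)
  body.set(metaBytes, 4);
  // Writing each pair in order yields the same sequence as pairs.flat().
  let offset = 4 + metaBytes.byteLength;
  for (const [tcode, hcode] of pairs) {
    view.setUint32(offset, tcode, true);
    view.setUint32(offset + 4, hcode, true);
    offset += 8;
  }
  const compressed = compress(body);
  // Prefix the compressed data with the DOUKI magic bytes.
  const magic = encoder.encode("DOUKI");
  const file = new Uint8Array(magic.byteLength + compressed.byteLength);
  file.set(magic, 0);
  file.set(compressed, magic.byteLength);
  return file;
}
```

While experimenting in Node, `zlib.brotliCompressSync` could fill the `compress` slot, but that swaps in a different algorithm from the LZMA the measurements above were based on.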
Decoding works as follows:
Check if data starts with DOUKI, reject if not, then drop those bytes
Decompress using LZMA
Read metadata size as Uint32
Read and decode metadata using MessagePack if it exists
Read tcode and hcode values stored as Uint32
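A matching decoder sketch, under the same assumptions as an encoder following the steps above: JSON standing in for MessagePack, an injected decompression function standing in for LZMA, and little-endian Uint32 values. `decodeDouki` is a hypothetical name; the extra length checks reject truncated or corrupted bodies instead of reading past the end.

```typescript
// Decompression is injected; an LZMA library would be plugged in here.
type Decompress = (data: Uint8Array) => Uint8Array;

interface DecodedDouki {
  metadata: unknown; // undefined when the file carries no metadata
  pairs: Array<[number, number]>;
}

function decodeDouki(file: Uint8Array, decompress: Decompress): DecodedDouki {
  // 1. Check for the DOUKI magic bytes and reject anything else.
  const magic = new TextEncoder().encode("DOUKI");
  const hasMagic =
    file.byteLength >= magic.byteLength && magic.every((byte, i) => file[i] === byte);
  if (!hasMagic) throw new Error("Not a DOUKI file");

  // 2. Decompress everything after the magic bytes.
  const body = decompress(file.subarray(magic.byteLength));
  if (body.byteLength < 4) throw new Error("Truncated DOUKI file");
  const view = new DataView(body.buffer, body.byteOffset, body.byteLength);

  // 3. Read the metadata size (little-endian Uint32, assumed).
  const metaSize = view.getUint32(0, true);
  const dataStart = 4 + metaSize;
  if (dataStart > body.byteLength || (body.byteLength - dataStart) % 8 !== 0) {
    throw new Error("Corrupted DOUKI file");
  }

  // 4. Decode metadata if present (JSON standing in for MessagePack).
  const metadata =
    metaSize > 0 ? JSON.parse(new TextDecoder().decode(body.subarray(4, dataStart))) : undefined;

  // 5. Read the remaining [tcode, hcode] Uint32 pairs.
  const pairs: Array<[number, number]> = [];
  for (let offset = dataStart; offset < body.byteLength; offset += 8) {
    pairs.push([view.getUint32(offset, true), view.getUint32(offset + 4, true)]);
  }
  return { metadata, pairs };
}
```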
Why care about adding metadata? Because it allows versioning, it allows changing the fingerprinter parameters in case someone finds better values than the current ones in the future, and it allows adding info about the file used to generate the fingerprints.
Most of what is needed, including some of the ideas above, was implemented here: https://github.com/qgustavor/subtitle-tools