Option for caching result of transcription pass #115

Open
JanX2 opened this issue Dec 4, 2016 · 2 comments

Comments

@JanX2

JanX2 commented Dec 4, 2016

I currently use Gentle to align existing transcripts to audio. Specifically, I want to align LibriVox recordings to Project Gutenberg source.

The main issue I am facing is that the two versions occasionally differ. This can go as far as a paragraph or other segment of the text missing from the recording. When that happens, I edit the transcript to match the recording and rerun Gentle, which takes a lot of CPU and wall-clock time.

For these kinds of scenarios, it would make things a great deal easier if the result of the audio transcription pass could be cached, since the audio doesn’t change.

One way of achieving this would be to add the raw Kaldi audio transcript to the ZIP/output in serialized form. This way, it can optionally be supplied together with the audio by the user.
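
A rough sketch of what that could look like, using only the Python standard library. The member name `transcript.json`, the JSON layout, and both helper functions are assumptions made up for illustration; none of this is Gentle's actual output format or API.

```python
import json
import zipfile

# Hypothetical member name; Gentle's real ZIP layout may differ.
TRANSCRIPT_MEMBER = "transcript.json"


def add_transcript_to_zip(zip_path, transcript):
    """Append the serialized transcription result to an output ZIP.

    Mode "a" appends to an existing archive or creates a new one.
    """
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(TRANSCRIPT_MEMBER, json.dumps(transcript))


def read_transcript_from_zip(zip_path):
    """Return the stored transcript, or None if the ZIP does not contain one."""
    with zipfile.ZipFile(zip_path) as zf:
        if TRANSCRIPT_MEMBER not in zf.namelist():
            return None
        return json.loads(zf.read(TRANSCRIPT_MEMBER).decode("utf-8"))
```

A later run could call `read_transcript_from_zip` on the uploaded archive and skip the transcription pass whenever it returns something other than `None`.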

Edit: I just realized that the results are already cached in ~/.gentle/webdata. What about hashing the file name and checking that against the cache? On a hit, hash the audio data and check whether there is an entry for the current version of the language model or Gentle. Use that, if available.
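
The two-step lookup could be sketched roughly as follows. The directory layout, the function names, and the `model_version` string are all hypothetical; this does not reflect how ~/.gentle/webdata is actually organized. It also assumes `model_version` is safe to use in a file name.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file contents in chunks so long recordings fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_key_for_name(audio_path):
    """Cheap first-level key: hash of the file name only."""
    return hashlib.sha256(Path(audio_path).name.encode("utf-8")).hexdigest()


def entry_path(cache_dir, audio_path, model_version):
    """Second-level key: hash of the audio data plus the model/Gentle version."""
    audio_hash = sha256_of_file(audio_path)
    name_dir = Path(cache_dir) / cache_key_for_name(audio_path)
    return name_dir / f"{audio_hash}-{model_version}.json"


def lookup_transcription(cache_dir, audio_path, model_version):
    """Return the cached transcript, or None on a miss."""
    name_dir = Path(cache_dir) / cache_key_for_name(audio_path)
    if not name_dir.is_dir():
        return None  # file name never seen: skip hashing the audio entirely
    entry = entry_path(cache_dir, audio_path, model_version)
    if not entry.is_file():
        return None  # audio changed, or cached under another version
    return json.loads(entry.read_text(encoding="utf-8"))


def store_transcription(cache_dir, audio_path, model_version, transcript):
    """Save a transcript so that later lookups for the same audio can hit."""
    entry = entry_path(cache_dir, audio_path, model_version)
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps(transcript), encoding="utf-8")
    return entry
```

The file-name hash keeps a miss cheap, because the audio data is only read and hashed when the name has been seen before. Keying the entry on both the audio hash and the version means a model upgrade or a re-encoded file falls through to a fresh transcription.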

@strob
Contributor

strob commented Dec 5, 2016

Hi! Thanks for writing. Sounds like an interesting project you're working on.

There's been some interest (#81, #99) in an API that exposes partial alignment (i.e., not only caching the audio file, but re-running only certain time regions). The code in gentle/multipass.py shows what the basic approach would look like.

In my experience, the upload/encoding of the audio file takes negligible time compared to the alignment, so I don't think you'll get a big speed boost unless/until we implement a partial alignment API. I would gladly support changes to Gentle's API so that it can be used for a "transcription correction" interface.

@natelawrence

With apologies for resurrecting this thread: when iterating on a transcript for a long audio file, the redundant storage of the transcoded audio is a vexing issue.


I would very much like to see the concept of a library of media files/transcription projects added to Gentle, so that the transcoded media is stored once and a history of alignments can be associated with each piece of media.
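
As a loose illustration of that idea, here is a minimal in-memory sketch. Every class and field name is hypothetical and not part of Gentle; a real version would need persistent storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlignmentRun:
    """One alignment attempt: the transcript used and the result it produced."""
    transcript_text: str
    result: dict


@dataclass
class MediaItem:
    """A piece of media, transcoded and stored once, with its alignment history."""
    media_hash: str
    transcoded_path: str
    runs: List[AlignmentRun] = field(default_factory=list)


class MediaLibrary:
    """Maps a content hash to a single MediaItem so audio is never duplicated."""

    def __init__(self):
        self._items: Dict[str, MediaItem] = {}

    def add_media(self, media_hash, transcoded_path):
        # Reuse the existing item if this media was already added.
        if media_hash not in self._items:
            self._items[media_hash] = MediaItem(media_hash, transcoded_path)
        return self._items[media_hash]

    def record_alignment(self, media_hash, transcript_text, result):
        run = AlignmentRun(transcript_text, result)
        self._items[media_hash].runs.append(run)
        return run

    def history(self, media_hash):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._items[media_hash].runs)
```

Each edited transcript then adds one `AlignmentRun` to the existing item instead of creating another copy of the transcoded audio.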


Diverging even further from the topic of this thread, Gentle could also add the concept of chapters/playlists/series, so that each chapter in a book, episode in a podcast, song in an album, etc. could be ordered appropriately and linked together; when one piece finishes playing, the next would begin automatically.
