Open Source auto-bleeper for podcasts and songs #381
Labels
AI/ML
Artificial Intelligence and Machine Learning. Including, but not limited to, creating Skynet.
APIs/Backend
Like getting feature requests from the frontend team? Look no further!
Extension/Plugin/Add-on
Extend a product you enjoy, and make it even better!
Intermediate
Projects that require a medium level of understanding. Doesn't require much prior knowledge.
Mobile app
Ideas that will result in a mobile application.
Much work
This project takes a lot of time to complete. (ETA several weeks+)
Web app
Applications on the web. Perhaps with React? Or Vue? Or Angular?
Project description
Right now my kids are at the age where they repeat anything they hear, including swear words. Since there are already plenty of services that transcribe audio into text with reasonable accuracy, I thought it might be useful to combine speech-to-text services with popular media player APIs to automatically "bleep" out swear words in a given audio stream or file.
Relevant Technology
There are many speech-to-text services that (for a subscription or, sometimes, a per-file fee) can transcribe an audio file and return a JSON or TXT file containing the transcribed words and their timestamps.
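As a rough sketch of the first step, the Python below scans a word-level transcript for words on a block list and returns the time ranges to bleep. It assumes a hypothetical JSON shape, `{"words": [{"word": ..., "start": ..., "end": ...}]}` with times in seconds; every real service uses its own field names, so the parsing would need adapting. The `padding` parameter (also an assumption) widens each range slightly because word timestamps are rarely exact, and overlapping ranges are merged into one.

```python
import json
import string


def find_bleep_intervals(transcript_json, banned_words, padding=0.05):
    """Return merged (start, end) ranges, in seconds, covering banned words.

    Assumes a hypothetical transcript shape:
    {"words": [{"word": "hello", "start": 0.0, "end": 0.4}, ...]}
    """
    data = json.loads(transcript_json)
    banned = {w.lower() for w in banned_words}
    hits = []
    for item in data["words"]:
        # Normalize "Heck!" -> "heck" before comparing against the block list
        token = item["word"].lower().strip(string.punctuation)
        if token in banned:
            hits.append((max(0.0, item["start"] - padding), item["end"] + padding))
    # Merge overlapping ranges so back-to-back words become one continuous bleep
    hits.sort()
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

The returned ranges are the only input either bleeping approach needs, so the transcription step stays independent of the media player.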
For "bleeping" the naughty words, we could take one of two approaches. If the media player streaming the audio supports downloading the audio file and storing it locally, we could use an audio processing tool like FFmpeg to edit the file directly, replacing the audio at the exact timestamps of each naughty word with a generated tone and effectively "bleeping" it out.
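A minimal sketch of the FFmpeg route, assuming Python and an `ffmpeg` binary on the PATH. The hypothetical helper `build_bleep_command` only builds the argument list: it mutes the original audio inside each interval with the `volume` filter's `enable` timeline expression, generates a tone with the `lavfi` `sine` source, silences that tone outside the intervals, and mixes the two with `amix`. The 1 kHz default tone is an arbitrary choice, and the `normalize=0` option of `amix` (which keeps the original loudness) is, as far as I know, only available in FFmpeg 4.4 or newer, so both are worth checking against the FFmpeg filter documentation.

```python
import subprocess


def build_bleep_command(input_path, output_path, intervals, tone_hz=1000):
    """Build an ffmpeg argv that replaces each (start, end) range with a tone."""
    if not intervals:
        raise ValueError("no intervals to bleep")
    # Expression that is non-zero while t is inside any bleep interval
    inside = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in intervals)
    filter_graph = (
        # Silence the original audio during the bleeped words
        f"[0:a]volume=0:enable='{inside}'[muted];"
        # Silence the generated tone everywhere else
        f"[1:a]volume=0:enable='not({inside})'[tone];"
        # Mix both; stop when the original ends (the sine source never ends)
        "[muted][tone]amix=inputs=2:duration=first:normalize=0[out]"
    )
    return [
        "ffmpeg", "-y",
        "-i", input_path,
        "-f", "lavfi", "-i", f"sine=frequency={tone_hz}",
        "-filter_complex", filter_graph,
        "-map", "[out]",
        output_path,
    ]


def bleep_file(input_path, output_path, intervals):
    # Passing a list (no shell) means the single quotes reach ffmpeg's
    # filtergraph parser as-is, which protects the commas inside between()
    subprocess.run(build_bleep_command(input_path, output_path, intervals), check=True)
```

For example, `bleep_file("episode.mp3", "episode_clean.mp3", [(12.4, 12.9)])` (hypothetical file names) would write a copy with a tone over 12.4 s to 12.9 s.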
Piping audio into a speech-to-text service would take more work for streaming services like Spotify. For media players that don't support downloading the file locally (or that, like Spotify, encrypt local downloads), we can take a different approach. If the player has an accessible API for controlling playback volume (Spotify does), we could programmatically call that API to read the current playback volume, set the volume to zero for the duration of the bleeped word, and then restore the original volume once the word has passed.
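For the streaming route, here is a rough Python sketch against the Spotify Web API's "Get Playback State" (`GET /v1/me/player`) and "Set Playback Volume" (`PUT /v1/me/player/volume`) endpoints. It assumes you already have an OAuth access token with the `user-read-playback-state` and `user-modify-playback-state` scopes, that the intervals come from a transcript of the track currently playing, and that the account and device allow remote playback control (Spotify's player endpoints require Premium). The helper names are hypothetical. Network latency and `time.sleep` drift mean the mutes will only be approximate, so padded intervals help.

```python
import json
import time
import urllib.request

PLAYER_URL = "https://api.spotify.com/v1/me/player"


def plan_mutes(progress_ms, intervals):
    """Turn (start, end) track times in seconds into (delay, duration) pairs
    measured from the current playback position."""
    now = progress_ms / 1000.0
    plan = []
    for start, end in sorted(intervals):
        if end <= now:
            continue  # this word has already played
        delay = max(0.0, start - now)
        plan.append((delay, end - max(start, now)))
    return plan


def _call(method, url, token):
    # Send an empty body on PUT so a Content-Length header is included
    data = b"" if method == "PUT" else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def set_volume(token, percent):
    _call("PUT", f"{PLAYER_URL}/volume?volume_percent={percent}", token)


def bleep_live(token, intervals):
    state = _call("GET", PLAYER_URL, token)  # empty response: nothing playing
    if not state:
        raise RuntimeError("nothing is currently playing")
    # Some devices may not report a volume; this sketch assumes one is present
    original = state["device"]["volume_percent"]
    t0 = time.monotonic()  # approximate moment matching state["progress_ms"]
    try:
        for delay, duration in plan_mutes(state["progress_ms"], intervals):
            time.sleep(max(0.0, t0 + delay - time.monotonic()))
            set_volume(token, 0)
            time.sleep(max(0.0, t0 + delay + duration - time.monotonic()))
            set_volume(token, original)
    finally:
        set_volume(token, original)  # never leave the listener muted
```

Keeping `plan_mutes` free of network calls makes the timing logic easy to test; the sketch also assumes the user doesn't pause or seek mid-track, which a real version would need to detect.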
Complexity and required time
Complexity
Required time (ETA)
An initial version supporting one media player and one speech-to-text service shouldn't take long. Ideally, though, the website/extension/web app would support multiple media players and multiple speech-to-text services, which would take considerably more time.
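One way to keep the multi-service goal manageable is to put each speech-to-text service and each media player behind a small interface, so adding a new one means writing one adapter. The Python sketch below shows that shape with `typing.Protocol`; the names `Word`, `SpeechToText`, `Bleeper`, and `clean` are all hypothetical, and `source` is deliberately loose (a file path for the FFmpeg route, a track ID for a streaming API).

```python
import string
from dataclasses import dataclass
from typing import List, Protocol, Set, Tuple

Interval = Tuple[float, float]


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the track
    end: float


class SpeechToText(Protocol):
    """One adapter per transcription service."""

    def transcribe(self, source: str) -> List[Word]: ...


class Bleeper(Protocol):
    """One adapter per media player (FFmpeg file editing, Spotify volume, ...)."""

    def bleep(self, source: str, intervals: List[Interval]) -> None: ...


def clean(source: str, stt: SpeechToText, bleeper: Bleeper, banned: Set[str]) -> List[Interval]:
    """Transcribe, pick out banned words (given in lowercase), and bleep them."""
    words = stt.transcribe(source)
    intervals = [
        (w.start, w.end)
        for w in words
        if w.text.lower().strip(string.punctuation) in banned
    ]
    bleeper.bleep(source, intervals)
    return intervals
```

Because `Protocol` uses structural typing, an adapter only has to provide the right method; it doesn't need to inherit from anything.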
Categories