create a transcription output #106

Open
flavioribeiro opened this issue Aug 3, 2016 · 5 comments
Member

@flavioribeiro flavioribeiro commented Aug 3, 2016

Recently I played with @google's speech API and it seems they have a pretty accurate speech-to-text feature. I tested by extracting the audio of some @nytimes videos using

ffmpeg -i source -c:a flac -ac 1 -sample_fmt s16 destination.flac

and sent the resulting file to the Speech API. I got ~90% accuracy.
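For anyone wanting to script the extraction step, here is a minimal Python sketch that builds the same ffmpeg invocation as an argument list. The flags are the ones from the command above; the function name `build_flac_command` and the example file names are made up for illustration, and the upload to the Speech API is left out because the thread does not show how it was done.

```python
import shlex


def build_flac_command(source, destination):
    """Return the ffmpeg argument list for mono, 16-bit FLAC extraction.

    Mirrors: ffmpeg -i source -c:a flac -ac 1 -sample_fmt s16 destination.flac
    """
    return [
        "ffmpeg",
        "-i", source,          # input video
        "-c:a", "flac",        # encode audio as FLAC
        "-ac", "1",            # downmix to a single channel
        "-sample_fmt", "s16",  # 16-bit samples
        destination,
    ]


# Hypothetical file names, used only to show the resulting command line.
command = build_flac_command("source.mp4", "destination.flac")
print(shlex.join(command))

# To actually run it (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing a list rather than a shell string avoids quoting problems when file names contain spaces.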

It would be a blast if we had this transcription generation as a feature of snickers.

@flavioribeiro flavioribeiro changed the title create a subtitles output create a transcription output Oct 25, 2016

@peterbe peterbe commented Nov 11, 2016

I did similar experiments with IBM Bluemix, using videos of recorded presentations (not videos with high-quality narration). The results were abysmal. It only got basic English words right, such as stop words and a few other simple words. It got all the words that mattered wrong.

My intention was to make it possible to full-text search videos based on what was said in them, but stop words are ignored by the search engine anyway, so I gave up.

Can you elaborate a bit on that "90%" number and the nature & quality of the audio?

Member Author

@flavioribeiro flavioribeiro commented Nov 17, 2016

hey @peterbe so I did some tests with some nytimes videos, including some with accents, and the results were really good.

Example: http://flv.io/41857_1_02sa-elections_wg_360p.mp4

{
    "results": [{
        "alternatives": [{
            "confidence": 0.84931809,
            "transcript": "NC is only an obstacle we have to move them out of the way so we can fight the number one present yet South Africa which is drug test"
        }]
    }, {
        "alternatives": [{
            "confidence": 0.84143984,
            "transcript": " why does ANC might be experiencing its huge internal weaknesses the institutionalization of this party and its infrastructure and resources still has death"
        }]
    }]
}
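A response shaped like the one above can be reduced to a single transcript plus an average confidence. The sketch below assumes only the structure visible in this comment (`results` → `alternatives` → `confidence` / `transcript`); the `summarize` helper and its `min_confidence` parameter are hypothetical, and picking the highest-confidence alternative per result is an assumption, not something the API requires.

```python
def summarize(response, min_confidence=0.0):
    """Join the best alternative of each result and average their confidence."""
    kept = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Assumption: treat the highest-confidence alternative as the best one.
        best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        if best.get("confidence", 0.0) >= min_confidence:
            kept.append(best)
    text = " ".join(alt.get("transcript", "").strip() for alt in kept)
    mean = sum(alt.get("confidence", 0.0) for alt in kept) / len(kept) if kept else 0.0
    return text, mean


# The response from this comment, copied verbatim.
response = {
    "results": [{
        "alternatives": [{
            "confidence": 0.84931809,
            "transcript": "NC is only an obstacle we have to move them out of the way so we can fight the number one present yet South Africa which is drug test"
        }]
    }, {
        "alternatives": [{
            "confidence": 0.84143984,
            "transcript": " why does ANC might be experiencing its huge internal weaknesses the institutionalization of this party and its infrastructure and resources still has death"
        }]
    }]
}

text, mean = summarize(response)
print(text)
print(round(mean, 4))
```

Note that the confidence values are the API's own estimate, not a measured word accuracy, so they say nothing about which individual words are wrong.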

@peterbe peterbe commented Nov 17, 2016

Hmm... I'm impressed but not impressed :)
For example the word "ANC" is a very important key word that it gets wrong.
Also, the transcript made it "still has death" when he said "still has depth" which can be a problem due to the "strength" of the word "death".

Your results with Google's Speech API are certainly better than mine from IBM Bluemix, but I'm still unsure this transcript is good enough to put in front of users.

My plan was to use the automated transcript for my search engine, "Find videos by words uttered" (to extend beyond searching metadata text), but people are more likely to type in "ANC" than "the number one".
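To make the search concern concrete, here is a toy word index over the two transcript segments quoted earlier in this thread. It is only a sketch: `build_index` and `search` are hypothetical helpers, the tokenizer is a simple regular expression, and a real search engine would add stop-word handling, stemming and ranking.

```python
import re
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercased word to the set of segment positions containing it."""
    index = defaultdict(set)
    for position, text in enumerate(transcripts):
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(position)
    return index


def search(index, term):
    """Return the sorted segment positions whose transcript contains the term."""
    return sorted(index.get(term.lower(), set()))


# The two segments returned by the Speech API in the example above.
segments = [
    "NC is only an obstacle we have to move them out of the way so we can fight the number one present yet South Africa which is drug test",
    " why does ANC might be experiencing its huge internal weaknesses the institutionalization of this party and its infrastructure and resources still has death",
]

index = build_index(segments)
print(search(index, "ANC"))    # only the segment where "ANC" was transcribed intact
print(search(index, "depth"))  # the spoken word was transcribed as "death"
print(search(index, "death"))
```

A query for "ANC" misses the first segment because it was transcribed as "NC", and a query for "depth" finds nothing while "death" produces a false hit.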

Having said that I'm going to go back and re-investigate Google as an option for my videos with really clear and crisp sound.

Perhaps the outcome of this should not be to automate it, but to document how you'd go about doing it if you're interested. You know, to avoid snickers being too tightly coupled to vendors like Google.

Member Author

@flavioribeiro flavioribeiro commented Nov 17, 2016

yes @peterbe you are right. We don't plan to generate subtitles automatically with this, but to add the transcript to the videos' metadata to help with search and personalization.
