create a transcription output #106

Open
flavioribeiro opened this Issue Aug 3, 2016 · 5 comments

flavioribeiro commented Aug 3, 2016

Recently I played with @google's speech API and it seems they have a pretty accurate speech-to-text feature. I tested by extracting the audio of some @NYTimes videos using

ffmpeg -i source -c:a flac -ac 1 -sample_fmt s16 destination.flac

and sending the result to the Speech API. I got roughly 90% accuracy.
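For anyone who wants to script the extraction step, here is a minimal sketch that wraps the same ffmpeg invocation from Python. The function names `build_flac_command` and `extract_audio` are hypothetical helpers made up for illustration and are not part of snickers; the flags are copied from the command above, and ffmpeg is assumed to be on the `PATH`.

```python
import subprocess


def build_flac_command(source, destination):
    """Return the argument list for the ffmpeg command quoted above.

    Hypothetical helper: mono (-ac 1), 16-bit samples (-sample_fmt s16),
    FLAC audio codec (-c:a flac).
    """
    return [
        "ffmpeg",
        "-i", source,
        "-c:a", "flac",
        "-ac", "1",
        "-sample_fmt", "s16",
        destination,
    ]


def extract_audio(source, destination):
    """Run ffmpeg and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_flac_command(source, destination), check=True)
```

Calling `extract_audio("source.mp4", "destination.flac")` should produce the same file as the shell command. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.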

It would be a blast if we had this transcription generation as a feature of snickers.

@flavioribeiro flavioribeiro changed the title from create a subtitles output to create a transcription output Oct 25, 2016

peterbe Nov 11, 2016

I did similar experiments with IBM Bluemix, using videos of recorded presentations (not videos with high-quality narration). The results were abysmal. It only got the basic English words right, such as stop words and a few other simple words. All the words that mattered it got wrong.

My intention was to make it possible to full-text search for videos based on what was said in them, but the search engine ignores stop words anyway, so I gave up.

Can you elaborate a bit on that "90%" number and the nature & quality of the audio?

flavioribeiro Nov 17, 2016

Member

hey @peterbe so I did some tests with some nytimes videos, including some with accents, and the results were really good.

Example: http://flv.io/41857_1_02sa-elections_wg_360p.mp4

{
    "results": [{
        "alternatives": [{
            "confidence": 0.84931809,
            "transcript": "NC is only an obstacle we have to move them out of the way so we can fight the number one present yet South Africa which is drug test"
        }]
    }, {
        "alternatives": [{
            "confidence": 0.84143984,
            "transcript": " why does ANC might be experiencing its huge internal weaknesses the institutionalization of this party and its infrastructure and resources still has death"
        }]
    }]
}
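To make the response easier to inspect, here is a minimal sketch that pulls the transcript and confidence out of JSON shaped like the sample above. It assumes only the fields visible in this thread (`results`, `alternatives`, `confidence`, `transcript`); the function name `best_transcripts` and the `min_confidence` parameter are hypothetical, and a real response may carry fields not shown here.

```python
import json

# The response quoted above, embedded verbatim as sample input.
SAMPLE = """
{
    "results": [{
        "alternatives": [{
            "confidence": 0.84931809,
            "transcript": "NC is only an obstacle we have to move them out of the way so we can fight the number one present yet South Africa which is drug test"
        }]
    }, {
        "alternatives": [{
            "confidence": 0.84143984,
            "transcript": " why does ANC might be experiencing its huge internal weaknesses the institutionalization of this party and its infrastructure and resources still has death"
        }]
    }]
}
"""


def best_transcripts(response_text, min_confidence=0.0):
    """Return (transcript, confidence) for the top alternative of each result.

    Results whose best alternative falls below min_confidence are skipped.
    """
    data = json.loads(response_text)
    chunks = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Pick the alternative the service is most confident about.
        best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        confidence = best.get("confidence", 0.0)
        if confidence >= min_confidence:
            chunks.append((best.get("transcript", "").strip(), confidence))
    return chunks


for text, confidence in best_transcripts(SAMPLE):
    print(f"{confidence:.2f}  {text}")
```

A confidence threshold only filters whole chunks. It cannot flag a single wrong word inside an otherwise confident chunk, so it is a coarse quality gate at best.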

peterbe Nov 17, 2016

Hmm... I'm impressed but not impressed :)
For example the word "ANC" is a very important key word that it gets wrong.
Also, the transcript made it "still has death" when he said "still has depth" which can be a problem due to the "strength" of the word "death".

Your results with Google's Speech API are certainly better than mine from IBM Bluemix, but I'm still unsure this transcript is good enough to put in front of users.

My plan was to use the automated transcript for my search engine, "Find videos by words uttered" (to extend beyond searching metadata text), but people are more likely to type in "ANC" than "the number one".

Having said that I'm going to go back and re-investigate Google as an option for my videos with really clear and crisp sound.

Perhaps an output of this is not to really automate it but to guide and document how you'd go ahead and do it if interested. You know, to avoid snickers being too tightly bundled to vendors like Google.
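The search concern above can be illustrated with a minimal sketch: index a transcript by its non-stop-word tokens and check whether a query would match. Everything here is hypothetical and for illustration only: the tiny `STOP_WORDS` set, the `keywords` and `matches` helpers, and the all-terms-must-match rule are assumptions, not how any particular search engine works. The transcript is the first chunk from the response quoted earlier in the thread.

```python
import re

# A tiny illustrative stop-word list; real search engines ship larger ones.
STOP_WORDS = {
    "is", "an", "the", "we", "to", "of", "so", "can", "which",
    "only", "have", "them", "out", "yet",
}

TRANSCRIPT = (
    "NC is only an obstacle we have to move them out of the way so we can "
    "fight the number one present yet South Africa which is drug test"
)


def keywords(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {word for word in words if word not in STOP_WORDS}


def matches(query, transcript):
    """True if every non-stop-word query term appears in the transcript."""
    terms = keywords(query)
    return bool(terms) and terms <= keywords(transcript)


print(matches("ANC", TRANSCRIPT))           # the key term was transcribed as "NC"
print(matches("South Africa", TRANSCRIPT))  # these words were transcribed correctly
```

Under these assumptions a query for "ANC" finds nothing in this chunk, because the recognizer produced "NC", while "South Africa" still matches. A single misheard keyword is enough to hide a video from exactly the query a user is most likely to type.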

flavioribeiro Nov 17, 2016

Member

yes @peterbe you are right. We don't plan to generate subtitles automatically with this, but to add the transcript to the videos' metadata to help with search and personalization.
