Transcription missing end of the file #19

agonza1 · 2020-10-21T23:35:16Z

First, I would like to thank you for setting up this module. It works great!
I am just having some issues with voice transcriptions at the end of the file. I am streaming from a local file, very similar to what you did here.
After using:

const transcribeStream = client
    .createStreamingClient()

And

fs.createReadStream(audioFile).pipe(new throttle(sampleRate)).pipe(transcribeStream);

I can get a transcript, but it is always missing the end of the transcription (e.g., I say "hello world", and the transcription returns "hello" and ends). I just uploaded an example WAV file here: example_audio.zip. This file is transcribed correctly when using batch transcription instead of aws-transcribe.

qasim9872 · 2020-10-26T10:16:37Z

Hi @agonza1

Can you view this issue for details? We had a similar issue before, and the way it was fixed is described there. You can find it here.

Aung-Myint-Thein · 2020-10-29T11:40:55Z

Hello. I used example_audio.zip and I also get only "hello".

I am trying to transcribe a file with this library too. The file I am using is the following:
example_2.zip

I am not getting any results for my file, though. Would you mind taking a look at it and suggesting whether I am missing anything? The following are the settings I used.

const sampleRate = 8000; 
...
.createStreamingClient({
        region: "ap-southeast-2",
        sampleRate,
        languageCode: "en-US",
})

fs.createReadStream(path.join(__dirname, 'file_name.wav')).pipe(new Throttle(16000)).pipe(transcribeStream);

By the way, I found a more elegant way to end the streaming of the file. I will comment in the library.

agonza1 · 2020-10-29T13:59:35Z

> Hi @agonza1
>
> Can you view this issue for details? We had a similar issue before, and the way it was fixed is described there. You can find it here.

I found a small hack that seems to solve the issue. It is similar to what you did in the issue you linked: I just concatenated 1 s of silence at the end of the file. I used something like:

await childProcessPromise.spawn(
  '/opt/bin/ffmpeg',
  [
    '-loglevel', 'error', '-i', inputTempFileName, '-vn', '-ac', '1',
    // Generate 1 s of silence and concatenate it after the input audio.
    '-filter_complex', 'aevalsrc=0:d=1[silence];[0:a][silence]concat=n=2:v=0:a=1[out]',
    '-map', '[out]', outputTempFileName,
  ],
  { env: process.env }
);

I believe the issue could come from the real-time Transcribe API: if the audio ends abruptly just after a word, the last sentence being transcribed is never returned.
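The same padding idea could also be applied inside the Node pipeline instead of through ffmpeg. The following is only a sketch, not something taken from this library: `AppendSilence` is a hypothetical name, and it assumes the stream carries signed 16-bit PCM, where silence is simply zero bytes.

```javascript
const { Transform } = require('stream');

// Pass-through stream that appends `seconds` of silence once the input ends.
// Assumption: the audio is signed 16-bit PCM, so silence is all-zero bytes.
class AppendSilence extends Transform {
  constructor({ sampleRate, seconds = 1, channels = 1 }) {
    super();
    // Two bytes per 16-bit sample, per channel.
    this.silenceBytes = Math.round(sampleRate * seconds) * channels * 2;
  }

  _transform(chunk, encoding, callback) {
    callback(null, chunk); // forward the audio unchanged
  }

  _flush(callback) {
    // Runs after the source has ended, before this stream emits 'end'.
    this.push(Buffer.alloc(this.silenceBytes));
    callback();
  }
}
```

It would sit between the file stream and the throttle, for example `fs.createReadStream(audioFile).pipe(new AppendSilence({ sampleRate })).pipe(...)`. For a WAV file the length recorded in the header would no longer match the padded data; whether that matters to the streaming API has not been checked here.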

Aung-Myint-Thein · 2020-10-29T14:45:08Z

Hi @agonza1, what sample rate are you using? Is it the sample rate of the file? Would you mind explaining why you chose the sample rate for the throttle, while the example used 2 x sample rate? I am still getting empty results or bits and pieces of wrong transcription.
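On the 2 x factor, one possible reading: if the throttle stream is configured in bytes per second (which is how the `throttle` package on npm describes its argument) and the audio is 16-bit PCM, each sample takes two bytes, so real-time playback corresponds to 2 x the sample rate. A small sketch of that arithmetic follows; the helper name and the 16-bit default are assumptions, not part of this library.

```javascript
// Bytes per second of uncompressed PCM audio.
// bitsPerSample = 16 is an assumption about the input format.
function pcmBytesPerSecond(sampleRate, { bitsPerSample = 16, channels = 1 } = {}) {
  return sampleRate * (bitsPerSample / 8) * channels;
}

console.log(pcmBytesPerSecond(8000));  // 16000
console.log(pcmBytesPerSecond(16000)); // 32000
```

Under these assumptions, an 8000 Hz mono file streams in real time at 16000 bytes per second, which matches the `Throttle(16000)` setting shown earlier in the thread.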

agonza1 · 2020-10-30T20:33:54Z

> Hi @agonza1, what sample rate are you using? Is it the sample rate of the file? Would you mind explaining why you chose the sample rate for the throttle, while the example used 2 x sample rate? I am still getting empty results or bits and pieces of wrong transcription.

In my case I had a sample rate of 16 kHz, so I didn't need to put the 2 in front. I believe 16 kHz is the maximum supported, so that's why you need to throttle. If you are not getting any transcriptions from your files, I would verify that the file is audio only, uses the right codec, and so on. You could use something like:

        const outputAudio = await childProcessPromise.spawn(
          '/opt/bin/ffprobe',
          ['-i', fileName, '-show_streams', '-select_streams', 'a', '-of', 'json', '-loglevel', 'error'],
          {env: process.env}
        )

to check it out
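To read the result of that check, the JSON that `ffprobe -show_streams -of json` prints could be reduced to the few fields that matter here. This is a sketch that assumes the command's standard output is available as a string; `summarizeAudioStreams` is a hypothetical helper, and the sample input is hand-written rather than captured from a real file.

```javascript
// Summarise the JSON printed by `ffprobe -show_streams -of json`.
function summarizeAudioStreams(ffprobeJson) {
  const { streams = [] } = JSON.parse(ffprobeJson);
  return streams
    .filter((s) => s.codec_type === 'audio')
    .map((s) => ({
      codec: s.codec_name,
      sampleRate: Number(s.sample_rate), // ffprobe reports this as a string
      channels: s.channels,
      bitsPerSample: s.bits_per_sample,
    }));
}

// Hand-written sample shaped like ffprobe output (not from a real file).
const sample = JSON.stringify({
  streams: [
    {
      codec_type: 'audio',
      codec_name: 'pcm_s16le',
      sample_rate: '16000',
      channels: 1,
      bits_per_sample: 16,
    },
  ],
});

console.log(summarizeAudioStreams(sample));
```

The sample rate and channel count reported this way can then be compared against the `sampleRate` passed to `createStreamingClient`.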
