Increasing accuracy on Local Build #88

saxenauts · 2016-07-29T08:29:31Z

It seems that on my local machine (4GB RAM) the alignment timings are somewhat jittered. There's an offset of 0.02 - 0.04 seconds between the Gentle server and my local build.
Compare the CSV generated on the Gentle server (https://www.dropbox.com/s/03j3mpiiuaij5nq/align_gentle.csv?dl=0) with the CSV generated on my local build (https://www.dropbox.com/s/h1o66p79lgnl8ru/align_local.csv?dl=0).

An example with a 20 - 40 ms offset:

Gentle : because 37.74 37.88
Local : because 37.72 37.88

Another example, with an offset of about 1.6 seconds:

Gentle : they're 22.26 22.46
Local : they're 20.66 20.88
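
For anyone who wants to quantify the difference across a whole file rather than by eye, here is a minimal sketch of the comparison. It assumes each alignment has already been reduced to (word, start, end) rows in the same word order; the actual column layout of the CSV files is not confirmed here, and the function name `boundary_offsets` is made up for illustration. The sample rows are the two examples quoted above.

```python
# Rows are (word, start_seconds, end_seconds), taken from the two examples above.
# In practice they would be parsed from the two CSV files (layout is an assumption).
server = [("they're", 22.26, 22.46), ("because", 37.74, 37.88)]
local = [("they're", 20.66, 20.88), ("because", 37.72, 37.88)]


def boundary_offsets(reference, candidate):
    """Pair rows by position and return (word, start_diff, end_diff) in seconds."""
    offsets = []
    for ref_row, cand_row in zip(reference, candidate):
        ref_word, ref_start, ref_end = ref_row
        cand_word, cand_start, cand_end = cand_row
        if ref_word != cand_word:
            # The two alignments no longer line up word for word.
            raise ValueError(f"word mismatch: {ref_word!r} vs {cand_word!r}")
        offsets.append(
            (ref_word, round(ref_start - cand_start, 3), round(ref_end - cand_end, 3))
        )
    return offsets


for word, start_diff, end_diff in boundary_offsets(server, local):
    print(f"{word}: start differs by {start_diff} s, end differs by {end_diff} s")
```

Sorting the result by absolute start difference would show whether the large offsets are isolated words or a systematic drift.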

I am sorry for the naive requests that follow; I have just started exploring Kaldi as a tool and have no prior experience with ASR systems.

What can I do to increase the accuracy on my local build?

I need these timestamps for a research project. Specifically, I need to segment the audio on word boundaries. Gentle was the best available tool from a developer's perspective, as I am not even a beginner in ASR and similar tools.
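
To make the segmentation goal concrete, here is a rough sketch of cutting one word out of a recording using only Python's standard `wave` module. It assumes the audio is an uncompressed PCM WAV file and that the start and end times come from an alignment like the ones above; the function name `cut_word` and any file names are hypothetical.

```python
import wave


def cut_word(src_path, dst_path, start, end):
    """Copy the samples between start and end (in seconds) into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices and clamp them to the file length.
        first = max(0, min(int(round(start * rate)), total))
        last = max(first, min(int(round(end * rate)), total))
        src.setpos(first)
        frames = src.readframes(last - first)
        with wave.open(dst_path, "wb") as dst:
            # Keep the source format so the clip plays back unchanged.
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return last - first
```

A call such as `cut_word("audio.wav", "because.wav", 37.74, 37.88)` would write the 140 ms clip for the first example. Since a 20 ms boundary error is a noticeable fraction of a clip that short, the offsets above matter for this use.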

I believe that if I rent an Amazon instance, this will not be a problem, but they are quite expensive.
Also, can anyone tell me whether there is another language model that might work better for English?
Meanwhile, I will dive into the code to understand it better.

Thanks

strob · 2016-07-29T08:33:19Z

Are you running the code locally from source, or from a DMG release? Mac or Linux?

saxenauts · 2016-07-29T08:35:21Z

Linux. Yes, locally from source.

saxenauts · 2016-08-14T15:30:40Z

@strob : ping!

I tried to understand the Gentle codebase, and a lot of things make sense now. Still, the quality of the alignments done locally doesn't match that of the alignments done on the Gentle server. I still don't know whether building Kaldi with CUDA enabled will help. Can you guide me on this?
My primary objective is to create a pronunciation database for words. Is there anything else that can be done? Perhaps a four-gram or five-gram model?

strob · 2016-08-15T12:53:14Z

There's no reason I can think of that alignment would be more accurate on
the server. Please make sure you're using the latest version of Gentle and
have compiled all other dependencies as instructed.

