Increasing accuracy on Local Build #88

Open
saxenauts opened this issue Jul 29, 2016 · 4 comments


@saxenauts

saxenauts commented Jul 29, 2016

It seems reasonable that alignment on my local machine (4 GB RAM) shows some jitter: there is typically an offset of 0.02 - 0.04 seconds between the Gentle server and my local build.
Compare the CSV generated on the Gentle server (https://www.dropbox.com/s/03j3mpiiuaij5nq/align_gentle.csv?dl=0) with the CSV generated on my local build (https://www.dropbox.com/s/h1o66p79lgnl8ru/align_local.csv?dl=0).

An example with a 20 - 40 ms offset:

Gentle : because 37.74 37.88
Local : because 37.72 37.88

Another example, with an offset of about 1.6 seconds:

Gentle : they're 22.26 22.46
Local : they're 20.66 20.88
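
The comparison above can be sketched in a few lines of Python. This is only an illustration: the rows are hand-copied from the two examples, the `(word, start, end)` tuple layout is an assumption rather than the actual column layout of the CSV files, words are paired purely by position, and the 0.05 s tolerance is an arbitrary choice.

```python
def compare_alignments(reference, local, tolerance=0.05):
    """Pair words by position and report start/end offsets in seconds.

    Each row is assumed to be a (word, start, end) tuple.
    """
    report = []
    for (ref_word, ref_start, ref_end), (loc_word, loc_start, loc_end) in zip(reference, local):
        if ref_word != loc_word:
            continue  # word lists diverged; a real script would re-synchronise here
        d_start = round(loc_start - ref_start, 3)
        d_end = round(loc_end - ref_end, 3)
        # Flag the word when either boundary moved by more than the tolerance.
        flagged = max(abs(d_start), abs(d_end)) > tolerance
        report.append((ref_word, d_start, d_end, flagged))
    return report


# Rows taken from the two examples in this comment.
gentle_rows = [("they're", 22.26, 22.46), ("because", 37.74, 37.88)]
local_rows = [("they're", 20.66, 20.88), ("because", 37.72, 37.88)]

for word, d_start, d_end, flagged in compare_alignments(gentle_rows, local_rows):
    print(word, d_start, d_end, "LARGE" if flagged else "ok")
```

Running something like this over both files would show whether the large offsets are isolated words or a whole stretch of the recording.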

I am sorry for the naive requests that follow; I have only just started exploring Kaldi as a tool, and I have no prior experience with ASR systems.

What can I do to increase the accuracy on my local build?

I need these timestamps for a research project: specifically, I need to segment the audio at word boundaries. Gentle was the best available tool from a developer's perspective, since I am not even a beginner with ASR and similar tools.
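
For context, the segmentation step itself might look roughly like the sketch below. It assumes the audio is an uncompressed WAV file and uses only Python's standard `wave` module; `cut_segment` and the file names in the usage note are hypothetical and not part of Gentle.

```python
import wave


def cut_segment(src_path, dst_path, start, end):
    """Copy the audio between start and end (in seconds) into a new WAV file.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Convert the word boundaries to frame indices, clamped to the file length.
        first = min(int(round(start * rate)), total)
        last = min(int(round(end * rate)), total)
        src.setpos(first)
        frames = src.readframes(max(last - first, 0))
        with wave.open(dst_path, "wb") as dst:
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
        return len(frames) // (src.getsampwidth() * src.getnchannels())
```

With the local timestamps from the example above, the call would be something like `cut_segment("speech.wav", "because.wav", 37.72, 37.88)`, which is why an error of even a few tens of milliseconds in the boundaries matters for this use.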

I believe this would not be a problem if I rented an Amazon instance, but those are quite expensive.
Also, can anyone point me to another language model that might work better for English?
Meanwhile, I will dive into the code to understand it better.

Thanks

@strob
Contributor

strob commented Jul 29, 2016

Are you running the code locally from source, or from a DMG release? Mac or Linux?

@saxenauts
Author

Linux. Yes, running locally from source.

@saxenauts
Author

@strob : ping!

I tried to understand the Gentle codebase, and a lot of things make sense now. Still, the quality of the alignments done locally doesn't match the alignments done on the Gentle server. I still don't know whether building Kaldi with CUDA enabled will help. Can you guide me on this?
My primary objective is to create a pronunciation database for words. Is there anything else that can be done? Perhaps a 4-gram or 5-gram model?
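
To make the goal concrete, here is a rough sketch of how such a pronunciation database could be collected from alignment output. It assumes the JSON output has a `words` list whose entries carry `word`, `case`, and a `phones` list with labels like `b_B` (a phone plus a position suffix); that layout is my reading of the output and should be checked against a real file. The sample data is made up for illustration.

```python
from collections import defaultdict


def collect_pronunciations(alignment):
    """Map each aligned word to the set of phone sequences observed for it."""
    lexicon = defaultdict(set)
    for entry in alignment.get("words", []):
        if entry.get("case") != "success":
            continue  # skip words the aligner could not place in the audio
        # Drop the assumed position suffix ("_B", "_I", "_E", "_S") from each label.
        phones = tuple(p["phone"].split("_")[0] for p in entry.get("phones", []))
        if phones:
            lexicon[entry["word"].lower()].add(phones)
    return lexicon


# Hypothetical sample in the assumed layout, not real aligner output.
sample = {
    "words": [
        {"word": "because", "case": "success",
         "phones": [{"phone": "b_B"}, {"phone": "ih_I"}, {"phone": "k_I"},
                    {"phone": "ah_I"}, {"phone": "z_E"}]},
        {"word": "um", "case": "not-found-in-audio"},
    ]
}

for word, variants in collect_pronunciations(sample).items():
    print(word, sorted(variants))
```

Keeping a set of sequences per word means that different pronunciations of the same word across recordings end up as separate variants.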

@strob
Contributor

strob commented Aug 15, 2016

There's no reason I can think of that alignment would be more accurate on the server. Please make sure you're using the latest version of Gentle and have compiled all other dependencies as instructed.

