xvector extractor cache config #2290
Conversation
Travis build failed with a strange error:

./matrix/libkaldi-matrix.so: file not recognized: File truncated

I feel like, if possible, just re-running it would solve the issue. If I should re-create this PR to trigger Travis, please let me know.

I have restarted the build.

The build isn't necessary when the code is not changed; it's a Travis issue. I'll merge.
I see that the default in the code is 64. Perhaps it would make sense to use a larger value as the default in the script, assuming this makes the code run faster?
64 is the setting we use now by default. I usually use 475 for chunk-size=500 to make sure the computation is compiled only once for a given segment size. For SRE the right value would be 10000, but I'm not sure about the memory consumption then.

Hm, that would be quite a bit of I/O for each job, but I guess it's OK if it gives a speedup in practice. If you just set the limit to a large value it would work for any setup, but IMO in the longer term the right fix is to change the setup so that only a relatively small number of segment sizes is used. @david-ryan-snyder, I don't know if there is progress on that front?
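To see why 475 pairs with chunk-size=500 in the comment above, note that nnet3 compiles (and caches) one computation per distinct input length. A rough sketch of the arithmetic, assuming a hypothetical min-chunk of 25 frames (the thread does not state the exact minimum, so the constant and function name here are illustrative only):

```python
# Hypothetical sketch (not Kaldi code): nnet3 caches one compiled
# computation per distinct input length, so a useful cache capacity is
# bounded by the number of distinct segment lengths the extractor sees.

def needed_cache_capacity(min_chunk: int, chunk_size: int) -> int:
    """Upper bound on distinct segment lengths when segments are
    truncated to at most chunk_size frames and anything shorter than
    min_chunk frames is discarded."""
    # Possible lengths are min_chunk, min_chunk + 1, ..., chunk_size.
    return chunk_size - min_chunk + 1

# With chunk-size=500 and an assumed min-chunk of 25, up to 476
# distinct lengths can occur, roughly the 475 quoted in the thread.
print(needed_cache_capacity(25, 500))
```

If the capacity is at least this bound, each segment length is compiled once for the whole run; below it, evictions force recompilation.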
@danpovey, those issues are both important but actually orthogonal. @LvHang's PR speeds up training. @gorinars's PR only affects the x-vector system after it's been trained. To answer your question... I still haven't made progress on it, but I haven't forgotten about it either. BTW, @gorinars, feel free to add your name to the author list on those files, so that your contribution is more visibly credited.
It occurred to me I might've misunderstood what you meant, @danpovey. If you're talking about reducing the number of segment lengths during x-vector extraction, I think the issue is basically solved already (especially given @gorinars's PR). In diarization, we already extract embeddings from fixed-length chunks (e.g., 150-frame chunks). For speaker recognition, we usually set a (near) infinite right context, because we found that it's empirically better to extract embeddings from the entire recording. However, if speed is an issue (I think the standard recipe is pretty fast anyway), the user can set chunk-length=1000 or something like that, and the binary will extract embeddings from 1000-frame chunks and average them to produce the final embedding.
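The chunk-and-average scheme described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Kaldi's implementation; the function names and the toy embedding are invented for the example:

```python
# Illustrative sketch (hypothetical names, not Kaldi code): extract an
# embedding from each fixed-length chunk of a recording, then average
# the per-chunk embeddings into one final embedding.

def average_chunk_embeddings(frames, chunk_length, embed):
    """Split `frames` into chunks of at most `chunk_length` frames,
    embed each chunk with `embed`, and average the embeddings."""
    chunks = [frames[i:i + chunk_length]
              for i in range(0, len(frames), chunk_length)]
    embeddings = [embed(c) for c in chunks]
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings)
            for d in range(dim)]

# Toy "embedding" (per-chunk mean and chunk length), just to show the
# shapes involved; a real system would run a neural network here.
toy_embed = lambda chunk: [sum(chunk) / len(chunk), float(len(chunk))]
print(average_chunk_embeddings(list(range(10)), 4, toy_embed))
```

With a fixed chunk length, only a handful of distinct input sizes ever reach the network (full chunks plus one shorter remainder), which keeps compilation cheap.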
@david-ryan-snyder, that's exactly what led to this PR.
@david-ryan-snyder, thanks for the remark about adding the name. I'm not sure about this particular change; it is really a very small modification.
I'll merge; it won't break anything.
…ldi-asr#2290)
Conflicts: egs/sre08/v1/sid/nnet3/xvector/extract_xvectors.sh
Following the discussion in
https://groups.google.com/forum/#!topic/kaldi-help/ROtSHHe3Z_I
Using a larger cache speeds up extraction if equal-length segments have already been processed.
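The speedup claimed here is the usual cache-capacity effect, sketched below with a toy LRU cache of "compiled computations" keyed by segment length. This is an illustration of the principle, not Kaldi's actual cache; the class and counters are invented for the example:

```python
# Illustrative sketch (not Kaldi's implementation): a bounded LRU cache
# of "compiled computations" keyed by segment length. When capacity is
# smaller than the number of distinct lengths in play, recompilation
# keeps happening; otherwise equal-length segments hit the cache.

from collections import OrderedDict

class ComputationCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # length -> "compiled computation"
        self.compilations = 0        # how many (expensive) compiles ran

    def get(self, length):
        if length in self.cache:
            self.cache.move_to_end(length)   # mark as recently used
            return self.cache[length]
        self.compilations += 1               # simulate a slow compile
        self.cache[length] = f"computation-for-{length}"
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return self.cache[length]

lengths = [150, 300, 150, 300, 150]  # repeated equal-length segments
small = ComputationCache(capacity=1)
big = ComputationCache(capacity=64)
for n in lengths:
    small.get(n)
    big.get(n)
print(small.compilations, big.compilations)
```

With capacity 1, the two alternating lengths evict each other and every request recompiles; with a capacity above the number of distinct lengths, each length compiles exactly once.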