
xvector extractor cache config #2290

Merged: 2 commits into kaldi-asr:master on Mar 22, 2018

Conversation

@gorinars (Contributor, Author)

Following the discussion in
https://groups.google.com/forum/#!topic/kaldi-help/ROtSHHe3Z_I

Using a larger cache speeds up extraction when segments of equal length have already been processed.
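
For illustration, a hedged sketch of how the new option might be used. The paths are placeholders and the exact flag spellings should be checked against the script and the binary's --help; the idea is that extract_xvectors.sh forwards a --cache-capacity option to nnet3-xvector-compute, whose in-code default is 64 (see the discussion below):

```sh
# Sketch: pass a larger compiler cache through the extraction script.
sid/nnet3/xvector/extract_xvectors.sh \
  --cache-capacity 475 --chunk-size 500 \
  exp/xvector_nnet_1a data/test exp/xvectors_test

# Equivalently, at the binary level:
nnet3-xvector-compute --cache-capacity=475 --chunk-size=500 \
  exp/xvector_nnet_1a/final.raw scp:data/test/feats.scp ark:xvectors.ark
```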

@gorinars (Contributor, Author)

The Travis build failed with a strange error:

./matrix/libkaldi-matrix.so: file not recognized: File truncated

I suspect that simply re-running it, if possible, would solve the issue. If I should re-create this PR to re-trigger Travis, please let me know.

@jtrmal (Contributor) commented Mar 22, 2018 via email

@danpovey (Contributor)

The build isn't necessary when the code is not changed; it's a Travis issue. I'll merge.

@danpovey (Contributor)

I see that the default in the code is 64. Perhaps it would make sense to use a larger value as the default in the script, assuming this makes the code run faster?

@gorinars (Contributor, Author)

64 is the setting we use by default now.
I usually use 475 for chunk-size=500 to make sure the computation is compiled only once for a given segment size. For SRE the right value would be 10000, but I'm not sure about the memory consumption then.

@danpovey (Contributor) commented Mar 22, 2018 via email

@david-ryan-snyder (Contributor)

@danpovey, those issues are both important but actually orthogonal. @LvHang's PR speeds up training; @gorinars' PR only affects the x-vector system after it has been trained. To answer your question: I still haven't made progress on it, but I haven't forgotten about it either.

BTW, @gorinars, feel free to add your name to the author list in those files, so that your contribution is more visibly credited.

@david-ryan-snyder (Contributor) commented Mar 22, 2018

It occurred to me I might've misunderstood what you meant, @danpovey.

If you're talking about reducing the number of segment lengths during x-vector extraction, I think the issue is basically solved already (especially given @gorinars' PR).

In diarization, we already extract embeddings from fixed-length chunks (e.g., 150-frame chunks). For speaker recognition, we usually set a (near) infinite right context, because we found that it's empirically better to extract embeddings from the entire recording. However, if speed is an issue (I think the standard recipe is pretty fast anyway), the user can set chunk-length=1000 or something like that, and the binary will extract embeddings from 1000-frame chunks and average them to produce the final embedding.
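
For illustration, a minimal sketch of the two usage patterns described above, calling nnet3-xvector-compute directly; the model and feature paths are placeholders, and exact flag spellings should be checked against the binary's --help:

```sh
# Diarization-style: fixed-length chunks (e.g., 150 frames each).
nnet3-xvector-compute --chunk-size=150 --min-chunk-size=150 \
  final.raw scp:feats.scp ark:xvectors.ark

# Speaker-recognition style: cap the chunk at, say, 1000 frames; per the
# comment above, the binary averages the per-chunk embeddings into one.
nnet3-xvector-compute --chunk-size=1000 \
  final.raw scp:feats.scp ark:xvectors.ark
```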

@gorinars (Contributor, Author) commented Mar 22, 2018

@david-ryan-snyder, that's exactly what led to this PR.
I was setting chunk-length=1000, but the actual chunks are sometimes smaller (900, 101, etc.) when the segment is shorter than the chunk size. Recompilation for each of those lengths was slowing down the system significantly.
So we finally ended up with what is implemented in nnet3-xvector-compute-parallel.cc in #2303: passing the pre-compiled cache for all segment lengths and increasing the cache size.
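
To gauge how large the cache needs to be for a given data set, one rough, hedged approach is to count the distinct segment lengths the extractor will actually see (capped at the chunk length), since each distinct length triggers a separate compilation. feat-to-len is a standard Kaldi binary; the rest of the pipeline is illustrative:

```sh
# Count distinct chunk lengths in a data directory; each one costs a
# compilation unless it is already in the compiler cache.
chunk_len=1000   # illustrative, matching the value discussed above
feat-to-len scp:data/test/feats.scp ark,t:- | \
  awk -v c="$chunk_len" '{n = ($2 < c) ? $2 : c; print n}' | \
  sort -n | uniq | wc -l
```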

@gorinars (Contributor, Author)

@david-ryan-snyder thanks for the remark about adding my name. I'm not sure about this particular change; it is really a very small modification.

@danpovey (Contributor)

I'll merge; it won't break anything.

@danpovey danpovey merged commit 9ae3eb7 into kaldi-asr:master Mar 22, 2018
LvHang pushed a commit to LvHang/kaldi that referenced this pull request Apr 14, 2018
…ldi-asr#2290)

Conflicts:
	egs/sre08/v1/sid/nnet3/xvector/extract_xvectors.sh