How to use this for speech/audio generation? #3

jinglescode · 2020-12-03T12:00:26Z

Great work Phil! In their paper, the authors applied this model to speech modeling. What would you advise I change to use it for speech? In speech, the data are signals, so we have neither num_tokens nor emb_dim. Our input is simply [batch, channel, time]. Any advice?

lucidrains · 2020-12-11T19:18:27Z

@jinglescode oh hey Hong! sorry I missed this issue

you should use https://github.com/lucidrains/vector-quantize-pytorch to first quantize your speech signals into a discrete dictionary of codes, and then use those code indices as the tokens to train your auto-regressive model!
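
To make the suggestion concrete, here is a rough sketch of the idea, not the vector-quantize-pytorch API. It uses plain NumPy with an untrained codebook made of randomly picked frames, whereas a real setup would learn the codebook (for example with that library). The helper names `frame_signal` and `quantize`, and the values of `frame_len` and `num_tokens`, are assumptions chosen only for illustration.

```python
import numpy as np

def frame_signal(wave, frame_len):
    """Cut [batch, channel, time] audio into frames of shape
    [batch, n_frames, channel * frame_len], dropping any leftover samples."""
    b, c, t = wave.shape
    n_frames = t // frame_len
    wave = wave[:, :, : n_frames * frame_len]
    frames = wave.reshape(b, c, n_frames, frame_len)
    return frames.transpose(0, 2, 1, 3).reshape(b, n_frames, c * frame_len)

def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook vector
    (squared Euclidean distance). Returns [batch, n_frames] integer ids."""
    diff = frames[:, :, None, :] - codebook[None, None, :, :]
    return (diff ** 2).sum(-1).argmin(-1)

rng = np.random.default_rng(0)

# Stand-in for real audio: [batch=2, channel=1, time=1600]
wave = rng.standard_normal((2, 1, 1600)).astype(np.float32)

frame_len = 16   # hypothetical frame size
num_tokens = 64  # codebook size; this becomes num_tokens of the token model

frames = frame_signal(wave, frame_len)  # shape (2, 100, 16)

# Untrained codebook: num_tokens frames sampled from the data itself.
flat = frames.reshape(-1, frames.shape[-1])
codebook = flat[rng.choice(len(flat), size=num_tokens, replace=False)]

tokens = quantize(frames, codebook)  # shape (2, 100), values in [0, num_tokens)

# Next-token prediction pairs for an auto-regressive model.
inputs, targets = tokens[:, :-1], tokens[:, 1:]
```

With this framing, the codebook size plays the role of num_tokens, and emb_dim becomes a free choice of the auto-regressive model, since it embeds the integer ids itself.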

lucidrains · 2020-12-11T19:19:22Z

https://openreview.net/forum?id=3InxcRQsYLf

jinglescode · 2020-12-14T07:27:25Z

Very helpful. Thanks!

jinglescode changed the title ~~How to use this for speech?~~ How to use this for speech/audio generation? Dec 3, 2020

jinglescode closed this as completed Dec 14, 2020
