How to use this for speech/audio generation? #3

Closed
jinglescode opened this issue Dec 3, 2020 · 3 comments

Comments

@jinglescode

Great work Phil! In their paper, the authors applied this model to speech modeling. What would you advise changing to use it for speech? In speech the data are signals, so we have neither num_tokens nor emb_dim. Our input is simply [batch, channel, time]. Any advice?

@jinglescode jinglescode changed the title How to use this for speech? How to use this for speech/audio generation? Dec 3, 2020
@lucidrains
Owner

@jinglescode oh hey Hong! sorry I missed this issue

you should use https://github.com/lucidrains/vector-quantize-pytorch to first quantize your speech signals into discrete codes from a learned dictionary, and then train your auto-regressive model on those codes!
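
A minimal sketch of the idea in plain Python, offered as an illustration only and not as the vector-quantize-pytorch API: each channel is cut into short frames, and every frame is replaced by the index of its nearest codebook vector. The codebook values, `frame_len`, and the helper names (`frame_signal`, `nearest_code`, `quantize`) are all hypothetical; in practice the codebook would be learned from data rather than fixed by hand. The size of the codebook then plays the role of num_tokens, and the resulting ids can be embedded like text tokens.

```python
def frame_signal(signal, frame_len):
    """Split a 1-D list of samples into non-overlapping frames, dropping any remainder."""
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame` (squared Euclidean distance)."""
    def dist(code):
        return sum((a - b) ** 2 for a, b in zip(frame, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def quantize(batch, codebook, frame_len):
    """Map [batch, channel, time] samples to one flat list of token ids per batch item."""
    tokens = []
    for channels in batch:
        ids = []
        for signal in channels:
            ids.extend(nearest_code(f, codebook) for f in frame_signal(signal, frame_len))
        tokens.append(ids)
    return tokens


# Hypothetical toy codebook with 4 entries of dimension 2 (assumed, not learned).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
num_tokens = len(codebook)  # vocabulary size seen by the auto-regressive model

# One batch item, one channel, eight time steps: shape [1, 1, 8].
batch = [[[0.1, -0.1, 0.9, 1.2, -0.8, -1.1, 0.9, -1.0]]]
tokens = quantize(batch, codebook, frame_len=2)
print(tokens)
```

With multiple channels this sketch simply concatenates the per-channel ids; other layouts (for example interleaving channels per frame) are equally plausible and depend on the model.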

@lucidrains
Owner

@jinglescode
Author

Very helpful. Thanks!
