Can I get an interpretation of what the dense layer output with a shape of 1536 means? #40

Closed
levscaut opened this issue Apr 19, 2022 · 3 comments

Comments

@levscaut

I know the MIDI instrument programs 0-127 are included in these 1536 outputs, but which region of the output do they occupy? I'm asking because I want to apply a mask to the raw output to constrain the predicted instrument types before the softmax layer. I'd really appreciate any help with this!
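For readers following along, the masking idea described above can be sketched as follows. This is a minimal pure-Python illustration, not MT3 code: `masked_softmax` and the toy indices are hypothetical, and the real indices to allow depend on where the program range sits in the vocabulary.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over `logits` where every index not in `allowed` gets probability 0.

    Hypothetical helper for illustration only; it is not part of MT3.
    """
    allowed = set(allowed)
    if not allowed or not all(0 <= i < len(logits) for i in allowed):
        raise ValueError("`allowed` must contain at least one valid index")
    # Disallowed positions are set to -inf so that exp() maps them to exactly 0.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: 4 logits, only indices 1 and 2 are permitted.
probs = masked_softmax([1.0, 2.0, 3.0, 4.0], allowed=[1, 2])
```

Setting the disallowed logits to negative infinity before the softmax, rather than zeroing probabilities afterwards, keeps the remaining probabilities normalized without a second pass.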

@cghawthorne

You can see how the event ranges are assigned here: https://github.com/magenta/mt3/blob/main/mt3/vocabularies.py#L119

The first event range is reserved for time shifts, and we default to reserving 10 seconds' worth, at 100 steps/second, so the first 1000 events. In practice, this is more than we need, but we decided to err on the side of flexibility.
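As a rough sketch of how consecutive event ranges translate into index offsets: the time-shift size below follows the arithmetic in this comment (10 seconds at 100 steps/second), but every other range name, size, and position is a placeholder assumption. The real ordering, the real sizes, and any offset introduced by special tokens should be read from the linked `vocabularies.py`.

```python
def range_offsets(ranges):
    """Map each event type to its half-open [start, end) slice of a flat index space.

    `ranges` is an ordered list of (name, size) pairs. Hypothetical helper,
    not part of MT3.
    """
    offsets = {}
    start = 0
    for name, size in ranges:
        offsets[name] = (start, start + size)
        start += size
    return offsets


STEPS_PER_SECOND = 100
MAX_SHIFT_SECONDS = 10

ranges = [
    # Time shifts come first: 10 s * 100 steps/s = 1000 events (from the comment).
    ("shift", STEPS_PER_SECOND * MAX_SHIFT_SECONDS),
    # PLACEHOLDER: stands in for whatever ranges actually follow the shifts.
    ("placeholder_other", 128),
    # PLACEHOLDER position: 128 MIDI programs, real offset must be checked in the code.
    ("program", 128),
]

layout = range_offsets(ranges)
```

With the real range list substituted in, `layout["program"]` would give the slice of indices to keep when building an instrument mask.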

@levscaut
Author

Yes, I have noticed these event ranges, but the overall range for this vocabulary is 1388. Is there a way to map these 1388 entries onto the dense layer output, which has a dimension of 1536?

@iansimon
Contributor

The dimension is always rounded up to the next multiple of 128: https://github.com/magenta/mt3/blob/main/mt3/vocabularies.py#L282
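The rounding can be sketched in a few lines. The helper names here are hypothetical and this is not the code at the link. Note that 1388 on its own rounds up to 1408, so an output size of 1536 suggests the size being rounded is somewhere in 1409 to 1536, i.e. it includes entries beyond the 1388 codec events. What those entries are is not stated in this thread and should be checked in the linked code.

```python
def round_up_to_multiple(size, multiple=128):
    """Smallest multiple of `multiple` that is >= `size` (hypothetical helper)."""
    return multiple * -(-size // multiple)  # ceiling division using integers only


def padding_indices(size, multiple=128):
    """Trailing indices added by the rounding; they correspond to no vocabulary entry."""
    return range(size, round_up_to_multiple(size, multiple))


padded_from_1388 = round_up_to_multiple(1388)  # 11 * 128
padded_from_1409 = round_up_to_multiple(1409)  # 12 * 128
unused = padding_indices(1388)
```

Any indices returned by `padding_indices` are safe to exclude when masking the logits, since they only exist to pad the output dimension.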
