Add a sinusoidal embedding layer #59
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). For more information, open the CLA check for this pull request.
95b81af to 4a1372a
Hey @mattdangerw, any update on this?
Thank you! Left some initial comments.
@mattdangerw, updated all the changes requested.
Thanks! Few more comments.
Thanks for the PR! The calculation looks good to me, left some initial comments.
@mattdangerw I have a small doubt: when adding the requested changes, should I force-push an amended commit or create a new commit?
@amantayal44 either is fine. We recently switched our merge strategy to "Squash and Merge", so you can keep a long commit history here of each round of review which will get squashed before going to master. Also, re the argument name I saw you and Chen were discussing... I think the framing that will be most recognizable to users would be the exact framing from the Attention is All You Need paper, with the constant 10000 specifically. So switching this to wavelength instead of frequency. What do you think? Most explainers I see out on the web use wavelength, and most implementations use 10K as a magic constant. Here's what the docstring could look like:
Besides that, this lgtm!
Yes, max_wavelength will be much better.
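For readers following the naming discussion above, here is a minimal sketch of the sinusoidal encoding from the Attention Is All You Need paper, written in plain Python rather than as the layer added in this PR. The function name `sinusoidal_positions` and the arguments `seq_len` and `hidden_dim` are hypothetical and chosen only for illustration. `max_wavelength` is the argument name agreed on in this thread, with 10000 as the paper's constant.

```python
import math


def sinusoidal_positions(seq_len, hidden_dim, max_wavelength=10000):
    """Return a seq_len x hidden_dim table of sinusoidal position encodings.

    Illustrative sketch only; names and signature are assumptions, not the
    API of the layer in this pull request.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for j in range(hidden_dim):
            # Channels 2i and 2i+1 share one wavelength; wavelengths grow
            # geometrically toward 2*pi*max_wavelength across the feature axis.
            exponent = (2 * (j // 2)) / hidden_dim
            angle = pos / (max_wavelength ** exponent)
            # Even channels use sine, odd channels use cosine.
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


table = sinusoidal_positions(seq_len=4, hidden_dim=6)
```

Framing the argument as a wavelength keeps the constant 10000 visible in the same form as in the paper, which is the recognizability argument made above.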
@amantayal44 oops just noticed one more thing while merging. The filename no longer matches the class name. Can you rename Then really good to go I think 😄
done!! 😅
Thank you!!
* Add a sinusoidal embedding layer
* renamed min_frequency to base_frequency and updated docstring
* changed base_frequency to max_wavelength and added test to check correct values
* renamed files
* Fixup docstring formatting

Co-authored-by: Matt Watson <1389937+mattdangerw@users.noreply.github.com>
Resolves #26