
Transformer encoder #97

Merged

merged 4 commits into facebookresearch:main on Nov 2, 2022

Conversation

@yushiyangk (Contributor) commented Oct 26, 2022

TransformerPositionalEncoding

$$ \mathrm{PE}_{i, 2z} = \sin \left( \frac{i}{10000^{2z/d}} \right) $$

$$ \mathrm{PE}_{i, 2z + 1} = \cos \left( \frac{i}{10000^{2z/d}} \right) $$

where $i$ is the sequence position, $2z$ and $2z+1$ index the even and odd dimensions of the input embedding, and $d$ is the dimensionality of the input embedding.

The multiplicative factors $\frac{1}{10000^{2z/d}}$ are precomputed during object creation as they are constant for all $i$.

The full PE is initially precomputed for all $i$ up to 256 (configurable). This is then extended and stored if the module is called with a sequence length larger than the initial value.

Returns a 2D tensor matching the last two dimensions of the input tensor to TransformerEncoder.
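
To make the precomputation concrete, here is a minimal standalone sketch in plain TypeScript (hypothetical names, plain number arrays rather than shumai tensors, and not the actual implementation in this PR): the per-dimension factors $\frac{1}{10000^{2z/d}}$ are computed once in the constructor, and the cached table is extended lazily when a longer sequence is requested.

```ts
// Sketch of sinusoidal positional encoding with precomputed factors and a lazily extended table.
class PositionalEncodingSketch {
  private readonly dim: number;
  private readonly factors: number[]; // 1 / 10000^(2z/d), one per pair of dimensions
  private table: number[][] = [];     // table[i][k] = PE_{i,k}

  constructor(dim: number, initialLength = 256) {
    this.dim = dim;
    this.factors = [];
    for (let z = 0; 2 * z < dim; z++) {
      this.factors.push(1 / Math.pow(10000, (2 * z) / dim));
    }
    this.extend(initialLength);
  }

  // Extend the cached table so it covers sequence positions [0, length).
  private extend(length: number): void {
    for (let i = this.table.length; i < length; i++) {
      const row = new Array<number>(this.dim);
      for (let z = 0; 2 * z < this.dim; z++) {
        row[2 * z] = Math.sin(i * this.factors[z]);
        if (2 * z + 1 < this.dim) row[2 * z + 1] = Math.cos(i * this.factors[z]);
      }
      this.table.push(row);
    }
  }

  // Returns the [seqLength, dim] slice of the (possibly extended) cached table.
  forward(seqLength: number): number[][] {
    if (seqLength > this.table.length) this.extend(seqLength);
    return this.table.slice(0, seqLength);
  }
}
```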

FeedForward

A simple 2-layer fully connected neural network with ReLU activation. This is kept as a private class for now; if we want to export it, it should probably live in a separate file.
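
For reference, the position-wise computation is just $\mathrm{relu}(x W_1 + b_1) W_2 + b_2$. A rough sketch in the same plain-array style (hypothetical names, not the private class in this PR):

```ts
// Sketch of the 2-layer position-wise feed-forward block: relu(x·W1 + b1)·W2 + b2.
function feedForwardSketch(
  x: number[],     // one position's embedding, length dIn
  W1: number[][],  // [dIn, dHidden]
  b1: number[],    // [dHidden]
  W2: number[][],  // [dHidden, dIn]
  b2: number[]     // [dIn]
): number[] {
  const hidden = b1.map((b, j) =>
    Math.max(0, x.reduce((acc, xi, i) => acc + xi * W1[i][j], b)) // ReLU
  );
  return b2.map((b, k) =>
    hidden.reduce((acc, hj, j) => acc + hj * W2[j][k], b)
  );
}
```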

TransformerEncoderLayer

As described in Vaswani et al.

TransformerEncoder

The full encoder half of the Transformer, using a Sequential containing an arbitrary number of TransformerEncoderLayers.

This includes the positional encoding, but does not include the initial embedding of an input sequence into vectors (which would be done separately, e.g. by word2vec).
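
Pieced together with the sketches above, the forward pass is roughly: add the positional encodings to the already-embedded input, then pass the result through each encoder layer in order. A schematic sketch (hypothetical names, reusing PositionalEncodingSketch from above; not the actual module):

```ts
// Sketch of the encoder composition: add positional encodings, then apply each layer in turn.
// `Layer` stands in for a TransformerEncoderLayer; its internals are omitted here.
interface Layer {
  forward(x: number[][]): number[][]; // [seqLength, dim] -> [seqLength, dim]
}

function encodeSketch(
  embedded: number[][],          // already-embedded input, [seqLength, dim]
  pe: PositionalEncodingSketch,  // from the positional-encoding sketch above
  layers: Layer[]
): number[][] {
  const peTable = pe.forward(embedded.length);
  // Element-wise add the positional encodings to the input embeddings.
  let x = embedded.map((row, i) => row.map((v, k) => v + peTable[i][k]));
  for (const layer of layers) {
    x = layer.forward(x);
  }
  return x;
}
```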

@facebook-github-bot added the CLA Signed label on Oct 26, 2022
@yushiyangk force-pushed the transformer branch 6 times, most recently from 3e721ed to 8e8f967 on October 27, 2022 17:29
@yushiyangk (Contributor, Author) commented Oct 28, 2022

The test TransformerEncoder > calculates gradient is causing bun wiptest to fail silently with exit code 139 (segfault) or 138. This occurs in the result.backward() call.

I can reproduce it locally, but @cryptodeal could not. We're both using macOS 12.6 and Bun 0.2.2; the same error occurred for me on Bun 0.1.13.

@bwasti (Contributor) commented Oct 28, 2022

do you have a local file called libflashlight.0.dylib anywhere? if not, maybe try bun install --force to update that dylib locally

oops never mind, I see it repros in the CI

@cryptodeal (Contributor) commented Oct 28, 2022

I know GitHub Actions Darwin runners are x86_64, so I wanted to see if I could repro on my 2015 MBP (since it's AMD64), but damn, all 344 tests pass locally on that device as well (running Bun v0.2.2 and macOS 12.6).

Additional things I've tried in an attempt to reproduce this error (none of which have worked so far):

  • cleared the bun dev cache with rm -rf ~/.bun/install/cache (in case there was some weird caching of the shumai dependency)

@yushiyangk (Contributor, Author) commented Oct 30, 2022

Tests are now passing for me too, after upgrading ArrayFire from 3.8.1 to 3.8.2.

As discussed on discord, the CI is still running version 3.8.1; the issue will probably be resolved once that's upgraded to 3.8.2 as well.

@yushiyangk (Contributor, Author) commented Oct 30, 2022

I expect this to be fixed by #105. Please merge this after #103.

@yushiyangk marked this pull request as ready for review on October 30, 2022 19:52
@bwasti (Contributor) commented Oct 31, 2022

after rebase, would you mind putting the comments in this PR as comments in the transformer module?

we're using typedoc for comment formats https://typedoc.org/example/

@yushiyangk (Contributor, Author) commented Nov 1, 2022

> after rebase, would you mind putting the comments in this PR as comments in the transformer module?
>
> we're using typedoc for comment formats https://typedoc.org/example/

Oh yes, I should definitely do that.

Edit: Added to all the transformer modules

@bwasti (Contributor) commented Nov 2, 2022

awesome work!

@bwasti merged commit 528ca3e into facebookresearch:main on Nov 2, 2022
@yushiyangk mentioned this pull request on Nov 3, 2022