
Close #21810 #22288

Closed · wants to merge 3 commits into from
Conversation

DevBhuyan

Closes #21810

From ToDo list: #14945

Update:

- Added a Transformer layer in ~/ivy/ivy/stateful/layers.py
- Added a test, test_transformer_layer, in ~/ivy/ivy_tests/test_ivy/test_stateful/test_layers.py
- Added a custom composite strategy to generate test data (ref. transformer_data(); see the sketch after this list)

This is my first PR, and I've tried to make it as appropriate as possible. Please let me know if there are any modifications you'd suggest. Thank you! 😊
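For context, a composite strategy along these lines could generate the test inputs. This is only an illustrative sketch using the hypothesis library; the transformer_data() below is a hypothetical reconstruction, not the code from the PR:

```python
import numpy as np
from hypothesis import strategies as st


@st.composite
def transformer_data(draw):
    # Hypothetical sketch of a composite strategy for transformer-layer
    # tests; the real transformer_data() in the PR may differ.
    batch = draw(st.integers(min_value=1, max_value=4))
    seq_len = draw(st.integers(min_value=1, max_value=8))
    num_heads = draw(st.sampled_from([1, 2, 4]))
    # the embedding dim must be divisible by the number of heads
    embed_dim = num_heads * draw(st.integers(min_value=1, max_value=8))
    n_vals = batch * seq_len * embed_dim
    flat = draw(
        st.lists(
            st.floats(min_value=-1.0, max_value=1.0),
            min_size=n_vals,
            max_size=n_vals,
        )
    )
    src = np.asarray(flat, dtype="float32").reshape(batch, seq_len, embed_dim)
    return src, embed_dim, num_heads
```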

@github-actions
Contributor

Thanks for contributing to Ivy! 😊👏
Here are some of the important points from our Contributing Guidelines 📝:
1. Feel free to ignore the run_tests (1), run_tests (2), … jobs, and only look at the display_test_results job. 👀 It contains the following two sections:
- Combined Test Results: This shows the results of all the ivy tests that ran on the PR. ✔️
- New Failures Introduced: This lists the tests that are passing on main, but fail on the PR Fork. Please try to make sure that there are no such tests. 💪
2. The lint / Check formatting / check-formatting tests check the formatting of your code. 📜 If they fail, please check the exact error message in the logs and fix it accordingly. ⚠️🔧
3. Finally, the test-docstrings / run-docstring-tests check the changes made to the docstrings of functions. This check may be skipped as well. 📚
Happy coding! 🎉👨‍💻

Contributor

@rishabgit left a comment

Hi, thanks for looking into this @DevBhuyan 😄

You'll have to rewrite the syntax a bit so that it's in line with the general conventions; I'd recommend checking out the other classes in that file.

I'm not entirely sure the implementation is correct. It should ideally be similar to PyTorch's Transformer module (https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html#torch.nn.Transformer, https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/transformer.py), rather than a model with positional encodings pre-coded, as in this official PyTorch example that builds on the Transformer module: https://github.com/pytorch/examples/blob/13009eff7a80ebcf6ae89ed217d5d176bd3e019d/word_language_model/model.py#L107

@vedpatwardhan - do you think the above makes sense or @DevBhuyan's implementation is fine as it is? 🤔
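For orientation, here is a minimal skeleton of what such an alignment could look like. It is a sketch only: it assumes Ivy's stateful convention of subclassing Module and overriding _forward (as the other classes in ivy/stateful/layers.py do), and the Linear layers are hypothetical placeholders for the real encoder/decoder stacks:

```python
from ivy.stateful.layers import Linear
from ivy.stateful.module import Module


class Transformer(Module):
    # Skeleton only: mirrors torch.nn.Transformer's constructor
    # arguments and its (src, tgt) forward signature. The Linear
    # layers are hypothetical placeholders for the real stacks of
    # self-attention + feed-forward encoder/decoder layers.
    def __init__(self, d_model=512, nhead=8, num_encoder_layers=6,
                 num_decoder_layers=6, device=None, v=None, dtype=None):
        self._d_model = d_model
        self._nhead = nhead
        self._num_encoder_layers = num_encoder_layers
        self._num_decoder_layers = num_decoder_layers
        self._encoder = Linear(d_model, d_model)  # placeholder
        self._decoder = Linear(d_model, d_model)  # placeholder
        Module.__init__(self, device=device, v=v, dtype=dtype)

    def _forward(self, src, tgt):
        memory = self._encoder(src)  # encode the source sequence
        # a real decoder would cross-attend to `memory` here
        return self._decoder(tgt)
```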

@DevBhuyan
Author

Hi @rishabgit, thanks for taking the time to review my PR. Yes, you're right, I need to change the syntax to fit with the other classes. My previous code was just an attempt to write the layer as simply as possible (autoregressive, without a decoder); I'll need to modify it to an encoder-decoder architecture like PyTorch's.

I'll work on it and make a commit as soon as I have it ready. Thanks again for the suggestions :)
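For readers following the thread, the interface being targeted is PyTorch's encoder-decoder Transformer, whose forward pass takes both a source and a target sequence. This is standard torch.nn.Transformer usage, shown only to illustrate the signature:

```python
import torch
import torch.nn as nn

# torch.nn.Transformer is an encoder-decoder model: its forward pass
# takes both a source and a target sequence, unlike a decoder-only
# autoregressive model.
model = nn.Transformer(
    d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6
)
src = torch.rand(10, 32, 512)  # (source seq len, batch, d_model)
tgt = torch.rand(20, 32, 512)  # (target seq len, batch, d_model)
out = model(src, tgt)          # shape: (20, 32, 512)
```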

@vedpatwardhan
Contributor

(Quoting @rishabgit's review comment above.)
Hey @rishabgit, your suggestion makes perfect sense; we should definitely try to align with the torch.nn.Transformer module as closely as possible. I'm not sure why we wouldn't want to create a model with pre-coded positional encodings; do you mean we should directly do a self.register_buffer instead, to ensure that it's a non-trainable state variable? Thanks @rishabgit @DevBhuyan 😄
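As a concrete illustration of the register_buffer point, this is the standard PyTorch pattern for sinusoidal positional encodings stored as a non-trainable buffer (a generic example, not code from this PR):

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    # Standard sinusoidal positional encoding. register_buffer keeps
    # `pe` in the module's state (it moves with .to(device) and is
    # saved in the state_dict) but excludes it from the trainable
    # parameters returned by .parameters().
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]
```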

@DevBhuyan DevBhuyan closed this Aug 25, 2023
@DevBhuyan DevBhuyan deleted the secondary branch August 25, 2023 19:41
@DevBhuyan DevBhuyan restored the secondary branch August 25, 2023 19:45
@DevBhuyan DevBhuyan reopened this Aug 25, 2023
@DevBhuyan
Author

Hi @vedpatwardhan, I was assuming a totally different direction, a decoder-only transformer. I agree with you and @rishabgit, and I'm rewriting it entirely to be in line with PyTorch's implementation and the other classes in layers.py.

Please excuse the unexpected closing and reopening of this pull request :) I'm still new to contributing on GitHub; I accidentally tried to delete another branch and this happened.

@DevBhuyan
Author

Hi @rishabgit, I have updated the Transformer class to align with PyTorch's implementation as well as with the other classes in the layers.py file. The docstrings have also been edited to match the other classes.

Since I used PyTorch's implementation as a starting point for the class(es), there were portions of it that rely on lower-level backend implementations to speed up processing ('FlashAttention'). I have not removed those lines entirely; instead, I commented them out with FIXME tags so that, if required, they can be incorporated into Ivy's backends later (see the sketch below).
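For context, the computation those FIXME'd fast paths would accelerate is ordinary scaled dot-product attention. A rough sketch in Ivy's functional API follows; it is illustrative, not the PR's code, and assumes ivy.matmul, ivy.swapaxes, ivy.softmax, and ivy.where behave as in NumPy-style APIs:

```python
import math

import ivy


def scaled_dot_product_attention(q, k, v, mask=None):
    # Reference (unfused) attention: softmax(q @ k^T / sqrt(d)) @ v.
    # FlashAttention computes exactly this, but in a single fused,
    # memory-efficient kernel, which is why it needs backend-level
    # support rather than a composition of ops like this one.
    d = q.shape[-1]
    scores = ivy.matmul(q, ivy.swapaxes(k, -2, -1)) / math.sqrt(d)
    if mask is not None:
        # positions where mask is False are excluded from attention
        scores = ivy.where(mask, scores, ivy.full_like(scores, -1e9))
    weights = ivy.softmax(scores, axis=-1)
    return ivy.matmul(weights, v)
```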

Kindly let me know if there are any changes you'd suggest.

Thank you for your patience :)

@DevBhuyan DevBhuyan reopened this Aug 29, 2023
@DevBhuyan
Author

I guess I messed up this PR. I'll create a new fork of the repo and open a fresh PR.

@DevBhuyan DevBhuyan closed this Sep 6, 2023