Please consider adding MIG (MI-rror with G-radient modification) to torch.nn #122680

Open
YagaoDirac opened this issue Mar 26, 2024 · 3 comments
Labels
module: nn (Related to torch.nn)
needs research (We need to decide whether or not this merits inclusion, based on research world)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@YagaoDirac

YagaoDirac commented Mar 26, 2024

🚀 The feature, motivation and pitch

https://github.com/YagaoDirac/Pytorch-extension-from-YagaoDirac-v2/blob/main/v2%20with%20basic%20test.py
I implemented this 2 weeks ago. It's probably a better implementation of the Linear layer: it speeds up training and lets people stack many more such layers directly, without any tricks.

Alternatives

In the code I implemented 3 different layer types with a similar purpose. Each of them is tested and can work on its own (if my tests are not too wrong).

Additional context

Also, if you decide to add this to PyTorch, remember to rename it.
Note the "untested workflow" in the file. It's probably a better approach and could be integrated into the layer itself.
More info is in the file.

cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki

@cpuhrsch added the module: nn and triaged labels Mar 28, 2024
@albanD added the needs research label Mar 28, 2024
@YagaoDirac
Author

YagaoDirac commented Mar 29, 2024

@albanD Hi, thank you for tagging this issue with "needs research". Finally someone trusts me at least a bit. If you need more info about the code, my Twitter, GitHub, Gmail, and Discord all use the same name. Simply DM me anywhere.

@mikaylagawarecki
Contributor

mikaylagawarecki commented Apr 9, 2024

Hi @YagaoDirac, could you provide more detail on what MIG is (e.g. research papers where it is used or proposed) and further elaboration on what problem it is meant to solve?

@YagaoDirac
Author

YagaoDirac commented Apr 11, 2024

> Hi @YagaoDirac , could you provide more detail on what MIG is (e.g. research papers where it is used or proposed) and further elaboration on what problem it is meant to solve?

Hi @mikaylagawarecki. I'm glad people don't ignore my work.
Short answer: I explained it in the Python file.

I'm a hobbyist. I haven't written a paper for it because I've recently been too busy with other tasks. MIG and the 2 other tools in the code are designed as a better implementation of the FCNN (torch.nn.Linear). They stay trainable no matter how many of them you stack in a row. (If you stack 5 FCNN layers in a row, training is too slow, so I basically don't do it. But MIG and the 2 other tools can do the same thing with at least 10 stacked in a row.) According to my tests (if they are not too wrong), it's 1,000x to 100,000x faster in some cases. The test code is also in the linked file.
Now I'm asking 2 of my friends to collaborate on a paper about this, but they cannot begin at the moment. One of them has to wait until September this year, and the other one is a bit slower. So if the PyTorch team is interested in this, you are much faster, and people would get to know this tool earlier, which would be great.
