-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document which architectures have been modified and are incompatible with the original architecture #61
Comments
Hi, good suggestion. I wrote here when they are not compatible. As for ATD, I also added the empty tensor, so spandrel could detect it.
If the changes contributes that much, yes, I would accept it. Minor changes should be an option though, as we have been doing with rgb normalization. For example, in some networks I added flash attention, because it improves ~40% training speeds. However, I made it a bool param, so anybody can enable/disable it. |
Lots of valuable information. Thanks! Just one thing: you forgot SPAN :) I'll have some testing to do to see how compatible everything is with spandrel.
Yup, I saw. I'll add support for this soon.
I wonder how compatible that is. You don't seem too optimistic in the docs. Maybe it's detectable? Then it won't be an issue. Well, I guess I'll just have to intestate and see. |
Added it
Not sure, but probably yes. I put 2 models in this zip, one with |
Thanks! That'll help when I get to it. Since all architectures modifications seems to have been documented, I consider this issue closed now. We have to investigate those modifications for spandrel, but that isn't an issue with neosr. Thanks musl! |
Hey musl.
Kim just informed me that ATD light models from neosr don't load with spandrel because of added the
norm
parameter. IMO it's not a problem for neosr to make improvements to existing architectures, but I think that this has to be communicated. Just like with SPAN, ATD models withnorm=False
(which is the default) are incompatible with the original arch code.Could you please document somewhere (1) whether an architecture is compatible with original arch code, and (2) which specific parameter configurations are compatible/incompatible? E.g. for SPAN and ATD, I would document both as "partially compatible" because they are only compatible if
norm=True
is set.In general, I would also be interested to hear what your take is on neosr models being incompatible with original arch code? E.g. would you accept major changes to an existing arch that majorly improves it (idk, 3x speed) but loses compatibility?
The text was updated successfully, but these errors were encountered: