Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document which architectures have been modified and are incompatible with the original architecture #61

Closed
RunDevelopment opened this issue Apr 5, 2024 · 4 comments
Labels
discussion Discussion about something solved issue has been solved

Comments

@RunDevelopment
Copy link
Contributor

Hey musl.

Kim just informed me that ATD light models from neosr don't load with spandrel because of added the norm parameter. IMO it's not a problem for neosr to make improvements to existing architectures, but I think that this has to be communicated. Just like with SPAN, ATD models with norm=False (which is the default) are incompatible with the original arch code.

Could you please document somewhere (1) whether an architecture is compatible with original arch code, and (2) which specific parameter configurations are compatible/incompatible? E.g. for SPAN and ATD, I would document both as "partially compatible" because they are only compatible if norm=True is set.


In general, I would also be interested to hear what your take is on neosr models being incompatible with original arch code? E.g. would you accept major changes to an existing arch that majorly improves it (idk, 3x speed) but loses compatibility?

@neosr-project
Copy link
Owner

Hi, good suggestion. I wrote here when they are not compatible. As for ATD, I also added the empty tensor, so spandrel could detect it.

what your take is on neosr models being incompatible with original arch code?

If the changes contributes that much, yes, I would accept it. Minor changes should be an option though, as we have been doing with rgb normalization. For example, in some networks I added flash attention, because it improves ~40% training speeds. However, I made it a bool param, so anybody can enable/disable it.

@neosr-project neosr-project added the discussion Discussion about something label Apr 5, 2024
@RunDevelopment
Copy link
Contributor Author

I wrote here when they are not compatible.

Lots of valuable information. Thanks! Just one thing: you forgot SPAN :)

I'll have some testing to do to see how compatible everything is with spandrel.

As for ATD, I also added the empty tensor, so spandrel could detect it.

Yup, I saw. I'll add support for this soon.

I added flash attention

I wonder how compatible that is. You don't seem too optimistic in the docs. Maybe it's detectable? Then it won't be an issue. Well, I guess I'll just have to intestate and see.

@neosr-project
Copy link
Owner

neosr-project commented Apr 6, 2024

Just one thing: you forgot SPAN :)

Added it

Maybe it's detectable?

Not sure, but probably yes. I put 2 models in this zip, one with flash_attn: true and other without it, if you want to take a look at some point: flash_attn.zip

@RunDevelopment
Copy link
Contributor Author

I put 2 models in this zip

Thanks! That'll help when I get to it.


Since all architectures modifications seems to have been documented, I consider this issue closed now. We have to investigate those modifications for spandrel, but that isn't an issue with neosr.

Thanks musl!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion about something solved issue has been solved
Projects
None yet
Development

No branches or pull requests

2 participants