[WIP] Hard error when ignoring tensors. #27484
Here we merge the tensors that have the same ending into the filtered tensors.
Merged the tensors?
We add the name to a previous set, so that the list `filtered_tensors` will contain a set with at least 2 entries.
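As a minimal sketch of the idea (hypothetical helper and names, not the actual transformers implementation): tensor names are grouped by the storage they point to, so names sharing a storage end up in the same set, and only sets with at least two entries denote shared tensors.

```python
from collections import defaultdict

def group_shared_tensors(state):
    """Group tensor names by (device, storage id); names that share a
    storage land in the same set. Hypothetical, for illustration only."""
    areas = defaultdict(set)
    for name, (device, storage_id) in state.items():
        areas[(device, storage_id)].add(name)
    # Only groups with 2+ names are actual shared tensors.
    return [names for names in areas.values() if len(names) >= 2]

# Two tensors sharing storage 0 on "cpu", plus one standalone tensor.
state = {
    "embed.weight": ("cpu", 0),
    "lm_head.weight": ("cpu", 0),
    "bias": ("cpu", 1),
}
filtered = group_shared_tensors(state)
# → [{"embed.weight", "lm_head.weight"}]
```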
With this, tensors that are not loaded on the same device would have a different entry in the set dict, right?
This does not seem tested by the small dummy test (but is most probably tested by some models that do have this?)
Yes, because `area` contains the device as a key. Nothing in transformers could trigger this; only `optimum` did trigger this. It requires some low-level manipulation of storages and tensors.
Identical was already tested previously by other tests (since it's a regular flow for shared tensors with `tie_weights`). The dummy test only tests the new path for disjoint tensors (which cannot be triggered by regular transformers code).
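A simplified sketch of the identical/disjoint distinction being discussed (hypothetical representation: each name maps to an (offset, length) view into one shared storage). Identical tensors cover the same range, as with `tie_weights`; disjoint ones cover non-overlapping ranges and need the new path:

```python
def classify(views):
    """views: dict name -> (offset, length) into one shared storage.
    Returns 'identical' if all views cover the same range,
    'disjoint' if no two views overlap, else 'overlapping'."""
    ranges = list(views.values())
    if all(r == ranges[0] for r in ranges):
        return "identical"  # regular flow, e.g. weights tied via tie_weights
    spans = sorted(ranges)
    for (o1, l1), (o2, _) in zip(spans, spans[1:]):
        if o1 + l1 > o2:
            return "overlapping"
    return "disjoint"  # only reachable via low-level storage manipulation

print(classify({"a": (0, 4), "b": (0, 4)}))  # identical
print(classify({"a": (0, 4), "b": (4, 4)}))  # disjoint
```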
The error name is only updated once with `names`; is it needed?
It's just a sort of code comment; it's mostly that some names will trigger a warning, some an error.
Not feeling strongly here, it just seems a bit clearer that there are different names.
What does property mean in this context?
A bit strange to me that we re-compute the intersection and difference with `to_delete_names`, which is not updated in the loop.
Makes everything a bit less entangled. `_find_xx` doesn't care about transformers' specifications on which tensors are tied. `to_delete_names` IS the only place where we know which tensors we should delete. Set calculations should be ridiculously fast compared to the actual saving/loading part.
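The set arithmetic under discussion can be sketched like this (hypothetical helper and names): for each group of shared tensor names, the intersection with `to_delete_names` gives the names to drop from the checkpoint, and the difference gives the names to keep, recomputed per group even though `to_delete_names` never changes in the loop.

```python
def split_groups(shared_groups, to_delete_names):
    """For each set of shared tensor names, separate the names to drop
    from the checkpoint from the names to keep and save."""
    kept, dropped = [], []
    for names in shared_groups:
        dropped.append(names & to_delete_names)  # intersection: tied copies
        kept.append(names - to_delete_names)     # difference: canonical names
    return kept, dropped

groups = [{"embed.weight", "lm_head.weight"}]
to_delete = {"lm_head.weight"}
kept, dropped = split_groups(groups, to_delete)
# kept → [{"embed.weight"}], dropped → [{"lm_head.weight"}]
```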