
Do not attempt to pad nested tensors #2041

Merged
merged 1 commit into huggingface:main from dont-pad-nested-tensor on Nov 17, 2023

Conversation

@frankier (Contributor) commented on Oct 9, 2023:

What does this PR do?

Fixes pad_across_processes erroring on nested tensors when it reaches len(tensor.shape).

Fixes #2040
Fixes huggingface/transformers#26687

Looking for a bit of guidance/direction on whether other things are needed.
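
To illustrate the idea (this is only a simplified sketch, not the actual diff; the function name and signature here are made up), the fix boils down to an early return for nested tensors:

```python
def _pad_across_processes_sketch(tensor, dim=0, pad_index=0):
    # Hypothetical guard, not the real accelerate code: nested tensors have no
    # regular shape, so the existing logic fails once it reaches
    # len(tensor.shape). The safest default is to hand them back untouched.
    if getattr(tensor, "is_nested", False):
        return tensor
    # ... the usual padding logic (which inspects len(tensor.shape)) would follow.
    return tensor
```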

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@pacman100

@BenjaminBossan (Member) commented:

I wasn't aware of nested tensors, thanks for bringing this to my attention. AFAICT, accelerate doesn't really take them into account for now, I wouldn't be surprised if more errors can occur. It would probably require some unit tests to check this.

> There is not an obvious default since numpy doesn't have an exact equivalent.

As for padding, I don't understand yet why to_padded_tensor cannot be used, what is the relation to numpy?

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@BenjaminBossan (Member) commented:

@frankier Could you please merge with/rebase on main? That should fix the issue with CI.

@frankier (Contributor, Author) commented:

> AFAICT, accelerate doesn't really take them into account for now, I wouldn't be surprised if more errors can occur.

There are no further issues when I'm using accelerate from the transformers Trainer, but there are issues in transformers if nested tensors are not converted in a custom preprocess_logits_for_metrics. Once this PR is merged, I will open another PR against transformers to improve the error messages there.

> It would probably require some unit tests to check this.

The test I know how to write at the moment is just a smoke/unit test that checks that a nested_tensor is returned as-is (round-tripped) when passed to pad_across_processes. I think this test would be a bit vacuous, but I can easily add it to the PR if you like. Could you please direct me to which file/class would be the best place for it?
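
Roughly, the test I have in mind would look something like this (assuming pad_across_processes is importable from accelerate.utils; the test name and placement are placeholders):

```python
import torch
from accelerate.utils import pad_across_processes

def test_pad_across_processes_returns_nested_tensor_unchanged():
    # Ragged nested tensor: there is no sensible way to pad it without extra
    # information, so pad_across_processes should return it as-is, not raise.
    nt = torch.nested.nested_tensor([torch.tensor([1, 2, 3]), torch.tensor([1])])
    assert pad_across_processes(nt) is nt
```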

> As for padding, I don't understand yet why to_padded_tensor cannot be used

The problem is that it uses in-band signalling: whichever padding element you choose may actually occur in the array. Adding this as default behavior could cause confusing bugs for users. It's better to force them to choose how to deal with the nested_tensor.
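
To make the in-band signalling point concrete, here is a small illustration (values chosen arbitrarily): with to_padded_tensor, the padding value is indistinguishable from real data.

```python
import torch

# A ragged nested tensor whose data already contains the value 0.0.
nt = torch.nested.nested_tensor([torch.tensor([0.0, 1.0]), torch.tensor([2.0])])

padded = torch.nested.to_padded_tensor(nt, padding=0.0)
# padded is tensor([[0., 1.],
#                   [2., 0.]])
# The trailing 0.0 in the second row is padding, but nothing distinguishes it
# from the genuine 0.0 in the first row.
```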

> what is the relation to numpy?

This might be from an old, possibly incorrect memory I have about the default behavior of preprocess_logits_for_metrics converting everything to numpy arrays.

@BenjaminBossan (Member) commented:

> There are no further issues when I'm using accelerate from the transformers Trainer, but there are issues in transformers if nested tensors are not converted in a custom preprocess_logits_for_metrics.

There could still be edge cases that you don't encounter but that other users may run into. E.g. the simple fact that tensor.shape may fail could easily lead to problems. Maybe accelerate dodges all of them, but we cannot know until we really test it. As is, I'd say we don't officially support nested tensors -- the docs mention they're at prototype stage, so it makes sense to wait.

That being said, there is no harm in accelerate getting out of the way if users happen to choose nested tensors, which is why I think this PR still makes sense.

> The test I know how to write at the moment is just a smoke/unit test that checks that a nested_tensor is returned as-is (round-tripped) when passed to pad_across_processes. I think this test would be a bit vacuous, but I can easily add it to the PR if you like.

I don't think it would be vacuous. You need to consider that someone else might come in the future and remove the check inadvertently. Having a test would alert us to this.

> Could you please direct me to which file/class would be the best place for it?

Not completely sure, test_utils.py seems most appropriate, @muellerzr WDYT?

> The problem is that it uses in-band signalling: whichever padding element you choose may actually occur in the array.

I see, that makes sense. However, I do wonder if we should give a warning in that situation, so that the user is aware that padding did not take place.
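
Something along these lines (just a suggestion, not claiming this is the exact wording or placement) inside the nested-tensor branch would be enough:

```python
import warnings

# Hypothetical warning at the point where a nested tensor is passed through
# unpadded.
warnings.warn(
    "pad_across_processes received a nested tensor and returned it unpadded; "
    "convert it explicitly (e.g. with torch.nested.to_padded_tensor) if padding "
    "is needed."
)
```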

> This might be from an old, possibly incorrect memory I have about the default behavior of preprocess_logits_for_metrics converting everything to numpy arrays.

AFAICT, there is nothing in accelerate that requires this. A quick search didn't reveal anything in transformers Trainer either, but I don't have experience with it, so I might be wrong.

@muellerzr (Collaborator) commented:

Agree with all of @BenjaminBossan's points

> Not completely sure, test_utils.py seems most appropriate, @muellerzr WDYT?

Yes indeed!

@muellerzr (Collaborator) commented:

@frankier do you need any more direction on how to add the tests? :)

@frankier force-pushed the dont-pad-nested-tensor branch 2 times, most recently from 74c1d0c to 8f7ffcc on November 9, 2023 at 13:20
@frankier (Contributor, Author) commented:

Please excuse the delay. I think I've addressed all this now.

@BenjaminBossan (Member) commented:

@frankier Could you please run make style and make quality?

@frankier (Contributor, Author) commented:

Whoops, I thought I had(!). Done now.

@BenjaminBossan (Member) left a comment:

Thanks, looks good to me.

@muellerzr (Collaborator) left a comment:

Thanks!

@muellerzr merged commit a5a7c03 into huggingface:main on Nov 17, 2023
24 checks passed