Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BatchFeature: Convert List[np.ndarray] to np.ndarray before converting to pytorch tensors #14306

Merged
merged 7 commits into from
Nov 10, 2021

Conversation

eladsegal
Copy link
Contributor

@eladsegal eladsegal commented Nov 7, 2021

What does this PR do?

Fixes #14307

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger
Copy link
Collaborator

sgugger commented Nov 8, 2021

Thanks for your PR!
I think the check needs to be a bit more thorough: we need to also check all the elements of the list are np.ndarray (or at least the first one), according to the warning at least.

@eladsegal
Copy link
Contributor Author

eladsegal commented Nov 8, 2021

Thanks @sgugger,
I added a check for the first element, but here are my thoughts about it:
Iterating through the full list doesn't seem helpful - what would you do if half of the list is List and the other half is np.array? It would still be more efficient to call np.array in such a case.
It's the same thing for checking just the first element, but at least the check doesn't have the cost of iterating through the full list.
Therefore, I think that it's best to always pass the list to np.array without checking it.

eladsegal and others added 2 commits November 10, 2021 04:29
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@eladsegal eladsegal marked this pull request as ready for review November 10, 2021 02:59
@sgugger
Copy link
Collaborator

sgugger commented Nov 10, 2021

Failure is unrelated and already fixed on master.
Thanks again for your work on this PR!

@sgugger sgugger merged commit 321eb56 into huggingface:master Nov 10, 2021
@eladsegal eladsegal deleted the batch-feature-speedup branch November 11, 2021 02:58
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022
…verting to pytorch tensors (huggingface#14306)

* update

* style fix

* retrigger checks

* check first element

* fix syntax error

* Update src/transformers/feature_extraction_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove import

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants