
Fix Windows and onnx dtype compatibility #1886

Merged: 12 commits merged into main on Jun 24, 2024
Conversation

@IlyasMoutawwakil (Member) commented on Jun 3, 2024

What does this PR do?

Fixes multiple Windows-specific issues:

  • dtype mismatch: numpy's default integer type depends on the OS; on Ubuntu/macOS it is int64, but on Windows it is int32. As a result, tokenizers return input ids and attention masks in a dtype different from torch's default, which is int64 (long). A casting sketch follows this list.
  • shutil.rmtree doesn't always work on Windows; there are random failures when a tempfile is being closed (cleaned up) or when calling shutil.rmtree on a folder directly. A tolerant-cleanup sketch appears a little further below.
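
As a sketch of the fix for the dtype bullet (the helper name and mapping below are illustrative; the actual `_prepare_onnx_inputs` added in this PR may differ), the idea is to cast each numpy input to the dtype the ONNX graph declares:

```python
import numpy as np
import onnxruntime as ort

# Map ONNX type strings (as reported by session.get_inputs()) to numpy dtypes.
ORT_TO_NP = {
    "tensor(int64)": np.int64,
    "tensor(int32)": np.int32,
    "tensor(float)": np.float32,
    "tensor(bool)": np.bool_,
}

def prepare_onnx_inputs(session: ort.InferenceSession, inputs: dict) -> dict:
    """Cast numpy inputs to the dtypes the ONNX graph expects.

    On Windows (numpy < 2.0) tokenizers hand back int32 input ids, while
    exported graphs usually declare tensor(int64); casting here avoids the
    dtype-mismatch error at session.run() time.
    """
    expected = {node.name: node.type for node in session.get_inputs()}
    prepared = {}
    for name, array in inputs.items():
        target = ORT_TO_NP.get(expected.get(name, ""))
        if target is not None and array.dtype != target:
            array = array.astype(target)  # e.g. int32 -> int64 on Windows
        prepared[name] = array
    return prepared
```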

The seq2seq decoder also used to return the past key/value (KV) cache in torch format unconditionally.
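
For the `shutil.rmtree` flakiness in the second bullet, a retry-based cleanup helper along these lines is a common workaround (a sketch under assumptions, not necessarily what this PR merged):

```python
import os
import shutil
import stat
import time

def rmtree_with_retries(path: str, retries: int = 5, delay: float = 0.5) -> None:
    """Remove a directory tree, retrying the transient PermissionError that
    Windows raises while another process (antivirus, indexer, a tempfile
    being closed) still holds a handle on a file inside it."""

    def make_writable(func, target, exc_info):
        # Windows marks some files read-only; clear the flag and retry the op.
        os.chmod(target, stat.S_IWRITE)
        func(target)

    for attempt in range(retries):
        try:
            shutil.rmtree(path, onerror=make_writable)
            return
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```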

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

optimum/onnxruntime/modeling_ort.py (resolved review threads)
```diff
@@ -852,6 +863,39 @@ def raise_on_numpy_input_io_binding(self, use_torch: bool):
                 " with model.use_io_binding = False, or pass torch.Tensor inputs instead."
             )
 
+    def _prepare_onnx_inputs(
```
Collaborator:

Philipp did not like that kind of dynamicity, with performance in mind, but I have no strong opinion; it sounds fine to me.

@IlyasMoutawwakil (Member, Author) replied on Jun 6, 2024:

It only applies when needed, so performance-wise I think there is no added overhead versus what used to be done.
Where things can be optimized, and what I think can be viewed as the optimization-oriented path, is I/O binding (a sketch of the first point follows this list):

  • pre-allocating output buffers before the forward pass rather than during it, and dynamically resizing them when the batch size changes.
  • decoder models with I/O binding, where a static-cache implementation can be used as output buffers to reduce the overhead of creating them.
  • decoder model input/output synchronization, which could happen before and after the generation loop instead of on each forward call (if that's possible).
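
As an illustration of the first bullet, here is a rough sketch of pre-allocating the output buffer and rebinding it only when the input shape changes (the model path, the input/output names, and the vocab size are assumptions, not code from this PR):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
binding = session.io_binding()

cached_shape = None
logits = None  # OrtValue reused across forward calls

def forward(input_ids: np.ndarray) -> np.ndarray:
    """Run one forward pass, reusing a pre-allocated output buffer."""
    global cached_shape, logits
    batch, seq = input_ids.shape
    if (batch, seq) != cached_shape:
        # Allocate the output buffer only when the input shape (e.g. the
        # batch size) changes, instead of letting ORT allocate on every call.
        cached_shape = (batch, seq)
        logits = ort.OrtValue.ortvalue_from_numpy(
            np.empty((batch, seq, 32000), dtype=np.float32)
        )
    binding.bind_cpu_input("input_ids", input_ids.astype(np.int64))
    binding.bind_ortvalue_output("logits", logits)
    session.run_with_iobinding(binding)
    return logits.numpy()
```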

Collaborator:

I kind of agree; I'm just not sure what each ORTModelForXXX is for, then.

@tengomucho commented:
FWIW, numpy 2.0 has just been released, and the default int size on Windows is now 64-bit! https://numpy.org/devdocs/numpy_2_0_migration_guide.html#windows-default-integer
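
A quick way to check the platform-default integer:

```python
import numpy as np

# On Windows with numpy < 2.0 this prints int32; with numpy >= 2.0,
# and on Linux/macOS generally, it prints int64.
print(np.array([1, 2, 3]).dtype)
print(np.dtype(np.int_))  # the platform-default integer type
```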

@IlyasMoutawwakil merged commit aad4b8b into main on Jun 24, 2024 (86 of 92 checks passed) and deleted the fix-windows-int32 branch on June 24, 2024 at 10:13.