Update gather to use multiple threads #11524
Merged
RyanUnderhill merged 5 commits into master on May 17, 2022
Conversation
hariharans29 previously approved these changes on May 16, 2022
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

+ #include <string>
Member
Just curious - why was this header inclusion required now?
Contributor
Author
There was a lint warning about not including it. I forget the exact text, but it was something about including headers for the types you use.
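For context, this is roughly what the rule looks like in practice (an illustration only; GatherOpName is a made-up function, and the exact lint text isn't quoted in this thread):

#include <string>  // include-what-you-use: this file uses std::string directly,
                   // so it should include <string> itself rather than relying
                   // on a transitive include from another header.

std::string GatherOpName() { return "GatherElements"; }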
hariharans29 approved these changes on May 17, 2022
Description: GatherElements wouldn't distribute its work across multiple threads.
Motivation and Context
A user compared the performance of ONNX Runtime against PyTorch and found that the latest PyTorch was 2x faster. A profile showed that PyTorch was using all CPU cores while ONNX Runtime was limited to one.
There is a separate issue where the user was also using ONNX Runtime inefficiently: there's a memcpy on every Run() call to copy the output tensor, which using I/O bindings avoids.
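As a rough sketch of that I/O-binding workaround (the tensor names "input" and "output" are placeholders, not taken from the user's model), the buffers are bound once up front so Run() writes the output in place:

#include <onnxruntime_cxx_api.h>

// Bind pre-allocated tensors so Run() writes the output in place instead of
// copying it out on every call.
void RunWithIoBinding(Ort::Session& session, Ort::Value& input_tensor,
                      Ort::Value& output_tensor) {
  Ort::IoBinding binding(session);
  binding.BindInput("input", input_tensor);    // placeholder input name
  binding.BindOutput("output", output_tensor); // placeholder output name
  session.Run(Ort::RunOptions{nullptr}, binding);
}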
Here's some performance data comparing the old and new versions. Note that there is a slight perf hit in the single-threaded case, since the work now has to be divided into independent chunks, whereas the old code could use a slightly faster incremental calculation between elements. A sketch of the chunking approach follows the numbers below.
New version using 8 threads:
onnx model: 0.770249s after 10000 iterations
pytorch model: 3.669060s after 10000 iterations
New version limited to one thread:
onnx model: 3.474726s after 10000 iterations
pytorch model: 3.579055s after 10000 iterations
Old version:
onnx model: 2.905542s after 10000 iterations
pytorch model: 3.634197s after 10000 iterations
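For reference, here is a minimal sketch of the chunking approach described above (not the PR code itself; it assumes a flat gather out[i] = data[indices[i]] and uses plain std::thread rather than the ORT thread pool):

#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Split the output range into independent chunks, one per thread. Each chunk
// computes its own starting offset from scratch, which is what costs a little
// extra in the single-threaded case compared to the old incremental walk.
void ParallelGather(const float* data, const int64_t* indices, float* out,
                    int64_t n, int num_threads) {
  const int64_t chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    const int64_t begin = t * chunk;
    const int64_t end = std::min(begin + chunk, n);
    if (begin >= end) break;
    workers.emplace_back([=] {
      for (int64_t i = begin; i < end; ++i) out[i] = data[indices[i]];
    });
  }
  for (auto& w : workers) w.join();
}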