[ROCm] Enable Einsum for inferencing perf #12360

Merged
zhangyaobit merged 3 commits into master from ettao/rocm-einsum on Jul 29, 2022

ytaous commented Jul 28, 2022

Description: The existing Einsum kernel is disabled for ROCm. This causes 1P model inferencing to fall back to CPUExecutionProvider for that op. In addition, extra Cast ops and memcpy nodes (D2H/H2D) are inserted before and after the Einsum op, particularly in fp16 mode. The whole e2e run is thus much slower than in a CUDA environment.
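To make the fallback cost concrete, here is a minimal toy sketch of why one unsupported op in the middle of a GPU graph adds a device-to-host and a host-to-device copy. This is not ONNX Runtime's actual partitioning code. The three-op graph, the helper names `assign_providers` and `count_boundary_copies`, and the supported-op sets are all hypothetical, and the extra fp16 Cast ops are not modeled.

```python
# Toy model only: NOT ONNX Runtime's partitioner, just an illustration
# of how a CPU fallback in the middle of a GPU graph forces copies.
GPU = "ROCMExecutionProvider"
CPU = "CPUExecutionProvider"


def assign_providers(ops, gpu_supported):
    """Assign each op to the GPU provider if it has a kernel there, else to CPU."""
    return [(op, GPU if op in gpu_supported else CPU) for op in ops]


def count_boundary_copies(assignment):
    """Count (D2H, H2D) copies needed wherever consecutive ops change provider."""
    d2h = h2d = 0
    for (_, prev), (_, cur) in zip(assignment, assignment[1:]):
        if prev == GPU and cur == CPU:
            d2h += 1  # tensor must move from device to host
        elif prev == CPU and cur == GPU:
            h2d += 1  # tensor must move from host back to device
    return d2h, h2d


ops = ["MatMul", "Einsum", "Softmax"]  # hypothetical linear graph

# Before: no GPU Einsum kernel, so Einsum lands on the CPU provider.
before = assign_providers(ops, gpu_supported={"MatMul", "Softmax"})
# After: Einsum has a GPU kernel, so the whole graph stays on the device.
after = assign_providers(ops, gpu_supported={"MatMul", "Einsum", "Softmax"})

print(count_boundary_copies(before))  # (1, 1)
print(count_boundary_copies(after))   # (0, 0)
```

In this toy graph, enabling the kernel removes both copies around Einsum, which is the effect the description attributes to the change.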

Perf before the change:
Latency (ms)    Throughput (QPS)
49.47           646.86

After the change (~2.7x faster):
Latency (ms)    Throughput (QPS)
18.06           1772.09
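The speedup implied by the two measurements can be checked directly from the reported numbers. The variable names below are illustrative. The values are the ones quoted in this description.

```python
# Reported measurements from the PR description.
before_latency_ms, before_qps = 49.47, 646.86
after_latency_ms, after_qps = 18.06, 1772.09

# Lower latency is better, so divide before by after.
latency_speedup = before_latency_ms / after_latency_ms
# Higher throughput is better, so divide after by before.
throughput_gain = after_qps / before_qps

print(f"latency speedup: {latency_speedup:.2f}x")
print(f"throughput gain: {throughput_gain:.2f}x")
```

Both ratios come out at about 2.74.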

Also verified using the ROCm profiler (#10911).

ytaous requested review from Lafi7e and zhangyaobit July 28, 2022 07:14
ytaous added the type:performance and core runtime labels Jul 28, 2022
Comment thread onnxruntime/test/providers/cpu/math/einsum_test.cc Outdated
Comment thread onnxruntime/core/providers/rocm/math/einsum_utils/einsum_auxiliary_ops.cc Outdated
Comment thread onnxruntime/core/providers/rocm/math/einsum_utils/einsum_auxiliary_ops.h Outdated
zhangyaobit merged commit e4bd41f into master Jul 29, 2022
zhangyaobit deleted the ettao/rocm-einsum branch July 29, 2022 03:26

ytaous commented Jul 29, 2022

Thanks.
