[js/webgpu] Optimize convtranspose #23302

Merged: guschmue merged 5 commits into microsoft:main from qjia7:opt_convtranspose on Jan 9, 2025
qjia7 (Contributor) commented Jan 9, 2025

Description

BUG #23273

With this change, the ConvTranspose time in that bug drops from ~90s to ~7s on my Meteor Lake machine.

This PR does the following:

  1. Use the stride as the loop increment.
    In the bug, the stride is 1024, so this greatly reduces the number of loop iterations.
  2. Support components for A to reduce the number of memory accesses.
  3. When the number of output channels is 1, B can use the same components as A, further reducing the number of memory accesses.
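
For readers unfamiliar with the first item, here is a minimal CPU-side sketch of the idea, not the actual WGSL shader from this PR. It assumes a 1-D, single-channel ConvTranspose with dilation 1, and the function names `convTransposeNaive` and `convTransposeStrided` are hypothetical. For a given output position, only kernel taps congruent to `(o + pad) % stride` can contribute, so the inner loop can start there and step by `stride` instead of visiting every tap and rejecting most of them.

```typescript
// Illustrative only: 1-D, single-channel ConvTranspose with dilation 1.
// These helpers are hypothetical and are not taken from the ORT code base.
interface ConvResult {
  y: number[];
  iterations: number; // inner-loop iterations, counted to compare the two variants
}

// Baseline: visit every kernel tap and skip the ones that do not line up with the stride.
function convTransposeNaive(x: number[], w: number[], stride: number, pad: number): ConvResult {
  const outLen = (x.length - 1) * stride + w.length - 2 * pad;
  const y = new Array<number>(outLen).fill(0);
  let iterations = 0;
  for (let o = 0; o < outLen; o++) {
    for (let k = 0; k < w.length; k++) {
      iterations++;
      const t = o + pad - k; // must equal i * stride for some input index i
      if (t < 0 || t % stride !== 0) continue;
      const i = t / stride;
      if (i >= x.length) continue;
      y[o] += x[i] * w[k];
    }
  }
  return { y, iterations };
}

// Strided variant: start at the first tap that can contribute and step by `stride`.
function convTransposeStrided(x: number[], w: number[], stride: number, pad: number): ConvResult {
  const outLen = (x.length - 1) * stride + w.length - 2 * pad;
  const y = new Array<number>(outLen).fill(0);
  let iterations = 0;
  for (let o = 0; o < outLen; o++) {
    for (let k = (o + pad) % stride; k < w.length; k += stride) {
      iterations++;
      const t = o + pad - k; // divisible by stride by construction
      if (t < 0) break; // k only grows, so t stays negative from here on
      const i = t / stride;
      if (i >= x.length) continue;
      y[o] += x[i] * w[k];
    }
  }
  return { y, iterations };
}

const naive = convTransposeNaive([1, 2, 3], [1, 1, 1, 1], 2, 0);
const strided = convTransposeStrided([1, 2, 3], [1, 1, 1, 1], 2, 0);
console.log(naive.y, strided.y, naive.iterations, strided.iterations);
```

Both variants produce the same output. The strided one runs roughly `1/stride` as many inner iterations, which is why a stride of 1024 makes such a large difference in the reported case.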

qjia7 (Contributor, Author) commented Jan 9, 2025

@guschmue @fs-eire Please take a look, thanks.

guschmue added the ep:WebGPU (ort-web webgpu provider) label Jan 9, 2025
guschmue (Contributor) commented Jan 9, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

guschmue (Contributor) commented Jan 9, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

guschmue (Contributor) commented Jan 9, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

guschmue (Contributor) commented Jan 9, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@guschmue guschmue merged commit 7be006c into microsoft:main Jan 9, 2025
guschmue pushed a commit that referenced this pull request Jan 12, 2025
@qjia7 qjia7 deleted the opt_convtranspose branch January 13, 2025 08:34
ashrit-ms pushed a commit that referenced this pull request Mar 17, 2025