
Use cublasHgemm "back" for fp16 computation with Volta GPU#3765

Merged
pengwa merged 3 commits into master from pengwa/fp16_gemm on Apr 30, 2020

Conversation

Contributor

@pengwa pengwa commented Apr 30, 2020

Description: Use cublasHgemm for fp16 computation with Volta GPU

cublasHgemm was used for fp16 computation on Volta GPUs before the training code was merged into master. For historical reasons, when we did the internal master -> old training branch merge, we commented out that path. I used "back" in the PR title to indicate that this change simply re-enables the existing path.

This change should do no harm to inference, because this was already the behavior in master before training was merged.
This change brings a perf improvement for training, tested on a 32GB V100.

[image: training performance comparison]

The reasoning is documented at https://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode:
cublasGemmEx uses CUDA_R_32F as its computation type even though its main data inputs/outputs are CUDA_R_16F, whereas cublasHgemm performs the computation in CUDA_R_16F.
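The compute-type difference can be illustrated without a GPU. The sketch below is not part of the PR and does not call cuBLAS; it uses numpy to emulate the two accumulation strategies for a dot product of 4096 ones. FP32 accumulation (the cublasGemmEx path, compute type CUDA_R_32F) gets the exact answer, while a running sum rounded to FP16 at every step (the cublasHgemm path) saturates at 2048, because the FP16 spacing above 2048 is 2 and adding 1 rounds back down.

```python
import numpy as np

# Hypothetical illustration, not the PR's code: accumulate a dot
# product of 4096 ones with two different compute types.
k = 4096
ones = np.ones(k, dtype=np.float16)

# cublasGemmEx-style: FP16 inputs, FP32 accumulator.
acc32 = np.float32(0.0)
for x in ones:
    acc32 += np.float32(x)

# cublasHgemm-style: the running sum is rounded to FP16 at every step.
acc16 = np.float16(0.0)
for x in ones:
    acc16 = np.float16(acc16 + x)

print(float(acc32))  # 4096.0 (exact)
print(float(acc16))  # 2048.0: in FP16, 2048 + 1 rounds back to 2048
```

This loss of low-order bits in long reductions is the trade-off behind using hgemm for training: faster on Volta tensor cores, but the accumulation is no longer done in FP32.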

Motivation and Context

  • Why is this change required? What problem does it solve?
  • If it fixes an open issue, please link to the issue here.

@pengwa pengwa requested a review from a team as a code owner April 30, 2020 13:20
@pengwa pengwa added the label training (issues related to ONNX Runtime training; typically submitted using template) Apr 30, 2020
@pengwa pengwa changed the title Use cublasHgemm for fp16 computation with Volta GPU Use cublasHgemm "back" for fp16 computation with Volta GPU Apr 30, 2020
@pengwa
Contributor Author

pengwa commented Apr 30, 2020

@SherlockNoMad feel free to take over this PR if we want it ASAP for the latest benchmarking.

@pengwa pengwa merged commit 177c135 into master Apr 30, 2020
@pengwa pengwa deleted the pengwa/fp16_gemm branch April 30, 2020 16:36
Contributor

@weixingzhang weixingzhang left a comment


Can you verify convergence? Previously, the accumulation was done in FP32, but with this change it will be done in FP16. For training, it is probably not a good idea to use hgemm; even if it is OK for BERT-L, it may not be OK for other big models.


Labels

training issues related to ONNX Runtime training; typically submitted using template


4 participants