Optimize _PaddedFP8rowwise_to_float_cuda_kernel for nrows == 1 #1858

sryap · 2023-06-30T22:33:30Z

Summary:
Optimize _PaddedFP8rowwise_to_float_cuda_kernel by using one thread
block to dequantize one row for the nrows == 1 case. Prior to this
diff, one thread dequantized one row, which leaves a single-row input
with only one working thread.
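
The difference between the two work mappings can be illustrated with a small host-side C++ model. This is only a sketch and not the FBGEMM kernel: the function names `thread_per_row` and `block_per_row` are hypothetical, each GPU thread is modeled as a loop iteration, and dequantization is reduced to multiplying each byte by a per-row scale (the real FP8 decoding and the padded input layout are omitted).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side model of two work mappings; NOT the FBGEMM kernel.
// Assumption: "dequantize" is simplified to byte * per-row scale.

// Mapping A: one thread per row. Thread `tid` walks its whole row serially.
// Returns the number of modeled threads that performed any work.
std::size_t thread_per_row(const std::vector<std::uint8_t>& in,
                           const std::vector<float>& row_scale,
                           std::size_t nrows, std::size_t ncols,
                           std::size_t num_threads,
                           std::vector<float>& out) {
  std::size_t active = 0;
  for (std::size_t tid = 0; tid < num_threads; ++tid) {
    if (tid >= nrows) continue;  // no row assigned: this thread idles
    ++active;
    for (std::size_t c = 0; c < ncols; ++c) {
      out[tid * ncols + c] = in[tid * ncols + c] * row_scale[tid];
    }
  }
  return active;
}

// Mapping B: one block per row. The threads of a block stride over the
// columns of that block's row, so the row is split across the block.
// Returns the number of modeled threads that performed any work.
std::size_t block_per_row(const std::vector<std::uint8_t>& in,
                          const std::vector<float>& row_scale,
                          std::size_t nrows, std::size_t ncols,
                          std::size_t threads_per_block,
                          std::vector<float>& out) {
  std::size_t active = 0;
  for (std::size_t row = 0; row < nrows; ++row) {  // models the block index
    for (std::size_t t = 0; t < threads_per_block; ++t) {  // thread in block
      bool did_work = false;
      for (std::size_t c = t; c < ncols; c += threads_per_block) {
        out[row * ncols + c] = in[row * ncols + c] * row_scale[row];
        did_work = true;
      }
      if (did_work) ++active;
    }
  }
  return active;
}
```

In this model, a single row of 1024 columns with 256 threads keeps one thread busy under mapping A and all 256 threads busy under mapping B, while both produce the same output.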

_PaddedFP8rowwise_to_float_cuda_kernel removes padding, scale, and
pad value from the input. Please see the attached figure for the
input and output layouts.

{F1039403748}

Differential Revision: D47068547

netlify · 2023-06-30T22:33:34Z

✅ Deploy Preview for pytorch-fbgemm-docs canceled.

Name	Link
🔨 Latest commit	`4c90c6d`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/64a3122d3350250008e83a0f

facebook-github-bot · 2023-06-30T22:34:38Z

This pull request was exported from Phabricator. Differential Revision: D47068547

…ch#1858)

Summary:
Pull Request resolved: pytorch#1858

Optimize `_PaddedFP8rowwise_to_float_cuda_kernel` by using one thread block to dequantize one row for the `nrows == 1` case. Prior to this diff, one thread dequantizes one row.

`_PaddedFP8rowwise_to_float_cuda_kernel` removes padding, scale, and pad value from the input. Please see the attached figure for the input and output layouts.

{F1039403748}

Reviewed By: xiaosun86, jianyuh

Differential Revision: D47068547

fbshipit-source-id: cd0c88051d8691d232c28600e70b6827318a6ef7

facebook-github-bot · 2023-07-03T18:16:43Z

This pull request was exported from Phabricator. Differential Revision: D47068547

…ch#1858)

Summary:
Pull Request resolved: pytorch#1858

Optimize `_PaddedFP8rowwise_to_float_cuda_kernel` by using one thread block to dequantize one row for the `nrows == 1` case. Prior to this diff, one thread dequantizes one row.

`_PaddedFP8rowwise_to_float_cuda_kernel` removes padding, scale, and pad value from the input. Please see the attached figure for the input and output layouts.

{F1039403748}

Reviewed By: xiaosun86, jianyuh

Differential Revision: D47068547

fbshipit-source-id: 6b6cd9b2a092d17c11c356a84c96449351d453be

facebook-github-bot · 2023-07-03T18:23:43Z

This pull request was exported from Phabricator. Differential Revision: D47068547

facebook-github-bot · 2023-07-03T23:47:07Z

This pull request has been merged in f509314.

facebook-github-bot added the cla signed and fb-exported labels Jun 30, 2023

sryap force-pushed the export-D47068547 branch from e51be7f to a14ff70 Compare July 3, 2023 18:16

sryap force-pushed the export-D47068547 branch from a14ff70 to 4c90c6d Compare July 3, 2023 18:23

facebook-github-bot closed this in f509314 Jul 3, 2023

facebook-github-bot added the Merged label Jul 3, 2023
