
Conversation

@mingzhe09088
Contributor

Summary:
This diff implements the Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y = X * W + B, with dtypes (Y, X, W, B) = (fp32, fp32, fp16, fp32)

To do that, three steps are needed (a usage sketch follows the list):

  1. Quantize the weights from fp32 to fp16; this is done with `PackedGemmMatrixFP16` inside the `fbgemm_pack_gemm_matrix_fp16` op.
  2. Run the matrix multiplication against the packed fp16 weights using `cblas_gemm_compute` inside the `fbgemm_linear_fp16_weight` op.
  3. Add the bias to the result of step 2 and return the final Y.
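
For illustration, here is a minimal Python sketch of how the two new ops are expected to fit together. It assumes a PyTorch build with FBGEMM enabled, and assumes the ops are exposed as `torch.fbgemm_pack_gemm_matrix_fp16(weight)` and `torch.fbgemm_linear_fp16_weight(input, packed_weight, bias)`; see the native_functions.yaml changes in this PR for the exact signatures.

```python
import torch

# Hypothetical shapes: batch of 4, in_features=8, out_features=16.
X = torch.randn(4, 8)    # fp32 activations
W = torch.randn(16, 8)   # fp32 weights (out_features x in_features)
B = torch.randn(16)      # fp32 bias

# Step 1: pack the fp32 weights into an fp16 PackedGemmMatrixFP16
# (returned as an opaque tensor handle). Assumed op name/signature.
packed_W = torch.fbgemm_pack_gemm_matrix_fp16(W)

# Steps 2-3: fp32 activations x fp16 packed weights via cblas_gemm_compute,
# then the fp32 bias is added.
Y = torch.fbgemm_linear_fp16_weight(X, packed_W, B)

# Reference: the same math with the weights rounded to fp16 up front.
Y_ref = torch.nn.functional.linear(X, W.half().float(), B)
print(torch.allclose(Y, Y_ref, atol=1e-2))
```

Up to the fp16 rounding of the weights, the result should match a plain fp32 `torch.nn.functional.linear` call.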

Differential Revision: D15921768

@pytorchbot pytorchbot added the module: operators, module: nn (Related to torch.nn), and module: internals (Related to internal abstractions in c10 and ATen) labels on Jun 20, 2019
Summary:
Pull Request resolved: #22023

This diff implements the Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y = X * W + B, with dtypes (Y, X, W, B) = (fp32, fp32, fp16, fp32)

To do that, three steps are needed (see the sketch after this list):
1. Quantize the weights from fp32 to fp16; this is done with `PackedGemmMatrixFP16` inside the `fbgemm_pack_gemm_matrix_fp16` op.
2. Run the matrix multiplication against the packed fp16 weights using `cblas_gemm_compute` inside the `fbgemm_linear_fp16_weight` op.
3. Add the bias to the result of step 2 and return the final Y.
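
For reference, the numerics implied by the (fp32, fp32, fp16, fp32) dtype combination can be emulated in a few lines of plain PyTorch (an illustration of the math only, not the FBGEMM code path):

```python
import torch

X = torch.randn(4, 8)    # fp32 activations
W = torch.randn(16, 8)   # fp32 weights, to be stored as fp16
B = torch.randn(16)      # fp32 bias

# Step 1: weights are rounded from fp32 to fp16 (what packing stores).
W_fp16 = W.to(torch.float16)

# Step 2: the GEMM uses the fp16 weights but accumulates in fp32.
Y = X @ W_fp16.float().t()

# Step 3: the fp32 bias is added to produce the final fp32 output Y.
Y = Y + B
```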

Reviewed By: jianyuh

Differential Revision: D15921768

fbshipit-source-id: f48ed23c4a446e6454b1334ede492b7efec45260
@facebook-github-bot
Contributor

This pull request has been merged in 573d9e6.

zdevito pushed a commit to zdevito/ATen that referenced this pull request on Jul 13, 2019
Summary:
Pull Request resolved: pytorch/pytorch#22023

This diff implements the Linear operation with fp16 weights based on FBGEMM. At a high level, we want to perform the following operation:
Y = X * W + B, with dtypes (Y, X, W, B) = (fp32, fp32, fp16, fp32)

To do that, three steps are needed:
1. Quantize the weights from fp32 to fp16; this is done with `PackedGemmMatrixFP16` inside the `fbgemm_pack_gemm_matrix_fp16` op.
2. Run the matrix multiplication against the packed fp16 weights using `cblas_gemm_compute` inside the `fbgemm_linear_fp16_weight` op.
3. Add the bias to the result of step 2 and return the final Y.

Reviewed By: jianyuh

Differential Revision: D15921768

fbshipit-source-id: dc4e5b366f846ce9d58975876940a9b3372b8b8d