
Conversation

@bfineran (Contributor) commented on Sep 3, 2021

Embedding layers account for a large share of the storage requirements of transformer models. This PR introduces a pathway to quantize embedding values; by default, this behavior will be enabled by the QuantizationModifier.

Python Example

>>> import torch
>>> from sparseml.pytorch.optim import QuantizationModifier
>>> module = torch.nn.Embedding(100, 100)
>>> QuantizationModifier().apply(module)
>>> module
Embedding(
  100, 100
  (activation_post_process): FakeQuantize(
    fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8),
    quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1,
    scale=tensor([1.]), zero_point=tensor([0])
    (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)
  )
  (weight_fake_quant): FakeQuantize(
    fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([1], dtype=torch.uint8),
    quant_min=-128, quant_max=127, dtype=torch.qint8, qscheme=torch.per_tensor_affine, ch_axis=-1,
    scale=tensor([1.]), zero_point=tensor([0])
    (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)
  )
)
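
The ONNX graphs shown below come from exporting the module before and after applying the modifier. The snippet that follows is only a rough sketch of such an export and is not part of this PR: the dummy input shape, output file name, and opset version are illustrative assumptions (SparseML also provides its own export utilities, which is what produced the screenshots below).

>>> # Sketch only: export the fake-quantized embedding to ONNX for inspection.
>>> # Input shape, file name, and opset are assumptions, not values from this PR.
>>> dummy_input = torch.randint(0, 100, (1, 16))  # a batch of token ids in [0, 100)
>>> torch.onnx.export(module, dummy_input, "embedding_qat.onnx", opset_version=11)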

FP32 ONNX Graph

[Screenshot: FP32 ONNX graph]

QAT ONNX Graph

[Screenshot: QAT ONNX graph]

Quant ONNX Graph

[Screenshot: quantized ONNX graph]

@bfineran bfineran self-assigned this Sep 3, 2021
@markurtz (Member) left a comment


Overall looks good; a few small comments.

@rahul-tuli (Member) left a comment


@bfineran bfineran merged commit 0b55a06 into main Sep 8, 2021
@bfineran bfineran deleted the qat-embeddings branch September 8, 2021 15:45
bfineran added a commit that referenced this pull request Sep 8, 2021
* QAT and quant postprocessing for torch.nn.Embedding

* cleanup

* residual optim and logging fixes

* response to comments