FP [6,8,12] quantizer op #5336
Conversation
Regarding the failing CI tests: @mrwyattii @loadams, any ideas?
@microsoft-github-policy-service agree company="Snowflake"
@jeffra - No immediate guesses, but I'll take a look
Thanks @loadams! I believe this is resolved now; I figured out the issue with ninja enabled. I had to disable pre-compilation of this op for the test, since the pre-compile build includes the V100 compute target, which is not compatible with this op (it only works with bf16). I was able to get all the CI tests to pass (before updating with the latest master just now). I think it just needs a review from folks now? :)
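For reference, a minimal sketch of the kind of capability guard described above, assuming a pytest-based test and PyTorch's device-capability query; the function and test names here are hypothetical, not the actual DeepSpeed test code:

```python
# Hypothetical sketch: skip the FP[6,8,12] quantizer test on pre-Ampere GPUs
# (e.g. V100, compute capability 7.0), since the op only supports bf16 inputs.
import pytest
import torch

def ampere_or_newer() -> bool:
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 8  # bf16 tensor-core support starts with Ampere (sm_80)

@pytest.mark.skipif(not ampere_or_newer(),
                    reason="FP[6,8,12] quantizer requires bf16 (Ampere or newer)")
def test_fp_quantizer_roundtrip():
    ...  # JIT-build and exercise the op here instead of relying on pre-compilation
```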
That makes sense; I hadn't reviewed yet, which is probably why I missed that you had already identified the issue :)
@arashashari and/or @JamesTheZ please add your review as well, thank you
@mrwyattii, probably you meant @arashb :) btw, @arashashari is also welcome to review this ;)
LGTM
Optimized version of `nn.Linear` that adds features such as:

* LoRA w. base weight sharding
* FP [6,8,12] quantization

Depends on #5336 being merged first.

Co-authored-by: @rajhans
Co-authored-by: @aurickq

---------

Co-authored-by: Rajhans Samdani <rajhans.samdani@snowflake.com>
Co-authored-by: Jeff Rasley <jeff.rasley@snowflake.com>
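A rough usage sketch of what this enables; the import path and the `OptimizedLinear`, `LoRAConfig`, and `QuantizationConfig` names and arguments below are assumptions for illustration, not verified against the merged DeepSpeed API:

```python
# Hypothetical sketch; class names, import path, and arguments are assumed,
# not the verified API from the follow-up PR.
import torch
from deepspeed.linear import OptimizedLinear, LoRAConfig, QuantizationConfig

layer = OptimizedLinear(
    input_dim=4096,
    output_dim=4096,
    lora_config=LoRAConfig(lora_r=64, lora_alpha=16),  # LoRA w. base weight sharding
    quantization_config=QuantizationConfig(q_bits=8),  # FP [6,8,12] quantization
    dtype=torch.bfloat16,
).cuda()

x = torch.randn(2, 4096, dtype=torch.bfloat16, device="cuda")
y = layer(x)  # forward pass through the quantized, sharded base weight plus LoRA adapters
```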
Flexible-bit quantizer-dequantizer library with fp6/fp12/fp8 support.

Requires Ampere+ architecture; this is due to the initial focus of this op on `bfloat16` input types only.

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
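A minimal round-trip sketch of how such a quantizer might be used; the `FP_Quantize` name, import path, and signatures are assumptions for illustration, not the verified API introduced by this PR:

```python
# Hypothetical usage sketch; FP_Quantize and its methods are assumed names,
# not verified against the op merged in this PR.
import torch
from deepspeed.ops.fp_quantizer import FP_Quantize  # assumed import path

quantizer = FP_Quantize(group_size=512)             # assumed constructor argument
w = torch.randn(4096, 4096, dtype=torch.bfloat16, device="cuda")

q = quantizer.quantize(w, q_bits=8)                 # fp8; q_bits=6 or 12 for fp6/fp12
w_hat = quantizer.dequantize(q, q_bits=8)           # back to bf16

print((w - w_hat).abs().mean())                     # rough measure of quantization error
```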