
FP [6,8,12] quantizer op #5336

Merged: 11 commits into microsoft:master on Apr 4, 2024

Conversation

@jeffra (Contributor) commented Mar 29, 2024

Flexible-bit quantizer-dequantizer library with fp6/fp12/fp8 support

Requires Ampere or newer architecture, since the initial version of this op only supports bfloat16 input types.

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
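
For context on what the op computes, below is a minimal, purely illustrative Python sketch of the quantize/dequantize round trip for a reduced-mantissa float format. It is not the CUDA kernel added by this PR; the function name and the assumed mantissa widths (2, 3, and 7 bits as fp6/fp8/fp12-like stand-ins) are hypothetical, and exponent-range clamping is ignored.

```python
# Conceptual sketch only: simulates a low-precision float quantize -> dequantize
# round trip by rounding each value's mantissa to a fixed number of bits.
# This is NOT the CUDA kernel from this PR; names and bit widths are assumptions.
import torch

def fake_fp_quant_dequant(x: torch.Tensor, mantissa_bits: int) -> torch.Tensor:
    """Round each value to `mantissa_bits` bits of mantissa (sign/exponent kept)."""
    x32 = x.float()
    mant, exp = torch.frexp(x32)                # x = mant * 2**exp, 0.5 <= |mant| < 1
    scale = 2.0 ** mantissa_bits
    mant_q = torch.round(mant * scale) / scale  # snap the mantissa to the coarse grid
    return torch.ldexp(mant_q, exp).to(x.dtype)

x = torch.randn(4, 8, dtype=torch.bfloat16)
for bits, m in [(6, 2), (8, 3), (12, 7)]:       # assumed mantissa widths, for illustration
    err = (fake_fp_quant_dequant(x, m) - x).abs().max()
    print(f"fp{bits}-like round trip, max abs error: {err.item():.4f}")
```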

@jeffra (Contributor, Author) commented Mar 29, 2024

Regarding the nv-pre-compile-ops test failure: I did some debugging and found that the pre-compile fails when DS_ENABLE_NINJA=1 is set, but works fine without ninja. I remember we used to have lots of issues with ninja before it was re-enabled in #5088. Not sure if this PR introduces something subtle that doesn't work with ninja?

@mrwyattii @loadams any ideas?

@jeffra (Contributor, Author) commented Mar 29, 2024

@microsoft-github-policy-service agree company="Snowflake"

@loadams (Contributor) commented Apr 1, 2024

> Regarding the nv-pre-compile-ops test failure... Not sure if this PR introduces something subtle that doesn't work with ninja? @mrwyattii @loadams any ideas?

@jeffra - No immediate guesses, but I'll take a look

@jeffra (Contributor, Author) commented Apr 1, 2024

Thanks @loadams! I believe this is resolved now; I figured out the issue with ninja enabled. I had to disable the pre-compile of this op for that test, since the test includes the V100 compute target, which is not compatible with this op (it only works with bf16). I was able to get all the CI tests to pass (before just now updating with the latest master).

I think now it just needs a review from folks? :)
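
For reference, the workaround described above amounts to guarding the op on compute capability. A minimal sketch, assuming a hypothetical helper name (this is not DeepSpeed's actual op-builder code):

```python
# Hedged sketch: skip building/pre-compiling the op on pre-Ampere GPUs.
# `should_build_fp_quantizer` is a hypothetical helper, not DeepSpeed's builder API.
import torch

def should_build_fp_quantizer() -> bool:
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    # bf16 tensor-core support starts with Ampere (compute capability 8.0),
    # so V100 (7.0) and earlier are excluded.
    return major >= 8

if should_build_fp_quantizer():
    print("Ampere+ detected: fp quantizer op can be compiled")
else:
    print("Pre-Ampere GPU (e.g. V100): skipping fp quantizer pre-compile")
```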

@loadams (Contributor) commented Apr 1, 2024

> Thanks @loadams! I believe this is resolved now, I figured out the issue with ninja enabled... I think now it just needs a review from folks? :)

That makes sense; I hadn't reviewed yet, which is probably why I missed that you had already identified the issue :)

@mrwyattii (Contributor) left a comment

@arashashari and/or @JamesTheZ please add your review as well, thank you

@sfc-gh-reyazda (Contributor) commented

> @arashashari and/or @JamesTheZ please add your review as well, thank you

@mrwyattii, you probably meant @arashb :) btw, @arashashari is also welcome to review this ;)

@JamesTheZ (Contributor) left a comment

LGTM

@loadams loadams added this pull request to the merge queue Apr 4, 2024
Merged via the queue into microsoft:master with commit 3fbd01c Apr 4, 2024
14 checks passed
loadams pushed a commit that referenced this pull request Apr 23, 2024
Optimized version of `nn.Linear` that adds features such as:
  * LoRA with base weight sharding
  * FP [6,8,12] quantization

Depends on #5336 being merged first

Co-authored-by: @rajhans
Co-authored-by: @aurickq

---------

Co-authored-by: Rajhans Samdani <rajhans.samdani@snowflake.com>
Co-authored-by: Jeff Rasley <jeff.rasley@snowflake.com>
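
As a rough illustration of the pattern that commit message describes (a frozen, reduced-precision base weight plus trainable LoRA adapters), a hypothetical layer might look like the sketch below. The class name, shapes, and defaults are assumptions and do not reflect the actual implementation in the referenced commit.

```python
# Illustrative sketch of LoRA on top of a frozen base-weight linear layer.
# `QuantLoRALinear` is a hypothetical name; in the real feature the base weight
# would be stored in a reduced-precision FP format and possibly sharded.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Frozen base weight (stand-in for the quantized, sharded base weight).
        self.base_weight = nn.Parameter(
            torch.randn(out_features, in_features, dtype=torch.bfloat16),
            requires_grad=False,
        )
        # Trainable low-rank LoRA adapters.
        self.lora_a = nn.Parameter(torch.zeros(rank, in_features, dtype=torch.bfloat16))
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank, dtype=torch.bfloat16))
        nn.init.normal_(self.lora_a, std=0.02)
        self.scaling = alpha / rank

    def forward(self, x):
        # Base projection with the frozen weight, plus the low-rank LoRA update.
        base = F.linear(x, self.base_weight)
        lora = F.linear(F.linear(x, self.lora_a), self.lora_b)
        return base + self.scaling * lora

layer = QuantLoRALinear(16, 32)
y = layer(torch.randn(4, 16, dtype=torch.bfloat16))
print(y.shape)  # torch.Size([4, 32])
```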
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
Flexible-bit quantizer-dequantizer library with fp6/fp12/fp8 support

Requires Ampere or newer architecture, since the initial version of this op
only supports `bfloat16` input types.

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
Optimized version of `nn.Linear` that adds features such as:
  * LoRA with base weight sharding
  * FP [6,8,12] quantization

Depends on microsoft#5336 being merged first

Co-authored-by: @rajhans
Co-authored-by: @aurickq

---------

Co-authored-by: Rajhans Samdani <rajhans.samdani@snowflake.com>
Co-authored-by: Jeff Rasley <jeff.rasley@snowflake.com>