@quic-amitraj quic-amitraj commented Sep 7, 2024

  1. Added support for GPTQ-quantized models.
  2. Moved all the common code to quantizer_utils.py.
  3. Added a test for a GPTQ model.
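
For readers unfamiliar with the format: GPTQ checkpoints typically store low-bit integer weights packed into 32-bit words, together with per-group scales and zero points. The sketch below illustrates that idea in plain Python. The helper names (`pack_int4`, `unpack_int4`, `dequantize`), the lowest-nibble-first ordering, and the flat 1-D layout are assumptions made for illustration; this is not the code added in this PR or the contents of quantizer_utils.py.

```python
from typing import List

BITS = 4                # assumed bit width; 4-bit is common for GPTQ checkpoints
PACK = 32 // BITS       # 8 quantized values fit in one 32-bit word
MASK = (1 << BITS) - 1  # 0b1111, selects a single 4-bit value


def pack_int4(values: List[int]) -> List[int]:
    """Pack 4-bit integers (0..15) into 32-bit words, lowest nibble first."""
    if len(values) % PACK:
        raise ValueError(f"length must be a multiple of {PACK}")
    words = []
    for start in range(0, len(values), PACK):
        word = 0
        for offset, value in enumerate(values[start:start + PACK]):
            if not 0 <= value <= MASK:
                raise ValueError(f"value {value} does not fit in {BITS} bits")
            word |= value << (BITS * offset)
        words.append(word)
    return words


def unpack_int4(words: List[int]) -> List[int]:
    """Inverse of pack_int4: recover the 4-bit integers from each word."""
    return [(w >> (BITS * offset)) & MASK for w in words for offset in range(PACK)]


def dequantize(q: List[int], scales: List[float], zeros: List[int],
               group_size: int) -> List[float]:
    """Compute w[i] = scale[g] * (q[i] - zero[g]) with g = i // group_size."""
    return [scales[i // group_size] * (qi - zeros[i // group_size])
            for i, qi in enumerate(q)]


if __name__ == "__main__":
    q = list(range(16))                        # two groups of 8 values
    words = pack_int4(q)                       # two 32-bit words
    restored = unpack_int4(words)              # round-trips back to q
    weights = dequantize(restored, scales=[0.5, 0.25], zeros=[8, 8], group_size=8)
    print([hex(w) for w in words], weights[0], weights[-1])
```

Real checkpoints use 2-D tensors and may order or offset the packed values differently, so treat this only as a picture of the packing and per-group dequantization arithmetic.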

@quic-amitraj quic-amitraj changed the title Adding support for gptq models Adding support for GPTQ models Sep 7, 2024
@quic-amitraj quic-amitraj marked this pull request as draft September 7, 2024 19:43
@quic-amitraj quic-amitraj marked this pull request as ready for review September 8, 2024 12:45
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

quic-amitraj commented Sep 13, 2024

Output for the model https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ:

  1. Without TS:
    (image)
  2. With TS on 4 devices:
    (image)
  3. With CB, batch size = 3:
    (image)

@ochougul ochougul merged commit 14b4de9 into quic:awq+gptq Sep 13, 2024
ochougul pushed a commit that referenced this pull request Sep 13, 2024
* Adding support for gptq models

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Code cleaning and formatting

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* ruff format and fixed some bugs

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Added tests for gptq

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Bug-fix-1

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bugs-2

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bug-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Added docstring

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bugs-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* ruff check and format

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

---------

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
ochougul added a commit that referenced this pull request Sep 13, 2024
* Awq feature (#100)

* added preprocess layer before loading quantized awq weights

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added onnx export

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added ScaledActivation class

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* refactoring the code to right places and added one single test for now

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* cleaned code

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added proper tests, added decorator for updating quantizers, cleaned code

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* fixed CLI

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added auto file for decorator

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

---------

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* bugfix for tests

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* fixed tests for AWQ model

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* Adding support for GPTQ models (#103)

* Adding support for gptq models

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Code cleaning and formatting

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* ruff format and fixed some bugs

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Added tests for gptq

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Bug-fix-1

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bugs-2

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bug-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Added docstring

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bugs-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* ruff check and format

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

---------

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added license at top for missing file

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added export_and_compile and fixed bugs

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* removed GPTQ test

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* removed threading from pytest

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

---------

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Co-authored-by: Amit Raj <168538872+quic-amitraj@users.noreply.github.com>
quic-amitraj added a commit to quic-amitraj/efficient-transformers that referenced this pull request Sep 16, 2024
* Awq feature (quic#100)

* added preprocess layer before loading quantized awq weights

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added onnx export

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added ScaledActivation class

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* refactoring the code to right places and added one single test for now

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* cleaned code

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added proper tests, added decorator for updating quantizers, cleaned code

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* fixed CLI

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added auto file for decorator

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

---------

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* bugfix for tests

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* fixed tests for AWQ model

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* Adding support for GPTQ models (quic#103)

* Adding support for gptq models

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Code cleaning and formatting

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* ruff format and fixed some bugs

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Added tests for gptq

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Bug-fix-1

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bugs-2

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bug-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Added docstring

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* fixed bugs-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* ruff check and format

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

* Addressed comments-3

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

---------

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added license at top for missing file

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* added export_and_compile and fixed bugs

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* removed GPTQ test

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

* removed threading from pytest

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>

---------

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Co-authored-by: Amit Raj <168538872+quic-amitraj@users.noreply.github.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
@quic-amitraj quic-amitraj deleted the gptq_support branch November 21, 2024 04:42