Fix quantization tests #29914

SunMarc · 2024-03-27T17:53:14Z

What does this PR do ?

This PR fixes all failing tests in the quantization CI. We also split the workflow so that we have one job per quantization method and device.
You can see that everything works fine from

the run page: https://github.com/huggingface/transformers/actions/runs/8483098195/job/23243697001
the report (internal): https://huggingface.slack.com/archives/C06KVUMU5JA/p1711728524491529

cc @ydshieh @younesbelkada

SunMarc · 2024-03-27T17:56:14Z

tests/quantization/autoawq/test_awq.py

+    custom_mapping_model_id = "TheBloke/Mistral-7B-v0.1-AWQ"
+    custom_model_revision = "f186bcfa9edbe2a4334262ec1e67f23e53ed1ae7"


I switched to a Mistral model because the Yi model relies on remote code and we had breaking changes ... (Dynamic Cache). This should be fine, since the goal of the test is to check that we can pass a custom fuse mapping.
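For readers unfamiliar with what "passing a custom fuse mapping" looks like, here is a minimal sketch. It assumes the `modules_to_fuse` format accepted by `AwqConfig` in transformers; the module names and the numeric values (hidden size, head counts) are assumptions taken from the public Mistral-7B-v0.1 architecture, not copied from this PR's diff, and `load_fused_model` is a hypothetical helper name.

```python
# Hypothetical sketch: the mapping values are assumptions based on the public
# Mistral-7B-v0.1 config, not taken from the test file changed in this PR.
MODEL_ID = "TheBloke/Mistral-7B-v0.1-AWQ"
MODEL_REVISION = "f186bcfa9edbe2a4334262ec1e67f23e53ed1ae7"

CUSTOM_FUSE_MAPPING = {
    "attention": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "mlp": ["gate_proj", "up_proj", "down_proj"],
    "layernorm": ["input_layernorm", "post_attention_layernorm", "norm"],
    "use_alibi": False,
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
}


def load_fused_model(device_map="auto"):
    """Load the pinned AWQ checkpoint with a user-supplied fuse mapping."""
    # Imported lazily so this module can be read without transformers or a GPU.
    from transformers import AutoModelForCausalLM, AwqConfig

    quantization_config = AwqConfig(
        bits=4,
        fuse_max_seq_len=512,
        modules_to_fuse=CUSTOM_FUSE_MAPPING,
    )
    # Pinning `revision` keeps the test independent of later pushes to the repo.
    return AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        revision=MODEL_REVISION,
        quantization_config=quantization_config,
        device_map=device_map,
    )
```

Pinning the revision is what makes a checkpoint without remote code attractive here: the test only depends on code that lives in transformers itself.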

SunMarc · 2024-03-27T17:57:32Z

docker/transformers-quantization-latest-gpu/Dockerfile

@@ -43,7 +46,8 @@ RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/opt
 RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2

 # Add autoawq for quantization testing
-RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp38-cp38-linux_x86_64.whl
+# >=v0.2.3 needed for compatibility with torch 2.2.1
+RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.3/autoawq-0.2.3+cu118-cp38-cp38-linux_x86_64.whl


With torch 2.2.1, we need a newer version of autoawq. Otherwise, we have issues importing the kernels =(

HuggingFaceDocBuilderDev · 2024-03-27T18:14:54Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ydshieh · 2024-04-03T08:27:37Z

utils/notification_service_quantization.py

I personally would probably try not to have new notification files, but I understand it's not easy to make the necessary changes in the currently existing notification_service.py.

SunMarc · 2024-04-03

Yeah, at the end of the day, I think it is easier to create a separate notification file, since this is a different CI and I wanted a detailed overview of the results, just like what you did for models. Putting everything in notification_service.py can be done, but it would be a bit complex.

ydshieh · 2024-04-03T08:33:37Z

Thank you @SunMarc for this work, super appreciated!

From the run page: https://github.com/huggingface/transformers/actions/runs/8483098195/job/23243697001
and the report (internal): https://huggingface.slack.com/archives/C06KVUMU5JA/p1711728524491529
(it would be nice if those were provided in the PR description),

I don't have any specific questions, just a small one about the notification changes.

ydshieh · 2024-04-03T08:38:22Z

utils/notification_service_quantization.py

+        text = f"{self.n_failures} failures out of {self.n_tests} tests," if self.n_failures else "All tests passed."
+
+        self.thread_ts = client.chat_postMessage(
+            channel="#transformers-ci-daily-quantization",


Well, I didn't know this works. Nice!

SunMarc · 2024-04-03

Yes, I learned this from the peft/accelerate CI log report here.
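The summary line in the diff above relies on a conditional expression choosing between two complete strings. As a rough sketch, the same logic can be isolated in a small helper; `build_summary_message` is a hypothetical name, and the sketch only builds the arguments rather than calling Slack. With `slack_sdk`, the result could presumably be passed as `client.chat_postMessage(**payload)`.

```python
def build_summary_message(n_failures, n_tests, channel):
    """Return keyword arguments for a Slack chat.postMessage call."""
    # The conditional expression selects one of two whole strings, so the
    # f-string is only evaluated into the result when a test has failed.
    text = (
        f"{n_failures} failures out of {n_tests} tests."
        if n_failures
        else "All tests passed."
    )
    return {"channel": channel, "text": text}


payload = build_summary_message(2, 40, "#transformers-ci-daily-quantization")
print(payload["text"])
```

Keeping the message construction separate from the network call makes the wording easy to check without a Slack token.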

ydshieh

Good on my side. Thanks.

younesbelkada

Thanks so much @SunMarc for fixing the quantization tests and refactoring the workflow!

ydshieh · 2024-04-05T12:11:05Z

Hi @SunMarc

If you are OK with it, I will merge #30012 today. I can help make the necessary changes to this PR to resolve the conflicts (or make it compatible with #30012).

WDYT?

SunMarc · 2024-04-05T12:17:01Z

Feel free to merge your PR @ydshieh! Yes, it would be great if you could resolve the conflicts, if it is not too complicated for you!

ydshieh · 2024-04-05T12:21:42Z

Happy to help. If it is difficult for me, I guess it would be even more difficult for you 😆 (to understand what I changed in my PR)

ydshieh · 2024-04-05T12:28:50Z

@ArthurZucker No need to review until I request a review (after rebasing this PR on #30012).

ydshieh · 2024-04-05T13:50:35Z

Back to this next week.

ArthurZucker

Looks good, thank you 🙂

ArthurZucker · 2024-04-05T14:11:00Z

.github/workflows/self-scheduled.yml

+        run: |
+          echo "quantization_matrix=$(python3 -c 'import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))) ;  print(d)')" >> $GITHUB_OUTPUT


(py39) arthur@hf-dgx-01:~/transformers/tests$ find "$(pwd)/quantization" -maxdepth 1 -type d | sort
/home/arthur/transformers/tests/quantization
/home/arthur/transformers/tests/quantization/aqlm_integration
/home/arthur/transformers/tests/quantization/autoawq
/home/arthur/transformers/tests/quantization/bnb
/home/arthur/transformers/tests/quantization/gptq
/home/arthur/transformers/tests/quantization/quanto_integration

Seems a lot simpler, no?

$(find "$(pwd)/quantization" -mindepth 1 -maxdepth 1 -type d | sort)

SunMarc · 2024-04-05

Yes indeed! I'll switch to your solution =)

ydshieh · 2024-04-05

Make sure the outputs don't include the absolute prefix (here /home/arthur/transformers/, but whatever it is on a given system).
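The prefix concern can be avoided by building each entry from the directory name instead of the full path. The following is a minimal sketch of that idea, not the code that was merged; `quantization_matrix` is a hypothetical helper name, and emitting the list with `json.dumps` is an assumption of this sketch (the one-liner in the diff prints the Python list directly).

```python
import json
import tempfile
from pathlib import Path


def quantization_matrix(tests_dir):
    """Return sorted 'quantization/<name>' entries, one per subdirectory."""
    root = Path(tests_dir) / "quantization"
    # Using p.name keeps the entries relative, whatever the checkout location.
    return sorted(f"quantization/{p.name}" for p in root.iterdir() if p.is_dir())


# Demo on a throwaway tree that mimics part of the tests/ layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("gptq", "bnb", "autoawq"):
        (Path(tmp) / "quantization" / name).mkdir(parents=True)
    # Plain files such as __init__.py must not end up in the matrix.
    (Path(tmp) / "quantization" / "__init__.py").touch()
    matrix = quantization_matrix(tmp)

print(f"quantization_matrix={json.dumps(matrix)}")
```

In a workflow step, the printed line would be appended to `$GITHUB_OUTPUT`, as in the diff above.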

ydshieh

Thanks again @SunMarc !

I will keep the new notification file as you created it for now. Integrating it into the existing one would be good but is not urgent (and it takes some time 😆).

* revert back to torch 2.1.1
* run test
* switch to torch 2.2.1
* udapte dockerfile
* fix awq tests
* fix test
* run quanto tests
* update tests
* split quantization tests
* fix
* fix again
* final fix
* fix report artifact
* build docker again
* Revert "build docker again" This reverts commit 399a5f9.
* debug
* revert
* style
* new notification system
* testing notfication
* rebuild docker
* fix_prev_ci_results
* typo
* remove warning
* fix typo
* fix artifact name
* debug
* issue fixed
* debug again
* fix
* fix time
* test notif with faling test
* typo
* issues again
* final fix ?
* run all quantization tests again
* remove name to clear space
* revert modfiication done on workflow
* fix
* build docker
* build only quant docker
* fix quantization ci
* fix
* fix report
* better quantization_matrix
* add print
* revert to the basic one

SunMarc added 6 commits March 27, 2024 12:17

revert back to torch 2.1.1

943411a

run test

0ce4613

switch to torch 2.2.1

41aed78

udapte dockerfile

785943b

fix awq tests

886c136

fix test

e56b9b2

SunMarc commented Mar 27, 2024

SunMarc added 21 commits March 28, 2024 14:17

run quanto tests

125369b

update tests

9d4f428

split quantization tests

d0fde18

fix

b25c6c4

fix again

f729157

final fix

e603cd3

fix report artifact

9af48e1

build docker again

399a5f9

Revert "build docker again"

55c5c98

This reverts commit 399a5f9.

debug

7fbc243

revert

5483d66

style

787bd8b

new notification system

e864f02

testing notfication

78d0dcb

rebuild docker

a29ee10

fix_prev_ci_results

8cb49c4

typo

c537d46

remove warning

54778f4

fix typo

7095d97

fix artifact name

5a1d2a1

debug

49f522c

ydshieh reviewed Apr 3, 2024

SunMarc requested a review from ArthurZucker April 3, 2024 09:40

ydshieh approved these changes Apr 3, 2024

younesbelkada approved these changes Apr 4, 2024

ydshieh removed the request for review from ArthurZucker April 5, 2024 12:28

ArthurZucker approved these changes Apr 5, 2024

SunMarc added 9 commits April 9, 2024 11:38

Merge remote-tracking branch 'upstream/main' into fix-quantization-tests

d6c9d70

build docker

247f75f

build only quant docker

ff0d5a8

fix quantization ci

176e055

fix

ec7cab1

fix report

a4ad871

better quantization_matrix

2fff50d

add print

813f4c1

revert to the basic one

e26ca6d

SunMarc requested a review from ydshieh April 9, 2024 14:05

ydshieh approved these changes Apr 9, 2024

SunMarc merged commit 58a939c into main Apr 9, 2024
22 checks passed

SunMarc deleted the fix-quantization-tests branch April 9, 2024 15:10

SunMarc mentioned this pull request Apr 9, 2024

[CI] Fix setup #30147

Merged

ydshieh mentioned this pull request Apr 18, 2024

Fix missing prev_ci_results #30313

Merged
