
Quantization produces non-deterministic weights #27

Closed
MarkSchmidty opened this issue Mar 12, 2023 · 3 comments

Comments

MarkSchmidty commented Mar 12, 2023

Below is a segment of the 7B 4-bit weights generated with the same command in the same environment on two different video cards: an A4000 (on the left) and an A6000 (on the right).

Notice how every 20-40 bytes there is a half-byte difference? These differences are always off by one: a B becomes an A, a 5 becomes a 6, and so on. This issue seems to persist across all model sizes when producing weights on different cards.

[Image: side-by-side hex dump comparison of the two weight files]

No idea what is causing it.

Without reproducible builds it is hard to say if we're actually producing the same weights.
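For reference, here is a minimal sketch (not from this thread) of how such a comparison could be reproduced, assuming the two output files are available locally; the file names below are placeholders:

```python
import hashlib

# Placeholder paths for the two quantized outputs (A4000 run vs. A6000 run).
path_a = "llama7b-4bit-a4000.pt"
path_b = "llama7b-4bit-a6000.pt"

with open(path_a, "rb") as f:
    data_a = f.read()
with open(path_b, "rb") as f:
    data_b = f.read()

# Identical files would hash identically.
print("sha256 A:", hashlib.sha256(data_a).hexdigest())
print("sha256 B:", hashlib.sha256(data_b).hexdigest())

# Report the first few byte offsets where the files differ and by how much.
diffs = []
for i, (a, b) in enumerate(zip(data_a, data_b)):
    if a != b:
        diffs.append((i, a, b))
        if len(diffs) >= 10:
            break

for offset, a, b in diffs:
    print(f"offset {offset:#010x}: {a:02x} vs {b:02x} (delta {abs(a - b)})")
```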

@qwopqwop200 (Owner)
Related to this issue: IST-DASLab/gptq#1

MarkSchmidty (Author) commented Mar 12, 2023

@qwopqwop200 is it possible the CUDA_VISIBLE_DEVICES variable is somehow being used somewhere in the quantization code where it shouldn't be? I see no references to it, but the only difference between the two models above is that one was generated with CUDA_VISIBLE_DEVICES=0 and the other with CUDA_VISIBLE_DEVICES=1.
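For what it's worth, a minimal sketch (an assumption, not code from this repo) that would confirm which physical card each run actually used: CUDA_VISIBLE_DEVICES only controls which GPUs are exposed to the process, so the selected card always shows up as cuda:0 inside the quantization script.

```python
import os
import torch

# CUDA_VISIBLE_DEVICES remaps device indices; it should not change the math,
# only which physical GPU is visible as device 0.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible device count:", torch.cuda.device_count())
print("device 0:", torch.cuda.get_device_name(0))
```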

@qwopqwop200 (Owner)

That doesn't seem to be happening. Rather, this appears to be caused by running on different GPUs.
The difference in performance caused by these weight differences is negligible.
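One way to check how small the discrepancy really is would be to compare the two checkpoints tensor by tensor; a minimal sketch, assuming both files are state dicts saved with torch.save and that the paths below are placeholders:

```python
import torch

sd_a = torch.load("llama7b-4bit-a4000.pt", map_location="cpu")  # placeholder path
sd_b = torch.load("llama7b-4bit-a6000.pt", map_location="cpu")  # placeholder path

total, mismatched = 0, 0
for key, a in sd_a.items():
    b = sd_b[key]
    if not torch.is_tensor(a):
        continue
    total += a.numel()
    mismatched += (a != b).sum().item()

print(f"{mismatched} of {total} elements differ "
      f"({100.0 * mismatched / max(total, 1):.6f}%)")
```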
