Complete gibberish produced by any and all models only when device_map="auto". #2692
Comments
Hi @FanaticPythoner, thanks for the detailed report! It is indeed strange that sequential works while auto (which uses "balanced") fails. Could you check what is the output of
For the code:
Doing:
Prints:
Furthermore, in our current codebase, we have several different mechanisms that handle model balancing. Changing
And what is the
For the code:
Doing:
Prints:
Let me send you the same code for comparison, but using mixtral, which is larger, instead of starchat.
Oh, that makes sense why it works: in the case of starchat, the model fits on a single GPU. Yeah, let's check for mixtral.
For the code:
Doing:
Prints:
Now, for the code:
Doing:
Prints:
I also looked at the result of
Yes, in the above code, it uses the starchat template... Still works.
It is probably a communication issue with your GPUs. I see that in "sequential", only two GPUs are used. Maybe one quick way to solve this would be to run this model on only the first 2 GPUs by specifying
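The setting meant in the comment above is not shown, so the following is only a sketch of one common way to limit a process to the first two GPUs, assuming the `CUDA_VISIBLE_DEVICES` environment variable is an acceptable route. `restrict_visible_gpus` is a hypothetical helper name, not part of any library.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA in this process.

    Assumption: this runs before CUDA is initialized (in practice, before
    the first CUDA call made through torch), otherwise it has no effect.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Keep only the first two GPUs visible, then load the model as usual.
restrict_visible_gpus([0, 1])
```

With only two devices visible, `device_map="auto"` can only place modules on those two, which makes the run comparable to the "sequential" case described above.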
The hardware and drivers have been triple-checked by the bare-metal provider. On my 3x 3090 setup, I don't use NVLink. Maybe that's the key. Or maybe it's something else. As an update, I tested both Sequential and Auto on llama 3 70b, in bfloat16. Both are unable to run the inference and throw:
Here are the device maps and the code that was used.
outputs:
outputs:
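When comparing two device maps like these, it can help to reduce each one to a per-device module count. The sketch below assumes the map is a plain dict from module name to device, which is the shape of `model.hf_device_map` on a model loaded with a `device_map`. `summarize_device_map` is a hypothetical helper, and the sample dict is made up for illustration; it is not the map from this report.

```python
from collections import Counter


def summarize_device_map(device_map):
    """Count how many modules were placed on each device.

    Devices may be ints (GPU indices) or strings such as "cpu" or "disk",
    so they are normalized to strings before counting.
    """
    counts = Counter(str(device) for device in device_map.values())
    return dict(sorted(counts.items()))


# Illustrative map only (module names and placement are invented).
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "lm_head": "cpu",
}
print(summarize_device_map(example_map))
```

Running this on the "auto" map and the "sequential" map shows at a glance how many GPUs each strategy actually uses.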
Would anyone care to look at this, please, whether @SunMarc or someone else? I strongly suspect it's an HF compatibility issue with NVLink, but I can't say with 100% certainty.
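One way to narrow down whether GPU-to-GPU transfers themselves corrupt data, independent of any model, is a round-trip copy test. This is only a sketch under the assumption that PyTorch is installed; `check_gpu_round_trips` is a hypothetical helper, and a passing result would not rule out every interconnect problem.

```python
try:
    import torch
except ImportError:  # let the sketch load on machines without PyTorch
    torch = None


def check_gpu_round_trips(num_elements=1_000_000):
    """Copy a random tensor between every ordered GPU pair and compare.

    Returns {(src, dst): bool}. The dict is empty when PyTorch or CUDA is
    unavailable, or when fewer than two GPUs are visible.
    """
    results = {}
    if torch is None or not torch.cuda.is_available():
        return results
    count = torch.cuda.device_count()
    for src in range(count):
        reference = torch.randn(num_elements, device=f"cuda:{src}")
        for dst in range(count):
            if src == dst:
                continue
            copied = reference.to(f"cuda:{dst}")
            # Compare on the CPU so the check does not need another GPU-to-GPU copy.
            results[(src, dst)] = torch.equal(reference.cpu(), copied.cpu())
    return results


if __name__ == "__main__":
    for (src, dst), ok in check_gpu_round_trips().items():
        print(f"cuda:{src} -> cuda:{dst}: {'ok' if ok else 'MISMATCH'}")
```

A `MISMATCH` on any pair would point at the transfer path (drivers, peer-to-peer access, or the interconnect) rather than at the model-loading code.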
We really would appreciate any help on this roadblock... much thx!
System Info
Information
Tasks
`no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
Reproduction
System specs:
This:
Prints:
[{'generated_text': '<|system|>\n<|end|>\n<|user|>\nHow do I sort a list in Python?<|end|>\n<|assistant|> to. Air1 (\nь\nInfo plit An che a\n the weьь Share Share\n aremobatar\n…We brain be S jj jj'..., … …: no J\n,…AL more of… y they code lifefl\n -- B moreand.. L\nplitahph a after\n Ishare, E I I is L\n unel not Mid' I'’ …\n\n …" you a a South strength I I S said "no\n\n\n E E11\n EASC not Sh English. of of E |isse\n as that said said of said reg of The The– n a… Open. The The for | A after After\n was M open open over in been\n\n into,onAR down :-)mad cos I you to E,( not "a001 that vis m44\n\n\n of3\n re1 T by so itack in inententancy of is int Library to U U.. a a = ==Compression Itdata66 as111110 S'}]
While this:
Prints:
[{'generated_text': '<|system|>\n<|end|>\n<|user|>\nHow do I sort a list in Python?<|end|>\n<|assistant|>\nThere are multiple ways to sort a list in Python. One of the most common ways is to use the sort() method. Here is an example:\n\n```\nmy_list = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]\nmy_list.sort()\nprint(my_list)\n```\n\nThis will sort the list in place and print the sorted list.\n\nAnother way to sort a list is to use the sorted() function. This function returns a new sorted list and does not modify the original list. Here is an example:\n\n```\nmy_list = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]\nsorted_list = sorted(my_list)\nprint(sorted_list)\n```\n\nIn this example, the sorted_list variable will contain the sorted list and the my_list variable will remain unchanged.\n\nThere are also other sorting algorithms available in the built-in sort module, such as quicksort, heapsort, and merge sort. You'}]
Expected behavior
Both should print coherent text. This happens no matter the model chosen. In the above reproduction steps, the model used is HuggingFaceH4/starchat-beta. The exact same thing happens with mistralai/Mixtral-8x7B-Instruct-v0.1, whether it is run in bfloat16, float16, or float32, and whether it is quantized or not. The issue also occurs no matter the prompt.
The issue, however, does NOT occur when device_map="sequential" is set (tested with HuggingFaceH4/starchat-beta only).
Furthermore, the issue does NOT occur with device_map="auto" on my home 3x RTX 3090 / Threadripper 3960x setup.
However, I cannot use sequential in our current production environment without making significant changes.