Labels
bug-unconfirmed, high severity (used to report high severity bugs in llama.cpp: malfunctioning hinders an important workflow)
Description
What happened?
Since commit b3188, llama-cli produces incoherent output on a multi-GPU system with CUDA when row tensor splitting is used.
Layer tensor splitting works fine, but is almost twice as slow.
The GPUs are 3x NVIDIA Tesla + a 3090.
All later commits appear to be affected as well. A reproduction sketch is shown below.
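For reference, a minimal reproduction sketch, assuming a standard llama-cli build with CUDA; the model path, prompt, and token count are placeholders and not taken from the report. `--split-mode` (with `row` or `layer`) and `-ngl` are the stock llama-cli flags for tensor splitting and GPU offload.

```sh
# Row splitting: individual tensors are split across the GPUs
# (this is the mode that produces incoherent output since b3188)
./llama-cli -m model.gguf -ngl 99 --split-mode row -p "Hello" -n 64

# Layer splitting: whole layers are assigned to each GPU
# (output is coherent, but roughly twice as slow on this setup)
./llama-cli -m model.gguf -ngl 99 --split-mode layer -p "Hello" -n 64
```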
Name and Version
llama-cli version b3188, built on Debian 12.
What operating system are you seeing the problem on?
Linux
Relevant log output
No response