Bug: b3188 breaks row split mode for multiple GPUs #8801

@m-arbaro

Description

What happened?

Since commit b3188, llama-cli produces incoherent output on a multi-GPU system with CUDA and row tensor splitting.
Layer tensor splitting works fine, but it is almost twice as slow.
The GPUs are 3x NVIDIA Tesla + a 3090.
All later commits appear to be affected.
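
For reference, a minimal sketch of the problematic configuration at the API level, not taken from the report: it assumes a CUDA build of llama.cpp, a placeholder model path, and the C API as of around b3188 (`llama_load_model_from_file`, `LLAMA_SPLIT_MODE_ROW`). It is roughly the programmatic equivalent of running llama-cli with `--split-mode row`.

```cpp
// Sketch only (assumptions: CUDA build, placeholder model path,
// llama.cpp C API as of ~b3188). Loads a model with row tensor splitting,
// the configuration that produces incoherent output since b3188.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;                   // offload all layers to the GPUs
    mparams.split_mode   = LLAMA_SPLIT_MODE_ROW; // split tensors row-wise across the CUDA devices

    // "model.gguf" is a placeholder, not the model from the report
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        llama_backend_free();
        return 1;
    }

    // ... create a context and generate; with row splitting the generated text
    // is incoherent on this multi-GPU setup since b3188, while layer splitting
    // still produces correct output

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```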

Name and Version

llama-cli version b3188 built on Debian 12.

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

Labels

bug-unconfirmed, high severity (used to report high-severity bugs in llama.cpp: a malfunction hinders an important workflow)
