Bug: b3188 breaks row split mode for multiple GPUs #8801

@m-arbaro

Description

What happened?

Since commit b3188, llama-cli produces incoherent output on a multi-GPU system with CUDA and row tensor splitting.
Layer tensor splitting works fine, but it is almost twice as slow.
The GPUs are 3x NVIDIA Tesla + a 3090.
All later commits appear to be affected.
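
For reference, a minimal sketch of the problematic configuration at the API level, not taken from the report: it assumes a CUDA build of llama.cpp, a placeholder model path, and the C API as of around b3188 (`llama_load_model_from_file`, `LLAMA_SPLIT_MODE_ROW`). It is roughly the programmatic equivalent of running llama-cli with `--split-mode row`.

```cpp
// Sketch only (assumptions: CUDA build, placeholder model path,
// llama.cpp C API as of ~b3188). Loads a model with row tensor splitting,
// the configuration that produces incoherent output since b3188.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;                   // offload all layers to the GPUs
    mparams.split_mode   = LLAMA_SPLIT_MODE_ROW; // split tensors row-wise across the CUDA devices

    // "model.gguf" is a placeholder, not the model from the report
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        llama_backend_free();
        return 1;
    }

    // ... create a context and generate; with row splitting the generated text
    // is incoherent on this multi-GPU setup since b3188, while layer splitting
    // still produces correct output

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```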

Name and Version

llama-cli version b3188 built on Debian 12.

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

Labels

bug-unconfirmed, high severity (used to report high-severity bugs in llama.cpp: a malfunction hinders an important workflow)
