SPLIT_MODE_TENSOR and backend sampling incompatible? #24048

dagbdagb · 2026-06-02T20:59:11Z


Now that I can run MTP and mmproj together with split mode tensor, I see a nice performance boost on my 3090s (qwen3.6-27B-q8 now starts out at 60 t/s).

However, I also see these warnings in the log:

0.23.147.134 W set_sampler: backend sampling not supported with SPLIT_MODE_TENSOR; using CPU
0.23.147.134 W common_speculative_impl_draft_mtp: backend offload failed for seq_id=0; using CPU sampler

Is split mode tensor fundamentally incompatible with backend sampling, or is it "just" an issue of the code not having been written yet?

Wroxeter · 2026-06-04T20:40:24Z


I am also seeing these warnings in my logs, and my performance seems suboptimal. However, I am running dual RTX 4000 Ada cards, not 3090s.
