TurboPrefill: Multi-GPU prefill acceleration for llama.cpp #24092

sergey-automation · 2026-06-03T21:28:17Z

TurboPrefill is an attempt to make layer-split multi-GPU configurations spend less time waiting and more time computing during prefill.

[TurboPrefill Repository](https://github.com/sergey-automation/TurboPrefill)
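
The post does not describe the scheduling mechanism itself, so the following is only a generic toy model of why a layer-split setup can leave GPUs idle during prefill, not TurboPrefill's actual method. It assumes the prompt is cut into equal chunks, that every GPU stage takes the same time per chunk, and that transfer cost is zero. All function names and parameters here are hypothetical.

```python
def sequential_prefill_time(num_gpus: int, num_chunks: int, stage_time: float) -> float:
    """Toy model: a chunk passes through every GPU stage before the next chunk starts.

    At any moment only one GPU is busy; the others wait.
    """
    return num_gpus * num_chunks * stage_time


def pipelined_prefill_time(num_gpus: int, num_chunks: int, stage_time: float) -> float:
    """Toy model: chunks overlap across stages.

    The pipeline takes (num_gpus - 1) steps to fill, then one chunk
    finishes per step.
    """
    return (num_gpus + num_chunks - 1) * stage_time


def gpu_utilization(num_chunks: int, stage_time: float, total_time: float) -> float:
    """Fraction of the total time that a single GPU spends computing."""
    return (num_chunks * stage_time) / total_time


if __name__ == "__main__":
    gpus, chunks, t = 2, 8, 1.0  # assumed values, for illustration only
    seq = sequential_prefill_time(gpus, chunks, t)
    pipe = pipelined_prefill_time(gpus, chunks, t)
    print(f"sequential: {seq:.1f}  utilization: {gpu_utilization(chunks, t, seq):.2f}")
    print(f"pipelined:  {pipe:.1f}  utilization: {gpu_utilization(chunks, t, pipe):.2f}")
```

In this idealized model the sequential schedule keeps each of the two GPUs busy only half of the time, while the overlapped schedule approaches full utilization as the number of chunks grows. Real workloads deviate from this because later chunks attend over more context and inter-GPU transfers are not free.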

sergey-automation · 2026-06-04T05:20:28Z

Author

https://github.com/sergey-automation/TurboPrefill

0 replies

sergey-automation · 2026-06-20T22:14:21Z

Author

Update: I also validated the same scheduling approach on Vision Language Models (Qwen2.5-VL). The experiments suggest that the time spent waiting for a VLM response can also be reduced by about 2× without changing the model weights, prompts, quantization, or the inference math.

1 reply

sergey-automation Jun 20, 2026
Author

[TurboPrefill-VLM-Validation Repository](https://github.com/sergey-automation/TurboPrefill-VLM-Validation)
