
Maybe it would be better to have a diagram showing how llama.cpp processes inference #11967


Description

@yinuu

I'm using llama.cpp to deploy the deepseek-r1-671B-Q4_0 weights, but I found the documentation/README.md barely detailed; I even had to read the source to understand what happens when I enable a flag. For example, according to the code, `--gpu-layers` is a key condition for pipeline parallelism (PP), but the documentation says nothing about this detail, and I saw no better performance when I set it greater than the number of model tensor layers:
```cpp
// TODO: move these checks to ggml_backend_sched
// enabling pipeline parallelism in the scheduler increases memory usage, so it is only done when necessary
bool pipeline_parallel =
    model->n_devices() > 1 &&
    model->params.n_gpu_layers > (int)model->hparams.n_layer &&
    model->params.split_mode == LLAMA_SPLIT_MODE_LAYER &&
    params.offload_kqv;
```
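To make that condition concrete, here is a minimal, self-contained C++ sketch of the same predicate with hypothetical values standing in for the model fields; the variable names mirror the fields referenced in the snippet above, and the numbers are assumptions for illustration, not values from any real model or run:

```cpp
#include <cstdio>

// Hypothetical stand-in for the library's split-mode enum.
enum split_mode_t { SPLIT_MODE_NONE, SPLIT_MODE_LAYER, SPLIT_MODE_ROW };

int main() {
    // Assumed example values; in a real run these come from the loaded
    // model and the command line (e.g. --gpu-layers, --split-mode).
    int          n_devices    = 2;                 // number of GPUs
    int          n_gpu_layers = 81;                // --gpu-layers
    int          n_layer      = 80;                // model layer count (hparams)
    split_mode_t split_mode   = SPLIT_MODE_LAYER;  // layer split mode
    bool         offload_kqv  = true;              // KV cache offloaded

    // Same shape as the condition quoted above: pipeline parallelism is
    // enabled only when every clause holds, which is why --gpu-layers must
    // strictly exceed the model's layer count to turn it on; raising it
    // further past n_layer + 1 would not change anything here.
    bool pipeline_parallel =
        n_devices > 1 &&
        n_gpu_layers > n_layer &&
        split_mode == SPLIT_MODE_LAYER &&
        offload_kqv;

    printf("pipeline_parallel = %s\n", pipeline_parallel ? "true" : "false");
    return 0;
}
```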
It would be highly appreciated if we could have a processing diagram, ideally with the related flags attached to each node.

Thanks all the way!
