Description
Name and Version
LLAMA API (Sunday, Oct 5, 13:43pm UTC)
DLLs built by github workers :p
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
libllama (core library)
Problem description & steps to reproduce
Actual bugs:
- `n_seq_max` determines the actual number of sequences, instead of just being a max. In addition, it splits the context between the n seqs, effectively giving each seq `context_size / n_seq_max`.
- Specifying `n_seq_max = 2` but passing 3 seqs throws an appropriate exception during decode, but leaving it at the default of `1` just returns bad decode results while still allowing you to continue anyway.
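A minimal repro sketch of the second point (assuming the standard `llama.h` C API; the model path and token id are placeholders):

```cpp
#include "llama.h"

int main() {
    llama_model * model = llama_model_load_from_file("model.gguf", llama_model_default_params());

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx     = 4096;
    cparams.n_seq_max = 1; // the default -- supposedly just an upper bound

    llama_context * ctx = llama_init_from_model(model, cparams);

    // Spread 3 tokens over 3 different sequence ids, exceeding n_seq_max.
    llama_batch batch = llama_batch_init(/*n_tokens*/ 3, /*embd*/ 0, /*n_seq_max*/ 1);
    batch.n_tokens = 3;
    for (int i = 0; i < 3; ++i) {
        batch.token[i]     = 1;  // placeholder token id
        batch.pos[i]       = 0;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = i;  // seq ids 0, 1, 2
        batch.logits[i]    = 1;
    }

    // With n_seq_max = 2 this fails loudly during decode; with the default
    // n_seq_max = 1 it returns as if successful, but the results are bad.
    const int ret = llama_decode(ctx, batch);
    (void) ret;

    llama_batch_free(batch);
    llama_free(ctx);
    llama_model_free(model);
    return 0;
}
```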
Actual problem: the existence of `n_seq_max` in `llama_context_params`
This does a lot of magic, like splitting the total context size into `n_seq_max` partitions, even if some seqs are unused.
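For concreteness, a tiny sketch of what that split means in practice (the numbers are made up; the behavior is as described in this report):

```cpp
// Hypothetical numbers; behavior as described above.
llama_context_params cparams = llama_context_default_params();
cparams.n_ctx     = 8192;
cparams.n_seq_max = 4;
// The cache is partitioned up front: each of the 4 seqs effectively gets
// 8192 / 4 = 2048 cells of context, even if only one seq is ever used.
```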
A few months ago, the context would automatically & dynamically manage its sequence count during updates.
Now, the context is initialized with the specified `n_seq_max` number of seqs and stays locked to that forever.
The `n_seq_max` handling was fixed, but I find that a flexibility downgrade from what used to exist.
In addition, this completely disallows scheduling on the same `llama_context`. I'll expand on this below.
Currently, to dynamically support a different number of batches, or any scheduling, one would have to:
- Create a context with, say, 10 batches (the current load) and start decoding.
- Say a new request arrives while inferencing. Now we want 11 batches.
- Create a new context and copy over the old one -- the cache is lost & the memory handle can't be reused (sketched below).
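A rough sketch of that workaround, assuming the `llama_state_seq_*` functions in `llama.h` are used to carry each sequence's cache into the new context (error handling and context creation omitted; the function name and counts are placeholders):

```cpp
#include "llama.h"
#include <vector>

// Move every sequence from ctx_old (created with n_seq_max = n_seq_old) into a
// freshly created ctx_new with a larger n_seq_max. The old context and its
// KV cache memory are thrown away afterwards -- nothing is reused in place.
static void migrate_sequences(llama_context * ctx_old, llama_context * ctx_new, int n_seq_old) {
    for (llama_seq_id s = 0; s < n_seq_old; ++s) {
        const size_t sz = llama_state_seq_get_size(ctx_old, s);
        std::vector<uint8_t> buf(sz);
        llama_state_seq_get_data(ctx_old, buf.data(), buf.size(), s);
        llama_state_seq_set_data(ctx_new, buf.data(), buf.size(), s);
    }
    llama_free(ctx_old); // the old cache's memory cannot be handed over directly
}
```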
Proposal:
Get rid of `n_seq_max` completely, or, in the worst case, make it just a safety net for context manipulation, instead of having it decide the actual number of sequences that will always persist.
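Purely for illustration, a hypothetical shape of what that could look like (this is not the current API; the "0 = no fixed partitioning" semantics are made up to show the intent):

```cpp
// Hypothetical -- not the current llama.cpp API.
llama_context_params cparams = llama_context_default_params();
cparams.n_ctx     = 8192;
cparams.n_seq_max = 0; // made-up meaning: no up-front partitioning, no hard cap
// A decode with seq ids 0..10 would then just work, with the context growing
// and shrinking its per-sequence bookkeeping dynamically, as it used to.
```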