Misc. bug: llama_context_params.n_seq_max decides the amount of sequences instead of being a max #16432

@Lyrcaxis

Description
Name and Version

LLAMA API (Sunday, Oct 5, 13:43pm UTC)
DLLs built by github workers :p

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

libllama (core library)

Problem description & steps to reproduce

Actual bugs:

  • n_seq_max determines the actual number of sequences instead of just acting as a maximum.
    In addition, it splits the context between the n sequences, so each sequence effectively gets context_size/n_seq_max tokens.
  • Specifying n_seq_max = 2 but passing 3 seqs throws an appropriate exception during decode,
    ...but leaving it at the default of 1 just returns bad decode results while still allowing decoding to continue.

Actual problem: the existence of n_seq_max in llama_context_params

This does a lot of magic, like splitting the total context size into n_seq_max partitions, even if some seqs are unused.

A few months ago, the context would automatically and dynamically adjust its sequence count during updates.
Now, the context is initialized with the specified n_seq_max sequences and stays locked to that count forever.

Making n_seq_max a fixed value may have resolved some issues, but I find it a flexibility downgrade from what used to exist.

In addition, this completely disallows scheduling on the same llama_context. I'll expand on this below.

Currently, to dynamically support a different number of batches, or any kind of scheduling, one would have to:

  1. Create a context with, say, 10 batches (the current load) and start decoding.
  2. A new request arrives mid-inference; now we want 11 batches.
  3. Create a new context and copy over the old one -- the cache is lost and the memory handle can't be reused.

Proposal:

Get rid of n_seq_max completely, or, in the worst case, make it just a safety net for context manipulation instead of having it decide the actual number of sequences that will always persist.
