
server : fix crash when system prompt is bigger than batch size #5714

Merged
merged 1 commit into ggerganov:master on Feb 25, 2024

Conversation

compilade (Collaborator) commented:

The system prompt is now decoded in batches.

Maybe llama_decode should eventually have some kind of built-in auto-batching, since forgetting to split a batch seems to be a common mistake in the examples.

I also fixed a problem where n_past would skip a pos value when the prefix of a prompt fully matches the tokens in the cache.
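Not the actual diff from this PR, just a minimal sketch of the idea, with a hypothetical helper name (decode_in_batches): feed a long token sequence to llama_decode in chunks of at most n_batch tokens instead of a single oversized batch.

```cpp
// Minimal sketch (not the PR diff): decode a long token list in chunks of at
// most n_batch tokens so llama_decode never receives an oversized batch.
// decode_in_batches is a hypothetical helper name; it assumes the tokens are
// decoded into sequence 0 starting from an empty KV cache (pos_0 == i).
#include "llama.h"

#include <algorithm>
#include <cstdio>
#include <vector>

static bool decode_in_batches(llama_context * ctx,
                              std::vector<llama_token> & tokens,
                              int32_t n_batch) {
    for (int32_t i = 0; i < (int32_t) tokens.size(); i += n_batch) {
        const int32_t n_eval = std::min(n_batch, (int32_t) tokens.size() - i);

        // chunk of n_eval tokens, starting at position i, in sequence 0
        llama_batch batch = llama_batch_get_one(tokens.data() + i, n_eval, i, 0);

        if (llama_decode(ctx, batch) != 0) {
            fprintf(stderr, "llama_decode failed on chunk at token %d\n", i);
            return false;
        }
    }
    return true;
}
```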

Commit message:

server : fix crash when system prompt is bigger than batch size

The system prompt is now decoded in batches.

* server : fix off-by-one n_past when start of prompt matches whole cache
  The tokens right after the matching part would otherwise skip a pos value.
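To make the off-by-one concrete, here is an illustration (not the server's actual code; common_prefix_len is a made-up helper): n_past should equal the number of prompt tokens already in the KV cache, so that the first token still to be decoded is evaluated at pos == n_past, even when the whole cached prefix matches.

```cpp
// Illustration only (not the server code): n_past should equal the number of
// prompt tokens already present in the KV cache, so that the next token is
// decoded at pos == n_past and no position is skipped.
#include "llama.h"

#include <cstddef>
#include <vector>

static size_t common_prefix_len(const std::vector<llama_token> & cache_tokens,
                                const std::vector<llama_token> & prompt_tokens) {
    size_t n = 0;
    while (n < cache_tokens.size() && n < prompt_tokens.size() &&
           cache_tokens[n] == prompt_tokens[n]) {
        n++;
    }
    return n;
}

// usage sketch:
//   size_t n_past = common_prefix_len(cache_tokens, prompt_tokens);
//   // the first token still to decode is prompt_tokens[n_past], and it must be
//   // evaluated at position n_past (not n_past + 1), even on a full prefix match
```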
ggerganov (Owner) left a comment:

> Maybe llama_decode should eventually have some kind of built-in auto-batching, since forgetting to split a batch seems to be a common mistake in the examples.

It should be added to common, and eventually the new llamax library will abstract this (see below).

slaren (Collaborator) commented on Feb 25, 2024:

I plan to extend llama_decode to automatically split batches that exceed n_batch, for pipeline parallelism. It is already implemented in the demo PR.
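For reference, a sketch of the splitting that callers currently do by hand and that built-in auto-batching would internalize: decode an already-filled llama_batch in views of at most n_batch tokens each. The field layout of the view follows the llama_batch struct of this period, and decode_split is a hypothetical name.

```cpp
// Sketch of caller-side splitting (the loop that built-in auto-batching would
// internalize): decode an already-filled llama_batch in views of at most
// n_batch tokens each, without copying the underlying arrays.
#include "llama.h"

#include <algorithm>
#include <cstdio>

static bool decode_split(llama_context * ctx, const llama_batch & batch, int32_t n_batch) {
    for (int32_t i = 0; i < batch.n_tokens; i += n_batch) {
        const int32_t n_tokens = std::min(n_batch, batch.n_tokens - i);

        // view into the original batch: same arrays, offset by i tokens
        llama_batch batch_view = {
            n_tokens,
            batch.token    + i,
            nullptr,            // embd (token batch, so unused)
            batch.pos      + i,
            batch.n_seq_id + i,
            batch.seq_id   + i,
            batch.logits   + i,
            0, 0, 0,            // all_pos_0, all_pos_1, all_seq_id (unused here)
        };

        if (llama_decode(ctx, batch_view) != 0) {
            fprintf(stderr, "llama_decode failed on view at token %d\n", i);
            return false;
        }
    }
    return true;
}
```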

phymbert (Collaborator) commented:

@slaren Hi, thanks for the fix. Would it be possible to add a simple test?

slaren (Collaborator) commented on Feb 25, 2024:

After it is implemented, sure, but it is not merged yet. It still needs more work, but it should be done soon.

ggerganov merged commit f762501 into ggerganov:master on Feb 25, 2024
59 of 108 checks passed
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request on Mar 13, 2024: server : fix crash when system prompt is bigger than batch size (ggerganov#5714)
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024: server : fix crash when system prompt is bigger than batch size (ggerganov#5714)