Continuous generation in Outlines #667

Open

rlouf opened this issue Feb 15, 2024 · 8 comments

@rlouf
Member

rlouf commented Feb 15, 2024

I am opening this issue to roughly sketch the next big milestone for Outlines, tentatively called "continuous generation". There are many rough edges still, and open questions.

The first goal is to allow sampling of sequences like these:

from outlines import generate, models

model = models.transformers("mistralai/Mistral-7B-v0.1")
generator = generate.text(model)

sequence = "What are the most popular types of vehicles?\n"
for i in range(6):
    sequence += f"{i}, "
    sequence += generator(sequence, stop_at=["\n"])
    sequence += "\n"

By "sampling these sequences" I mean being able to run, for instance, beam search and optimize the sequence as a whole rather than each generation separately.

All we have to do is to return a Sequence object instead of a string, with the following attributes and methods:

from typing import Tuple

import torch


class Sequence:
    token_ids: torch.Tensor
    weights: torch.Tensor
    kv_cache: Tuple
    tokenizer: Tokenizer

    def __str__(self):
        return self.tokenizer.decode(self.token_ids)

Sequence should have the same feel as a string. Besides being able to print it, we should be able to slice it, add it to another string or another Sequence, etc., and carry on:

class Sequence:
    ...
    def __getitem__(self, key):
        if isinstance(key, int):
            # Just return the character? There's not much more we can do here.
            ...
        if isinstance(key, slice):
            # Different behavior depending on whether `start` is 0. If `start = 0` we can
            # keep part of the KV Cache. Otherwise we need to re-compute the KV
            # Cache, i.e. consider the `Sequence` as a new prompt.
            #
            # We will likely need to split tokens. For instance if we call `sequence[:10]` and
            # 10 is the letter `m` in `formida`. In this case we can encode and append `afor`
            # to the previous token ids. Edge cases should automatically be handled when
            # aligning prompt and generation.
            ...

    def __add__(self, other):
        if isinstance(other, str):
            # Signal that KV cache + logprob need to be re-computed
            ...
        if isinstance(other, Sequence):
            # Concatenate token_ids
            # Concatenate logprobs
            # Signal that KV Cache after `other` needs to be recomputed
            ...
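
As a purely illustrative sketch of the concatenation path (assuming a HF-style tokenizer whose encode returns a list of token ids; kv_cache_valid_until is a made-up bookkeeping field, not an existing API):

import torch


class Sequence:
    def __init__(self, token_ids, weights, kv_cache, tokenizer):
        self.token_ids = token_ids      # 1-D tensor of token ids
        self.weights = weights          # 1-D tensor, e.g. per-token log-probabilities
        self.kv_cache = kv_cache
        self.tokenizer = tokenizer
        # Index up to which the KV cache is still valid; anything past it must
        # be re-computed on the next forward pass.
        self.kv_cache_valid_until = len(token_ids)

    def __str__(self):
        return self.tokenizer.decode(self.token_ids)

    def __add__(self, other):
        if isinstance(other, str):
            # Assumed HF-style tokenizer: `encode` returns a list of token ids.
            appended_ids = torch.tensor(self.tokenizer.encode(other))
            # Appended text carries no generation weight yet.
            appended_weights = torch.zeros(len(appended_ids))
        elif isinstance(other, Sequence):
            appended_ids = other.token_ids
            appended_weights = other.weights
        else:
            return NotImplemented
        new = Sequence(
            torch.cat([self.token_ids, appended_ids]),
            torch.cat([self.weights, appended_weights]),
            self.kv_cache,
            self.tokenizer,
        )
        # Only the prefix coming from `self` keeps a usable cache; the appended
        # part has to go through the model before generation can continue.
        new.kv_cache_valid_until = self.kv_cache_valid_until
        return new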

This should be enough to bring Outlines to feature parity with other DSLs, while not being a DSL.

@cpfiffer

It may also be interesting to get the joint token likelihood, if available. I'm not super familiar with Outlines, but I'd love to be able to compare Sequences probabilistically.

@rlouf
Member Author

rlouf commented Feb 24, 2024

We could store that in addition to the sequence weights (which can be, but are not necessarily, the log-probability of the sequence).
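
For illustration only (hypothetical numbers, not Outlines' API), the search weight and the joint log-probability need not coincide:

import torch

# Hypothetical per-token log-probabilities accumulated during generation.
token_logprobs = torch.tensor([-0.3, -1.2, -0.7])

joint_logprob = token_logprobs.sum()            # log p(sequence), what @cpfiffer asks for
weight = joint_logprob / len(token_logprobs)    # e.g. a length-normalized beam score

# The two coincide only when the sampler's weight *is* the joint log-probability.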

@jeffreyftang

Hi @rlouf, I was directed to this issue by @lapp0 as a prerequisite for #657. I'm interested in contributing, but would like to get a sense of the scope of work involved so that I don't make promises I can't keep.

@miftahmoha
Contributor

I'm also interested and am currently working on it.

@rlouf
Member Author

rlouf commented Mar 1, 2024

Great! It is fairly involved: there are many important design decisions to be made, and we need to handle computation of the KV cache after concatenating text to a previous generation.
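
For concreteness, here is a rough sketch with Hugging Face transformers of what extending the cache means mechanically; this is not Outlines' internal API, just the underlying model call:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prefix_ids = tokenizer("What are the most popular types of vehicles?\n0, ", return_tensors="pt").input_ids
out = model(prefix_ids, use_cache=True)
past = out.past_key_values           # KV cache covering the prefix only

# Text appended after a generation: only the *new* tokens need a forward pass,
# the cached keys/values for the prefix are reused as-is.
appended_ids = tokenizer("\n1, ", return_tensors="pt").input_ids
out = model(appended_ids, past_key_values=past, use_cache=True)
past = out.past_key_values           # cache now also covers the appended text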

Don't hesitate to open a draft PR ASAP so I can give feedback early on.

@rlouf
Member Author

rlouf commented Mar 1, 2024

> would like to get a sense of the scope of work involved so that I don't make promises I can't keep.

It is fairly involved; interleaving function calls should be easier to implement, though.

@lucasavila00

LmScript, a graphical interface for Outlines programs, makes heavy use of continuous generation.

We currently re-send the accumulated prompt for every generation call and handle the chat template on our end.

Better performance for continuous generation would be highly appreciated.

@roberthoenig

Super excited for this feature!

One note: it'd be great if continuous generation were implemented so that intermediate outputs can be processed and reused during generation:

sequence = "What are the most popular names of vehicles and the length of their names?\n"
for i in range(6):
      sequence += f"{i}, "
      vehicle_name_gen = generator(sequence, stop_at=["\n"])
      name_len = process(len, vehicle_name_gen)   # `process` would be part of the outlines API and execute the given function during generation
      sequence += vehicle_name_gen + ",  " + name_len + " characters long."  
      sequence += "\n"
