Continuous generation in Outlines #667
Comments
It may also be interesting to get the joint token likelihood, if available. I'm not super familiar with Outlines but I'd love to be able to compare
We could store that in addition to the sequence weights (which can be, but are not necessarily, the log-probability of the sequence).
I'm also interested, and I'm working on it right now.
Great! It is fairly involved: there are many important design decisions to make, and we need to handle computing the KV cache after concatenating text to a previous generation. Don't hesitate to open a draft PR as soon as possible so I can give some feedback early on.
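One way to think about the KV cache point above is as a prefix-matching problem. The sketch below is only an illustration under stated assumptions: `reusable_prefix_length` and `split_for_cache` are hypothetical helpers, not Outlines functions, and the token ids are made up. It assumes the cache is keyed by token ids, so only the tokens after the longest common prefix need to be recomputed.

```python
from typing import List, Tuple


def reusable_prefix_length(cached_ids: List[int], new_ids: List[int]) -> int:
    """Count how many leading token ids the cached and new prompts share."""
    n = 0
    for cached, new in zip(cached_ids, new_ids):
        if cached != new:
            break
        n += 1
    return n


def split_for_cache(cached_ids: List[int], new_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Split the new prompt into (tokens covered by the cache, tokens to recompute)."""
    n = reusable_prefix_length(cached_ids, new_ids)
    return new_ids[:n], new_ids[n:]


# Hypothetical token ids: the previous generation, then the same text
# re-tokenized after extra text was concatenated to it.
reused, to_compute = split_for_cache([1, 2, 3, 4], [1, 2, 3, 9, 10])
print(reused, to_compute)
```

Re-tokenizing after concatenation can change the tokens near the boundary, which is why the comparison is done on token ids rather than on the text.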
It is fairly involved; interleaving function calls should be easier to implement, though.
LmScript, a graphical interface for Outlines programs, makes heavy use of continuous generation. We currently re-send the accumulated prompt for every generation call and handle the chat template on our end. Better performance for continuous generation would be highly appreciated.
Super excited for this feature! One note: It'd be great if continuous generation is implemented so that intermediate outputs can be processed and reused during generation:

```python
sequence = "What are the most popular names of vehicles and the length of their names?\n"
for i in range(6):
    sequence += f"{i}, "
    vehicle_name_gen = generator(sequence, stop_at=["\n"])
    # `process` would be part of the outlines API and execute the given function during generation
    name_len = process(len, vehicle_name_gen)
    sequence += vehicle_name_gen + ", " + name_len + " characters long."
    sequence += "\n"
```
I am opening this issue to roughly sketch the next big milestone for Outlines, tentatively called "continuous generation". There are many rough edges still, and open questions.
The first goal is to allow sampling of sequences like these:
By "sampling these sequences" I mean being able to run, for instance, beam search and optimize the sequence as a whole rather than each generation separately.
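To illustrate why optimizing the sequence as a whole can differ from optimizing each generation separately, here is a toy sketch. The log-probabilities are made up, and an exhaustive search over two steps stands in for beam search; none of this is Outlines code.

```python
import math
from itertools import product

# Made-up log-probabilities for two consecutive generation steps.
first = {"A": math.log(0.6), "B": math.log(0.4)}
second = {
    "A": {"x": math.log(0.55), "y": math.log(0.45)},
    "B": {"x": math.log(0.9), "y": math.log(0.1)},
}


def score(a: str, b: str) -> float:
    """Log-probability of the whole two-step sequence."""
    return first[a] + second[a][b]


def separate():
    """Pick the best option at each step independently of what follows."""
    a = max(first, key=first.get)
    b = max(second[a], key=second[a].get)
    return (a, b), score(a, b)


def joint():
    """Pick the pair that maximizes the log-probability of the whole sequence."""
    best = max(product(first, ["x", "y"]), key=lambda pair: score(*pair))
    return best, score(*best)


print("separate:", separate())
print("joint:   ", joint())
```

With these numbers the step-by-step choice commits to `A` first, while the whole-sequence search prefers starting from `B` because the second step is much more likely there.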
All we have to do is return a `Sequence` object instead of a string, with the following attributes and methods:

`Sequence` should have the same feel as a string. Besides being able to print it, we should be able to slice it, add it to another string or another sequence, etc., and carry on:

This should be enough to bring Outlines to feature parity with other DSLs, while not being a DSL.
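As a rough illustration of the string-like behaviour described above, here is a minimal sketch. The class below is hypothetical, not the Outlines API: the `text` and `weight` fields, and the choice to drop the weight when slicing, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """Hypothetical string-like result of a generation call."""

    text: str
    weight: float = 0.0  # assumed: e.g. an accumulated log-probability

    def __str__(self) -> str:
        return self.text

    def __len__(self) -> int:
        return len(self.text)

    def __getitem__(self, index) -> "Sequence":
        # Assumption: a slice keeps the text but resets the weight, because
        # a partial weight cannot be recovered from the total alone.
        return Sequence(self.text[index])

    def __add__(self, other) -> "Sequence":
        if isinstance(other, Sequence):
            return Sequence(self.text + other.text, self.weight + other.weight)
        if isinstance(other, str):
            return Sequence(self.text + other, self.weight)
        return NotImplemented

    def __radd__(self, other) -> "Sequence":
        # Called for `"some string" + sequence`.
        if isinstance(other, str):
            return Sequence(other + self.text, self.weight)
        return NotImplemented


a = Sequence("Hello", -1.5)
b = Sequence(" world", -0.5)
c = "Say: " + a + b + "!"
print(c, c.weight)
```

Plain strings contribute no weight here, so concatenating prompts and generations keeps a running total for the generated parts only.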