Batch model calls #72
Conversation
Async calls

I added async calls to the OpenAI API, but I'm a bit unsatisfied syntax-wise with the resulting code. Indeed, async functions in Python have a "viral" property: all functions that call an async function must themselves be async, all the way to the top of the call stack. So to generate a sequence one currently needs to write:

```python
import asyncio

import outlines.models as models


async def main():
    model = models.text_completion.openai("text-davinci-003", max_tokens=20)
    task = await model(["blah", "blah", "blah"], 1)
    print(task)


asyncio.run(main())
```

We can surely remove the need to call `asyncio.run` explicitly, e.g. with a decorator:

```python
import outlines
import outlines.models as models


@outlines.chain
async def my_chain():
    model = models.text_completion.openai("text-davinci-003", max_tokens=20)
    task = await model(["blah", "blah", "blah"], 1)
    return task


my_chain()
```

It is slightly better, but not completely satisfactory, as we need to explicitly wrap the generation code in a decorated function.
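For illustration, here is a minimal sketch of what such a `chain` decorator could look like; this is hypothetical, not the PR's actual implementation:

```python
import asyncio
import functools


def chain(fn):
    # Hypothetical sketch: wrap an async function so that callers
    # can invoke it synchronously, hiding the asyncio.run call.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return asyncio.run(fn(*args, **kwargs))

    return wrapper
```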
Joblib

It seems that caching does not work on async functions? I see there's an issue open for that, but no indication of any plans.

I solved this by creating an […]

```python
model = models.text_completion.openai("text-davinci-003", max_tokens=20)
task = await model(["blah", "blah", "blah"], num_samples=3)
```
In other news, I implemented an example taken from the self-consistency sampling paper. The OpenAI API takes a while to return results; I wonder whether this could be improved with the OSS models we're hosting.
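For context, the core of self-consistency sampling is to draw several completions for the same prompt and majority-vote on the extracted answers. A minimal illustrative sketch of the voting step (not the example implemented in the PR):

```python
from collections import Counter


def majority_vote(answers):
    # Self-consistency keeps the most frequent answer among the samples.
    return Counter(answers).most_common(1)[0][0]


print(majority_vote(["42", "41", "42"]))  # -> "42"
```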
I added batch inference (multiple inputs) and generation of several samples for the HF models.
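As an illustration of batched, multi-sample generation with a Hugging Face model, here is a sketch using the `transformers` API directly (not the code added in this PR):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["blah", "blah", "blah"]  # batch of inputs
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

outputs = model.generate(
    **inputs,
    do_sample=True,
    num_return_sequences=3,  # several samples per prompt
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```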
There are a few things that can be improved in the current PR, which will have to be solved for efficient execution in #22.

Overall, this PR achieves what we originally intended, albeit imperfectly, and will work well for simple workflows. A call to an OpenAI model with a list of 20 prompts takes less than double the time of a call with a single prompt. However, we should make sure this works as well as it possibly can, and is well tested, before we merge, to avoid getting stuck in an async nightmare later.
Joblib's `Memory` does not work with async functions. We thus replace it with `perscache`, which, although less popular, passes the tests.
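For reference, caching with `perscache` looks roughly like the sketch below, assuming its decorator supports async functions as the swap in this PR implies; exact API details may differ from its documentation:

```python
import asyncio

from perscache import Cache

cache = Cache()


@cache
async def complete(prompt: str) -> str:
    # Stand-in for an expensive async API call; the result is cached,
    # so a second call with the same prompt returns immediately.
    await asyncio.sleep(0.1)
    return prompt.upper()


print(asyncio.run(complete("blah")))
```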
- Add an `outlines.elemwise` function that maps a function over an array of inputs. Should we use some magic to only return a coroutine when functions are called with several inputs, or make the library async by default? For now we return coroutines for model calls, and we gather and execute the calls when the function is async; we may need to revisit this later when we start implementing workflows with concurrently running "agents". (A sketch of such a combinator follows this list.)
- Use `perscache` instead of `joblib`.
- Make `outlines.text.function` work with `outlines.elemwise` and add tests.

Closes #52
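As referenced in the list above, here is a hypothetical sketch of an elemwise combinator that gathers the calls when the wrapped function is async and hands a coroutine back to the caller; this is illustrative only, not the PR's actual implementation:

```python
import asyncio
import inspect


def elemwise(fn):
    # Hypothetical sketch: map `fn` over a list of inputs. For async
    # functions, gather the per-input calls so they run concurrently,
    # which is why batched calls cost little more than a single one.
    def wrapper(inputs, *args, **kwargs):
        if inspect.iscoroutinefunction(fn):
            async def gathered():
                return await asyncio.gather(
                    *(fn(x, *args, **kwargs) for x in inputs)
                )

            return gathered()  # a coroutine the caller awaits
        return [fn(x, *args, **kwargs) for x in inputs]

    return wrapper


async def model_call(prompt):
    await asyncio.sleep(0.1)  # stand-in for an async model call
    return prompt.upper()


print(asyncio.run(elemwise(model_call)(["blah", "blah", "blah"])))
```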