
Batch model calls #72

Closed · rlouf wants to merge 13 commits

Conversation

rlouf (Member) commented Apr 27, 2023

  • Allow generation of several samples from one query with the OpenAI API (see the usage sketch below)
  • Allow generation of several samples from one query with the HuggingFace integration
  • Allow batch generation with the HuggingFace integration
  • Add async calls to the OpenAI API
  • Add an outlines.elemwise function that maps a function over an array of inputs
  • Handle keyword arguments with outlines.elemwise
  • Use some magic to only return a coroutine even when functions are called with several inputs? Or make the library async by default. We gather and execute the calls when the function is async. We may need to revisit this later when we start implementing workflows with concurrently running "agents". We return coroutines for model calls.
  • Fix caching: use perscache instead of joblib
  • Make outlines.text.function work with outlines.elemwise and add tests

Closes #52
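
For reference, a minimal usage sketch of the batched behaviour described above, following the call syntax settled on later in this thread (the prompts and sample count are placeholders):

import outlines.models as models

# Batched call: one request per prompt, several samples per prompt
model = models.text_completion.openai("text-davinci-003", max_tokens=20)
samples = model(["blah", "blah", "blah"], num_samples=3)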

rlouf commented Apr 27, 2023

Async calls

I added async calls to the OpenAI API, but I'm a bit unsatisfied syntax-wise with the resulting code. Indeed, async functions in Python have a "viral" property: all functions that call an async function must be async, all the way to the top. So to generate a sequence one currently needs to write:

import asyncio
import outlines.models as models


async def main():
    model = models.text_completion.openai("text-davinci-003", max_tokens=20)
    task = await model(["blah", "blah", "blah"], 1)
    print(task)

asyncio.run(main())

We can surely remove the need to call asyncio.run by asking users to wrap their program in a function with a @outlines.chain decorator like so:

import outlines
import outlines.models as models


@outlines.chain
async def my_chain():
    model = models.text_completion.openai("text-davinci-003", max_tokens=20)
    task = await model(["blah", "blah", "blah"], 1)
    return task


my_chain()

It is slightly better, but not completely satisfactory, as we still need to explicitly await the model call. With my current knowledge of asyncio I don't know whether this is avoidable, or even whether avoiding it is desirable, but from a user-friendliness perspective it would be nice to abstract it away.

Joblib

It seems that joblib's caching does not work on async functions. There is an open issue for this upstream, but no indication of any plans to address it.
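
For context, a minimal stdlib-only sketch (not joblib's or perscache's API) of what an async-aware cache has to do: await the coroutine and persist its result, instead of persisting the un-awaited coroutine object that a synchronous memoizer sees:

import asyncio
import functools


def async_cache(fn):
    """Minimal async-aware memoization sketch, for illustration only."""
    store = {}

    @functools.wraps(fn)
    async def wrapper(*args):
        key = args  # real libraries hash the arguments (and the function body)
        if key not in store:
            # Await the coroutine and cache its *result*; caching the coroutine
            # object itself, as a synchronous cache would, does not work.
            store[key] = await fn(*args)
        return store[key]

    return wrapper


@async_cache
async def complete(prompt):
    await asyncio.sleep(0)  # stands in for an API call
    return prompt.upper()


print(asyncio.run(complete("blah")))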

rlouf commented Apr 27, 2023

Async calls

I solved this by creating an elementwise decorator that accepts both functions and coroutines, has its own event loop, and runs all coroutines before returning. The requests are executed asynchronously, and the caller no longer needs to await the model call:

model = models.text_completion.openai("text-davinci-003", max_tokens=20)
answers = model(["blah", "blah", "blah"], num_samples=3)
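
Roughly, the idea looks like the following sketch (not the actual outlines implementation): detect whether the wrapped function is a coroutine function, create one coroutine per input, and run them in an event loop owned by the wrapper so the caller stays synchronous:

import asyncio
import inspect


def elementwise(fn):
    # Sketch only: the real decorator also handles keyword arguments.
    def wrapper(inputs, *args, **kwargs):
        if inspect.iscoroutinefunction(fn):
            async def gather_all():
                # One coroutine per input, executed concurrently
                return await asyncio.gather(*(fn(x, *args, **kwargs) for x in inputs))
            # The wrapper owns the event loop, so callers never await
            return asyncio.run(gather_all())
        return [fn(x, *args, **kwargs) for x in inputs]
    return wrapper

Applied to the coroutine that performs a single OpenAI request, a wrapper like this yields the synchronous, batched call shown above.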

rlouf commented Apr 30, 2023

In other news, I implemented an example taken from the self-consistency sampling paper. The OpenAI API takes a while to return results; I wonder whether this could be improved with the OSS models we're hosting.

rlouf commented May 2, 2023

I added batch inference (multiple inputs) and generation of several samples for the HF transformers models. We may want to launch the HF inference in a subprocess (drawn from a pool with a fixed number of subprocesses to avoid overloading the user's machine in complex workflows) and await the computation.
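
A sketch of that idea (the model and helper names are placeholders, not the PR's code): run the blocking transformers call in a small process pool and await it from the event loop:

import asyncio
from concurrent.futures import ProcessPoolExecutor


def generate_blocking(prompt):
    # Placeholder for a blocking HuggingFace transformers call; a real
    # implementation would load the model once per worker process.
    from transformers import pipeline
    generator = pipeline("text-generation", model="gpt2")
    return generator(prompt, max_new_tokens=20)[0]["generated_text"]


async def main():
    # A single worker avoids overloading the user's machine with several models.
    with ProcessPoolExecutor(max_workers=1) as executor:
        loop = asyncio.get_running_loop()
        tasks = [loop.run_in_executor(executor, generate_blocking, p) for p in ["blah", "blah"]]
        print(await asyncio.gather(*tasks))


if __name__ == "__main__":
    asyncio.run(main())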

rlouf commented May 3, 2023

There are a few things that can be improved in the current PR, which will have to be solved for efficient execution in #22:

  • The library should be async by default, which means exposing the async API. We currently avoid this by executing elemwise calls in a local event loop; however, to benefit from concurrent execution of several chains we will need to return coroutines instead, so that we can create one task per chain and run them concurrently.
  • Add a @outlines.chain decorator that runs the chain in the event loop and allows mapping over several inputs.
  • Use NumPy for batching? It would be useful for arrays of arbitrary size, and for the flattening/unflattening of the arrays that is necessary to batch the async calls (see the sketch after this list).
  • We should make the calls to HuggingFace models non-blocking by spawning them in a new process and awaiting the result of the computation. To avoid overloading the user's machine we will need to work with a pool of processes (probably with only one process available to run models, given how resource-intensive these are?).
  • Similarly, we should work with a pool of aiohttp.ClientSession instances.
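
A sketch of the flattening/unflattening idea mentioned above (names are illustrative, not the PR's code): flatten an arbitrarily shaped array of prompts, gather one coroutine per element, then restore the original shape:

import asyncio
import numpy as np


async def call_model(prompt):
    await asyncio.sleep(0)  # stands in for an async API request
    return prompt.upper()


async def map_async(fn, prompts):
    # Flatten the array so every element becomes one concurrent request ...
    flat = np.ravel(prompts)
    results = await asyncio.gather(*(fn(p) for p in flat))
    # ... then restore the original shape on the way out.
    return np.array(results, dtype=object).reshape(np.shape(prompts))


prompts = np.array([["a", "b"], ["c", "d"]])
print(asyncio.run(map_async(call_model, prompts)))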

Overall this PR achieves what we originally intended to do, albeit imperfectly, and will work well for simple workflows. A call to an OpenAI model with a list of 20 prompts will take less than double the time it takes to call the model with a single prompt.

However, we should make sure this works as well as it possibly can and is well-tested before we merge to avoid getting stuck in an async nightmare later.

rlouf changed the title from Vectorize model calls to Batch model calls on May 6, 2023
rlouf added the text (Linked to text generation) label on May 6, 2023
rlouf closed this on May 10, 2023
rlouf deleted the vectorize branch on June 6, 2023
Labels: enhancement, text (Linked to text generation)