
Batch model calls #72

Closed · rlouf wants to merge 13 commits

Conversation

rlouf (Member) commented Apr 27, 2023

  • Allow generation of several samples from one query with the OpenAI API (see the usage sketch below)
  • Allow generation of several samples from one query with the HuggingFace integration
  • Allow batch generation with the HuggingFace integration
  • Add async calls to the OpenAI API
  • Add an outlines.elemwise function that maps a function over an array of inputs
  • Handle keyword arguments with outlines.elemwise
  • Use some magic to only return a coroutine even when functions are called with several inputs? Or make the library async by default. We gather and execute the calls when the function is async. We may need to revisit this later when we start implementing workflows with concurrently running "agents". We return coroutines for model calls.
  • Fix caching: use perscache instead of joblib
  • Make outlines.text.function work with outlines.elemwise and add tests

Closes #52
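
For reference, a minimal usage sketch of the batched behaviour described above, following the call syntax settled on later in this thread (the prompts and sample count are placeholders):

import outlines.models as models

# Batched call: one request per prompt, several samples per prompt
model = models.text_completion.openai("text-davinci-003", max_tokens=20)
samples = model(["blah", "blah", "blah"], num_samples=3)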

rlouf commented Apr 27, 2023

Async calls

I added async calls to the OpenAI API, but I'm a bit unsatisfied syntax-wise with the resulting code. Indeed, async functions in Python have a "viral" property: all functions that call an async function must be async, all the way to the top. So to generate a sequence one currently needs to write:

import asyncio
import outlines.models as models


async def main():
    model = models.text_completion.openai("text-davinci-003", max_tokens=20)
    task = await model(["blah", "blah", "blah"], 1)
    print(task)

asyncio.run(main())

We can surely remove the need to call asyncio.run by asking users to wrap their program in a function with a @outlines.chain decorator like so:

import outlines
import outlines.models as models


@outlines.chain
async def my_chain():
    model = models.text_completion.openai("text-davinci-003", max_tokens=20)
    task = await model(["blah", "blah", "blah"], 1)
    return task


my_chain()

It is slightly better, but not completely satisfactory, as we still need to explicitly await the model call. With my current knowledge of asyncio I don't know whether this is avoidable, or even whether avoiding it is desirable, but from a user-friendliness perspective it would be nice to abstract it away.

Joblib

It seems that joblib's caching does not work on async functions. There is an open issue for this upstream, but no indication of any plans to address it.
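
For context, a minimal stdlib-only sketch (not joblib's or perscache's API) of what an async-aware cache has to do: await the coroutine and persist its result, instead of persisting the un-awaited coroutine object that a synchronous memoizer sees:

import asyncio
import functools


def async_cache(fn):
    """Minimal async-aware memoization sketch, for illustration only."""
    store = {}

    @functools.wraps(fn)
    async def wrapper(*args):
        key = args  # real libraries hash the arguments (and the function body)
        if key not in store:
            # Await the coroutine and cache its *result*; caching the coroutine
            # object itself, as a synchronous cache would, does not work.
            store[key] = await fn(*args)
        return store[key]

    return wrapper


@async_cache
async def complete(prompt):
    await asyncio.sleep(0)  # stands in for an API call
    return prompt.upper()


print(asyncio.run(complete("blah")))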

rlouf commented Apr 27, 2023

Async calls

I solved this by creating an elementwise decorator that accepts both functions and coroutines, has its own event loop, and runs all coroutines before returning. The requests are executed asynchronously, and the caller no longer needs to await the model call:

model = models.text_completion.openai("text-davinci-003", max_tokens=20)
answers = model(["blah", "blah", "blah"], num_samples=3)
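
Roughly, the idea looks like the following sketch (not the actual outlines implementation): detect whether the wrapped function is a coroutine function, create one coroutine per input, and run them in an event loop owned by the wrapper so the caller stays synchronous:

import asyncio
import inspect


def elementwise(fn):
    # Sketch only: the real decorator also handles keyword arguments.
    def wrapper(inputs, *args, **kwargs):
        if inspect.iscoroutinefunction(fn):
            async def gather_all():
                # One coroutine per input, executed concurrently
                return await asyncio.gather(*(fn(x, *args, **kwargs) for x in inputs))
            # The wrapper owns the event loop, so callers never await
            return asyncio.run(gather_all())
        return [fn(x, *args, **kwargs) for x in inputs]
    return wrapper

Applied to the coroutine that performs a single OpenAI request, a wrapper like this yields the synchronous, batched call shown above.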

rlouf commented Apr 30, 2023

In other news, I implemented an example taken from the self-consistency sampling paper. The OpenAI API takes a while to return results; I wonder whether this could be improved with the OSS models we're hosting.

rlouf commented May 2, 2023

I added batch inference (multiple inputs) and generation of several samples for the HF transformers models. We may want to launch the HF inference in a subprocess (drawn from a pool with a fixed number of subprocesses to avoid overloading the user's machine in complex workflows) and await the computation.
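
A sketch of that idea (the model and helper names are placeholders, not the PR's code): run the blocking transformers call in a small process pool and await it from the event loop:

import asyncio
from concurrent.futures import ProcessPoolExecutor


def generate_blocking(prompt):
    # Placeholder for a blocking HuggingFace transformers call; a real
    # implementation would load the model once per worker process.
    from transformers import pipeline
    generator = pipeline("text-generation", model="gpt2")
    return generator(prompt, max_new_tokens=20)[0]["generated_text"]


async def main():
    # A single worker avoids overloading the user's machine with several models.
    with ProcessPoolExecutor(max_workers=1) as executor:
        loop = asyncio.get_running_loop()
        tasks = [loop.run_in_executor(executor, generate_blocking, p) for p in ["blah", "blah"]]
        print(await asyncio.gather(*tasks))


if __name__ == "__main__":
    asyncio.run(main())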

rlouf commented May 3, 2023

There are a few things that can be improved in the current PR, which will have to be solved for efficient execution in #22:

  • The library should be async by default, which means exposing the async API. We currently avoid this by executing elemwise calls in a local event loop; however, to benefit from concurrent execution of several chains we will need to return coroutines instead, so that we can create one task per chain and run them concurrently.
  • Add a @outlines.chain decorator that runs the chain in the event loop and allows mapping over several inputs.
  • Use NumPy for batching? It would be useful for arrays of arbitrary size, and for the flattening/unflattening of the arrays that is necessary to batch the async calls (see the sketch after this list).
  • We should make the calls to HuggingFace models non-blocking by spawning them in a new process and awaiting the result of the computation. To avoid overloading the user's machine we will need to work with a pool of processes (probably with only one process available to run models, given how resource-intensive these are?).
  • Similarly, we should work with a pool of aiohttp.ClientSession instances.
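
A sketch of the flattening/unflattening idea mentioned above (names are illustrative, not the PR's code): flatten an arbitrarily shaped array of prompts, gather one coroutine per element, then restore the original shape:

import asyncio
import numpy as np


async def call_model(prompt):
    await asyncio.sleep(0)  # stands in for an async API request
    return prompt.upper()


async def map_async(fn, prompts):
    # Flatten the array so every element becomes one concurrent request ...
    flat = np.ravel(prompts)
    results = await asyncio.gather(*(fn(p) for p in flat))
    # ... then restore the original shape on the way out.
    return np.array(results, dtype=object).reshape(np.shape(prompts))


prompts = np.array([["a", "b"], ["c", "d"]])
print(asyncio.run(map_async(call_model, prompts)))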

Overall this PR achieves what we originally intended to do, albeit imperfectly, and will work well for simple workflows. A call to an OpenAI model with a list of 20 prompts will take less than double the time it takes to call the model with a single prompt.

However, we should make sure this works as well as it possibly can and is well-tested before we merge to avoid getting stuck in an async nightmare later.

rlouf changed the title from Vectorize model calls to Batch model calls on May 6, 2023
rlouf added the text (Linked to text generation) label on May 6, 2023
rlouf closed this on May 10, 2023
rlouf deleted the vectorize branch on June 6, 2023
Labels: enhancement, text (Linked to text generation)