fix: async overhaul; create global event loop; add client cache #186
Conversation
Can you provide an example here?
We've heard this pain (not wrt mellea, so far).
Does this end up serializing all of the async calls?
For reference: bug report
For instance, the OpenAI backend will run without issues right now if you run multiple `m.instruct` calls. For watsonx, we were already re-instantiating the APIClient to get around this event loop issue. Watsonx also uses an async client that gets tied to a specific event loop. Litellm also handles this automatically in most cases by using a client cache (minus the watsonx issue): BerriAI/litellm#7667.
No. Async calls are still concurrent by default. They might just use different clients. This is very similar to returning to session-level event loops.
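A minimal sketch of that point (not mellea code; `get_client_for_loop`, `fetch`, and the module-level `_clients` dict are illustrative names): every call made inside one event loop shares a single client and still runs concurrently.

```python
import asyncio

_clients: dict[int, object] = {}  # illustrative per-event-loop client store

def get_client_for_loop() -> object:
    # One client per running event loop; requests inside that loop still
    # share the same client and run concurrently.
    loop_id = id(asyncio.get_running_loop())
    if loop_id not in _clients:
        _clients[loop_id] = object()  # stand-in for an async SDK client
    return _clients[loop_id]

async def fetch(i: int) -> tuple[int, int]:
    client = get_client_for_loop()
    await asyncio.sleep(0.1)  # simulate an API request
    return i, id(client)

async def main() -> None:
    # Three requests complete in ~0.1s total and all report the same client id.
    print(await asyncio.gather(*(fetch(i) for i in range(3))))

asyncio.run(main())
```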
avinash2692 left a comment:
I think the implementation looks alright. I just had some questions re: caching strategy:
- Any reason to use an LRU rather than just have a standard cache for each backend object?
- If we are going with an LRU, might want to increase the default size to something bigger than 2?
(Inline review comment on `class ClientCache:` in the diff.)
Could we just use the inbuilt LRU here?
My understanding is that the inbuilt LRU is for caching function results. It seems like it could be used here, but it may be somewhat clunky. How would you go about using it here, since we have to pass a key into `get`?
I guess we would have to do something like:

    @functools.lru_cache
    def _get_client(self, key):
        # cached per (self, key); key is the id of the event loop in use
        return Client()

    def _client(self):
        key = id(event_loop)  # the loop the caller is currently running in
        return self._get_client(key)
I tested this out more. I think I would like to keep the separate client cache. The functools version is much harder to debug/test, and I think the multiple functions make it less straightforward what is happening. If you feel strongly about this, though, I am willing to implement it using `functools.lru_cache`.
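For context, a standalone cache along these lines is straightforward to inspect in tests. This is only a sketch: the class name matches the diff, but the method names and default size here are my assumptions, not necessarily the actual implementation.

```python
from collections import OrderedDict
from typing import Any, Optional

class ClientCache:
    """Tiny LRU cache mapping event-loop ids to SDK client instances."""

    def __init__(self, maxsize: int = 2) -> None:
        self._maxsize = maxsize
        self._cache: OrderedDict[int, Any] = OrderedDict()

    def get(self, key: int) -> Optional[Any]:
        # Return the cached client for this event loop and mark it as most
        # recently used; None tells the caller to build a fresh client.
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)
        return self._cache[key]

    def put(self, key: int, client: Any) -> None:
        # Insert or refresh the client, evicting the least recently used
        # entry once the cache grows past maxsize.
        self._cache[key] = client
        self._cache.move_to_end(key)
        if len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)
```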
Ah, for some reason I thought that it could cache objects as well (with `@cached_property`). But I agree re: debugging efficiency.
I figured that users would use the same type of call multiple times in a row (i.e. synchronous calls) and then may want to intersperse the other type of call (i.e. multiple different `asyncio.run` calls). In this scenario, we would want to keep the client associated with the primary event loop around rather than keep recreating it after two of the other type of calls.
Yes, in most cases, user code will be running in a single event loop (i.e. only using synchronous calls or only using a single `asyncio.run` call). I left it at 2 in case the user uses both.
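As a quick illustration of why two slots can be enough (plain asyncio, not mellea code): calls on a long-lived loop keep hitting the same cache key, while an interleaved `asyncio.run()` runs on a different loop and therefore needs a second slot.

```python
import asyncio

async def loop_id() -> int:
    # id() of the running loop is the kind of key a per-loop client cache uses.
    return id(asyncio.get_running_loop())

persistent = asyncio.new_event_loop()
try:
    # Repeated "synchronous-style" calls reuse the same loop -> same cache key.
    assert persistent.run_until_complete(loop_id()) == persistent.run_until_complete(loop_id())

    # An interleaved asyncio.run() spins up a different loop -> a second cache key.
    assert asyncio.run(loop_id()) != persistent.run_until_complete(loop_id())
finally:
    persistent.close()
```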
avinash2692 left a comment:
LGTM
…rative-computing#186)
* fix: async overhaul; create global event loop; add client cache
* fix: watsonx test cicd check
* feat: add client cache to openai and simplify setup
* fix: add additional test for client cache
Main issue: We create backends (that contain clients) that outlive async event loops, even though those objects are (sometimes) bound to the event loop. In most cases, this wasn't causing any issues; providers' SDKs handled this internally. However, this automatic handling gets less reliable across different providers and as the number of requests increases.
For instance, with OpenAI (or RITS) backends, running multiple `m.instruct` calls usually doesn't cause an issue. However, if you create enough requests (by including enough requirements and a sampling strategy), you hit issues with the async client.

Solution:
Return to explicitly handling the event loop and the async clients where possible. For most code paths (i.e. running fully sync or fully async code), the client handling will not be needed, since everything will now take place in the same event loop.
Changes:
Added a global event loop that all synchronous code uses to run its async code in. For backends where clients get tied to specific event loops, I added a client cache so that they can re-instantiate clients when needed.
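A rough sketch of what the "global event loop" part of this could look like (my reading of the description; the helper names such as `run_sync` are assumptions, not necessarily mellea's actual implementation): a single long-lived loop runs in a background thread, and synchronous entry points submit coroutines to it instead of calling `asyncio.run()` themselves.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional

_loop: Optional[asyncio.AbstractEventLoop] = None
_loop_lock = threading.Lock()

def _get_global_loop() -> asyncio.AbstractEventLoop:
    # Lazily start one long-lived event loop shared by all synchronous code.
    global _loop
    with _loop_lock:
        if _loop is None:
            _loop = asyncio.new_event_loop()
            threading.Thread(target=_loop.run_forever, daemon=True).start()
        return _loop

def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    # Synchronous code paths funnel through here, so backend clients created
    # on this loop stay bound to a single, never-closed event loop.
    future = asyncio.run_coroutine_threadsafe(coro, _get_global_loop())
    return future.result()
```

Purely async user code never touches this loop; it runs in whatever loop `asyncio.run()` (or the surrounding framework) provides, which is where the per-loop client cache comes in.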
Litellm still has a bug with Watsonx clients. I opened an issue for that, but there's nothing we can do on our end to force litellm to recreate that client. We might be able to manually pass in an async httpx/aiohttp client, but for now I simply added a warning message. You won't have any issues as long as you either:
- use only synchronous calls, or
- make all of your async calls within a single `asyncio.run()` call (see the sketch below).

I added a few more tests for these specific watsonx litellm issues.
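Schematic example of the two safe patterns (`start_session` and the `ainstruct` async variant are assumptions about the mellea API used for illustration only; `m.instruct` is from the description above):

```python
import asyncio
import mellea

m = mellea.start_session()  # assumed constructor; see mellea docs

# Safe pattern 1: stick to synchronous calls; they all run on the shared global loop.
summary = m.instruct("Summarize the release notes.")
changes = m.instruct("List the breaking changes.")

# Safe pattern 2: if you want async, put *all* of it inside one asyncio.run()
# call so every request is tied to a single event loop (and a single client).
async def main():
    return await asyncio.gather(
        m.ainstruct("Summarize the release notes."),  # hypothetical async variant
        m.ainstruct("List the breaking changes."),
    )

results = asyncio.run(main())
```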
Notes: