fix: async overhaul; create global event loop; add client cache #186
Conversation
Can you provide an example here?
We've heard this pain (not wrt mellea, so far).
Does this end up serializing all of the async calls?
For reference: bug report
For instance, the OpenAI backend will run without issues right now if you run multiple `m.instruct` calls. For watsonx, we were already re-instantiating the APIClient to get around this event loop issue. Watsonx also uses an async client that gets tied to a specific event loop. Litellm also handles this automatically in most cases by using a client cache (minus the watsonx issue): BerriAI/litellm#7667.
No. Async calls are still concurrent by default. They might just use different clients. This is very similar to returning to session-level event loops.
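A minimal sketch of that point (not mellea code; `get_client_for_loop`, `fetch`, and the module-level `_clients` dict are illustrative names): every call made inside one event loop shares a single client and still runs concurrently.

```python
import asyncio

_clients: dict[int, object] = {}  # illustrative per-event-loop client store

def get_client_for_loop() -> object:
    # One client per running event loop; requests inside that loop still
    # share the same client and run concurrently.
    loop_id = id(asyncio.get_running_loop())
    if loop_id not in _clients:
        _clients[loop_id] = object()  # stand-in for an async SDK client
    return _clients[loop_id]

async def fetch(i: int) -> tuple[int, int]:
    client = get_client_for_loop()
    await asyncio.sleep(0.1)  # simulate an API request
    return i, id(client)

async def main() -> None:
    # Three requests complete in ~0.1s total and all report the same client id.
    print(await asyncio.gather(*(fetch(i) for i in range(3))))

asyncio.run(main())
```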
avinash2692 left a comment:
I think the implementation looks alright. I just had some questions re: caching strategy:
- Any reason to use an LRU rather than just have a standard cache for each backend object?
- If we are going with an LRU, might want to increase the default size to something bigger than 2?
(Inline review comment on `class ClientCache:` in the diff.)
Could we just use the inbuilt LRU here?
My understanding is that the inbuilt LRU is for caching function results. It seems like it could be used here, but it may be somewhat clunky. How would you go about using it here, since we have to pass a key into `get`?
I guess we would have to do something like:

    @functools.lru_cache
    def _get_client(self, key):
        # cached per (self, key); key is the id of the event loop in use
        return Client()

    def _client(self):
        key = id(event_loop)  # the loop the caller is currently running in
        return self._get_client(key)
I tested this out more. I think I would like to keep the separate client cache. The functools version is much harder to debug/test, and I think the multiple functions make it less straightforward what is happening. If you feel strongly about this, though, I am willing to implement it using `functools.lru_cache`.
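For context, a standalone cache along these lines is straightforward to inspect in tests. This is only a sketch: the class name matches the diff, but the method names and default size here are my assumptions, not necessarily the actual implementation.

```python
from collections import OrderedDict
from typing import Any, Optional

class ClientCache:
    """Tiny LRU cache mapping event-loop ids to SDK client instances."""

    def __init__(self, maxsize: int = 2) -> None:
        self._maxsize = maxsize
        self._cache: OrderedDict[int, Any] = OrderedDict()

    def get(self, key: int) -> Optional[Any]:
        # Return the cached client for this event loop and mark it as most
        # recently used; None tells the caller to build a fresh client.
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)
        return self._cache[key]

    def put(self, key: int, client: Any) -> None:
        # Insert or refresh the client, evicting the least recently used
        # entry once the cache grows past maxsize.
        self._cache[key] = client
        self._cache.move_to_end(key)
        if len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)
```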
Ah, for some reason I thought that it could cache objects as well (with `@cached_property`). But I agree re: debugging efficiency.
I figured that users would use the same type of call multiple times in a row (i.e. synchronous calls) and then may want to intersperse the other type of call (i.e. multiple different `asyncio.run` calls). In this scenario, we would want to keep the client associated with the primary event loop around rather than keep recreating it after two of the other type of calls.
Yes, in most cases, user code will be running in a single event loop (i.e. only using synchronous calls or only using a single `asyncio.run` call). I left it at 2 in case the user uses both.
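As a quick illustration of why two slots can be enough (plain asyncio, not mellea code): calls on a long-lived loop keep hitting the same cache key, while an interleaved `asyncio.run()` runs on a different loop and therefore needs a second slot.

```python
import asyncio

async def loop_id() -> int:
    # id() of the running loop is the kind of key a per-loop client cache uses.
    return id(asyncio.get_running_loop())

persistent = asyncio.new_event_loop()
try:
    # Repeated "synchronous-style" calls reuse the same loop -> same cache key.
    assert persistent.run_until_complete(loop_id()) == persistent.run_until_complete(loop_id())

    # An interleaved asyncio.run() spins up a different loop -> a second cache key.
    assert asyncio.run(loop_id()) != persistent.run_until_complete(loop_id())
finally:
    persistent.close()
```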
avinash2692 left a comment:
LGTM
…rative-computing#186)
* fix: async overhaul; create global event loop; add client cache
* fix: watsonx test cicd check
* feat: add client cache to openai and simplify setup
* fix: add additional test for client cache
Main issue: We create backends (that contain clients) that outlive async event loops, even though those objects are (sometimes) bound to the event loop. In most cases, this wasn't causing any issues; providers' SDKs handled this internally. However, this automatic handling gets less reliable across different providers and as the number of requests increases.
For instance, with OpenAI (or RITS) backends, running multiple `m.instruct` calls usually doesn't cause an issue. However, if you create enough requests (by including enough requirements and a sampling strategy), you hit issues with the async client.

Solution:
Return to explicitly handling the event loop and the async clients where possible. For most code paths (i.e. running fully sync or fully async code), the client handling will not be needed, since everything will now take place in the same event loop.
Changes:
Added a global event loop that all synchronous code uses to run its async code in. For backends where clients get tied to specific event loops, I added a client cache so that they can re-instantiate clients when needed.
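A rough sketch of what the "global event loop" part of this could look like (my reading of the description; the helper names such as `run_sync` are assumptions, not necessarily mellea's actual implementation): a single long-lived loop runs in a background thread, and synchronous entry points submit coroutines to it instead of calling `asyncio.run()` themselves.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional

_loop: Optional[asyncio.AbstractEventLoop] = None
_loop_lock = threading.Lock()

def _get_global_loop() -> asyncio.AbstractEventLoop:
    # Lazily start one long-lived event loop shared by all synchronous code.
    global _loop
    with _loop_lock:
        if _loop is None:
            _loop = asyncio.new_event_loop()
            threading.Thread(target=_loop.run_forever, daemon=True).start()
        return _loop

def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    # Synchronous code paths funnel through here, so backend clients created
    # on this loop stay bound to a single, never-closed event loop.
    future = asyncio.run_coroutine_threadsafe(coro, _get_global_loop())
    return future.result()
```

Purely async user code never touches this loop; it runs in whatever loop `asyncio.run()` (or the surrounding framework) provides, which is where the per-loop client cache comes in.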
Litellm still has a bug with Watsonx clients. I opened an issue for that, but there's nothing we can do on our end to force litellm to recreate that client. We might be able to manually pass in an async httpx/aiohttp client, but for now I simply added a warning message. You won't have any issues as long as you either:
- use only synchronous calls, or
- make all of your async calls within a single `asyncio.run()` call (see the sketch below).

I added a few more tests for these specific watsonx litellm issues.
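Schematic example of the two safe patterns (`start_session` and the `ainstruct` async variant are assumptions about the mellea API used for illustration only; `m.instruct` is from the description above):

```python
import asyncio
import mellea

m = mellea.start_session()  # assumed constructor; see mellea docs

# Safe pattern 1: stick to synchronous calls; they all run on the shared global loop.
summary = m.instruct("Summarize the release notes.")
changes = m.instruct("List the breaking changes.")

# Safe pattern 2: if you want async, put *all* of it inside one asyncio.run()
# call so every request is tied to a single event loop (and a single client).
async def main():
    return await asyncio.gather(
        m.ainstruct("Summarize the release notes."),  # hypothetical async variant
        m.ainstruct("List the breaking changes."),
    )

results = asyncio.run(main())
```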
Notes: