Conversation

@jakelorocco
Contributor

@jakelorocco jakelorocco commented Oct 7, 2025

Main issue: We create backends (that contain clients) that outlive async event loops even though those objects are (sometimes) bound to the event loop. In most cases, this wasn't causing any issues; the providers' SDKs handled it internally. However, this automatic handling seems to get less reliable across different providers and as the number of requests increases.

For instance, with the OpenAI (or RITS) backend, running multiple m.instruct calls usually doesn't cause an issue. However, if you create enough requests (by including enough requirements and a sampling strategy), you hit issues with the async client.

Solution:
Return to explicitly managing the event loop and the async clients where possible. For most code paths (i.e., running fully sync or fully async code), no special client handling will be needed, since everything will now take place in the same event loop.

Changes:
Added a singleton event loop that all synchronous code uses to run async code. For backends whose clients get tied to a specific event loop, I added a client cache so that clients can be re-instantiated when needed.
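
To make the shape of this concrete, here is a minimal sketch of the pattern (not the actual mellea implementation; _get_singleton_loop and run_coroutine_sync are illustrative names):

import asyncio
import threading

_LOOP: asyncio.AbstractEventLoop | None = None
_LOOP_LOCK = threading.Lock()

def _get_singleton_loop() -> asyncio.AbstractEventLoop:
    # Lazily create one long-lived loop on a daemon thread; every sync entry
    # point reuses it, so loop-bound clients stay valid across calls.
    global _LOOP
    with _LOOP_LOCK:
        if _LOOP is None or _LOOP.is_closed():
            _LOOP = asyncio.new_event_loop()
            threading.Thread(target=_LOOP.run_forever, daemon=True).start()
        return _LOOP

def run_coroutine_sync(coro):
    # Submit the coroutine to the shared loop and block until it finishes.
    future = asyncio.run_coroutine_threadsafe(coro, _get_singleton_loop())
    return future.result()

Fully async user code never touches this path; it just awaits the backend methods directly in its own loop.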

LiteLLM still has a bug with Watsonx clients. I opened an issue for it, but there's nothing we can do on our end to force LiteLLM to recreate that client. We might be able to manually pass in an async httpx / aiohttp client, but for now I simply added a warning message. You won't hit any issues as long as you either:

  1. run everything synchronously
  2. run everything in the same asyncio.run() call

I added a few more tests covering these specific Watsonx / LiteLLM issues.

Notes:

  • We should keep the client cache even if other changes in this PR get rejected (a rough sketch of the idea follows below).
  • The event loop singleton is only strictly necessary for the LiteLLM backend, but it also saves us from spawning a new thread for each sync call, which is more performant.
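
For illustration, a rough sketch of the client-cache idea (the actual ClientCache in this PR may differ; client construction is left abstract):

from collections import OrderedDict

class ClientCache:
    # Hypothetical sketch: a tiny LRU keyed by the id of an event loop, so
    # each loop gets a client that was created while it was running.
    def __init__(self, maxsize: int = 2):
        self.maxsize = maxsize
        self._clients: OrderedDict[int, object] = OrderedDict()

    def get(self, key: int):
        client = self._clients.get(key)
        if client is not None:
            self._clients.move_to_end(key)  # mark as most recently used
        return client

    def put(self, key: int, client: object) -> None:
        self._clients[key] = client
        self._clients.move_to_end(key)
        if len(self._clients) > self.maxsize:
            self._clients.popitem(last=False)  # evict the least recently used client

A backend would then look up id(asyncio.get_running_loop()) in the cache and only construct a new client on a miss.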

@mergify

mergify bot commented Oct 7, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@jakelorocco jakelorocco marked this pull request as ready for review October 8, 2025 00:42
@nrfulton
Contributor

nrfulton commented Oct 9, 2025

Main issue: We create backends (that contain clients) that outlive async event loops even though those objects are (sometimes) bound to the event loop. In most cases, this wasn't causing any issues; the providers' SDKs handled it internally.

Can you provide an example here?

However, this automatic handling seems to get less reliable across different providers and as the number of requests increases.

We've heard this pain (not wrt mellea, so far).

Changes: Added a singleton event loop that all synchronous code uses to run async code. For backends whose clients get tied to a specific event loop, I added a client cache so that clients can be re-instantiated when needed.

Does this end up sequentializing all of the async calls?

Litellm still has a bug with Watsonx clients. I opened an issue for that

For reference: bug report

@jakelorocco
Contributor Author

jakelorocco commented Oct 10, 2025

Main issue: We create backends (that contain clients) that outlive async event loops even though those objects are (sometimes) bound to the event loop. In most cases, this wasn't causing any issues; the providers' SDKs handled it internally.

Can you provide an example here?

For instance, the OpenAI backend currently runs without issues if you make multiple m.instruct calls, even though those calls technically happen in different event loops (and the underlying httpx client is bound to each event loop). However, if you run enough of these calls with enough requirements, the automatic client cleanup takes too long (this is my hypothesis) and you get an "event loop is closed" error.

For Watsonx, we were already re-instantiating the APIClient to get around this event loop issue. Watsonx also uses an async client that gets tied to a specific event loop.

LiteLLM also handles this automatically in most cases by using a client cache (minus the Watsonx issue): BerriAI/litellm#7667.
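
As a toy illustration of the failure mode (this is not the OpenAI or Watsonx SDK code, just a stand-in for a client whose internals capture the loop they were created on):

import asyncio

class LoopBoundClient:
    # Stand-in for an SDK client: it captures the running loop at creation,
    # roughly what httpx/aiohttp-backed clients do via their transports.
    def __init__(self):
        self._loop = asyncio.get_running_loop()

    async def request(self) -> str:
        # Dispatch work onto the captured loop, like a connection pool would.
        fut = self._loop.create_future()
        self._loop.call_soon_threadsafe(fut.set_result, "ok")
        return await fut

async def _make() -> LoopBoundClient:
    client = LoopBoundClient()
    print(await client.request())   # works: same loop
    return client

client = asyncio.run(_make())       # the first loop is closed when run() returns
try:
    asyncio.run(client.request())   # second loop: the captured loop is gone
except RuntimeError as err:
    print(err)                      # "Event loop is closed"

The event loop singleton avoids this for sync calls; the client cache covers the case where the user drives multiple loops themselves.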

Changes: Added a singleton event loop that all synchronous code uses to run async code. For backends whose clients get tied to a specific event loop, I added a client cache so that clients can be re-instantiated when needed.

Does this end up sequentializing all of the async calls?

No. Async calls are still concurrent by default; they might just use different clients. This is very similar to returning to session-level event loops.
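
A small illustration of that point (fetch_one stands in for an awaited backend request):

import asyncio

async def fetch_one(i: int) -> int:
    await asyncio.sleep(0.1)   # stand-in for an awaited backend call
    return i

async def main() -> list[int]:
    # Everything inside one asyncio.run shares a single loop (and client)
    # and still runs concurrently.
    return await asyncio.gather(*(fetch_one(i) for i in range(10)))

print(asyncio.run(main()))     # completes in roughly 0.1s, not 1s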

Contributor

@avinash2692 avinash2692 left a comment

I think the implementation looks alright. I just had some questions re: caching strategy:

  • Any reason to use an LRU rather than just having a standard cache for each backend object?
  • If we are going with an LRU, might we want to increase the default size to something bigger than 2?

return loop


class ClientCache:
Contributor

Could we just use the inbuilt LRU here?

Contributor Author

My understanding is that the inbuilt LRU is for caching function results. It seems like it could be used here, but it may be somewhat clunky. How would you go about using it here, since we have to pass in a key to get?

I guess we would have to do something like:

import functools

@functools.lru_cache
def _get_client(self, key):
    # key identifies the event loop the client must be created under
    return Client()

def _client(self):
    key = id(event_loop)  # the loop this backend is currently running under
    return self._get_client(key)

Contributor Author

I tested this out more, and I would like to keep the separate client cache. The functools version is much harder to debug and test, and I think splitting it across multiple functions makes it less straightforward to see what is happening. If you feel strongly about this, I am willing to implement it with functools.lru_cache, though.

Contributor

My understanding is that the inbuilt LRU is for caching function results. It seems like it could be used here, but it may be somewhat clunky. How would you go about using it here, since we have to pass in a key to get?

I guess we would have to do something like:

import functools

@functools.lru_cache
def _get_client(self, key):
    # key identifies the event loop the client must be created under
    return Client()

def _client(self):
    key = id(event_loop)  # the loop this backend is currently running under
    return self._get_client(key)

Ah, for some reason I thought that it could cache both objects (with @cached_property). But I agree re: ease of debugging.

@jakelorocco
Contributor Author

I think the implementation looks alright. I just had some questions re: caching strategy:

  • Any reason to use an LRU rather than just having a standard cache for each backend object?

I figured that users would make the same type of call multiple times in a row (i.e., synchronous calls) and then occasionally intersperse the other type of call (i.e., multiple different asyncio.run calls). In this scenario, we want to keep the client associated with the primary event loop around rather than keep recreating it after two calls of the other type.

  • If we are going with an LRU, might we want to increase the default size to something bigger than 2?

Yes, in most cases user code will be running in a single event loop (i.e., only using synchronous calls or only using a single asyncio.run call). I left it at 2 in case the user uses both.
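
A toy illustration of why two entries cover the mixed case: a typical session only ever sees two loop identities, the persistent loop used for sync calls and the loop from the user's own asyncio.run (the names here are illustrative, not mellea code):

import asyncio

background = asyncio.new_event_loop()            # stand-in for the singleton sync loop

async def show_loop_key():
    print("loop key:", id(asyncio.get_running_loop()))

background.run_until_complete(show_loop_key())   # key 1: reused by every sync call
asyncio.run(show_loop_key())                     # key 2: the user's asyncio.run loop
background.close()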

Contributor

@avinash2692 avinash2692 left a comment

LGTM

@avinash2692 avinash2692 merged commit 1e236dd into main Oct 13, 2025
4 checks passed
tuliocoppola pushed a commit to tuliocoppola/mellea that referenced this pull request Nov 5, 2025
…rative-computing#186)

* fix: async overhaul; create global event loop; add client cache

* fix: watsonx test cicd check

* feat: add client cache to openai and simplify setup

* fix: add additional test for client cache
