fix: async overhaul; create global event loop; add client cache #186
Merged
Commits (5)
- 56e88c4 fix: async overhaul; create global event loop; add client cache (jakelorocco)
- 5befbef fix: watsonx test cicd check (jakelorocco)
- 223e5fa feat: add client cache to openai and simplify setup (jakelorocco)
- 8965236 Merge branch 'main' into jal/async-updates (jakelorocco)
- 9e24ecc fix: add additional test for client cache (jakelorocco)
New file (86 lines), an event loop management helper:

```python
"""Helper for event loop management. Allows consistently running async generate requests in sync code."""

import asyncio
import threading
from collections.abc import Coroutine
from typing import Any, TypeVar

R = TypeVar("R")


class _EventLoopHandler:
    """A class that handles the event loop for Mellea code. Do not directly instantiate this. Use `_run_async_in_thread`."""

    def __init__(self):
        """Instantiates an EventLoopHandler. Used to ensure consistency when calling async code from sync code in Mellea.

        Do not instantiate this class. Rely on the exported `_run_async_in_thread` function.
        """
        self._event_loop = asyncio.new_event_loop()
        self._thread: threading.Thread = threading.Thread(
            target=self._event_loop.run_forever, daemon=True
        )
        self._thread.start()

    def __del__(self):
        """Delete the event loop handler."""
        self._close_event_loop()

    def _close_event_loop(self) -> None:
        """Called when deleting the event loop handler. Cleans up the event loop and thread."""
        if self._event_loop:
            try:
                tasks = asyncio.all_tasks(self._event_loop)
                for task in tasks:
                    task.cancel()

                async def finalize_tasks():
                    # TODO: We can log errors here if needed.
                    await asyncio.gather(*tasks, return_exceptions=True)

                out = asyncio.run_coroutine_threadsafe(
                    finalize_tasks(), self._event_loop
                )

                # Timeout if needed.
                out.result(5)
            except Exception:
                pass

            # Finally, stop the event loop for this session.
            self._event_loop.stop()

    def __call__(self, co: Coroutine[Any, Any, R]) -> R:
        """Runs the coroutine in the event loop."""
        return asyncio.run_coroutine_threadsafe(co, self._event_loop).result()


# Instantiate this class once. It will not be re-instantiated.
__event_loop_handler = _EventLoopHandler()


def _run_async_in_thread(co: Coroutine[Any, Any, R]) -> R:
    """Call to run async code from synchronous code in Mellea.

    In Mellea, we utilize async code underneath sync code to speed up
    inference requests. This puts us in a difficult situation, since most
    API providers and SDKs use async clients that get bound to a specific
    event loop when making requests. These clients are typically long-lasting
    and sometimes cannot easily be reinstantiated on demand. By declaring a
    single event loop for these async requests, Mellea avoids these client
    issues.

    Note: This implementation requires that sessions/backends be run only
    through the top-level / session sync or async interfaces, not both. You
    will need to reinstantiate your backend if switching between the two.

    Args:
        co: coroutine to run

    Returns:
        output of the coroutine
    """
    return __event_loop_handler(co)


__all__ = ["_run_async_in_thread"]
```
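For context, here is a minimal usage sketch showing how the helper bridges sync and async code. The `fetch_completion` coroutine is a hypothetical stand-in for a backend's async generate request, not part of the PR:

```python
import asyncio


# Hypothetical stand-in for a backend's async inference request.
async def fetch_completion(prompt: str) -> str:
    await asyncio.sleep(0.1)  # simulate network latency
    return f"completion for {prompt!r}"


# From synchronous code, submit the coroutine to the single long-lived
# event loop instead of calling asyncio.run(); loop-bound async clients
# then always see the same event loop across calls.
result = _run_async_in_thread(fetch_completion("hello"))
print(result)
```

Because the loop runs forever on a dedicated daemon thread, every request submitted this way executes on the same loop, which is what keeps loop-bound SDK clients reusable across many sync calls.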
Reviewer: Could we just use the inbuilt LRU here?
Author (jakelorocco): My understanding is that the inbuilt LRU is for caching function results. It seems like it could be used here, but it may be somewhat clunky. How would you go about using it here, since we have to pass in a key to get? I guess we would have to do something like:
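A rough sketch of the `functools.lru_cache` pattern under discussion; the client class and key fields below are hypothetical stand-ins, not Mellea's API:

```python
import functools


class FakeClient:
    """Hypothetical stand-in for an SDK client bound to connection parameters."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key


@functools.lru_cache(maxsize=8)
def get_client(base_url: str, api_key: str) -> FakeClient:
    # lru_cache keys on the (hashable) arguments, so calling this again
    # with the same parameters returns the same cached client instance.
    return FakeClient(base_url, api_key)


# Repeated lookups with the same arguments hit the cache.
assert get_client("https://api.example.com", "k1") is get_client(
    "https://api.example.com", "k1"
)
```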
Author (jakelorocco): I tested this out more. I think I would like to keep the separate client cache. The functools version is much harder to debug / test, and I think the multiple functions make it less straightforward what is happening. If you feel strongly about this, I am willing to implement it using functools.lru_cache, though.
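For contrast, a minimal sketch of the kind of explicit, separately testable client cache being kept; this is an illustration, not the PR's actual implementation:

```python
from collections import OrderedDict
from typing import Any


class ClientCache:
    """Minimal explicit LRU cache for clients, keyed by a caller-supplied string."""

    def __init__(self, maxsize: int = 8):
        self._maxsize = maxsize
        self._cache: OrderedDict[str, Any] = OrderedDict()

    def get(self, key: str) -> Any | None:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        return None

    def put(self, key: str, client: Any) -> None:
        self._cache[key] = client
        self._cache.move_to_end(key)
        if len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # evict the least recently used entry
```

Unlike the decorator approach, `get` and `put` can be exercised and inspected directly in tests, which matches the debuggability argument above.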
Reviewer: Ah, for some reason I thought that it could cache both objects (with @cached_property). But I agree with you re: debugging efficiency.