Memory leak in the chat completion `create` method #820
Comments
You shouldn't be instantiating a new client for every request; if you move the client outside of the request handler you shouldn't see any memory leaks. I did manage to replicate the memory leak you're seeing with your example, and moving the client instantiation outside of the function resulted in stable memory usage.
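For illustration, a minimal sketch of that pattern, assuming a FastAPI app (the route, prompt handling, and model name are placeholders, not from this thread):

```python
from fastapi import FastAPI
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()  # created once at module scope, reused by every request

@app.post("/chat")
async def chat(prompt: str):
    # the shared client reuses its httpx connection pool across requests
    completion = await client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return {"reply": completion.choices[0].message.content}
```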
For what it's worth, I still see this issue when wrapping @CodePalAI's …
Wouldn't "moving the client outside of the request handler" be the same thing as a module level client? From the README, I figured the network resources are getting cleaned up when the OpenAI client gets garbage collected. Sounds like this isn't necessarily true? This might be outside of the scope of this library but I'm also trying to understand how this fits in your typical backend app. It seems like using a module scoped client is heavily discouraged but so is instantiating a new client with every request (ie a FastAPI dependency). What is the other option here? |
How can we implement an async client with FastAPI correctly? Is it safe to implement it like this (singleton approach)? Another problem is that we have multiple API keys, like below:

```python
client = AsyncAzureOpenAI(
    api_key=api_keys_list[random_index],
    ...
)
```
@brian-goo you can create a single client, use FastAPI app events to close it when the server is stopped, and then use `with_options(api_key=...)` to vary the key per request.
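A sketch of that setup under stated assumptions: the lifespan hook stands in for the app events mentioned above, the key list is a placeholder, and the Azure endpoint/API version are read from environment variables:

```python
import random
from contextlib import asynccontextmanager

from fastapi import FastAPI
from openai import AsyncAzureOpenAI

api_keys_list = ["key-1", "key-2"]  # placeholder keys

# one shared client; azure_endpoint/api_version come from env vars here
client = AsyncAzureOpenAI(api_key=api_keys_list[0])

@asynccontextmanager
async def lifespan(app: FastAPI):
    yield
    await client.close()  # release pooled connections when the server stops

app = FastAPI(lifespan=lifespan)

@app.post("/chat")
async def chat(prompt: str):
    # vary the key per request without constructing a new client
    scoped = client.with_options(api_key=random.choice(api_keys_list))
    completion = await scoped.chat.completions.create(
        model="my-deployment",  # Azure deployment name, illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return {"reply": completion.choices[0].message.content}
```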
@Filimoa what are you trying to achieve that requires you to instantiate a new client for every request? If you really want to do so, you currently need to close each client explicitly yourself. Instantiating a new HTTP client for each request is very bad for performance regardless of whether or not it has any resource leaks, as the client cannot reuse connections.
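A sketch of what that explicit cleanup could look like, assuming the async client (the handler shape and model name are illustrative):

```python
from openai import AsyncOpenAI

async def handler(api_key: str, prompt: str) -> str:
    # per-request client: acceptable only if it is closed afterwards
    client = AsyncOpenAI(api_key=api_key)
    try:
        completion = await client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
        )
        return completion.choices[0].message.content
    finally:
        await client.close()  # release the underlying httpx resources
```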
@RobertCraigie one use case I have is with https://github.com/abi/screenshot-to-code, where users put in their own OpenAI key on the hosted version (that's how the app is free to use). I understand that performance will be worse when instantiating a new client on each request, but a memory leak is even worse. Here's a fun memory-leak graph from my server :) Not 100% sure it's because of this issue, but very likely.
@abi are you explicitly closing the client using `.close()`? Additionally, is there a reason you need a separate client for every request?
@RobertCraigie I'm not closing with `.close()`. The main reason I create a new client per request is that every request carries a different user-supplied API key, and I want no chance of a request silently falling back to the key picked up from the environment by default.
Interesting point. Maybe there could be an option to disable that default (what about others?), so that the client would err if you forget to provide the key via `with_options()`.
This is a fair point. I do think there are ways you could structure your code to make this practically impossible to happen though, even for anyone making changes to your codebase for the first time. For example, instead of constructing clients directly, you could route everything through a helper:

```python
# utils/openai_client.py
from openai import AsyncOpenAI

_client = AsyncOpenAI(
    # provide a dummy API key so that requests made directly will always fail
    api_key='<this client should never be used directly!>',
)

def get_openai(user: User) -> AsyncOpenAI:
    return _client.with_options(api_key=user.openai_api_key)
```

```python
# backend.py
async def handler():
    user = get_user()
    completion = await get_openai(user).chat.completions.create(...)
    # ...
```

It's also worth noting that you could use your existing pattern and avoid memory leaks entirely by sharing the HTTP client instance, if you really don't want to use `with_options()`:

```python
import httpx
from openai import AsyncOpenAI

http_client = httpx.AsyncClient()

async def handler(api_key):
    openai_client = AsyncOpenAI(api_key=api_key, http_client=http_client)
```

I would argue though that the first snippet in this comment is less likely to result in accidentally using the incorrect API key, because with the second pattern you could accidentally construct a client without an explicit key and it would silently pick up the default from the environment. With a helper function like `get_openai()`, the only way you'll accidentally use the wrong API key is by forgetting to use the helper function, which it may be possible to define lint rules for (I don't know of an easy way to do this off the top of my head).
@RobertCraigie that's super helpful, both of those patterns. I didn't realize you could pass the `http_client` into the client. I pushed …
Thanks for the investigation there, @gintsmurans! That makes sense to me. We'll take a look soon and see if we can demonstrate that this prevents the issue.
@gintsmurans can you help me understand exactly what you believe to be happening? I agree that …
Hey @tamird! Created a little example here: https://gist.github.com/gintsmurans/79be8ee951ae1658ef9d7afd2f7d5460
@gintsmurans Why do you call the init many times? It may well be worth improving your setup anyway, but I'm curious, since AFAIK there is usually no reason to.
In our case it's an API backend server where OpenAI is just a small part, so we don't keep one global OpenAI instance. But as requests come in, within a week or so the API server's memory is all used up.
Is there a technical reason for that? Our backend also uses many APIs, and OpenAI is just a small part of it, but we still keep one client per process.
There wasn't a definite reason, it's just the way we built it. But now that I think about it: we are using async request handlers, and if we used the same instance for all OpenAI requests, that could lead to conflicts and even deadlocks. Learned that the hard way. :) Anyhow, any library should clean up after itself, or give you the option to do so; you never know how people are going to use it.
What do you mean? We also use async request handlers, and the async support in the OpenAI client, to scale. I'm not aware of any problems that would cause, conflicts or deadlocks or anything; those would be serious bugs in the library. I guess you encountered something, since you learned it the hard way?
Sure, but I think it's also good to know the good patterns for usage, and the possible problems.
As far as my understanding goes: say one API request starts an async task, but while it waits for results, another request can be handled. At the same time, the class instance that handles requests is the same one for every request. We had several issues with this: …
We solved these by using the `ContextVar` class to make sure that only one context (request) uses a given object instance. In the case of openai, I'm not sure how it would react to multiple simultaneous requests; I would not use global instances for anything that I can't be sure is thread/context safe. Which leads back to openai not releasing memory correctly. :)
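A minimal sketch of the `ContextVar` approach described above (the helper name and shape are illustrative, not the poster's actual code):

```python
import contextvars

from openai import AsyncOpenAI

# each request context gets its own slot, so concurrent handlers
# never share a client instance
_client_var: contextvars.ContextVar[AsyncOpenAI | None] = contextvars.ContextVar(
    "openai_client", default=None
)

def current_client(api_key: str) -> AsyncOpenAI:
    client = _client_var.get()
    if client is None:
        client = AsyncOpenAI(api_key=api_key)
        _client_var.set(client)
    return client

# note: clients created this way still need an explicit
# `await client.close()` when the request context ends, or they leak
```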
@gintsmurans Yes, the idea of async request handlers is that you can have many running concurrently within a single process and thread. And indeed, you should not use class attributes or other mechanisms that mix up request-specific data. I've usually just passed the request context as parameters to the business-logic functions, like the session object when using Postgres via SQLAlchemy. Anyway, that's not related to this OpenAI client lib, which AFAIK does no such mixups, and where reusing a single client is the recommended way for multiple concurrent async request handlers. Again, sure, it's good to fix the cleanup, but it's also good to be clear about what good practice is and why. My understanding is that sharing the client across multiple handlers is good precisely so that you are not opening and closing connections to OpenAI in vain all the time.
I haven't looked, but I would guess that openai does not create a "new connection" with each instance, but internally makes plain HTTP requests when asked to, for example on a `create()` call. In my subjective opinion, sharing instances between coroutines/threads is just bad practice and will lead to weird behavior sooner or later, so it should be avoided; you never know what a third-party library does internally, nor can you check every one of them.
Anyway, it really does not matter, as it's not in the scope of this issue. The focus is that there seem to be memory leaks, regardless of how the library is used.
I think the HTTP connections are pooled by the underlying httpx library; https://www.python-httpx.org/advanced/ describes how a `Client` instance holds a connection pool and reuses connections across requests.
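A small sketch of what pooling buys you (the URL is just an example):

```python
import asyncio

import httpx

async def main() -> None:
    # a single AsyncClient owns a connection pool; sequential requests to
    # the same host can reuse an already-open connection instead of
    # paying the TCP/TLS handshake again
    async with httpx.AsyncClient() as http:
        r1 = await http.get("https://example.com/")
        r2 = await http.get("https://example.com/")
        print(r1.status_code, r2.status_code)

asyncio.run(main())
```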
On the OpenAI side, the httpx client is kept and reused across requests, so I think it gets to use the pool and does not create new connections for each request: https://github.com/openai/openai-python/blob/main/src/openai/_base_client.py#L872
I think you are wrong here; both the OpenAI and httpx libraries are designed so that it can be a good idea to reuse connections across async tasks. With threads it would indeed lead to problems, I think, but the point of async is to not need threads for this kind of thing.
We believe this has been fixed in …
These are all nice things, but we've had issues with the connection becoming corrupt after some kinds of failures. That is something a fresh connection never does, so we'd prefer that a fresh connection not have memory leaks. It seems risky to rely on the same HTTP client being perfect forever, globally.
@pseudotensor as I said just above, please open a new issue if you see further problems instead of commenting here. It's not clear to me whether #1181 describes the entirety of the issues you're seeing.
Yes, as you can see, I opened a new issue. What's your question here? FYI, I'm just responding to one particular comment so that others will know as well when they search the issues.
Confirm this is an issue with the Python library and not an underlying OpenAI API
Describe the bug
Calling the `create` method on `completions` introduces a memory leak, according to `tracemalloc`. Example:
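The original example snippet did not survive extraction; below is a sketch of the kind of handler the thread describes as leaking (a Flask route creating a fresh client per request; the route and model name are assumptions):

```python
from flask import Flask
from openai import OpenAI

app = Flask(__name__)

@app.route("/complete")
def complete():
    # a new client (and a new underlying httpx client) per request --
    # the pattern identified in this thread as the source of the leak
    client = OpenAI()
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=[{"role": "user", "content": "hi"}],
    )
    return completion.choices[0].message.content
```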
How to determine it's a memory leak?
I use `tracemalloc` with my Flask application. When running this endpoint multiple times, one line is at the very top (which means it's the most expensive one).
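The tracemalloc snippet itself was also lost in extraction; a minimal sketch of how such an endpoint could be wired up (route name and statistic count are assumptions):

```python
import tracemalloc

from flask import Flask

app = Flask(__name__)
tracemalloc.start()

@app.route("/memory")
def memory():
    # list the allocation sites currently holding the most memory
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.statistics("lineno")
    return "\n".join(str(stat) for stat in stats[:10])
```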
When I refresh, the `size` increases. Of course, in a production environment the numbers get high a lot quicker.
To Reproduce
There's no one way to prove there's a memory leak, but what I did was call the endpoint repeatedly and watch the reported `size` of the object grow on each request.
Code snippets
No response
OS
Linux, macOS
Python version
Python v3.11.2
Library version
openai v1.2.4