Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stripe api with requests leaks socket file descriptors via unclosed Session #874

Open
asottile-sentry opened this issue Sep 15, 2022 · 8 comments

Comments

@asottile-sentry
Copy link
Contributor

Describe the bug

the http request client is assigned to a global variable here:

stripe.default_http_client = http_client.new_default_http_client(
verify_ssl_certs=verify, proxy=proxy
)

requests client is defined here:

class RequestsClient(HTTPClient):

the Session is assigned here:

if getattr(self._thread_local, "session", None) is None:
self._thread_local.session = self._session or requests.Session()

this Session is never closed leading to file descriptor leak and a ResourceWarning -- the correct usage of a Session is to either utilize the with statement or explicitly .close() it

To Reproduce

the simplest reproduction I can come up with is this one liner:

$ python3  -Wonce -c $'import stripe; stripe.api_key="placeholder"; import contextlib\nwith contextlib.suppress(Exception):\n stripe.Account.list()'
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.0.2.15', 44434), raddr=('34.200.27.109', 443)>

the ResourceWarning there is coming from the teardown of that Session object I mentioned above -- it's an unclosed connection to the stripe api:

$ nslookup 34.200.27.109
109.27.200.34.in-addr.arpa	name = api-34-200-27-109.stripe.com.

Authoritative answers can be found from:

Expected behavior

utilization of the stripe api should not lead to ResourceWarnings

Code snippets

above

OS

any, though I'm on linux

Language version

any, though I'm using 3.10.4

Library version

4.1.0

API version

N/A

Additional context

No response

@asottile-sentry
Copy link
Contributor Author

looks like #276 is where the issue was introduced

@yejia-stripe
Copy link
Contributor

Hi @asottile-sentry, thanks for the report!

While we agree it's not ideal, we've found that the benefits of re-using the session are worth not properly closing it on exit. If this is an issue for you, you can close the session manually before your program exits by calling the RequestsClient.close method, or use a different HTTPClient (we have a few defined, see here for configuration instructions).

We'll keep this issue open for potential improvements in the future.

@asottile-sentry
Copy link
Contributor Author

there isn't a way to reliably do that because the session is in a thread local (and thus leaks once per thread)

plus relying on any user of your library to have to hack around the default behaviour leaking things via global variables isn't ideal

were the "benefits" ever profiled? I suspect in practice they don't make significant difference here (requests is after all limited to H1 requests and keepalive still needs to be re-negotiated) but if you could point me to some evidence of that I'd be happy to be wrong.

if there is a benefit, I'd really like to see a first-class context manager or constructable client rather than global variables -- the current "interface" is really inconvenient and not threadsafe. I say interface lightly because "monkey patch these global variables at the start and end of your program" is barely an interface

I suspect a simpler middle-ground would be to use an ephemeral session if the user does not pass one in (closing it properly on each request) -- then if users want to manage their own session lifecycle they can do so but then the default behaviour doesn't leak socket/file descriptors. I think this is a fairly unobtrusive patch as well --

diff --git a/stripe/http_client.py b/stripe/http_client.py
index ba3838b..160d413 100644
--- a/stripe/http_client.py
+++ b/stripe/http_client.py
@@ -1,5 +1,6 @@
 from __future__ import absolute_import, division, print_function
 
+import contextlib
 import sys
 import textwrap
 import warnings
@@ -87,6 +88,15 @@ def new_default_http_client(*args, **kwargs):
     return impl(*args, **kwargs)
 
 
+@contextlib.contextmanager
+def _session(session):
+    if session is not None:
+        yield session
+    else:
+        with requests.Session() as session:
+            yield session
+
+
 class HTTPClient(object):
     MAX_DELAY = 2
     INITIAL_DELAY = 0.5
@@ -315,19 +325,17 @@ class RequestsClient(HTTPClient):
         if is_streaming:
             kwargs["stream"] = True
 
-        if getattr(self._thread_local, "session", None) is None:
-            self._thread_local.session = self._session or requests.Session()
-
         try:
             try:
-                result = self._thread_local.session.request(
-                    method,
-                    url,
-                    headers=headers,
-                    data=post_data,
-                    timeout=self._timeout,
-                    **kwargs
-                )
+                with _session(self._session) as session:
+                    result = session.request(
+                        method,
+                        url,
+                        headers=headers,
+                        data=post_data,
+                        timeout=self._timeout,
+                        **kwargs
+                    )
             except TypeError as e:
                 raise TypeError(
                     "Warning: It looks like your installed version of the "

@yejia-stripe
Copy link
Contributor

Please bear with me as I'm not terribly familiar the topics we're discussing.

there isn't a way to reliably do that because the session is in a thread local (and thus leaks once per thread)

The RequestsClient.close method looks to me like it is using the same thread local as the session. Does this not work the way I think it does?

were the "benefits" ever profiled? I suspect in practice they don't make significant difference here [...] but if you could point me to some evidence of that I'd be happy to be wrong.

Looking at #266 (linked from the PR you mentioned), I don't believe the performance was actually tested. I did a naive test of RequestClient before this change vs the current one, and it does look like reusing the session leads to a slight decrease in request times for tens of requests sent in succession. You're right that this probably doesn't make a difference to most users in practice, so we'll need to reevaluate our implementation and its tradeoffs.

requests is after all limited to H1 requests and keepalive still needs to be re-negotiated

What are H1 requests? I haven't heard of the term, and nothing stood out during a quick search.


Thank you for the suggestions and for bringing this to our attention! We do have some ideas for improvements/fixes for this, so I'll add yours to that list for our team to discuss. Realistically, we're unlikely to work on this right away, but the team is aware of this issue and we'll budget time accordingly.

@asottile-sentry
Copy link
Contributor Author

asottile-sentry commented Sep 28, 2022

Please bear with me as I'm not terribly familiar the topics we're discussing.

there isn't a way to reliably do that because the session is in a thread local (and thus leaks once per thread)

The RequestsClient.close method looks to me like it is using the same thread local as the session. Does this not work the way I think it does?

thread locals are essentially global variables per-thread. so calling close in the main thread doesn't affect instances in other threads

requests is after all limited to H1 requests and keepalive still needs to be re-negotiated

What are H1 requests? I haven't heard of the term, and nothing stood out during a quick search.

ah H1 being HTTP 1.x -- H2 being HTTP 2.x where connection pooling and reuse makes a much more significant difference

@xavdid-stripe
Copy link

thread locals are essentially global variables per-thread. so calling close in the main thread doesn't affect instances in other threads

I'm not quite following- since our custom close() calls down to self._thread_local.session.close(), won't that work on a per-thread basis? Assuming that each of your threads calls close() manually, I guess.

@asottile-sentry
Copy link
Contributor Author

thread locals are essentially global variables per-thread. so calling close in the main thread doesn't affect instances in other threads

I'm not quite following- since our custom close() calls down to self._thread_local.session.close(), won't that work on a per-thread basis? Assuming that each of your threads calls close() manually, I guess.

each thread would need to do that because they are thread locals -- meaning you cannot ...

... close the session manually before your program exits

you could do it in each thread if you have strong control over the thread runtime, but most of the time you're not going to have strong control over the threads (think like thread pools, web frameworks, etc.) -- for example django's thread-based executor for which I'd have to hack around with django internals to have the right patch point to close this (and I'm not even sure that's accessible at all since I don't think the daemon thread can even configure that).

@richardm-stripe
Copy link
Contributor

richardm-stripe commented Feb 28, 2024

were the "benefits" ever profiled? I suspect in practice they don't make significant difference here

I very naïvely profiled this on my Macbook: richardm-stripe#2 and found ~33% speedup from session re-use. Unfortunately I don't think that's insignificant enough to change the defaults for the global client.

if there is a benefit, I'd really like to see a first-class context manager or constructable client rather than global variables -- the current "interface" is really inconvenient and not threadsafe. I say interface lightly because "monkey patch these global variables at the start and end of your program" is barely an interface

This is spot on, and we have that now with StripeClient released in stripe-python v5.

The default right now is to initialize a client without a session which triggers the uncleanupable thread-local session creation, the same way as the global client does, but I think we should change this so that you get one session per StripeClient, something like #1261. As long as users create one StripeClient per thread, this should perform the same as the global client re: session re-use.

Following up from that we could make StripeClient a contextmanager so

with stripe.StripeClient as client:
  client.customers.create(...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants