
ssl socket leak #74056

Closed
thehesiod mannequin opened this issue Mar 22, 2017 · 23 comments
Assignees
Labels
3.7 (EOL) end of life performance Performance or resource usage topic-SSL

Comments

@thehesiod
Mannequin

thehesiod mannequin commented Mar 22, 2017

BPO 29870
Nosy @pitrou, @tiran, @Lukasa, @thehesiod, @Mariatta
PRs
  • bpo-29870: ssl socket leak #981
  • bpo-29870: Free CRL DP with CRL_DIST_POINTS_free() #1085
  • bpo-29870: Add a NEWS entry for #29870 (PR #981) #2047
  • [3.5] bpo-29870: Break circular references when closing SSLTransport objects (#981) #2048
  • [3.6] bpo-29870: Break circular references when closing SSLTransport objects (#981) #2049
Files
  • valgrind.log.gz

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/tiran'
    closed_at = <Date 2017-04-15.01:44:09.771>
    created_at = <Date 2017-03-22.01:10:31.111>
    labels = ['expert-SSL', '3.7', 'performance']
    title = 'ssl socket leak'
    updated_at = <Date 2017-06-09.22:42:09.065>
    user = 'https://github.com/thehesiod'

    bugs.python.org fields:

    activity = <Date 2017-06-09.22:42:09.065>
    actor = 'yselivanov'
    assignee = 'christian.heimes'
    closed = True
    closed_date = <Date 2017-04-15.01:44:09.771>
    closer = 'Mariatta'
    components = ['SSL']
    creation = <Date 2017-03-22.01:10:31.111>
    creator = 'thehesiod'
    dependencies = []
    files = ['46750']
    hgrepos = []
    issue_num = 29870
    keywords = []
    message_count = 23.0
    messages = ['289954', '289956', '289960', '289961', '291427', '291441', '291444', '291447', '291497', '291500', '291501', '291502', '291503', '291504', '291505', '291508', '291509', '291510', '291511', '291512', '291513', '291518', '291692']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'christian.heimes', 'Lukasa', 'thehesiod', 'Mariatta']
    pr_nums = ['981', '1085', '2047', '2048', '2049']
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue29870'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Mar 22, 2017

    When upgrading to 3.5.3 we noticed that the requests module was leaking memory rather quickly. This led me to file this issue: https://github.com/kennethreitz/requests/issues/3933. After more investigation I've found that the leak is caused by the raw Python SSL sockets. I've created a test file here: https://gist.github.com/thehesiod/ef79dd77e2df7a0a7893dfea6325d30a which lets you reproduce the leak with raw Python SSL sockets (CLIENT_TYPE = ClientType.RAW), aiohttp, or requests. They all leak in a similar way due to their use of the Python SSL socket objects. I tried tracing the memory usage with tracemalloc but nothing interesting popped up, so I believe this is a leak in the native code.

    A docker cloud image is available here: amohr/testing:stretch_request_leak based on:

    FROM debian:stretch
    
    COPY request_https_leak.py /tmp/request_https_leak.py
    
    RUN apt-get update && \
        apt-get install -y python3.5 python3-pip git
    
    RUN python3 -m pip install requests git+git://github.com/thehesiod/pyca.git@fix-py3#egg=calib setproctitle requests psutil
    

    I believe this issue was introduced in python 3.5.3 as we're not seeing the leak with 3.5.2. Also I haven't verified yet if this happens on non-debian systems. I'll update if I have any more info. I believe 3.6 is similarly impacted but am not 100% certain yet.
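For reference, here is a minimal sketch of the kind of raw-SSL-socket HEAD loop described above, with a crude RSS probe. This is not the actual gist: the host, loop count, and helper names (build_head, rss_kb, leak_loop) are illustrative, and the resource module makes it Unix-only.

```python
import resource
import socket
import ssl

def build_head(host: str) -> bytes:
    # Minimal HTTP/1.1 HEAD request; one connection per iteration,
    # mirroring how the leak accumulates per SSL socket.
    return (
        f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")

def rss_kb() -> int:
    # Peak resident set size (kB on Linux); grows steadily if native
    # memory leaks, even when Python-level allocations stay flat.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def leak_loop(host: str, iterations: int = 1000) -> None:
    ctx = ssl.create_default_context()
    for i in range(iterations):
        with socket.create_connection((host, 443)) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(build_head(host))
                while tls.recv(4096):
                    pass
        if i % 100 == 0:
            print(i, rss_kb(), "kB")

# e.g. leak_loop("www.google.com")  # requires network access
```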

    @thehesiod thehesiod mannequin assigned tiran Mar 22, 2017
    @thehesiod thehesiod mannequin added topic-SSL performance Performance or resource usage labels Mar 22, 2017
    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Mar 22, 2017

    validated that 3.6 on Fedora is affected as well; see the GitHub bug for charts. So it seems all 3.5.3+ versions are affected. I'm guessing it was introduced in one of the SSL changes in 3.5.3: https://docs.python.org/3.5/whatsnew/changelog.html#python-3-5-3

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Mar 22, 2017

    adding a valgrind log of 3.5.3 on Debian jessie

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Mar 22, 2017

    interestingly, the valgrind run doesn't show a leak in the profile

    @pitrou
    Member

    pitrou commented Apr 10, 2017

    Can you record sys.getallocatedblocks() to see whether it grows continuously?
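One way to record that, sketched here with illustrative names (sample_blocks and the toy workload are not from the issue): run the workload in a loop and snapshot sys.getallocatedblocks() after each pass. A count that climbs without bound points at Python-managed memory; a flat count does not.

```python
import sys

def sample_blocks(workload, iterations: int = 5):
    # Snapshot the Python allocator's live-block count after each run.
    samples = []
    for _ in range(iterations):
        workload()
        samples.append(sys.getallocatedblocks())
    return samples

# Toy workload: allocations that are freed each pass, so the count stays flat.
counts = sample_blocks(lambda: [bytearray(1024) for _ in range(100)])
print(counts)
```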

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Apr 10, 2017

    @pitrou: sys.getallocatedblocks does not seem to increase

    @pitrou
    Member

    pitrou commented Apr 10, 2017

    I see. This may mean the leak is in memory that's not managed directly by Python (e.g. some OpenSSL structure).

    Is there a way to reproduce without third-party libraries such as requests?
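The distinction matters because sys.getallocatedblocks() only counts blocks from Python's own allocator. A small sketch of that blind spot, under the assumption of a Unix-like system where ctypes.CDLL(None) exposes the C library's malloc/free (the 64 MiB figure is arbitrary):

```python
import ctypes
import sys

# Assumption: on Unix, loading the process itself exposes libc's
# malloc/free; this does not work the same way on Windows.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]

before = sys.getallocatedblocks()
buf = libc.malloc(64 * 1024 * 1024)  # 64 MiB allocated by the C library
after = sys.getallocatedblocks()

# Only the small Python int wrapping the returned pointer is counted;
# the 64 MiB itself never touches the Python allocator, just as a
# leaked OpenSSL structure would not.
print("block delta:", after - before)
libc.free(buf)
```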

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Apr 10, 2017

    yes, in the gist I created you can switch between the various clients; by default it uses raw sockets.

    @pitrou
    Member

    pitrou commented Apr 11, 2017

    After adapting your test script to run against a local openssl server (openssl s_server -www), I can't see a single leak: the process peaks at 20860 KB RSS. This is with Python 3.5 tip.

    Does it need a specific server to test against to showcase the leak?

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Apr 11, 2017

    the interesting part is that it doesn't leak with a local https server; it appears to need to be a remote server.

    @pitrou
    Member

    pitrou commented Apr 11, 2017

    Is there a fast enough remote server that shows the leak? I've tested with my own remote server (https://pitrou.net/), but it doesn't leak.

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Apr 11, 2017

    ya, my sample script hits google.com, it's pretty fast. It just does a "HEAD".

    @pitrou
    Member

    pitrou commented Apr 11, 2017

    Google is not very fast here (a couple of requests / sec at most). How many requests does it take to see a clear tendency?

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Apr 11, 2017

    see graphs here: https://github.com/kennethreitz/requests/issues/3933; the x-axis is the number of requests, not what it says (seconds).

    @pitrou
    Member

    pitrou commented Apr 11, 2017

    Ok, thank you. I've tweaked the script to remove most threads and use maps.google.com (which is faster here), and bisecting shows that the offending changeset is 598894f.

    @pitrou
    Member

    pitrou commented Apr 11, 2017

    The following addition fixes the leak:

    diff --git a/Modules/_ssl.c b/Modules/_ssl.c
    index bb40051..8f5facd 100644
    --- a/Modules/_ssl.c
    +++ b/Modules/_ssl.c
    @@ -1203,6 +1203,8 @@ _get_crl_dp(X509 *certificate) {
         Py_XDECREF(lst);
     #if OPENSSL_VERSION_NUMBER < 0x10001000L
         sk_DIST_POINT_free(dps);
    +#else
    +    CRL_DIST_POINTS_free(dps);
     #endif
         return res;
     }

    Christian, what do you think?

    @pitrou pitrou added the 3.7 (EOL) end of life label Apr 11, 2017
    @tiran
    Member

    tiran commented Apr 11, 2017

    CRL_DIST_POINTS_free() should be available in all supported OpenSSL versions. The function is defined by DECLARE_ASN1_FUNCTIONS(CRL_DIST_POINTS).

    @pitrou
    Member

    pitrou commented Apr 11, 2017

    So we should use it instead of sk_DIST_POINT_free()? I'd like to minimize potential breakage here.

    @thehesiod
    Mannequin Author

    thehesiod mannequin commented Apr 11, 2017

    awesome! Thanks for finding and proposing a fix, pitrou! btw I found an example of freeing this structure here: http://www.zedwood.com/article/c-openssl-parse-x509-certificate-pem

    @tiran
    Member

    tiran commented Apr 11, 2017

    Yes, I'm currently testing the change with a bunch of OpenSSL and LibreSSL versions.

    By the way the memory issue can be reproduced with any certificate that contains a CRL distribution point. Letsencrypt certs don't have a CRL DP. I guess Alexander's test cert doesn't have a CRL DP either. The Nokia test cert in our test suite contains one.

    ---

    import _ssl
    import sys
    
    PEM = 'Lib/test/nokia.pem'
    
    def mem():
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('RssAnon'):
                    print(line, end='')
    
    for i in range(10000):
        if i % 1000 == 0:
            mem()
        d = _ssl._test_decode_cert(PEM)
        assert d['crlDistributionPoints']
    
    mem()

    Without fix:

    $ ./python t.py 
    RssAnon:            4376 kB
    RssAnon:            4840 kB
    RssAnon:            5224 kB
    RssAnon:            5608 kB
    RssAnon:            6120 kB
    RssAnon:            6504 kB
    RssAnon:            6888 kB
    RssAnon:            7272 kB
    RssAnon:            7656 kB
    RssAnon:            8040 kB
    RssAnon:            8424 kB

    With fix:

    $ ./python t.py 
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB
    RssAnon:            4376 kB

    @tiran
    Member

    tiran commented Apr 11, 2017

    Antoine, you might find multissl.py helpful. I wrote a script to automate testing with multiple versions of OpenSSL and LibreSSL. The first time, it takes about half an hour to download, compile and install all versions locally. https://github.com/tiran/multissl/blob/master/multissl.py

    @pitrou
    Member

    pitrou commented Apr 11, 2017

    Antoine, you might find multissl.py helpful.

    Very nice, thank you!

    @Mariatta
    Member

    I believe this was fixed in https://bugs.python.org/issue29738, so I'm closing this.
    Thanks :)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022