Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncio: race condition in SSLProtocol #77855

Closed
vstinner opened this issue May 28, 2018 · 15 comments
Closed

asyncio: race condition in SSLProtocol #77855

vstinner opened this issue May 28, 2018 · 15 comments
Labels

Comments

@vstinner
Copy link
Member

BPO 33674
Nosy @vstinner, @ned-deily, @asvetlov, @1st1, @miss-islington
PRs
  • bpo-33674: asyncio: Fix SSLProtocol race #7175
  • [3.7] bpo-33674: asyncio: Fix SSLProtocol race (GH-7175) #7187
  • [3.6] bpo-33674: asyncio: Fix SSLProtocol race (GH-7175) #7188
  • bpo-33674: Pause the transport as early as possible #7192
  • [3.7] bpo-33674: Pause the transport as early as possible (GH-7192) #7193
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-05-29.15:17:16.102>
    created_at = <Date 2018-05-28.20:37:18.279>
    labels = ['3.8', 'expert-asyncio']
    title = 'asyncio: race condition in SSLProtocol'
    updated_at = <Date 2018-05-29.15:17:16.101>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2018-05-29.15:17:16.101>
    actor = 'yselivanov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-05-29.15:17:16.102>
    closer = 'yselivanov'
    components = ['asyncio']
    creation = <Date 2018-05-28.20:37:18.279>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33674
    keywords = ['patch']
    message_count = 15.0
    messages = ['317916', '317917', '317919', '317922', '317923', '317924', '317960', '317974', '317981', '317983', '317984', '318003', '318004', '318025', '318044']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'ned.deily', 'asvetlov', 'yselivanov', 'miss-islington']
    pr_nums = ['7175', '7187', '7188', '7192', '7193']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue33674'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    While debugging bpo-32458 (functional test on START TLS), I found a race condition in SSLProtocol of asyncio/sslproto.py.

    Sometimes, _sslpipe.feed_ssldata() is called before _sslpipe.shutdown().

    • SSLProtocol.connection_made() -> SSLProtocol._start_handshake(): self._loop.call_soon(self._process_write_backlog)
    • SSLProtoco.data_received(): direct call to self._sslpipe.feed_ssldata(data)
    • Later, self._process_write_backlog() calls self._sslpipe.do_handshake()

    The first write is delayed by call_soon(), whereas the first read is a direct call to the SSL pipe.

    @vstinner
    Copy link
    Member Author

    Workaround:

    diff --git a/Lib/asyncio/sslproto.py b/Lib/asyncio/sslproto.py
    index 2bfa45dd15..4a5dbb38a1 100644
    --- a/Lib/asyncio/sslproto.py
    +++ b/Lib/asyncio/sslproto.py
    @@ -592,7 +592,7 @@ class SSLProtocol(protocols.Protocol):
             # (b'', 1) is a special value in _process_write_backlog() to do
             # the SSL handshake
             self._write_backlog.append((b'', 1))
    -        self._loop.call_soon(self._process_write_backlog)
    +        self._process_write_backlog()
             self._handshake_timeout_handle = \
                 self._loop.call_later(self._ssl_handshake_timeout,
                                       self._check_handshake_timeout)

    @vstinner
    Copy link
    Member Author

    loop.start_tls() method is new in Python 3.7. If possible, I would prefer to not see it with a builtin race condition, since such race condition is really hard to debug :-( So I raise the priority to release blocker. Sorry again Ned!

    @vstinner
    Copy link
    Member Author

    Yury Selivanov told me that usually it's safer to add call_soon(), than to remove call_soon(). But adding many call_soon() can make asyncio SSL slower.

    SSLProtocol doesn't seem to like call_soon(), it's only used at:

    • connection_lost(): call_soon(self._app_protocol.connection_lost, exc)
    • connection_made() ~~> call_soon(self._process_write_backlog)
    • _on_handshake_complete(): call_soon(self._process_write_backlog)

    That's all. All other operations are done immediately.

    _on_handshake_complete() contains a long comment:

            # In case transport.write() was already called. Don't call
            # immediately _process_write_backlog(), but schedule it:
            # _on_handshake_complete() can be called indirectly from
            # _process_write_backlog(), and _process_write_backlog() is not
            # reentrant.
            self._loop.call_soon(self._process_write_backlog)

    @1st1
    Copy link
    Member

    1st1 commented May 28, 2018

    The fix is correct and the bug is now obvious: data_received() occur pretty much any time after connection_made() call; if call_soon() is used in connection_made(), data_received() may find the protocol in an incorrect state.

    Kudos Victor for debugging this.

    @1st1
    Copy link
    Member

    1st1 commented May 28, 2018

    And I agree, this should make it to 3.7.0

    @vstinner
    Copy link
    Member Author

    New changeset be00a55 by Victor Stinner in branch 'master':
    bpo-33674: asyncio: Fix SSLProtocol race (GH-7175)
    be00a55

    @ned-deily
    Copy link
    Member

    New changeset 0dd8fd0 by Ned Deily (Miss Islington (bot)) in branch '3.7':
    bpo-33674: asyncio: Fix SSLProtocol race (GH-7175) (bpo-7187)
    0dd8fd0

    @1st1
    Copy link
    Member

    1st1 commented May 29, 2018

    New changeset 7593b8a by Yury Selivanov (Victor Stinner) in branch '3.6':
    bpo-33674: asyncio: Fix SSLProtocol race (GH-7175) (GH-7188)
    7593b8a

    @1st1
    Copy link
    Member

    1st1 commented May 29, 2018

    New changeset f295587 by Yury Selivanov in branch 'master':
    bpo-33674: Pause the transport as early as possible (bpo-7192)
    f295587

    @miss-islington
    Copy link
    Contributor

    New changeset eca0859 by Miss Islington (bot) in branch '3.7':
    bpo-33674: Pause the transport as early as possible (GH-7192)
    eca0859

    @vstinner
    Copy link
    Member Author

    I fixed the main issue, so I remove the "release blocker" priority.

    @vstinner
    Copy link
    Member Author

    bpo-33674: Pause the transport as early as possible (bpo-7192)
    f295587

    Is it ok to always resume reading? Previously reading was only resumed if the transport was reading.

    @1st1
    Copy link
    Member

    1st1 commented May 29, 2018

    yes, the method is idempotent.

    @1st1
    Copy link
    Member

    1st1 commented May 29, 2018

    Closing this one. Please open new issues to track regressions.

    @1st1 1st1 closed this as completed May 29, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants