Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup DefaultSelectors.modify() by 2x #74200

Closed
giampaolo opened this issue Apr 7, 2017 · 22 comments
Closed

Speedup DefaultSelectors.modify() by 2x #74200

giampaolo opened this issue Apr 7, 2017 · 22 comments
Labels
3.7 (EOL) end of life performance Performance or resource usage stdlib Python modules in the Lib dir topic-asyncio

Comments

@giampaolo
Copy link
Contributor

BPO 30014
Nosy @vstinner, @giampaolo, @asvetlov, @1st1
PRs
  • bpo-30014: make poll-like selector's modify() method faster #1030
  • Files
  • selectors_modify.diff
  • bench.py
  • bench_selectors_modify.py
  • echo_server.py
  • echo_client.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-12-20.20:36:49.970>
    created_at = <Date 2017-04-07.10:40:07.361>
    labels = ['3.7', 'expert-asyncio', 'library', 'performance']
    title = 'Speedup DefaultSelectors.modify() by 2x'
    updated_at = <Date 2017-12-20.20:36:49.968>
    user = 'https://github.com/giampaolo'

    bugs.python.org fields:

    activity = <Date 2017-12-20.20:36:49.968>
    actor = 'asvetlov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-12-20.20:36:49.970>
    closer = 'asvetlov'
    components = ['Library (Lib)', 'asyncio']
    creation = <Date 2017-04-07.10:40:07.361>
    creator = 'giampaolo.rodola'
    dependencies = []
    files = ['46787', '46788', '46789', '46792', '46793']
    hgrepos = []
    issue_num = 30014
    keywords = ['patch']
    message_count = 22.0
    messages = ['291261', '291264', '291265', '291266', '291267', '291268', '291269', '291270', '291271', '291272', '291273', '291274', '291280', '291282', '291286', '291306', '291340', '292586', '292602', '295338', '295564', '308807']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'giampaolo.rodola', 'asvetlov', 'neologix', 'yselivanov']
    pr_nums = ['1030']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue30014'
    versions = ['Python 3.7']

    @giampaolo
    Copy link
    Contributor Author

    Patch in attachment modifies DefaultSelector.modify() so that it uses the underlying selector's modify() method instead of unregister() and register() resulting in a 2x speedup.

    Without patch:
    ~/svn/cpython {master}$ ./python bench.py
    0.006010770797729492

    With patch:
    ~/svn/cpython {master}$ ./python bench.py
    0.00330352783203125

    @giampaolo giampaolo added 3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-asyncio performance Performance or resource usage labels Apr 7, 2017
    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 7, 2017

    Hm, do you have a realistic benchmark which would show the benefit?
    Because this is really a micro-benchmark, and I'm not convinced that
    Selector.modify() is a significant bottleneck in a real-world
    application.

    @giampaolo
    Copy link
    Contributor Author

    modify() can be used often in verbose protocols such as FTP and SMTP where a lot of requests and responses are continuously exchanged between client and server, so you need to switch from EVENT_READ to EVENT_WRITE all the time. I did a similar change some years ago in pyftpdlib but I don't have another benchmark other than this one.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2017

    Hi Giampaolo Rodola'! It seems like you proposed the same idea 4 years ago and I wrote a similar patch: issue bpo-18932 :-)

    I suggest you to use my perf module to produce more reliable benchmarks. Here is my results on my computer smithers tuned for benchmarks:

    haypo@smithers$ ./python bench_selectors.py -o ref.json
    [apply the patch]
    haypo@smithers$ ./python bench_selectors.py -o patch.json
    haypo@smithers$ ./python -m perf compare_to ref.json patch.json --table
    +----------------------+---------+------------------------------+
    | Benchmark | ref | patch |
    +======================+=========+==============================+
    | PollSelector.modify | 11.3 us | 8.22 us: 1.37x faster (-27%) |
    +----------------------+---------+------------------------------+
    | EpollSelector.modify | 13.5 us | 8.88 us: 1.52x faster (-34%) |
    +----------------------+---------+------------------------------+

    Not significant (1): SelectSelector.modify

    @neologix: "Hm, do you have a realistic benchmark which would show the benefit?"

    I don't think that selector.modify() can be a bottleneck, but IMHO the change is simple and safe enough to be worth it. In a network server with 10k client, an optimization making .modify() 1.52x faster is welcomed.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2017

    @giampaolo Rodola: CPython development moved to GitHub, can you please create a pull request instead of a patch? Thank you.
    https://docs.python.org/devguide/pullrequest.html

    Hum, I see that my old patch of issue bpo-18932 (selectors_optimize_modify-2.patch) is different. I tried to factorize code. What do you think of my change?

    @giampaolo
    Copy link
    Contributor Author

    Hey Stinner, thanks for chiming in! Your old patch uses unregister() and register() so I'm not sure what speedup that'll take as the whole point is to avoid doing that. You may want to rewrite it and benchmark it but it looks like it'll be slower.

    @giampaolo
    Copy link
    Contributor Author

    PR: #1030

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2017

    @neologix: Do you have a GitHub account? It's hard to me to see if https://github.com/neologix is you or not, and your GitHub username is not filled in the Bug Tracker database.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2017

    Giampaolo Rodola': "Your old patch uses unregister() and register() so I'm not sure what speedup that'll take as the whole point is to avoid doing that."

    My patch calls generic unregister() and register() of the base classes, but it only uses one syscall per modify() call.

    Oh by the way, it seems like my patch changes KqueueSelector, but your change doesn't. Am I right? KqueueSelector is the default selector on FreeBSD and macOS. IMHO it's worth it to optimize it as well.

    @vstinner vstinner changed the title Speedup DefaultSelectors.modify() by 2x Speedup DefaultSelectors.modify() by 1.5x Apr 7, 2017
    @giampaolo
    Copy link
    Contributor Author

    My patch calls generic unregister() and register() of the base
    classes, but it only uses one syscall per modify() call.

    Doesn't that mean doing 3 operations (unregister(), register(), modify()) instead of the current 2 (unregister(), register())? I don't see how it can be faster than a single modify() syscall.

    Oh by the way, it seems like my patch changes KqueueSelector,
    but your change doesn't. Am I right?

    You are right but it looks like you end up doing the same thing as unregister() and register(). kqueue() has no modify() method so I don't think it can benefit from this change.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2017

    Doesn't that mean doing 3 operations (unregister(), register(), modify()) instead of the current 2 (unregister(), register())? I don't see how it can be faster than a single modify() syscall.

    The idea is to reuse _BaseSelectorImpl.register() and _BaseSelectorImpl.unregister() to factorize the code. These methods don't use syscall, they create the SelectorKey object and update _fd_to_key. So each class doesn't have to redo these things.

    I don't insist to redo what I did, I'm just trying to explain my change because your change basically copy/paste the same code 3 times, and you forgot KqueueSelector, so you even may have to copy it a 4th time ;-)

    @giampaolo
    Copy link
    Contributor Author

    The idea is to reuse _BaseSelectorImpl.register() and
    _BaseSelectorImpl.unregister() to factorize the code.

    You can't factorize the logic of modify() into those as they do two different things. I also don't like repeating the same thing 3 times but given how the module is organized I'm not sure how to do that as I need to pass 3 things around: the low-level selector (epoll, poll, whatever) and the read and write constants (POLLIN, EPOLLIN) which change depending on the selector being used.
    The same thing applies to the devpoll class (http://bugs.python.org/issue18931).
    I can write a second patch which to refactor the whole module if that is desirable but I prefer to do that in another PR.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 7, 2017

    I don't think that selector.modify() can be a bottleneck, but IMHO the change is simple and safe enough to be worth it. In a network server with 10k client, an optimization making .modify() 1.52x faster is welcomed.

    IMHO it complicates the code for little benefit: that's why I asked
    for a realistic benchmark. This patch could made modify() 10x faster,
    if modify() only accounts for 1% of the overall overhead in a
    realistic use-case, then it's not worth it.

    @neologix neologix mannequin changed the title Speedup DefaultSelectors.modify() by 1.5x Speedup DefaultSelectors.modify() by 2x Apr 7, 2017
    @giampaolo
    Copy link
    Contributor Author

    In certain protocols modify() is supposed to be used on every interaction between client and server. E.g. an FTP server does this:

    • register(fd, EVENT_READ); recv() # wait command from client
    • modify(fd, EVENT_WRITE); send(response) # send response
    • modify(fd, EVENT_READ); recv() # wait for new command

    ...so it's two calls for each command received.
    In asyncio modify() is also used in different circumstances.
    If you're worried about code duplication I can refactor selectors.py first, but IMO it would be a bad idea to reject this or future improvements because the current code status prevents code reuse. Right now there are already 3 classes sharing basically the same code (poll, epoll and devpoll related classes). Those can be refactored similarly to this:
    https://github.com/giampaolo/pyftpdlib/blob/ab699b5f89223e03593f3e004d6a370b4c2e5308/pyftpdlib/ioloop.py#L465-L565

    @giampaolo
    Copy link
    Contributor Author

    @neologix: here's a PR which refactors the poll-related classes: https://github.com/python/cpython/pull/1035/files
    From there, we'll be able to avoid modify() code duplication.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 7, 2017

    This refactoring was already suggested a long time ago, and at the
    time both Guido and I didn't like it because it makes the code
    actually more complicated: DRY in this case doesn't apply IMO.

    Also, this whole thread is a repeat of:
    http://bugs.python.org/issue18932

    At the time, I already asked for one realistic use case demonstrating
    the benefit of implementing modify() natively instead of
    unregister()+register().
    I know that this code makes modify() faster, but as I said multiple
    times, I'd like to se the impact on something else than a
    micro-benchmark: arguably if you propose this it's because you've
    profiled/found it to be an actual bottleneck, do you have *one*
    benchmark to support this change?

    @giampaolo
    Copy link
    Contributor Author

    For the sake of experiment I'm attaching a toy echo server which uses modify() to switch between EVENT_READ and EVENT_WRITE. Without patch I get 35000 req/sec, with patch around 39000 req/sec (11.4% faster).
    To be entirely honest a smarter echo server would switch between EVENT_READ / EVENT_WRITE only in case of BlockingIOError on send() or recv(), but unfortunately it's a condition which (for send()) does not occur often by testing on localhost (for recv() it naturally occurs +27000 times out of 39000). On the other hand, by experimenting with a real network it occurs quite soon:

    import socket
    sock = socket.socket()
    sock.connect(('google.com', 80))
    sock.setblocking(False)
    print(sock.send(b'x' * 50000))
    print(sock.send(b'x' * 50000))  # raise BlockingIOError

    So basically my benchmark is emulating a worst case scenario in which send() always blocks on the first call and succeed on the next one, in order to mimic recv() which blocks half of the times.
    The whole point is that the speedup entirely depends on how many times send() or recv() will block in a real world app connected to the internet (because that's when modify() is supposed to be used) but that's hard to determine, especially under heavy server loads which is hard to emulate. This also involves the SSL handshake when it fails with WANT_READ / WANT_WRITE, which is another circumstance where modify() is going to be used and for which I have no benchmarks.
    I personally don't mind about selectors module per-se, but given that it's the core of important libs such as asyncio and curio I would like to hear a better rationale than "let's reject this 1.5x speedup because DRY does not apply in this case".
    Also please consider that Tornado, Twisted and gevent use native modify() method.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Apr 29, 2017

    The rationale for rejecting wouldn't be "DRY does not apply in this
    case", it would be that this makes the code more complicated, and that
    a negligible speedup would not be worth it.
    Now, thanks to your benchmark, a 10% speedup is not negligible, so
    that seems like a reasonable change.
    I'll have a look at your refactoring code, and once pushed, we can
    optimize modify().

    @vstinner
    Copy link
    Member

    I also like the plan starting with refactoring ;-)

    @giampaolo
    Copy link
    Contributor Author

    OK, #1030 should be good to go.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 9, 2017

    New changeset fbfaa6f by Victor Stinner (Giampaolo Rodola) in branch 'master':
    bpo-30014: make poll-like selector's modify() method faster (bpo-1030)
    fbfaa6f

    @asvetlov
    Copy link
    Contributor

    PR was merged 6 months ago, closing the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life performance Performance or resource usage stdlib Python modules in the Lib dir topic-asyncio
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants