
selectors: provide a helper to choose a selector using constraints #63664

Closed
vstinner opened this issue Oct 31, 2013 · 14 comments
Labels
docs Documentation in the Doc dir

Comments

@vstinner
Member

BPO 19465
Nosy @gvanrossum, @pitrou, @vstinner

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2014-02-11.18:36:59.398>
created_at = <Date 2013-10-31.22:36:21.453>
labels = ['docs']
title = 'selectors: provide a helper to choose a selector using constraints'
updated_at = <Date 2014-02-11.18:36:59.397>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2014-02-11.18:36:59.397>
actor = 'gvanrossum'
assignee = 'docs@python'
closed = True
closed_date = <Date 2014-02-11.18:36:59.398>
closer = 'gvanrossum'
components = ['Documentation']
creation = <Date 2013-10-31.22:36:21.453>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 19465
keywords = []
message_count = 14.0
messages = ['201855', '201856', '201857', '201858', '201859', '201863', '201865', '201866', '201889', '201906', '201910', '204108', '204119', '210994']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'pitrou', 'vstinner', 'neologix', 'docs@python']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue19465'
versions = ['Python 3.4']

@vstinner
Member Author

multiprocessing, telnetlib (and subprocess in the near future, see bpo-18923) use the following code to select the best selector:

# poll/select have the advantage of not requiring any extra file descriptor,
# unlike epoll/kqueue (poll/select also need only a single syscall).
if hasattr(selectors, 'PollSelector'):
    _TelnetSelector = selectors.PollSelector
else:
    _TelnetSelector = selectors.SelectSelector

I don't like the principle of "a default selector"; in my opinion, selectors.DefaultSelector should be removed.

I would prefer a function returning the best selector using constraints. Example:

def get_selector(use_fd=True) -> BaseSelector:
  ...

By default, it would return the same selector as the current DefaultSelector. But if you set use_fd=False, the choice would be restricted to select() or poll().
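
For illustration, here is a minimal sketch of such a helper (get_selector and use_fd are the names proposed in this issue, not an existing API; the preference order mirrors roughly what DefaultSelector does today):

import selectors

def get_selector(use_fd=True):
    # Hypothetical helper proposed in this issue, not part of the
    # selectors module.  With use_fd=False, the choice is restricted
    # to selectors that need no extra file descriptor.
    if use_fd:
        if hasattr(selectors, 'KqueueSelector'):
            return selectors.KqueueSelector()
        if hasattr(selectors, 'EpollSelector'):
            return selectors.EpollSelector()
    if hasattr(selectors, 'PollSelector'):
        return selectors.PollSelector()
    return selectors.SelectSelector()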

I don't want to duplicate code like telnetlib's in each module; it's harder to maintain. The selectors module may get new selectors in the future; see for example bpo-18931.

Apart from use_fd, I don't have other ideas for constraints. I read somewhere that different selectors may have different limits on the number of file descriptors. I don't know whether such a constraint would be useful.

@gvanrossum
Member

What's the use case for not wanting to use an extra FD?

Nevertheless I'm fine with using a function to pick the default selector (but it requires some changes to asyncio too, which currently uses DefaultSelector).

Something I would find useful would be a way to override the selector choice on the command line. I currently have to build this into the app's arg parser and main(), e.g. http://code.google.com/p/tulip/source/browse/examples/sink.py#64
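
For example, the kind of boilerplate each app currently needs looks roughly like this (a sketch in the spirit of the linked sink.py example; the option name and choices are illustrative):

import argparse
import selectors

parser = argparse.ArgumentParser()
parser.add_argument('--selector', default='default',
                    choices=['default', 'select', 'poll', 'epoll', 'kqueue'])
args = parser.parse_args()

# Map the option to a selector class; entries missing on this
# platform resolve to None.
classes = {
    'default': selectors.DefaultSelector,
    'select': selectors.SelectSelector,
    'poll': getattr(selectors, 'PollSelector', None),
    'epoll': getattr(selectors, 'EpollSelector', None),
    'kqueue': getattr(selectors, 'KqueueSelector', None),
}
cls = classes[args.selector]
if cls is None:
    parser.error('%r is not supported on this platform' % args.selector)
selector = cls()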

@vstinner
Member Author

What's the use case for not wanting to use an extra FD?

A selector may be used for a few milliseconds just to check whether a socket is ready, and then destroyed. For such a use case, select() may be enough (1 syscall), whereas epoll requires more system calls: create the epoll FD, register the socket, poll, and destroy the epoll FD (4 syscalls).
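
For illustration, the throwaway pattern described above looks roughly like this (a sketch; is_readable is a made-up helper):

import selectors

def is_readable(sock, timeout=0):
    # One-shot readiness check: with SelectSelector this costs a single
    # select() syscall, whereas an epoll-based selector would need
    # epoll_create + epoll_ctl + epoll_wait + close.
    with selectors.SelectSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        return bool(sel.select(timeout))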

@gvanrossum
Member

Hm... I'm trying to understand how you're using the selector in telnetlib.py (currently the only example outside asyncio). It seems you're always using it with a single file/object, which is always 'self' (which wraps a socket), except in one place where you're also selecting on stdin. Sometimes you're using select(0) to check whether I/O is possible right now and then throwing away the selector; other times you've got an actual loop.

I wonder if you could just create the selector when the Telnet class is instantiated (or the first time you need it) and keep the socket permanently registered; IIUC selectors are level-triggered, and no resources are consumed when you're not calling the select() method. (I think this means that if the socket was ready at some point in the past, but you already read those bytes, and you call select() now, it won't be considered ready even though it was registered the whole time.)
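
A sketch of that suggestion (Telnet internals heavily simplified; this is not the actual telnetlib code):

import selectors

class Telnet:
    def __init__(self, sock):
        self.sock = sock
        # Create the selector once and keep the socket registered for
        # the lifetime of the instance; a level-triggered selector only
        # does work while select() is actually being called.
        self._selector = selectors.DefaultSelector()
        self._selector.register(self.sock, selectors.EVENT_READ)

    def sock_avail(self):
        # Non-blocking check: True if the socket is readable right now.
        return bool(self._selector.select(0))

    def close(self):
        self._selector.close()
        self.sock.close()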

It still seems to me that this is a pretty atypical use of selectors; the extra FD used doesn't bother me much, since it doesn't really scale anyway (that would require hooking multiple Telnet instances into the same selector, probably using an asyncio EventLoop).

If you insist on having a function that prefers poll and select over kqueue
or epoll, perhaps we can come up with a slightly higher abstraction for the
preference order? Maybe faster startup time vs. better scalability? (And I
wouldn't be surprised if on Windows you'd still be better off using
IocpProactor instead of SelectSelector -- but that of course has a
different API altogether.)


@vstinner
Member Author

vstinner commented Nov 1, 2013

It still seems to me that this is a pretty atypical use of selectors

I already implemented something similar to subprocess.Popen.communicate() when I was working on old Python versions without the timeout parameter of communicate().
http://ufwi.org/projects/edw-svn/repository/revisions/master/entry/trunk/src/nucentral/nucentral/common/process.py#L222

IMO, calling select() with a few file descriptors (between 1 and 3) and then quickly destroying the "selector" is not a rare use case.

If I were to port my code to selectors, I wouldn't want to rewrite it to keep the selector alive longer just because the module forces me to use the super-powerful, fast epoll/kqueue selector.

(To be honest, I would probably not notice any performance impact. But I like reducing the number of syscalls, not the opposite :-))

@gvanrossum
Member

OK. Let's have a function to select a default selector. Can you think of a
better name for the parameter? Or maybe there should be two functions?

@vstinner
Member Author

vstinner commented Nov 1, 2013

OK. Let's have a function to select a default selector.
Can you think of a better name for the parameter? Or
maybe there should be two functions?

I prefer to leave the question to the author of the module, Charles-François :-)

@neologix
Mannequin

neologix mannequin commented Nov 1, 2013

There are actually two reasons to choose poll over epoll/kqueue
(i.e. no extra FD):

  • it's a bit faster (1 syscall vs 3)
  • but more importantly - and that's the main reason I did it in
    telnetlib/multiprocessing/subprocess - sometimes you really don't
    want to use an extra FD: for example, if you're creating 300
    telnet/subprocess instances, one more FD per instance can make you
    reach RLIMIT_NOFILE, which makes some syscalls fail with EMFILE (at
    work we have up to 100 machines, and we spawn 1 subprocess per
    machine when distributing files with BitTorrent).

So I agree it would be nice to have a better way to get a selector not
requiring any extra FD.
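
The limit in question can be inspected with the resource module (POSIX only):

import resource

# Every epoll/kqueue instance consumes one FD that counts against
# RLIMIT_NOFILE; once the soft limit is reached, syscalls that create
# FDs fail with EMFILE.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('FD limits: soft=%d hard=%d' % (soft, hard))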

The reason I didn't add such a method in the first place is that I
don't want to end up like many Java APIs:
Foo.getBarFactory().getInstance().initialize().provide() :-)

I read somewhere that different selectors may have different limits on the number of file descriptors.

Apart from select(), none of the other selectors has an upper limit.

As for the performance profiles, depending on the application usage,
select() can be faster than poll(), poll() can be faster than epoll(),
etc. But since it's really highly usage-specific - and of course OS
specific - I think the current choice heuristic is fine: people with
specific needs can just use PollSelector/EpollSelector themselves.

To sum up, get_selector(use_fd=True) looks fine to me.

@gvanrossum
Member

Hm. If you really are going to create 300 instances, you should probably
use asyncio. Otherwise, how are you going to multiplex them? Create 300
threads each doing select() on 1 FD? That sounds like a poor architecture
and I don't want to bend over backwards to support or encourage that.

@neologix
Mannequin

neologix mannequin commented Nov 1, 2013

Of course, when I have 300 connections to remote nodes, I use poll()
to multiplex between them.

But there are times when you can have a large number of threads
running concurrently, and if many of them call e.g.
subprocess.check_output() at the same time (which calls
subprocess.communicate() behind the scenes, and thus select/poll),
then one extra FD per instance could be an issue.
For example, in http://bugs.python.org/issue18756, os.urandom() would
start failing when multiple threads called it at the same time.

@pitrou
Member

pitrou commented Nov 23, 2013

I think this is more of a documentation issue. People who don't want a new FD can hardcode PollSelector (poll has been POSIX for a long time).
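
That is, something along these lines (a minimal sketch of the suggested approach; the variable name is illustrative):

import selectors

# poll() is POSIX, needs no extra FD, and has no hard limit on FD numbers,
# so on POSIX platforms it can simply be hardcoded:
sel = selectors.PollSelector()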

@neologix
Mannequin

neologix mannequin commented Nov 23, 2013

Antoine Pitrou added the comment:

I think this is more of a documentation issue. People who don't want a new FD can hardcode PollSelector (poll has been POSIX for a long time).

That's also what I now think.
I don't think the use case is common enough to warrant a
"factory"; a default selector is fine.

@pitrou pitrou added the docs Documentation in the Doc dir label Nov 23, 2013
@vstinner
Member Author

It looks like you rejected my idea, so I'm in favor of just closing the issue. Do you agree?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022