test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell #66100

Closed
bitdancer opened this issue Jul 1, 2014 · 14 comments
Labels: topic-asyncio, type-crash (a hard crash of the interpreter, possibly with a core dump)

@bitdancer
Member

BPO 21901
Nosy @gvanrossum, @vstinner, @bitdancer, @1st1

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2014-07-22.20:52:49.898>
created_at = <Date 2014-07-01.22:40:03.775>
labels = ['type-crash', 'expert-asyncio']
title = 'test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell'
updated_at = <Date 2014-07-26.21:56:32.359>
user = 'https://github.com/bitdancer'

bugs.python.org fields:

activity = <Date 2014-07-26.21:56:32.359>
actor = 'r.david.murray'
assignee = 'none'
closed = True
closed_date = <Date 2014-07-22.20:52:49.898>
closer = 'neologix'
components = ['asyncio']
creation = <Date 2014-07-01.22:40:03.775>
creator = 'r.david.murray'
dependencies = []
files = []
hgrepos = []
issue_num = 21901
keywords = []
message_count = 14.0
messages = ['222059', '222062', '222075', '222534', '222951', '223002', '223165', '223181', '223563', '223571', '223573', '223691', '223696', '224088']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'vstinner', 'r.david.murray', 'neologix', 'python-dev', 'yselivanov']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue21901'
versions = ['Python 3.4', 'Python 3.5']

@bitdancer
Member Author

On one particular linux vserver virtual machine (which is unfortunately my development platform for python), test.test_selectors.PollSelectorTestCase.test_above_fd_setsize fails with the following message:

zsh: killed

and at that point the test suite stops running, regardless of whether or not I started it with -j.

As far as I can tell, the configuration of this vserver is the same as the one my buildbots run on, but they are on different host machines, so there could be some differences I'm not remembering. On the buildbots, the test gets skipped with the message 'FD limit reached'.

Anyone have any clues how to debug this?

@bitdancer added the topic-asyncio and type-crash labels Jul 1, 2014
@bitdancer changed the title from "test_selectors.PollSelectorTestCase.test_above_fd_setsize killed by shell" to "test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell" Jul 1, 2014
@vstinner
Member

vstinner commented Jul 2, 2014

The test changes the maximum number of open files. What is the limit in your shell? You can try to modify the test to add print(soft, hard) after getrlimit().

On Fedora 20:

$ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 4096)

The test tries to raise the soft limit (1024) to the hard limit (4096).
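A minimal sketch of the check suggested above, assuming a Unix platform where the `resource` module is available. The helper name `describe_nofile_limits` is hypothetical and not part of the test suite; it only reads the limits and does not change them.

```python
import resource


def describe_nofile_limits():
    """Print and return the (soft, hard) RLIMIT_NOFILE pair for this process."""
    # getrlimit() returns a (soft, hard) tuple of integers.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    return soft, hard


if __name__ == "__main__":
    describe_nofile_limits()
```

Running this inside the affected environment shows the ceiling the test would try to raise the soft limit to.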

@neologix
Mannequin

neologix mannequin commented Jul 2, 2014

There's probably a special mechanism due to vserver which makes the
kernel kill the process instead of failing with EPERM, but it's really
surprising.

What happens if you try the following:
$ python -c "from resource import *; _, hard = getrlimit(RLIMIT_NOFILE); setrlimit(RLIMIT_NOFILE, (hard, hard))"

You could run the process under strace to see what's going on: you'll
likely just see the reception of a signal though. Maybe "dmesg" would
show interesting logs.

@vstinner
Member

vstinner commented Jul 7, 2014

ping?

@bitdancer
Member Author

The python command just returns.

The dmesg was a good call:

python invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
python cpuset=pydev mems_allowed=0
[...]
Out of memory: kill process python(28623:#112) score 85200 or a child
Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB

I *thought* I had this virtual server configured with the same resources as I do the buildbots, but I could be wrong. It's been quite some time since I set both of them up, and I don't even remember how the resources are set at the moment.

Let me know if you want to see the entire dmesg output.

@vstinner
Member

> Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB

340 MB to run test_selectors sounds high.

What is the value of NUM_FDS? And what is the result of this command in your vserver?

$ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 4096)

@bitdancer
Member Author

rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024L, 1048576L)

Unfortunately the buildbot box is offline at the moment and it may be a bit before I can get it back, so I can't compare the results above with that VM.

@vstinner
Member

> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
> (1024L, 1048576L)

Oh, 1 million files is much bigger than 4 thousand files (4096).

The test only needs FD_SETSIZE + 10 files; the problem is getting the value of FD_SETSIZE:

    # A scalable implementation should have no problem with more than
    # FD_SETSIZE file descriptors. Since we don't know the value, we just
    # try to set the soft RLIMIT_NOFILE to the hard RLIMIT_NOFILE ceiling.

For example, on my Linux FD_SETSIZE is 1024, whereas the hard limit of RLIMIT_NOFILE is 4096.

/usr/include/linux/posix_types.h:#define __FD_SETSIZE 1024

Maybe we can simply expose the FD_SETSIZE constant in the select module? The constant is useful when you use select.select(), which is still heavily used on Windows.

@neologix
Mannequin

neologix mannequin commented Jul 21, 2014

> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
> (1024L, 1048576L)

> Oh, 1 million files is much bigger than 4 thousand files (4096).
>
> The test should only test FD_SETSIZE + 10 files, the problem is to get FD_SETSIZE:

We could cap it to let's say 2**16, it's larger than any possible
FD_SETSIZE (which are usually low since fd_set are often allocated on
the stack and select() doesn't scale well beyond that anyway).

But I don't see anything wrong with the test, it's really the buildbot
setting which is to blame: I expect other tests to fail with such a
low max virtual memory.

@bitdancer
Copy link
Member Author

That is the only test that fails for lack of memory. And it's not the buildbot, it's my development virtual machine. Having the test suite be killed when I do a full test run is...rather annoying.

@neologix
Mannequin

neologix mannequin commented Jul 21, 2014

Alright, I'll cap the value then (no need to expose FD_SETSIZE).
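The capping idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed changeset: the names `FD_CAP` and `choose_num_fds` are hypothetical, the 2**16 value is the one floated earlier in the thread, and the handling of an unlimited hard limit is an added assumption.

```python
import resource

# Cap proposed in the discussion: comfortably above typical FD_SETSIZE values.
FD_CAP = 2**16


def choose_num_fds(soft, hard, cap=FD_CAP):
    """Pick how many descriptors a test should try to open.

    `soft` is accepted for symmetry with getrlimit() but is not used here.
    """
    # Assumption: treat an unlimited hard limit as "use the cap".
    if hard == resource.RLIM_INFINITY:
        return cap
    # Otherwise never exceed the cap, even if the hard limit is huge.
    return min(hard, cap)


if __name__ == "__main__":
    print(choose_num_fds(1024, 4096))     # Fedora 20 values from the thread
    print(choose_num_fds(1024, 1048576))  # values reported on the vserver
```

With the Fedora 20 limits the result stays at 4096, while with the vserver's hard limit of 1048576 it is bounded at 65536, so the test no longer tries to open about a million descriptors.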

@python-dev
Mannequin

python-dev mannequin commented Jul 22, 2014

New changeset 7238c6a05ca6 by Charles-François Natali in branch '3.4':
Issue bpo-21901: Cap the maximum number of file descriptors to use for the test.
http://hg.python.org/cpython/rev/7238c6a05ca6

New changeset 89665cc05592 by Charles-François Natali in branch 'default':
Issue bpo-21901: Cap the maximum number of file descriptors to use for the test.
http://hg.python.org/cpython/rev/89665cc05592

@neologix
Mannequin

neologix mannequin commented Jul 22, 2014

Sorry for the delay, should be fixed now.

@neologix neologix mannequin closed this as completed Jul 22, 2014
@bitdancer
Member Author

Test passes for me now, thanks.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022