test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell #66100

bitdancer · 2014-07-01T22:40:04Z

BPO	21901
Nosy	@gvanrossum, @vstinner, @bitdancer, @1st1

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2014-07-22.20:52:49.898>
created_at = <Date 2014-07-01.22:40:03.775>
labels = ['type-crash', 'expert-asyncio']
title = 'test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell'
updated_at = <Date 2014-07-26.21:56:32.359>
user = 'https://github.com/bitdancer'

bugs.python.org fields:

activity = <Date 2014-07-26.21:56:32.359>
actor = 'r.david.murray'
assignee = 'none'
closed = True
closed_date = <Date 2014-07-22.20:52:49.898>
closer = 'neologix'
components = ['asyncio']
creation = <Date 2014-07-01.22:40:03.775>
creator = 'r.david.murray'
dependencies = []
files = []
hgrepos = []
issue_num = 21901
keywords = []
message_count = 14.0
messages = ['222059', '222062', '222075', '222534', '222951', '223002', '223165', '223181', '223563', '223571', '223573', '223691', '223696', '224088']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'vstinner', 'r.david.murray', 'neologix', 'python-dev', 'yselivanov']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue21901'
versions = ['Python 3.4', 'Python 3.5']

bitdancer · 2014-07-01T22:40:03Z

On one particular Linux vserver virtual machine (which is unfortunately my development platform for Python), test.test_selectors.PollSelectorTestCase.test_above_fd_setsize fails with the following message:

zsh: killed

and at that point the test suite stops running, regardless of whether or not I started it with -j.

As far as I can tell, the configuration of this vserver is the same as the one my buildbots run on, but they are on different host machines, so there could be some differences I'm not remembering. On the buildbots, the test gets skipped with the message 'FD limit reached'.

Anyone have any clues how to debug this?

vstinner · 2014-07-02T00:20:53Z

The test changes the maximum number of open files. What is the limit in your shell? You can try to modify the test to add print(soft, hard) after getrlimit().

On Fedora 20:

$ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 4096)

The test tries to raise the soft limit (1024) to the hard limit (4096).
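A minimal sketch of the suggested check, for reference. The helper name `describe_nofile_limits` is hypothetical and not part of the test suite; it only wraps the `resource.getrlimit()` call mentioned above and prints the pair, as the suggested `print(soft, hard)` would.

```python
import resource


def describe_nofile_limits():
    """Print and return the (soft, hard) RLIMIT_NOFILE pair.

    Hypothetical helper: mirrors adding print(soft, hard) right after
    the getrlimit() call in the test.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(soft, hard)
    return soft, hard


soft, hard = describe_nofile_limits()
```

A very large hard value here means the test will try to open a correspondingly large number of file descriptors.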

neologix · 2014-07-02T06:50:57Z

There's probably a special mechanism due to vserver which makes the
kernel kill the process instead of failing with EPERM, but it's really
surprising.

What happens if you try the following:
$ python -c "from resource import *; _, hard = getrlimit(RLIMIT_NOFILE); setrlimit(RLIMIT_NOFILE, (hard, hard))"

You could run the process under strace to see what's going on: you'll
likely just see the reception of a signal though. Maybe "dmesg" would
show interesting logs.

vstinner · 2014-07-07T22:55:23Z

ping?

bitdancer · 2014-07-13T16:29:05Z

The python command just returns.

The dmesg was a good call:

python invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
python cpuset=pydev mems_allowed=0
[...]
Out of memory: kill process python(28623:#112) score 85200 or a child
Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB

I *thought* I had this virtual server configured with the same resources as I do the buildbots, but I could be wrong. It's been quite some time since I set both of them up, and I don't even remember how the resources are set at the moment.

Let me know if you want to see the entire dmesg output.

vstinner · 2014-07-14T08:36:32Z

> Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB

340 MB to run test_selectors sounds high.

What is the value of NUM_FDS? And what is the result of this command in your vserver?

$ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 4096)

bitdancer · 2014-07-16T01:25:53Z

rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024L, 1048576L)

Unfortunately the buildbot box is offline at the moment and it may be a bit before I can get it back, so I can't compare the results above with that VM.

vstinner · 2014-07-16T08:20:13Z

> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
> (1024L, 1048576L)

Oh, 1 million files is much bigger than 4 thousand files (4096).

The test should only need FD_SETSIZE + 10 files; the problem is getting the value of FD_SETSIZE:

    # A scalable implementation should have no problem with more than
    # FD_SETSIZE file descriptors. Since we don't know the value, we just
    # try to set the soft RLIMIT_NOFILE to the hard RLIMIT_NOFILE ceiling.

For example, on my Linux FD_SETSIZE is 1024, whereas the hard limit of RLIMIT_NOFILE is 4096.

/usr/include/linux/posix_types.h:#define __FD_SETSIZE 1024

Maybe we can simply expose the FD_SETSIZE constant in the select module? The constant is useful when you use select.select(), which is still heavily used on Windows.

neologix · 2014-07-21T07:08:37Z

> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
> (1024L, 1048576L)
>
> Oh, 1 million files is much bigger than 4 thousand files (4096).
>
> The test should only test FD_SETSIZE + 10 files, the problem is to get FD_SETSIZE:

We could cap it at, say, 2**16, which is larger than any plausible
FD_SETSIZE (these are usually low, since fd_set is often allocated on
the stack and select() doesn't scale well beyond that anyway).

But I don't see anything wrong with the test, it's really the buildbot
setting which is to blame: I expect other tests to fail with such a
low max virtual memory.

bitdancer · 2014-07-21T10:58:45Z

That is the only test that fails for lack of memory. And it's not the buildbot, it's my development virtual machine. Having the test suite be killed when I do a full test run is...rather annoying.

neologix · 2014-07-21T11:17:40Z

Alright, I'll cap the value then (no need to expose FD_SETSIZE).
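For illustration, a minimal sketch of what such a cap could look like. This is not the committed changeset: the names `FD_CAP` and `pick_num_fds` are hypothetical, and the handling of an unlimited hard limit is an assumption added here.

```python
import resource

# Cap suggested in the discussion; assumed to exceed any realistic FD_SETSIZE.
FD_CAP = 2**16


def pick_num_fds(hard, cap=FD_CAP):
    """Return how many descriptors a test should try to open.

    Hypothetical helper: uses the hard RLIMIT_NOFILE value, but never
    more than `cap`, so a very large hard limit (such as 1048576) does
    not turn into a very large allocation.
    """
    if hard == resource.RLIM_INFINITY:
        # Assumption: treat "unlimited" the same as "larger than the cap".
        return cap
    return min(hard, cap)


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
num_fds = pick_num_fds(hard)
```

With the limits reported earlier in this issue, a hard limit of 4096 is left unchanged, while 1048576 is reduced to 65536.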

python-dev · 2014-07-22T20:30:31Z

New changeset 7238c6a05ca6 by Charles-François Natali in branch '3.4':
Issue bpo-21901: Cap the maximum number of file descriptors to use for the test.
http://hg.python.org/cpython/rev/7238c6a05ca6

New changeset 89665cc05592 by Charles-François Natali in branch 'default':
Issue bpo-21901: Cap the maximum number of file descriptors to use for the test.
http://hg.python.org/cpython/rev/89665cc05592

neologix · 2014-07-22T20:52:50Z

Sorry for the delay, should be fixed now.

bitdancer · 2014-07-26T21:56:32Z

Test passes for me now, thanks.

bitdancer added the topic-asyncio and type-crash labels Jul 1, 2014

bitdancer changed the title ~~test_selectors.PollSelectorTestCase.test_above_fd_setsize killed by shell~~ test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell Jul 1, 2014

neologix mannequin closed this as completed Jul 22, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022
