AndroidStudio fails to start with " library initialization failed - unable to allocate file descriptor table - out of memoryAborted" #10921

Closed
mbiebl opened this Issue Nov 25, 2018 · 59 comments

@mbiebl
Contributor

mbiebl commented Nov 25, 2018

systemd version the issue has been seen with

git master (v239-2694-g7af002f71)

Used distribution

Debian sid

Trying to start AndroidStudio (java application) via gnome-shell fails with the following error

Nov 25 16:23:39 pluto jetbrains-studio.desktop[2195]: library initialization failed - unable to allocate file descriptor table - out of memoryAborted

Starting the binary listed in jetbrains-studio.desktop directly in gnome-terminal works (gnome-terminal uses systemd --user). Trying to start the binary via xterm fails with the same error message.

This worked in v239, so it is a recent regression.

@poettering

Member

poettering commented Nov 25, 2018

So, what precisely fails there? We need some more data (strace -f -s500 … for starters) to see what's failing there...

@mbiebl

Contributor

mbiebl commented Nov 25, 2018

I don't know what precisely fails, I only know that starting the binary fails. But I'm happy to provide more information if you tell me how to gather the data. strace is attached.
strace.txt

@poettering

Member

poettering commented Nov 26, 2018

Hmm, this is interesting:

5450  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1073741816, rlim_max=1073741816}) = 0
5450  mmap(NULL, 51539607552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
5450  mmap(NULL, 51539607552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
5450  brk(0x56336f19a000)               = 0x56276f186000
5450  mmap(NULL, 51539742720, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
5450  write(2, "library initialization failed - unable to allocate file descriptor table - out of memory", 88) = 88
5450  rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0

So it appears that the software queries RLIMIT_NOFILE and then tries to allocate memory depending on the limit returned.
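The numbers in the trace give a rough sense of scale. What follows is a minimal sketch, assuming the table is simply one fixed-size entry per possible descriptor. The 48-byte entry size is inferred from the failed mmap() size divided by the limit in the strace output; it is not taken from the JDK sources, and the function names are made up for illustration.

```python
# Rough estimate of a descriptor table sized from RLIMIT_NOFILE.
# ENTRY_BYTES is an assumption inferred from the strace above
# (51539607552 bytes requested for a limit of 1073741816).
ENTRY_BYTES = 48


def fd_table_bytes(nofile_limit, entry_bytes=ENTRY_BYTES):
    """Bytes needed for one table entry per possible file descriptor."""
    return nofile_limit * entry_bytes


def gib(n_bytes):
    """Convert bytes to GiB for readability."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Limits seen in this thread: the usual soft limit, 256K, and the 1G value.
    for limit in (1024, 256 * 1024, 1073741816):
        size = fd_table_bytes(limit)
        print(f"limit={limit:>10}  table={size:>12} bytes  ({gib(size):.4f} GiB)")
```

Under that assumption a 1K limit costs a few dozen kilobytes, while the 1G limit asks for roughly 48 GiB in one mmap() call, which matches the ENOMEM seen above.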

In current git we have substantially bumped RLIMIT_NOFILE, but only to 256K, whereas in your case it shows up as 1G (!!!). More importantly: systemd sets only the hard limit to 256K, not the soft limit, so normal apps should not be affected by this at all, as they generally only care about the soft limit. The prlimit64() call above shows that both limits are set to 1G, though...

And then, to my knowledge Debian actually defaulted to 1M so far, for the soft limit, so where does the 1G come from if systemd is not doing that? This is seriously weird...

Further up we see this btw:

5408  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1073741816}) = 0
5408  prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1073741816, rlim_max=1073741816}, NULL) = 0

So there we see that the soft limit is initially set to 1K, as it should be, but the hard limit is incorrectly set to 1G. Then, the tool raises the soft limit to the hard limit.
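The second call in that trace is the common "raise the soft limit to the hard limit" idiom. Here is a minimal Python sketch of the same two steps using the standard resource module. The helper names are hypothetical, and this is not the code the launcher actually runs.

```python
import resource


def soft_to_hard(limits):
    """Return the (soft, hard) pair with the soft limit raised to the hard limit."""
    _soft, hard = limits
    return (hard, hard)


def raise_nofile_soft_limit():
    """Raise this process's RLIMIT_NOFILE soft limit to its hard limit.

    Mirrors the query-then-set pair of prlimit64() calls in the trace.
    Returns the (old, new) limit pairs.
    """
    old = resource.getrlimit(resource.RLIMIT_NOFILE)
    new = soft_to_hard(old)
    resource.setrlimit(resource.RLIMIT_NOFILE, new)
    return old, new
```

With a sane hard limit this is harmless. With a 1G hard limit, any code that later sizes a table from the soft limit inherits the problem.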

The question really is, where does the 1G come from?

What does Debian do with the HIGH_RLIMIT_NOFILE meson build option?

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

So there we see that the soft limit is initially set to 1K, as it should be, but the hard limit is incorrectly set to 1G. Then, the tool raises the soft limit to the hard limit.

The question really is, where does the 1G come from?

no idea.

What does Debian do with the HIGH_RLIMIT_NOFILE meson build option?

We do not currently set/modify that build option. So I assume this means it will be set to 256*1024 ?

Any idea why starting the application via gnome-terminal succeeds and fails via xterm and gnome-shell launcher?

@poettering

Member

poettering commented Nov 26, 2018

We do not currently set/modify that build option. So I assume this means it will be set to 256*1024 ?

Any idea why starting the application via gnome-terminal succeeds and fails via xterm and gnome-shell launcher?

My educated guess is that gnome-terminal's factory is spawned as systemd --user service, while xterm is run directly as child of the PAM login session. And systemd --user might get this right, but the PAM login session doesn't?

What does /etc/security/limits.conf say about nofile on your system?

What does ulimit -n -H and ulimit -n -S say when run on a console getty login?

You have to track down which component sets the 1G RLIMIT_NOFILE hard limit. /proc/$PID/limits can help you track down which processes do this. For example, if the console getty login shows the 1G hard limit being in effect, go up the process tree, always checking /proc/$PID/limits, to find the last process that has the high limit set.
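The walk up the process tree can be sketched as follows. This assumes a Linux /proc with the usual "Max open files" row in /proc/<pid>/limits and a "PPid:" line in /proc/<pid>/status. The function names are made up for illustration.

```python
import re


def parse_nofile(limits_text):
    """Extract (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because the kernel may print 'unlimited'.
    """
    for line in limits_text.splitlines():
        m = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if m:
            return m.group(1), m.group(2)
    return None


def parent_pid(status_text):
    """Extract the parent PID from /proc/<pid>/status text (0 if absent)."""
    m = re.search(r"^PPid:\s+(\d+)", status_text, re.MULTILINE)
    return int(m.group(1)) if m else 0


def walk_up(pid):
    """Yield (pid, (soft, hard)) for `pid` and each ancestor up to PID 1."""
    while pid > 0:
        with open(f"/proc/{pid}/limits") as f:
            limits = parse_nofile(f.read())
        yield pid, limits
        with open(f"/proc/{pid}/status") as f:
            pid = parent_pid(f.read())
```

Calling `for pid, limits in walk_up(os.getpid()): print(pid, limits)` from a login shell (after `import os`) prints one line per ancestor. The process where the hard limit jumps is the one to look at.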

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

You have to track down which component sets the 1G RLIMIT_NOFILE hard limit. /proc/$PID/limits can help you track down which processes do this. For example, if the console getty login shows the 1G hard limit being in effect, go up the process tree, always checking /proc/$PID/limits, to find the last process that has the high limit set.

I assume this must be systemd, given that it's the upgrade from v239 to git master which broke this?

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

git master:
xterm

~$ ulimit -n -H ; ulimit -n -S
1073741816
1024

gnome-terminal

~$ ulimit -n -H; ulimit -n -S
262144
1024
@mbiebl

Contributor

mbiebl commented Nov 26, 2018

with v239:
gnome-terminal

~$ ulimit -n -H; ulimit -n -S
4096
1024

xterm

~$ ulimit -n -H; ulimit -n -S
1048576
1024

AndroidStudio starts successfully in both shells.

@poettering

Member

poettering commented Nov 26, 2018

I assume this must be systemd, given that it's the upgrade from v239 to git master which broke this?

No, I'd guess it's not systemd.

So here's my educated guess: some component on Debian (pam_limits.so for example, configured via /etc/security/limits.conf) sets RLIMIT_NOFILE's hard limit to RLIM_INFINITY, and that's where things fall apart: with current systemd, RLIM_INFINITY now means a lot more fds than it used to, because we set /proc/sys/fs/nr_open and /proc/sys/fs/file-max to really unlimited.

So, systemd is not really causing this, but this is for the first time triggered because some component asking for "infinity" now gets something much closer to "infinity" than previously possible...

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

Ok, so it's the fiddling with /proc/sys/fs/nr_open and /proc/sys/fs/file-max which is causing this?

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

/etc/security/limits.conf is empty, btw

@poettering

Member

poettering commented Nov 26, 2018

i don't know, I am just guessing. You need to check what debian does in /etc/security/limits.conf, as I suggested already

@poettering

Member

poettering commented Nov 26, 2018

maybe /etc/security/limits.d/?

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

nope, empty as well

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

root@pluto:/etc/security# ls -l /etc/security/limits.d/
total 0
root@pluto:/etc/security# cat /etc/security/limits.conf 
# /etc/security/limits.conf
#
#Each line describes a limit for a user in the form:
#
#<domain>        <type>  <item>  <value>
#
#Where:
#<domain> can be:
#        - a user name
#        - a group name, with @group syntax
#        - the wildcard *, for default entry
#        - the wildcard %, can be also used with %group syntax,
#                 for maxlogin limit
#        - NOTE: group and wildcard limits are not applied to root.
#          To apply a limit to the root user, <domain> must be
#          the literal username root.
#
#<type> can have the two values:
#        - "soft" for enforcing the soft limits
#        - "hard" for enforcing hard limits
#
#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open files
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
#        - as - address space limit (KB)
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to values: [-20, 19]
#        - rtprio - max realtime priority
#        - chroot - change root to directory (Debian-specific)
#
#<domain>      <type>  <item>         <value>
#

#*               soft    core            0
#root            hard    core            100000
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#ftp             -       chroot          /ftp
#@student        -       maxlogins       4

#*		soft	core		unlimited
# End of file
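
Since every entry in the file above is commented out, a quick way to confirm that no limit is actually configured is to filter out comments and list what remains. This is a simplified sketch of the four-column format described in the file's own header. It is not a reimplementation of pam_limits' parser, and the function name is made up for illustration.

```python
def active_limits(conf_text):
    """Return (domain, type, item, value) tuples for non-comment lines."""
    entries = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) == 4:  # <domain> <type> <item> <value>
            entries.append(tuple(fields))
    return entries


if __name__ == "__main__":
    with open("/etc/security/limits.conf") as f:
        print(active_limits(f.read()))
```

For the file shown above this prints an empty list, which is consistent with the observation that limits.conf sets nothing on this system.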
@poettering

Member

poettering commented Nov 26, 2018

ah, so if you google it, there are reports that pam_limits in Debian patches in RLIMIT_NOFILE=infinity in code if no configuration is around. I can't verify that in the sources though, since the Debian package is unmaintained/has no accessible VCS?

My recommendation would be to drop that debian-specific patch to PAM. It's really strange that Debian asks for an infinity limit on fds there, this had to explode one day, if apps do what that java thing does... (Moreover, bumping the RLIMIT_NOFILE through PAM like that is kinda sucky anyway, since it only applies to login sessions, not to traditional system services)

If you can't fix Debian's PAM package, then consider either patching the code in systemd to not bump the two sysctls (but you'd do a disservice to your users then I'd say, as this means people needing large numbers of fds still can't get them), or ship a sysctl.d drop-in that lowers them again...

But really, the best approach is to fix the Debian PAM package and drop the Debian-specific patch there...

(BTW, this all suggests there's another bug somewhere: it appears pam_limits is included in regular PAM stacks, but not in the one that systemd --user runs, and that should be fixed: systemd --user should mostly use the same PAM session modules as a getty/ssh/gdm login.)

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

Why does systemd need to fiddle with those settings in the first place?

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

The sources are available at https://sources.debian.org/src/pam/1.1.8-3.8/ fwiw. The VCS link is indeed broken after alioth has been shut down.

@poettering

Member

poettering commented Nov 26, 2018

We generally ask downstreams to read NEWS when updating, such changes are announced there.

Besides NEWS, see #10244. Kernel people asked us for it, basically.

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

Regarding the PAM config:

michael@pluto:/etc/pam.d$ grep limits *
cron:# Sets up user limits, please define limits for cron tasks
cron:# through /etc/security/limits.conf
cron:session    required   pam_limits.so
gdm-autologin:session required        pam_limits.so
gdm-fingerprint:session required        pam_limits.so
gdm-launch-environment:session required        pam_limits.so
gdm-password:session required        pam_limits.so
login:# set access limits.
login:# Sets up user limits according to /etc/security/limits.conf
login:# (Replaces the use of /etc/limits in old login)
login:session    required   pam_limits.so
runuser:session		required	pam_limits.so
sshd:# access limits that are hard to express in sshd_config.
sshd:# Set up user limits from /etc/security/limits.conf.
sshd:session    required     pam_limits.so
su:# Sets up user limits according to /etc/security/limits.conf
su:# (Replaces the use of /etc/limits in old login)
su:session    required   pam_limits.so
systemd-user:session  required pam_limits.so

I use gdm, and its PAM config seems to include pam_limits.so

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

Just curious, which pam patch in particular did you have in mind,
https://sources.debian.org/src/pam/1.1.8-3.8/debian/patches-applied/pam-limits-nofile-fd-setsize-cap/ ?

@poettering

Member

poettering commented Nov 26, 2018

Hmm, can't find it, maybe the google story i found was misleading (https://gitlab.eurecom.fr/oai/odroid-linux-3.10.y-rt/commit/60fd760fb9ff7034360bab7137c917c0330628c2)

There are also mentions of a patch called /027_pam_limits_better_init_allow_explicit_root that did this?

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

@poettering

Member

poettering commented Nov 26, 2018

hmm, something in your stack appears to set RLIMIT_NOFILE to RLIM_INFINITY though, and I am very sure it's not systemd. If it's not PAM it must be something else (though I would still guess it is PAM).

It might be worth stracing a login session (maybe a getty on the console to make it simple) to see where exactly this happens.

But there's little I can help you with that, since on Fedora the RLIMIT_NOFILE systemd sets up actually just gets inherited down the tree cleanly, all the way through PAM, nothing interferes with it.

@mbiebl

Contributor

mbiebl commented Nov 26, 2018

Building with

  -Dbump-proc-sys-fs-file-max=false \
  -Dbump-proc-sys-fs-nr-open=false

works around the problem. I suppose RLIMIT_NOFILE is derived from /proc/sys/fs/nr_open. At least those numbers do match (on a v239 system):

# cat /proc/sys/fs/nr_open 
1048576
# ulimit -n -H
1048576
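
The same comparison can be scripted. This is a small sketch, assuming a Linux system where the sysctl is exposed at /proc/sys/fs/nr_open; the function names are hypothetical.

```python
import resource


def read_nr_open(path="/proc/sys/fs/nr_open"):
    """Read the fs.nr_open sysctl as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def hard_limit_equals_nr_open(path="/proc/sys/fs/nr_open"):
    """True if this process's RLIMIT_NOFILE hard limit equals fs.nr_open."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return hard == read_nr_open(path)
```

A True result in a login shell only shows that the two numbers coincide, as they do above. It does not show which component copied one into the other.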
@poettering

Member

poettering commented Nov 26, 2018

yes, those sysctls do control what RLIM_INFINITY means for RLIMIT_NOFILE. Still, turning off the bumping of those sysctls is ultimately the wrong approach and simply a work-around. It's important to figure out which piece of code in the PAM hooks actually sets RLIMIT_NOFILE to RLIM_INFINITY (which doesn't happen on Fedora), and fix that.

@poettering

Member

poettering commented Nov 26, 2018

So, i found something interesting: it appears that pam_limits copies resource limits from PID 1 (by accessing /proc/1/limits) in some case. Not sure I grok what that is about, but that's a really bad idea, given that PID 1 internally bumps RLIMIT_NOFILE to super high amounts since it might need to listen to a ton of sockets itself (service processes forked off by PID 1 do not inherit that soft limit though, we are careful to reset that for children). Maybe that's where this is coming from...

@poettering

Member

poettering commented Nov 26, 2018

If you invoke strace like I suggested above and then grep it for RLIMIT_NOFILE, then you'll see something like this:

# grep RLIMIT_NOFILE /tmp/strace.log 
34554 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36496 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36500 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36502 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
34554 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36510 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36524 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36532 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36534 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36569 prlimit64(0, RLIMIT_NOFILE, {rlim_cur=256*1024, rlim_max=256*1024}, NULL) = 0
36572 prlimit64(0, RLIMIT_NOFILE, {rlim_cur=256*1024, rlim_max=256*1024}, NULL) = 0
36578 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0
36583 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=256*1024}) = 0

Of these invocations of prlimit() all but two just query the limit, and don't manipulate it (i.e. the third argument is NULL for them). And the two calls that actually change it (i.e. the two that have the third argument non-NULL) only set soft limit to hard limit for themselves (presumably because the relevant process is fine with large numbers of fds, and knows it's not using select()). Hence, through the whole series of processes until the login prompt is reached the hard limit doesn't change at all! And that's how it should be: what systemd set up for getty@tty5.service propagates all the way down the chain.

When you do the same in Debian I figure you'll find that something in there actually calls prlimit/setrlimit with a limit that is non-NULL. The goal is to figure out what that is.
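
Picking out the calls that set the limit, rather than just query it, can be automated. This is a minimal sketch that assumes strace -f output in the format shown above, with the PID in the first column. The regular expressions and function names are illustrative and cover only prlimit64() lines.

```python
import re

# Matches e.g.
# "36569 prlimit64(0, RLIMIT_NOFILE, {rlim_cur=..., rlim_max=...}, NULL) = 0"
PRLIMIT_RE = re.compile(
    r"^(?P<pid>\d+)\s+prlimit64\(\d+, RLIMIT_NOFILE, "
    r"(?P<new>NULL|\{[^}]*\}), (?P<old>NULL|\{[^}]*\})\)"
)
PAIR_RE = re.compile(r"rlim_cur=(?P<cur>[\w*]+), rlim_max=(?P<max>[\w*]+)")


def _value(expr):
    """Evaluate strace's 'N' or 'A*B' notation; keep symbolic names as text."""
    if not re.fullmatch(r"\d+(\*\d+)*", expr):
        return expr  # e.g. a symbolic infinity constant
    result = 1
    for factor in expr.split("*"):
        result *= int(factor)
    return result


def find_setters(strace_text):
    """Return (pid, soft, hard) for every call whose new-limit argument is non-NULL."""
    setters = []
    for line in strace_text.splitlines():
        m = PRLIMIT_RE.match(line.strip())
        if not m or m.group("new") == "NULL":
            continue  # not a prlimit64 line, or a pure query
        pair = PAIR_RE.search(m.group("new"))
        if pair:
            setters.append(
                (int(m.group("pid")), _value(pair.group("cur")), _value(pair.group("max")))
            )
    return setters
```

Run over a login trace, any returned entry whose hard value differs from what systemd set for the unit points at the process that changed it.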

@mbiebl

Contributor

mbiebl commented Nov 27, 2018

Thanks for your detailed reply @poettering
I apologize for my overly grumpy tone at times and do appreciate that you take the time to answer so patiently.
In my defense, be aware that I just can't easily remodel how the PAM stack works in Debian. I'm neither the maintainer of those packages nor do I have the necessary knowledge for that.
That said, I'll try to digest all this information and hope I can find out more about this.

In the meantime, I hope you are not too disappointed if I use -Dbump-proc-sys-fs-file-max=false -Dbump-proc-sys-fs-nr-open=false to avoid breaking popular applications like AndroidStudio in Debian.

@poettering

Member

poettering commented Nov 27, 2018

Thanks for your detailed reply @poettering
I apologize for my overly grumpy tone at times and do appreciate that you take the time to answer so patiently.

oh, no prob. I doubt I am any better at that ;-)

In my defense, be aware that I just can't easily remodel how the PAM stack works in Debian. I'm neither the maintainer of those packages nor do I have the necessary knowledge for that.
That said, I'll try to digest all this information and hope I can find out more about this.

Yeah, would be great to figure out what's going on.

In the meantime, I hope you are not too disappointed if I use -Dbump-proc-sys-fs-file-max=false -Dbump-proc-sys-fs-nr-open=false to avoid breaking popular applications like AndroidStudio in Debian.

Well, as a short-term fix while figuring out what precisely is going on there that's of course fine to do. Just don't forget to debug this properly, rather than just taping over it ;-)

@fsateler

Member

fsateler commented Nov 28, 2018

I have done the strace dance. I found a lot less output:

% grep RLIMIT_NOFILE /tmp/strace.log                                                                
6813  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=4*1024}) = 0
6813  prlimit64(0, RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024*1024}, NULL) = 0
6855  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
6855  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

However the limit seems to be 1M, not 1G.

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

@poettering
When building git master with -Dbump-proc-sys-fs-file-max=false -Dbump-proc-sys-fs-nr-open=false, I get the following

$ cat /proc/1/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             63279                63279                processes 
Max open files            1048576              1048576              files     
Max locked memory         67108864             67108864             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       63279                63279                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us    

Logging in via getty on a tty, I get

# ulimit -H -n
1048576

Logging in via debug-shell on tty9

# ulimit -H -n
262144

Doing the same with -Dbump-proc-sys-fs-file-max=true -Dbump-proc-sys-fs-nr-open=true

# cat /proc/1/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             63279                63279                processes 
Max open files            1073741816           1073741816           files     
Max locked memory         67108864             67108864             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       63279                63279                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us 

getty

# ulimit -H -n
1073741816

debug-shell

# ulimit -H -n
262144

This looks like getty inherits the limits from PID1.

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

Next try: commented out pam_limits.so in /etc/pam.d/login
Now ulimit -H -n yields 262144. So pam_limits.so does indeed look like it's responsible for setting RLIMIT_NOFILE

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

I'm a bit undecided about this: If we actually want to set up RLIMIT_NOFILE for login sessions, wouldn't it be more intuitive and straightforward to do that via a /etc/security/limits.d/systemd.conf containing

* hard nofile 262144

This might be an acceptable solution for Debian, so we don't need to set -Dbump-proc-sys-fs-file-max=false -Dbump-proc-sys-fs-nr-open=false and don't need to patch pam_limits.so.
That said, we should probably be compatible with older Debian releases and set nofile to 1048576

@fsateler wdyt?

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

@xnox , @martinpitt your input on this would be welcome as well.

@keszybz

Member

keszybz commented Nov 28, 2018

https://sources.debian.org/src/pam/1.1.8-3.8/modules/pam_limits/pam_limits.c/#L369
The code is in "upstream" pam. At least Fedora also has it.

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

@keszybz hm, any idea why pam_limits behaves differently then on Fedora? Apparently it doesn't use parse_kernel_limits() there...

@poettering

Member

poettering commented Nov 28, 2018

I filed linux-pam/linux-pam#85 now, btw. Maybe the PAM maintainers will fix this themselves, we'll see.

@poettering

Member

poettering commented Nov 28, 2018

I'm a bit undecided about this: If we actually want to set up RLIMIT_NOFILE for login sessions, wouldn't it be more intuitive and straightforward to do that via a /etc/security/limits.d/systemd.conf containing

* hard nofile 262144

This might be an acceptable solution for Debian, so we don't need to set -Dbump-proc-sys-fs-file-max=false -Dbump-proc-sys-fs-nr-open=false and don't need to patch pam_limits.so.
That said, we should probably be compatible with older Debian releases and set nofile to 1048576

Well, the reason we changed systemd to bump the sysctls was to reduce the artificial limits imposed, simplify things, and allow people to change the default rlimit system-wide easily in system.conf, so that this would apply to the whole system: system services and PAM sessions. Hence it's definitely preferable if pam_limits would not fiddle with the limit at all and just leave it set.

the fewer knobs affecting the limits by default the better... with the systemd upstream approach and the way pam_limits apparently works on Fedora there's only one knob that by default is used: DefaultLimitNOFILE= in system.conf. all the other knobs default to "no effect" now (but if people want to set them, they can and they will have an effect then).

@keszybz

Member

keszybz commented Nov 28, 2018

Hmm, the code should only hit that code path when the set_all option is used for pam_limits.so. Fedora does not set this option, and does not hit this code path. Debian doesn't set the option either, but it gets the limit anyway. When I add set_all under Fedora, I get the Debian behaviour...

@keszybz keszybz added the not-our-bug label Nov 28, 2018

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

Ok, so we are slowly getting to the bottom of this:
https://sources.debian.org/src/pam/1.1.8-3.8/debian/patches-applied/027_pam_limits_better_init_allow_explicit_root/#L66

This patch removes the if (ctrl & PAM_SET_ALL) { check.
So Debian's pam_limits.so behaves as if set_all is always set.
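
To make the effect concrete, here is a deliberately simplified toy model of the behaviour described in this thread. It is not pam_limits' actual logic: it ignores limits.conf entries and every other resource limit, and the function and parameter names are made up for illustration.

```python
def hard_nofile_after_pam(inherited_hard, pid1_hard, set_all=False, check_removed=False):
    """Toy model of which RLIMIT_NOFILE hard limit a login session ends up with.

    inherited_hard: limit the session process inherited from its service unit
    pid1_hard:      hard limit listed in /proc/1/limits
    set_all:        whether the set_all option is passed to pam_limits.so
    check_removed:  True models the Debian patch dropping the PAM_SET_ALL check
    """
    if set_all or check_removed:
        # Limits are copied from PID 1, replacing what was inherited.
        return pid1_hard
    return inherited_hard


if __name__ == "__main__":
    # Values taken from the measurements earlier in this thread.
    print("Fedora-like:", hard_nofile_after_pam(262144, 1073741816))
    print("Debian-like:", hard_nofile_after_pam(262144, 1073741816, check_removed=True))
```

In this model the unpatched module leaves the 262144 inherited from the getty unit alone, while the patched one hands the session PID 1's 1073741816, matching the getty results reported above.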

@poettering

Member

poettering commented Nov 28, 2018

indeed, @mbiebl you found it!

@poettering

Member

poettering commented Nov 28, 2018

that patch looks a bit misguided to me. i mean if there's a security boundary from one user to another, it should be the job of the tool involved to reset the limits, not of PAM. PAM is not the place to ensure well-defined execution environments, that's really the job of su/sudo or whatever calls into PAM...

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

Now, the thing is, PAM seems to be in a sad state in Debian and doesn't appear to have an active maintainer. The last maintainer upload was in 2014. I don't have high hopes to get a reaction if I file a bug report against the Debian pam package :-/

@keszybz

Member

keszybz commented Nov 28, 2018

This is something to figure out on the Debian side then. Either remove the misguided patch from PAM, or put some kludge in systemd to make PAM behave. (An interesting hack would be to mount something over /proc/1/limits, with the contents chosen to make pam_limits.so behave.)

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

Indeed. Thanks for all the help tracking this down.

@mbiebl mbiebl closed this Nov 28, 2018

@mbiebl

Contributor

mbiebl commented Nov 28, 2018

@keszybz I guess a simpler (temporary) workaround could be to ship a /etc/security/limits.d/systemd.conf config file with reasonable defaults. Such a pam config could be shipped by the systemd package, so would be under our control until we have a fixed pam package.

@xnox

Contributor

xnox commented Nov 28, 2018

@vorlonofportland Any thoughts on this?

@poettering

Member

poettering commented Nov 28, 2018

(btw, if anyone cares about java, it might be nice to file a bug upstream against the jdk so that they stop allocating such huge RLIMIT_NOFILE sized arrays. Even if the 512K hard limit we now default to means only 4M are wasted this way, they are still wasted...)

@mbiebl

Contributor

mbiebl commented Dec 23, 2018

@poettering kdeinit5 is broken by this in a similar way:
https://github.com/KDE/kinit/blob/master/src/kdeinit/kinit.cpp#L163

We have user reports in Debian, where startkde spins for minutes busily closing file descriptors.
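
A rough illustration of why such a loop gets slow follows. It assumes the loop closes every descriptor number up to a bound derived from RLIMIT_NOFILE. The linked kinit code is not reproduced here, and both function names are hypothetical.

```python
import os


def naive_close_attempts(limit, first_fd=3):
    """Number of close() calls made by a loop over range(first_fd, limit)."""
    return max(0, limit - first_fd)


def open_fds():
    """List the descriptors this process actually has open (Linux /proc only)."""
    return sorted(int(name) for name in os.listdir("/proc/self/fd"))


if __name__ == "__main__":
    for limit in (1024, 1048576, 1073741816):
        print(f"limit={limit:>10}: {naive_close_attempts(limit)} close() calls")
    print("actually open:", open_fds())
```

With a bound of 1073741816 the loop issues over a billion system calls, almost all of them on descriptors that were never open. Enumerating /proc/self/fd instead touches only the descriptors that exist.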

@rastersoft

rastersoft commented Dec 24, 2018

So, is there a workaround that we can manually apply temporarily to our sid systems? I use MPlabX and SimplicityStudio (which are Java-based IDEs) and have this same problem.

Thanks.

@adamnew123456

adamnew123456 commented Dec 25, 2018

@rastersoft You can add an entry to /etc/security/limits.conf which limits the value of nofile to something smaller.

For reference, I had the same issue with IDEA that the OP did, and adding this line (chris is my primary login) and logging out and in again fixed the issue:

chris    hard    nofile    4096
@cpw

cpw commented Dec 26, 2018

Fixed on debian sid with 240 by doing:

echo >/etc/security/limits.d/systemd.conf "* hard nofile 1048576"

And rebooting.

This is a ridiculous thing. Everything Java is broken because of this change. 👌👌👌
Since this looks like it might be a debian Pam "patch" that's causing this, I've filed a bug with debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=917374
