Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unkillable connections to ftp port 21 #1384

Closed
gojun077 opened this issue Jan 26, 2022 · 46 comments
Closed

unkillable connections to ftp port 21 #1384

gojun077 opened this issue Jan 26, 2022 · 46 comments

Comments

@gojun077
Copy link

What I Did

I have been using proftpd 1.2.x in a production environment for several years and recently upgraded to the latest stable 1.3.7c and run proftpd inside a Docker container. Since the upgrade, security scanners like Qualys and random (malicious?) users create hanging connections stuck on status "Authenticating" that are unkillable with ftpdctl and that ignore /etc/proftpd.conf settings "TimeoutNoTransfer", "TimeoutStalled", "TimeoutIdle". I am able to kill the connections by using the connection PID from ftptop and executing kill -9 <PID>. These hanging connections are causing CPU 100%.

Here's what I see in the docker logs for proftpd for one such malicious connection that was made on Jan 13 and remained in COMMAND (authenticating) for 270h!:

{"log":"2022-01-13 23:35:22,627 ce99838480c9 proftpd[17641] 172.31.0.2 (193.187.91.226[193.187.91.226]): mod_delay/0.7: no DelayOnEvent rules configured with \"DelayTable none\" in effect, disabling module\n","stream":"stderr","time":"2022-01-13T23:35:22.627867719Z"}
{"log":"2022-01-13 23:35:22,628 ce99838480c9 proftpd[17641] 172.31.0.2 (193.187.91.226[193.187.91.226]): FTP session opened.\n","stream":"stderr","time":"2022-01-13T23:35:22.628442412Z"}
{"log":"2022-01-13 23:35:22,706 ce99838480c9 proftpd[17641] 172.31.0.2 (193.187.91.226[193.187.91.226]): USER admin: no such user found from 193.187.91.226 [193.187.91.226] to 172.31.0.2:21\n","stream":"stderr","time":"2022-01-13T23:35:22.707114467Z"}
{"log":"2022-01-13 23:35:22,786 ce99838480c9 proftpd[17641] 172.31.0.2 (193.187.91.226[193.187.91.226]): USER admin: no such user found from 193.187.91.226 [193.187.91.226] to 172.31.0.2:21\n","stream":"stderr","time":"2022-01-13T23:35:22.786513963Z"}

The attacker tries to login with an invalid user admin but this connection is never terminated until I discover it 1.5 weeks after Jan 13 and manually kill the PID.

What I Expected/Wanted

I expect that connections should be dropped after failing user/passwd auth and that the proftpd.conf settings for "TimeoutNoTransfer", "TimeoutStalled", "TimeoutIdle" should apply to all connections. I also expected that ftpdctl kick <IP> should work on stalled connections. Note that ftpdctl kick works fine on non-zombie/malicious ftp sessions.

ProFTPD Version and Configuration

Please help us reproduce the problem/issue you are encountering. To do this,
we need to know which version of ProFTPD you are using, how it was built,
etc. The following command is an easy way to get all of this information:

$ proftpd -V
Compile-time Settings:
  Version: 1.3.7c (maint)
  Platform: LINUX [Linux 5.4.0-1057-gcp x86_64]
  Built: Sun Sep 5 2021 00:00:00 UTC
  Built With:
    configure  '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-depend
ency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--da
tadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var'
'--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--libexecdir=/usr/libexec/proftpd' '
--localstatedir=/run/proftpd' '--disable-strip' '--enable-ctrls' '--enable-dso' '--enable-facl' '--enable-ipv6' '--enab
le-memcache' '--enable-nls' '--enable-openssl' '--disable-pcre' '--disable-redis' '--enable-shadow' '--enable-tests=non
etwork' '--with-libraries=/usr/lib64/mariadb' '--with-includes=/usr/include/mysql' '--with-modules=mod_readme:mod_auth_
pam:mod_tls' '--with-shared=mod_sql:mod_sql_passwd:mod_sql_mysql:mod_sql_postgres:mod_sql_sqlite:mod_quotatab:mod_quota
tab_file:mod_quotatab_ldap:mod_quotatab_radius:mod_quotatab_sql:mod_ldap:mod_ban:mod_ctrls_admin:mod_facl:mod_load:mod_
vroot:mod_radius:mod_ratio:mod_rewrite:mod_site_misc:mod_exec:mod_shaper:mod_wrap2:mod_wrap2_file:mod_wrap2_sql:mod_cop
y:mod_deflate:mod_ifversion:mod_qos:mod_sftp:mod_sftp_pam:mod_sftp_sql:mod_tls_shmcache:mod_tls_memcache:mod_unique_id:
mod_ifsession' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CC=gcc' 'CFLAGS=-O2 -flto=au
to -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
-Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/r
edhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' '
LDFLAGS=-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redh
at/redhat-annobin-cc1 ' 'CXX=g++' 'CXXFLAGS=-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pip
e -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-har
dened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-u
nwind-tables -fstack-clash-protection -fcf-protection'

  CFLAGS: -g2 -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-securit
y -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-st
rong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-pro
tection -fcf-protection -Wall -fno-omit-frame-pointer -fno-strict-aliasing -Werror=implicit-function-declaration
  LDFLAGS: -L$(top_srcdir)/lib -L$(top_builddir)/lib -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redha
t/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -rdynamic -L/usr/lib64/mariadb -L/usr/lib64/ -L/usr
/lib64
  LIBS: -lacl  -lssl -lcrypto -lcap  -lssl -lcrypto  -lpam -lsupp -lattr -lresolv -lresolv -lcrypt -lmemcachedutil -lme
mcached

  Files:
    Configuration File:
      /etc/proftpd.conf
    Pid File:
      /run/proftpd/proftpd.pid
    Scoreboard File:
      /run/proftpd/proftpd.scoreboard
    Header Directory:
      /usr/include/proftpd
    Shared Module Directory:
      /usr/libexec/proftpd

  Info:
    + Max supported UID: 4294967295
    + Max supported GID: 4294967295

  Features:
    - Autoshadow support
    + Controls support
    + curses support
    - Developer support
    + DSO support
    + IPv6 support
    + Largefile support
    - Lastlog support
    + Memcache support
    + ncursesw support
    + NLS support
    + OpenSSL support (OpenSSL 1.1.1k  FIPS 25 Mar 2021, FIPS enabled)
    - PCRE support
    + POSIX ACL support
    - Redis support
    + Sendfile support
    + Shadow file support
    - Sodium support
    + Trace support
    + xattr support

  Tunable Options:
    PR_TUNABLE_BUFFER_SIZE = 1024
    PR_TUNABLE_DEFAULT_RCVBUFSZ = 8192
    PR_TUNABLE_DEFAULT_SNDBUFSZ = 8192
    PR_TUNABLE_ENV_MAX = 2048
    PR_TUNABLE_GLOBBING_MAX_MATCHES = 100000
    PR_TUNABLE_GLOBBING_MAX_RECURSION = 8
    PR_TUNABLE_HASH_TABLE_SIZE = 40
    PR_TUNABLE_LOGIN_MAX = 256
    PR_TUNABLE_NEW_POOL_SIZE = 512
    PR_TUNABLE_PATH_MAX = 4096
    PR_TUNABLE_SCOREBOARD_BUFFER_SIZE = 80
    PR_TUNABLE_SCOREBOARD_SCRUB_TIMER = 30
    PR_TUNABLE_SELECT_TIMEOUT = 30
    PR_TUNABLE_TIMEOUTIDENT = 10
    PR_TUNABLE_TIMEOUTIDLE = 600
    PR_TUNABLE_TIMEOUTLINGER = 10
    PR_TUNABLE_TIMEOUTLOGIN = 300
    PR_TUNABLE_TIMEOUTNOXFER = 300
    PR_TUNABLE_TIMEOUTSTALLED = 3600
    PR_TUNABLE_XFER_SCOREBOARD_UPDATES = 10

I am using the proftpd v1.3.7c package provided by Fedora 35.

In addition, we need to see all of the ProFTPD configuration files you
are using (minus any sensitive information like passwords, of course). Armed
with the version and configuration data, then, we can set up ProFTPD locally
using the same configuration, and see what happens.

proftpd conf files:

proftpd-bug-rpt.tar.gz

ftptop screen shots of zombie connections

Screen Shot 2021-12-28 at 11 33 18

Screen Shot 2022-01-25 at 15 17 38

@Castaglia Castaglia self-assigned this Jan 30, 2022
@Castaglia
Copy link
Member

Thanks for the detailed bug report! For these scanner connections, can you tell if they are only sending the USER command (for e.g. "admin"), or do they appear to hang after sending a PASS command? There's a lot more machinery involved in the processing of a PASS command; if these "unkillable connections" happen only with USER commands, that will narrow down the code paths to examine...

@Castaglia
Copy link
Member

One way to tell about the above question is to use ps auxwww (especially as root); ProFTPD updates the process title for each session with its current/recent command. This means that ps auxwww will show a little more information than ftptop; I'm hoping that it can show us the last command/activity for these hanging connections.

@gojun077
Copy link
Author

One way to tell about the above question is to use ps auxwww (especially as root)

Sure -- I need to add ps to the stripped-down fedora35 container environment running proftpd and then next time I see a hanging connection I will run ps auxwww and share my findings with you. btw I am currently on holiday for the Lunar New Year in Asia so it will be several days before I can take a look at this. Thanks for your help.

@gojun077
Copy link
Author

gojun077 commented Mar 8, 2022

Sorry for the delay in getting back to you! I created a new proftpd container image that includes ps and when I got another hanging connection I ran ftptop and ps auxwww. Here are the screenshots (although I blurred out the user name and IP from ps).
ftptop_hanging_cnxn

psauxwww_hanging_cnxn

Note that this time the hanging connection came from a normal user I gave access to, not a scanner like Qualys.
The only difference between the ftptop and ps output is that the latter also shows the username of the connection stuck in "authenticating".

Thanks

@Castaglia
Copy link
Member

Thanks! For that normal user session, did the client eventually send the PASS command (to complete the authentication), or did it just send USER, and then do nothing?

@gojun077
Copy link
Author

For the normal user session w/ PID 32148 the connection got stuck after USER was sent, but I think this is an isolated incident b/c user al... has been using proftpd without problems.

Castaglia added a commit that referenced this issue Mar 24, 2022
…king signals, then forgetting to unblock them in some error conditions; such cases _might_ cause this reported behavior.
Castaglia added a commit that referenced this issue Mar 26, 2022
…gnals-issue1384

Issue #1384: I reviewed the codebase for cases where we might be bloc…
Castaglia added a commit that referenced this issue Mar 26, 2022
… long-running `while` loops might be lacking appropriate signal handling.

I didn't find many, and none immediately stood out to me as "smoking guns"
for the reported behavior.  Still, I think it's worth addressing them.
Castaglia added a commit that referenced this issue Mar 27, 2022
…ue1384

Issue #1384: I reviewed the codebase for cases where some potentially…
@Castaglia
Copy link
Member

OK. I'm poring over the code base, to see what else might be happening here. In the mean time, the next time you see one of these stuck/unkillable processes, can you try doing a strace -f -p $PID on the process ID? That might provide some more information about what system calls it is doing, which can then help me narrow down where in the code those system calls are happening. Thanks!

@jpfinger
Copy link

jpfinger commented Apr 5, 2022

We are also seeing this issue, though only from malicious connections so far.

An strace -f -p of the PID shows the same thing looping (two entries here):

--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0
rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0
alarm(5) = 0
rt_sigreturn({mask=[]}) = 35984944
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0
rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0
alarm(5) = 0
rt_sigreturn({mask=[]}) = 35952464

Our config:

#  /opt/proftpd/sbin/proftpd -V
Compile-time Settings:
  Version: 1.3.7d (git)
  Platform: LINUX [Linux 3.10.0-1160.53.1.el7.x86_64 x86_64]
  Built: Fri Feb 25 2022 20:09:42 GMT
  Built With:
    configure  '--prefix=/opt/proftpd-1.3.7d-56d500e' '--enable-ctrls' '--with-modules=mod_sftp:mod_tls:mod_ifsession:mod_ban:mod_auth_pam:mod_vroot'

  CFLAGS: -g2 -O2 -Wall -fno-omit-frame-pointer -fno-strict-aliasing
  LDFLAGS: -L$(top_srcdir)/lib -L$(top_builddir)/lib  -rdynamic
  LIBS:  -lssl -lcrypto -lpam  -lssl  -lcrypto -lsupp -lcrypt -ldl

  Files:
    Configuration File:
      /opt/proftpd-1.3.7d-56d500e/etc/proftpd.conf
    Pid File:
      /opt/proftpd-1.3.7d-56d500e/var/proftpd.pid
    Scoreboard File:
      /opt/proftpd-1.3.7d-56d500e/var/proftpd.scoreboard

  Info:
    + Max supported UID: 4294967295
    + Max supported GID: 4294967295

  Features:
    - Autoshadow support
    + Controls support
    - curses support
    - Developer support
    - DSO support
    + IPv6 support
    + Largefile support
    - Lastlog support
    - Memcache support
    - ncurses support
    - NLS support
    + OpenSSL support (OpenSSL 1.0.2k  26 Jan 2017, FIPS enabled)
    - PCRE support
    - POSIX ACL support
    - Redis support
    + Sendfile support
    + Shadow file support
    - Sodium support
    + Trace support
    + xattr support

  Tunable Options:
    PR_TUNABLE_BUFFER_SIZE = 1024
    PR_TUNABLE_DEFAULT_RCVBUFSZ = 8192
    PR_TUNABLE_DEFAULT_SNDBUFSZ = 8192
    PR_TUNABLE_ENV_MAX = 2048
    PR_TUNABLE_GLOBBING_MAX_MATCHES = 100000
    PR_TUNABLE_GLOBBING_MAX_RECURSION = 8
    PR_TUNABLE_HASH_TABLE_SIZE = 40
    PR_TUNABLE_LOGIN_MAX = 256
    PR_TUNABLE_NEW_POOL_SIZE = 512
    PR_TUNABLE_PATH_MAX = 4096
    PR_TUNABLE_SCOREBOARD_BUFFER_SIZE = 80
    PR_TUNABLE_SCOREBOARD_SCRUB_TIMER = 30
    PR_TUNABLE_SELECT_TIMEOUT = 30
    PR_TUNABLE_TIMEOUTIDENT = 10
    PR_TUNABLE_TIMEOUTIDLE = 600
    PR_TUNABLE_TIMEOUTLINGER = 10
    PR_TUNABLE_TIMEOUTLOGIN = 300
    PR_TUNABLE_TIMEOUTNOXFER = 300
    PR_TUNABLE_TIMEOUTSTALLED = 3600
    PR_TUNABLE_XFER_SCOREBOARD_UPDATES = 10

The OS is Centos 7.9, kernel 3.10.0-1160.53.1.el7.x86_64.

@Castaglia
Copy link
Member

@jpfinger Thanks for that info, that helps! The next time you can, can you run strace -f -p -t -T? The additional -t -T options will add the wall clock timestamps, and time spent in syscalls, to the strace output. For the above SIGALRM loop you provided, the timing can help us narrow down which timers are causing the SIGARLM signals, which in turn can narrow down the code path in question.

@jpfinger
Copy link

Unfortunately I had to kill the existing stuck sessions we had and we haven't seen any more yet. This is still on my radar and will get an update when we see it again.

@jpfinger
Copy link

Ok, we have one currently stuck at ~20 hours. Someone knows we're talking about them! I'll hold off on killing this one for a bit.

# strace -f -t -T -p 28387
strace: Process 28387 attached
19:13:36 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
19:13:36 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000019>
19:13:36 rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0 <0.000022>
19:13:36 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000016>
19:13:36 alarm(5)                       = 0 <0.000031>
19:13:36 rt_sigreturn({mask=[]})        = 40489792 <0.000017>
19:13:41 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
19:13:41 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000026>
19:13:41 rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0 <0.000021>
19:13:41 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000021>
19:13:41 alarm(5)                       = 0 <0.000022>
19:13:41 rt_sigreturn({mask=[]})        = 36750912 <0.000022>
19:13:46 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
19:13:46 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000025>
19:13:46 rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0 <0.000027>
19:13:46 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000020>
19:13:46 alarm(5)                       = 0 <0.000021>
19:13:46 rt_sigreturn({mask=[]})        = 35991104 <0.000021>
19:13:51 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
19:13:51 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000025>
19:13:51 rt_sigaction(SIGALRM, NULL, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, 8) = 0 <0.000022>
19:13:51 rt_sigaction(SIGALRM, {sa_handler=0x422ab0, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7fc76b4f0400}, NULL, 8) = 0 <0.000021>
19:13:51 alarm(5)                       = 0 <0.000021>
19:13:51 rt_sigreturn({mask=[]})        = 38513872 <0.000021>

@Castaglia
Copy link
Member

Thanks! From that trace output, we do see that the SIGARLM signal is being sent to the process every 5 secs. None of the hardcoded default timeouts are 5s; @jpfinger do you have any 5s timeouts in your ProFTPD configuration anywhere (including contrib modules)?

@jpfinger
Copy link

In fact we do in mod_ctrls:


...
<IfModule mod_ctrls.c>
  ControlsEngine        on
  ControlsACLs          all allow user root
  ControlsMaxClients    2
  ControlsLog           /some/path/logs/controls.log
  ControlsInterval      5
  ControlsSocket        /some/path/ctrls.sock
  ControlsSocketOwner   root root
  ControlsSocketACL     allow user root
  <IfModule mod_ctrls_admin.c>
    AdminControlsACLs all allow user root
  </IfModule>
</IfModule>
...

@Castaglia
Copy link
Member

OK, great. I'll work through the mod_ctrls code, with an eye toward newly forming connections/sessions, see if anything turns up.

In the mean time, for the next time you see such unkillable processes, in addition to the strace -f -p -t -T command, I'm thinking it might (or might not) be useful to configure trace logging for timers, e.g.:

TraceLog /path/to/proftpd/trace.log
Trace timer:30

and then search the generated TraceLog for the PID of the unkillable process. That should confirm that it is indeed the ControlsInterval timer which keeps firing. In theory, it should not be that particular timer, since mod_ctrls is supposed to remove that timer on TCP connect (see here); that timer is meant only for the listening daemon process, not the forked session process.

@Castaglia
Copy link
Member

Castaglia commented Apr 23, 2022

Hmm; it turns out that there is another way that we might see these 5s SIGALRM signals, without any explicitly configured timeouts matching that 5s: see https://github.com/proftpd/proftpd/blob/master/src/timers.c#L191.

The Timer API is not entirely accurate, and in this case, it will default to checking any registered timer intervals/callbacks every 5s anyway. So that's possibly what's happening with this unkillable process.

Unfortunately, that doesn't provide any clues yet about how the process got into this state. But the fact that it's "unkillable" does suggest that timers/alarms are blocked for that process somehow -- I'll be adding trace logging, for the timer trace channel, for when alarms are being blocked/unblocked, to aid in this diagnosis.

@Castaglia
Copy link
Member

@jpfinger for your stuck PID 28387, in the strace output, is there anything other than the SIGALRM? That is, in addition to this 5s SIGLARM signal being delivered, I would also expect to see some other system calls, such as read(2) on a socket or somesuch...

Castaglia added a commit that referenced this issue Apr 24, 2022
…y/tracing, and add trace logging of alarm blocking.
Castaglia added a commit that referenced this issue Apr 24, 2022
…y/tracing, and add trace logging of alarm blocking.
Castaglia added a commit that referenced this issue Apr 25, 2022
…y/tracing, and add trace logging of alarm blocking.
Castaglia added a commit that referenced this issue Apr 25, 2022
…y/tracing, and add trace logging of alarm blocking.
@jpfinger
Copy link

@Castaglia I ran the strace against it for 30 minutes this morning and all I got were the quoted SIGALRM lines.

Castaglia added a commit that referenced this issue Apr 27, 2022
…y/tracing, and add trace logging of alarm blocking.
Castaglia added a commit that referenced this issue Apr 27, 2022
…y/tracing, and add trace logging of alarm blocking.
@manuelm
Copy link
Contributor

manuelm commented Nov 4, 2022

no difference :/ compiled with -O0 but p->tag is already NULL. I've also disabled two of my own modules which both execute code prior to a successful login.

@Castaglia
Copy link
Member

Castaglia commented Nov 5, 2022

@manuelm No worries. At least my PR didn't cause regressions; I'll merge it anyway.

Now this means I'll need to work on reproducing this behavior locally even more. So I have some more questions; I know that you may not have answers, but as many details as you can provide will help:

  • How often do these unkillable processes occur? A handful a day (out of how many good connections), or more?

We know (from other reports in this issue) that these "stuck" processes show as "authenticating" in ftptop, which means that they did not complete the USER/PASS authentication. So, as far as you can tell:

  • Do these stuck processes happen for simple TCP connections, without TLS handshakes?
  • Do these stuck processes happen for TLS handshakes, but no USER command?
  • Do these stuck processes happen for TLS handshakes with USER, but then are rejected by your e.g. mod_dovecot_anvil module during the authentication process?
  • Do these stuck processes happen for TLS handshakes with USER, during the handling of a PASS command?
  • Do these stuck processes happen only for clients in your xclient class (or for clients not in the xclient class)?
  • Do these stuck processes happen for clients which have sent multiple USER/PASS where the previous attempts failed (due to e.g. wrong password)?

Each of these presents/exercises slightly different code paths, and I'll work on each one of them, locally, to see if I have any luck. Thanks in advance for all the info you've already provided!

Castaglia added a commit that referenced this issue Nov 5, 2022
Issue #1384: One possible cause of the infinite loops reported in thi…
Castaglia added a commit that referenced this issue Nov 5, 2022
…ussed out one more use-after-free code path.
Castaglia added a commit that referenced this issue Nov 5, 2022
…ced that the Auth API creates `cmd_recs` whose `pool` and `tmp_pool` pointers point to the same pool, which is somewhat surprising.

So let's use separate pools for those.
Castaglia added a commit to Castaglia/proftpd-mod_vroot that referenced this issue Nov 5, 2022
…ould _unmount_ its FS, rather than unregistering the FS and _assuming_ the unregistered FS is that from `mod_vroot`.

There are interactions (such as failed authentications, or successful authentications where `mod_vroot` should not apply) where we may have been unregistering the system FS, leading to segfaults; see: proftpd/proftpd#1384
Castaglia added a commit that referenced this issue Nov 5, 2022
Issue #1384: While examining the reported stack trace closely, I noti…
@Castaglia
Copy link
Member

@manuelm I think I was able to recreate this issue -- or something very like it! I was able to trigger a segfault, under gdb, using mod_vroot and FTP, where the first USER/PASS attempt failed (I deliberately used a wrong password), and the next USER/PASS attempt succeeded. Or rather, the password was correct -- and that's when the segfault occurred. And it produced a stack trace very much like the one you reported:

2022-11-05 20:26:36,267 23d40e87b2f2 proftpd[10578] localhost (localhost[127.0.0.1]): dispatching PRE_CMD command 'PASS (hidden)' to mod_vroot
2022-11-05 20:26:36,267 23d40e87b2f2 proftpd[10578] localhost (localhost[127.0.0.1]): mod_vroot/0.9.10: vroot registered
2022-11-05 20:26:36,267 23d40e87b2f2 proftpd[10578] localhost (localhost[127.0.0.1]): dispatching PRE_CMD command 'PASS (hidden)' to mod_delay
2022-11-05 20:26:36,267 23d40e87b2f2 proftpd[10578] localhost (localhost[127.0.0.1]): dispatching PRE_CMD command 'PASS (hidden)' to mod_auth

Program received signal SIGSEGV, Segmentation fault.
0x00005555555cbc9d in pr_fsio_close (fh=0x555555973be8) at fsio.c:5109
5109	  pr_trace_msg(trace_channel, 8, "using %s close() for path '%s'", fs->fs_name,
(gdb) p fs
$1 = (pr_fs_t *) 0x0
(gdb) bt
#0  0x00005555555cbc9d in pr_fsio_close (fh=0x555555973be8) at fsio.c:5109
#1  0x0000555555618fef in af_endpwent () at mod_auth_file.c:873
#2  0x0000555555619bca in authfile_endpwent (cmd=0x5555559730f8)
    at mod_auth_file.c:1156
#3  0x00005555555b7f24 in pr_module_call (m=0x5555558cf1e0 <auth_file_module>, 
    func=0x555555619baa <authfile_endpwent>, cmd=0x5555559730f8)
    at modules.c:59
#4  0x00005555555bd19f in dispatch_auth (cmd=0x5555559730f8, 
    match=0x55555568da91 "endpwent", m=0x0) at auth.c:430
#5  0x00005555555bd3c5 in pr_auth_endpwent (p=0x5555559730b0) at auth.c:492
#6  0x0000555555622611 in auth_pre_pass (cmd=0x555555972008) at mod_auth.c:2494
#7  0x00005555555b7f24 in pr_module_call (m=0x5555558cf520 <auth_module>, 
    func=0x5555556225dd <auth_pre_pass>, cmd=0x555555972008) at modules.c:59
#8  0x0000555555578a7e in _dispatch (cmd=0x555555972008, cmd_type=1, 
    validate=0, match=0x5555559720a0 "PASS") at main.c:366
#9  0x0000555555579445 in pr_cmd_dispatch_phase (cmd=0x555555972008, phase=0, 
    flags=3) at main.c:678
#10 0x000055555557993c in pr_cmd_dispatch (cmd=0x555555972008) at main.c:801
#11 0x0000555555579dda in cmd_loop (server=0x555555923378, c=0x5555559508a8)
    at main.c:944
#12 0x000055555557b1a5 in fork_server (fd=1, l=0x555555950448, 
    no_fork=1 '\001') at main.c:1517
#13 0x000055555557ba00 in daemon_loop () at main.c:1766

I'm not 100% convinced that this is the exact same issue, but it looks similar. The fix for this segfault of mine is here: Castaglia/proftpd-mod_vroot#38

It could be the some of my latest commits to ProFTPD, in conjunction with the above mod_vroot change, are needed? Care to see if they change this behavior (hopefully for the better) in your environment?

Castaglia added a commit to Castaglia/proftpd-mod_vroot that referenced this issue Nov 5, 2022
…ould _unmount_ its FS, rather than unregistering the FS and _assuming_ the unregistered FS is that from `mod_vroot`.

There are interactions (such as failed authentications, or successful authentications where `mod_vroot` should not apply) where we may have been unregistering the system FS, leading to segfaults; see: proftpd/proftpd#1384
@manuelm
Copy link
Contributor

manuelm commented Nov 7, 2022

I'm not 100% convinced that this is the exact same issue, but it looks similar. The fix for this segfault of mine is here: Castaglia/proftpd-mod_vroot#38

Actually the behavior is identical on my system. A successful login following after an unsuccessful login runs into the known clear_pool-loop. However the mod_vroot patch does not fix the behavior on my systems. Furthermore enabling mod_wrap2 + the failed login + good login-test now results in a SIGSEGV.

0x0000563c45a50383 in pr_fsio_lseek (fh=0x563c4760bf48, offset=0, whence=0) at fsio.c:5340
5340      while (fs && fs->fs_next && !fs->lseek) {
(gdb) bt
#0  0x0000563c45a50383 in pr_fsio_lseek (fh=0x563c4760bf48, offset=0, whence=0) at fsio.c:5340
#1  0x0000563c45a8081b in af_setpwent (p=0x563c4760c420) at mod_auth_file.c:1003
#2  0x0000563c45a82671 in authfile_getpwnam (cmd=0x563c47607a08) at mod_auth_file.c:1172
#3  0x0000563c45a433ee in pr_module_call (m=0x563c45b86ea0 <auth_file_module>, func=0x563c45a82651 <authfile_getpwnam>, cmd=cmd@entry=0x563c47607a08)
    at modules.c:59
#4  0x0000563c45a461db in dispatch_auth (cmd=cmd@entry=0x563c47607a08, match=match@entry=0x563c45b26344 "getpwnam", m=m@entry=0x7ffe00b11d50) at auth.c:425
#5  0x0000563c45a499b2 in pr_auth_getpwnam (p=p@entry=0x563c476079c0, name=name@entry=0x7ffe00b11e40 "ud_3") at auth.c:626
#6  0x0000563c45a4e36e in pr_fs_interpolate (path=path@entry=0x563c475aee3d "~/ftp.allow", buf=buf@entry=0x7ffe00b12060 "", buflen=buflen@entry=4096)
    at fsio.c:2777
#7  0x0000563c45a4dcdb in pr_fs_interpolate (buflen=4096, buf=0x7ffe00b12060 "", path=0x563c475aee3d "~/ftp.allow") at fsio.c:2688
#8  pr_fs_resolve_partial (path=path@entry=0x563c475aee3d "~/ftp.allow", buf=buf@entry=0x7ffe00b170f0 "", buflen=buflen@entry=4096, op=op@entry=0)
    at fsio.c:2829
#9  0x0000563c45a291e3 in dir_realpath (p=p@entry=0x563c4760e460, path=path@entry=0x563c475aee3d "~/ftp.allow") at support.c:612
#10 0x0000563c45a9e126 in filetab_open_cb (parent_pool=<optimized out>, srcinfo=0x563c475aee3d "~/ftp.allow") at mod_wrap2_file.c:263
#11 0x0000563c45a9f38e in wrap2_open_table (name=0x563c475aee38 "file") at mod_wrap2.c:190
#12 0x0000563c45a9f5d1 in wrap2_allow_access (conn=conn@entry=0x7ffe00b182b0) at mod_wrap2.c:1223
#13 0x0000563c45aa01ac in wrap2_pre_pass (cmd=0x563c475ca978) at mod_wrap2.c:1903
#14 0x0000563c45a433ee in pr_module_call (m=0x563c45b88620 <wrap2_module>, func=0x563c45a9fd24 <wrap2_pre_pass>, cmd=cmd@entry=0x563c475ca978)
    at modules.c:59
#15 0x0000563c45a16e5b in _dispatch (cmd=cmd@entry=0x563c475ca978, cmd_type=cmd_type@entry=1, validate=validate@entry=0, match=0x563c475caa10 "PASS", 
    match@entry=0x0) at main.c:366
#16 0x0000563c45a19a6a in pr_cmd_dispatch_phase (cmd=cmd@entry=0x563c475ca978, phase=phase@entry=0, flags=flags@entry=3) at main.c:678
#17 0x0000563c45a19ca8 in pr_cmd_dispatch (cmd=0x563c475ca978) at main.c:801
#18 cmd_loop (server=<optimized out>, c=<optimized out>) at main.c:944
#19 0x0000563c45a17d87 in fork_server (fd=<optimized out>, l=<optimized out>, no_fork=<optimized out>) at main.c:1517
#20 0x0000563c45a18b1a in daemon_loop () at main.c:1766
#21 0x0000563c45a164c6 in standalone_main () at main.c:1967
#22 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at main.c:2792

With WrapEngine on + WrapTables file:~/ftp.allow file:~/ftp.deny. I had this enabled on my test-machine :-)

@Castaglia
Copy link
Member

@manuelm Hmm, OK, I'll include mod_wrap2 in my local setup, see if I can trigger the same thing there. For you, are you also using ProFTPD as of this commit as well? That change also helps clean up a potential pool issue in the Auth API, whose functions always appear in these loop stacktraces.

I will also point out -- if you want try to the same thing -- that in my local setup, I'm configuring/compiling ProFTPD with CFLAGS=-DPR_DEVEL_NO_POOL_FREELIST. By default, whenever ProFTPD is destroying a pool, it places the destroyed (and cleared) pool on a "freelist", so that future requests for pools can scan that freelist for an already-allocated pool of appropriate size. However, if there are use-after-free memory issues (and/or stale pool pointers) at stake, compiling with PR_DEVEL_NO_POOL_FREELIST defined tells ProFTPD to not use a freelist, and instead just to use the free(3) C library function. This makes any use-after-free/stale-pointer cases happen much sooner. And this, in turn, might lead to slightly different stacktraces -- and those differences might reveal more information about the causes here.

@manuelm
Copy link
Contributor

manuelm commented Nov 7, 2022

Good point. I'm now testing with the latest commit. Still seeing a different backtrace:

(gdb) bt
#0  0x00007f1ba58724c0 in ?? () from target:/usr/lib64/libc.so.6
#1  0x00007f1ba582a02f in raise () from target:/usr/lib64/libc.so.6
#2  0x00007f1ba5815478 in abort () from target:/usr/lib64/libc.so.6
#3  0x00007f1ba586745d in ?? () from target:/usr/lib64/libc.so.6
#4  0x00007f1ba587b4f1 in ?? () from target:/usr/lib64/libc.so.6
#5  0x00007f1ba587cf67 in ?? () from target:/usr/lib64/libc.so.6
#6  0x00007f1ba587f9e2 in free () from target:/usr/lib64/libc.so.6
#7  0x0000560b8f0308ab in pool_release_free_block_list () at pool.c:475
#8  destroy_pool (p=0x560b90540a30) at pool.c:621
#9  0x0000560b8f031033 in destroy_pool (p=<optimized out>) at pool.c:591
#10 0x0000560b8f064ee0 in pr_fsio_close (fh=0x560b90540a78) at fsio.c:5127
#11 0x0000560b8f09604d in af_endpwent () at mod_auth_file.c:873
#12 authfile_endpwent (cmd=<optimized out>) at mod_auth_file.c:1156
[...]
(gdb) p ((pool *)0x560b90540a30)->tag
$1 = 0x560b8f13f656 "pr_fsio_open() subpool"

@manuelm
Copy link
Contributor

manuelm commented Nov 9, 2022

Still seeing a different backtrace

Upon further investigation: It's the same backtrace. I didn't have any debug logs enabled, thus my process didn't print out the invalid pointer... and crashed a bit later. However, Castaglia/proftpd-mod_vroot#38 did not have any effect.

@manuelm
Copy link
Contributor

manuelm commented Nov 9, 2022

The following mod_vroot-patch (tested together with Castaglia/proftpd-mod_vroot#38) fixes the login issue for me:

--- a/mod_vroot.c 2022-11-09 13:33:07.340457699 +0100
+++ b/mod_vroot.c       2022-11-09 13:32:35.770692966 +0100
@@ -598,8 +598,9 @@

 MODRET vroot_post_pass_err(cmd_rec *cmd) {
   if (vroot_engine == TRUE) {
+    int is_sftp = ((cmd->cmd_class & CL_SFTP) || (cmd->cmd_class & CL_SSH));
     const void *hint;
-
+
     /* Look for any notes/hints attached to this command which might indicate
      * that it is not a real PASS command error, but rather a fake command
      * dispatched for e.g. logging/handling by other modules.  We pay attention
@@ -617,9 +618,9 @@
      */

 #if PROFTPD_VERSION_NUMBER < 0x0001030707
-    if (hint == NULL) {
+    if (!is_sftp || hint == NULL) {
 #else
-    if (hint != NULL) {
+    if (!is_sftp || hint != NULL) {
 #endif /* ProFTPD 1.3.7b or later */
       /* If not chrooted, unmount our vroot FS. */
       if (session.chroot_path == NULL) {

I have no idea how the mod_sftp.nonfatal-attempt note should exist in non-sftp sessions. So the current check looks broken to me. But maybe I'm missing something? Also I'm wondering how you haven't noticed this during your testing. So maybe I am missing something.

@Castaglia
Copy link
Member

I'm able to see the segfault you reported in mod_wrap2, yes. I'm still working on a fix. But I can say that this new segfault does at least occur later in the authentication cycle. I'm not sure if ftptop would show these sessions as "authenticated" yet, but at least it's progress...

Castaglia added a commit to Castaglia/proftpd-mod_vroot that referenced this issue Nov 9, 2022
…dereferences and use-after-frees, related to `mod_vroot`, for proftpd/proftpd#1384, I determined that these segfaults all occurred during the authentication process.

With this change, `mod_vroot` now registers its custom FS closer to the
(possible) `chroot(2)` system call in timing, and _only_ if authentication
succeeds.  Previously the module would use a PRE_CMD PASS handler to register
the custom FS, then POST_CMD/POST_CMD_ERR PASS handlers to unregister the
FS in case of issues.  This, in turn, means that the custom FS is active (and
used!) _during_ the authentication process, which appeared to be related
to the above memory/reference issues.

So I took a step back, and thought of the purpose of `mod_vroot` (to provide
a custom "virtual chroot" for a session), and changed the timing of the custom
FSIO registration to be closer to the point where it is actually needed/used.
Castaglia added a commit that referenced this issue Nov 9, 2022
…auth.authentication-code" event, just like the core engine does.
@Castaglia
Copy link
Member

OK, here's my next iteration at addressing this: Castaglia/proftpd-mod_vroot#39

Note that you will also need c30c4c6 present, so that mod_sftp generates the same event that mod_vroot (in the above PR) now requires, listens for.

Castaglia added a commit to Castaglia/proftpd-mod_vroot that referenced this issue Nov 9, 2022
…dereferences and use-after-frees, related to `mod_vroot`, for proftpd/proftpd#1384, I determined that these segfaults all occurred during the authentication process.

With this change, `mod_vroot` now registers its custom FS closer to the
(possible) `chroot(2)` system call in timing, and _only_ if authentication
succeeds.  Previously the module would use a PRE_CMD PASS handler to register
the custom FS, then POST_CMD/POST_CMD_ERR PASS handlers to unregister the
FS in case of issues.  This, in turn, means that the custom FS is active (and
used!) _during_ the authentication process, which appeared to be related
to the above memory/reference issues.

So I took a step back, and thought of the purpose of `mod_vroot` (to provide
a custom "virtual chroot" for a session), and changed the timing of the custom
FSIO registration to be closer to the point where it is actually needed/used.
@manuelm
Copy link
Contributor

manuelm commented Nov 9, 2022

OK, here's my next iteration at addressing this

First tests are looking good. I'll do some more and roll out the code on production servers tomorrow morning. So we can roll back in case something else breaks.

Castaglia added a commit that referenced this issue Nov 10, 2022
…vent for anonymous logins as well; the `mod_vroot` integration tests caught this, with the moving of `mod_vroot`'s custom FS registration to this event.
@Castaglia
Copy link
Member

@manuelm In looking at the CI-run mod_vroot regression test failures, I realized that the ProFTPD source code also needs 19ff2f5 -- without this, FTP anonymous logins that use mod_vroot won't work as expected. Not sure if you support such anonymous logins in your deployments.

@Castaglia Castaglia added the bug label Nov 11, 2022
Castaglia added a commit to Castaglia/proftpd-mod_vroot that referenced this issue Nov 11, 2022
…dereferences and use-after-frees, related to `mod_vroot`, for proftpd/proftpd#1384, I determined that these segfaults all occurred during the authentication process.

With this change, `mod_vroot` now registers its custom FS closer to the
(possible) `chroot(2)` system call in timing, and _only_ if authentication
succeeds.  Previously the module would use a PRE_CMD PASS handler to register
the custom FS, then POST_CMD/POST_CMD_ERR PASS handlers to unregister the
FS in case of issues.  This, in turn, means that the custom FS is active (and
used!) _during_ the authentication process, which appeared to be related
to the above memory/reference issues.

So I took a step back, and thought of the purpose of `mod_vroot` (to provide
a custom "virtual chroot" for a session), and changed the timing of the custom
FSIO registration to be closer to the point where it is actually needed/used.
@manuelm
Copy link
Contributor

manuelm commented Nov 14, 2022

@Castaglia looking good so far. No negative feedback from users and no looping processes.

@Castaglia
Copy link
Member

Excellent news @manuelm ! Thanks for all of your help in this!

I've published a new release of mod_vroot with the merged fix: https://github.com/Castaglia/proftpd-mod_vroot/releases/tag/v0.9.11

I'll wait for another week or so, to see if there's any additional fallout from this issue. After that, I'll close this ticket. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants