1.5.8 +seccomp halts memcached #399

Closed
SjonHortensius opened this issue Jun 20, 2018 · 7 comments

Comments

@SjonHortensius
Contributor

SjonHortensius commented Jun 20, 2018

I just installed 1.5.8 (on x86_64) with seccomp enabled, but it hangs (sometimes after processing a few calls) without any relevant error. ps lists the process as defunct. This seems seccomp related, but since there is no relevant output from either -vvv or gdb I'm not sure. It only fails on a single x86-64 machine; on others it seems to work fine.

type=1326 audit(1529484161.034:9): auid=0 uid=999 gid=999 ses=1 pid=3140 comm="memcached" exe="/usr/bin/memcached" sig=31 arch=c000003e syscall=228 compat=0 ip=0x7ffff7ffab12 code=0x0
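
For anyone decoding a record like the one above: type=1326 is the kernel's seccomp audit event, sig=31 is SIGSYS, arch=c000003e is x86_64, and syscall 228 on x86_64 is clock_gettime. Below is a minimal Python sketch of that lookup. The function name `decode_seccomp_audit` and the three small tables are illustrative assumptions, hand-copied to cover this one record; they are not a complete mapping and are not part of memcached.

```python
import re

# Hand-copied constants for this sketch only (assumption: not a complete table).
AUDIT_ARCH_NAMES = {0xC000003E: "x86_64"}   # AUDIT_ARCH_X86_64
X86_64_SYSCALLS = {228: "clock_gettime"}    # x86_64 syscall numbers
SIGNAL_NAMES = {31: "SIGSYS"}               # signal delivered on a seccomp kill

RECORD = (
    'type=1326 audit(1529484161.034:9): auid=0 uid=999 gid=999 ses=1 '
    'pid=3140 comm="memcached" exe="/usr/bin/memcached" sig=31 '
    'arch=c000003e syscall=228 compat=0 ip=0x7ffff7ffab12 code=0x0'
)


def decode_seccomp_audit(line):
    """Turn the key=value fields of a seccomp audit record into readable names."""
    fields = {
        key: value.strip('"')
        for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    }
    arch = int(fields["arch"], 16)   # arch is logged as bare hex
    nr = int(fields["syscall"])      # syscall number is logged as decimal
    sig = int(fields["sig"])
    return {
        "comm": fields["comm"],
        "pid": int(fields["pid"]),
        "arch": AUDIT_ARCH_NAMES.get(arch, hex(arch)),
        # Unknown numbers are returned as-is rather than guessed.
        "syscall": X86_64_SYSCALLS.get(nr, str(nr)),
        "signal": SIGNAL_NAMES.get(sig, str(sig)),
    }


if __name__ == "__main__":
    print(decode_seccomp_audit(RECORD))
```

Syscall numbers differ per architecture, so the lookup table has to match the arch field of the record being decoded.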

I also noticed https://forum.manjaro.org/t/memcached-not-starting-systemctl-reporting-it-is/49038, also on Arch. If I disable seccomp, everything works fine. Can I debug this further?

Shouldn't seccomp failures be logged somewhere instead of silently pausing/killing/hanging memcached?

By the way, both strace and the audit line above point to sys_clock_gettime as the culprit.

Here it works fine:

# sudo -u memcached strace -ftte clock_gettime,seccomp /usr/bin/memcached -o modern
strace: Process 15542 attached
strace: Process 15543 attached
strace: Process 15544 attached
strace: Process 15545 attached
strace: Process 15546 attached
[pid 15544] 13:44:46.065258 seccomp(SECCOMP_SET_MODE_STRICT, 1, NULL) = -1 EINVAL (Invalid argument)
[pid 15544] 13:44:46.065329 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=44, filter=0x7f09200172b0}) = 0
[pid 15543] 13:44:46.065610 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=44, filter=0x7f09280172b0}) = 0
[pid 15546] 13:44:46.065768 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=44, filter=0x7f09140172b0}) = 0
[pid 15545] 13:44:46.066148 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=44, filter=0x7f090c0172b0}) = 0
strace: Process 15547 attached
strace: Process 15548 attached
strace: Process 15549 attached
strace: Process 15550 attached
[pid 15541] 13:44:46.069260 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=28, filter=0x55cc8a42cf00}) = 0

Here it hangs and cannot be terminated (it needs kill -9 from another console):

# sudo -u memcached strace -ftte clock_gettime,seccomp /tmp/memcached -o modern -p 11212
strace: Process 8475 attached
strace: Process 8476 attached
strace: Process 8479 attached
[pid  8479] 13:45:10.212815 seccomp(SECCOMP_SET_MODE_STRICT, 1, NULL) = -1 EINVAL (Invalid argument)
[pid  8479] 13:45:10.213131 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=44, filter=0x7f7c700172b0}strace: Process 8478 attached
) = 0
strace: Process 8477 attached
[pid  8477] 13:45:10.214477 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=44, filter=0x7f7c6c0172b0}) = 0
[pid  8476] 13:45:10.214943 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=44, filter=0x7f7c7c0172b0}) = 0
[pid  8478] 13:45:10.215283 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=44, filter=0x7f7c680172b0}) = 0
strace: Process 8481 attached
strace: Process 8482 attached
strace: Process 8483 attached
strace: Process 8485 attached
[pid  8469] 13:45:10.218777 clock_gettime(CLOCK_MONOTONIC, {tv_sec=11350, tv_nsec=615287801}) = 0
[pid  8469] 13:45:10.219056 clock_gettime(CLOCK_MONOTONIC, {tv_sec=11350, tv_nsec=615405401}) = 0
[pid  8469] 13:45:10.222289 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=28, filter=0x56203c839f00}) = 0
[pid  8469] 13:45:11.219735 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>) = ?
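
In the hanging trace above, pid 8469 installs its filter and its next clock_gettime never returns. To spot the same pattern in a longer saved trace, a small helper along these lines may be useful. It is a sketch only: the name `unfinished_after_filter` is made up here, and the regular expression assumes the `strace -f -tt` line layout shown in this report.

```python
import re

# Matches lines such as:
# [pid  8469] 13:45:11.219735 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>) = ?
LINE = re.compile(r"^\[pid\s+(\d+)\]\s+\S+\s+(\w+)\((.*)$")


def unfinished_after_filter(trace):
    """Return (pid, syscall) pairs left unfinished after that pid called
    seccomp(SECCOMP_SET_MODE_FILTER, ...)."""
    filtered = set()   # pids that have attempted to install a filter
    hits = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if not m:
            continue   # e.g. "strace: Process N attached"
        pid, name, rest = int(m.group(1)), m.group(2), m.group(3)
        if name == "seccomp" and rest.startswith("SECCOMP_SET_MODE_FILTER"):
            # Simplification: the return value is not checked.
            filtered.add(pid)
        elif pid in filtered and "<unfinished ...>" in rest:
            hits.append((pid, name))
    return hits


SAMPLE = """\
[pid  8469] 13:45:10.219056 clock_gettime(CLOCK_MONOTONIC, {tv_sec=11350, tv_nsec=615405401}) = 0
[pid  8469] 13:45:10.222289 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=28, filter=0x56203c839f00}) = 0
[pid  8469] 13:45:11.219735 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>) = ?
"""

if __name__ == "__main__":
    print(unfinished_after_filter(SAMPLE))
```

An unfinished call is only a hint; the audit log is still what confirms that seccomp was responsible.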

Can anyone explain the difference between these two machines?

@SjonHortensius changed the title from "1.5.8 +seccomp crashes memcached" to "1.5.8 +seccomp halts memcached" Jun 20, 2018
SjonHortensius added a commit to SjonHortensius/memcached that referenced this issue Jun 20, 2018
@dormando
Member

The HACKING file has a description of how to debug seccomp stuff.

What are the OS/kernel/libc versions on the two machines? Recently it seems like a lot of new restriction points are being added. :/

poking @viraptor as well

@SjonHortensius
Contributor Author

Security always reduces user-friendliness, but that shouldn't be a reason not to add it! I think it's great that @viraptor added seccomp to memcached and that the Arch Linux developers enabled it.

I'm pretty sure the related PR fixes this; can you review it?

As for the details, both machines run the latest Arch Linux packages (currently glibc 2.27-3) on a VPS running kernel 4.14.12. I expect the difference to be something on the actual hosts, but unfortunately I don't have any details on those.

@dormando
Member

It's in the queue for review; I'm taking a break this week. I have no problem with seccomp; I do have a problem with it getting enabled without testing, and I worry that for every user like you 100 are silently miserable. :/ It's been a bunch of changes to a bunch of different platforms; it's not maintainable in this form at all.

I think viraptor mentioned a better approach to implementing it, but I'm not too familiar with how it's supposed to be used yet.

@viraptor
Contributor

viraptor commented Jul 1, 2018

I'm just testing the fix. It looks like a reasonable approach. Thank you @SjonHortensius!
Strangely, when testing on a fresh Manjaro Docker image (based on jonathonf/manjaro:latest, then pacman -Syu), I don't get any failures during the tests, although failures are what I'd expect to see.
(Tested with kernel 4.16 and glibc-2.27-3.)

@viraptor
Contributor

viraptor commented Jul 1, 2018

I think the check for CLOCK_MONOTONIC can be dropped though. As long as HAVE_CLOCK_GETTIME is set, it should be fine to allow it.

@SjonHortensius
Contributor Author

SjonHortensius commented Jul 1, 2018

@viraptor thanks. I think the test case doesn't fail because some machines (like the one in the first strace in this bug report) don't actually call clock_gettime while running, even though they use the same binary, kernel, and architecture. My knowledge of this is limited, though, and I'm unsure what could cause it.

I've copied the #ifdef from the actual implementation in the code here.
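
One possible cause of that difference, offered as an assumption and not confirmed anywhere in this thread: glibc normally serves clock_gettime through the kernel's vDSO without entering the kernel, and the vDSO falls back to the real syscall when the active clocksource cannot be read from user space. Two hosts with the same binary and kernel could then differ only in their clocksource. The Python sketch below prints the clocksource from the standard Linux sysfs location and then makes a few clock_gettime calls, so that running it under strace on each machine shows whether those calls reach the kernel. The helper name `read_clocksource` is made up for illustration.

```python
import time
from pathlib import Path

# Standard Linux sysfs location for the system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, sysfs not mounted, or no permission.
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current clocksource:", current)
    print("available clocksources:", " ".join(available))
    # If these calls show up under strace, the vDSO fast path is not in use.
    if hasattr(time, "clock_gettime"):
        for _ in range(3):
            time.clock_gettime(time.CLOCK_MONOTONIC)
```

Saved to a file (say `clockcheck.py`, a hypothetical name), it can be run as `strace -f -e trace=clock_gettime python3 clockcheck.py` on both machines and the output compared.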

@ChazyTheBest

@SjonHortensius That post on the Manjaro forum is actually about ARM (sorry, I forgot to mention it, but it looks like that's not relevant now since the problem is not architecture specific). I've also experienced this same problem (#384), but only on ARM; on my dev machine running Manjaro x86 I haven't encountered it yet.
