
Signalfd support #139

Open
ianlewis opened this issue Mar 14, 2019 · 21 comments

@ianlewis (Contributor) commented Mar 14, 2019

Support for:

  • signalfd
  • signalfd4
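
For reference, here is a minimal sketch of how a Linux program typically uses these calls. It is not gVisor or skalibs code, and the helper name is hypothetical. The program blocks the signal, creates a signalfd for it, and reads a struct signalfd_siginfo once the signal is pending. Per the signalfd(2) man page, signalfd4 is the later variant of the syscall and accepts flags such as SFD_CLOEXEC. The glibc signalfd() wrapper uses it where available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: blocks `signo`, creates a signalfd for it, raises the
 * signal, and reads it back through the descriptor.
 * Returns the signal number read, or -1 on error. */
static int receive_via_signalfd(int signo) {
    sigset_t mask, old;
    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* Block the signal so it stays pending instead of triggering the default
     * action; a signalfd only reports signals that are pending. */
    if (sigprocmask(SIG_BLOCK, &mask, &old) == -1)
        return -1;

    /* fd == -1 asks for a new descriptor rather than modifying one. */
    int fd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (fd == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    raise(signo); /* the signal is now pending for this thread */

    struct signalfd_siginfo info;
    ssize_t n = read(fd, &info, sizeof info); /* consumes the pending signal */

    close(fd);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return n == (ssize_t)sizeof info ? (int)info.ssi_signo : -1;
}
```

A real event loop would keep the descriptor open and read from it whenever poll/epoll reports it readable, instead of raising the signal itself.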

@zhangningdlut (Contributor) commented Apr 2, 2019

How is this going?
We also need signalfd.

@fvoznika (Member) commented Apr 2, 2019

There is no one currently working on this. Is this something you could work on?

@zhangningdlut (Contributor) commented Apr 12, 2019

> There is no one currently working on this. Is this something you could work on?

We (I am from antfin) need signalfd, but we have not started working on it either.
If you are interested in this, we can discuss the plan.

@amscanne (Collaborator) commented Apr 17, 2019

I took a stab at this here:
https://gvisor-review.googlesource.com/c/gvisor/+/16802

It's relatively straightforward, but it first needs some changes to EventRegister / EventUnregister (plumbing a context through).

@prattmic (Member) commented Jul 15, 2019

https://twitter.com/ptinsley/status/1150411814246715392?s=19

s6-overlay fails to start with:

s6-svscan: fatal: unable to selfpipe_init: Function not implemented

From https://github.com/djbtao/skalibs/blob/c2ed75c9838767af60a05451b6d216331c1dbccf/src/libstddjb/selfpipe_init.c, I believe it needs signalfd.
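
The message "Function not implemented" is the C library's text for ENOSYS. This is the error gVisor returns for a syscall it does not implement. Below is a minimal probe that tells a missing signalfd apart from other failures. It assumes Linux with glibc, the helper name is hypothetical, and it is not skalibs code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if signalfd works, 0 if the kernel (or a
 * sandbox such as gVisor) reports ENOSYS, and -1 for any other error. */
static int signalfd_supported(void) {
    sigset_t empty;
    sigemptyset(&empty); /* an empty mask is valid and consumes no signals */

    int fd = signalfd(-1, &empty, SFD_CLOEXEC);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == ENOSYS ? 0 : -1;
}
```

On a runtime without signalfd this returns 0, which matches the selfpipe_init failure above.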

@sadok-f commented Aug 28, 2019

I get the same error as @prattmic when I deploy a container that uses s6-overlay on Cloud Run.
I hope it gets implemented soon.
+1

@wmuizelaar commented Sep 29, 2019

Also ran into this issue when trying to deploy a container using s6-overlay on Cloud Run.

@amscanne (Collaborator) commented Oct 2, 2019

Signalfd support has been merged in c98e7f0. This will roll out to Cloud Run at some point in the near future (depending on their release pipeline).

@amscanne (Collaborator) commented Oct 2, 2019

I'll leave this open for now because the support in c98e7f0 diverges from core Linux semantics in a couple of ways, though this has no effect on how libstddjb uses signalfds.

@wmuizelaar commented Oct 4, 2019

I tested this some more locally, and while containers with s6-overlay can now start successfully (yay!), the process that supervises services is not working properly. Normally, s6-supervise would start a service (e.g. nginx) and supervise that process. With last night's nightly build, s6-supervise gets started, but the forked nginx process doesn't. Based on the output of the debug log files, I assume this (still) has to do with signalfd, but I'm not 100% sure.

I created a small Docker container to reproduce this issue without all the other stuff I initially had installed. The Dockerfile and build context can be found here:
https://gist.github.com/wmuizelaar/940b56da3d973f40669d82cbbdf6624a

All the debug log files from a run of this container are attached:
runsc-s6-overlay-reproduce.tar.gz

@ahmetb (Member) commented Oct 4, 2019

@wmuizelaar I don't have a local gVisor setup, but can you try this repo? It's a Python-based web server and might be a more minimal repro: https://github.com/ahmetb/multi-process-container

@wmuizelaar commented Oct 4, 2019

Sure! The result looks exactly the same. Log output from the container:

docker run --runtime=runsc --rm -i -t --name=test test-image
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.

When running with regular Docker, the output also includes the service start messages, which don't appear with gVisor:

[services.d] starting services
starting service1
[services.d] done.
starting service2
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...

Also, the output of pstree without gVisor:

s6-svscan-+-s6-supervise
          |-s6-supervise---sleep
          `-s6-supervise---python

And now with gVisor:

s6-svscan---3*[s6-supervise]

Logfiles again:
python-reproduce.tar.gz

gvisor-bot added a commit that referenced this issue Oct 4, 2019
The signalfd descriptors otherwise always show as available. This can lead
programs to spin, assuming they are looking to see what signals are pending.

Updates #139

PiperOrigin-RevId: 272949671

@amscanne (Collaborator) commented Oct 4, 2019

Thanks! These repro cases were super useful in understanding the problem. I have a pull request (#972) to fix the bug and have validated that both examples work as expected with that in. (The pull request includes a test for the specific issue.)
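
The commit message above describes the bug: signalfd descriptors always polled as readable. On Linux, a signalfd should report POLLIN only while a signal in its mask is pending, so an event loop can sleep until one arrives. Below is a minimal sketch of that expected behavior. The helper names are hypothetical, and this is not the test included in #972.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking readiness check.
 * Returns 1 if `fd` is readable, 0 if not, -1 on error. */
static int is_readable(int fd) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r = poll(&pfd, 1, 0); /* timeout 0: check and return immediately */
    if (r < 0)
        return -1;
    return (r > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Records signalfd readiness before SIGUSR1 is pending, while it is pending,
 * and after it has been read. Returns 0 on success, -1 on error. */
static int signalfd_readiness(int *before, int *pending, int *after) {
    sigset_t mask, old;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &mask, &old) == -1)
        return -1;

    int fd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
    if (fd == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    *before = is_readable(fd);  /* expected 0: nothing pending yet */
    raise(SIGUSR1);
    *pending = is_readable(fd); /* expected 1: SIGUSR1 is pending */

    struct signalfd_siginfo info;
    ssize_t n = read(fd, &info, sizeof info); /* consume the signal */
    *after = is_readable(fd);   /* expected 0 again */

    close(fd);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return n == (ssize_t)sizeof info ? 0 : -1;
}
```

With the pre-fix behavior described in the commit, `before` and `after` would also report readable, so a loop waiting on the descriptor would spin.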

@amscanne (Collaborator) commented Oct 4, 2019

It would be great if you could also validate that the container works as expected with this change.

@wmuizelaar commented Oct 4, 2019

Wow, that's a quick response! I can confirm that with #972 everything works as it should. Thanks a lot!

gvisor-bot added a commit that referenced this issue Oct 10, 2019
The signalfd descriptors otherwise always show as available. This can lead
programs to spin, assuming they are looking to see what signals are pending.

Updates #139

PiperOrigin-RevId: 272949671
gvisor-bot added a commit that referenced this issue Oct 10, 2019
The signalfd descriptors otherwise always show as available. This can lead
programs to spin, assuming they are looking to see what signals are pending.

Updates #139

PiperOrigin-RevId: 274017890

@erulabs commented Oct 28, 2019

Hey @wmuizelaar, is there any chance you can share how you got s6-overlay running in gVisor? With gVisor I run into a mkfifo "operation not permitted" issue... It's not really within the scope of this issue, but any hints would be awesome :)

@wmuizelaar commented Oct 29, 2019

I remember seeing that one. I believe it was fixed by using the latest version of s6-overlay in combination with the latest gVisor nightly. My Dockerfile just uses the following to get this latest version:

    curl -L -s https://github.com/just-containers/s6-overlay/releases/download/v1.22.1.0/s6-overlay-amd64.tar.gz \
      | tar xvzf - -C /

@amscanne (Collaborator) commented Oct 29, 2019

@erulabs Do you have a specific version which is not working that I can test?

I suspect this is a result of the execution environment / configuration and not strictly a missing feature. I believe that mkfifo is only supported on an in-sandbox tmpfs, i.e. the gofer will not create a host-based named pipe.

There's work to ensure that EmptyDirs are automatically turned into sandbox-internal tmpfs mounts for Kubernetes (e.g. fc746ef is related). @fvoznika can comment further. You may be able to make this work by specifying appropriate EmptyDirs for the application.
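
One way to check whether a particular mount (for example an EmptyDir) accepts named pipes is to try creating one there. Below is a minimal probe sketch. It assumes a POSIX system, and the helper name and probe file name are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical probe: tries to create a FIFO inside `dir`.
 * Returns 0 if a FIFO could be created (it is removed again), or the errno
 * value on failure, e.g. EPERM ("Operation not permitted"). */
static int probe_mkfifo(const char *dir) {
    char path[4096];
    int len = snprintf(path, sizeof path, "%s/.fifo-probe-%ld", dir, (long)getpid());
    if (len < 0 || (size_t)len >= sizeof path)
        return ENAMETOOLONG;

    if (mkfifo(path, 0600) == -1)
        return errno;

    /* Confirm that what was created really is a FIFO, then clean up. */
    struct stat st;
    int err = 0;
    if (stat(path, &st) == -1)
        err = errno;
    else if (!S_ISFIFO(st.st_mode))
        err = EINVAL;
    unlink(path);
    return err;
}
```

Pointing this at the directory where s6-mkfifo fails would show whether that mount rejects FIFOs with EPERM, independent of s6 itself.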

@erulabs commented Oct 29, 2019

@wmuizelaar hrm, thanks for the reply :) @amscanne I'm using nightly gVisor (runsc --version returns runsc version release-20190806.1-329-g1c480abc39b9) with s6-overlay 1.22.0 and 1.22.1, and I get this error with both versions:

s6-mkfifo: fatal: unable to mkfifo /var/run/s6/services/s6-fdholderd/supervise/control: Operation not permitted

I've tried mounting an EmptyDir at /var/run after looking at the tmpfs work, and I get the same error. Here is a Kubernetes example: https://kubesail.com/template/erulabs/sonarr/1 (running it on KubeSail reproduces the error; KubeSail uses gVisor under the hood 💃)

Let me know if that's helpful - I'll keep digging on my side. Thanks!

@amscanne (Collaborator) commented Oct 29, 2019

I think the EmptyDir work may still be in flight. Since it's unrelated to signalfd, I've forked this off into a separate issue (#1102) where we can investigate. Thanks!

@wmuizelaar commented Nov 19, 2019

The basics of s6-overlay now work, but some more advanced things I'm trying fail, which might also be related to signalfd.

What I'm trying to do is use s6-notifyoncheck so that one service is started only after a previous one is deemed ready. This tool runs a check script and reports the result on a file descriptor. See https://skarnet.org/software/s6/s6-notifyoncheck.html and https://wiki.gentoo.org/wiki/S6#s6-notifyoncheck for details on how this should work. Because you have to specify a particular file descriptor number for this, I assume we're hitting a corner case in the signalfd support.
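
For context, the s6 documentation describes this protocol as follows. The service directory contains a notification-fd file naming a descriptor number. Readiness is reported by writing a newline to that descriptor and then closing it. s6-notifyoncheck does this on the service's behalf once the check script succeeds. Below is a minimal sketch of the writing side. The helper name is hypothetical, and a pipe stands in for the descriptor that s6 would provide.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: reports readiness on `fd` by writing a newline and
 * then closing the descriptor, so the reader sees the line followed by EOF.
 * Returns 0 on success, -1 on error. */
static int notify_ready(int fd) {
    ssize_t n;
    do {
        n = write(fd, "\n", 1);
    } while (n == -1 && errno == EINTR); /* retry if interrupted by a signal */
    if (n != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

If a write like this, or the close after it, never happens inside the sandbox, the reading side keeps waiting. That would be consistent with only the first check attempt appearing in the logs, but it is only one possible explanation.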

When running without gVisor, I see the check executed every second (as it should be). With gVisor, only the first check attempt is made, and a second attempt never follows. Perhaps writing to the custom file descriptor hangs? I found it hard to troubleshoot where this goes wrong.

Full logs are attached. If there are any hints on how to troubleshoot or pinpoint this further, they would be welcome!

runsc-logs.tar.gz

9 participants