New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use containerd restart manager to monitor services #3168

Open
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
5 participants
@kkroo
Copy link
Contributor

kkroo commented Aug 26, 2018

- What I did
Adds system service uptime monitor that restarts stopped system services periodically (default is 10 seconds)

- How I did it
Integrate containerd containerd/containerd#2318
Requires CONTAINERD_COMMIT v1.2.0

- How to verify it

(ns: sshd) linuxkit-08002771aa74:/var/log# ctr -n services.linuxkit c info sshd
{
    "ID": "sshd",
    "Labels": {
        "containerd.io/restart.logpath": "/var/log/sshd.restart.log",
        "containerd.io/restart.status": "running"
    },
...

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@GordonTheTurtle

This comment has been minimized.

Copy link
Collaborator

GordonTheTurtle commented Aug 26, 2018

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "restart_monitor" git@github.com:kkroo/linuxkit.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@justincormack

This comment has been minimized.

Copy link
Collaborator

justincormack commented Aug 28, 2018

Hi, we can't merge PRs unless they are signed-off-by as per instructions above.

I think restart policy should probably be configurable.

@ijc
Copy link
Collaborator

ijc left a comment

I'm not sure this is using the provided interfaces correctly, I can't seem to find any other examples (there seem to be none in the containerd tree, e.g. in ctr), but it certainly looks suspiciously racy to me as is.

Show resolved Hide resolved pkg/init/cmd/service/cmd.go Outdated
Show resolved Hide resolved pkg/init/cmd/service/cmd.go Outdated
Show resolved Hide resolved pkg/init/cmd/service/cmd.go Outdated
@ijc

This comment has been minimized.

Copy link
Collaborator

ijc commented Aug 30, 2018

containerd/containerd#2318 shows it using container.Update to set it after starting, also using WithNoRestart on teardown. However I'm not completely convinced this is intended for this use case. From containerd/containerd#2318 (comment):

The usecase is for system level containers, it will allow remote snapshotters, runtimes, and such to be kept running without any external deps on a system. Its not built for the kube or docker restart policies, just a simple internal function to keep low level system containers running, maybe something like kubelet in a container ;)

It seems like it is for containerd internal functionality not 3rd party use (as much as some linuxkit containers could be considered "system level containers").

@kkroo kkroo force-pushed the kkroo:restart_monitor branch from d4b515d to 5eb1260 Nov 27, 2018

@GordonTheTurtle GordonTheTurtle removed the dco/no label Nov 27, 2018

@kkroo

This comment has been minimized.

Copy link
Contributor

kkroo commented Nov 27, 2018

Addressing comments regarding race conditions. Regarding the use of this interface, it is a plugin with public interfaces and seems very well suited for improving the availability of linuxkit's system services.

@kkroo kkroo force-pushed the kkroo:restart_monitor branch from 5eb1260 to cdf964f Nov 27, 2018

@medic15

This comment has been minimized.

Copy link

medic15 commented Nov 27, 2018

This needs to be configurable for each service. Not all those who wander are lost. Not all services that stop are broken.

@kkroo

This comment has been minimized.

Copy link
Contributor

kkroo commented Nov 28, 2018

This needs to be configurable for each service. Not all those who wander are lost. Not all services that stop are broken.

Long running services are typically ones where any exit code would be anomalous. Yes, we need to continue to support one off execution. This change would obviously change that and restart each service that has exited. One off executables are still supported and probably better served under the onboot section of the runtime definition where there is reliable execution ordering for dependencies. This would provide reliable long running services to Linuxkit without the need to restart the whole system. Besides, I think it would be interesting to have a service that exits as part of its nominal behavior be monitored by this change to implement watchdogs or custom polling functionality.

@ijc

This comment has been minimized.

Copy link
Collaborator

ijc commented Nov 28, 2018

@medic15 and @justincormack are correct, this needs to be configurable per service.

I still have concerns regarding this use of a plugin which is "The usecase is for system level containers" and "Its not built for the kube or docker restart policies, just a simple internal function to keep low level system containers running". Whether or not it happens to currently meet the use case here if we are using it outside its "support envelope" we have no guarantee that it will continue to be suitable in the future. I'd be happy to be overruled by the other maintainers on that though.

@justincormack

This comment has been minimized.

Copy link
Collaborator

justincormack commented Nov 28, 2018

At present a few of the services we run may terminate (eg rngd if no rng is supported) but this could be changed.

@medic15

This comment has been minimized.

Copy link

medic15 commented Nov 28, 2018

My use case involves populating a Redis in-memory database using data from several disparate sources. This can't be done in the onboot phase since my init service and Redis cannot be running at the same time.

@justincormack

This comment has been minimized.

Copy link
Collaborator

justincormack commented Nov 28, 2018

@medic15 yes that seems a reasonable use case where it should be configurable.

Add containerd restart plugin over moby services
Signed-off-by: Omar Ramadan <omar.ramadan93@gmail.com>

@kkroo kkroo force-pushed the kkroo:restart_monitor branch from cdf964f to 1d1c8fc Nov 28, 2018

@kkroo

This comment has been minimized.

Copy link
Contributor

kkroo commented Nov 28, 2018

@medic15 @justincormack I still don't understand the use case and why such a one-off can't be added to the end of the onboot with some bash script that would wait for redis to come up. Hopefully you can help clarify.

In any case, I've added a boolean flag noRestart to the runtime configuration section than can disable the restart. PTAL!

@justincormack

This comment has been minimized.

Copy link
Collaborator

justincormack commented Nov 28, 2018

Services are not started until all onboot containers have exited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment