Restarting an extension service restarts other extension services #8229

Closed
frezbo opened this issue Feb 1, 2024 · 0 comments · Fixed by #8246

frezbo commented Feb 1, 2024

Bug Report

Restarting a single extension service causes other extension services to exit, without much information, and then get started again.

Logs

❯ talosctl -n 10.5.0.3 get extensions
NODE       NAMESPACE   TYPE              ID   VERSION   NAME                  VERSION
10.5.0.3   runtime     ExtensionStatus   0    1         nut-client            2.8.1-v1.6.3
10.5.0.3   runtime     ExtensionStatus   1    1         hello-world-service   v1.6.3
10.5.0.3   runtime     ExtensionStatus   2    1         iscsi-tools           v0.1.4
10.5.0.3   runtime     ExtensionStatus   3    1         schematic             93430e9a2cf181a0f55fc18cf8b2379ab93b2a797b89f91dcc630451cb898232

❯ talosctl -n 10.5.0.3 services
NODE       SERVICE           STATE     HEALTH   LAST CHANGE   LAST EVENT
10.5.0.3   apid              Running   OK       1m11s ago     Health check successful
10.5.0.3   containerd        Running   OK       1m15s ago     Health check successful
10.5.0.3   cri               Running   OK       1m12s ago     Health check successful
10.5.0.3   dashboard         Running   ?        1m14s ago     Process Process(["/sbin/dashboard"]) started with PID 1470
10.5.0.3   ext-hello-world   Running   ?        1m14s ago     Started task ext-hello-world (PID 1448) for container ext-hello-world
10.5.0.3   ext-iscsid        Running   ?        1m11s ago     Started task ext-iscsid (PID 1746) for container ext-iscsid
10.5.0.3   ext-nut-client    Running   ?        1m12s ago     Started task ext-nut-client (PID 1687) for container ext-nut-client
10.5.0.3   ext-tgtd          Running   ?        1m12s ago     Started task ext-tgtd (PID 1680) for container ext-tgtd
10.5.0.3   kubelet           Running   OK       1m10s ago     Health check successful
10.5.0.3   machined          Running   OK       1m20s ago     Health check successful
10.5.0.3   udevd             Running   OK       1m19s ago     Health check successful

❯ talosctl -n 10.5.0.3 services ext-hello-world restart
NODE       RESPONSE
10.5.0.3   Service "ext-hello-world" restarted

❯ talosctl -n 10.5.0.3 services
NODE       SERVICE           STATE     HEALTH   LAST CHANGE   LAST EVENT
10.5.0.3   apid              Running   OK       1m36s ago     Health check successful
10.5.0.3   containerd        Running   OK       1m40s ago     Health check successful
10.5.0.3   cri               Running   OK       1m37s ago     Health check successful
10.5.0.3   dashboard         Running   ?        1m39s ago     Process Process(["/sbin/dashboard"]) started with PID 1470
10.5.0.3   ext-hello-world   Running   ?        9s ago        Started task ext-hello-world (PID 2749) for container ext-hello-world
10.5.0.3   ext-iscsid        Running   ?        4s ago        Started task ext-iscsid (PID 2847) for container ext-iscsid
10.5.0.3   ext-nut-client    Running   ?        4s ago        Started task ext-nut-client (PID 2846) for container ext-nut-client
10.5.0.3   ext-tgtd          Running   ?        1m36s ago     Started task ext-tgtd (PID 1680) for container ext-tgtd
10.5.0.3   kubelet           Running   OK       1m35s ago     Health check successful
10.5.0.3   machined          Running   OK       1m45s ago     Health check successful
10.5.0.3   udevd             Running   OK       1m44s ago     Health check successful
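The two listings above can be compared mechanically. The following is a minimal sketch, not part of Talos or talosctl: it assumes the PID always appears in the LAST EVENT column as `PID <n>`, and the helper names `service_pids` and `restarted` are hypothetical. The sample rows are copied from the output above.

```python
import re

# Rows copied from the two `talosctl services` listings in this report.
BEFORE = '''
10.5.0.3   dashboard         Running   ?        1m14s ago     Process Process(["/sbin/dashboard"]) started with PID 1470
10.5.0.3   ext-hello-world   Running   ?        1m14s ago     Started task ext-hello-world (PID 1448) for container ext-hello-world
10.5.0.3   ext-iscsid        Running   ?        1m11s ago     Started task ext-iscsid (PID 1746) for container ext-iscsid
10.5.0.3   ext-nut-client    Running   ?        1m12s ago     Started task ext-nut-client (PID 1687) for container ext-nut-client
10.5.0.3   ext-tgtd          Running   ?        1m12s ago     Started task ext-tgtd (PID 1680) for container ext-tgtd
'''

AFTER = '''
10.5.0.3   dashboard         Running   ?        1m39s ago     Process Process(["/sbin/dashboard"]) started with PID 1470
10.5.0.3   ext-hello-world   Running   ?        9s ago        Started task ext-hello-world (PID 2749) for container ext-hello-world
10.5.0.3   ext-iscsid        Running   ?        4s ago        Started task ext-iscsid (PID 2847) for container ext-iscsid
10.5.0.3   ext-nut-client    Running   ?        4s ago        Started task ext-nut-client (PID 2846) for container ext-nut-client
10.5.0.3   ext-tgtd          Running   ?        1m36s ago     Started task ext-tgtd (PID 1680) for container ext-tgtd
'''


def service_pids(listing: str) -> dict[str, int]:
    """Map service name to the PID found in its LAST EVENT text (assumed format)."""
    pids = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NODE":
            continue
        match = re.search(r"\bPID (\d+)", line)
        if match:
            pids[fields[1]] = int(match.group(1))
    return pids


def restarted(before: str, after: str) -> list[str]:
    """Return services whose PID differs between the two listings."""
    old, new = service_pids(before), service_pids(after)
    return sorted(name for name in old if name in new and old[name] != new[name])


print(restarted(BEFORE, AFTER))
```

On these rows the sketch reports ext-hello-world, ext-iscsid and ext-nut-client, while ext-tgtd and dashboard keep their PIDs.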

This is a Talos cluster on v1.6.3 with the following extensions:

customization:
    systemExtensions:
        officialExtensions:
            - siderolabs/nut-client
            - siderolabs/hello-world-service
            - siderolabs/iscsi-tools

Dmesg

[   14.040718] [talos] boot sequence: done: 3.248686188s
[   14.042419] [talos] machine is running and ready {"component": "controller-runtime", "controller": "runtime.MachineStatusController"}

[   99.465601] [talos] service[ext-hello-world](Stopping): Sending SIGTERM to task ext-hello-world (PID 1448, container ext-hello-world)
[   99.500051] [talos] service[ext-nut-client](Waiting): Error running Containerd(ext-nut-client), going to restart forever: task "ext-nut-client" failed: exit code 1
[   99.506680] [talos] service[ext-hello-world](Finished): Service finished successfully
[   99.507725] [talos] service[ext-iscsid](Waiting): Runner Containerd(ext-iscsid) exited without error, going to restart it
[   99.509073] [talos] service[ext-hello-world](Waiting): Waiting for service "containerd" to be "up", network
[   99.510391] [talos] service[ext-hello-world](Preparing): Running pre state
[   99.511719] [talos] service[ext-hello-world](Preparing): Creating service runner
[   99.601409] [talos] service[ext-hello-world](Running): Started task ext-hello-world (PID 2749) for container ext-hello-world
[  104.613904] [talos] service[ext-nut-client](Running): Started task ext-nut-client (PID 2846) for container ext-nut-client
[  104.641427] [talos] service[ext-iscsid](Running): Started task ext-iscsid (PID 2847) for container ext-iscsid

Restarting ext-hello-world causes two of the other extension services, ext-nut-client and ext-iscsid, to exit and start again; ext-tgtd keeps its PID.
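To pick out the affected services from a longer dmesg capture, the state transitions can be grouped per service. This is a rough sketch that assumes the `[talos] service[<name>](<State>): <message>` line shape seen in the excerpt above; the names `transitions` and `collateral` are hypothetical and are not Talos APIs.

```python
import re
from collections import defaultdict

# Lines copied from the dmesg excerpt in this report.
DMESG = '''
[   99.465601] [talos] service[ext-hello-world](Stopping): Sending SIGTERM to task ext-hello-world (PID 1448, container ext-hello-world)
[   99.500051] [talos] service[ext-nut-client](Waiting): Error running Containerd(ext-nut-client), going to restart forever: task "ext-nut-client" failed: exit code 1
[   99.506680] [talos] service[ext-hello-world](Finished): Service finished successfully
[   99.507725] [talos] service[ext-iscsid](Waiting): Runner Containerd(ext-iscsid) exited without error, going to restart it
[   99.509073] [talos] service[ext-hello-world](Waiting): Waiting for service "containerd" to be "up", network
[   99.510391] [talos] service[ext-hello-world](Preparing): Running pre state
[   99.511719] [talos] service[ext-hello-world](Preparing): Creating service runner
[   99.601409] [talos] service[ext-hello-world](Running): Started task ext-hello-world (PID 2749) for container ext-hello-world
[  104.613904] [talos] service[ext-nut-client](Running): Started task ext-nut-client (PID 2846) for container ext-nut-client
[  104.641427] [talos] service[ext-iscsid](Running): Started task ext-iscsid (PID 2847) for container ext-iscsid
'''

# Assumed line shape: "[ <seconds>] [talos] service[<name>](<State>): <message>"
LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] \[talos\] "
    r"service\[(?P<name>[^\]]+)\]\((?P<state>\w+)\): (?P<msg>.*)$"
)


def transitions(dmesg: str) -> dict[str, list[tuple[float, str]]]:
    """Group (timestamp, state) pairs by service name, in log order."""
    by_service = defaultdict(list)
    for line in dmesg.splitlines():
        match = LINE.match(line.strip())
        if match:
            by_service[match["name"]].append((float(match["ts"]), match["state"]))
    return dict(by_service)


def collateral(dmesg: str, target: str) -> list[str]:
    """Services other than `target` that dropped back to the Waiting state."""
    return sorted(
        name
        for name, events in transitions(dmesg).items()
        if name != target and any(state == "Waiting" for _, state in events)
    )


print(collateral(DMESG, "ext-hello-world"))
```

For the excerpt above this lists ext-iscsid and ext-nut-client, both of which re-enter Waiting within about 40 ms of the SIGTERM sent to ext-hello-world.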

Looking at the logs of ext-nut-client, it seems to be receiving signal 15 (SIGTERM):

10.5.0.3: UPS [${upsmonHost}]: connect failed: Connection failure: Connection refused
10.5.0.3: Signal 15: exiting
10.5.0.3: upsmon parent: read
10.5.0.3: Network UPS Tools upsmon 2.8.1
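As a small cross-check, the signal number in the upsmon log can be mapped to its name with Python's standard `signal` module. This is only an illustrative sketch: the `Signal <n>: exiting` pattern is taken from the log lines above, and the helper name `signal_names` is hypothetical.

```python
import re
import signal

# Lines copied from the ext-nut-client log in this report.
LOG = '''
10.5.0.3: UPS [${upsmonHost}]: connect failed: Connection failure: Connection refused
10.5.0.3: Signal 15: exiting
10.5.0.3: upsmon parent: read
10.5.0.3: Network UPS Tools upsmon 2.8.1
'''


def signal_names(log: str) -> list[str]:
    """Translate every 'Signal <n>: exiting' entry into its symbolic name."""
    return [
        signal.Signals(int(number)).name
        for number in re.findall(r"Signal (\d+): exiting", log)
    ]


print(signal_names(LOG))
```

Signal 15 is SIGTERM, the same signal the dmesg output shows being sent to the ext-hello-world task.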
smira self-assigned this Feb 2, 2024
smira added a commit to smira/talos that referenced this issue Feb 2, 2024
Fixes siderolabs#8229

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
smira added a commit to smira/talos that referenced this issue Feb 21, 2024
Fixes siderolabs#8229

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit ddbabc7)