Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: protokube restart always and spawn new containers #7927

Closed
wants to merge 1 commit into from

Conversation

MqllR
Copy link

@MqllR MqllR commented Nov 14, 2019

The issue is described here #7928

TL;DR
The systemd service for protokube enter in an infinity loop and flood new containers.

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Nov 14, 2019
@k8s-ci-robot
Copy link
Contributor

Welcome @MqllR!

It looks like this is your first PR to kubernetes/kops 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kops has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @MqllR. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 14, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MqllR
To complete the pull request process, please assign mikesplain
You can assign the PR to them by writing /assign @mikesplain in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MqllR
Copy link
Author

MqllR commented Nov 14, 2019

CLA signed

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 14, 2019
@k8s-ci-robot
Copy link
Contributor

@MqllR: The label(s) /remove-label needs-ok-to-test cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/remove-label needs-ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

1 similar comment
@k8s-ci-robot
Copy link
Contributor

@MqllR: The label(s) /remove-label needs-ok-to-test cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/remove-label needs-ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MqllR MqllR force-pushed the fix_restart_always_protokube branch from c161d7c to f659b00 Compare November 15, 2019 08:29
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 15, 2019
@MqllR
Copy link
Author

MqllR commented Nov 15, 2019

/assign @mikesplain

@mikesplain
Copy link
Contributor

Hi @MqllR! This looks pretty good. Have you tested this? Were you able to replicate the crashing issue and confirm this protects things (as it should)?

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 16, 2019
@MqllR
Copy link
Author

MqllR commented Nov 18, 2019

Hi @mikesplain !

I reproduced the issue with sending a sig term to journald. This is how I reproduce:

# ps auxw | grep journald                                                                                                                                                                 
root      1441  0.1  0.8 131212 69580 ?        Ss   Oct25  37:03 /usr/lib/systemd/systemd-journald                                                                                                                 
# kill 1441

Then, the service protokube will start to flap and create new failing containers:

# docker ps -a | grep protokube
700d344a8928        protokube:1.13.2                                                    "/usr/bin/protokube …"   Less than a second ago   Exited (2) Less than a second ago                       jovial_clarke
c2d7092885d8        protokube:1.13.2                                                    "/usr/bin/protokube …"   3 seconds ago            Exited (2) 3 seconds ago                                quizzical_wiles
1d76cd819002        protokube:1.13.2                                                    "/usr/bin/protokube …"   6 seconds ago            Exited (2) 5 seconds ago                                frosty_bell
1d785a37a145        protokube:1.13.2                                                    "/usr/bin/protokube …"   9 seconds ago            Exited (2) 8 seconds ago                                eager_noyce
74c62eb400d1        protokube:1.13.2                                                    "/usr/bin/protokube …"   11 seconds ago           Exited (2) 11 seconds ago                               festive_tesla
21ef746f8178        protokube:1.13.2                                                    "/usr/bin/protokube …"   14 seconds ago           Exited (2) 14 seconds ago                               optimistic_swirle
s
e3d3948b3bad        protokube:1.13.2                                                    "/usr/bin/protokube …"   17 seconds ago           Exited (2) 16 seconds ago                               brave_allen
a5bff6a56c2f        protokube:1.13.2                                                    "/usr/bin/protokube …"   20 seconds ago           Exited (2) 19 seconds ago                               optimistic_benz
ee08c8c82dbf        protokube:1.13.2                                                    "/usr/bin/protokube …"   23 seconds ago           Exited (2) 22 seconds ago                               zealous_sammet
0392c89fda35        protokube:1.13.2                                                    "/usr/bin/protokube …"   25 seconds ago           Exited (2) 25 seconds ago                               competent_volhard
2ebc60f5c327        protokube:1.13.2                                                    "/usr/bin/protokube …"   28 seconds ago           Exited (2) 28 seconds ago                               vigilant_bohr
d1ecdc9c7341        protokube:1.13.2                                                    "/usr/bin/protokube …"   31 seconds ago           Exited (2) 30 seconds ago                               agitated_meninsky
7b9dd9aec196        protokube:1.13.2                                                    "/usr/bin/protokube …"   35 seconds ago           Exited (2) 33 seconds ago                               lucid_wescoff
88a9d5e3f3af        protokube:1.13.2                                                    "/usr/bin/protokube …"   3 weeks ago              Up 3 weeks                                              elegant_chebyshev

To test my configuration, I edited the systemd conf and tested it again:

# cat /lib/systemd/system/protokube.service 
[Unit]
Description=Kubernetes Protokube Service
Documentation=https://github.com/kubernetes/kops

[Service]
ExecStartPre=/bin/true
ExecStartPre=-/usr/bin/docker stop protokube
ExecStartPre=-/usr/bin/docker rm protokube
ExecStart=/usr/bin/docker run -v /:/rootfs/ -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd --name=protokube ...
Restart=always
RestartSec=2s
StartLimitInterval=0

[Install]
WantedBy=multi-user.target

It fixes the issue as expected.

We must test with a full kops deployment but I'm fighting a bit to build the binary.

@maruina
Copy link
Contributor

maruina commented Nov 20, 2019

@MqllR can I ask you why the docker command starts with a - in the systemd unit file?

@MqllR
Copy link
Author

MqllR commented Nov 20, 2019

@MqllR can I ask you why the docker command starts with a - in the systemd unit file?

Hi @maruina,
I put a - because both stop and rm could failed at the first start (the container is not created). The dash record the event but don't consider the service as failed: https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStart=

@maruina
Copy link
Contributor

maruina commented Nov 20, 2019

Thanks! I learned a new thing :)

@MqllR
Copy link
Author

MqllR commented Nov 25, 2019

Hi @mikesplain,

I've build nodeup and run it to verify the output. It's the expected result.

@k8s-ci-robot
Copy link
Contributor

@MqllR: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kops-verify-staticcheck f659b00 link /test pull-kops-verify-staticcheck

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2019
@k8s-ci-robot
Copy link
Contributor

@MqllR: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MqllR
Copy link
Author

MqllR commented Dec 23, 2019

Since the merge of #7986, the change I proposed has been implemented inside... don't sure to see the relation between splitting the CRI/containerd and protokube systemd!

@MqllR MqllR closed this Dec 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants