e2e-aws: stream bootkube and 'kubectl get all' to artifacts #2064

sttts · 2018-11-05T15:49:45Z

This PR should make issues during bootstrapping much easier to understand.

wking · 2018-11-05T15:59:19Z

ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

+
+          # stream bootkube into artifact file
+          mkdir -p /tmp/artifacts/bootstrap
+          ssh -o "StrictHostKeyChecking=no" core@${ip} sudo journal -u bootkube 2>&1 > /tmp/artifacts/bootstrap/bootkube.log


This container is named stream-kubectl-get-all, but it is streaming journal -u bootkube? And it looks like your earlier container is streaming bootkube too?

Typo. I had planned two containers, but then noticed the amount of shared code and merged them. Will fix.

wking · 2018-11-05T16:01:52Z

ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

+          # wait for the bootstrap node to show up with an IP
+          ip=""
+          while [ -z "${ip}" ]; do
+            ip=$(terraform state show -state=terraform.tfstate module.bootstrap.aws_instance.bootstrap | sed -n 's/^public_ip *= *//p')


@smarterclayton wanted this IP in the build logs (I think?), I'm just not clear if there's a way to get information there from a pod container. Any ideas?
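For readers unfamiliar with the sed expression in the diff above, here is a minimal sketch of what it extracts. The sample text is made up for illustration and only assumes a "key = value" layout; the real output of terraform state show may be formatted differently depending on the Terraform version.

```shell
#!/usr/bin/env bash
# Hypothetical sample of "key = value" state output (not taken from a real run).
sample_state='id         = i-0123456789abcdef0
private_ip = 10.0.0.5
public_ip  = 203.0.113.10'

# Same expression as in the diff: print only the line that starts with
# "public_ip", with the key and the "=" separator stripped off.
ip=$(printf '%s\n' "${sample_state}" | sed -n 's/^public_ip *= *//p')
echo "bootstrap ip: ${ip}"
```

Because the pattern is anchored with ^, the private_ip line is not matched, and ip stays empty (so the while loop keeps polling) until a public_ip line appears.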

wking · 2018-11-05T16:07:11Z

ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

+          stream-kubectl-get-all ${ip} &
+        }
+
+        stream &


stream is backgrounding the long-running resource streamers internally, so we can probably drop & here.

wking · 2019-01-02T20:40:34Z

ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

+        function stream-bootkube () {
+          ip=${1}
+          while true; do
+            ssh -o "StrictHostKeyChecking=no" core@${ip} sudo journal -u bootkube 2>&1 >> /tmp/artifacts/bootstrap/bootkube.log


nit: can we use the fully-qualified bootkube.service?

Also, this redirect isn't quite right, since you redirect stderr into stdout before adjusting stdout. For example:

$ (echo hi >&2) 2>&1 >>/tmp/bootkube.log
hi

You should instead redirect stdout first, and then redirect stderr into stdout:

$ (echo hi >&2) >>/tmp/bootkube.log 2>&1
$ cat /tmp/bootkube.log
hi
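The same point as a self-contained script, in case it helps to run both orderings side by side. This is only a sketch: the temporary files and variable names are illustrative, and the command substitution is there just to capture whatever escapes the file.

```shell
#!/usr/bin/env bash
wrong_log=$(mktemp)
right_log=$(mktemp)

# Wrong order: stderr is pointed at the *current* stdout first, and only then
# is stdout moved to the file, so the message never reaches the file.
wrong_leak=$( (echo hi >&2) 2>&1 >>"${wrong_log}" )

# Right order: stdout goes to the file first, then stderr follows it there.
right_leak=$( (echo hi >&2) >>"${right_log}" 2>&1 )

echo "wrong order: file='$(cat "${wrong_log}")' leaked='${wrong_leak}'"
echo "right order: file='$(cat "${right_log}")' leaked='${right_leak}'"
```

Redirections are processed left to right, which is why only the second form puts the stderr output into the log file.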

I moved the file redirection out of the loop. Makes it slightly easier to read.

wking · 2019-01-02T21:11:08Z

ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

+          if [[ -f /tmp/shared/exit ]]; then
+            exit 0
+          fi
+          sleep 60 & wait


I don't understand the backgrounded sleep here.

$ (sleep 10 && echo 'done sleeping') & (wait && echo 'done waiting')
done waiting  # this shows up very quickly
done sleeping  # this is delayed by 10 seconds

Why have a non-blocking sleep? I'd expect sleep here with a trailing kill:

for i in $(seq 1 120); do
  if [[ -f /tmp/shared/exit ]]; then
    break
  fi
  sleep 60
done
kill-streams

where kill-streams had internal wait calls (possibly using explicit PIDs).

The loop with the sleep is copied from the teardown script below.

kill-streams is not necessary because of the TERM trap.
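One plausible reason for the sleep 60 & wait idiom, which the thread does not spell out: bash runs a trap handler only after the current foreground command finishes, whereas the wait builtin returns immediately when a trapped signal arrives. The sketch below illustrates that behaviour in isolation. The function name waiter, the 30 second sleep, and the temporary file are all illustrative and not taken from the template.

```shell
#!/usr/bin/env bash
out=$(mktemp)

waiter() {
  # On TERM: report, stop the background sleep, and exit cleanly.
  trap 'echo trapped; kill "${sleep_pid}" 2>/dev/null; exit 0' TERM
  sleep 30 &
  sleep_pid=$!
  wait   # interruptible: a trapped signal makes `wait` return at once
}

start=${SECONDS}
( waiter ) >"${out}" &
waiter_pid=$!
sleep 1                      # give the subshell time to install its trap
kill -TERM "${waiter_pid}"
wait "${waiter_pid}"
status=$?
elapsed=$(( SECONDS - start ))
echo "status=${status} output=$(cat "${out}") elapsed=${elapsed}s"
```

With a plain foreground sleep 30 in place of sleep 30 & wait, the handler would be expected to run only after the sleep had finished, so the container would be slower to react to termination.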

openshift-ci-robot · 2019-01-03T11:41:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sttts
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: rajatchopra

If they are not already assigned, you can assign the PR to them by writing /assign @rajatchopra in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

ci-operator/templates/openshift/installer/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sttts · 2019-01-03T11:42:04Z

@wking addressed your comments.

Still wondering how to test the script. It's only blindly written right now. How do you do that usually?

wking · 2019-01-03T21:32:34Z

Still wondering how to test the script. It's only blindly written right now. How do you do that usually?

There are some docs here. Personally I've been using the "hope it works and file patches if it turns out to be broken" approach unless I'm doing something big 😊

wking · 2019-01-03T21:35:32Z

ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

+          while true; do
+            ssh -o "StrictHostKeyChecking=no" core@${ip} sudo journalctl -u bootkube.service -f --no-tail 2>&1
+            echo "=================== journalctl terminated ==================="
+            date


Can we embed this in the termination marker? It feels like it might be associated with the next iteration's logs if it comes after a big banner. Something like:

echo "========== journalctl terminated $(date --iso=s --utc) =========="

would do it.
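A minimal sketch of how the retry loop might look with the timestamp embedded in the marker and the file redirection kept outside the loop. The helper name stream_with_markers and the bounded attempt count are assumptions made so the example terminates; the template itself loops with while true and runs ssh rather than echo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly, writing a timestamped
# termination marker after each run.
stream_with_markers() {
  local attempts=$1; shift
  local i
  for (( i = 0; i < attempts; i++ )); do
    "$@" 2>&1
    # The timestamp sits inside the banner, so it clearly belongs to the run
    # that just ended and not to the next iteration's logs.
    echo "========== journalctl terminated $(date -u +%Y-%m-%dT%H:%M:%SZ) =========="
  done
}

log=$(mktemp)
# A stand-in command replaces the real ssh/journalctl invocation here.
stream_with_markers 2 echo "fake journal line" >>"${log}"
cat "${log}"
```

The date format string used here is the portable spelling of a UTC ISO 8601 timestamp; date --iso=s --utc from the comment above is the GNU shorthand for a similar result.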

smarterclayton · 2019-01-03T23:55:46Z

ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

+        cp /etc/openshift-installer/ssh-privatekey .ssh/id_rsa
+        cp /etc/openshift-installer/ssh-publickey .ssh/id_rsa.pub
+
+        function stream-bootkube () {


This is super complicated. Why isn't the installer fetching the bootkube logs if anything fails?

smarterclayton · 2019-01-03T23:56:46Z

This is way more complicated than I would like. If you can't debug a failure of bootstrap from the installer logs, I think we're doing something wrong. I don't really see the value in watch -w over possibly just dumping all the objects at the beginning of teardown.

wking · 2019-01-04T05:07:30Z

If you can't debug a failure of bootstrap from the installer logs, I think we're doing something wrong.

openshift/installer#967 is work towards the installer being able to collect bootkube.service logs on its own.

wking · 2019-01-04T05:09:36Z

I don't really see the value in watch -w over possibly just dumping all the objects at the beginning of teardown.

For successful runs, the bootstrap node will be gone by the time the teardown CI container starts going through its log collection. But yeah, you could put something there to attempt to grab the bootstrap logs, and have it only succeed when bootstrapping hung.
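A rough sketch of what such a best-effort attempt could look like in the teardown container. The helper name collect_best_effort is hypothetical, and the demo uses stand-in commands instead of a real ssh call so that it runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a log-fetching command, keep its output only if it
# succeeds, and never fail the caller when the bootstrap node is already gone.
collect_best_effort() {
  local out=$1; shift
  if "$@" >"${out}" 2>&1; then
    echo "collected ${out}"
  else
    echo "bootstrap logs unavailable (node probably already gone)"
    rm -f "${out}"
  fi
  return 0
}

tmp=$(mktemp -d)
collect_best_effort "${tmp}/ok.log" echo "bootkube output"   # stand-in for a hung bootstrap
collect_best_effort "${tmp}/gone.log" false                  # stand-in for a missing node
```

In the teardown script the stand-in command would presumably be replaced by something like ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no core@${ip} sudo journalctl -u bootkube.service --no-pager, where the timeout value is an assumption.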

wking · 2019-01-24T00:40:09Z

Obsoleted by #2633?

openshift-ci-robot · 2020-10-09T00:49:56Z

@sttts: The following tests failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/prow/prow-config	`99c4bec`	link	`/test prow-config`
ci/prow/step-registry-shellcheck	`99c4bec`	link	`/test step-registry-shellcheck`
ci/prow/app-ci-config	`99c4bec`	link	`/test app-ci-config`
ci/prow/ci-operator-config	`99c4bec`	link	`/test ci-operator-config`
ci/prow/release-controller-config	`99c4bec`	link	`/test release-controller-config`
ci/prow/step-registry-metadata	`99c4bec`	link	`/test step-registry-metadata`
ci/prow/ci-testgrid-allow-list	`99c4bec`	link	`/test ci-testgrid-allow-list`
ci/prow/yamllint	`99c4bec`	link	`/test yamllint`
ci/prow/boskos-config	`99c4bec`	link	`/test boskos-config`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-merge-robot · 2020-10-20T15:59:57Z

@sttts: The following test failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/prow/release-config	`99c4bec`	link	`/test release-config`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

sttts · 2020-10-22T10:00:44Z

/close

openshift-ci-robot · 2020-10-22T10:01:07Z

@sttts: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Nov 5, 2018

openshift-ci-robot requested review from abhinavdahiya and crawford November 5, 2018 15:49

sttts force-pushed the sttts-bootstrap-logs branch from 99cc41d to 557bca9 on November 5, 2018 15:53

wking reviewed Nov 5, 2018

sttts force-pushed the sttts-bootstrap-logs branch from 557bca9 to 2f2b226 on November 5, 2018 17:15

openshift-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Nov 5, 2018

sttts mentioned this pull request Jan 2, 2019

kube-apiserver restarting on bootstrap node openshift/cluster-kube-apiserver-operator#175

Closed

wking reviewed Jan 2, 2019

e2e-aws: stream bootkube output and 'kubectl get all' to artifacts

99c4bec

sttts force-pushed the sttts-bootstrap-logs branch from 2f2b226 to 99c4bec on January 3, 2019 11:41

wking reviewed Jan 3, 2019

smarterclayton reviewed Jan 3, 2019

openshift-ci-robot closed this Oct 22, 2020
