Bug 1888192: add rsync & log re-tries #631

soltysh · 2020-10-30T16:29:55Z

There are three main issues with `oc adm must-gather` that this PR addresses:

In the affected environment, log streaming from the gather pod ends after ~10 minutes. When the gather runs longer (on the cluster I was given access to it took ~30 minutes), you only get output for the first 10 minutes. The time-out comes from the server, so we just need to re-try streaming logs after it hits, as long as the pod is still running.
Current copy logic is missing compression.
Current sync logic copies the artifacts only once; if the copy fails, it doesn't re-try.
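
For illustration only, the copy re-try could look roughly like the sketch below. This is not the code from this PR: `retryCopy`, `errCopyFailed`, the attempt count, and the delay are all hypothetical names and values, and the real copy step (rsync from the gather pod) is replaced by a plain function argument.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errCopyFailed is a hypothetical error standing in for a failed rsync attempt.
var errCopyFailed = errors.New("copy failed")

// retryCopy calls copyFn up to attempts times, sleeping delay between failed
// attempts. It returns nil on the first success, or the last error wrapped
// with the number of attempts made.
func retryCopy(attempts int, delay time.Duration, copyFn func() error) error {
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = copyFn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryCopy(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errCopyFailed // simulate two transient failures
		}
		return nil
	})
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

Taking the copy step as a function keeps the re-try policy independent of how the artifacts are actually transferred.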

openshift-merge-robot · 2020-11-05T20:08:44Z

@soltysh: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-upgrade | `a00be68` | link | `/test e2e-aws-upgrade` |
| ci/prow/e2e-aws | `a00be68` | link | `/test e2e-aws` |
| ci/prow/e2e-agnostic-cmd | `a00be68` | link | `/test e2e-agnostic-cmd` |
| ci/prow/e2e-aws-serial | `a00be68` | link | `/test e2e-aws-serial` |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci-robot · 2021-01-18T20:09:35Z

@soltysh: This pull request references Bugzilla bug 1888192, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.7.0) matches configured target release for branch (4.7.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1888192: add rsync & log re-tries

soltysh · 2021-01-19T09:39:22Z

/retest

openshift-ci-robot · 2021-01-19T09:43:14Z

@soltysh: This pull request references Bugzilla bug 1888192, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.7.0) matches configured target release for branch (4.7.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1888192: add rsync & log re-tries

openshift-ci-robot · 2021-01-19T09:44:44Z

@soltysh: This pull request references Bugzilla bug 1888192, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.7.0) matches configured target release for branch (4.7.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1888192: add rsync & log re-tries

soltysh · 2021-01-19T19:06:15Z

/retest

sallyom

Suggestion: move the isGatherDone check before the "lost logs" message. Other than that, LGTM.

sallyom · 2021-01-21T21:29:34Z

pkg/cli/admin/mustgather/mustgather.go

+		// to ensure we don't print all of history set since to past 2 seconds
+		opts.Options.(*corev1.PodLogOptions).SinceSeconds = &since2s
+
+		if done, _ := o.isGatherDone(pod); done {


Move this above the klog "lost logs, re-trying" call, so that the message only prints if !isGatherDone.
(Easy to see with `oc adm must-gather --dest-dir=/tmp/must-gather -v=4 -- date`.)
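
A minimal sketch of the suggested ordering, with the Kubernetes client stripped out. `streamUntilDone` and its three function parameters are hypothetical stand-ins for the log stream, `o.isGatherDone(pod)`, and the klog call; the sketch also leaves out the `SinceSeconds` adjustment shown in the diff above.

```go
package main

import "fmt"

// streamUntilDone re-opens the log stream until isDone reports that the
// gather has finished. The done check runs before the "lost logs" message,
// so the message is only emitted when a re-try will actually follow.
// It returns the number of re-tries performed.
func streamUntilDone(stream func() error, isDone func() bool, logf func(string)) int {
	retries := 0
	for {
		// stream returns when the server closes the connection, for example
		// on time-out; the error is ignored in this sketch.
		_ = stream()
		if isDone() {
			return retries
		}
		logf("lost logs, re-trying...")
		retries++
	}
}

func main() {
	streams := 0
	retries := streamUntilDone(
		func() error { streams++; return nil },  // each call simulates one stream ending
		func() bool { return streams >= 3 },     // gather finishes during the third stream
		func(msg string) { fmt.Println(msg) },   // stand-in for the klog call
	)
	fmt.Println("retries:", retries)
}
```

With a short command such as `date`, the gather is already done when the first stream ends, so the loop exits without printing the message at all.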

Good point, updated.

sallyom · 2021-01-25T14:42:02Z

/lgtm

openshift-bot · 2021-01-25T15:33:03Z

/retest