add pod lifecycle intervals to separate pages #26908

Merged
merged 13 commits into openshift:master on Apr 4, 2022

Conversation

deads2k (Contributor) commented Mar 14, 2022

This may be the final edition. It sorts the pod lifecycle on the intervals chart by namespace.

There are still lots and lots of them, but you can see it ripple out by namespace if this works.

The openshift-tests artifact directory now contains multiple intervals charts, one per namespace, showing the pods in that namespace. This makes it possible to determine which pods were running under what conditions. Keep in mind that readiness here reflects the container status, not the success or failure of an individual check.
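
For anyone inspecting those artifacts locally, a minimal Go sketch that loads one intervals file and prints each interval (the file name is a placeholder, and the struct only mirrors the fields visible in the JSON excerpts quoted later in this thread, not the actual monitorapi types):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// eventInterval mirrors the fields seen in the e2e-intervals_*.json files.
type eventInterval struct {
	Level   string `json:"level"`
	Locator string `json:"locator"`
	Message string `json:"message"`
	From    string `json:"from"`
	To      string `json:"to"`
}

type intervalList struct {
	Items []eventInterval `json:"items"`
}

func main() {
	raw, err := os.ReadFile("e2e-intervals_everything.json") // placeholder path
	if err != nil {
		panic(err)
	}
	var list intervalList
	if err := json.Unmarshal(raw, &list); err != nil {
		panic(err)
	}
	for _, item := range list.Items {
		fmt.Printf("%s -> %s  %s  %s\n", item.From, item.To, item.Locator, item.Message)
	}
}
```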

openshift-ci bot commented Mar 14, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Mar 14, 2022
deads2k changed the title from "Pod events 4 sort" to "add pod lifecycle intervals to separate pages" on Mar 21, 2022
deads2k (Contributor Author) commented Mar 21, 2022

/retest

1 similar comment

DennisPeriquet (Contributor) commented:

In this job from this PR, I looked at
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/26908/pull-ci-openshift-origin-master-e2e-gcp/1503480460666212352/artifacts/e2e-gcp/openshift-e2e-test/artifacts/junit/e2e-intervals_everything_20220314-220614.json
and searched for "ns/openshift-etcd pod/revision-pruner-7-ci-op-33pz4yh4-2a78c-c8x64-master-1 uid/0f5d9416-976b-4825-953b-084df247fffc", which returned:

        {
            "level": "Info",
            "locator": "ns/openshift-etcd pod/revision-pruner-7-ci-op-33pz4yh4-2a78c-c8x64-master-1 uid/0f5d9416-976b-4825-953b-084df247fffc",
            "message": "constructed/true reason/Created ",
            "from": "2022-03-14T22:06:14Z",
            "to": "2022-03-14T22:06:14Z"
        },
        {
            "level": "Info",
            "locator": "ns/openshift-etcd pod/revision-pruner-7-ci-op-33pz4yh4-2a78c-c8x64-master-1 uid/0f5d9416-976b-4825-953b-084df247fffc",
            "message": "constructed/true reason/Scheduled node/ci-op-33pz4yh4-2a78c-c8x64-master-1",
            "from": "2022-03-14T22:06:14Z",
            "to": "2022-03-14T21:57:54Z"
        },
        {
            "level": "Info",
            "locator": "ns/openshift-etcd pod/revision-pruner-7-ci-op-33pz4yh4-2a78c-c8x64-master-1 uid/0f5d9416-976b-4825-953b-084df247fffc container/pruner",
            "message": "constructed/true reason/NotReady ",
            "from": "2022-03-14T22:06:14Z",
            "to": "2022-03-14T21:57:54Z"
        },

The from and to fields look backwards (from is later than to), so these intervals don't show up on the chart.

DennisPeriquet (Contributor) commented Mar 21, 2022

I looked at this one (de49307):

curl -sk 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/26908/pull-ci-openshift-origin-master-e2e-aws-serial/1505958001251454976/artifacts/e2e-aws-serial/openshift-e2e-test/artifacts/junit/e2e-intervals_operators_20220321-180600.json'|jq '.items[]|select (.from > .to)'

...
(there are 82 cases where from > to)

{
  "level": "Info",
  "locator": "ns/openshift-etcd pod/installer-4-ip-10-0-140-153.us-east-2.compute.internal uid/da3166b9-2b57-4191-82a2-292d8e3faa75",
  "message": "constructed/true reason/Scheduled node/ip-10-0-140-153.us-east-2.compute.internal",
  "from": "2022-03-21T18:06:01Z",
  "to": "2022-03-21T17:51:32Z"
}
{
  "level": "Info",
  "locator": "ns/openshift-etcd pod/installer-4-ip-10-0-140-153.us-east-2.compute.internal uid/da3166b9-2b57-4191-82a2-292d8e3faa75 container/installer",
  "message": "constructed/true reason/NotReady ",
  "from": "2022-03-21T18:06:01Z",
  "to": "2022-03-21T17:51:32Z"
}

DennisPeriquet (Contributor) commented Mar 21, 2022

Here's another one (using CI for de49307):

$ curl -sk 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/26908/pull-ci-openshift-origin-master-e2e-aws-csi/1505935488840634368/artifacts/e2e-aws-csi/openshift-e2e-test/artifacts/junit/e2e-intervals_kube-apiserver_20220321-164339.json' |jq '.items[]|select (.from > .to)'

...
{
  "level": "Info",
  "locator": "ns/openshift-kube-apiserver pod/revision-pruner-7-ip-10-0-214-214.us-west-1.compute.internal uid/6a81964d-169c-47e0-a986-551429370ae9",
  "message": "constructed/true reason/Scheduled node/ip-10-0-214-214.us-west-1.compute.internal",
  "from": "2022-03-21T16:43:39Z",
  "to": "2022-03-21T16:43:14Z"
}
{
  "level": "Info",
  "locator": "ns/openshift-kube-apiserver pod/revision-pruner-7-ip-10-0-214-214.us-west-1.compute.internal uid/6a81964d-169c-47e0-a986-551429370ae9 container/pruner",
  "message": "constructed/true reason/NotReady ",
  "from": "2022-03-21T16:43:39Z",
  "to": "2022-03-21T16:43:14Z"
}

if (m[2] == "ContainerStart") {
    return [item.locator, ` (container lifecycle)`, "ContainerStart"];
}
}
Contributor:

should we add a case for ContainerExit?

Contributor Author:

> should we add a case for ContainerExit?

container exit shouldn't have an interval because that's the absence of an interval, right?

continue
}
annotationTokens := strings.Split(curr, "/")
annotations[annotationTokens[0]] = annotationTokens[1]
Contributor:

small savings but can we just do this:

if annotationTokens[0] == "reason" {
  return annotationTokens[1]
}

though I'm not sure what to return if you didn't find "reason" in the tokens.

Contributor Author:

> small savings but can we just do this:
>
> if annotationTokens[0] == "reason" {
>   return annotationTokens[1]
> }
>
> though I'm not sure what to return if you didn't find "reason" in the tokens.

I'd like to keep the logic that produces the annotations because I suspect we will need it again.
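
For reference, here is a minimal Go sketch of the annotation parsing under discussion (the function name and the program scaffolding are assumptions; only the splitting logic comes from the diff context above). It shows why keeping the full map is useful: callers can later look up annotations other than reason/.

```go
package main

import (
	"fmt"
	"strings"
)

// annotationsFromMessage splits a monitor message such as
// "constructed/true reason/Scheduled node/master-1" into a key/value map.
func annotationsFromMessage(message string) map[string]string {
	annotations := map[string]string{}
	for _, curr := range strings.Split(message, " ") {
		if !strings.Contains(curr, "/") {
			continue // skip tokens that are not key/value pairs
		}
		annotationTokens := strings.SplitN(curr, "/", 2)
		annotations[annotationTokens[0]] = annotationTokens[1]
	}
	return annotations
}

func main() {
	m := annotationsFromMessage("constructed/true reason/Scheduled node/master-1")
	fmt.Println(m["reason"], m["node"]) // Scheduled master-1
}
```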

deads2k (Contributor Author) commented Mar 22, 2022

/retest

deads2k (Contributor Author) commented Mar 22, 2022

/test all

deads2k (Contributor Author) commented Mar 22, 2022

@DennisPeriquet ok, worked it out. Those pods terminate before the test starts. I can fix the times, but they still won't show up on the chart.
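
A sketch of one possible shape for that time fix (an assumption on my part, not necessarily the PR's actual change): clamp interval timestamps to the monitoring window so pods that terminated before the test started can no longer produce from > to.

```go
package main

import (
	"fmt"
	"time"
)

// clampToWindow is a hypothetical helper: it pins from/to inside the
// monitoring window, so an event that ended before the test started collapses
// to a zero-length interval instead of a reversed one.
func clampToWindow(from, to, windowStart time.Time) (time.Time, time.Time) {
	if from.Before(windowStart) {
		from = windowStart
	}
	if to.Before(from) {
		to = from
	}
	return from, to
}

func main() {
	start := time.Date(2022, 3, 14, 22, 6, 14, 0, time.UTC) // test start
	from := start                     // observation began at test start
	to := start.Add(-9 * time.Minute) // event actually ended earlier
	f, t := clampToWindow(from, to, start)
	fmt.Println(f.Format(time.RFC3339), t.Format(time.RFC3339))
}
```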

podCoordinates := monitorapi.PodFrom(inLocator)

// no hit for deleted, but if it's a RunOnce pod with all terminated containers, the logical "this pod is over"
// happens when the last container is terminated.
Contributor:

I can see how this comment is relevant on lines 172-173, but I'm not sure how it's relevant here.

if !ok {
    return t.delegate.getEndTime(locator)
}
for i := len(containerEvents) - 1; i >= 0; i-- {
Contributor:

why did you choose to walk the containerEvents from the end like this vs. using for ... range like in line 141?
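
To illustrate the question, a minimal sketch of the reverse walk (the event type and the reason string are assumptions, not the actual monitorapi types): because the container events are time-ordered, scanning from the end returns the most recent exit as soon as it is found, whereas a forward for ... range would have to scan the whole slice and keep overwriting a candidate.

```go
package main

import (
	"fmt"
	"strings"
)

// intervalEvent loosely mirrors the items in the e2e-intervals JSON above.
type intervalEvent struct {
	Locator string
	Message string
	From    string
	To      string
}

// lastContainerExit walks the time-ordered events backwards, so the first
// match is the latest exit.
func lastContainerExit(containerEvents []intervalEvent) (intervalEvent, bool) {
	for i := len(containerEvents) - 1; i >= 0; i-- {
		if strings.Contains(containerEvents[i].Message, "reason/ContainerExit") {
			return containerEvents[i], true
		}
	}
	return intervalEvent{}, false
}

func main() {
	events := []intervalEvent{
		{Message: "constructed/true reason/ContainerStart ", From: "2022-03-14T21:50:00Z"},
		{Message: "constructed/true reason/ContainerExit ", From: "2022-03-14T21:57:54Z"},
	}
	if last, ok := lastContainerExit(events); ok {
		fmt.Println("pod logically ended at", last.From) // the last container exit
	}
}
```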

deads2k (Contributor Author) commented Mar 22, 2022

/retest

DennisPeriquet (Contributor) commented:

> @DennisPeriquet ok, worked it out. Those pods terminate before the test starts. I can fix the times, but they still won't show up on the chart.

Cool. I did a check on the junit JSONs from a job run of 122151c and didn't find any cases where .from was after .to.

openshift-bot (Contributor) commented:

/retest-required

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

deads2k (Contributor Author) commented Mar 31, 2022

/test all

openshift-bot (Contributor) commented:

/retest-required

Please review the full test history for this PR and help us cut down flakes.

7 similar comments

DennisPeriquet (Contributor) commented:

On the two required (and failing) jobs, I'm seeing a lot of:

ns/e2e-test-whereabouts-e2e-59pln pod/whereabouts-pod-fgf26 node/ci-op-vqn19vmz-2a78c-tgfb2-worker-c-drphp - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_whereabouts-pod-fgf26_e2e-test-whereabouts-e2e-59pln_3a5636c4-7d4d-4e2c-ad3c-e35ba7bc63df_0(83b31776a78f0592ff291661771cfed6a1af6d718fc348e73b8f73f1fa549c62): error adding pod e2e-test-whereabouts-e2e-59pln_whereabouts-pod-fgf26 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [e2e-test-whereabouts-e2e-59pln/whereabouts-pod-fgf26/3a5636c4-7d4d-4e2c-ad3c-e35ba7bc63df:whereaboutstestbridge]: error adding container to network "whereaboutstestbridge": Error at storage engine: Could not allocate IP in range: ip: 192.168.2.225 / - 192.168.2.230 / range: net.IPNet{IP:net.IP{0xc0, 0xa8, 0x2, 0xe0}, Mask:net.IPMask{0xff, 0xff, 0xff, 0xf8}}
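
For context on the failure, the net.IPNet literal in that error decodes to 192.168.2.224/29, and the whereabouts range (192.168.2.225 - 192.168.2.230) spans only six assignable addresses, which would explain exhaustion when several test pods request IPs at once. A quick, illustrative Go check (not part of the PR):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// The IPNet literal from the error message above.
	ipnet := net.IPNet{
		IP:   net.IP{0xc0, 0xa8, 0x2, 0xe0},      // 192.168.2.224
		Mask: net.IPMask{0xff, 0xff, 0xff, 0xf8}, // /29
	}
	ones, bits := ipnet.Mask.Size()
	// Prints: 192.168.2.224/29 has 8 addresses
	fmt.Printf("%s has %d addresses\n", ipnet.String(), 1<<(bits-ones))
}
```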

DennisPeriquet (Contributor) commented:

/lgtm cancel

openshift-ci bot removed the lgtm label (Indicates that a PR is ready to be merged.) on Apr 1, 2022
deads2k (Contributor Author) commented Apr 4, 2022

Fix identified in a different PR by @DennisPeriquet; relabelling.

deads2k added the lgtm label (Indicates that a PR is ready to be merged.) on Apr 4, 2022
deads2k (Contributor Author) commented Apr 4, 2022

/test all

deads2k (Contributor Author) commented Apr 4, 2022

/skip

openshift-bot (Contributor) commented:

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

deads2k (Contributor Author) commented Apr 4, 2022

failed on "pods should successfully create sandboxes by other"

/override ci/prow/e2e-aws-fips
/override ci/prow/e2e-gcp

openshift-ci bot commented Apr 4, 2022

@deads2k: Overrode contexts on behalf of deads2k: ci/prow/e2e-aws-fips, ci/prow/e2e-gcp

In response to this:

> failed on "pods should successfully create sandboxes by other"
>
> /override ci/prow/e2e-aws-fips
> /override ci/prow/e2e-gcp

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-merge-robot merged commit 02cd062 into openshift:master on Apr 4, 2022
openshift-ci bot commented Apr 4, 2022

@deads2k: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-aws-cgroupsv2 | 612c4db | link | false | /test e2e-aws-cgroupsv2
ci/prow/e2e-aws-single-node | 612c4db | link | false | /test e2e-aws-single-node
ci/prow/e2e-agnostic-cmd | 612c4db | link | false | /test e2e-agnostic-cmd

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
