Re-introduce DPCR Loki logging for GCP and Azure clusters #39064

Merged
merged 5 commits into from May 8, 2023

dgoodwin
Contributor

@dgoodwin dgoodwin commented May 5, 2023

  • Revert "Revert "Enable DPCR Loki for specific set of jobs (dpcr loki #38914)""
  • Stop sending audit logs to loki
  • Set resource requests on new promtail prod-bearer-token container
  • Enable loki logging for all azure jobs

TRT-968

Initial tests show about 1.9 million log lines sent to Loki for a
single job run. About 1 million of them were audit logs, so this will
eliminate roughly half of our logging load by itself.
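
As a quick check of the figures above, the audit-log share can be computed directly. This is only a sketch using the rounded numbers quoted in the description; the variable names are illustrative and the exact per-run counts are not given here.

```python
# Rounded figures quoted in the PR description for a single job run.
total_lines = 1_900_000   # all log lines sent to Loki
audit_lines = 1_000_000   # of which audit logs

audit_share = audit_lines / total_lines
remaining_lines = total_lines - audit_lines

print(f"audit share of log lines: {audit_share:.1%}")   # 52.6%
print(f"lines left after dropping audit logs: {remaining_lines:,}")
```

With these rounded inputs, dropping audit logs removes a little over half of the lines sent per run.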

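The mounts for the audit logs are no longer used, so they are removed as well.

Without resource requests on the new promtail prod-bearer-token container, the following test fails: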

[sig-arch] Managed cluster should set requests but not limits [Suite:openshift/conformance/parallel]
Run #0: Failed (6s)
{  fail [github.com/openshift/origin/test/extended/operators/resources.go:196]: May  5 09:10:34.626: Pods in platform namespaces are not following resource request/limit rules or do not have an exception granted:
  apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token does not have a cpu request (rule: "apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token/request[cpu]")
  apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token does not have a memory request (rule: "apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token/request[memory]")
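
The rule behind this failure can be sketched roughly as follows: every container needs a cpu and a memory request and should not set limits. This is not the openshift/origin test code. The function name, the `limit[...]` rule string, and the request values are assumptions for illustration; only the `request[cpu]` and `request[memory]` rule strings come from the failure output above.

```python
from typing import Dict, List

REQUIRED_REQUESTS = ("cpu", "memory")


def find_violations(workload_path: str, containers: List[Dict]) -> List[str]:
    """Return rule-style strings for containers missing requests or setting limits."""
    violations = []
    for container in containers:
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}
        base = f"{workload_path}/container/{container['name']}"
        for resource in REQUIRED_REQUESTS:
            if resource not in requests:
                violations.append(f"{base}/request[{resource}]")
        # Hypothetical rule format for limits; not shown in the failure output.
        for resource in sorted(limits):
            violations.append(f"{base}/limit[{resource}]")
    return violations


daemonset = "apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail"
before = [{"name": "prod-bearer-token"}]
after = [{
    "name": "prod-bearer-token",
    # Placeholder values; the PR text does not state the actual requests.
    "resources": {"requests": {"cpu": "10m", "memory": "20Mi"}},
}]

print(find_violations(daemonset, before))  # two missing-request rules
print(find_violations(daemonset, after))   # []
```

The `before` case reproduces the two rule strings from the failure message, and adding requests without limits clears them.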
@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) May 5, 2023
@dgoodwin
Contributor Author

dgoodwin commented May 5, 2023

/pj-rehearse periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade
/pj-rehearse periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade
/pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade

@dgoodwin dgoodwin changed the title from "dpcr loki 2" to "Re-introduce DPCR Loki logging for GCP and Azure clusters" May 5, 2023
@dgoodwin
Contributor Author

dgoodwin commented May 5, 2023

/pj-rehearse periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade
/pj-rehearse periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade
/pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade

@dgoodwin
Contributor Author

dgoodwin commented May 5, 2023

/pj-rehearse periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade

@dgoodwin
Contributor Author

dgoodwin commented May 5, 2023

/pj-rehearse periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade

@openshift-ci
Contributor

openshift-ci bot commented May 5, 2023

@dgoodwin: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade | 16d29fa | link | unknown | /pj-rehearse periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) May 5, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) May 8, 2023
@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@dgoodwin: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-redhat-developer-gitops-operator-master-v4.12-e2e | redhat-developer/gitops-operator | presubmit | Registry content changed |
| pull-ci-redhat-developer-gitops-operator-master-v4.11-e2e | redhat-developer/gitops-operator | presubmit | Registry content changed |
| pull-ci-redhat-developer-gitops-operator-master-v4.10-e2e | redhat-developer/gitops-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-master-e2e-aws-ovn | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.15-e2e-aws-ovn | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.14-e2e-aws-ovn | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.13-e2e-aws-ovn | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.12-e2e-aws-ovn | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.11-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.10-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.9-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.8-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.7-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.6-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.5-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.4-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.3-e2e-aws | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-master-operator-e2e | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.15-operator-e2e | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.14-operator-e2e | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.13-operator-e2e | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.12-operator-e2e | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.11-operator-e2e | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.10-operator-e2e | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-openshift-ptp-operator-release-4.9-operator-e2e | openshift/ptp-operator | presubmit | Registry content changed |
| pull-ci-redhat-cne-cloud-event-proxy-main-e2e-aws | redhat-cne/cloud-event-proxy | presubmit | Registry content changed |
| pull-ci-redhat-cne-cloud-event-proxy-release-4.15-e2e-aws | redhat-cne/cloud-event-proxy | presubmit | Registry content changed |
| pull-ci-redhat-cne-cloud-event-proxy-release-4.14-e2e-aws | redhat-cne/cloud-event-proxy | presubmit | Registry content changed |
| pull-ci-redhat-cne-cloud-event-proxy-release-4.13-e2e-aws | redhat-cne/cloud-event-proxy | presubmit | Registry content changed |
| pull-ci-redhat-cne-cloud-event-proxy-release-4.12-e2e-aws | redhat-cne/cloud-event-proxy | presubmit | Registry content changed |
| pull-ci-redhat-cne-cloud-event-proxy-release-4.11-e2e-aws | redhat-cne/cloud-event-proxy | presubmit | Registry content changed |
| pull-ci-redhat-cne-cloud-event-proxy-release-4.10-e2e-aws | redhat-cne/cloud-event-proxy | presubmit | Registry content changed |
| pull-ci-redhat-cne-cloud-event-proxy-release-4.9-e2e-aws | redhat-cne/cloud-event-proxy | presubmit | Registry content changed |
| pull-ci-openshift-service-ca-operator-master-e2e-aws-operator | openshift/service-ca-operator | presubmit | Registry content changed |
| pull-ci-openshift-service-ca-operator-release-4.15-e2e-aws-operator | openshift/service-ca-operator | presubmit | Registry content changed |

A total of 12431 jobs have been affected by this change. The above listing is non-exhaustive and limited to 35 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 10 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 20 rehearsals
Comment: /pj-rehearse max to run up to 35 rehearsals
Comment: /pj-rehearse auto-ack to run up to 10 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.
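
The comment forms listed above can be assembled with a small helper. This is a hypothetical convenience sketch, not part of pj-rehearse: the function name and its validation rules are assumptions, and only the command words and the caps (10, 20, 35) come from the usage text.

```python
from typing import Iterable

# Rehearsal caps quoted in the bot's usage text: plain command 10, "more" 20, "max" 35.
REHEARSAL_CAPS = {"": 10, "more": 20, "max": 35}


def rehearse_comment(jobs: Iterable[str] = (), mode: str = "") -> str:
    """Build a /pj-rehearse comment for named jobs or for a capped batch."""
    names = list(jobs)
    if names:
        # Rejecting the combination is this helper's own choice, not documented bot behavior.
        if mode:
            raise ValueError("pass either job names or a mode, not both")
        if any(not name or " " in name for name in names):
            raise ValueError("job names must be non-empty and contain no spaces")
        return "/pj-rehearse " + " ".join(names)
    if mode not in REHEARSAL_CAPS:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"/pj-rehearse {mode}".strip()


print(rehearse_comment(mode="max"))
print(rehearse_comment([
    "periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade",
    "periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade",
]))
```

Naming several jobs in one comment follows the "separated by a space" form; the earlier comments in this thread used one command per line instead.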

@dgoodwin
Contributor Author

dgoodwin commented May 8, 2023

/pj-rehearse ack

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack label (Signifies that rehearsal jobs have been acknowledged) May 8, 2023
@stbenjam
Member

stbenjam commented May 8, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) May 8, 2023
@openshift-ci
Contributor

openshift-ci bot commented May 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit ad7db2f into openshift:master May 8, 2023
13 checks passed
ascerra pushed a commit to ascerra/release that referenced this pull request May 8, 2023
…39064)

* Revert "Revert "Enable DPCR Loki for specific set of jobs (openshift#38914)""

This reverts commit 2b7a44f.

* Stop sending audit logs to loki

Initial tests show about 1.9 million log lines sent to Loki for a
single job run. About 1 million of them were audit logs, so this will
eliminate roughly half of our logging load by itself.

Remove unused mounts for the audit logs

* Set resource requests on new promtail prod-bearer-token container

Will fail a test without this:

[sig-arch] Managed cluster should set requests but not limits [Suite:openshift/conformance/parallel]
Run #0: Failed (6s)
{  fail [github.com/openshift/origin/test/extended/operators/resources.go:196]: May  5 09:10:34.626: Pods in platform namespaces are not following resource request/limit rules or do not have an exception granted:
  apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token does not have a cpu request (rule: "apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token/request[cpu]")
  apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token does not have a memory request (rule: "apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token/request[memory]")

* Enable loki logging for all azure jobs
danielerez added a commit to danielerez/assisted-test-infra that referenced this pull request May 9, 2023
In Makefile, ensure ADDITIONAL_MANIFEST_DIR exists
before trying to move its content.

Needed for trying to resolve the issue in
e2e-metal-single-node-live-iso job[*]

```
mv /root/sno-additional-manifests/* /home/sno/sno-additional-manifests/
mv: cannot stat '/root/sno-additional-manifests/*': No such file or directory
make: *** [Makefile:280: deploy_ibip] Error 1
```

Note: could be related to openshift/release#39064

[*] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_assisted-test-infra/2143/pull-ci-openshift-assisted-test-infra-master-e2e-metal-single-node-live-iso/1655682994788110336/build-log.txt
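
The guard described in this commit message (make sure the directory exists before moving its contents) can be sketched in Python. The actual fix lives in a Makefile and is not shown here, so this is a swapped-in illustration under assumptions: the function name is hypothetical and temporary directories stand in for the real /root and /home/sno paths.

```python
import shutil
import tempfile
from pathlib import Path


def move_manifests(src_dir: Path, dst_dir: Path) -> int:
    """Move every entry from src_dir into dst_dir and return how many were moved."""
    # Create both directories up front so a missing source is not an error.
    src_dir.mkdir(parents=True, exist_ok=True)
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    for entry in sorted(src_dir.iterdir()):
        shutil.move(str(entry), str(dst_dir / entry.name))
        moved += 1
    return moved


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "root" / "sno-additional-manifests"
    dst = root / "home" / "sno" / "sno-additional-manifests"
    print(move_manifests(src, dst))  # source did not exist: 0 moved, no error
    (src / "example.yaml").write_text("kind: ConfigMap\n")
    print(move_manifests(src, dst))  # 1 moved
```

Iterating over the directory entries also avoids the failure seen in the log, where the `*` glob matched nothing and `mv` received the literal pattern.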
openshift-merge-robot pushed a commit to openshift/assisted-test-infra that referenced this pull request May 9, 2023
In Makefile, ensure ADDITIONAL_MANIFEST_DIR exists
before trying to move its content.

Needed for trying to resolve the issue in
e2e-metal-single-node-live-iso job[*]

```
mv /root/sno-additional-manifests/* /home/sno/sno-additional-manifests/
mv: cannot stat '/root/sno-additional-manifests/*': No such file or directory
make: *** [Makefile:280: deploy_ibip] Error 1
```

Note: could be related to openshift/release#39064

[*] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_assisted-test-infra/2143/pull-ci-openshift-assisted-test-infra-master-e2e-metal-single-node-live-iso/1655682994788110336/build-log.txt
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • lgtm: Indicates that a PR is ready to be merged.
  • rehearsals-ack: Signifies that rehearsal jobs have been acknowledged
4 participants