Hook up resource watch to one job #33109

Merged

xueqzhan
Contributor

@xueqzhan xueqzhan commented Oct 13, 2022

Hook up resource watch observer

TRT-469

@xueqzhan
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Oct 13, 2022

@xueqzhan: the repo openshift/release does not contribute to the OpenShift official images

@openshift-ci openshift-ci bot added the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Oct 13, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Oct 13, 2022
@xueqzhan xueqzhan force-pushed the resource-watch-observer branch 2 times, most recently from 3ee322e to 37aa374 Compare October 14, 2022 12:46
@xueqzhan
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Oct 14, 2022

@xueqzhan: the repo openshift/release does not contribute to the OpenShift official images

@xueqzhan
Contributor Author

/pj-rehearse

@xueqzhan
Contributor Author

/test pj-rehearse

1 similar comment
@xueqzhan
Contributor Author

/test pj-rehearse

@danilo-gemoli
Contributor

@xueqzhan if you're not sure how to configure an observer or what fields are allowed, here you can find the entire ci-operator configuration reference

@xueqzhan
Contributor Author

@xueqzhan if you're not sure how to configure an observer or what fields are allowed, here you can find the entire ci-operator configuration reference

@danilo-gemoli Thanks for the link! Maybe I am missing something, but I do not see how to define a timeout for an observer in that reference. I also posted the question on Slack: https://coreos.slack.com/archives/CBN38N3MW/p1665769983392569. Feel free to reply wherever you prefer.

@xueqzhan xueqzhan force-pushed the resource-watch-observer branch 5 times, most recently from bc147d5 to 0ce6484 Compare October 19, 2022 15:32

echo "ended resource watch gracefully"
}
trap cleanup SIGTERM
Contributor

May I suggest replacing this with trap cleanup EXIT?

Contributor Author

That worked! Thanks!
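
The difference between the two trap conditions can be sketched as follows. This is a minimal illustration assuming bash, not the PR's actual script: `run_with_trap` is a hypothetical helper, and only the echoed message is borrowed from the snippet above. A SIGTERM trap fires only when that signal is delivered, while an EXIT trap also runs when the script simply finishes.

```shell
#!/bin/bash
# Hypothetical demo: register the same cleanup handler under two different
# trap conditions and let the child script finish normally (no signal sent).
run_with_trap() {
  # $1 is the trap condition to register in the child shell (SIGTERM or EXIT).
  bash -c '
    cleanup() { echo "ended resource watch gracefully"; }
    trap cleanup "$1"
    echo "working"
  ' _ "$1"
}

# With a SIGTERM trap, cleanup never runs because no SIGTERM arrives.
sigterm_output="$(run_with_trap SIGTERM)"
# With an EXIT trap, cleanup runs when the child shell exits.
exit_output="$(run_with_trap EXIT)"

echo "SIGTERM trap output:"
echo "${sigterm_output}"
echo "EXIT trap output:"
echo "${exit_output}"
```

In this sketch only the EXIT variant prints the cleanup message, which matches the behavior the suggestion relies on.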

@xueqzhan
Contributor Author

@dgoodwin We should be good to go with this change.

@vrutkovs
Member

/uncc

@openshift-ci openshift-ci bot removed the request for review from vrutkovs October 24, 2022 06:52
Contributor

@dgoodwin dgoodwin left a comment

Looks great, thank you!

approvers:
- deads2k
- dgoodwin
- stbenjam
Contributor

Please feel free to add yourself here. Having dug into this, you have the expertise we'll need for approvals.

    cpu: 10m
    memory: 10Mi
documentation: |-
  A observer for watch and record cluster resources
Contributor

Suggested change
- A observer for watch and record cluster resources
+ An observer to watch all changes to a defined set of cluster resources throughout the life of the cluster, and record them to a git repository.

commands: observers-resource-watch-commands.sh
resources:
  requests:
    cpu: 10m
Contributor

I would bump that up quite a bit. From my testing it looks like we sometimes struggle to commit to the repo fast enough. Some of that is probably disk, but I think we should err on the side of caution, and a hundredth of a CPU is very low (10m is 0.01 CPUs). I suggest just changing this to "1" for a full CPU, or maybe just "500m" for a half CPU.
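
For reference, Kubernetes CPU quantities with an `m` suffix are millicores, so 1000m equals one full CPU. A minimal sketch of that arithmetic follows; `cpu_to_millicores` is a hypothetical helper written for this illustration, and it only handles whole numbers and the `m` suffix (not decimals such as 0.5).

```shell
#!/bin/bash
# Hypothetical helper: convert a Kubernetes CPU quantity to millicores.
# Supports only "<integer>m" and "<integer>" forms.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;            # already in millicores, strip the suffix
    *)  echo "$(( $1 * 1000 ))" ;;  # whole CPUs, 1 CPU = 1000 millicores
  esac
}

# Compare the original request with the two suggested values.
for quantity in 10m 500m 1; do
  printf '%s = %s millicores\n' "${quantity}" "$(cpu_to_millicores "${quantity}")"
done
```

Seen this way, the suggested values are 50 and 100 times the original request.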

resources:
  requests:
    cpu: 10m
    memory: 10Mi
Contributor

Seems dangerously low as well, maybe 500Mi?

kill ${CHILDREN} && wait
fi

tar -czC $REPOSITORY_PATH -f "${ARTIFACT_DIR}/resource-watch-store.tar.gz" .
Contributor

Any idea why this is coming out not gzipped? https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/33109/rehearse-33109-periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-serial/1583447789566693376/artifacts/e2e-aws-ovn-serial/observers-resource-watch/artifacts/

Is this the thing we've heard mentioned where gzip appears to be getting automatically undone somewhere in our artifacts stack? (cc @DennisPeriquet @stbenjam, can't recall who mentioned this) If so we should probably just skip the gzip step.

I do have one nit here: could we structure this so that there's another sub-directory in play? I notice that when you download the tar and extract it, all the files drop into your current dir. It would be great if they were isolated in a sub-dir so we don't accidentally spew the files all over places we didn't intend to.

I think this would mean setting REPOSITORY_PATH="${ARTIFACT_DIR}/resource-watch-store/repo", but pointing tar at ${ARTIFACT_DIR}/resource-watch-store/ so that the archive contains the repo sub-directory.
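
One way this layout could look is sketched below. It is an illustration under assumptions, not the change that was merged: `ARTIFACT_DIR` is replaced by a temporary directory, `STORE_DIR` is a hypothetical variable name, and the sample file stands in for the real repository contents.

```shell
#!/bin/bash
# Sketch: keep the git repo in a "repo" sub-directory and archive from its
# parent, so extraction yields ./repo/... instead of loose files.
ARTIFACT_DIR="$(mktemp -d)"   # stand-in for the CI-provided artifact directory
STORE_DIR="${ARTIFACT_DIR}/resource-watch-store"   # hypothetical variable name
REPOSITORY_PATH="${STORE_DIR}/repo"

mkdir -p "${REPOSITORY_PATH}"
echo "sample" > "${REPOSITORY_PATH}/example.yaml"   # placeholder content

# Change into the store directory (-C) and archive only the repo entry,
# so every path inside the tarball starts with repo/.
tar -czf "${ARTIFACT_DIR}/resource-watch-store.tar.gz" -C "${STORE_DIR}" repo

# List the archive to confirm the layout.
archive_listing="$(tar -tzf "${ARTIFACT_DIR}/resource-watch-store.tar.gz")"
echo "${archive_listing}"
```

Extracting such an archive creates a single repo directory in the current location rather than scattering files.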

Contributor

It looks like it's actually gzipped even though it appears with just a .tar extension.

Mozart:Downloads dperique$ file resource-watch-store.tar 
resource-watch-store.tar: gzip compressed data
Mozart:Downloads dperique$ ls -lh resource-watch-store.tar 
-rw-r--r--@ 1 dperique  staff   3.8M Oct 24 08:59 resource-watch-store.tar

Mozart:Downloads dperique$ mv resource-watch-store.tar resource-watch-store.tar.gz
Mozart:Downloads dperique$ gzip -d resource-watch-store.tar.gz 
Mozart:Downloads dperique$ ls -lh resource-watch-store.tar 
-rw-r--r--  1 dperique  staff    11M Oct 24 08:59 resource-watch-store.tar
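
The same check can be scripted so that it does not depend on the file extension. The sketch below is an illustration only: `is_gzip` is a hypothetical helper, and the sample files are created in a temporary directory rather than downloaded from the artifacts bucket. It feeds the file to `gzip -t` on standard input, so the suffix plays no role.

```shell
#!/bin/bash
# Sketch: detect gzip data by content rather than by file name.
is_gzip() {
  # gzip -t tests integrity; reading from stdin sidesteps suffix handling.
  gzip -t < "$1" 2>/dev/null
}

workdir="$(mktemp -d)"
# Gzip-compressed data stored under a misleading .tar name.
printf 'hello\n' | gzip -c > "${workdir}/resource-watch-store.tar"
# Plain text for comparison.
printf 'hello\n' > "${workdir}/plain.txt"

if is_gzip "${workdir}/resource-watch-store.tar"; then
  gz_result="gzip"
else
  gz_result="not gzip"
fi

if is_gzip "${workdir}/plain.txt"; then
  plain_result="gzip"
else
  plain_result="not gzip"
fi

echo "resource-watch-store.tar: ${gz_result}"
echo "plain.txt: ${plain_result}"
```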

set -o nounset
set -o pipefail

export REPOSITORY_PATH="${ARTIFACT_DIR}/resource-watch-store"
Contributor

It's neat that we still get the git repo even if the signal is lost and we fail to tar it up.

@dgoodwin
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2022
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2022
@dgoodwin
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2022
@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2022

@xueqzhan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-upgrade | 905056a | link | unknown | /test pj-rehearse |
| ci/rehearse/periodic-ci-openshift-release-master-nightly-4.12-e2e-metal-ipi-sdn-bm | 905056a | link | unknown | /test pj-rehearse |
| ci/rehearse/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade | 905056a | link | unknown | /test pj-rehearse |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@danilo-gemoli
Contributor

@jmguzik @bbguimaraes observers have not caused blocking issues so far.
/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Oct 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danilo-gemoli, dgoodwin, xueqzhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 25, 2022
@openshift-merge-robot openshift-merge-robot merged commit dcb99f9 into openshift:master Oct 25, 2022
@openshift-ci
Contributor

openshift-ci bot commented Oct 25, 2022

@xueqzhan: Updated the ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:

  • key openshift-release-master__ci-4.12-upgrade-from-stable-4.11.yaml using file ci-operator/config/openshift/release/openshift-release-master__ci-4.12-upgrade-from-stable-4.11.yaml
  • key openshift-release-master__ci-4.12.yaml using file ci-operator/config/openshift/release/openshift-release-master__ci-4.12.yaml
  • key openshift-release-master__nightly-4.12.yaml using file ci-operator/config/openshift/release/openshift-release-master__nightly-4.12.yaml

In response to this:

Testing resource watch

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
6 participants