
Add monitor store to resource watch #27492

Closed
wants to merge 2 commits into from

Conversation


@xueqzhan xueqzhan commented Oct 24, 2022

Main changes in this PR:

  1. Reorganize the existing monitor files (for node, pod, cluster operator, etc.) so that the code can be called both from within the monitor package and from outside it (e.g. for resource watch).
  2. Add a monitor store for resource watch. For resources supported by the monitor (currently node, pod, cluster operator, and cluster version), an effort has been made to keep the event formats the same. Events are not supported yet.
  3. The monitor store can be selected instead of the git store. In a real observer, both stores can be started with two different instances of the openshift-tests binary.

TRT-469


openshift-ci bot commented Oct 24, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xueqzhan
Once this PR has been reviewed and has the lgtm label, please assign spadgett for approval by writing /assign @spadgett in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xueqzhan

/retest-required


@dgoodwin dgoodwin left a comment


This one was challenging for me to review, but I'm starting to get a handle on it. It would be good to have some discussion sometime this week; that might help me clarify more. I was struggling with the dual entrypoints, and with whether or not we could simplify the process for adding a new resource type to monitor in the future, as right now it requires changes in several places. Hopefully this will become clearer with some discussion, if it's even feasible at all.

@@ -16,9 +16,19 @@ import (
"k8s.io/client-go/tools/cache"
)

func startEventMonitoring(ctx context.Context, m Recorder, client kubernetes.Interface) {
-	reMatchFirstQuote := regexp.MustCompile(`"([^"]+)"( in (\d+(\.\d+)?(s|ms)$))?`)
+var reMatchFirstQuote = regexp.MustCompile(`"([^"]+)"( in (\d+(\.\d+)?(s|ms)$))?`)

Could you godoc this regex variable, that's a tough one for a reader to understand what it does and how/where it's used at a glance.
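A godoc comment here could spell out the capture groups. The sketch below shows one possible wording; the sample event message is illustrative, not taken from the code:

```go
package main

import (
	"fmt"
	"regexp"
)

// reMatchFirstQuote extracts the first double-quoted substring from an event
// message, optionally followed by a duration such as ` in 3.2s` or ` in 150ms`
// at the end of the message. Capture groups: 1 = the quoted text,
// 3 = the duration string (empty when no duration is present).
var reMatchFirstQuote = regexp.MustCompile(`"([^"]+)"( in (\d+(\.\d+)?(s|ms)$))?`)

func main() {
	m := reMatchFirstQuote.FindStringSubmatch(`Created container "setup" in 2.5s`)
	fmt.Println(m[1]) // setup
	fmt.Println(m[3]) // 2.5s
}
```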

nodeInformer := informercorev1.NewNodeInformer(client, time.Hour, nil)
nodeInformer.AddEventHandler(
cache.ResourceEventHandlerFuncs{
-			AddFunc: func(obj interface{}) {},
+			AddFunc: NodeAddFunc,

NodeAddFunc remains empty, was there a need to do anything there that got missed?

@@ -21,7 +25,35 @@ import (
"github.com/openshift/origin/pkg/monitor/resourcewatch/storage"
)

var (
monitorStoreStr = "monitor"
gitStoreStr = "git"

With regards to running the watch twice in the observer, this does have memory implications, we mirror every watched resource in memory, and doing so twice means we double that. I don't know how significant it is though with the size of clusters we're dealing with, but it could be worth mentioning. Would it be non-trivial to compose the two stores into one, so we only listwatch once and dispatch to both?
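Composing the two stores could be as simple as a fan-out handler that dispatches each informer event to every registered store, so the cluster is list-watched only once. A self-contained sketch, where `ObserverStore`, `multiHandler`, and `printStore` are hypothetical names (the real git and monitor storages would need to share an interface mirroring client-go's `cache.ResourceEventHandler`):

```go
package main

import "fmt"

// ObserverStore is a stand-in for the interface both storages would implement.
type ObserverStore interface {
	OnAdd(obj interface{})
	OnUpdate(oldObj, newObj interface{})
	OnDelete(obj interface{})
}

// multiHandler fans a single informer's events out to several stores.
type multiHandler struct {
	stores []ObserverStore
}

func (m *multiHandler) OnAdd(obj interface{}) {
	for _, s := range m.stores {
		s.OnAdd(obj)
	}
}

func (m *multiHandler) OnUpdate(oldObj, newObj interface{}) {
	for _, s := range m.stores {
		s.OnUpdate(oldObj, newObj)
	}
}

func (m *multiHandler) OnDelete(obj interface{}) {
	for _, s := range m.stores {
		s.OnDelete(obj)
	}
}

// printStore is a toy store used only to demonstrate the fan-out.
type printStore struct {
	name string
	seen int // number of events this store has received
}

func (p *printStore) OnAdd(obj interface{}) {
	p.seen++
	fmt.Println(p.name, "add:", obj)
}

func (p *printStore) OnUpdate(_, newObj interface{}) {
	p.seen++
	fmt.Println(p.name, "update:", newObj)
}

func (p *printStore) OnDelete(obj interface{}) {
	p.seen++
	fmt.Println(p.name, "delete:", obj)
}

func main() {
	h := &multiHandler{stores: []ObserverStore{
		&printStore{name: "git"},
		&printStore{name: "monitor"},
	}}
	// Both stores see the event while the cluster is watched only once.
	h.OnAdd("pod/foo")
}
```

Registering a `multiHandler` as the single event handler would keep memory for the watch caches at roughly the current level, at the cost of coupling the two stores' lifecycles.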

flags := cmd.Flags()
flags.StringVar(&store, "store", store, "Store to use for resource watch. Currently supported values are git or monitor.")
flags.StringVar(&artifactDir, "artifact-dir", artifactDir, "The directory to write test reports to.")
}

This might be worth a conversation after scrum when David is there, but I've gotten confused about where we go with the monitor intervals.

Does this PR shut off the normal monitoring of pod/node resources while openshift-tests runs tests, in favor of only monitoring them from an observer? My concern is that we can't assume observers are used globally, can we? In that case we'd lose the old intervals whenever the observer wasn't running.

The one thing I want to very explicitly avoid is both processes catching the same events and then trying to deduplicate them when they probably have slightly different timestamps.

Getting them from the observer seems better because we have more timeline available to us. How do we balance rolling that out while keeping other jobs that don't use an observer still getting their events and avoiding duplication? Or do we try to ensure that everyone uses the observer?

func (s *monitorStorage) OnAdd(obj interface{}) {
objUnstructured, ok := obj.(*unstructured.Unstructured)
if !ok {
klog.Warningf("Object is not unstructured: %v", obj)

Looks like you want a return here.

if err != nil {
klog.Warningf("Decoding %s failed with error: %v", objUnstructured.GetName(), err)
}
monitor.NodeAddFunc(nodeObj)

This will get called when we fail to decode, looks like we should return here and for all examples like this below.

if err != nil {
return err
}
configStore, err = storage.NewMonitorStorage(artifactDir, eventsClient)

Would it make more sense to launch the startClusterOperatorMonitoring funcs instead of an actual separate store? This is a very loose question, just something to think about. Are there benefits to a separate store?

Then again, maybe the startX funcs die off, depending on the other question posed here about duplicated intervals.


openshift-ci bot commented Nov 4, 2022

@xueqzhan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 3665393 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-agnostic-ovn-cmd | 3665393 | link | false | /test e2e-agnostic-ovn-cmd |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 3665393 | link | false | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-metal-ipi-sdn | 3665393 | link | false | /test e2e-metal-ipi-sdn |
| ci/prow/e2e-aws-csi | 3665393 | link | false | /test e2e-aws-csi |
| ci/prow/e2e-gcp-csi | 3665393 | link | false | /test e2e-gcp-csi |
| ci/prow/e2e-aws-ovn-single-node-serial | 3665393 | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-aws-ovn-image-registry | 3665393 | link | true | /test e2e-aws-ovn-image-registry |
| ci/prow/e2e-gcp-ovn-image-ecosystem | 3665393 | link | true | /test e2e-gcp-ovn-image-ecosystem |
| ci/prow/e2e-gcp-ovn-builds | 3665393 | link | true | /test e2e-gcp-ovn-builds |


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2022
@openshift-merge-robot

@xueqzhan: PR needs rebase.


@xueqzhan xueqzhan closed this Nov 9, 2022