-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add monitor store to resource watch #27492
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: xueqzhan The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8031e1b
to
83def6c
Compare
83def6c
to
c2a9c4d
Compare
/retest-required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was challenging for me to review but I'm starting to get a handle on it. Would be good to do some discussion sometime this week and that might help me clarify more. I was struggling with the dual entrypoints, and whether or not we could simplify the process for adding a new resource type to monitor in the future, as right now it requires several changes in several places. Hopefully this will become more clear with some discussion, if it's even feasible at all.
@@ -16,9 +16,19 @@ import ( | |||
"k8s.io/client-go/tools/cache" | |||
) | |||
|
|||
func startEventMonitoring(ctx context.Context, m Recorder, client kubernetes.Interface) { | |||
reMatchFirstQuote := regexp.MustCompile(`"([^"]+)"( in (\d+(\.\d+)?(s|ms)$))?`) | |||
var reMatchFirstQuote = regexp.MustCompile(`"([^"]+)"( in (\d+(\.\d+)?(s|ms)$))?`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you godoc this regex variable, that's a tough one for a reader to understand what it does and how/where it's used at a glance.
nodeInformer := informercorev1.NewNodeInformer(client, time.Hour, nil) | ||
nodeInformer.AddEventHandler( | ||
cache.ResourceEventHandlerFuncs{ | ||
AddFunc: func(obj interface{}) {}, | ||
AddFunc: NodeAddFunc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeAddFunc remains empty, was there a need to do anything there that got missed?
@@ -21,7 +25,35 @@ import ( | |||
"github.com/openshift/origin/pkg/monitor/resourcewatch/storage" | |||
) | |||
|
|||
var ( | |||
monitorStoreStr = "monitor" | |||
gitStoreStr = "git" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With regards to running the watch twice in the observer, this does have memory implications, we mirror every watched resource in memory, and doing so twice means we double that. I don't know how significant it is though with the size of clusters we're dealing with, but it could be worth mentioning. Would it be non-trivial to compose the two stores into one, so we only listwatch once and dispatch to both?
flags := cmd.Flags() | ||
flags.StringVar(&store, "store", store, "Store to use for resource watch. Currently supported values are git or monitor.") | ||
flags.StringVar(&artifactDir, "artifact-dir", artifactDir, "The directory to write test reports to.") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be worth a convo after scrum and when David is there, but I've gotten confused about where we go with the monitor intervals.
Does this PR shut off the normal monitoring of pod/node resources while openshift-tests runs tests, in favor of only monitoring these from an observer? My concern is, we can't assume observers are used globally can we? In which case we'd lose the old intervals if the observer wasn't running.
The one thing I want to very explicitly avoid is both processes catching the same events, and then trying to deduplicate them when they probably have slightly different timestamps.
Getting them from the observer seems better because we have more timeline available to us. How can we balance rolling that out, but keep other jobs not using an observer still getting their events, avoid duplication, or do we try to ensure that everyone uses the observer?
func (s *monitorStorage) OnAdd(obj interface{}) { | ||
objUnstructured, ok := obj.(*unstructured.Unstructured) | ||
if !ok { | ||
klog.Warningf("Object is not unstructured: %v", obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you want a return here.
if err != nil { | ||
klog.Warningf("Decoding %s failed with error: %v", objUnstructured.GetName(), err) | ||
} | ||
monitor.NodeAddFunc(nodeObj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will get called when we fail to decode, looks like we should return here and for all examples like this below.
if err != nil { | ||
return err | ||
} | ||
configStore, err = storage.NewMonitorStorage(artifactDir, eventsClient) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make any more sense to launch the startClusterOperatorMonitoring funcs, instead of an actual separate store? This is a very loose question just something to think about. Are there benefits to separate store?
Then again, maybe the startX funcs die off depending on the other question posed here about duplicated intervals.
@xueqzhan: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@xueqzhan: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Main changes with this PR:
TRT-469