Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add performance metrics for initial sync and netpol #3450

Merged
merged 2 commits into from Mar 6, 2023

Conversation

npinaeva
Copy link
Member

@npinaeva npinaeva commented Feb 28, 2023

Add MetricMasterSyncDuration metric to track how much it takes for
ovn-k master to start watching every resource.

Add scale metrics for network policies, rename existing
enable-eip-scale-metrics flag to more general enable-scale-metrics, and
use it for network policy metric too.

Add an option to enable scale metrics in kind

For anyone who would like to test this, ovnkube_master_sync_duration_seconds is enabled by default. To check scale metrics use newly added kind.sh flag --scale-metrics
To fetch master metrics curl <ovn-k master leader ip>:9409/metrics

@coveralls
Copy link

Coverage Status

Coverage: 51.197% (-0.2%) from 51.386% when pulling 750d039 on npinaeva:netpol-perf-metrics into 396d8ee on ovn-org:master.

@npinaeva
Copy link
Member Author

npinaeva commented Mar 1, 2023

/retest

EnableConfigDuration bool `gcfg:"enable-config-duration"`
EnableEIPScaleMetrics bool `gcfg:"enable-eip-scale-metrics"`
EnableConfigDuration bool `gcfg:"enable-config-duration"`
EnableScaleMetrics bool `gcfg:"enable-scale-metrics"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flavio had added the previous EIP metrics flag:

    Add flags to explicitly enable the histogram metrics, since we only see value
    in having them when scale testing egress ips. The flag introduced here is:
        --metrics-enable-eip-scale
    
    Signed-off-by: Flavio Fernandes <flaviof@redhat.com>

@flavio-fernandes are you ok with changing this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trozet yes. we can change it w/out any problems.

@@ -368,35 +368,39 @@ func (oc *DefaultNetworkController) Run(ctx context.Context) error {

// Sync external gateway routes. External gateway may be set in namespaces
// or via pods. So execute an individual sync method at startup
oc.cleanExGwECMPRoutes()
WithWatchDurationMetricNoError("external gateway router", oc.cleanExGwECMPRoutes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this something we are interested in? I guess it doesn't hurt but this isnt really a watcher, it just cleans up stale stuff (sync) function. Also it should be called "external gateway routes", and maybe add the word "cleanup"?

Copy link
Member Author

@npinaeva npinaeva Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I added this to make sure we see where the summarized time comes from, about the name - the metric is actually called sync_duration_seconds - probably not the best name, but at least it can be applied to watchers and syncs like this one.

ovn-k master to start watching every resource.
Add scale metrics for network policies, rename existing
enable-eip-scale-metrics flag to more general enable-scale-metrics, and
use it for network policy metric too.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Copy link
Contributor

@flavio-fernandes flavio-fernandes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@npinaeva
Copy link
Member Author

npinaeva commented Mar 6, 2023

@trozet we have an lgtm, do you have any more comments?

@trozet trozet merged commit 4780c08 into ovn-org:master Mar 6, 2023
@npinaeva npinaeva deleted the netpol-perf-metrics branch March 6, 2023 15:30
@martinkennelly
Copy link
Member

@npinaeva can you create a new PR updating the metrics doc with these new metrics?

@npinaeva
Copy link
Member Author

@martinkennelly sure, I just wanted to ask if we mention scale metric in that doc? I see egress ip metrics are not there, so should I only add sync_duration_seconds?

@martinkennelly
Copy link
Member

@npinaeva yeah, i think thats ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants