-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workaround for sampling=1 bugs #132
Workaround for sampling=1 bugs #132
Conversation
cc @memodi |
By default, prevent setting sampling=1 for IPFIX (automatically raised to 2 in that case) If flag "forceAllowSamplingAll" is set to true, the tule above is ignored and user can still set it to 1 at their own risks See bugs: - https://bugzilla.redhat.com/show_bug.cgi?id=2103136 - https://bugzilla.redhat.com/show_bug.cgi?id=2104943 Change ipfix cache defaults: - timeout decreased to 20s - max-flows increased to 400
d522f00
to
39b6d23
Compare
My main concern is that we are including a behavior that could be unnoticed by many users that do not completely read the documentation. Also, if the issue gets fixed and we remove this automatic fix in a future version, many users could see an unwanted/unexpected change in the behavior after they update (with implications e.g. in their internal network traffic or storage). If we can't expect that the causing issue is fixed soon, I'd do: if sampling == 1, instead of silently set it to 2, return error and inform that sampling must be 2 or forceAllowSamplingAll property must be explicitly set to true at their own risk. |
good point, yeah, probably better to return an error |
5130818
to
9f64b66
Compare
It allows to inform the user of an issue as soon as they create or update the custom resource Field forceSampleAll allows to override the sampling definition
9f64b66
to
4812d0f
Compare
/retest |
format: int32 | ||
minimum: 0 | ||
minimum: 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no issues on eBPF
side so I would not force this minimum here.
Moreover, I would prefer a realistic minimum based on perf tests for each agent more than a workaround here. Thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sampling for ebpf and ipfix are two different configs, so no problem on that side.
I don't think we want to introduce minimums in general, ideally, users should be able to set whatever they want and be careful about what they do. But the issue we found is critical enough (like, put a cluster down) and happened quite consistently with ipfix' sampling=1, so that's why we take extra measures to prevent it to happen in that specific situation. The stability really isn't the same between sampling=1 and sampling=2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry I though it was a common field. I agree the user should understand what he does and not able to put the cluster down.
What about adding some numbered recommendations in the descriptions based on performance results as a followup task ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's something we need to document anyway. Captured here: https://issues.redhat.com/browse/NETOBSERV-493
I am going to merge this as I don't see moves on OVS side at the moment. We will rollback if it ends up being unnecessary. I'll also start releasing v0.1.4 candidate after that. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jotak The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
fix manually location db and update automatic workflow
By default, prevent setting sampling=1 for IPFIX
(automatically raisedto 2 in that case)
Flag "forceSampleAll" can be use to force sampling=1 (used at their own risks)
See bugs:
Change ipfix cache defaults: