NETOBSERV-1099: remove reporter option #311
Opening as draft for now; it may need some discussion.
Also, as this is kind of a new feature, I suggest we hold it for 1.2 and merge it only for 1.3.
Codecov Report
@@ Coverage Diff @@
## main #311 +/- ##
==========================================
+ Coverage 57.15% 57.37% +0.21%
==========================================
Files 163 166 +3
Lines 7543 7706 +163
Branches 921 909 -12
==========================================
+ Hits 4311 4421 +110
- Misses 2967 3019 +52
- Partials 265 266 +1
becae03 to 3a88bc4
@jotak: This pull request references NETOBSERV-1099 which is a valid jira issue.
I revisited this old PR; I guess we could take a stab at it.

Making 'merge' the new default reporter is a bit tricky, since it's only for the table view. We'd need to deal with two defaults and be able to switch to the default when the user changes tabs UNLESS they explicitly chose Source or Destination from the Overview/Topology tab, in which case we probably want to keep that selection when moving to the Table view ... 😵💫

Or, more simply, keep reporter-for-table and reporter-for-metrics as two distinct states, so the selection wouldn't be applied when moving from Topology/Overview to Table or vice versa.

If someone has a better suggestion, speak :-)

What is missing to make 'merged' work with metrics? From a user's POV, I feel 'merged' should be the only way. If you are interested in a particular source or destination, just apply a filter.
I'm afraid this just won't be possible with the data model that we have: segregating flows by source and dest IP is necessary for merging (like done here). Metrics are aggregated, e.g. per namespace, and this isn't a sufficient level of detail to dedup. To take an example, imagine these flows:
Here, flows 1 and 2 are the same and will be deduped, by taking only the "ingress" ones (first to come). Flow 3 has a different dest IP so it will be kept as well. So F1 and F3 remain. With metrics, such as aggregated by namespace, you'd get:
Applying the same logic would keep only M1, which actually doesn't integrate F3 values, so the metrics would be incorrect. => You cannot just pick one and forget the other. The Ingress metric would miss some flows that are only in Egress, and vice versa.
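The example tables from this comment are missing here, so the following is only a minimal sketch of the argument with made-up data. The `Flow` shape, IPs, namespaces and byte counts are all hypothetical; the only things taken from the comment are that F1 and F2 are the same flow seen by both reporters (ingress first), and that F3 has a different destination IP and is seen by one reporter only (assumed here to be egress).

```typescript
// Hypothetical flow shape for illustration; not the plugin's actual Record type.
type Direction = 'INGRESS' | 'EGRESS';
interface Flow {
  id: string;
  srcAddr: string;
  dstAddr: string;
  srcNs: string;
  dstNs: string;
  direction: Direction;
  bytes: number;
}

// Made-up data: F1/F2 are the same flow reported twice, F3 is only reported as EGRESS.
const flows: Flow[] = [
  { id: 'F1', srcAddr: '10.0.0.1', dstAddr: '10.0.1.1', srcNs: 'ns-a', dstNs: 'ns-b', direction: 'INGRESS', bytes: 100 },
  { id: 'F2', srcAddr: '10.0.0.1', dstAddr: '10.0.1.1', srcNs: 'ns-a', dstNs: 'ns-b', direction: 'EGRESS', bytes: 100 },
  { id: 'F3', srcAddr: '10.0.0.1', dstAddr: '10.0.1.2', srcNs: 'ns-a', dstNs: 'ns-b', direction: 'EGRESS', bytes: 50 },
];

// Flow-level dedup: for each src+dst IP pair, keep only the first-seen direction.
function dedupFlows(input: Flow[]): Flow[] {
  const chosen = new Map<string, Direction>();
  for (const f of input) {
    const key = `${f.srcAddr}+${f.dstAddr}`;
    if (!chosen.has(key)) {
      chosen.set(key, f.direction);
    }
  }
  return input.filter(f => chosen.get(`${f.srcAddr}+${f.dstAddr}`) === f.direction);
}

// Namespace-level aggregation, as a metrics query would return it: IPs are gone.
function aggregateByNamespace(input: Flow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const f of input) {
    const key = `${f.srcNs}>${f.dstNs}|${f.direction}`;
    totals.set(key, (totals.get(key) || 0) + f.bytes);
  }
  return totals;
}

// With IPs available, F1 and F3 remain.
const keptIds = dedupFlows(flows).map(f => f.id);
const correctTotal = dedupFlows(flows).reduce((sum, f) => sum + f.bytes, 0);

// Without IPs there are only two aggregates left to choose from.
const metrics = aggregateByNamespace(flows);
const m1Ingress = metrics.get('ns-a>ns-b|INGRESS') || 0; // contains F1 only
const m2Egress = metrics.get('ns-a>ns-b|EGRESS') || 0; // contains F2 + F3
```

In this sketch the flow-level total is 150 bytes, while keeping only the ingress aggregate gives 100: F3 is lost. The egress aggregate happens to be right here only because the made-up data has no ingress-only flow; add one and neither aggregate alone is correct.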
I agree, except perhaps we'd also want to be able to see BOTH (unmerged)
@jpinsonneau I don't see an easy way to solve that. Dedup done from the agents or FLP is limited because they don't have the full-nodes vision. Dedup done from the UI / Loki queries, as in this PR, works only for raw flows, not for aggregated metrics. Perhaps that could be done from an intermediate job, able to have the full-nodes vision AND to rewrite flows by editing the "dedup" label. But that would increase the overall complexity of netobserv, not to mention performance, so I'm not sure it's worth it.
(thinking out loud)
The downside is, the more assumptions we make, the more "vulnerable" we are to CNI-specific behaviours.
I think this is going well: I ended up entirely removing "Reporter", trading it for a simpler "show duplicates: yes/no" approach. In the table view, this option allows hiding all duplicate traffic; two algorithms come into play:
In the topology/metrics views, this option is not available (flows are always deduped, so as not to mess up the rate counters). However, the dedup solution differs, since the algorithm used in the table cannot be used here: it requires a higher level of detail (with IPs) that aggregate queries don't provide. There are still two algorithms in play:
Users who want a specific reporter / flow direction can still use the appropriate filter.
pkg/model/filters/filters.go
// Merging is done by running a first query with FlowDirection=EGRESS and another with FlowDirection=INGRESS AND SrcOwnerName is empty.
// (Note that we use SrcOwnerName both as an optimization as it's a Loki index,
// and as convenience because looking for empty fields won't work if they aren't indexed)
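As a rough illustration of the two-query merge described in this comment, here is a minimal sketch that builds two LogQL stream selectors. Everything in it is an assumption made for illustration: the `LabelMatcher` type, the helper names, the `app` label, and the literal `EGRESS`/`INGRESS` values (the real label names, the way FlowDirection is encoded, and the query builder in filters.go may all differ).

```typescript
// Hypothetical label matcher; names and values used below are illustrative only.
interface LabelMatcher {
  label: string;
  value: string;
}

// Render matchers as a LogQL stream selector, e.g. {a="1",b=""}.
function toSelector(matchers: LabelMatcher[]): string {
  const parts = matchers.map(m => `${m.label}="${m.value}"`);
  return `{${parts.join(',')}}`;
}

// Build the two selectors whose results are then combined:
//  1. everything reported as EGRESS
//  2. INGRESS flows whose SrcOwnerName is empty
function mergedReporterSelectors(base: LabelMatcher[]): string[] {
  const egress = [...base, { label: 'FlowDirection', value: 'EGRESS' }];
  const ingressNoOwner = [
    ...base,
    { label: 'FlowDirection', value: 'INGRESS' },
    { label: 'SrcOwnerName', value: '' },
  ];
  return [toSelector(egress), toSelector(ingressNoOwner)];
}

// Assumed base matcher, present only so the selectors look realistic.
const selectors = mergedReporterSelectors([{ label: 'app', value: 'netobserv-flowcollector' }]);
```

Matching on an empty value only works as a stream selector because the label is indexed, which is the convenience the comment refers to; an unindexed field would need a different kind of filter.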
@jpinsonneau I'm not totally sure if this would work with conversation or if we need to skip - or do something else - in this case
hmm never mind - topology & metrics always use raw flows, not conversations, and this code is only called from topology endpoint ... so that's fine I guess
The metrics queries can be based on conversations when available.
As both the FlowDirection and SrcOwnerName fields are available in conversations, the approach is fine.
However:
- FlowDirection is not reinterpreted in conversations
- SrcOwnerName can actually be flipped by the swapAB option depending on TCP flags
so .. this brings us back to swapAB needing to switch direction.. I'll open a separate bug about that
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=5aad792 make set-plugin-image
"Every flow can be reported from the source node and/or the destination node. For in-cluster traffic, usually both source and destination nodes report flows, resulting in duplicated data. Cluster ingress traffic is only reported by destination nodes, and cluster egress by source nodes.": "Every flow can be reported from the source node and/or the destination node. For in-cluster traffic, usually both source and destination nodes report flows, resulting in duplicated data. Cluster ingress traffic is only reported by destination nodes, and cluster egress by source nodes.",
"Reporter node": "Reporter node",
"Only available in Table view.": "Only available in Table view.",
"A flow might be reported from several interfaces, and from both the source and the destination nodes, making it appear several times. By default, duplicates are hidden. Showing duplicates is not possible in Overview and Topology tabs.": "A flow might be reported from several interfaces, and from both the source and the destination nodes, making it appear several times. By default, duplicates are hidden. Showing duplicates is not possible in Overview and Topology tabs.",
"A flow might be reported from several interfaces, and from both the source and the destination nodes, making it appear several times. By default, duplicates are hidden. Showing duplicates is not possible in Overview and Topology tabs.": "A flow might be reported from several interfaces, and from both the source and the destination nodes, making it appear several times. By default, duplicates are hidden. Showing duplicates is not possible in Overview and Topology tabs.",
"A flow might be reported from several interfaces, and from both source and destination nodes, making it appear several times. By default, duplicates are hidden. Showing duplicates is not possible in Overview and Topology tabs.": "A flow might be reported from several interfaces, and from both source and destination nodes, making it appear several times. By default, duplicates are hidden. Showing duplicates is not possible in Overview and Topology tabs.",
I'd appreciate that :D
New changes are detected. LGTM label has been removed.
/ok-to-test
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=d6b7cbf make set-plugin-image
@jotak - In terms of testing this, I made sure that by default only "Duplicate: false" flows show in the table view, that with the checkbox ticked both kinds show, and that the checkbox is disabled on both the Overview and Topology pages. Let me know if there are any other scenarios to test here, otherwise consider this

Thanks @memodi

EDIT: to be more accurate: the scenario that this is fixing is described in NETOBSERV-697 ("A possible solution is to get all flows and deduplicate. Ideally, information on what nodes the flows have traversed should be preserved. If done right, the Reporter node option can be removed."); but for testing you can run the steps described in NETOBSERV-696 :-)

EDIT 2: Also, as I wrote on Slack, you saw another limitation of the unmerged reporters when looking at traffic to services. Merged reporters solve that, i.e. you should see traffic from/to services without having to switch between ingress/egress.

The average metrics seem to be different between main (dark theme, on the left) and your PR (light theme, on the right). The tables look good using reporter both / showing duplicates.
export const mergeFlowReporters = (flows: Record[]): Record[] => {
  // The purpose of this function is to determine if, for a given [srcip, dstip] couple, we'll look at the INGRESS or EGRESS reporter.
  // The assumption is that INGRESS alone, or EGRESS alone, or both of them, always provide complete visibility, so we can just pick one of the two.
  // The logic is then to index flows by src+dest IPs, then for each indexed set, keep only the first-found reporter.
  const keyFunc = (r: Record) => r.fields.SrcAddr + '+' + r.fields.DstAddr;
  const grouped = _.groupBy(flows, keyFunc);
  const filtersIndex = _.mapValues(grouped, (recs: Record[]) => recs[0].labels.FlowDirection);
  return flows.filter((r: Record) => r.labels.FlowDirection === filtersIndex[keyFunc(r)]);
};
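To make the "first-found reporter wins" behaviour concrete, here is a small self-contained sketch. It is not the PR's code: `FlowRecord` is a simplified stand-in for the plugin's `Record` type, `mergeFlowReportersSketch` is a lodash-free equivalent written for this example, and the `'INGRESS'`/`'EGRESS'` strings are placeholders for whatever values the FlowDirection label really carries.

```typescript
// Simplified stand-in for the plugin's Record type (assumption: only the fields used here).
interface FlowRecord {
  fields: { SrcAddr: string; DstAddr: string; Bytes: number };
  labels: { FlowDirection: string };
}

// Lodash-free equivalent of the per-key "first reporter wins" logic.
function mergeFlowReportersSketch(flows: FlowRecord[]): FlowRecord[] {
  const keyOf = (r: FlowRecord) => r.fields.SrcAddr + '+' + r.fields.DstAddr;
  // Remember, for each src+dst key, the direction of the first record seen.
  const firstDirection = new Map<string, string>();
  for (const r of flows) {
    const k = keyOf(r);
    if (!firstDirection.has(k)) {
      firstDirection.set(k, r.labels.FlowDirection);
    }
  }
  // Keep every record whose direction matches the one chosen for its key.
  return flows.filter(r => r.labels.FlowDirection === firstDirection.get(keyOf(r)));
}

// Small helper to keep the sample data readable.
const rec = (src: string, dst: string, dir: string, bytes: number): FlowRecord => ({
  fields: { SrcAddr: src, DstAddr: dst, Bytes: bytes },
  labels: { FlowDirection: dir },
});

// Made-up records: the first key is seen from both reporters, the second from one only.
const sample: FlowRecord[] = [
  rec('10.0.0.1', '10.0.1.1', 'EGRESS', 10),
  rec('10.0.0.1', '10.0.1.1', 'INGRESS', 10),
  rec('10.0.0.1', '10.0.1.1', 'EGRESS', 20),
  rec('10.0.0.2', '10.0.1.1', 'INGRESS', 5),
];

const merged = mergeFlowReportersSketch(sample);
const mergedBytes = merged.map(r => r.fields.Bytes);
```

Two things show up in this sample: all records of the chosen direction are kept for a key (both EGRESS records of the first pair survive), and which direction is chosen depends purely on the order in which records arrive.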
Can you open a ticket and I will investigate post-merge? There might be some discrepancies related to having sampling, but I need to confirm
- Add a new reporter option to merge reported flows (in fact, just filter out one of the reporters per src/dest key)
- Add test

Replace "reporter" setting with "show duplicates"
- In table view, instead of choosing a reporter (or both / merged) the user can show duplicates ( = previously reporter=both) or hide them ( = the new "merged" reporter + filtering with Duplicate=false)
- In metrics, showing duplicates is not allowed
- If the user wants so, it's still possible to filter with FlowDirection using the regular filters
- In metrics, the dedup algorithm differs from the table view: it's a specific query to get EGRESS + cluster-external INGRESS

Split API calls for topology and drops
As they result in quite different data model and processing, explicit distinct routes allows to better separate concerns and avoid frequent casts.

Issues with merge reporter for drops
Fetch Ingress in priority, seems to work better, however there might be more to do as the assumption done for merged reporters doesn't seem relevant for drops

Improve input validation
Check more query parameters, especially those that end up injected such as rate interval

Fix: no dedup for drops

Add validation tests

ack, thanks @jotak, I will try these
(outdated capture)
New capture: