conntrack: Add update reports for long connections #287
Codecov Report
@@            Coverage Diff             @@
##             main     #287      +/-   ##
==========================================
+ Coverage   67.39%   67.90%   +0.51%
==========================================
  Files          73       75       +2
  Lines        4278     4381     +103
==========================================
+ Hits         2883     2975      +92
- Misses       1212     1219       +7
- Partials      183      187       +4
docs/api.md
Outdated
prefix: prefix added to each metric name
expiryTime: seconds of no-flow to wait before deleting prometheus data item
tls: TLS configuration for the prometheus endpoint
enable: set to true to enable tls for the prometheus endpoint
Why was this removed? It shouldn't be related to this PR.
That field was changed from struct to struct ptr and the api-to-doc script doesn't handle such types at the moment.
I've submitted a different PR to solve this: #288
That PR is merged so I rebased this one.
@@ -0,0 +1,135 @@
/*
@ronensc this feels like a generic helper utility; can you move it to utils?
Done
eranra
left a comment
@ronensc can you update the README file? It currently still describes the old implementation.
	KeysFrom(fl, ct.config.KeyDefinition).
	Aggregators(ct.aggregators).
	Build()
conn.setNextUpdateReportTime(ct.clock.Now().Add(ct.config.UpdateConnectionInterval.Duration))
Do you want to add this as another build step instead of explicitly calling the function?
Done
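For readers following this thread, here is a minimal sketch of what "another build step" could look like. It is not the code from the PR: `connBuilder`, `Hash` and `NextUpdateReportTime` are hypothetical names, and the real builder has other steps (such as `KeysFrom` and `Aggregators`) that are omitted here.

```go
package main

import (
	"fmt"
	"time"
)

// connection is a simplified stand-in for the tracked connection type.
type connection struct {
	hash                 uint64
	nextUpdateReportTime time.Time
}

// connBuilder is a hypothetical builder; names are assumptions for illustration.
type connBuilder struct {
	conn *connection
}

func newConnBuilder() *connBuilder {
	return &connBuilder{conn: &connection{}}
}

// Hash stands in for the existing build steps of the real builder.
func (b *connBuilder) Hash(h uint64) *connBuilder {
	b.conn.hash = h
	return b
}

// NextUpdateReportTime is the extra build step: it replaces a separate
// setter call made on the connection after Build().
func (b *connBuilder) NextUpdateReportTime(t time.Time) *connBuilder {
	b.conn.nextUpdateReportTime = t
	return b
}

// Build returns the fully initialized connection.
func (b *connBuilder) Build() *connection {
	return b.conn
}

// buildExample builds a connection whose first update report is due
// 30 seconds after an arbitrary fixed "now".
func buildExample() (*connection, time.Time) {
	now := time.Unix(0, 0) // arbitrary fixed time, stands in for clock.Now()
	interval := 30 * time.Second
	conn := newConnBuilder().
		Hash(42).
		NextUpdateReportTime(now.Add(interval)).
		Build()
	return conn, now
}

func main() {
	conn, now := buildExample()
	fmt.Println(conn.nextUpdateReportTime.Sub(now))
}
```

With the step inside the chain, a connection cannot be built and then used without its report time being set by a forgotten follow-up call.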
This makes the expiry condition more understandable
60a5f27 to f72140c
if _, found := mom.GetRecord(key); !found {
	return fmt.Errorf("can't MoveToBack non-existing key %x (order id %q)", key, orderID)
}
rw := mom.m[key]
Shouldn't GetRecord() already give the element?
GetRecord() indeed returns the element. But here we're interested in the element's wrapper.
I'll remove the call to GetRecord() to make it clearer.
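To illustrate the record/wrapper distinction discussed here, the following is a rough sketch of a multi-ordered map built on `container/list`. It is an assumption about the general shape, not the PR's implementation: the type and field names (`recordWrapper`, `elems`, `frontToBack`) are hypothetical, and records are plain strings for brevity.

```go
package main

import (
	"container/list"
	"fmt"
)

type key uint64
type orderID string

// recordWrapper pairs a stored record with its list element in each order.
type recordWrapper struct {
	key    key
	record string
	elems  map[orderID]*list.Element
}

// multiOrderedMap keeps every record in a map and in one list per order.
type multiOrderedMap struct {
	m      map[key]*recordWrapper
	orders map[orderID]*list.List
}

func newMultiOrderedMap(ids ...orderID) *multiOrderedMap {
	mom := &multiOrderedMap{m: map[key]*recordWrapper{}, orders: map[orderID]*list.List{}}
	for _, id := range ids {
		mom.orders[id] = list.New()
	}
	return mom
}

// AddRecord stores the record and appends it to the back of every order.
func (mom *multiOrderedMap) AddRecord(k key, record string) error {
	if _, found := mom.m[k]; found {
		return fmt.Errorf("key %x already exists", k)
	}
	rw := &recordWrapper{key: k, record: record, elems: map[orderID]*list.Element{}}
	for id, l := range mom.orders {
		rw.elems[id] = l.PushBack(rw)
	}
	mom.m[k] = rw
	return nil
}

// MoveToBack needs the wrapper (for its list element), so it reads the
// internal map directly with a single lookup that also checks existence.
func (mom *multiOrderedMap) MoveToBack(k key, id orderID) error {
	rw, found := mom.m[k]
	if !found {
		return fmt.Errorf("can't MoveToBack non-existing key %x (order id %q)", k, id)
	}
	l, ok := mom.orders[id]
	if !ok {
		return fmt.Errorf("unknown order id %q", id)
	}
	l.MoveToBack(rw.elems[id])
	return nil
}

// frontToBack returns the keys of one order, from front to back.
func (mom *multiOrderedMap) frontToBack(id orderID) []key {
	l, ok := mom.orders[id]
	if !ok {
		return nil
	}
	var keys []key
	for e := l.Front(); e != nil; e = e.Next() {
		keys = append(keys, e.Value.(*recordWrapper).key)
	}
	return keys
}

func main() {
	mom := newMultiOrderedMap("expiry", "update")
	for _, k := range []key{1, 2, 3} {
		if err := mom.AddRecord(k, "record"); err != nil {
			panic(err)
		}
	}
	if err := mom.MoveToBack(1, "update"); err != nil {
		panic(err)
	}
	fmt.Println(mom.frontToBack("expiry"), mom.frontToBack("update"))
}
```

In this sketch, moving key 1 to the back of the "update" order leaves the "expiry" order untouched, which is the property that lets a store keep connections sorted by two independent criteria.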
	assertLengthConsistency(t, mom)
}

func TestMultiOrderedMap_IterateFrontToBackIterateFrontToBack(t *testing.T) {
Is the name IterateFrontToBack doubled intentionally or is it a copy-paste error?
It's an error. Good catch!
jotak
left a comment
LGTM - I guess it's a step toward the option to entirely remove raw flowlogs from storage in the future.
At some point I'd like to measure the performance impact of using conntracking:
- on FLP (I guess more memory used, more compute)
- on Loki if we don't store raw flowlogs, we save a lot of space
@jotak In theory, this is correct.
This PR adds a new type of output record to the connection tracking module: update reports for long connections.
There's a new setting in the conntrack config, `UpdateConnectionInterval`, to set the amount of time to wait between update reports.

This PR also extends the connectionStore data structure from an ordered map to a multi-ordered map. The actual implementation of the data structure was extracted to a new file, which simplifies `connectionStore`. This extension was needed to optimize the search for connections that require an update report: rather than going through all the connections and inspecting their `nextUpdateReport` field, we keep them ordered by this field (in addition to the order by expiry time) and go through only those that require an update report.

There is still one issue with the suggested implementation:
The scan for connections that need an update report is done for every incoming batch of flow logs. This could be a problem when the incoming batch frequency is too high. A possible solution would be to scan only once in a while. We can set the scan interval to be 50% of `UpdateConnectionInterval`.
Note: The same problem exists with end connections.
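The "scan once in a while" idea could be sketched roughly as follows. This is only an illustration of the proposal, not code from the PR: `scanThrottle` and `shouldScan` are hypothetical names, and the time is passed in explicitly instead of being read from a clock.

```go
package main

import (
	"fmt"
	"time"
)

// scanThrottle is a hypothetical helper that limits how often the store
// is scanned for connections that need an update report.
type scanThrottle struct {
	interval time.Duration
	nextScan time.Time
}

// newScanThrottle sets the scan interval to 50% of the update interval,
// as proposed in the description.
func newScanThrottle(updateConnectionInterval time.Duration) *scanThrottle {
	return &scanThrottle{interval: updateConnectionInterval / 2}
}

// shouldScan reports whether a scan is due at time now and, if it is,
// schedules the next one.
func (s *scanThrottle) shouldScan(now time.Time) bool {
	if now.Before(s.nextScan) {
		return false
	}
	s.nextScan = now.Add(s.interval)
	return true
}

// runDemo simulates four incoming batches at 0s, 1s, 5s and 6s with an
// update interval of 10s (so scans are at least 5s apart).
func runDemo() []bool {
	throttle := newScanThrottle(10 * time.Second)
	start := time.Unix(0, 0) // arbitrary fixed start time
	var scanned []bool
	for _, offset := range []time.Duration{0, time.Second, 5 * time.Second, 6 * time.Second} {
		scanned = append(scanned, throttle.shouldScan(start.Add(offset)))
	}
	return scanned
}

func main() {
	fmt.Println(runDemo())
}
```

The trade-off in this sketch is that an update report may be emitted later than its due time by up to one scan interval.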
Additional refactors and minor changes:
- Instead of storing the `lastUpdate` of a connection, we store the `expiryTime` (which is `lastUpdate + ConnectionTimeout`). Changing the word "update" reduces confusion with the new feature of update reports. In my opinion, this also makes the expiry condition more understandable.
- `connectionStore` was extracted to a file of its own.
- `require.Equal()` was replaced with `require.JSONEq()` in `pkg/config/pipeline_builder_test.go`
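As a small illustration of the `expiryTime` refactor, the sketch below shows how the expiry check reads once the deadline itself is stored. The method names (`refresh`, `isExpired`) and the exact comparison are assumptions, not taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// connection is a simplified stand-in for the tracked connection type.
type connection struct {
	expiryTime time.Time // stored instead of a lastUpdate timestamp
}

// refresh is called whenever a flow log of the connection arrives.
func (c *connection) refresh(now time.Time, connectionTimeout time.Duration) {
	c.expiryTime = now.Add(connectionTimeout)
}

// isExpired replaces a check of the form
// now.Sub(lastUpdate) > connectionTimeout with a direct comparison.
func (c *connection) isExpired(now time.Time) bool {
	return now.After(c.expiryTime)
}

// demoExpiry checks a connection 5s and 15s after its last flow log,
// using a 10s timeout.
func demoExpiry() (early, late bool) {
	start := time.Unix(0, 0) // arbitrary fixed start time
	c := &connection{}
	c.refresh(start, 10*time.Second)
	return c.isExpired(start.Add(5 * time.Second)), c.isExpired(start.Add(15 * time.Second))
}

func main() {
	early, late := demoExpiry()
	fmt.Println(early, late)
}
```

Storing the deadline also gives a value that connections can be ordered by directly, which fits the ordering by expiry time mentioned in the description.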