Conversation

@ronensc (Contributor) commented Aug 16, 2022

This PR adds a new type of output records to the connection tracking module: update report for long connections.

A new setting in the conntrack config, UpdateConnectionInterval, controls how long to wait between update reports.

This PR also extends the connectionStore data structure from an ordered map to a multi-ordered map. The actual implementation of the data structure was extracted to a new file, which simplifies connectionStore. This extension was needed to optimize the search for connections that require an update report: rather than going through all the connections and inspecting their nextUpdateReport field, we keep them ordered by this field (in addition to the order by expiry time) and go through only those that require an update report.
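The actual multi-ordered map lives in pkg/pipeline/utils/multiorderedmap.go; the following is a simplified sketch (illustrative names and a pared-down API, not the module's exact code) of the core idea: one backing map plus one linked list per order, so a record can be repositioned along one dimension without touching the others:

```go
package main

import (
	"container/list"
	"fmt"
)

type OrderID string

// record wraps a stored value together with its position in every order.
type record struct {
	value    interface{}
	elements map[OrderID]*list.Element // position of this record in each order's list
}

// MultiOrderedMap keeps each record reachable by key and ordered
// independently along several dimensions (e.g. expiry time and
// next-update-report time).
type MultiOrderedMap struct {
	m      map[string]*record
	orders map[OrderID]*list.List
}

func NewMultiOrderedMap(orderIDs ...OrderID) *MultiOrderedMap {
	mom := &MultiOrderedMap{
		m:      map[string]*record{},
		orders: map[OrderID]*list.List{},
	}
	for _, id := range orderIDs {
		mom.orders[id] = list.New()
	}
	return mom
}

func (mom *MultiOrderedMap) AddRecord(key string, value interface{}) error {
	if _, found := mom.m[key]; found {
		return fmt.Errorf("key %q already exists", key)
	}
	rec := &record{value: value, elements: map[OrderID]*list.Element{}}
	for id, l := range mom.orders {
		rec.elements[id] = l.PushBack(key) // newest records go to the back
	}
	mom.m[key] = rec
	return nil
}

// MoveToBack refreshes a record's position in a single order,
// leaving the other orders untouched.
func (mom *MultiOrderedMap) MoveToBack(key string, orderID OrderID) error {
	rec, found := mom.m[key]
	if !found {
		return fmt.Errorf("can't MoveToBack non-existing key %q", key)
	}
	mom.orders[orderID].MoveToBack(rec.elements[orderID])
	return nil
}

// Front returns the key at the front of the given order (the oldest one).
func (mom *MultiOrderedMap) Front(orderID OrderID) (string, bool) {
	e := mom.orders[orderID].Front()
	if e == nil {
		return "", false
	}
	return e.Value.(string), true
}

func main() {
	mom := NewMultiOrderedMap("expiry", "nextUpdateReport")
	mom.AddRecord("connA", 1)
	mom.AddRecord("connB", 2)
	// connA was just seen: refresh its expiry order only.
	mom.MoveToBack("connA", "expiry")
	k, _ := mom.Front("expiry")
	fmt.Println(k) // connB is now the first expiry candidate
	k, _ = mom.Front("nextUpdateReport")
	fmt.Println(k) // the update-report order is unchanged: connA is still first
}
```

A scan along one dimension then only walks its own list from the front and stops at the first record whose deadline is still in the future.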

There is still one issue with the suggested implementation:

The scan for connections that need an update report is done for every incoming batch of flow logs. This could be a problem when the incoming batch frequency is high. A possible solution would be to scan only periodically; for example, the scan interval could be set to 50% of UpdateConnectionInterval.
Note: the same problem exists with end connections.

Additional refactors and minor changes:

  1. Instead of storing a connection's lastUpdate, we store its expiryTime (which is lastUpdate + ConnectionTimeout). Avoiding the word "update" reduces confusion with the new update-report feature and, in my opinion, makes the expiry condition easier to understand.
  2. connectionStore was extracted to a file of its own.
  3. require.Equal() was replaced with require.JSONEq() in pkg/config/pipeline_builder_test.go

@codecov-commenter commented Aug 21, 2022

Codecov Report

Merging #287 (f72140c) into main (009a086) will increase coverage by 0.51%.
The diff coverage is 88.62%.

@@            Coverage Diff             @@
##             main     #287      +/-   ##
==========================================
+ Coverage   67.39%   67.90%   +0.51%     
==========================================
  Files          73       75       +2     
  Lines        4278     4381     +103     
==========================================
+ Hits         2883     2975      +92     
- Misses       1212     1219       +7     
- Partials      183      187       +4     
Flag Coverage Δ
unittests 67.90% <88.62%> (+0.51%) ⬆️


Impacted Files Coverage Δ
pkg/pipeline/extract/conntrack/store.go 61.22% <61.22%> (ø)
pkg/api/conntrack.go 87.00% <100.00%> (+0.13%) ⬆️
pkg/pipeline/extract/conntrack/conn.go 89.28% <100.00%> (+1.28%) ⬆️
pkg/pipeline/extract/conntrack/conntrack.go 92.00% <100.00%> (+2.97%) ⬆️
pkg/pipeline/utils/multiorderedmap.go 100.00% <100.00%> (ø)
pkg/pipeline/ingest/ingest_collector.go 49.62% <0.00%> (+1.50%) ⬆️


@ronensc ronensc marked this pull request as ready for review August 21, 2022 07:27
docs/api.md Outdated
prefix: prefix added to each metric name
expiryTime: seconds of no-flow to wait before deleting prometheus data item
tls: TLS configuration for the prometheus endpoint
enable: set to true to enable tls for the prometheus endpoint
Collaborator

Why is this removed? It doesn't seem connected to this PR, does it?

Contributor Author

That field was changed from struct to struct ptr and the api-to-doc script doesn't handle such types at the moment.
I've submitted a different PR to solve this: #288

Contributor Author

That PR is merged so I rebased this one.

@@ -0,0 +1,135 @@
/*
Collaborator

@ronensc this feels like a generic helper utility, can you move it to utils?

Contributor Author

Done

Collaborator

@eranra eranra left a comment


@ronensc can you update the README file? It currently still describes the old implementation.

KeysFrom(fl, ct.config.KeyDefinition).
Aggregators(ct.aggregators).
Build()
conn.setNextUpdateReportTime(ct.clock.Now().Add(ct.config.UpdateConnectionInterval.Duration))
Collaborator

Do you want to add this as another build step instead of explicitly calling the function?

Contributor Author

Done

@ronensc ronensc mentioned this pull request Aug 23, 2022
@ronensc ronensc force-pushed the conntrack-update-records branch from 60a5f27 to f72140c on August 25, 2022 09:05
if _, found := mom.GetRecord(key); !found {
return fmt.Errorf("can't MoveToBack non-existing key %x (order id %q)", key, orderID)
}
rw := mom.m[key]
Collaborator

Shouldn't GetRecord() already give the element?

Contributor Author

GetRecord() indeed returns the element. But here we're interested in the element's wrapper.
I'll remove the call to GetRecord() to make this clearer.

assertLengthConsistency(t, mom)
}

func TestMultiOrderedMap_IterateFrontToBackIterateFrontToBack(t *testing.T) {
Collaborator

Is the name IterateFrontToBack doubled intentionally or is it a copy-paste error?

Contributor Author

It's an error. Good catch!

Member

@jotak jotak left a comment


LGTM - I guess it's a step toward the option to entirely remove raw flowlogs from storage in the future.

At some point I'd like to measure the performance impact of using conntracking:

  • on FLP (I guess more memory used, more compute)
  • on Loki if we don't store raw flowlogs, we save a lot of space

@ronensc (Contributor Author) commented Sep 7, 2022

  • on Loki if we don't store raw flowlogs, we save a lot of space

@jotak In theory, this is correct.
But it depends on the characteristics of the connections in the cluster and on the output-record settings of the connection tracking module.
For example, if most connections are short (starting and ending in the same flow log) and connection tracking is set to output both new-connection and end-connection records,
then we'll end up with 2 records in Loki for each connection, rather than 1 with the current settings without conntrack.

@ronensc ronensc merged commit fd10c83 into netobserv:main Sep 7, 2022
@ronensc ronensc deleted the conntrack-update-records branch September 7, 2022 07:59