
NETOBSERV-1268: handle concurrency issues between kernel and userspace #172

Merged: 1 commit into netobserv:main, Sep 8, 2023

Conversation

@msherif1234 (Contributor) commented Aug 29, 2023

Using a global hmap, we widened the window in which userspace and the kernel can collide. We can't use bpf_spin_lock() because tracepoints don't allow it, so we have to revert back to using a per-CPU map :( (a sketch of that direction follows the testing notes below)

Testing

  • verified accuracy is now improved with sampling of 1
  • verified PktDrop still works
  • verified DNS tracking
  • measured the resource impact of going back to the per-CPU hmap

Signed-off-by: msherif1234 mmahmoud@redhat.com
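
For illustration, a minimal sketch of the per-CPU direction, assuming libbpf BTF-style map definitions; flow_id and flow_metrics are the repo's existing types, and the max_entries value here is made up:

        // Sketch only: a per-CPU hash gives each CPU its own value slot, so
        // tracepoint hooks can update flow counters without bpf_spin_lock(),
        // which the verifier rejects in tracepoint programs.
        #include <vmlinux.h>
        #include <bpf/bpf_helpers.h>

        struct {
            __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
            __type(key, flow_id);          // repo's flow key type
            __type(value, flow_metrics);   // repo's per-flow counters
            __uint(max_entries, 1 << 17);  // illustrative size only
        } aggregated_flows SEC(".maps");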

@openshift-ci-robot (Collaborator) commented Aug 29, 2023

@msherif1234: This pull request references NETOBSERV-1268 which is a valid jira issue.

In response to this:

With the global hmap, we widened the window in which userspace and the kernel can collide. To avoid that, we create a copy of the map entry, update it, and then write it back instead of updating in place.
Signed-off-by: msherif1234 mmahmoud@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 29, 2023
@github-actions (bot)

New image:
quay.io/netobserv/netobserv-ebpf-agent:f162e4b

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=f162e4b make set-agent-image

@codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage: 60.71% and project coverage change: +0.64% 🎉

Comparison is base (6d9d2e7) 38.60% compared to head (4b956f4) 39.24%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #172      +/-   ##
==========================================
+ Coverage   38.60%   39.24%   +0.64%     
==========================================
  Files          31       31              
  Lines        2259     2301      +42     
==========================================
+ Hits          872      903      +31     
- Misses       1338     1346       +8     
- Partials       49       52       +3     
Flag Coverage Δ
unittests 39.24% <60.71%> (+0.64%) ⬆️

Flags with carried forward coverage won't be shown.

Files Changed Coverage Δ
pkg/agent/agent.go 37.84% <ø> (ø)
pkg/ebpf/tracer.go 0.00% <0.00%> (ø)
pkg/flow/record.go 60.65% <57.14%> (-2.99%) ⬇️
pkg/flow/tracer_map.go 78.57% <73.33%> (-1.43%) ⬇️
pkg/flow/account.go 83.33% <100.00%> (+0.64%) ⬆️
pkg/test/tracer_fake.go 68.96% <100.00%> (ø)

... and 1 file with indirect coverage changes


@jotak (Member) commented Aug 30, 2023

@msherif1234 now that I see this concurrency issue, I have serious doubts about having removed the per-CPU maps. You assumed concurrency issues between kernel and userspace, but isn't it actually concurrency between the different CPUs that could occur here? Especially as we know that the same 5-tuple / flow_id can be processed by several CPUs. Isn't that something the previous design, with per-CPU maps, was actually solving? (all maps being dedicated to a core, and the merge being done safely in userspace)

@jotak (Member) commented Aug 30, 2023

I think this confirms my doubts; the problem isn't solved. I still get low byte counters on the chunks, except for the last one, which is more consistent:
[screenshot: flow table showing undercounted chunks]
(this is for a 300 MB download; you can see only 60 MB is captured)

This totally makes sense if there is a concurrency issue between cores:

  • During downloads, several cores update the map concurrently, each overwriting what the others capture.
  • When all cores but one have finished processing, the concurrency issue disappears, so everything is correctly reported; that's why the last chunk is always bigger. (A sketch of that lost-update pattern follows below.)
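
To make the suspected race concrete, here is a hedged sketch, reusing the aggregated_flows / flow_metrics names that appear in this PR's diff (the real update logic differs), of the copy-update-writeback pattern on a single shared hash map:

        #include <vmlinux.h>
        #include <bpf/bpf_helpers.h>

        // Two CPUs handling packets of the same flow_id can interleave
        // between the lookup and the write-back, so one increment is lost,
        // which would explain the undercounted chunks above.
        static inline void update_flow(flow_id *id, u64 len) {
            flow_metrics *m = bpf_map_lookup_elem(&aggregated_flows, id);
            if (m == NULL) {
                return;
            }
            flow_metrics copy = *m;                   // read
            copy.bytes += len;                        // modify
            copy.packets += 1;
            copy.end_mono_time_ts = bpf_ktime_get_ns();
            bpf_map_update_elem(&aggregated_flows, id, &copy, BPF_EXIST); // write back
        }

With bpf_spin_lock() unavailable in tracepoint programs, there is no way to make that lookup/write-back sequence atomic, hence the per-CPU fallback.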

@jotak (Member) commented Aug 30, 2023

To test:

  • Install netobserv/FlowCollector with sampling=1
  • oc get pods -n netobserv and choose a FLP pod (or any pod that has curl)
  • run oc exec -it <pod that has curl> -- curl https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/30/Cloud/x86_64/images/Fedora-Cloud-Base-Vagrant-30-1.2.x86_64.vagrant-libvirt.box --output /tmp/test (this is a random 300MB image)
  • open console at https://<your cluster>/netflow-traffic?timeRange=300&limit=50&match=all&showDup=false&function=last&type=bytes&packetLoss=all&recordType=flowLog&filters=src_owner_name%3D%22%22%3Bdst_namespace%3D%22netobserv%22&bnf=false

You should see the flows similar to my screenshot above.
If you have all in one flow (no chunk), try with a bigger image, or maybe try --rate-limit with curl.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 30, 2023
@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 30, 2023
@github-actions (bot)

New image:
quay.io/netobserv/netobserv-ebpf-agent:b32ef06

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=b32ef06 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 31, 2023
@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 31, 2023
@github-actions (bot)

New image:
quay.io/netobserv/netobserv-ebpf-agent:8f969c1

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=8f969c1 make set-agent-image

@jotak (Member) commented Aug 31, 2023

fyi I forgot to mention in my test steps above: Install netobserv/FlowCollector with sampling=1 (above comment updated)

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 31, 2023
@openshift-ci-robot (Collaborator) commented Aug 31, 2023

@msherif1234: This pull request references NETOBSERV-1268 which is a valid jira issue.

In response to this:

Using a global hmap, we widened the window in which userspace and the kernel can collide. We can't use bpf_spin_lock() because tracepoints don't allow it, so we have to revert back to using a per-CPU map :(
Signed-off-by: msherif1234 mmahmoud@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 31, 2023
@github-actions (bot)

New image:
quay.io/netobserv/netobserv-ebpf-agent:79e1ce2

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=79e1ce2 make set-agent-image

@msherif1234 (Contributor, Author)

cc @tohojo: as discussed offline, we have no way to ensure concurrency safety with the global hmap now that it's shared with the tracepoint hooks, so we have to fall back to a per-CPU hmap. (A sketch of the userspace merge that fallback implies follows below.)
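
For context on what that fallback means userspace-side: a lookup on a per-CPU hash returns one value per possible CPU, which then has to be merged. The agent actually does this in Go; below is a hedged libbpf C equivalent, where merge_flow, the field names, and the map layout are assumed for illustration only:

        #include <string.h>
        #include <bpf/bpf.h>
        #include <bpf/libbpf.h>

        // Hypothetical helper: sum the per-CPU flow_metrics slots for one
        // flow_id (assumes flow_metrics is padded to an 8-byte multiple,
        // as per-CPU map values are).
        static int merge_flow(int map_fd, flow_id *id, flow_metrics *total) {
            int ncpus = libbpf_num_possible_cpus();
            flow_metrics values[ncpus];
            if (bpf_map_lookup_elem(map_fd, id, values) != 0) {
                return -1;
            }
            memset(total, 0, sizeof(*total));
            for (int i = 0; i < ncpus; i++) {
                total->bytes += values[i].bytes;     // merge byte counters
                total->packets += values[i].packets; // merge packet counters
            }
            return 0;
        }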

@jpinsonneau (Contributor) commented Sep 4, 2023

I'm also seeing issues on byte counts. Steps to reproduce:

  • set sampling to 1
  • do a curl between two httpd pods using:
sh-4.4$ curl --compressed -so /dev/null 10.129.0.36:8080 -w '%{size_download}'
4927 // this is the expected size
  • check the number of bytes in the flow table

Using this PR I get 140, whereas without it I had 5485.
Edit: verified on 1.3 & main, and it was working fine there.

// eBPF hashmap values are not zeroed when the entry is removed. That causes that we
// might receive entries from previous collect-eviction timeslots.
// We need to check the flow time and discard old flows.
if mt.StartMonoTimeTs <= m.lastEvictionNs || mt.EndMonoTimeTs <= m.lastEvictionNs {
Review comment (Contributor):

Suggested change
if mt.StartMonoTimeTs <= m.lastEvictionNs || mt.EndMonoTimeTs <= m.lastEvictionNs {
if mt.EndMonoTimeTs <= m.lastEvictionNs {

Can you please explain why we need to compare both Start and End times here?

It seems both the curl on the httpd page & the ubuntu image download work fine when removing the compare on start 😸

Review comment (Contributor):

[two screenshots: flow tables showing byte counts after removing the start-time compare]

Reply (Contributor, Author):

Can you please explain why we need to compare both Start and End times here?

It seems both the curl on the httpd page & the ubuntu image download work fine when removing the compare on start 😸

That code was just a revert, and it's what 1.3 has, where you don't see the issue, correct? I would think comparing start > eviction time is sufficient; checking endTime is more of an extra safety, since we can have an old TS from a previous collection.

Reply (Member):

@msherif1234 could it be due to this code block being removed (as said here: #172 (comment))?

        // it might happen that start_mono_time hasn't been set due to
        // the way percpu hashmap deal with concurrent map entries
        if (aggregate_flow->start_mono_time_ts == 0) {
            aggregate_flow->start_mono_time_ts = current_time;
        }

It seems like sometimes start_mono_time_ts is 0, which would indeed make the condition mt.StartMonoTimeTs <= m.lastEvictionNs be true ...

Reply (Contributor, Author):

Yeah, that is proof that the same flow won't always hit the same CPU core, so we had to add this trick. Unfortunately the revert didn't bring that code back, and I forgot about it; thanks for catching it. I have now tested both large and small curls and they are pretty accurate.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 5, 2023
@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 5, 2023
@github-actions bot commented Sep 5, 2023

New image:
quay.io/netobserv/netobserv-ebpf-agent:8c253eb

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=8c253eb make set-agent-image

@@ -253,12 +253,13 @@ static inline long pkt_drop_lookup_and_update_flow(struct sk_buff *skb, flow_id
enum skb_drop_reason reason) {
flow_metrics *aggregate_flow = bpf_map_lookup_elem(&aggregated_flows, id);
if (aggregate_flow != NULL) {
aggregate_flow->end_mono_time_ts = bpf_ktime_get_ns();
Review comment (Member):

Previously, there was some specific handling of empty start time here:

        // it might happen that start_mono_time hasn't been set due to
        // the way percpu hashmap deal with concurrent map entries
        if (aggregate_flow->start_mono_time_ts == 0) {
            aggregate_flow->start_mono_time_ts = current_time;
        }

cf also c696bb0

But I admit I don't really understand this old patch... I don't see in which cases start_mono_time_ts could not be set.

Reply (Contributor, Author):

@joel this is the drop handling function

Reply (Contributor, Author):

But you are right, that part is missing in flows.c; the revert didn't bring it back. (Restored guard sketched below.)
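
For reference, a sketch of the guard as it would be restored in the drop handler too, with the same logic as the c696bb0 block quoted above (exact placement is illustrative):

        // Inside pkt_drop_lookup_and_update_flow(), after a successful lookup:
        u64 current_time = bpf_ktime_get_ns();
        aggregate_flow->end_mono_time_ts = current_time;
        // with a per-CPU hash, this CPU's slot may never have been stamped,
        // so treat a zero start time as "not started yet"
        if (aggregate_flow->start_mono_time_ts == 0) {
            aggregate_flow->start_mono_time_ts = current_time;
        }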

-	ebpfTracer.AppendLookupResults(map[ebpf.BpfFlowId]*ebpf.BpfFlowMetrics{
-		key1: &key1Metrics,
+	ebpfTracer.AppendLookupResults(map[ebpf.BpfFlowId][]ebpf.BpfFlowMetrics{
+		key1: key1Metrics,
Review comment (Member):

This isn't testing the dedup; we should restore this test as it was previously:

	key1Metrics := []ebpf.BpfFlowMetrics{
		{Packets: 3, Bytes: 44, StartMonoTimeTs: now + 1000, EndMonoTimeTs: now + 1_000_000_000},
		{Packets: 1, Bytes: 22, StartMonoTimeTs: now, EndMonoTimeTs: now + 3000},
	}
	key2Metrics := []ebpf.BpfFlowMetrics{
		{Packets: 7, Bytes: 33, StartMonoTimeTs: now, EndMonoTimeTs: now + 2_000_000_000},
	}

	ebpfTracer.AppendLookupResults(map[ebpf.BpfFlowId][]ebpf.BpfFlowMetrics{
		key1:     key1Metrics,
		key1Dupe: key1Metrics,
		key2:     key2Metrics,
	})

.. and adapt the expectations in the related tests (dedup test expects 2 results, dedup-just-mark test expects 3 results with one flagged duplicate, no-dedup expects 3 results)

Reply (Contributor, Author):

done

…netobserv#118)"

This reverts commit b6e2b87.

fix

Signed-off-by: msherif1234 <mmahmoud@redhat.com>
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 6, 2023
@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 6, 2023
@github-actions bot commented Sep 6, 2023

New image:
quay.io/netobserv/netobserv-ebpf-agent:399209a

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=399209a make set-agent-image

@jotak (Member) commented Sep 6, 2023

Not tested; code-wise this looks good to me. Now we must check that @jpinsonneau's test is satisfied.

@msherif1234 (Contributor, Author)

I did a number of small curls back-to-back and they are consistent:
[screenshot: flow table with consistent byte counts]

@jpinsonneau (Contributor)

Yeah, I confirm both the small curl and 2 GB download scenarios work fine 👍
Thanks guys!

@jpinsonneau (Contributor)

/lgtm

@msherif1234 (Contributor, Author)

@memodi @Amoghrd can you please sanity-check this PR and make sure the new features still work? @dushyantbehl can you please check RTT functionality with this PR?

@Amoghrd commented Sep 7, 2023

Sanity checked PacketDrop with this PR and everything looks good; works as expected!

@jotak (Member) commented Sep 8, 2023

thanks @msherif1234 @Amoghrd @jpinsonneau !
/approve

@openshift-ci bot commented Sep 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Sep 8, 2023
@openshift-merge-robot openshift-merge-robot merged commit faf274e into netobserv:main Sep 8, 2023
10 checks passed
Labels
approved jira/valid-reference lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.