
NETOBSERV-1268: handle concurrency issues between kernel and userspace #172

Merged: 1 commit into netobserv:main, Sep 8, 2023

Conversation

@msherif1234 (Contributor) commented Aug 29, 2023

Using a global hmap, we widened the window in which userspace and the kernel can collide. We can't use bpf_spin_lock() because tracepoints don't allow it, so we have to revert back to using a per-CPU map :( (a sketch of that direction follows the testing notes below)

Testing

  • verified accuracy is now improved with sampling of 1
  • verified PktDrop still works
  • verified DNS tracking
  • measured the resource impact of going back to the per-CPU hmap

Signed-off-by: msherif1234 mmahmoud@redhat.com
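
For illustration, a minimal sketch of the per-CPU direction, assuming libbpf BTF-style map definitions; flow_id and flow_metrics are the repo's existing types, and the max_entries value here is made up:

        // Sketch only: a per-CPU hash gives each CPU its own value slot, so
        // tracepoint hooks can update flow counters without bpf_spin_lock(),
        // which the verifier rejects in tracepoint programs.
        #include <vmlinux.h>
        #include <bpf/bpf_helpers.h>

        struct {
            __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
            __type(key, flow_id);          // repo's flow key type
            __type(value, flow_metrics);   // repo's per-flow counters
            __uint(max_entries, 1 << 17);  // illustrative size only
        } aggregated_flows SEC(".maps");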

@openshift-ci-robot (Collaborator) commented Aug 29, 2023

@msherif1234: This pull request references NETOBSERV-1268 which is a valid jira issue.

In response to this:

With the global hmap, we widened the window in which userspace and the kernel can collide. To avoid that, we create a copy of the map entry, update it, and then write it back instead of updating in place.
Signed-off-by: msherif1234 mmahmoud@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 29, 2023
@github-actions (bot)

New image:
quay.io/netobserv/netobserv-ebpf-agent:f162e4b

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=f162e4b make set-agent-image

@codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage: 60.71% and project coverage change: +0.64% 🎉

Comparison is base (6d9d2e7) 38.60% compared to head (4b956f4) 39.24%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #172      +/-   ##
==========================================
+ Coverage   38.60%   39.24%   +0.64%     
==========================================
  Files          31       31              
  Lines        2259     2301      +42     
==========================================
+ Hits          872      903      +31     
- Misses       1338     1346       +8     
- Partials       49       52       +3     
Flag Coverage Δ
unittests 39.24% <60.71%> (+0.64%) ⬆️

Flags with carried forward coverage won't be shown.

Files Changed Coverage Δ
pkg/agent/agent.go 37.84% <ø> (ø)
pkg/ebpf/tracer.go 0.00% <0.00%> (ø)
pkg/flow/record.go 60.65% <57.14%> (-2.99%) ⬇️
pkg/flow/tracer_map.go 78.57% <73.33%> (-1.43%) ⬇️
pkg/flow/account.go 83.33% <100.00%> (+0.64%) ⬆️
pkg/test/tracer_fake.go 68.96% <100.00%> (ø)

... and 1 file with indirect coverage changes


@jotak (Member) commented Aug 30, 2023

@msherif1234 now that I see this concurrency issue, I have serious doubts about having removed the per-CPU maps. You assumed concurrency issues between kernel and userspace, but isn't it actually concurrency between the different CPUs that could occur here? Especially as we know that the same 5-tuple / flow_id can be processed by several CPUs. Isn't that something the previous design, with per-CPU maps, was actually solving? (all maps being dedicated to a core, and the merge being done safely in userspace)

@jotak (Member) commented Aug 30, 2023

I think this confirms my doubts; the problem isn't solved. I still get low byte counters on the chunks, except for the last one, which is more consistent:
[screenshot: flow table showing undercounted chunks]
(this is for a 300 MB download; you can see only 60 MB is captured)

This totally makes sense if there is a concurrency issue between cores:

  • During downloads, several cores update the map concurrently, each overwriting what the others capture.
  • When all cores but one have finished processing, the concurrency issue disappears, so everything is correctly reported; that's why the last chunk is always bigger. (A sketch of that lost-update pattern follows below.)
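
To make the suspected race concrete, here is a hedged sketch, reusing the aggregated_flows / flow_metrics names that appear in this PR's diff (the real update logic differs), of the copy-update-writeback pattern on a single shared hash map:

        #include <vmlinux.h>
        #include <bpf/bpf_helpers.h>

        // Two CPUs handling packets of the same flow_id can interleave
        // between the lookup and the write-back, so one increment is lost,
        // which would explain the undercounted chunks above.
        static inline void update_flow(flow_id *id, u64 len) {
            flow_metrics *m = bpf_map_lookup_elem(&aggregated_flows, id);
            if (m == NULL) {
                return;
            }
            flow_metrics copy = *m;                   // read
            copy.bytes += len;                        // modify
            copy.packets += 1;
            copy.end_mono_time_ts = bpf_ktime_get_ns();
            bpf_map_update_elem(&aggregated_flows, id, &copy, BPF_EXIST); // write back
        }

With bpf_spin_lock() unavailable in tracepoint programs, there is no way to make that lookup/write-back sequence atomic, hence the per-CPU fallback.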

@jotak (Member) commented Aug 30, 2023

To test:

  • Install netobserv/FlowCollector with sampling=1
  • oc get pods -n netobserv and choose a FLP pod (or any pod that has curl)
  • run oc exec -it <pod that has curl> -- curl https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/30/Cloud/x86_64/images/Fedora-Cloud-Base-Vagrant-30-1.2.x86_64.vagrant-libvirt.box --output /tmp/test (this is a random 300MB image)
  • open console at https://<your cluster>/netflow-traffic?timeRange=300&limit=50&match=all&showDup=false&function=last&type=bytes&packetLoss=all&recordType=flowLog&filters=src_owner_name%3D%22%22%3Bdst_namespace%3D%22netobserv%22&bnf=false

You should see the flows similar to my screenshot above.
If you have all in one flow (no chunk), try with a bigger image, or maybe try --rate-limit with curl.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 30, 2023
@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 30, 2023
@github-actions (bot)

New image:
quay.io/netobserv/netobserv-ebpf-agent:b32ef06

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=b32ef06 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 31, 2023
@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 31, 2023
@github-actions (bot)

New image:
quay.io/netobserv/netobserv-ebpf-agent:8f969c1

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=8f969c1 make set-agent-image

@jotak (Member) commented Aug 31, 2023

fyi I forgot to mention in my test steps above: Install netobserv/FlowCollector with sampling=1 (above comment updated)

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 31, 2023
@openshift-ci-robot (Collaborator) commented Aug 31, 2023

@msherif1234: This pull request references NETOBSERV-1268 which is a valid jira issue.

In response to this:

Using a global hmap, we widened the window in which userspace and the kernel can collide. We can't use bpf_spin_lock() because tracepoints don't allow it, so we have to revert back to using a per-CPU map :(
Signed-off-by: msherif1234 mmahmoud@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 31, 2023
@github-actions (bot)

New image:
quay.io/netobserv/netobserv-ebpf-agent:79e1ce2

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=79e1ce2 make set-agent-image

@msherif1234 (Contributor, Author)

cc @tohojo: as discussed offline, we have no way to ensure concurrency safety with the global hmap now that it's shared with the tracepoint hooks, so we have to fall back to a per-CPU hmap. (A sketch of the userspace merge that fallback implies follows below.)
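
For context on what that fallback means userspace-side: a lookup on a per-CPU hash returns one value per possible CPU, which then has to be merged. The agent actually does this in Go; below is a hedged libbpf C equivalent, where merge_flow, the field names, and the map layout are assumed for illustration only:

        #include <string.h>
        #include <bpf/bpf.h>
        #include <bpf/libbpf.h>

        // Hypothetical helper: sum the per-CPU flow_metrics slots for one
        // flow_id (assumes flow_metrics is padded to an 8-byte multiple,
        // as per-CPU map values are).
        static int merge_flow(int map_fd, flow_id *id, flow_metrics *total) {
            int ncpus = libbpf_num_possible_cpus();
            flow_metrics values[ncpus];
            if (bpf_map_lookup_elem(map_fd, id, values) != 0) {
                return -1;
            }
            memset(total, 0, sizeof(*total));
            for (int i = 0; i < ncpus; i++) {
                total->bytes += values[i].bytes;     // merge byte counters
                total->packets += values[i].packets; // merge packet counters
            }
            return 0;
        }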

@jpinsonneau (Contributor) commented Sep 4, 2023

I'm also seeing issues on byte counts. Steps to reproduce:

  • set sampling to 1
  • do a curl between two httpd pods using:
sh-4.4$ curl --compressed -so /dev/null 10.129.0.36:8080 -w '%{size_download}'
4927 // this is the expected size
  • check the number of bytes in the flow table

Using this PR I get 140, whereas without it I had 5485.
Edit: verified on 1.3 & main, and it was working fine there.

// eBPF hashmap values are not zeroed when the entry is removed. That causes that we
// might receive entries from previous collect-eviction timeslots.
// We need to check the flow time and discard old flows.
if mt.StartMonoTimeTs <= m.lastEvictionNs || mt.EndMonoTimeTs <= m.lastEvictionNs {
Review comment (Contributor):

Suggested change
if mt.StartMonoTimeTs <= m.lastEvictionNs || mt.EndMonoTimeTs <= m.lastEvictionNs {
if mt.EndMonoTimeTs <= m.lastEvictionNs {

Can you please explain why we need to compare both Start and End times here?

It seems both the curl on the httpd page & the ubuntu image download work fine when removing the compare on start 😸

Review comment (Contributor):

[two screenshots: flow tables showing byte counts after removing the start-time compare]

Reply (Contributor, Author):

Can you please explain why we need to compare both Start and End times here?

It seems both the curl on the httpd page & the ubuntu image download work fine when removing the compare on start 😸

That code was just a revert, and it's what 1.3 has, where you don't see the issue, correct? I would think comparing start > eviction time is sufficient; checking endTime is more of an extra safety, since we can have an old TS from a previous collection.

Reply (Member):

@msherif1234 could it be due to this code block being removed (as said here: #172 (comment))?

        // it might happen that start_mono_time hasn't been set due to
        // the way percpu hashmap deal with concurrent map entries
        if (aggregate_flow->start_mono_time_ts == 0) {
            aggregate_flow->start_mono_time_ts = current_time;
        }

It seems like sometimes start_mono_time_ts is 0, which would indeed make the condition mt.StartMonoTimeTs <= m.lastEvictionNs be true ...

Reply (Contributor, Author):

Yeah, that is proof that the same flow won't always hit the same CPU core, so we had to add this trick. Unfortunately the revert didn't bring that code back, and I forgot about it; thanks for catching it. I have now tested both large and small curls and they are pretty accurate.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 5, 2023
@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 5, 2023
@github-actions bot commented Sep 5, 2023

New image:
quay.io/netobserv/netobserv-ebpf-agent:8c253eb

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=8c253eb make set-agent-image

@@ -253,12 +253,13 @@ static inline long pkt_drop_lookup_and_update_flow(struct sk_buff *skb, flow_id
enum skb_drop_reason reason) {
flow_metrics *aggregate_flow = bpf_map_lookup_elem(&aggregated_flows, id);
if (aggregate_flow != NULL) {
aggregate_flow->end_mono_time_ts = bpf_ktime_get_ns();
Review comment (Member):

Previously, there was some specific handling of empty start time here:

        // it might happen that start_mono_time hasn't been set due to
        // the way percpu hashmap deal with concurrent map entries
        if (aggregate_flow->start_mono_time_ts == 0) {
            aggregate_flow->start_mono_time_ts = current_time;
        }

cf also c696bb0

But I admit I don't really understand this old patch... I don't see in which cases start_mono_time_ts could not be set.

Reply (Contributor, Author):

@joel this is the drop handling function

Reply (Contributor, Author):

But you are right, that part is missing in flows.c; the revert didn't bring it back. (Restored guard sketched below.)
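
For reference, a sketch of the guard as it would be restored in the drop handler too, with the same logic as the c696bb0 block quoted above (exact placement is illustrative):

        // Inside pkt_drop_lookup_and_update_flow(), after a successful lookup:
        u64 current_time = bpf_ktime_get_ns();
        aggregate_flow->end_mono_time_ts = current_time;
        // with a per-CPU hash, this CPU's slot may never have been stamped,
        // so treat a zero start time as "not started yet"
        if (aggregate_flow->start_mono_time_ts == 0) {
            aggregate_flow->start_mono_time_ts = current_time;
        }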

-	ebpfTracer.AppendLookupResults(map[ebpf.BpfFlowId]*ebpf.BpfFlowMetrics{
-		key1: &key1Metrics,
+	ebpfTracer.AppendLookupResults(map[ebpf.BpfFlowId][]ebpf.BpfFlowMetrics{
+		key1: key1Metrics,
Review comment (Member):

This isn't testing the dedup; we should restore this test as it was previously:

	key1Metrics := []ebpf.BpfFlowMetrics{
		{Packets: 3, Bytes: 44, StartMonoTimeTs: now + 1000, EndMonoTimeTs: now + 1_000_000_000},
		{Packets: 1, Bytes: 22, StartMonoTimeTs: now, EndMonoTimeTs: now + 3000},
	}
	key2Metrics := []ebpf.BpfFlowMetrics{
		{Packets: 7, Bytes: 33, StartMonoTimeTs: now, EndMonoTimeTs: now + 2_000_000_000},
	}

	ebpfTracer.AppendLookupResults(map[ebpf.BpfFlowId][]ebpf.BpfFlowMetrics{
		key1:     key1Metrics,
		key1Dupe: key1Metrics,
		key2:     key2Metrics,
	})

.. and adapt the expectations in the related tests (dedup test expects 2 results, dedup-just-mark test expects 3 results with one flagged duplicate, no-dedup expects 3 results)

Reply (Contributor, Author):

done

…netobserv#118)"

This reverts commit b6e2b87.

fix

Signed-off-by: msherif1234 <mmahmoud@redhat.com>
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 6, 2023
@msherif1234 (Contributor, Author)

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 6, 2023
@github-actions bot commented Sep 6, 2023

New image:
quay.io/netobserv/netobserv-ebpf-agent:399209a

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=399209a make set-agent-image

@jotak (Member) commented Sep 6, 2023

Not tested; code-wise this looks good to me. Now we must check that @jpinsonneau's test is satisfied.

@msherif1234 (Contributor, Author)

I did a number of small curls back-to-back and they are consistent:
[screenshot: flow table with consistent byte counts]

@jpinsonneau (Contributor)

Yeah, I confirm both the small curl and 2 GB download scenarios work fine 👍
Thanks guys!

@jpinsonneau (Contributor)

/lgtm

@msherif1234 (Contributor, Author)

@memodi @Amoghrd can you please sanity-check this PR and make sure the new features still work? @dushyantbehl can you please check RTT functionality with this PR?

@Amoghrd commented Sep 7, 2023

Sanity checked PacketDrop with this PR and everything looks good; works as expected!

@jotak (Member) commented Sep 8, 2023

thanks @msherif1234 @Amoghrd @jpinsonneau !
/approve

@openshift-ci bot commented Sep 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Sep 8, 2023
@openshift-merge-robot openshift-merge-robot merged commit faf274e into netobserv:main Sep 8, 2023
10 checks passed
Labels
approved jira/valid-reference lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.