Fix division by zero crash when doing metering #529

amarsri28 · 2022-02-28T14:20:49Z

this one resolve #376 . we can test this patch with Metering stress test . Memory corruption(caused by stl container append) was happening as internal container address was referred and used.
commits were reverted to resolved issue (original code seems working fine)

we dont need workaround #385

This reverts commit 103b198.

This reverts commit d2c0793.

pudelkoM · 2022-02-28T22:31:28Z

This is just a revert of both the profile optimization (d2c0793) and the following workaround, right? Seems safe to me, the old code worked before.

Do we have a test case showing the crash right now, and that it's fixed after this change?

ccascone

I'm fine with the change. I'm temporarily leaving a -1 since we have a release happening this week and we want to avoid last-minute disruptions. We can merge next week.

amarsri28 · 2022-03-02T17:23:40Z

@ccascone I am not sure the purpose and process of your Release, but metering functionality is broken (ue, app level etc) so if the purpose is to give bug free release along with Qos-Metering feature(latest code set), can consider this patch for release testing.
Furthermore push it for production testing asap.
@pudelkoM use #528 for testing,
after 20 secs (time defined by #528 ) we can observe absurd metering with 13mpps in below figure where as 1 mpps is the correct rate which ll come for first 20secs using this test case(tested with latest cloned code).

amarsri28 · 2022-03-02T17:30:42Z

below is the correct metering rate ,tested by #528 , come for 20 secs as defined in testcase with latest cloned code

pudelkoM · 2022-03-02T19:56:36Z

From our slack discussion:

My understanding is as follows: Sai’s patch added an indirection on the meter profile (rates + more config) by coalescing identical meter configs and storing them in a hash map (params_map_). Before, each meter entry had its profile inline, now it’s a pointer to the shared instance. DPDK hash maps are not not stable on inserts:

As mentioned above, Cuckoo hash implementation pushes elements out of their bucket, if there is a new entry to be added which primary location coincides with their current bucket

https://doc.dpdk.org/guides/prog_guide/hash_lib.html

This means pointers to previously looked up data become invalidated on inserts and then point to random data (could be zero or just garbage). This causes the fluctuating rates, as the actual profile (with cir, pir, …) has been moved into a different bucket, but the pointer used in the data path (val[j]->p) was not updated.

Carmelo's workaround relied on the coincidence that that memory seemed to get cleared, hence showed 0 rates.

Abseil’s explanation of pointer stability in hash maps: https://abseil.io/docs/cpp/guides/container#fn:pointer-stability

…project#376

ccascone · 2022-03-02T20:19:00Z

@amarsri28 apologies, I originally misunderstood the issue and fix. After discussing with @pudelkoM (thanks for the explanation) we agree that this should be merged before the release.

BTW, I was talking about the Aether release. We are currently in the process of validating the Aether 2.0 release (due very soon) using automated system-level integration tests:
https://docs.aetherproject.org/master/testing/about_system_tests.html

Here the "system" is Aether, which includes many components other than the UPF and core. We have tests that verify the integration with the fabric (SD-Fabric), the management GUI/API (ROC), etc. We do this for both 4G and 5G core. So lots of moving pieces. Debugging of failed tests is mostly manual for now and it currently takes a lot of time. For this reason, as we approach the release, we tend to be careful in merging new changes to avoid introducing unnecessary disruption that might distract us from other existing bugs.

I hope this gives you some perspective on our processes.

amarsri28 · 2022-06-14T07:01:48Z

just stating facts as it caused confusion over dpdk hash table reliability to our team members.
Global reference to c++ container memory were taken and as container expand , it may reallocate memory etc as a part of reallocation process, which has caused issue.

amarsri28 added 2 commits February 28, 2022 05:31

Revert "Workaround for division by zero crash (omec-project#385)"

511ea57

This reverts commit 103b198.

Revert "QoS: Use params to profile map, carry pointer"

3b20dde

This reverts commit d2c0793.

amarsri28 mentioned this pull request Feb 28, 2022

Add PTF tests for MBR-based rate limiting #519

Merged

3 tasks

amarsri28 requested a review from ccascone February 28, 2022 16:33

ccascone changed the title ~~Bess metering division by zero crash #376~~ Fix division by zero crash when doing metering Feb 28, 2022

ccascone requested a review from pudelkoM February 28, 2022 22:12

ccascone requested changes Mar 2, 2022

View reviewed changes

pudelkoM approved these changes Mar 2, 2022

View reviewed changes

Merge branch 'master' into Bess-Metering-division-by-zero-crash-omec-…

fb7d971

…project#376

ccascone approved these changes Mar 2, 2022

View reviewed changes

ccascone merged commit 9cb736e into omec-project:master Mar 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix division by zero crash when doing metering #529

Fix division by zero crash when doing metering #529

amarsri28 commented Feb 28, 2022

pudelkoM commented Feb 28, 2022

ccascone left a comment

amarsri28 commented Mar 2, 2022 •

edited

Loading

amarsri28 commented Mar 2, 2022

pudelkoM commented Mar 2, 2022 •

edited

Loading

ccascone commented Mar 2, 2022 •

edited

Loading

amarsri28 commented Jun 14, 2022

Fix division by zero crash when doing metering #529

Fix division by zero crash when doing metering #529

Conversation

amarsri28 commented Feb 28, 2022

pudelkoM commented Feb 28, 2022

ccascone left a comment

Choose a reason for hiding this comment

amarsri28 commented Mar 2, 2022 • edited Loading

amarsri28 commented Mar 2, 2022

pudelkoM commented Mar 2, 2022 • edited Loading

ccascone commented Mar 2, 2022 • edited Loading

amarsri28 commented Jun 14, 2022

amarsri28 commented Mar 2, 2022 •

edited

Loading

pudelkoM commented Mar 2, 2022 •

edited

Loading

ccascone commented Mar 2, 2022 •

edited

Loading