Fix division by zero crash when doing metering #529
This is just a revert of both the profile optimization (d2c0793) and the following workaround, right? Seems safe to me, the old code worked before. Do we have a test case showing the crash right now, and that it's fixed after this change?
I'm fine with the change. I'm temporarily leaving a -1 since we have a release happening this week and we want to avoid last-minute disruptions. We can merge next week.
@ccascone I am not sure about the purpose and process of your release, but the metering functionality is broken (UE level, app level, etc.). If the goal is a bug-free release that includes the QoS metering feature (latest code), please consider this patch for release testing.
Below is the correct metering rate, tested with #528. Traffic runs for 20 seconds, as defined in the test case, using the latest cloned code.
From our slack discussion: My understanding is as follows: Sai’s patch added an indirection on the meter profile (rates + more config) by coalescing identical meter configs and storing them in a hash map (
https://doc.dpdk.org/guides/prog_guide/hash_lib.html This means pointers to previously looked up data become invalidated on inserts and then point to random data (could be zero or just garbage). This causes the fluctuating rates, as the actual profile (with cir, pir, …) has been moved into a different bucket, but the pointer used in the data path ( Carmelo's workaround relied on the coincidence that that memory seemed to get cleared, hence showed 0 rates. Abseil’s explanation of pointer stability in hash maps: https://abseil.io/docs/cpp/guides/container#fn:pointer-stability
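To make the pointer-stability point concrete, here is a minimal sketch. It is not the UPF code and does not use the DPDK hash API. `MeterProfile` and both function names are hypothetical, and a `std::vector` stands in for any container that stores values inline and may relocate them on insert. The first function only compares addresses (it never dereferences the stale one); the second shows that a node-based `std::unordered_map` keeps element addresses valid across inserts and rehashes.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a meter profile; the field names follow the
// cir/pir terms used in this discussion.
struct MeterProfile {
  std::uint64_t cir;
  std::uint64_t pir;
};

// Inline ("flat") storage: appending past capacity reallocates the buffer,
// so an address taken before the append no longer refers to the element.
bool FlatStorageMovesOnAppend() {
  std::vector<MeterProfile> flat;
  flat.push_back({1000, 2000});
  // Fill up to the current capacity so the next append must reallocate.
  while (flat.size() < flat.capacity()) flat.push_back({0, 0});
  // Record the address as an integer; the old pointer is never dereferenced.
  const auto before = reinterpret_cast<std::uintptr_t>(flat.data());
  flat.push_back({0, 0});  // size == capacity here, so this reallocates
  const auto after = reinterpret_cast<std::uintptr_t>(flat.data());
  return before != after;
}

// Node-based storage: std::unordered_map guarantees that pointers and
// references to elements stay valid across inserts and rehashes.
bool NodeStorageStaysPut() {
  std::unordered_map<std::uint32_t, MeterProfile> profiles;
  profiles.emplace(0u, MeterProfile{1000, 2000});
  const MeterProfile* cached = &profiles.at(0);
  for (std::uint32_t k = 1; k <= 4096; ++k) {
    profiles.emplace(k, MeterProfile{k, k});  // forces several rehashes
  }
  return cached == &profiles.at(0) && cached->cir == 1000 &&
         cached->pir == 2000;
}
```

If the data path has to hold on to a profile, one option under these assumptions is to cache the key (or an index) and look the profile up again after any insert, rather than caching a raw pointer into relocatable storage.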
@amarsri28 apologies, I originally misunderstood the issue and fix. After discussing with @pudelkoM (thanks for the explanation) we agree that this should be merged before the release. BTW, I was talking about the Aether release. We are currently in the process of validating the Aether 2.0 release (due very soon) using automated system-level integration tests. Here the "system" is Aether, which includes many components other than the UPF and core. We have tests that verify the integration with the fabric (SD-Fabric), the management GUI/API (ROC), etc. We do this for both 4G and 5G core. So lots of moving pieces. Debugging of failed tests is mostly manual for now and it currently takes a lot of time. For this reason, as we approach the release, we tend to be careful in merging new changes to avoid introducing unnecessary disruption that might distract us from other existing bugs. I hope this gives you some perspective on our processes.
Just stating the facts, as this caused confusion among our team members about the reliability of the DPDK hash table.
This one resolves #376. We can test this patch with the metering stress test. Memory corruption (caused by an STL container append) was happening because an internal container address was referenced and used.
The commits were reverted to resolve the issue (the original code seems to work fine).
We don't need the workaround #385.