Explore failing //lte/gateway/c/core/oai/test/mme_app_task:mme_procedures_test with --config=asan #11955

Closed
Tracked by #9714 ...
themarwhal opened this issue Mar 4, 2022 · 11 comments · Fixed by #11966 or #12141

@themarwhal
Member

themarwhal commented Mar 4, 2022

  • The test passes with --config=lsan and without any --config flag.
  • With --config=asan the test fails and logs the following:
TASK_MME_APP terminated
lte/gateway/c/core/oai/test/mme_app_task/test_mme_procedures.cpp:2435: Failure
Actual function call count doesn't match EXPECT_CALL(*s1ap_handler, s1ap_mme_handle_handover_command( check_params_in_mme_app_handover_command( mme_ue_s1ap_id, new_enb_id)))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
lte/gateway/c/core/oai/test/mme_app_task/test_mme_procedures.cpp:2424: Failure
Actual function call count doesn't match EXPECT_CALL(*s1ap_handler, s1ap_mme_handle_handover_request( check_params_in_mme_app_handover_request( mme_ue_s1ap_id, 0)))...
         Expected: to be called once
           Actual: never called - unsatisfied and active

@themarwhal
Member Author

Thanks @LKreutzer !

@themarwhal
Member Author

Actually, I still seem to see this test failing on master :o @LKreutzer are you seeing this too? (I can't tell whether it's just flaky, though.)

@LKreutzer LKreutzer reopened this Mar 8, 2022
@LKreutzer
Contributor

@themarwhal sorry about the confusion! Yes, the PR #11966 fixes only one asan error in the mme_procedures_test, but there are others that remain, which we have not been able to fix so far.

@themarwhal
Member Author

got it! Thanks ;D

@LKreutzer
Contributor

The remaining errors are in the TestFailedPagingForPendingBearers and TestS1HandoverSuccess tests.

It seems that for both tests the errors originate from a race condition on the EXPECT_CALL macros. In each test the two EXPECT_CALL macros seem to be running in parallel threads. We found that often only one of the MATCHER_P2 macros is called. The behaviour seems to be undefined.

From the Google Mock documentation: "Important note: Google Mock requires expectations to be set before the mock functions are called, otherwise the behavior is undefined. In particular, you mustn't interleave EXPECT_CALL()s and calls to the mock functions." It might be that the EXPECT_CALL macros interleave with calls to the mock functions in these cases.

We experimented with adding testing::Mock::VerifyAndClearExpectations(s1ap_handler.get()); or .InSequence(seq) but so far without success.
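
To make the suspected ordering problem concrete, here is a minimal C++ sketch. It deliberately does not use Google Mock: `RecordingHandler`, `expectation_first`, `expectation_late`, and the string `"handover_request"` are hypothetical names introduced only for illustration, and the handler simply drops calls that arrive before an expectation exists (Google Mock itself only says the behavior is undefined). The sketch assumes the call is made from a separate thread, as the task under test appears to do.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a mocked handler. This is NOT Google Mock; it
// only models "an expectation must exist before the call arrives".
class RecordingHandler {
 public:
  // Arms one expectation and returns a future that becomes ready when a
  // matching call arrives.
  std::future<void> expect(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    expected_name_ = name;
    arrived_ = std::promise<void>();
    armed_ = true;
    return arrived_.get_future();
  }

  // Invoked from the worker thread, the way a task would invoke a mock.
  // A call that arrives while nothing is armed is simply dropped.
  void handle(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    if (armed_ && name == expected_name_) {
      armed_ = false;
      arrived_.set_value();
    }
  }

 private:
  std::mutex mu_;
  std::string expected_name_;
  std::promise<void> arrived_;
  bool armed_ = false;
};

// Expectation is set BEFORE the triggering action; the test then waits for
// the asynchronous call instead of verifying immediately.
bool expectation_first(std::chrono::milliseconds timeout) {
  RecordingHandler handler;
  std::future<void> arrived = handler.expect("handover_request");
  std::thread worker([&handler] { handler.handle("handover_request"); });
  bool ok = arrived.wait_for(timeout) == std::future_status::ready;
  worker.join();
  return ok;
}

// Expectation is set AFTER the call has happened. The join() forces the bad
// interleaving that a real race would only produce some of the time.
bool expectation_late(std::chrono::milliseconds timeout) {
  RecordingHandler handler;
  std::thread worker([&handler] { handler.handle("handover_request"); });
  worker.join();
  std::future<void> arrived = handler.expect("handover_request");
  return arrived.wait_for(timeout) == std::future_status::ready;
}
```

With these definitions, `expectation_first(std::chrono::milliseconds(1000))` returns true, while `expectation_late(std::chrono::milliseconds(50))` returns false because the call was dropped before the expectation existed. In the real tests the late ordering would happen only on some runs, which would be consistent with intermittent "never called" failures.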

@ssanadhya
Collaborator

@pruthvihebbani, could you please look at these failures? Several of them stem from the fact that the EXPECT_CALL is registered after calling the function that triggers the expected event.

@electronjoe
Member

Reproduction

  • If you have access to GitHub Codespaces (you should if you are a member of GH Magma)
  • Go over to my Bazel prototype branch
  • Click the green Code drop-down button
  • Select the Codespaces toggle
  • Select New Codespace and when asked select 16 core (why not!)
  • This will spin up a GitHub codespace that contains my branch
  • Once the codespace is up (VS Code in the browser), type bazel test //lte/gateway/c/core/oai/test/mme_app_task:mme_procedures_test --config=asan in the terminal window and press Enter. This triggers a Bazel build of all components needed to run the test, then runs the test. You will see ASAN failures.

@LKreutzer
Contributor

FYI: after moving the EXPECT_CALLs in the TestS1HandoverSuccess and TestFailedPagingForPendingBearers tests to the earliest possible location, we find that they FAILED in only 10 out of 1000 Bazel runs. Before the move they mostly failed, e.g. FAILED in 624 out of 1000 Bazel runs. This seems to support the idea that there is a race condition.

@pruthvihebbani pruthvihebbani linked a pull request Mar 16, 2022 that will close this issue
@ssanadhya
Collaborator

@LKreutzer , could you please redo the above analysis now that #12141 is merged?

@LKreutzer
Contributor

@ssanadhya @themarwhal Findings regarding the flakiness (on the current master) (#12166):

  • Running the tests 1000 times without asan or lsan with
    bazel test //lte/gateway/c/core/oai/test/mme_app_task:mme_procedures_test --runs_per_test=1000
    results in FAILED in 126 out of 1000 runs (instead of FAILED in 20 out of 100 before this change).

  • Running the tests 100 times with asan
    bazel test //lte/gateway/c/core/oai/test/mme_app_task:mme_procedures_test --config=asan --runs_per_test=100
    results in FAILED in 24 out of 100 runs.

The errors are still related to the EXPECT_CALLs, but they now occur in a number of different TEST_F cases.
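
Because the runs above use different sample sizes (1000 versus 100), the failure counts are easier to compare as rates. A minimal sketch, where `failure_rate` is a hypothetical helper introduced here and not part of the repository:

```cpp
#include <cassert>

// Fraction of runs that failed. Preconditions: runs > 0, 0 <= failed <= runs.
double failure_rate(int failed, int runs) {
  assert(runs > 0 && failed >= 0 && failed <= runs);
  return static_cast<double>(failed) / runs;
}
```

For the numbers reported above, `failure_rate(126, 1000)` is about 0.126, `failure_rate(20, 100)` is 0.2, and `failure_rate(24, 100)` is 0.24, so the run without sanitizers fails at a lower rate than the 20 out of 100 figure it is compared against. With only 100 runs per sample, small differences between these rates should be read with some caution.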

@ssanadhya
Collaborator

@LKreutzer , thanks for doing the analysis. Let's continue the discussion on #12166 .
