Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMF] Fix crash when gNB disconnects in middle of SBI client transaction #2134

Merged
merged 1 commit into from
Mar 23, 2023

Conversation

bmeglic
Copy link
Contributor

@bmeglic bmeglic commented Mar 6, 2023

In case that gNB abruptly disconnects right in the middle of SBI client transaction (AMF sends a SBI request to another NF, and waits for response), AMF would crash because the 'ran_ue' context is already removed from the 'amf_ue' context.

The issue can easily be reproduced with following patch to simulate heavy load on UDM:

diff --git a/src/udm/nudr-handler.c b/src/udm/nudr-handler.c
index 466ea0b6e..2eefc97ac 100644
--- a/src/udm/nudr-handler.c
+++ b/src/udm/nudr-handler.c
@@ -624,6 +624,9 @@ bool udm_nudr_dr_handle_subscription_provisioned(
 
         response = ogs_sbi_build_response(&sendmsg, recvmsg->res_status);
         ogs_assert(response);
+
+        ogs_msleep(4000);
+
         ogs_assert(true == ogs_sbi_server_send_response(stream, response));
 
         OpenAPI_access_and_mobility_subscription_data_free(

- Start 5G core with patch for UDM
- Run UERANSIM UE
- While the registration procedure is stuck at UDM in the `ogs_msleep()`, stop the gNB.
// UE sends Registration Request, AMF sends SBI requests to AUSF,UDM,...
...
// gNB disconnects
amf  | 03/01 12:51:59.275: [amf] DEBUG: amf_state_operational(): AMF_EVENT_NGAP_LO_CONNREFUSED (../src/amf/amf-sm.c:78)
amf  | 03/01 12:51:59.275: [amf] INFO: gNB-N2[192.168.92.9] connection refused!!! (../src/amf/amf-sm.c:778)
amf  | 03/01 12:51:59.275: [amf] INFO: [Removed] Number of gNB-UEs is now xxx (../src/amf/context.c:2334)
...
amf  | 03/01 12:51:59.597: [amf] DEBUG: ngap_state_operational(): EXIT (../src/amf/ngap-sm.c:55)
amf  | 03/01 12:51:59.598: [amf] DEBUG: ngap_state_final(): EXIT (../src/amf/ngap-sm.c:37)
amf  | 03/01 12:51:59.598: [amf] INFO: [Removed] Number of gNBs is now 0 (../src/amf/context.c:1061)
...
// SBI response is now received at AMF, processed, and another SBI request is sent to UDM
amf  | 03/01 12:51:59.598: [amf] DEBUG: amf_state_operational(): OGS_EVENT_NAME_SBI_CLIENT (../src/amf/amf-sm.c:78)
amf  | 03/01 12:51:59.599: [gmm] DEBUG: gmm_state_initial_context_setup(): OGS_EVENT_NAME_SBI_CLIENT (../src/amf/gmm-sm.c:1720)
amf  | 03/01 12:51:59.599: [sbi] DEBUG: [REF] 500 (../lib/sbi/path.c:212)
amf  | 03/01 12:51:59.599: [sbi] DEBUG: [GET] http://10.15.199.92:29503/nudm-sdm/v2/imsi-001010000000248/am-data (../lib/sbi/client.c:656)
...
// SBI response from UDM is now processed
amf  | 03/01 12:51:59.834: [amf] DEBUG: amf_state_operational(): OGS_EVENT_NAME_SBI_CLIENT (../src/amf/amf-sm.c:78)
amf  | 03/01 12:51:59.834: [gmm] DEBUG: gmm_state_initial_context_setup(): OGS_EVENT_NAME_SBI_CLIENT (../src/amf/gmm-sm.c:1720)
amf  | 03/01 12:51:59.834: [amf] FATAL: amf_ue_rat_type: Assertion `ran_ue' failed. (../src/amf/context.c:1837)

@bmeglic
Copy link
Contributor Author

bmeglic commented Mar 6, 2023

There is another approach to solve this issue, by checking if RAN UE context is already removed, before dispatching any SBI client events to GMM SM. But not sure of any possible side effects of this generalized approach. What do you guys think?

diff --git a/src/amf/amf-sm.c b/src/amf/amf-sm.c
index ca1f9c93a..f673c402a 100644
--- a/src/amf/amf-sm.c
+++ b/src/amf/amf-sm.c
@@ -383,6 +383,12 @@ void amf_state_operational(ogs_fsm_t *s, amf_event_t *e)
                 break;
             }
 
+            ran_ue = ran_ue_cycle(amf_ue->ran_ue);
+            if (!ran_ue) {
+                ogs_error("NG context has already been removed");
+                break;
+            }
+
             ogs_assert(OGS_FSM_STATE(&amf_ue->sm));
 
             e->amf_ue = amf_ue;

@acetcom
Copy link
Member

acetcom commented Mar 10, 2023

@bmeglicit

It would be nice to be able to locate where these issues are occurring.

We suggest the following locations:

diff --git a/src/amf/sbi-path.c b/src/amf/sbi-path.c
index 4c31012f4..237724d25 100644
--- a/src/amf/sbi-path.c
+++ b/src/amf/sbi-path.c
@@ -95,10 +95,16 @@ int amf_ue_sbi_discover_and_send(
     int r;
     int rv;
     ogs_sbi_xact_t *xact = NULL;
+    ran_ue_t *ran_ue = NULL;
 
     ogs_assert(service_type);
-    ogs_assert(amf_ue);
     ogs_assert(build);
+    ogs_assert(amf_ue);
+    ran_ue = ran_ue_cycle(amf_ue->ran_ue);
+    if (!ran_ue) {
+        ogs_error("NG context has already been removed");
+        return OGS_NOTFOUND;
+    }
 
     xact = ogs_sbi_xact_add(
             &amf_ue->sbi, service_type, discovery_option,

Please let me know if you have any idea.

Thanks a lot!
Sukchan

@bmeglic
Copy link
Contributor Author

bmeglic commented Mar 14, 2023

Hi,

This issue is encountered in the function amf_nudm_sdm_handle_provisioned, when amf_ue_is_rat_restricted is called just before the amf_ue_sbi_discover_and_send.
The original commit (the one that modifies the function amf_ue_rat_type is thus still required, unless we found some other solution.

Regards
Bostjan

@bmeglic bmeglic marked this pull request as draft March 14, 2023 07:00
@bmeglic bmeglic force-pushed the fix_amf_crash_between_sbi_call branch from 8fca36f to efd7ff9 Compare March 14, 2023 07:34
@bmeglic bmeglic marked this pull request as ready for review March 14, 2023 08:16
@acetcom
Copy link
Member

acetcom commented Mar 14, 2023

@bmeglicit

Even so, I don't think it is a good idea to add a check routine directly to amf_ue_is_rat_restricted().

How about the following modification?

$ diff --git a/src/amf/nudm-handler.c b/src/amf/nudm-handler.c
index 71f3f0b9b..4774fd13c 100644
--- a/src/amf/nudm-handler.c
+++ b/src/amf/nudm-handler.c
@@ -32,6 +32,12 @@ int amf_nudm_sdm_handle_provisioned(
 
     SWITCH(recvmsg->h.resource.component[1])
     CASE(OGS_SBI_RESOURCE_NAME_AM_DATA)
+        ran_ue_t *ran_ue = ran_ue_cycle(amf_ue->ran_ue);
+        if (!ran_ue) {
+            ogs_error("NG context has already been removed");
+            return OGS_ERROR;
+        }
+
         if (recvmsg->AccessAndMobilitySubscriptionData) {
             OpenAPI_list_t *gpsiList =
                 recvmsg->AccessAndMobilitySubscriptionData->gpsis;

You can add this check routine to all the function that uses amf_ue_is_rat_restricted() if needed.

  • gmm_handle_registration_request()
  • amf_namf_callback_handle_sdm_data_change_notify()

Please let me know if you have any different idea.

Thanks a lot!
Sukchan

@bmeglic bmeglic force-pushed the fix_amf_crash_between_sbi_call branch from efd7ff9 to 23bdf51 Compare March 20, 2023 12:09
@bmeglic
Copy link
Contributor Author

bmeglic commented Mar 20, 2023

  • did a refactor of amf_namf_callback_handle_sdm_data_change_notify(), to safely handle all possible scenarios (RAN UE already removed / CM_IDLE / CM_CONNECTED);
  • needed to add ran_ue_cycle() in a few places, wherever amf_ue_rat_type() is called

@bmeglic bmeglic force-pushed the fix_amf_crash_between_sbi_call branch from 23bdf51 to e63471b Compare March 21, 2023 09:31
@acetcom acetcom merged commit df25013 into open5gs:main Mar 23, 2023
@acetcom
Copy link
Member

acetcom commented Mar 23, 2023

@bmeglicit

You made a great job.

Thank you so much!
Sukchan

@bmeglic
Copy link
Contributor Author

bmeglic commented Mar 23, 2023

Thank you!

@bmeglic bmeglic deleted the fix_amf_crash_between_sbi_call branch March 23, 2023 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants