s1ap_remove_ue #9902

Open · sentry-io bot opened this issue Oct 26, 2021 · 8 comments

sentry-io bot commented Oct 26, 2021

Sentry Issue: LAB-AGWS-NATIVE-2W

SIGSEGV /SEGV_MAPERR: Fatal Error: SIGSEGV /SEGV_MAPERR
  File "s1ap_mme.c", line 561, in s1ap_remove_ue
  File "s1ap_mme_handlers.c", line 3871, in s1ap_mme_release_ue_context
  File "s1ap_mme_handlers.c", line 1365, in s1ap_mme_generate_ue_context_release_command
  File "s1ap_mme_handlers.c", line 1491, in s1ap_handle_ue_context_release_command
  File "s1ap_mme.c", line 216, in handle_message
...
(4 additional frame(s) were not displayed)
ulaskozat (Contributor) commented

Looks like this assertion:

DevAssert(enb_ref->nb_ue_associated > 0);
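For context, here is a standalone sketch (stand-in types, not the real Magma structs) of how that line can bring the process down: a duplicate removal of the same UE leaves nb_ue_associated at zero, so the assert aborts; and if enb_ref itself were NULL, the same read would instead produce the SIGSEGV seen in the Sentry report above.

#include <assert.h>

/* Stand-in for Magma's DevAssert, which also aborts on failure. */
#define DevAssert(cond) assert(cond)

/* Stand-in type; the real enb_description_t has many more fields. */
typedef struct {
  unsigned nb_ue_associated; /* UEs currently associated with this eNB */
} enb_description_t;

/* Simplified analogue of the removal path at s1ap_mme.c:561. */
static void remove_ue_sketch(enb_description_t* enb_ref) {
  DevAssert(enb_ref->nb_ue_associated > 0); /* aborts on a double release */
  enb_ref->nb_ue_associated--;
}

int main(void) {
  enb_description_t enb = {.nb_ue_associated = 1};
  remove_ue_sketch(&enb); /* first release: 1 -> 0, fine */
  remove_ue_sketch(&enb); /* duplicate release: DevAssert fires here */
  return 0;
}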

stale bot added the wontfix label Mar 9, 2022
stale bot closed this as completed Mar 16, 2022
ssanadhya reopened this Mar 16, 2022
stale bot removed the wontfix label Mar 16, 2022
magma deleted a comment from stale bot Mar 30, 2022
magma deleted a comment from stale bot Mar 30, 2022
crbertoldo commented

Hi @ulaskozat @ssanadhya @rsarwad (ref: #12677 (comment)), we may have a similar issue in prod (v1.6); see the logs below. The MME appears to have been crashing whenever a specific subscriber was being put into idle state. Please let me know if you have any clues.

Jun 21 07:26:01 4G-cluster-node-4 mme[789983]: LeakSanitizer:DEADLYSIGNAL
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]: ==789983==ERROR: LeakSanitizer: SEGV on unknown address 0x000000000204 (pc 0x556099b8399c bp 0x7f80781da710 sp 0x7f80781da6e0 T23)
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]: ==789983==The signal is caused by a READ memory access.
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]: ==789983==Hint: address points to the zero page.
Jun 21 07:26:01 4G-cluster-node-4 sctpd[11738]: I0621 07:26:01.653967 11743 sctpd_downlink_impl.cpp:76] SctpdDownlinkImpl::SendDl starting
Jun 21 07:26:01 4G-cluster-node-4 sctpd[11738]: I0621 07:26:01.675810 11942 sctp_connection.cpp:167] HandleClientSock sd = 49
Jun 21 07:26:01 4G-cluster-node-4 sctpd[11738]: I0621 07:26:01.675860 11942 sctp_connection.cpp:220] [sd:49] msg of len 106 on 62:1
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #0 0x556099b8399c in s1ap_remove_ue /home/ubuntu/jenkins-agent/workspace/build/build_agw_packages_ubuntu/lte/gateway/c/core/oai/tasks/s1ap/s1ap_mme.c:561
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #1 0x556099bd70d4 in s1ap_mme_release_ue_context /home/ubuntu/jenkins-agent/workspace/build/build_agw_packages_ubuntu/lte/gateway/c/core/oai/tasks/s1ap/s1ap_mme_handlers.c:3879
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #2 0x556099bd73c0 in s1ap_mme_generate_ue_context_release_command /home/ubuntu/jenkins-agent/workspace/build/build_agw_packages_ubuntu/lte/gateway/c/core/oai/tasks/s1ap/s1ap_mme_handlers.c:1365
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #3 0x556099bd747b in s1ap_handle_ue_context_release_command /home/ubuntu/jenkins-agent/workspace/build/build_agw_packages_ubuntu/lte/gateway/c/core/oai/tasks/s1ap/s1ap_mme_handlers.c:1491
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #4 0x556099b831d1 in handle_message /home/ubuntu/jenkins-agent/workspace/build/build_agw_packages_ubuntu/lte/gateway/c/core/oai/tasks/s1ap/s1ap_mme.c:216
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #5 0x7f80887db5c6 in zloop_start (/lib/x86_64-linux-gnu/libczmq.so.4+0x295c6)
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #6 0x556099b82ab8 in s1ap_mme_thread /home/ubuntu/jenkins-agent/workspace/build/build_agw_packages_ubuntu/lte/gateway/c/core/oai/tasks/s1ap/s1ap_mme.c:377
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #7 0x7f80893ce608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8608)
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]:     #8 0x7f80881ab162 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f162)
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]: LeakSanitizer can not provide additional info.
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]: SUMMARY: LeakSanitizer: SEGV /home/ubuntu/jenkins-agent/workspace/build/build_agw_packages_ubuntu/lte/gateway/c/core/oai/tasks/s1ap/s1ap_mme.c:561 in s1ap_remove_ue
Jun 21 07:26:01 4G-cluster-node-4 mme[789983]: ==789983==ABORTING
Jun 21 07:26:01 4G-cluster-node-4 systemd[1]: magma@mme.service: Main process exited, code=exited, status=23/n/a

ssanadhya (Collaborator) commented

@crbertoldo, could you please share the steps to reproduce this issue? The backtrace looks different from the one in #12677.

crbertoldo commented

@ssanadhya unfortunately, we don't know how to reproduce it. We had to flush Redis and restart the AGW to recover the services while it was happening. As for #12677, it looks somewhat similar, but I don't think it leads to any conclusion, so please disregard it.

alledpaiva commented

Following are the MME and kernel logs, as requested. The earliest MME log we have starts at 07:26:10; it is the first one listed below.

If you have any doubts, please let me know.

Files:

kernel-log-06-21.txt
MME.4G-cluster-node-4.root.log.INFO.20220621-072610.791052.txt
MME.4G-cluster-node-4.root.log.INFO.20220621-072654.791847.txt
MME.4G-cluster-node-4.root.log.INFO.20220621-072855.793853.txt

rsarwad (Contributor) commented Jul 6, 2022

Thanks @crbertoldo and @alledpaiva for sharing the logs.
Please find my analysis below:

  1. Some UEs attach to the network and move to idle state. For some reason, MME restarts, fetches the state from the DB, and re-establishes all contexts.
  2. MME then receives a TAU Request in an s1ap Initial UE Message, rejects it, and sends a TAU Reject and a UE Context Release Command to s1ap. While handling the UE Context Release Command, s1ap sends an s1ap UE Context Release Command to the eNB, sends a UE Context Release Complete message to mme_app, and frees the UE context. Once mme_app receives the UE Context Release Complete, it moves the UE's ECM state to idle.
     But before mme_app has moved the UE to idle, it receives another TAU Request in a second s1ap Initial UE Message. While handling it, mme_app finds a UE context holding a valid s1ap_id_key, so before processing the new Initial UE Message it first clears the previous S1 signaling connection by sending a UE Context Release Command to s1ap, and s1ap clears the UE context. But the context it clears is the one allocated for the second Initial UE Message, so at this point nb_ue_associated becomes zero.
  3. While handling the second TAU Request, mme_app again rejects it and sends a TAU Reject and a UE Context Release Command to s1ap. When s1ap tries to remove the UE, it hits DevAssert(enb_ref->nb_ue_associated > 0), which crashes the MME (see the sketch after this list).
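For illustration, a minimal sketch (stand-in types and names; not necessarily what PRs #11795 and #12275 actually implement) of the kind of defensive check that would avoid the double release: verify that the S1AP UE context still exists before removing it and decrementing the counter again.

#include <stdbool.h>
#include <stddef.h>

typedef struct {
  unsigned nb_ue_associated; /* UEs currently associated with this eNB */
} enb_description_t;

typedef struct {
  enb_description_t* enb_ref;
  bool released; /* stand-in for "context already freed from the state" */
} ue_description_t;

/* Sketch of handling a UE Context Release Command defensively: if the
 * context was already freed by the implicit release triggered while
 * handling the second Initial UE Message, skip the removal instead of
 * decrementing nb_ue_associated below zero. */
static bool handle_release_cmd_sketch(ue_description_t* ue_ref) {
  if (ue_ref == NULL || ue_ref->released) {
    return false; /* nothing left to remove; do not touch the counter */
  }
  if (ue_ref->enb_ref->nb_ue_associated > 0) {
    ue_ref->enb_ref->nb_ue_associated--;
  }
  ue_ref->released = true; /* free the UE context */
  return true;
}

int main(void) {
  enb_description_t enb = {.nb_ue_associated = 1};
  ue_description_t ue = {.enb_ref = &enb, .released = false};
  handle_release_cmd_sketch(&ue); /* first command: releases the context */
  handle_release_cmd_sketch(&ue); /* second command: safely ignored */
  return 0;
}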

The attached call flow may help with visualizing this:
9902_mme_callflow
There was a similar issue: #12114.
Could you pull PRs #11795 and #12275 into v1.6? I am not able to simulate this scenario locally, so could you please pull the above PRs and verify? Alternatively, could you share details of a system where I can validate?

rsarwad (Contributor) commented Jul 19, 2022

Hi @crbertoldo and @alledpaiva, were you able to pull the above-mentioned PRs and test?

crbertoldo commented

Hi @rsarwad, sorry for the late reply. We haven't had a chance to do that yet, but we will as soon as possible and let you know. Thanks a lot!
