SCTP Abort on new connections from ENB. MME stops processing. #13115

Closed
bhuvaneshne opened this issue Jun 29, 2022 · 7 comments
Labels
type: bug Something isn't working

Comments

@bhuvaneshne
Contributor

Your Environment

  • Version: v1.6.1 (we have not yet upgraded to 1.7.x)
  • Affected Component: Access Gateway
  • Affected Subcomponent: MME, SCTPD
  • Deployment Environment: bare-metal (AGW)

Describe the Issue

In our setup, 40+ ENBs are connected to a single AGW. On rare occasions, the MME stops processing packets, and SCTP Aborts are seen on the wire.

To Reproduce
Lately the issue was reproducing rather frequently in the lab, so we scrutinized the latest additions to the network. The most recent addition, a "certain brand" of ENB, was sending spikes of S1AP Partial Resets at constant intervals. We modified an open source simulator to send spikes of S1AP Partial Resets, which reproduced similar behavior.

Expected behavior
AGW should not stop processing packets.

Additional context
Behavior of MME and SCTPD executables when issue occurs:
1. MME stops processing packets.
2. SCTPD GRPC errors are seen in syslog (GRPC timeout).
3. After the above two occur, SCTP Aborts are seen on the wire for new connections.

@bhuvaneshne bhuvaneshne added the type: bug Something isn't working label Jun 29, 2022
@bhuvaneshne
Contributor Author

bhuvaneshne commented Jun 29, 2022

Root cause analysis:
We added instrumentation that attaches metadata to each message: an integer assigned at the first stage of MME processing (GRPC receive). This metadata is carried across the stages of the pipeline (the different MME tasks). We also modified the logs to print the thread ID. The logs showed that the GRPC send downlink stage stopped processing some time after the S1AP partial reset bursts were sent. The other tasks stopped processing soon after (possibly because the ZMQ buffers filled up).
Next, we added logs around the SCTP send DL processing. These showed that sctp_sendmsg was not returning control to the caller.
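
For readers who want to try a similar tracing approach, here is a minimal sketch of the idea described above: tag each message with an integer at the first stage and print that integer together with the thread ID at every later stage. The names `TracedMessage`, `tag_at_ingress`, and `format_trace` are hypothetical and are not taken from the MME code; the actual instrumentation used in this investigation is not shown in the thread.

```cpp
#include <atomic>
#include <cstdint>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

// Hypothetical message wrapper: trace_id is the metadata assigned at the
// first pipeline stage and carried unchanged through later stages.
struct TracedMessage {
  std::uint64_t trace_id;
  std::string payload;
};

// Assigns the next integer ID at ingress (for example, on GRPC receive).
// The atomic counter keeps IDs unique even if several threads ingest.
inline TracedMessage tag_at_ingress(std::string payload) {
  static std::atomic<std::uint64_t> next_id{1};
  return TracedMessage{next_id.fetch_add(1), std::move(payload)};
}

// Builds a log line containing the calling thread ID, the stage name,
// and the trace ID, so a stalled stage shows up as the last line for an ID.
inline std::string format_trace(const char* stage, const TracedMessage& msg) {
  std::ostringstream out;
  out << "[tid=" << std::this_thread::get_id() << "] stage=" << stage
      << " trace_id=" << msg.trace_id;
  return out.str();
}
```

With logging like this at each stage, the last stage that printed a given trace ID points at where processing stopped.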

@bhuvaneshne
Contributor Author

Proposed solution:
Instead of waiting indefinitely, add a timeout to sctp_sendmsg

Before fix: sctp_connection.cpp
image

After proposed fix:
image

Caveat:
We have not yet investigated why sctp_sendmsg was blocking indefinitely.

Please let us know if the fix looks good.
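
Since the screenshots are not preserved in this text, here is a hedged sketch of one common way to bound a blocking send on Linux: setting `SO_SNDTIMEO` on the socket. This is not necessarily what the proposed fix does. To stay free of the lksctp dependency, the sketch uses plain `send()`; the assumption (to be verified against your kernel) is that the same socket option also bounds `sctp_sendmsg` on an SCTP socket. The helper names `set_send_timeout_ms`, `send_with_timeout`, and `SendResult` are hypothetical.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Sets a send timeout on a socket, in milliseconds.
// Returns 0 on success, -1 on error (errno is set by setsockopt).
inline int set_send_timeout_ms(int fd, long timeout_ms) {
  timeval tv{};
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

// Hypothetical classification of a send attempt.
enum class SendResult { kSent, kTimedOut, kFailed };

// Attempts a send. With SO_SNDTIMEO set, a send that cannot make progress
// returns -1 with errno EAGAIN/EWOULDBLOCK instead of blocking forever.
inline SendResult send_with_timeout(int fd, const void* buf, std::size_t len) {
  ssize_t n = send(fd, buf, len, 0);
  if (n >= 0) return SendResult::kSent;  // may be a partial send
  if (errno == EAGAIN || errno == EWOULDBLOCK) return SendResult::kTimedOut;
  return SendResult::kFailed;
}
```

A caller that sees `kTimedOut` can log the event and drop or requeue the message, so the sending task keeps draining its queue instead of stalling the rest of the pipeline.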

@ushivahs

ushivahs commented Jun 29, 2022

s1ap_test.zip
Please find attached the archive containing the simulator used for the test.
Usage: ./s1ap_test mcc mnc tac pps srcIp dstIp
Example: ./s1ap_test 315 010 3f 50 192.168.56.93 192.168.56.9

Notes:
1) Give the value of tac in hex.
2) pps is the rate at which partial resets are sent.
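
To make the argument conventions concrete, here is a small sketch of how the six positional arguments could be interpreted. The simulator's source is not shown in this thread, so `SimArgs`, `parse_sim_args`, and `send_interval_us` are hypothetical names, and the even pacing (and reading pps as a per-second rate) is an assumption rather than a description of what s1ap_test does.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical holder for the positional arguments in the usage line.
struct SimArgs {
  std::string mcc;    // kept as text so leading zeros survive
  std::string mnc;    // e.g. "010" must not become 10
  std::uint32_t tac;  // given on the command line in hex
  std::uint32_t pps;  // partial resets per second (assumed meaning)
  std::string src_ip;
  std::string dst_ip;
};

// Parses: prog mcc mnc tac pps srcIp dstIp
// Throws std::invalid_argument on a wrong argument count or a bad number.
inline SimArgs parse_sim_args(int argc, const char* const argv[]) {
  if (argc != 7) {
    throw std::invalid_argument("usage: prog mcc mnc tac pps srcIp dstIp");
  }
  SimArgs args;
  args.mcc = argv[1];
  args.mnc = argv[2];
  args.tac = static_cast<std::uint32_t>(std::stoul(argv[3], nullptr, 16));
  args.pps = static_cast<std::uint32_t>(std::stoul(argv[4], nullptr, 10));
  if (args.pps == 0) throw std::invalid_argument("pps must be positive");
  args.src_ip = argv[5];
  args.dst_ip = argv[6];
  return args;
}

// Microseconds between sends if the resets were paced evenly (assumption).
inline std::uint64_t send_interval_us(std::uint32_t pps) {
  return 1000000ULL / pps;
}
```

Under these assumptions, the example command line above yields a tac of 0x3f and one partial reset every 20 ms.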

@ssanadhya
Collaborator

ssanadhya commented Jun 29, 2022

@bhuvaneshne, @ushivahs, thanks for the detailed description of the issue! Adding a timeout makes sense to me. Please go ahead and raise a PR with this fix.

bhuvaneshne added a commit to bhuvaneshne/magma that referenced this issue Jun 30, 2022
…g resolved. magma#13115

Signed-off-by: bhuvaneshne <bhuvaneshne@highway9networks.com>
@bhuvaneshne
Contributor Author

Thanks @ssanadhya, I have opened a pull request.

ssanadhya pushed a commit that referenced this issue Aug 17, 2022
…ndmsg (#13146)

* fix(agw): SCTP Abort on new connections from ENB. MME stops processing resolved.  #13115

Signed-off-by: bhuvaneshne <bhuvaneshne@highway9networks.com>
MagmaCIBot pushed a commit that referenced this issue Aug 17, 2022
…ndmsg (#13146)

* fix(agw): SCTP Abort on new connections from ENB. MME stops processing resolved.  #13115

Signed-off-by: bhuvaneshne <bhuvaneshne@highway9networks.com>
(cherry picked from commit 6506c85)
maxhbr pushed a commit that referenced this issue Aug 17, 2022
…ndmsg (#13146) (#13644)

* fix(agw): SCTP Abort on new connections from ENB. MME stops processing resolved.  #13115

Signed-off-by: bhuvaneshne <bhuvaneshne@highway9networks.com>
(cherry picked from commit 6506c85)

Co-authored-by: Bhuvanesh <97009526+bhuvaneshne@users.noreply.github.com>
@bhuvaneshne
Contributor Author

@ssanadhya: We can close this issue. The code has been checked in.

@ssanadhya
Collaborator

Fixed in #13146
