Fixes unique stale socket issue with LTE devices #1666

Merged
merged 1 commit into from Jan 5, 2019

Conversation

@technobly
Member

commented Jan 4, 2019

Problem

On LTE devices, a brief drop in cellular coverage can leave the device unreachable until it is reset.

This happens when cellular coverage drops and comes back just as the cloud socket close operation is executing. The system attempts to close the cloud socket handle on the modem, but at that moment the modem is unable to process the request. The socket is nevertheless freed from the lookup table in RAM, so when the system attempts to reconnect to the cloud it requests a new socket, and the modem returns handle 1 on port 5684. Handle 0 is still open on port 5684, and the modem apparently doesn't mind having two sockets open on the same port. Data is sent on handle 1 and expected back on handle 1, but the modem receives it on handle 0. This state persists until the modem is reset.
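The sequence above can be replayed in a small host-side model. This is only a sketch under stated assumptions: `FakeModem`, `HostTable`, `replayFailure` and the handle count of 7 are hypothetical names and values invented for illustration, and the "lowest free handle" allocation rule is inferred from the handle numbers reported above, not taken from the modem documentation.

```cpp
#include <array>
#include <cassert>

// Hypothetical model for illustration; these are not the firmware's types.
constexpr int kMaxHandles = 7;   // assumed handle count
constexpr int kNoHandle = -1;

struct FakeModem {
    std::array<bool, kMaxHandles> open{};  // handles the modem believes are open
    bool responsive = true;                // false while coverage is dropping

    // Assumption: the modem hands out the lowest free handle (0, then 1, ...).
    int create() {
        for (int h = 0; h < kMaxHandles; ++h) {
            if (!open[h]) { open[h] = true; return h; }
        }
        return kNoHandle;
    }

    // The close request is lost when the modem cannot process it.
    bool close(int h) {
        assert(h >= 0 && h < kMaxHandles);
        if (!responsive) return false;
        open[h] = false;
        return true;
    }
};

struct HostTable {
    std::array<bool, kMaxHandles> tracked{};  // lookup table kept in RAM

    int openSocket(FakeModem& m) {
        int h = m.create();
        if (h != kNoHandle) tracked[h] = true;
        return h;
    }

    // Mirrors the described behavior: the RAM entry is freed even if the
    // modem never processed the close.
    void closeSocket(FakeModem& m, int h) {
        m.close(h);          // result deliberately ignored
        tracked[h] = false;
    }
};

// Replays the failure sequence and returns the handle of the new socket.
int replayFailure(FakeModem& m, HostTable& t) {
    int first = t.openSocket(m);  // cloud socket on handle 0
    m.responsive = false;         // coverage drops during the close
    t.closeSocket(m, first);
    m.responsive = true;          // coverage returns
    return t.openSocket(m);       // new socket while handle 0 is still open
}
```

After `replayFailure` runs, the modem model holds two open handles while the host tracks only the second one, which is the mismatch the description attributes to the unreachable state.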

Solution

Clean up stale sockets whenever creating new sockets on cellular devices.

Check for any stale handles that are open on the modem but not currently associated with a socket. These may occur after power cycling the STM32 with modem connected or if a previous socket was not closed cleanly. socketFree() will unconditionally free the socket even if the handle doesn't close on the modem.
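The cleanup idea can be sketched independently of the modem driver. The following is a minimal illustration, not the code merged in this PR: `closeUnusedHandles`, `createSocket`, the boolean arrays and `kHandleCount` are hypothetical stand-ins, and clearing an array entry takes the place of sending a real close command to the modem.

```cpp
#include <array>
#include <cassert>

constexpr int kHandleCount = 7;  // assumed handle count, illustrative only
using HandleSet = std::array<bool, kHandleCount>;

// Closes every handle the modem reports as open that the host does not
// track. Returns the number of stale handles closed.
int closeUnusedHandles(HandleSet& modemOpen, const HandleSet& hostTracked) {
    int closed = 0;
    for (int h = 0; h < kHandleCount; ++h) {
        if (modemOpen[h] && !hostTracked[h]) {
            modemOpen[h] = false;  // stand-in for a close command to the modem
            ++closed;
        }
    }
    return closed;
}

// Runs the cleanup first, then allocates the lowest free handle.
// Returns -1 if no handle is available.
int createSocket(HandleSet& modemOpen, HandleSet& hostTracked) {
    closeUnusedHandles(modemOpen, hostTracked);
    for (int h = 0; h < kHandleCount; ++h) {
        // After cleanup, every handle still open must be tracked by the host.
        assert(!modemOpen[h] || hostTracked[h]);
    }
    for (int h = 0; h < kHandleCount; ++h) {
        if (!modemOpen[h]) {
            modemOpen[h] = true;
            hostTracked[h] = true;
            return h;
        }
    }
    return -1;
}
```

Running the cleanup on every socket creation, instead of only after a failed close, also covers the case mentioned above where the STM32 is power cycled while the modem stays powered and keeps its handles open.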

Steps to Test

It is difficult to reproduce this error, so one way to force it is to replace MDMParser::socketSocket() with the following code. On cold boot, this opens socket handle 0 and frees it without closing it on the modem, then opens socket handle 1 for the cloud connection. The device will attempt to talk to the cloud on handle 1 but get a response on handle 0. The system will then tear down the network and try again, and when it does, the stale socket 0 is cleaned up. DNS runs on socket handle 0, which is then closed; the cloud socket is opened again on handle 0 and works fine from there on out.

int MDMParser::socketSocket(IpProtocol ipproto, int port)
{
    // Default to an error so the return value is defined on every path
    int socket = MDM_SOCKET_ERROR;
    LOCK();

    if (!_attached) {
        if (!reconnect()) {
            socket = MDM_SOCKET_ERROR;
        }
    }

    if (_attached) {
        // Check for any stale handles that are open on the modem but not
        // currently associated with a socket.
        // These may occur after power cycling the STM32 with modem connected
        // or if a previous socket was not closed cleanly.
        if (_socketCloseUnusedHandles())
        {
            DEBUG_D("%s: closed stale socket handle(s)\r\n", __func__);
        }

        // START TEST CODE
        static bool once = false;
        if (!once && port == 5684) {
            once = true;
            // find a free socket
            socket = _findSocket(MDM_SOCKET_ERROR);
            DEBUG_D("socketSocket(%s)\r\n", (ipproto?"UDP":"TCP"));
            if (socket != MDM_SOCKET_ERROR) {
                int _socket = _socketSocket(socket, ipproto, port);
                if (_socket != MDM_SOCKET_ERROR) {
                    socket = _socket;
                    _socketFree(socket); // free without closing
                }
            }
        } // END TEST CODE

        // find a free socket
        socket = _findSocket(MDM_SOCKET_ERROR);
        DEBUG_D("socketSocket(%s)\r\n", (ipproto?"UDP":"TCP"));
        if (socket != MDM_SOCKET_ERROR) {
            int _socket = _socketSocket(socket, ipproto, port);
            if (_socket != MDM_SOCKET_ERROR) {
                socket = _socket;
            }
        }
    }
    UNLOCK();
    return socket;
}

Completeness

  • User is totes amazing for contributing!
  • Contributor has signed CLA (Info here)
  • Problem and Solution clearly stated
  • Run unit/integration/application tests on device
  • (N/A) Added documentation
  • Added to CHANGELOG.md after merging (add links to docs and issues)

@technobly technobly added the bug label Jan 4, 2019

@technobly technobly added this to the 0.8.0 milestone Jan 4, 2019

@technobly technobly merged commit 0f3d3cf into develop Jan 5, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@technobly technobly deleted the fix/lte-socket-cleanup branch Jan 5, 2019
