Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Electron] Workaround for modem HAL relying on system networking code to re-attempt initialization #2218

Merged
merged 6 commits into from
Oct 20, 2020

Conversation

avtolstoy
Copy link
Member

@avtolstoy avtolstoy commented Oct 14, 2020

Problem

There are two problems which have some common roots. If the power-on sequence on Electrons fails (and there are two stages in it: power on and SIM initialization, additional initialization), a specific set of actions needs to be taken by the system networking layer to get the modem HAL into a proper state.

  1. We are relying on a hack to get the system layer to restart poweron initialization when MDMParser::init() fails to initialize and requires a modem reset e.g. to set UMNO profile. This used to work somewhat until we've introduced reliable modem power off in one of the Sleep 2.0 PRs, which now causes the modem to be hard powercycled almost immediately after we've reset it through an AT command.
   219.818 AT send      14 "AT+CFUN=15,0\r\n"
   219.828 AT read OK    6 "\r\nOK\r\n"
0000219828 [system] INFO: Sim Ready
0000219829 [system] INFO: ARM_WLAN_WD 1
0000249833 [system] WARN: Resetting WLAN due to WLAN_WD_TO()
0000249833 [system] INFO: Clearing WLAN WD in disconnect()
0000249838 [system] INFO: Clearing WLAN WD in disconnect()
0000249843 [system] INFO: OFF 1
[ Modem::powerOff ] modem is on unexpectedly. Turning it off...

[ Modem::powerOff ] failed
[ ElectronSerialPipe::end ] pipeTx=0 pipeRx=0

[ Modem::resume ] = = = = = = = = = = = = = = =
[ ElectronSerialPipe::begin ] pipeTx=0 pipeRx=0

[ Modem::powerOn ] = = = = = = = = = = = = = =
   280.173 AT send       4 "AT\r\n"
   283.484 AT send       4 "AT\r\n"
  1. The second step in poweron (MDMParser::init()) was skipped whenever initial SIM initialization in MDMParser::powerOn() failed, because for some reason reinitialization is happening in has_credentials()
0000321278 [system] INFO: OFF 1
[ Modem::powerOff ] modem is on unexpectedly. Turning it off...

[ Modem::powerOff ] failed
[ ElectronSerialPipe::end ] pipeTx=0 pipeRx=0

[ Modem::resume ] = = = = = = = = = = = = = = =
[ ElectronSerialPipe::begin ] pipeTx=0 pipeRx=0

[ Modem::powerOn ] = = = = = = = = = = = = = =
   351.608 AT send       4 "AT\r\n"
   351.613 AT read UNK   3 "AT\r"
   351.614 AT read OK    6 "\r\nOK\r\n"
   351.614 AT send       9 "AT+CGMM\r\n"
   361.819 AT send       6 "ATE0\r\n"
0000371820 [system] INFO: SIM/modem not responsive or SIM not inserted/requires a PIN.

[ Modem::powerOn ] = = = = = = = = = = = = = =
   372.136 AT send       4 "AT\r\n"
   375.447 AT send       4 "AT\r\n"
   378.758 AT send       4 "AT\r\n"
   378.764 AT read UNK   3 "AT\r"
   378.765 AT read OK    6 "\r\nOK\r\n"
   378.765 AT send       9 "AT+CGMM\r\n"
   378.778 AT read UNK   8 "AT+CGMM\r"
   378.779 AT read UNK  18 "\r\nSARA-R410M-02B\r\n"
   378.781 AT read OK    6 "\r\nOK\r\n"
   378.985 AT send       6 "ATE0\r\n"
   378.992 AT read UNK   5 "ATE0\r"
   378.993 AT read OK    6 "\r\nOK\r\n"
   378.993 AT send      11 "AT+CMEE=2\r\n"
   379.003 AT read OK    6 "\r\nOK\r\n"
   379.003 AT send      19 "AT+CMER=1,0,0,2,1\r\n"
   379.012 AT read OK    6 "\r\nOK\r\n"
   379.012 AT send      15 "AT+IPR=115200\r\n"
   379.021 AT read OK    6 "\r\nOK\r\n"
   379.121 AT send      10 "AT+CPIN?\r\n"
   379.129 AT read  +   16 "\r\n+CPIN: READY\r\n"
   379.130 AT read OK    6 "\r\nOK\r\n"
0000379131 [system] INFO: Sim Ready
0000379135 [system] INFO: ARM_WLAN_WD 1
   379.138 AT send       4 "AT\r\n"
   379.148 AT read OK    6 "\r\nOK\r\n"

[ Modem::register ] = = = = = = = = = = = = = =
   379.150 AT send      10 "AT+CFUN?\r\n"
   379.162 AT read  +   12 "\r\n+CFUN: 1\r\n"
   379.163 AT read OK    6 "\r\nOK\r\n"
   379.164 AT send      12 "AT+CEREG=2\r\n"
   379.175 AT read OK    6 "\r\nOK\r\n"
   379.175 AT send      11 "AT+CEREG?\r\n"
   379.186 AT read  +   34 "\r\n+CEREG: 2,1,\"4118\",\"3376211\",8\r\n"
   379.188 AT read OK    6 "\r\nOK\r\n"
   379.190 AT send       4 "AT\r\n"
   379.200 AT read OK    6 "\r\nOK\r\n"
   379.200 AT send       9 "AT+CIMI\r\n"
   379.210 AT read UNK  19 "\r\n732123200003356\r\n"
   379.211 AT read OK    6 "\r\nOK\r\n"

(see https://app.clubhouse.io/particle/story/64669, https://app.clubhouse.io/particle/story/64668)

Solution

Add even more hacks! Ideally all these retries should be contained with the modem code and spread through multiple subsystems but this potentially brings some behavior changes, which we shouldn't be making so far along into LTS.

The main idea behind this PR is to check whether power-on succeeded and if not, rely on [network_connect()] (https://github.com/particle-iot/device-os/blob/develop/system/src/system_network_compat.cpp#L262) to trigger reinitialization.

Steps to Test

Here are a few patches to trigger a particular issue:

  1. MDMParser::init() failing, to emulate a case where we need a reset to set e.g. a correct UMNO profile:
diff --git a/hal/src/electron/modem/mdm_hal.cpp b/hal/src/electron/modem/mdm_hal.cpp
index f7c67672c..55edb6bb3 100644
--- a/hal/src/electron/modem/mdm_hal.cpp
+++ b/hal/src/electron/modem/mdm_hal.cpp
@@ -1160,6 +1160,9 @@ bool MDMParser::init(DevStatus* status)
             goto reset_failure;
         }
     } // if (_dev.dev == DEV_SARA_R410)
+#if 1
+    goto reset_failure;
+#endif // 1
     _resetFailureAttempts = 0;
     return true;

There should be 8 attempts made to run MDMParser::init() and after that the system should continue to attempt to register

  1. MDMParser::powerOn() failing due to SIM failure
diff --git a/hal/src/electron/modem/mdm_hal.cpp b/hal/src/electron/modem/mdm_hal.cpp
index f7d08ec8d..42c15be37 100644
--- a/hal/src/electron/modem/mdm_hal.cpp
+++ b/hal/src/electron/modem/mdm_hal.cpp
@@ -935,6 +935,13 @@ bool MDMParser::powerOn(const char* simpin)
     for (int i = 0; (i < 5) && (_dev.sim != SIM_READY) && !_cancel_all_operations; i++) {
         sendFormated("AT+CPIN?\r\n");
         int ret = waitFinalResp(_cbCPIN, &_dev.sim);
+#if 1
+        static int counter = 0;
+        if (counter++ <= 2) {
+            _dev.sim = SIM_MISSING;
+            goto failure;
+        }
+#endif // 1
         // having an error here is ok (sim may still be initializing)
         if ((RESP_OK != ret) && (RESP_ERROR != ret)) {
             goto failure;

This should trigger the system to enter listening mode as if the SIM card is not present and after 5 minutes (default listening mode timeout), the device should connect to the network.

  1. MDMParser::powerOn() failing due to SIM failure (similar to the previous one, but we fail only once)
diff --git a/hal/src/electron/modem/mdm_hal.cpp b/hal/src/electron/modem/mdm_hal.cpp
index f7d08ec8d..42c15be37 100644
--- a/hal/src/electron/modem/mdm_hal.cpp
+++ b/hal/src/electron/modem/mdm_hal.cpp
@@ -935,6 +935,13 @@ bool MDMParser::powerOn(const char* simpin)
     for (int i = 0; (i < 5) && (_dev.sim != SIM_READY) && !_cancel_all_operations; i++) {
         sendFormated("AT+CPIN?\r\n");
         int ret = waitFinalResp(_cbCPIN, &_dev.sim);
+#if 1
+        static int counter = 0;
+        if (counter++ <= 1) {
+            _dev.sim = SIM_MISSING;
+            goto failure;
+        }
+#endif // 1
         // having an error here is ok (sim may still be initializing)
         if ((RESP_OK != ret) && (RESP_ERROR != ret)) {
             goto failure;

The device should report SIM failure once and then go through MDMParser::init() and successfully connect.

  1. Same thing as 1, but with a simulated failure preventing us from completing initailization entirely
diff --git a/hal/src/electron/modem/mdm_hal.cpp b/hal/src/electron/modem/mdm_hal.cpp
index f7c67672c..9a14d6d5d 100644
--- a/hal/src/electron/modem/mdm_hal.cpp
+++ b/hal/src/electron/modem/mdm_hal.cpp
@@ -55,7 +55,7 @@ std::recursive_mutex mdm_mutex;
 #define MDM_URC_POLL_INTERVAL_MS (1000) //!< URC poll interval
 #define MDM_SOCKET_SEND_RETRIES_R4_BUG (3)
 #define MDM_RESET_FAILURE_TIMEOUT_MS (10000)
-#define MDM_RESET_FAILURE_MAX_ATTEMPTS (8)
+#define MDM_RESET_FAILURE_MAX_ATTEMPTS (9999)
 
 // ID of the PDP context used to configure the default EPS bearer when registering in an LTE network
 // Note: There are no PDP contexts in LTE, SARA-R4 uses this naming for the sake of simplicity
@@ -1161,6 +1161,9 @@ bool MDMParser::init(DevStatus* status)
         }
     } // if (_dev.dev == DEV_SARA_R410)
     _resetFailureAttempts = 0;
+#if 1
+    goto reset_failure;
+#endif // 1
     return true;
 
 failure:

This should cause a modem reset in 5 minutes and restart the initialization again.

  1. These changes should not affect Photon/P1.

Example App

N/A

References

  • [CH64668]
  • [CH64669]

Completeness

  • User is totes amazing for contributing!
  • Contributor has signed CLA (Info here)
  • Problem and Solution clearly stated
  • Run unit/integration/application tests on device
  • Added documentation
  • Added to CHANGELOG.md after merging (add links to docs and issues)

@avtolstoy avtolstoy added this to the 2.0.0 milestone Oct 14, 2020
@avtolstoy avtolstoy requested review from technobly, XuGuohui, keeramis and sergeuz and removed request for XuGuohui October 14, 2020 17:21
hal/src/electron/modem/mdm_hal.cpp Outdated Show resolved Hide resolved
hal/src/electron/modem/mdm_hal.cpp Outdated Show resolved Hide resolved
system/src/system_network_cellular.h Outdated Show resolved Hide resolved
system/src/system_network_cellular.h Show resolved Hide resolved
system/src/system_network_internal.h Show resolved Hide resolved
system/src/system_network_internal.h Outdated Show resolved Hide resolved
system/src/system_network_internal.h Show resolved Hide resolved
system/src/system_network_internal.h Show resolved Hide resolved
system/src/system_network_internal.h Show resolved Hide resolved
hal/src/electron/modem/mdm_hal.cpp Show resolved Hide resolved
system/src/system_network_cellular.h Outdated Show resolved Hide resolved
system/src/system_network_cellular.h Show resolved Hide resolved
system/src/system_network_internal.h Show resolved Hide resolved
@sergeuz
Copy link
Member

sergeuz commented Oct 20, 2020

The test # 2 fails for me when I call Cellular.connect() in the app without establishing a cloud connection. The reason is that WLAN_CONNECTING is set to 1 after the has_credentials() check in ManagedNetworkInterface::connect() and thus the reconnection logic in ManagedNetworkInterface::process() is never triggered. Setting that flag earlier in the process causes another issue: CellularNetworkInterface::on_stop_listening() force-disconnects the device from the network when exiting the listening mode.

system/src/system_network_cellular.h Outdated Show resolved Hide resolved
@@ -773,6 +825,12 @@ class ManagedNetworkInterface : public NetworkInterface

virtual int process()
{
// Requested to power-on, failed initial power-on, requested to connect as well
if ((SPARK_WLAN_STARTED && !WLAN_INITIALIZED) && WLAN_CONNECTING) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this comment.

Also, wouldn't this change affect Photon/P1 as well, contrary to what is stated in the PR description?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this condition wouldn't happen on Photons. And even if it does, it won't probably hurt?

Copy link
Member

@sergeuz sergeuz Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed the issue described in the comment and decided that we're not going to fix it in the scope of this PR as there's a risk of introducing a breaking change in the release.

@avtolstoy avtolstoy merged commit 8847aed into develop Oct 20, 2020
@avtolstoy avtolstoy deleted the fix/ch64669/ch64668 branch October 20, 2020 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants