Publishing in a multi-thread application running on UDP device during OTA results in hard fault #1298

Closed
jme783 opened this issue Apr 11, 2017 · 1 comment · Fixed by #1415

@jme783 jme783 commented Apr 11, 2017

Given I have a UDP device running application firmware with multithreading enabled
And the application firmware publishes events to the cloud at regular intervals
When I OTA flash the device with another application firmware
And the device publishes an event while the OTA is in progress
Then the OTA will be irreparably disrupted and the UDP device will hard fault until a manual reset

Application firmware running on the Electron was:

// This #include statement was automatically added by the Particle IDE.
#include <google-maps-device-locator.h>
GoogleMapsDeviceLocator locator;

SerialLogHandler logHandler;
SYSTEM_THREAD(ENABLED);

void setup() {
    Serial.begin(9600);
    locator.withLocatePeriodic(10);
}


void loop() {
    locator.loop();
}

When this firmware is running on the device and I OTA a new application firmware to the Electron, the OTA fails midway through and the Electron starts hard faulting. Serial logs from the device during the OTA:

0000218892 [comm.protocol] INFO: rcv'd message type=13
0000219614 [comm.protocol] INFO: rcv'd message type=13
0000226230 [comm.protocol] INFO: rcv'd message type=5
0000226484 [comm.protocol] INFO: rcv'd message type=7
0000226738 [comm.protocol] INFO: rcv'd message type=7
0000226992 [comm.protocol] INFO: rcv'd message type=7
0000227246 [comm.protocol] INFO: rcv'd message type=7
0000227500 [comm.protocol] INFO: rcv'd message type=7
publishLocation
0000228099 [comm] WARN: handle received message failed - aborting transfer
0000228100 [system] INFO: Send spark/device/ota_result event
0000229100 [comm] WARN: handle received message failed - aborting transfer
0000229100 [system] INFO: Send spark/device/ota_result event
0000230101 [comm] WARN: handle received message failed - aborting transfer
0000230101 [system] INFO: Send spark/device/ota_result event
0000231102 [comm] WARN: handle received message failed - aborting transfer

It looks to me like the device is receiving the UDP packets, but for some reason the application code on the device keeps running while the OTA is being received. publishLocation is logged every time network data is sent up to the cloud. Once that happens, the system messages that follow indicate a problem, and the OTA is aborted. The device then starts hard faulting (blinking red).
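One possible application-side workaround (not something verified here) would be to skip publishes while an OTA is in progress. The sketch below is plain C++ that only models the guard: `otaInProgress`, `publishIfIdle`, and the `sent` list are hypothetical names, and on a real device the flag would have to be set and cleared from whatever OTA start/finish notification the firmware provides.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flag: set when an OTA starts, cleared when it finishes or fails.
std::atomic<bool> otaInProgress{false};

// Hypothetical stand-in for the cloud publish; records what would have been sent.
std::vector<std::string> sent;

// Publishes only when no OTA is running.
// Returns false if the event was skipped because an OTA is in progress.
bool publishIfIdle(const std::string& event) {
    assert(!event.empty());  // an empty event name is a caller bug
    if (otaInProgress.load()) {
        return false;        // leave the modem alone during the transfer
    }
    sent.push_back(event);
    return true;
}
```

With this guard, the loop would call `publishIfIdle(...)` instead of publishing directly, and a skipped event could be retried once the flag is cleared.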

Note that this is only observable on an Electron (UDP), not on a Photon (TCP). Running the same experiment on a Photon produces the following device logs:

0000055938 [comm.sparkprotocol] INFO: chunk idx=22 crc=1 fast=1 updating=1
publishLocation
0000056043 [comm.sparkprotocol] INFO: chunk
0000056043 [comm.sparkprotocol] INFO: chunk idx=23 crc=1 fast=1 updating=1
0000056146 [comm.sparkprotocol] INFO: chunk
0000056146 [comm.sparkprotocol] INFO: chunk idx=24 crc=1 fast=1 updating=1
0000056249 [comm.sparkprotocol] INFO: chunk
0000056249 [comm.sparkprotocol] INFO: chunk idx=25 crc=1 fast=1 updating=1
0000056352 [comm.sparkprotocol] INFO: chunk
0000056352 [comm.sparkprotocol] INFO: chunk idx=26 crc=1 fast=1 updating=1
0000056455 [comm.sparkprotocol] INFO: chunk
@m-mcgowan m-mcgowan added track and removed track labels Oct 22, 2017
@sergeuz sergeuz commented Oct 26, 2017

This is likely caused by the google-maps-device-locator library, which talks to the modem directly via Cellular.command(), and that call is not thread-safe.

@m-mcgowan @technobly I'm wondering what would be a good solution here. Should we add a global lock to the Cellular HAL?
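To make the global-lock idea concrete, here is a rough host-side sketch in standard C++, not the actual Cellular HAL. `FakeModem`, `modemLock`, `lockedCommand`, and `runLockDemo` are all hypothetical names: the fake modem counts overlapping command exchanges, and every caller goes through one wrapper that holds a single process-wide lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the modem; not thread-safe on its own.
class FakeModem {
public:
    // Simulates one command/response exchange and counts overlapping calls.
    void exchange(const std::string& cmd) {
        if (busy_) { ++overlaps_; }
        busy_ = true;
        log_.push_back(cmd);
        std::this_thread::yield();  // widen the window for interleaving
        busy_ = false;
    }
    int overlaps() const { return overlaps_; }
    std::size_t count() const { return log_.size(); }
private:
    bool busy_ = false;
    int overlaps_ = 0;
    std::vector<std::string> log_;
};

// Hypothetical global lock; recursive so a locked call may issue nested commands.
std::recursive_mutex& modemLock() {
    static std::recursive_mutex m;
    return m;
}

// Every caller (system thread or application thread) goes through this wrapper.
void lockedCommand(FakeModem& modem, const std::string& cmd) {
    std::lock_guard<std::recursive_mutex> guard(modemLock());
    modem.exchange(cmd);
}

// Runs two threads issuing commands concurrently.
// Returns the number of overlapping exchanges the fake modem observed.
int runLockDemo(int perThread) {
    FakeModem modem;
    auto worker = [&modem, perThread](const char* cmd) {
        for (int i = 0; i < perThread; ++i) {
            lockedCommand(modem, cmd);
        }
    };
    std::thread systemThread(worker, "SYS");
    std::thread appThread(worker, "APP");
    systemThread.join();
    appThread.join();
    assert(modem.count() == static_cast<std::size_t>(2 * perThread));
    return modem.overlaps();
}
```

The trade-off with a single lock like this is that an application-side command would block the system thread, and with it the OTA, for as long as that command takes.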

@sergeuz sergeuz referenced this issue Oct 26, 2017
3 of 6 tasks complete
@m-mcgowan m-mcgowan modified the milestones: 0.8.0, 0.8.0-rc.2 Nov 24, 2017