Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LTE/Electron] Fast OTA Fixes [ch16346] #1558

Merged
merged 7 commits into from Jul 30, 2018
Merged

[LTE/Electron] Fast OTA Fixes [ch16346] #1558

merged 7 commits into from Jul 30, 2018

Conversation

technobly
Copy link
Member

@technobly technobly commented Jul 12, 2018

Problem

  • Fast OTA on the E Series LTE v0.8.0-rc.8 is disabled pending a Cat-M1 rate limit investigation. Chunks would hang at index 48 out of 49, and usually be missing most of the 0-48.
  • Devices were resetting before sending the UpdateDone ACK to the server, leaving it in a limbo OTA process state for 16 minutes. During this 16 minutes, the device APIs were blocked (no function calls, etc..)
  • Event loop error = 3 (INVALID_STATE) when working around the reset too soon issue with a "Delay 5 seconds before resetting after OTA update" fix.

Solution

  • Instead of 50 chunks at a time, only 40 are requested (reduces lossy behavior of Cat-M1)
  • The Particle Cloud also needs to initially send 40 instead of 50 chunks (currently deployed).
  • Multiple UpdateDone replies reduced to one (removes errors)
  • Multiple ChunksMissed replies reduced to one (removes errors)
  • 3 second timeout removed (Server is now very persistent about sending UpdateDone after each round of Chunks).
  • Wait for confirmable messages to be sent (fixes resetting too soon which was causing the server to hang around for 16 minutes)
  • If a CHUNK or UPDATE_DONE CoAP message is received when we don't expect it, instead of forcing an INVALID_STATE error, simply throw NO_ERROR to give a silent error (removes errors)

Steps to Test

  • Enable Serial1LogHander logHandler(115200, LOG_LEVEL_ALL); in the Tinker app.
  • Program a E Series LTE device with a monolithic image of that modified Tinker app from this branch to prevent it from being overwritten by modular firmware OTA'd to it.
  • OTA a system-part2-0.8.0-rc.8-electron.bin to it and watch it's logging output on the TX pin.
  • Also try the included Test App:
    • #define DELAY_RESET_5_SECONDS
    • #define MORE_THAN_40_CHUNKS
    • comment out both ^
  • Test on LTE E Series, U260 Electron G350 Electron, U201 E Series, Photon, P1.

Test App

/*
 ******************************************************************************
  Copyright (c) 2015 Particle Industries, Inc.  All rights reserved.

  This program is free software; you can redistribute it and/or
  modify it under the terms of the GNU Lesser General Public
  License as published by the Free Software Foundation, either
  version 3 of the License, or (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser General Public
  License along with this program; if not, see <http://www.gnu.org/licenses/>.
  ******************************************************************************
 */

/* Includes ------------------------------------------------------------------*/
#include "application.h"
#include "stdarg.h"

PRODUCT_ID(PLATFORM_ID);
PRODUCT_VERSION(3);

/* Function prototypes -------------------------------------------------------*/
int tinkerDigitalRead(String pin);
int tinkerDigitalWrite(String command);
int tinkerAnalogRead(String pin);
int tinkerAnalogWrite(String command);

Serial1LogHandler logHandler(115200, LOG_LEVEL_ALL);

STARTUP(System.enable(SYSTEM_FLAG_WIFITESTER_OVER_SERIAL1));
STARTUP(System.enableFeature(FEATURE_WIFITESTER));

// COMMENT/UNCOMMENT for different options
// #define DELAY_RESET_5_SECONDS
#define MORE_THAN_40_CHUNKS

SYSTEM_MODE(AUTOMATIC);
SYSTEM_THREAD(ENABLED);

// let's get our binary larger than 40 chunks worth of data!
#define TOTAL_WASTE (2000)
const uint32_t waste_of_flash[TOTAL_WASTE] = {0};
uint32_t waste_of_ram[TOTAL_WASTE] = {0};

void systemEventHandler(system_event_t event, int data) {
    // Log.info("EVENT_H: %lu EVENT_L: %lu DATA: %d", (uint32_t)(event>>32), (uint32_t)event, data);

    // RESET PENDING
    if (event == reset_pending) {
        Log.info("\nDELAYING RESET\n");
        static uint32_t reset_timer = millis();
        while (millis() - reset_timer < 5000) {
            Particle.process();
        }
        Log.info("\nENABLE RESET\n");
        delay(500);
        System.enableReset();
    }
}

/* This function is called once at start up ----------------------------------*/
void setup()
{
#ifdef DELAY_RESET_5_SECONDS
    System.disableReset();
    System.on(reset_pending, systemEventHandler);
#endif

    //Setup the Tinker application here

    //Register all the Tinker functions
    Particle.function("digitalread", tinkerDigitalRead);
    Particle.function("digitalwrite", tinkerDigitalWrite);

    Particle.function("analogread", tinkerAnalogRead);
    Particle.function("analogwrite", tinkerAnalogWrite);

#ifdef MORE_THAN_40_CHUNKS
    for (uint32_t x=0; x<TOTAL_WASTE; x++) {
        waste_of_ram[x] = waste_of_flash[x];
    }
#endif
}

/* This function loops forever --------------------------------------------*/
void loop()
{
    //This will run in a loop
}

/*******************************************************************************
 * Function Name  : tinkerDigitalRead
 * Description    : Reads the digital value of a given pin
 * Input          : Pin
 * Output         : None.
 * Return         : Value of the pin (0 or 1) in INT type
                    Returns a negative number on failure
 *******************************************************************************/
int tinkerDigitalRead(String pin)
{
    //convert ascii to integer
    int pinNumber = pin.charAt(1) - '0';
    //Sanity check to see if the pin numbers are within limits
    if (pinNumber < 0 || pinNumber > 7) return -1;

    if(pin.startsWith("D"))
    {
        pinMode(pinNumber, INPUT_PULLDOWN);
        return digitalRead(pinNumber);
    }
    else if (pin.startsWith("A"))
    {
        pinMode(pinNumber+10, INPUT_PULLDOWN);
        return digitalRead(pinNumber+10);
    }
#if Wiring_Cellular
    else if (pin.startsWith("B"))
    {
        if (pinNumber > 5) return -3;
        pinMode(pinNumber+24, INPUT_PULLDOWN);
        return digitalRead(pinNumber+24);
    }
    else if (pin.startsWith("C"))
    {
        if (pinNumber > 5) return -4;
        pinMode(pinNumber+30, INPUT_PULLDOWN);
        return digitalRead(pinNumber+30);
    }
#endif
    return -2;
}

/*******************************************************************************
 * Function Name  : tinkerDigitalWrite
 * Description    : Sets the specified pin HIGH or LOW
 * Input          : Pin and value
 * Output         : None.
 * Return         : 1 on success and a negative number on failure
 *******************************************************************************/
int tinkerDigitalWrite(String command)
{
    bool value = 0;
    //convert ascii to integer
    int pinNumber = command.charAt(1) - '0';
    //Sanity check to see if the pin numbers are within limits
    if (pinNumber < 0 || pinNumber > 7) return -1;

    if(command.substring(3,7) == "HIGH") value = 1;
    else if(command.substring(3,6) == "LOW") value = 0;
    else return -2;

    if(command.startsWith("D"))
    {
        pinMode(pinNumber, OUTPUT);
        digitalWrite(pinNumber, value);
        return 1;
    }
    else if(command.startsWith("A"))
    {
        pinMode(pinNumber+10, OUTPUT);
        digitalWrite(pinNumber+10, value);
        return 1;
    }
#if Wiring_Cellular
    else if(command.startsWith("B"))
    {
        if (pinNumber > 5) return -4;
        pinMode(pinNumber+24, OUTPUT);
        digitalWrite(pinNumber+24, value);
        return 1;
    }
    else if(command.startsWith("C"))
    {
        if (pinNumber > 5) return -5;
        pinMode(pinNumber+30, OUTPUT);
        digitalWrite(pinNumber+30, value);
        return 1;
    }
#endif
    else return -3;
}

/*******************************************************************************
 * Function Name  : tinkerAnalogRead
 * Description    : Reads the analog value of a pin
 * Input          : Pin
 * Output         : None.
 * Return         : Returns the analog value in INT type (0 to 4095)
                    Returns a negative number on failure
 *******************************************************************************/
int tinkerAnalogRead(String pin)
{
    //convert ascii to integer
    int pinNumber = pin.charAt(1) - '0';
    //Sanity check to see if the pin numbers are within limits
    if (pinNumber < 0 || pinNumber > 7) return -1;

    if(pin.startsWith("D"))
    {
        return -3;
    }
    else if (pin.startsWith("A"))
    {
        return analogRead(pinNumber+10);
    }
#if Wiring_Cellular
    else if (pin.startsWith("B"))
    {
        if (pinNumber < 2 || pinNumber > 5) return -3;
        return analogRead(pinNumber+24);
    }
#endif
    return -2;
}

/*******************************************************************************
 * Function Name  : tinkerAnalogWrite
 * Description    : Writes an analog value (PWM) to the specified pin
 * Input          : Pin and Value (0 to 255)
 * Output         : None.
 * Return         : 1 on success and a negative number on failure
 *******************************************************************************/
int tinkerAnalogWrite(String command)
{
    String value = command.substring(3);

    if(command.substring(0,2) == "TX")
    {
        pinMode(TX, OUTPUT);
        analogWrite(TX, value.toInt());
        return 1;
    }
    else if(command.substring(0,2) == "RX")
    {
        pinMode(RX, OUTPUT);
        analogWrite(RX, value.toInt());
        return 1;
    }

    //convert ascii to integer
    int pinNumber = command.charAt(1) - '0';
    //Sanity check to see if the pin numbers are within limits

    if (pinNumber < 0 || pinNumber > 7) return -1;

    if(command.startsWith("D"))
    {
        pinMode(pinNumber, OUTPUT);
        analogWrite(pinNumber, value.toInt());
        return 1;
    }
    else if(command.startsWith("A"))
    {
        pinMode(pinNumber+10, OUTPUT);
        analogWrite(pinNumber+10, value.toInt());
        return 1;
    }
    else if(command.substring(0,2) == "TX")
    {
        pinMode(TX, OUTPUT);
        analogWrite(TX, value.toInt());
        return 1;
    }
    else if(command.substring(0,2) == "RX")
    {
        pinMode(RX, OUTPUT);
        analogWrite(RX, value.toInt());
        return 1;
    }
#if Wiring_Cellular
    else if (command.startsWith("B"))
    {
        if (pinNumber > 3) return -3;
        pinMode(pinNumber+24, OUTPUT);
        analogWrite(pinNumber+24, value.toInt());
        return 1;
    }
    else if (command.startsWith("C"))
    {
        if (pinNumber < 4 || pinNumber > 5) return -4;
        pinMode(pinNumber+30, OUTPUT);
        analogWrite(pinNumber+30, value.toInt());
        return 1;
    }
#endif
    else return -2;
}

Completeness

  • User is totes amazing for contributing!
  • Contributor has signed CLA (Info here)
  • Problem and Solution clearly stated
  • Run unit/integration/application tests on device
  • [N/A] Added documentation
  • Added to CHANGELOG.md after merging (add links to docs and issues)

@technobly technobly added the bug label Jul 12, 2018
@@ -310,7 +310,7 @@ void establish_cloud_connection()
cellular_device_info(&device, NULL);
if (device.dev == 8/*DEV_SARA_R410*/) {
DEBUG("Device is SARA_R410, disabling Fast OTA!");
CLOUD_FN(spark_set_connection_property(particle::protocol::Connection::FAST_OTA, 0/*disabled*/, nullptr, nullptr), (void)0);
CLOUD_FN(spark_set_connection_property(particle::protocol::Connection::FAST_OTA, 1/*disabled*/, nullptr, nullptr), (void)0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this, given that FastOTA is enabled by default? (I'm assuming 1 means Enabled, despite the succeeding comment.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I just flipped the 0 to 1 for quick Fast OTA testing.

We wouldn't need this code any longer, but I wonder if we should create a Wiring API to allow users to toggle this if they so desired. Although after all of these fixes get implemented I can't think of a reason anyone would want to OTA update slower. Once we test and it looks good though, we can just remove this code, since we have the history in github. No need to even leave it commented out.

@technobly technobly added this to the 0.8.0-rc.9 milestone Jul 24, 2018
technobly and others added 6 commits July 27, 2018 20:24
…unks received, plus a small growth factor. This ensures that when the network drops many of the chunks sent by the server, that the device requests fewer chunks.
- Wait for confirmable messages to be sent (fixes resetting too soon which was causing the server to hang around for 16 minutes)
- Also remove extra UpdateDone sent from device (fixes a race condition of how we complete Fast OTA)
- If a CHUNK or UPDATE_DONE CoAP message is received when we don't expect it, instead of forcing an INVALID_STATE error, simply throw NO_ERROR to give a silent error.  TODO: Return to throwing INVALID_STATE once we add a method to know when the server receives the UpdateDone ACK we send.  This is so we can properly call reset_updating() after we are done and we also know the server is done with the Fast OTA process.
@technobly technobly changed the title [LTE] Fixes Fast OTA (WIP) [LTE/Electron] Fast OTA Fixes [ch16346] Jul 28, 2018
@technobly technobly requested a review from m-mcgowan July 28, 2018 01:28
m-mcgowan
m-mcgowan previously approved these changes Jul 30, 2018
@@ -299,7 +279,11 @@ ProtocolError ChunkedTransfer::handle_update_done(token_t token, Message& messag
{
updating = 2; // flag that we are sending missing chunks.
DEBUG("update done - missing chunks starting at %d", index);
error = send_missing_chunks(channel, MISSED_CHUNKS_TO_SEND);
chunk_index_t increase = std::max(chunk_index_t(chunk_count*0.2),chunk_index_t(2u)); // ensure always some growth
chunk_index_t resend_chunk_count = std::min(std::max(2u,unsigned(chunk_count+increase)), MISSED_CHUNKS_TO_SEND);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is my code, but written quickly to try out the concept, and on review I see we can make it a little clearer:

resend_chunk_count can be simplified
std::max(2u,unsigned(chunk_count+increase) can be simply unsigned(chunk_count+increase) since it's guaranteed to have a non-zero lower limit (currently 2).

We could add a symbol for MINIMUM_CHUNK_INCREASE = chunk_index(2u) to avoid magic numbers in the code.

@technobly technobly merged commit b9aee44 into develop Jul 30, 2018
@technobly technobly deleted the fix/fast-ota-lte branch July 30, 2018 19:14
@avtolstoy avtolstoy restored the fix/fast-ota-lte branch October 17, 2018 08:44
@avtolstoy avtolstoy deleted the fix/fast-ota-lte branch October 17, 2018 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants