firmware control over OTA updates #375

Closed
m-mcgowan opened this issue Jan 8, 2015 · 22 comments
@m-mcgowan
Contributor

@m-mcgowan m-mcgowan commented Jan 8, 2015

OTA updates - aren't they great! We can make them greater!

Presently, OTA updates happen at the time they are triggered and succeed only when the core is online. Offline cores cannot participate in OTA updates: the update request times out after a short period (a minute or so).

User firmware is stopped at an arbitrary point for several minutes, possibly in mid action when an OTA update starts. (The proposed system events #279 would allow the user firmware to take action before the OTA update begins, such as putting the device in a safe state.)

One solution to these issues is:

  • have the cloud notify the core that an OTA update is available, but not start the OTA update
  • if the core is offline, persist the OTA update, and notify the core next time it comes online.
  • having previously been notified that an OTA update is available, the core notifies the cloud at the time it is ready to receive the OTA update.

I imagine this has significant ramifications on the cloud, just posting this here for discussion!
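
As a rough illustration of the three-step handshake in the bullets above, here is a minimal off-device sketch in plain C++. `PendingUpdates` and all of its methods are hypothetical names invented for this example; they are not part of the Spark cloud or the firmware, and a real implementation would also need authentication, persistence and transfer logic.

```cpp
#include <map>
#include <string>

// Hypothetical cloud-side bookkeeping; not an existing Spark API.
class PendingUpdates {
public:
    // Record that an update exists for a device, whether or not it is online.
    void publish(const std::string& deviceId, const std::string& firmwareVersion) {
        pending_[deviceId] = firmwareVersion;
    }

    // Called when a device connects: true means "notify it, but do not flash yet".
    bool hasPending(const std::string& deviceId) const {
        return pending_.count(deviceId) != 0;
    }

    // The device reports it is ready, so the transfer can start.
    // Returns the version to send, or an empty string if nothing is pending.
    std::string deviceReady(const std::string& deviceId) {
        auto it = pending_.find(deviceId);
        if (it == pending_.end()) return std::string();
        std::string version = it->second;
        pending_.erase(it);  // the update is consumed once the transfer begins
        return version;
    }

private:
    std::map<std::string, std::string> pending_;
};
```

The point of the sketch is that the update stays queued until the device asks for it, so an offline core simply picks it up on its next connection.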

@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Jan 8, 2015

@piettetech


@piettetech piettetech commented Jan 8, 2015

* if the core is offline, persist the OTA update, and notify the core next time it comes online.

I for one would love to have things persist in the cloud such that the next time a core is connected to the cloud it would be able to get the OTA. For solutions that are trying to be power efficient, having messages and OTA updates persist in the cloud would be very helpful.

User firmware is stopped at an arbitrary point for several minutes, possibly in mid action when an OTA update starts. (The proposed system events #279 would allow the user firmware to take action before the OTA update begins, such as putting the device in a safe state.)

Having a safe way to save the state prior to performing an OTA update would be a nice addition as well. Maybe register an OTA function callback that will be executed when an OTA event is about to happen. That would allow the user app to gracefully shut down the system prior to the OTA and subsequent reboot.

@towynlin

Member

@towynlin towynlin commented Jan 8, 2015

@m-mcgowan I also know that @dmiddlecamp has at least partially implemented the cloud changes already, though I don't believe they're deployed yet. You two should chat about current status and next steps — we might be closer than you think!

@cSquirrel


@cSquirrel cSquirrel commented Jan 15, 2015

I always had a slight concern about "forced" updates and I would like to suggest an API-controlled approach to control both OTA and non-OTA updates:

-(boolean)Spark.hasFirmwareUpdate()
Returns true if there's an update pending, false otherwise

-(void)Spark.performFirmwareUpdate()
Performs firmware update if any available, does nothing otherwise.

With these in hand I will have fine control over the updates. I can decide whether my core will respond to an update or not ( security related projects may want to prohibit updates ). Also I can perform additional tasks ( send an event maybe ) before the update (OTA or not).

You can even imagine building a fail-safe solution if necessary where two cores would talk to each other using events. Secondary core could take over when primary is going through an update. Once primary finished the update successfully then secondary core can proceed with update. All thanks to events and two methods I suggested above.
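
To make the polling idea concrete, here is a small sketch that mocks the two proposed calls in plain C++ so the control flow can be exercised off-device. `MockSpark`, `simulateUpdateOffered()`, `appliedCount()` and `pollForUpdate()` are hypothetical helpers for this example only; `hasFirmwareUpdate()` and `performFirmwareUpdate()` are the names proposed in this comment, not existing firmware functions.

```cpp
// Hypothetical stand-in for the proposed Spark.hasFirmwareUpdate() /
// Spark.performFirmwareUpdate() pair. It only models the pending flag.
class MockSpark {
public:
    void simulateUpdateOffered() { pending_ = true; }

    bool hasFirmwareUpdate() const { return pending_; }

    void performFirmwareUpdate() {
        if (!pending_) return;  // does nothing when no update is pending
        pending_ = false;
        ++applied_;
    }

    int appliedCount() const { return applied_; }

private:
    bool pending_ = false;
    int applied_ = 0;
};

// One pass of an application loop: update only when the app says it is safe.
// Returns true if an update was applied during this pass.
inline bool pollForUpdate(MockSpark& spark, bool safeToUpdate) {
    if (spark.hasFirmwareUpdate() && safeToUpdate) {
        spark.performFirmwareUpdate();
        return true;
    }
    return false;
}
```

In the fail-safe scenario described above, `safeToUpdate` would be derived from an event saying the secondary core has taken over.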

@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Jan 15, 2015

@cSquirrel - Agreed with the above - can you explain what you mean by non-OTA updates?

@piettetech


@piettetech piettetech commented Jan 15, 2015

I suggest that rather than poll for firmware updates, we register a function that gets called if an update is ready. It will return either true or false to indicate whether the update should be performed.

bool OTAUpdate() {
    // do your cleanup, notification, and get ready for update
    return true;
}

Then register it with
Spark.OTAHandler(OTAUpdate);

If you return false, then the OTAUpdate function will be called again after some predetermined amount of time.

Finally, if the user does not register an update handler, then updates occur automatically. If you want to turn off control, just call Spark.OTAHandler(NULL);
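
The callback semantics described here (no handler means automatic updates, `false` means "ask me again later") can be modelled off-device as follows. `OtaDispatcher`, `setHandler()` and `offerUpdate()` are hypothetical names for this sketch; only the idea of a registered handler returning true or false comes from the comment above.

```cpp
#include <functional>
#include <utility>

// Hypothetical model of the proposed Spark.OTAHandler() registration.
class OtaDispatcher {
public:
    using Handler = std::function<bool()>;

    // Passing an empty handler (the NULL case) restores automatic updates.
    void setHandler(Handler h) { handler_ = std::move(h); }

    // Called each time the update is offered. Returns true if the update
    // should start now, false if the offer should be retried later.
    bool offerUpdate() {
        if (!handler_) return true;  // no handler registered: update automatically
        return handler_();
    }

private:
    Handler handler_;
};
```

A caller would invoke `offerUpdate()` again after the retry interval whenever it returns false; how long that interval should be is left open in the proposal.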

@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Jan 15, 2015

Agreed. I would have both a function you can call to poll status, and a system event. We have a proposal for system events here #279.

@cSquirrel


@cSquirrel cSquirrel commented Jan 15, 2015

@m-mcgowan I assume that at some point in the future you may want to introduce a way to update other than OTA (i.e. over-the-cable). Whatever the update method is or will be, the API should behave the same.

I also agree that both ways to deal with the updates ( polling and events ) will be handy.

@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Jan 15, 2015

Makes sense. Yes, there is already a form of OTC (Over the Cable) update with Ymodem support, so totes that this should have the same behaviour.

@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Aug 23, 2015

Something simple is to have a flag that application firmware can set to indicate if firmware updates are allowed.

System.updates(true|false);

bool updatesEnabled = System.updatesEnabled();

This will then tie in with system events - the system can post an event that an update is available, but won't go ahead and apply the update if System.updates(false) has been called. The application may choose to be in a "no-updates" mode until the system indicates an update is available, and the application then calls System.updates(true) after it has taken measures to make it safe to stall application code and apply the update.

@zsup

Member

@zsup zsup commented Aug 23, 2015

The syntax System.updates(bool) doesn't feel quite right to me; it feels like it should be something like System.enableUpdates(bool), or System.allowUpdates(bool), or it could be System.enableUpdates() and System.disableUpdates(). Naming the method updates doesn't make the intended action clear; my assumption would be that updates is a get method instead of a set method, which would return the number of updates available, or something like that.

@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Aug 23, 2015

👍 Totally.. I had the same thought since most of the system actions are written as commands rather than states. Not sure what my reasoning was for just updates, other than it's one function rather than two, but having them separate is certainly simpler.

So we'll have

System.enableUpdates();
System.disableUpdates();

bool updatesEnabled = System.updatesEnabled();
@zsup

Member

@zsup zsup commented Aug 23, 2015

Sounds great. Also when we write the docs for this, make sure to put a big bold warning around disabling updates, since it opens the door for people to be able to lose OTA functionality if they're not careful!
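
One possible way to reduce the risk of leaving updates disabled by accident is to scope the disable to a critical section. The sketch below mocks the three calls agreed above in plain C++ so it runs off-device; `MockSystem` and the `UpdateBlock` guard are hypothetical helpers for illustration, and the "enabled by default" starting state is an assumption.

```cpp
// Hypothetical mock of enableUpdates()/disableUpdates()/updatesEnabled().
// It only models the flag, not any cloud interaction.
class MockSystem {
public:
    void enableUpdates()  { enabled_ = true; }
    void disableUpdates() { enabled_ = false; }
    bool updatesEnabled() const { return enabled_; }

private:
    bool enabled_ = true;  // assumption: updates are allowed by default
};

// RAII guard: blocks updates for the lifetime of the object and restores
// the previous setting when it goes out of scope.
class UpdateBlock {
public:
    explicit UpdateBlock(MockSystem& sys)
        : sys_(sys), wasEnabled_(sys.updatesEnabled()) {
        sys_.disableUpdates();
    }
    ~UpdateBlock() {
        if (wasEnabled_) sys_.enableUpdates();
    }
    UpdateBlock(const UpdateBlock&) = delete;
    UpdateBlock& operator=(const UpdateBlock&) = delete;

private:
    MockSystem& sys_;
    bool wasEnabled_;
};
```

With a guard like this, updates are re-enabled on every exit path from the critical section, which makes it harder to lose OTA access through a forgotten call.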

@kneuron


@kneuron kneuron commented Aug 23, 2015

This functionality would be great! Currently, it's been necessary to build a bit of extra communication around this, eg: server flags device via an API call to a particle.variable() that it would like to perform an update, then waits for the device to signal back that it's ready, then the update is attempted.

I like the three functions proposed here, @m-mcgowan, and would suggest another, that @zsup hinted at: something that provides information to the device about the cloud's desire to perform an update when System.updatesEnabled() returns false. This might require some additional handling on the Particle Cloud when performing OTA flash. Maybe the cloud should have information about whether or not the device is accepting updates?

In other words:

  1. What happens on the device when it is in updatesEnabled == False and the cloud attempts an OTA update?
  2. What happens on the cloud when the device is in updatesEnabled == False and the cloud attempts an OTA update?
@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Aug 23, 2015

Good suggestion @kneuron!

At present, I believe the cloud will attempt the update once and then no more. In future, this will be extended to persist the update in the cloud until such time as the device is ready for it. The changes needed to make this happen involve multiple systems and will take longer to realize, so I figured we'd do this incrementally:

first iteration: give the application control to veto an OTA - the OTA is not sent and the update will need to be requested again sometime later, in the same way this update was initiated (Web IDE, fleet management, particle CLI etc.). no automated re-schedule in the cloud.

second iteration: the application vetoes an update and the cloud is informed of this and persists the update until the device allows it. Fleet management tools report on devices that have taken the update and those still pending. cc @jme783

In the first iteration, the application will be made aware of an update by a system event:

// System.disableUpdates() previously called.
// Notifications of available updates are still sent, but updates won't be applied unless 
// System.enableUpdates() is called within the timeout period. (30s?)
// Here, we allow updates only after precautions taken to put the device in a safe state
System.on(firmware_update_available, []{ shutdown_pumps(); System.enableUpdates(); });

In the second iteration, the same system event can be used. With the update being persisted in the cloud, there is no longer a time constraint, so we can add a new method that the main application loop can poll at its leisure:

void loop() {
    delay(30000); // busy doing stuff
    if (System.updateAvailable()) {
          shutdownPumps();
          System.enableUpdates();   // notifies the cloud that we are ready to take an update now
    }
}

So that's how I see this eventually working, both with a system event or by polling.

Note also that when multithreading is fully realized on the photon, it will be possible to perform the OTA update in the background without interrupting the application. The firmware_update_available event will be sent when the update is fully downloaded. The application then has to decide when to apply the update (using the same mechanism as above) and the system briefly restarts to apply the update. This will reduce application downtime to the duration of a system reset and cloud reconnect, which is typically just a few seconds.
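
The first-iteration behaviour (the device is notified, but the update only goes ahead if updates are re-enabled within the timeout) can be modelled off-device like this. `UpdateGate`, `UpdateOffer`, `offer()` and `canApply()` are hypothetical names for this sketch, and the timeout is passed in as a parameter because the 30 s figure in the thread is unconfirmed.

```cpp
#include <cstdint>

// Hypothetical record of when an update was offered and how long the
// cloud will wait for the device to accept it.
struct UpdateOffer {
    std::uint32_t offeredAtMs;
    std::uint32_t timeoutMs;
};

class UpdateGate {
public:
    void disableUpdates() { enabled_ = false; }
    void enableUpdates()  { enabled_ = true; }

    // The cloud announces an update; the device is notified even when
    // updates are currently disabled.
    void offer(std::uint32_t nowMs, std::uint32_t timeoutMs) {
        offer_ = UpdateOffer{nowMs, timeoutMs};
        hasOffer_ = true;
    }

    // True if the offered update may still be applied at time nowMs.
    // Unsigned subtraction keeps this correct across millisecond wrap-around.
    bool canApply(std::uint32_t nowMs) const {
        if (!hasOffer_ || !enabled_) return false;
        return (nowMs - offer_.offeredAtMs) <= offer_.timeoutMs;
    }

private:
    bool enabled_ = true;
    bool hasOffer_ = false;
    UpdateOffer offer_{0, 0};
};
```

In the second iteration the same gate would apply without the time limit, since the cloud would hold the update until the device accepts it.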

@andyw-lala

Contributor

@andyw-lala andyw-lala commented Aug 23, 2015

I suggest two levels of disabling updates on the device: one that can be overridden from the cloud through some series of "do you really want to do this" exception approvals, and one that refuses even that.

Think about the use cases of remote devices.
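
A minimal sketch of the two-level idea, in plain C++ so it runs off-device. The `UpdatePolicy` enum, its value names and `acceptUpdate()` are hypothetical and chosen for this example; nothing like them is confirmed to exist in the firmware.

```cpp
// Hypothetical policy levels for how strictly the device refuses updates.
enum class UpdatePolicy {
    Enabled,       // updates are accepted normally
    DisabledSoft,  // refused unless the cloud sends an explicit override
    DisabledHard   // refused even when an override is requested
};

// Decide whether an incoming update should be accepted.
// cloudOverride models the "do you really want to do this" approval.
inline bool acceptUpdate(UpdatePolicy policy, bool cloudOverride) {
    switch (policy) {
        case UpdatePolicy::Enabled:      return true;
        case UpdatePolicy::DisabledSoft: return cloudOverride;
        case UpdatePolicy::DisabledHard: return false;
    }
    return false;  // not reached for valid enum values
}
```

For a remote device, the soft level leaves a recovery path open, while the hard level means only physical access can bring new firmware onto it.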

@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Aug 23, 2015

Good plan. We should have a backdoor to force an update in case the normal route is inadvertently blocked (e.g. by buggy code.) I'm thinking an event or well-known function that the application can use to take action to put itself in a safe state. "shutdown mode" ? something like

System.on(shutdown, []{ shutoffPumps(); closeValves(); });

or as a well known function name, to complement setup()

void shutdown() {
   shutoffPumps();
   closeValves();
}

As well as a forced OTA update, forced reset etc...this might be useful in low power conditions or other situations where the system is forced into this mode due to environmental factors.
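
To show how a shutdown hook might fit together with a forced update or reset, here is an off-device sketch in plain C++. `ShutdownHooks`, `onShutdown()` and `run()` are hypothetical names for this example; the thread only proposes a `shutdown` event or a well-known `shutdown()` function.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry for "shutdown" handlers; illustrative only.
class ShutdownHooks {
public:
    void onShutdown(std::function<void()> hook) {
        hooks_.push_back(std::move(hook));
    }

    // Run every registered hook once, in registration order, before a
    // forced update or reset. Returns the number of hooks that ran.
    std::size_t run() {
        std::size_t count = 0;
        for (auto& hook : hooks_) {
            if (hook) {
                hook();
                ++count;
            }
        }
        return count;
    }

private:
    std::vector<std::function<void()>> hooks_;
};
```

The system side would call `run()` just before it takes control away from the application, whatever the reason for doing so.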

@kneuron


@kneuron kneuron commented Aug 23, 2015

@m-mcgowan the two iteration plan looks good.

I especially like the detail in the first iteration about a timeout period after being notified of a desired OTA within which the device may call

System.enableUpdates()

That will smooth the transition to the second iteration by allowing some limited ability to allow the update after a bit of housekeeping on the device. You suggested 30s, which seems reasonable to me.

@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Aug 23, 2015

Cool. The timeout is mainly related to how long the cloud will wait for a response before timing out the OTA operation. I believe it's 30s but would need to double-check that before committing it to documentation, and then we'd recommend a shorter time since that also includes round trip network latency.
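
The relationship between the cloud timeout and the time the application can safely spend on housekeeping is simple arithmetic, sketched here in plain C++. `housekeepingBudgetMs()` and its safety-margin parameter are hypothetical; the actual cloud timeout and a sensible margin would have to be confirmed before documenting any figure.

```cpp
#include <cstdint>

// Time (in milliseconds) the application can spend on housekeeping before
// re-enabling updates, given the cloud timeout and an estimate of the
// network round-trip time. The safety margin is an assumption.
inline std::uint32_t housekeepingBudgetMs(std::uint32_t cloudTimeoutMs,
                                          std::uint32_t roundTripMs,
                                          std::uint32_t safetyMarginMs) {
    // Widen before adding so the sum cannot overflow 32 bits.
    const std::uint64_t overhead =
        static_cast<std::uint64_t>(roundTripMs) + safetyMarginMs;
    if (overhead >= cloudTimeoutMs) return 0;  // no time left to spare
    return cloudTimeoutMs - static_cast<std::uint32_t>(overhead);
}
```

For instance, with a 30000 ms timeout, a 2000 ms round trip and a 3000 ms margin, the function returns 25000.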

@kneuron


@kneuron kneuron commented Aug 24, 2015

@m-mcgowan got it. Thanks :)

@solarplug


@solarplug solarplug commented Sep 15, 2015

Hi guys,

I think it would be a great idea to give the firmware the ability to inhibit or postpone a firmware update. I wanted to post some details about our application requirements for reference. Hopefully this benefits others.

Our application will require a technician to be present at a device location when an OTA firmware update is applied to ensure that nothing goes haywire and that everything comes back online afterwards without any issues. We may need for the technician to safely shut down the equipment before an update is allowed (that's TBD at this point but I can see a number of situations where that could easily become mandatory). Additionally there is a security element to it (having a technician be present to physically press a button significantly mitigates the possibility of a compromised server from pushing an unauthorized OTA update, etc).

So we would ideally like to have some means for the firmware to reject or postpone an update until a technician is able to physically press a button (or some sort of user interaction, it can certainly be engaged programmatically as discussed above).
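
The "technician must press a button" requirement could be modelled as a small latch, sketched here in plain C++ so it runs off-device. `TechnicianGate` and its methods are hypothetical names for this example; on a real device `mayUpdate()` would decide when the application allows the update.

```cpp
// Hypothetical latch: an update is allowed only after a technician has
// pressed a button while an update is pending.
class TechnicianGate {
public:
    // Call when the system reports that an update is available.
    void updateOffered() { pending_ = true; }

    // Call from the button handler. A press with no pending update is
    // ignored, so an early press cannot pre-authorize a later update.
    void buttonPressed() {
        if (pending_) authorized_ = true;
    }

    bool mayUpdate() const { return pending_ && authorized_; }

    // Call once the update has been handed to the system.
    void updateStarted() {
        pending_ = false;
        authorized_ = false;
    }

private:
    bool pending_ = false;
    bool authorized_ = false;
};
```

Clearing both flags in `updateStarted()` means every new update needs its own button press.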

It sounds like the discussion here will lead to something pretty much exactly like what we are looking for--looks good.

@m-mcgowan m-mcgowan added this to the 0.4.7 milestone Sep 30, 2015
@m-mcgowan m-mcgowan removed this from the 0.4.7 milestone Oct 21, 2015
@m-mcgowan

Contributor Author

@m-mcgowan m-mcgowan commented Jul 19, 2019

Intelligent control over firmware updates was added in 1.2.1.

@m-mcgowan m-mcgowan closed this Jul 19, 2019