Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA: Verify firmware integrity before applying firmware update #172

Closed
ghost opened this issue Oct 13, 2016 · 23 comments
Closed

OTA: Verify firmware integrity before applying firmware update #172

ghost opened this issue Oct 13, 2016 · 23 comments
Milestone

Comments

@ghost
Copy link

ghost commented Oct 13, 2016

Homie devices should be better protected from bad data published to $implementation/ota/payload (intentionally or unintentionally).

Granted, there is some protection in place because we first need to post a version number to $ota. However, since $ota might be retained (the v2 spec allows retaining, homie-ota wisely doesn't), this protection could easily get lost. One bad publish to $implementation/ota/payload and you have to recover your homie thing from wherever it was installed, hook up a serial port, re-flash it and take it back to wherever it was.

The best solution I can imagine would be something along the lines of the flow chart on this website http://www.syncroness.com/portfolio/firmware-updates/ (found it using a quick web search, I'm not associated with that company). Unfortunately, public key infrastructure and RSA decryption might be a bit too heavy here. But, at least a hash or a checksum can easily be run over the firmware binary (that is, skip the key/RSA blocks in the picture). esp8266/Arduino Updater supports MD5 checking. We just need a Homie convention for OTA signatures and some code to enforce them.

For example, a Homie convention could request that the OTA server first publishes a firmware-hash like md5:098f6bcd4621d373cade4e832627b4f6 to $implementation/ota/signature and then publishes the actual firmware to $implementation/ota/payload. By using an md5: prefix, the signature checking could later be extended to something more sophisticated. A new config key could specify whether or not a signature is required, and also which signature types are acceptable. That config key would need to be protected against lowering signature requirements.

Additional integrity tests could include

  • matching the Homie_setFirmware name
  • checking if the firmware size is within legal bounds

I can propose some code for the ESP side and instructions for the server side. Let me know.

@flaviostutz
Copy link
Contributor

+1

I was thinking about doing some sort of hash verification on my own... It would be great to be on the Homie itself.

Sent from my iPhone

On 13 Oct 2016, at 09:37, mrpace2 notifications@github.com wrote:

Homie devices should be better protected from bad data published to $implementation/ota/payload (intentionally or unintentionally).

Granted, there is some protection in place because we first need to post a version number to $ota. However, since $ota might be retained (the v2 spec allows retaining, homie-ota wisely doesn't), this protection could easily get lost. One bad publish to $implementation/ota/payload and you have to recover your homie thing from wherever it was installed, hook up a serial port, re-flash it and take it back to wherever it was.

The best solution I can imagine would be something along the lines of the flow chart on this website http://www.syncroness.com/portfolio/firmware-updates/ (found it using a quick web search, I'm not associated with that company). Unfortunately, public key infrastructure and RSA decryption might be a bit too heavy here. But, at least a hash or a checksum can easily be run over the firmware binary (that is, skip the key/RSA blocks in the picture). esp8266/Arduino Updater supports MD5 checking. We just need a Homie convention for OTA signatures and some code to enforce them.

For example, a Homie convention could request that the OTA server first publishes a firmware-hash like md5:098f6bcd4621d373cade4e832627b4f6 to $implementation/ota/signature and then publishes the actual firmware to $implementation/ota/payload. By using an md5: prefix, the signature checking could later be extended to something more sophisticated. A new config key could specify whether or not a signature is required, and also which signature types are acceptable. That config key would need to be protected against lowering signature requirements.

Additional integrity tests could include

matching the Homie_setFirmware name
checking if the firmware size is within legal bounds
I can propose some code for the ESP side and instructions for the server side. Let me know.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@marvinroger marvinroger added this to the v2.0.0 milestone Oct 13, 2016
@marvinroger
Copy link
Member

Definitely! Actually, I wanted to implement this, as as you said this is already implemented in the Updater class. But I forgot to. The md5: prefix could be removed because this is implementation related (see homieiot/convention#7), and MD5 is sufficient for our ESP8266, given the risk of collision is still very, very low.

I'm also 👍 for the Homie_setFirmware and firmware size bounds check!

@ghost
Copy link
Author

ghost commented Oct 13, 2016

I also don't think MD5 collisions are a problem. The "md5:" prefix was just intended to keep a door open for potential future extensions to the signature scheme without having to touch the Homie convention. I was hoping that maybe some day a (simple) code signing scheme can be added to Homie. I may have read too many news about IoT bot nets recently... ;-)

@marvinroger
Copy link
Member

Yes, I guess we would better apply the KISS principle in this case, at least for the ESP8266. And anyway, as discussed, OTA is implementation dependant, so we would not have to update the convention to support a new signature scheme.

@ghost
Copy link
Author

ghost commented Oct 14, 2016

I spent some time studying the esp8266/arduino Updater code. Updater::begin(...) and Updater::end(...) already verify

  • byte 0 of new firmware is magic byte 0xE9
  • byte 3 of new firmware indicates a firmware size not larger than the physical flash
  • new firmware size greater than zero
  • new firmware fits into the available space for flash updates
  • all flash erases and flash writes were successful during the update
  • the new firmware's MD5 hash matches an expected MD5 value (only if Updater::setMD5() was ever called)

Regarding MD5 checks:

  • Updater always calculates the MD5 from the new firmware, whether we call Updater::setMD5() or not. Only if we call Updater::setMD5() before calling Updater::end(...) our expected MD5 is verified against the calculated MD5. Otherwise, this comparison is skipped. I think the whole MD5 calculation should be skipped if the resulting MD5 won't be tested. Anyways.
  • There's currently no way of clearing the expected MD5 if one is set. You can only set a new expected MD5 but not remove it. So if a Homie OTA server sends an MD5, and a firmware update fails for whatever reason, the OTA server MUST send another MD5 before retrying firmware update. With other words, Homie's MD5 check cannot be made optional (unless Homie reboots itself if a firmware update fails). I am thinking of filing an upstream issue with esp8266/Arduino to request a way of resetting an MD5 expectation, but for the time being that's what it is.

Regarding firmware name checks:

  • I'm not sure anymore if it's really a good idea to request that the new firmware uses the same name as the current firmware. There might be legit reasons to use a different firmware name later on. Maybe an "awesome-relay" one day turns into an "awesome-led". I don't think it's reasonable to block OTA for the "recycling" of hardware.
  • If we wanted to verify the firmware name, we would have to search for the magic sequences inside the new firmware image. However, since Homie_setFirmware() is optional (correct me if I'm wrong), the current implementation of Homie_setFirmware / __FLAGGED_FW_NAME would need to be tweaked in order to make the magic bytes available for comparison. Also, again since Homie_setFirmware() is optional, the new firmware may not even contain a firmware name so we can't be sure to even find the magic sequences in the new firmware.
  • The magic byte sequence might span multiple invocations of BootNormal::_onMqttMessage so the search would need to be smart enough to handle this situation. Not a show-stopper, but maybe too much hassle for a questionable feature.

I guess bottom line is that the most vital checks (firmware size, firmware magic bytes) are in place already. I think we should add optional MD5 checking but we would have to live with the all-or-nothing (i.e. always-or-never) limitation until esp8266/Arduino gets an update, or we would need to reboot if a firmware update fails to make MD5 truly optional. I would NOT suggest checking the firmware name or Homie's magic bytes.

What are your thoughts?

@marvinroger
Copy link
Member

I guess folks won't update their firmware "by hand" anyway, they will use some sort of script to automate the MQTT messages needed for the update. So actually requiring publishing an MD5 would not harm, as MD5 is available in almost every language. What do you think?

I agree with letting the ability to "swap firmwares", so indeed checking the name might not be a good idea. And, as you say, we would need to use a state machine which might be too much hassle.

@ghost
Copy link
Author

ghost commented Oct 18, 2016

I can now work on this one. The OTA sequence I envision is as follows:

  1. OTA server publishes new firmware version format "1.0.0" to $ota
    • If OTA disabled, Homie reports 403 (forbidden). End of story.
    • If OTA enabled and new version is the same as what is running, Homie reports 304 (not modified). End of story.
    • Otherwise, Homie subscribes to
      • $implementation/ota/checksum and
      • $implementation/ota/payload
        and reports 202 (accepted).
  2. OTA server publishes MD5 checksum (string of 32 hex characters) to $implementation/ota/checksum. This step is optional. If a checksum is received on this topic it will be used. If not, MD5 is not verified (=backwards compatible to current situation).
    • If checksum format is correct (32 hex characters), Homie reports 202 (accepted).
    • Otherwise, Homie reports 400 BAD_CHECKSUM. Goto unsubscribe.
  3. OTA server publishes firmware to $implementation/ota/payload.
    • Firmware update executes. Homie reports 206's with progress.
    • When all bytes are done, Updater::end() verifies the firmware. This includes the MD5 if one was set.
    • Homie either reports 200 (ok) or a 400 or 500 error.
  4. Homie unsubscribes from
    • $implementation/ota/checksum and
    • $implementation/ota/payload
  5. Homie reboots on success or on error if an MD5 was set (to clear out the MD5).

Is the sequence OK?

Can we rename $implementation/ota/payload to $implementation/ota/firmware?

@jpmens
Copy link

jpmens commented Oct 18, 2016 via email

@jalmeroth
Copy link

From my experience the binary-data breaks some(most?) websocket-clients.
I would strongly recommend to clear the topic after a successful upgrade.

+1 for renaming to $implementation/ota/firmware

@ghost
Copy link
Author

ghost commented Oct 18, 2016

I would also prefer base64-encoded firmware blobs (see @jpmen's proposal #143) because I keep having issues with binary firmware messing/locking up my mosquitto_sub shell all the time. I know it's possible to filter out $implementation/ota/firmware topics from mosquitto_sub but it's a hassle, one way or the other.

Luckily, ESP firmwares always have a 0xE9 in the first byte. During OTA update, Homie could check the first byte against 0xE9. If it matches, the firmware blob is binary. Fine. Proceed as usual. If not, Homie could base64-decode the first two characters from the payload and check the result against 0xE9. If it matches, great. base64-decode and proceed as usual.

Would this be acceptable?

@ghost
Copy link
Author

ghost commented Oct 18, 2016

Oh, I forgot to mention that with the sequence I suggested there's no need to retain OTA messages anymore. None of them. I think retention was only required because, before Homie added status messages, the server was unable to find out whether Homie accepted an $ota request or not. So the server had to publish the firmware as a retained message before triggering $ota. Also, the server didn't know when the OTA update was finished, so the blob remained retained for good. Having status messages, this is all different now. Granted, the OTA server's job didn't get easier.

Retaining firmware blobs also puts unneeded burden on the MQTT server (having to retain 350KB or so for each homie node).

@marvinroger
Copy link
Member

marvinroger commented Oct 18, 2016

The sequence sounds good to me, and yes, let's rename $implementation/ota/payload to $implementation/ota/firmware.

About the base64, as @mrpace2 said, there's a way to differentiate a binary firmware from a base64 firmware, so let's go this way. You can use cdecode.h from the core, with #include <libb64/cdecode.h>, so that there's no need for another library.

@ghost
Copy link
Author

ghost commented Oct 18, 2016

Thanks. Will do.

@ghost
Copy link
Author

ghost commented Oct 19, 2016

I think we're now feature-complete with respect to the original objective of this issue. Thanks for discussing options and for accepting my contributions. Closing the issue.

@ghost ghost closed this as completed Oct 19, 2016
@marvinroger
Copy link
Member

marvinroger commented Oct 19, 2016

@mrpace2 thanks a lot! I made two commits after yours: 083ab27 and 0a060a4
The first simply replaces the String with a char array, and the second one ensures the checksum is fully received before handling it (it was a matter of moving the checksum code), as the checksum might come splitted in multiple packets.

Although the changes are not significant, I am not able to test right now. Can you test the latest git rev?

One question, also:

// In case of base64-coded firmware, we have given the wrong length to Update.begin()
// because the base64-encoded firmware may have contained non-base64 characters such
// as line feeds

You mean line feeds at the end? I'm uncomfortable with that, the payload should be clean.

@ghost ghost reopened this Oct 19, 2016
@ghost
Copy link
Author

ghost commented Oct 19, 2016

Reopened because of @marvinroger's comment.

The comment you asked about is a leftover from a previous version of the commit where I was still hoping that I can support line-wrapping in the base64-encoded payload. The comment doesn't make sense now. I will rephrase it tomorrow. Thanks for pointing that out.

The code is right, though. The payload is clean. The actual number of bytes written to firmware might still be off by 1 from the length we set in Update::begin() because of a potential = padding at the end of the base64-encoded firmware. That is why I set evenIfRemaining to true for base64 encoded payload.

I will test your commits tomorrow.

@ghost
Copy link
Author

ghost commented Oct 19, 2016

@marvinroger: I tried 0a060a4 on my setup with my little test script. It seems to work. I called my script 3 times, so installing the same OTA update 3 times (I am forcing the update by purposely sending the wrong firmware version in the script).

Log files (removed my IDs and cropped the firmware blob):

@marvinroger
Copy link
Member

Got it! I was always wondering what were the = at the end of b64 payloads, now I know. :) Great! Well, next step is to write the docs, I will.

@marvinroger
Copy link
Member

@marvinroger
Copy link
Member

marvinroger commented Oct 20, 2016

Here is a quick'n'dirty Node.js example:

  • Put this in an index.js file:
'use strict'

const BROKER = '127.0.0.1'
const BASE_TOPIC = 'homie/'
const DEVICE_ID = 'esp8266-test'
const PREFIX = `${BASE_TOPIC}${DEVICE_ID}`

const path = require('path')

const mqtt = require('mqtt')

const firmware = require('fs').readFileSync(path.join(__dirname, './assets/firmware.bin'))
const checksum = require('crypto').createHash('md5').update(firmware).digest('hex')
const client = mqtt.connect(`mqtt://${BROKER}`)

client.on('connect', () => {
  client.subscribe(`${PREFIX}/#`)
})

let startTime
let step = 'INITIAL'
client.on('message', (topic, message) => {
  topic = topic.substr(PREFIX.length + 1)
  message = message.toString()
  switch (step) {
    case 'INITIAL':
      if (topic === '$fw/version') {
        startTime = Date.now()
        client.publish(`${PREFIX}/$ota`, '2.0.0')
        step = 'REQUESTED'
        console.log('Requested OTA')
      }
      return
    case 'REQUESTED':
      if (topic === '$implementation/ota/status') {
        const status = parseInt(message.split(' ')[0], 10)
        if (status !== 202) {
          console.log('OTA not accepted')
          process.exit(1)
        }
        console.log('Accepted, sending checksum')
        step = 'CHECKSUM'
        client.publish(`${PREFIX}/$implementation/ota/checksum`, checksum)
      }
      return
    case 'CHECKSUM':
      if (topic === '$implementation/ota/status') {
        const status = parseInt(message.split(' ')[0], 10)
        if (status !== 202) {
          console.log('Checksum not accepted')
          process.exit(1)
        }
        console.log('Accepted, sending firmware')
        step = 'SENDING'
        client.publish(`${PREFIX}/$implementation/ota/firmware`, firmware)
      }
      return
    case 'SENDING':
      if (topic === '$implementation/ota/status') {
        const splitted = message.split(' ')
        const status = parseInt(splitted[0], 10)
        if (status === 206) {
          console.log(`OTA in progress... ${splitted[1]}`)
        } else if (status === 200) {
          console.log('OTA success')
          console.log(`Done in ${Date.now() - startTime}ms`)
          process.exit(0)
        } else {
          console.log('OTA error')
          process.exit(1)
        }
      }
      return
  }
})
  • Change the constants at the top of the file
  • Run npm i mqtt from the directory
  • Run node index.js and it will start the OTA

In my last test, it took 6400ms for a 338kb firmware, which sounds good.

@jpmens
Copy link

jpmens commented Oct 21, 2016

@marvinroger I think they are, yes. You two gentlemen have done marvellous stuff. I won't be able to play with this for the next three weeks, but will try when I'm back at base.

@marvinroger
Copy link
Member

Great! I think we're done here, then. @mrpace2 did the marvellous stuff, here. 😉

@ghost
Copy link
Author

ghost commented Oct 21, 2016

The few lines that I contributed are just a drop in the ocean. Thanks to @marvinroger for creating Homie. And for maintaining it. Great job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants