Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mission checksum package #1171

Open
holmnikolaj opened this issue Jul 3, 2019 · 69 comments
Open

Mission checksum package #1171

holmnikolaj opened this issue Jul 3, 2019 · 69 comments

Comments

@holmnikolaj
Copy link

Instead of downloading all the mission from the drone to check that they are in sync it would be much faster if the mission started with a checksum item. As far as I can see there is no such thing at the moment but this would be very helpful for setups with long missions and poor link quality.

@dagar
Copy link
Member

dagar commented Jul 3, 2019

Very interested in this as well.

@Antiheavy @DonLakeFlyer

@Antiheavy
Copy link

Yes! And helpful with knowing if the flight plan on the vehicle is the same or different than what is onscreen (eg QGC).

@hamishwillee
Copy link
Collaborator

PX4 and QGC do something similar for parameters. It would be nice if we could use the 0th index for this, but ArduPilot populate that with the home position.

@holmnikolaj
Copy link
Author

It could be something as simple as a mission item that is simply ignored by the autopilot. It is only the UI that should be in charge of synchronization anyway.
Then the seven parameters (or just some of them) can contain the checksum.

  1. The UI will upload the checksum mission item as part of the mission if it supports it.
  2. The Autopilot completely ignores it.
  3. The UI reads the checksum mission item from the autopilot and verifies that the mission is in sync.

The checksum item could be placed first in the mission of second (just after takeoff) if this is more convenient.

@JohannesFriis
Copy link

I guess it could be done similar to MAV_CMD_DO_LAND_START (#189 ). This acts as a marker in the mission waypoint list. I know that making it a command has a double purpose, that makes sense for land start cmd, but the idea of a marker wp in the mission list containing the mission checksum could be implemented with small changes in both mavlink and autopilot's. i agree that placing the marker early in the mission list is preferred.

@dagar
Copy link
Member

dagar commented Jul 4, 2019

I think it needs to be handled like metadata across multiple parts of the mission protocol to solve most of the problems efficiently. I'd also like to consider the hash itself being part of the spec.

For example, MISSION_CURRENT (https://mavlink.io/en/messages/common.html#MISSION_CURRENT) is what the vehicle broadcasts so that the GCS can update the active waypoint. It could be extended to also broadcast the ID (mission hash). The GCS would then know immediately if the vehicle's mission became out of sync with the local copy.

We should be able to request a hash for an entire mission, subset range, or individual mission_item. That could also enable incremental sync.

@LorenzMeier
Copy link
Member

@dagar Proposal makes total sense. @holmnikolaj do you want to extend your PR to include a full proposal?

@hamishwillee
Copy link
Collaborator

hamishwillee commented Jul 5, 2019

Along with broadcast, might be nice to be able to request rebroadcast. So if, for example you missed the mission change message you could still request a new one. Maybe use MAV_CMD_REQUEST_MESSAGE for this.

If we head down this path though it becomes perhaps something that should be done as an RFC.

@holmnikolaj
Copy link
Author

holmnikolaj commented Jul 5, 2019

The points of implementing it as a mission item was that:

  1. The drone does not change the mission itself, so the sync issue will only arise if something happens in the GCS end. As it is now, the solution is to read all waypoints. With this solution it only have to read until the checksum.
  2. We don't have to agree on how the checksum is calculated across PX4, Ardupilot, QGC, MP etc. Any GCS can make each own checksum calculation, leave it in the mission item and then read it again when needed.
  3. It will only add and extra package sent when uploading a mission. No extra communication to clutter the link.

Not saying it has to be that way, just some reasons why I proposed this solution.

@holmnikolaj
Copy link
Author

@holmnikolaj do you want to extend your PR to include a full proposal?
@LorenzMeier What do you mean by full proposal? It was my understanding that everything is autogenerated based on the .xml file. If you refer to support in Px4/QGC etc. I plan on doing the Px4 part when we have agreed on the mavlink part.

@hamishwillee
Copy link
Collaborator

@holmnikolaj Lorenz is probably away today.

What I think he is suggesting is a more fully developed proposal in line with the suggestion @dagar made above: #1171 (comment)

Basically what you proposed is a mission command within the mission. So if you have uploaded a mission previously you get a checksum. If you want to check the mission again you have to kick off another upload. The first item tells you that if the mission has changed, and you can ignore the rest of the items. That is good as far as it goes - but firstly you have to "pull" the information rather than being told that the mission has changed, and secondly this results in a whole sequence of mission upload that you might not actually need (unnecessary messages).

Daniel has proposed that the system might emit the checksum when something changes (and I have suggested you might do so on request). He has also said you could then define different levels of hash - for entire mission, subset range, or individual mission_item (ie to enable incremental sync).

To my mind that is probably more than a PR, but you could start outlining how you would do it.

Or push back if you this does not make sense to you.

@holmnikolaj
Copy link
Author

As I understand it you want a solution where:

  • The checksum is sent as part of MISSION_CURRENT.
  • Possibility to request checksum without reading (parts of) mission from the MAV, which is already there if it is part of MISSION_CURRENT.
  • There is support for checksum for mission sections and individual waypoints.

I think the possibility to have checksums for sections of a mission makes good sense, but if you are in doubt about a single waypoint why not upload it again instead of requesting a checksum for it?

I want a solution where:

  • The checksum is only calculated at the CGS.

That way we avoid miscalculations in the checksum due to float/double precision.
We save calculation time one the drone where it may actually be an issue.
We don't (necessarily) have to agree on how to calculate it.

To get all this I want to propose the following solution:
As part of the mission upload checksum items (as in the original proposal) with this format:

  1. mission checksum 1
  2. mission checksum 2
  3. section start sequence
  4. section end sequence
  5. section checksum 1
  6. section checksum 2
    7 empty

These could be added to a mission in the beginning of the entire mission, and again in in the beginning of mapping areas etc.

The CGS will calculate the checksums, and the MAV just copies the relevant checksums to the MISSION_CURRENT message that will be sent when waypoint reached or by request as usual.

That way we can get an implementation that will work without opening a big can of worms. We have the best of both yours and my idea. This whole thing can be ignored for GCS/autopilots that does not want to use it. When we one day decide to change the structure of missions to support section division and make the MAV calculate the checksum itself, this will still work by just leaving the mission checksum items out of the mission.

The downside is that the mission will contain (otherwise unnecessary) items, but it supports enhancements where the drone calculates it itself and thereby removing the extra mission items.

@DonLakeFlyer
Copy link
Contributor

The checksums on sections seems overly complicated. Although something needs to be done about partial uploads. How do you update the checksum for the entire mission is you are only updating a single item.

Using a new mission item to store a checksum in a mission upload is tricky since old firmwares which don't know about this item will fail the upload. The whole thing is likely to need a capability bit.

  • Possibility to request checksum without reading (parts of) mission from the MAV

Having this as a separate command as well instead of just streaming MISSION_CURRENT seems like a good idea. A GCS doesn't really know the telemetry rate of that stream which can be adjusted. Having a more deterministic way to get it may speed things up in some case.

@prebenfihl
Copy link

prebenfihl commented Jul 9, 2019

I am working with @holmnikolaj on this issue. We have updated the pull request to include some of the comments.

We have reduced the scope of the change to address only a checksum/unique id for the whole mission (I'll comment on our considerations shortly), and we have added a bit to MAV_PROTOCOL_CAPABILITY.

We have introduced a MISSION_UID message rather than extending MISSION_CURRENT. Logically, changes in the current waypoint (broadcasted via MISSION_CURRENT) has nothing to do with the mission UID, so we propose to keep them separate. We considered extending the MISSION_CHANGED message, which is logically a better fit, but requesting MISSION_CHANGED to determine if a GCS mission is in sync doesn't match the intention/description of MISSION_CHANGED very well.

@prebenfihl
Copy link

Regarding partial mission upload: we have not specified this since the protocol for partial uploads is still to be defined (as far as we know) but MISSION_WRITE_PARTIAL_LIST cloud potentially be extended to include the new UID for the whole mission (which the GCS can calculate).

Regarding checksums/ids for parts of the mission: we agree with @hamishwillee that such a multi-level ID requires more than just a PR but we think that the mission UID would be a reasonable feature in itself (hence this proposal). We have marked the four parameters of the MAV_CMD_MISSION_UID cmd that was used for the section checksums/ids in the previous suggestion as reserved. We still see it as a viable solution but we think that the details should be defined in the context of a full multi-level ID solution.

@hamishwillee
Copy link
Collaborator

@prebenfihl @holmnikolaj We have a MAVLink devcall this evening at 5pm AEST - in about 7 hours: https://mavlink.io/en/about/support.html#dev_call
Would you like to attend and discuss this?

As I understand it you still intend that MAV_CMD_MISSION_UID be sent in the mission? As pointed out by @DonLakeFlyer this will break older firmware, which should reject the mission.

Further to that, I'd like the explicit agreement of @dagar that he is OK that this proposal does not preclude partial mission support in future.

The drone does not change the mission itself, so the sync issue will only arise if something happens in the GCS end. As it is now, the solution is to read all waypoints. With this solution it only have to read until the checksum.

Note, that you do need to consider that there may be multiple GCS or GSC and developer APIs affecting the mission. Not just a single GCS.

With respect to partial missions I believe the requirement is well understood, I just can't get the documentation reviewed: mavlink/mavlink-devguide#157

@DonLakeFlyer As an aside, the behaviour for unsupported mission types is not documented in https://mavlink.io/en/services/mission.html - though it makes sense to reject the whole mission if an item is not understood because you don't know how important a particular item is to the whole mission. Does any Firmware do anything different here?

@hamishwillee hamishwillee added the Dev Call Issues to be discussed during the Dev Call label Jul 10, 2019
@hamishwillee
Copy link
Collaborator

Thanks for attending the 20190710-Dev-Meeting @holmnikolaj !

Just to summarise the results, we agree the value of the checksum, and that your general approach is good. However we'd like to use the yet-to-implement mavlink microservices protocol to check the existence of the message rather than a capability bit.

@julianoes will look into defining/implementing that, starting in 4 weeks, and expected to get something working in 3 months.

@hamishwillee hamishwillee removed the Dev Call Issues to be discussed during the Dev Call label Jul 11, 2019
@hamishwillee
Copy link
Collaborator

hamishwillee commented Sep 19, 2019

Hi @holmnikolaj,
FYI, the work for microservices versioning has not yet been done. @julianoes is hoping to assess the work required in the coming weeks.

The discussion here was long ago, so can you please check my summary/recollection of the thinking:

  • GCS generates a mission GUID for the mission it has created and inserts this as a mission item in a MAV_CMD_MISSION_UID (if it can verify the recipient understands this MAV_CMD)
  • GCS can check for mission changes in two ways:
    • Send MAV_CMD_REQUEST_MESSAGE and monitor for the MISSION_UID.
    • Download mission and check MAV_CMD_MISSION_UID. Continue mission download if it has changed, cancel download if it has not changed and just stick with current store mission.
  • GCS should also monitor for MISSION_UID messages; if mismatched this should minimally inform user that stored mission is out of date, and potentially auto-upload new version. This is to cater for case where there are multiple potential setters and observers of missions.
  • Drone should store the UUID as part of the mission, and emit as a MISSION_UID:
    • If requested using MAV_CMD_REQUEST_MESSAGE
    • Whenever the stored mission has changed - ie after an upload, partial upload, or the mission is cleared.

Further thoughts/questions:

  • If a GCS misses a MISSION_UID from another system doing an update it can get out of sync.
  • What if the GCS does a partial mission upload? In theory it should calculate the UUID for the whole mission and include the MAV_CMD_MISSION_UID, but there may be problems with this:
    • If the GCS is out of sync, its calculated MAV_CMD_MISSION_UID would be incorrect.
    • The partial update is not well defined, but we know from the message format it takes a start index and an end index. How will we update the MAV_CMD_MISSION_UID if it is not in the range being updated? What if there are multiples of MAV_CMD_MISSION_UID in the mission after the update?

@holmnikolaj
Copy link
Author

I discussed your questions with @prebenfihl.
We agree on your summary.
Regarding the partial upload, we think that this is beyond the scope of the issue for now. Lets create the UID and then agree on a solution for partial upload when appropriate if that is a fully supported feature (we are not aware of the state of partial uploads).

We did however talk about some ways for handling the partial uploads with multiple GCS connected.

We could have a look at how it is handled in STANAG 4586 where GCSs have different levels of interoperability (LOI) such that only one GCS is in charge and the rest can only listen. It can then hand over the control if needed.

Otherwise we could spend the time between development of UID and Partial upload to agree on how the UID is calculated. That way the drone can respond with a new UID after partial upload and the drone can check if it gets the same.

@hamishwillee
Copy link
Collaborator

Thanks @holmnikolaj - I'm not sure we can treat it as out of scope, because some systems do support it (e.g. ArduPilot - albeit in a horrible and broken way).

Cooperative GCS interoperability is a bigger problem; it is not just a matter of GCS - any mavlink app would have to buy into this in some way. I wouldn't want to go down that path to fix this problem.

Otherwise we could spend the time between development of UID and Partial upload to agree on how the UID is calculated. That way the drone can respond with a new UID after partial upload and the drone can check if it gets the same.

I'm not certain the GCS needs to know how the UID is generated, it just needs to know that it has changed. Depends on whether the GCS uses an id for comparison that it calculates OR it uses an ID that is given by the drone. The first way means that you have have a common mechanism for generating the hash, the second means that you have to have a way of reliably associating the hash with the mission that someone just uploaded.

A possible solution assuming the second option

  1. We add hash/mission number field to MISSION_COUNT and MISSION_ACK
  2. GSC uploads mission.
    • We don't care what the number is in MISSION_COUNT.
    • Final MISSION_ACK from drone includes the drone-calculated hash for mission.
    • Drone also emits the MISSION_UID so everything else knows about this change.
  3. For mission download the first message from drone is MISSION_COUNT which now includes the hash.
    • At this point the GCS could cancel the download, because it knows it has a matching hash.
  4. The MISSION_UID can still be requested using MAV_CMD_REQUEST_MESSAGE.
  5. MAV_CMD_MISSION_UID is no longer needed, which is what breaks partial upload
  6. The same approach works for Partial mission upload (note, link is not approved doc) which completes with MISSION_ACK. It doesn't work for partial download, but that really is out of scope.

What I really like about this is that I think it breaks nothing and could be rolled out straight away without side effects. i.e. if working with a drone or gcs that has not been updated or does not support this feature the fields for the UID will always be zero. We can make that a magic number that means unknown or not supported. Ie if GCS gets zero from upload it knows that the drone doesn't support hash, and if it gets zero from a download, same thing.

@holmnikolaj
Copy link
Author

Your proposal does have some advantages, but the reason why we wanted this in the first place was to ensure that the GCS could compute an ID for comparison.

Consider the case that you made some changes to your mission on your GCS and you don't know if you are in sync (due to program crash etc). The UID is only calculated on the drone and we don’t know if the UID, the GCS has, corresponds to the mission it shows.

To solve that the GCS has to be able to calculate the UID itself and then we are back to all agreeing on how this is done. However, your proposed way of exchanging the UID would still make sense.

It might be the best option but as mentioned earlier I think it makes the implementation a lot more complicated.

However I we consider the UID calculated on the drone to be the "deciding" one then the GCS will just have to know how the autopilot that is connected calculates it. That could be different from autopilot to autopilot, and in case the calculation is not implemented on the GCS it only loses the feature mentioned above.

@hamishwillee
Copy link
Collaborator

hamishwillee commented Sep 23, 2019

@holmnikolaj Sure. Then we just need to include the hash in the mission upload from GCS and have some way to verify before we start that the mechanism is supported.

So perhaps something like:

  1. Check if feature is supported:
    • Maybe assume if MAV_CMD_REQUEST_MESSAGE with MISSION_UID returns any ACK except MAV_RESULT_UNSUPPORTED the feature is supported on the flight stack?
  2. Add field for mission id to MISSION_COUNT
  3. GSC calculates hash and uploads mission with id in MISSION_COUNT.mission_id.
    • GCS does not store this as id unless gets accepted ACK!
    • Drone stores this and returns on request for MISSION_UID
  4. For mission download the first message from drone is MISSION_COUNT which now includes the id
    • At this point the GCS could cancel the download if the id is matching, or continue if it is not
    • An id of zero could mean feature is not supported or some other GCS uploaded the mission. Either way the GCS should continue download (it could then decide to re-upload a mission with the ID if it chose).
  5. Partial mission upload resets the mission id to zero on drone (ie no additional messages or fields required)
  6. Any accepted update to the mission causes drone to emit the MISSION_UID so everything else knows about this change (including full and partial updates)

For the partial upload I don't see any way to make these work "end to end" unless the id is generated on the drone (or both end). Realistically there are bigger problems than this with getting these to work robustly, so for this case just making it clear to other systems they can no longer rely on the stored copy is the important thing.

Any obvious holes in this?

@holmnikolaj
Copy link
Author

holmnikolaj commented Sep 23, 2019

@hamishwillee
What you describe sounds good. If there is consensus about that solution I think we should go with that.

@hamishwillee
Copy link
Collaborator

Cool :-) @julianoes @julianoes @auturgy @DonLakeFlyer There is a way to do this mission checksum without mission commands or breaking GCS or autopilot systems that don't know about it. Broad scope of proposal above #1171 (comment) - but it comes down to adding the checksum in MISSION_COUNT and using the MAV_CMD_REQUEST_MESSAGE as verification that the service is supported.

  1. Does this make sense?
  2. @DonLakeFlyer Any thoughts on how the GCS might calculate the checksum for a mission plan? We need to specify this in a way that is GCS agnostic/can be implemented by anyone.

@prebenfihl
Copy link

A thought on the calculation of the mission UID:
Using a standard checksum or hashing function seems reasonable (CRC, SHA, ...) and when balancing computational complexity with the uniqueness of the UID a CRC32C checksum could be a suggestion.

An array of mission_item fields could be the basis for the checksum but omitting <target_system> and <target_component> (since two drones could fly the same mission). Sequence number (<seq>) of the mission item could probably also be omitted (changing the sequence number is reflected in the changed order of waypoints).

@LorenzMeier
Copy link
Member

LorenzMeier commented Sep 23, 2019 via email

@hamishwillee
Copy link
Collaborator

@DonLakeFlyer Sounds like we're heading towards consensus. I'd rather pay the cost of the drone calculating the hash too, as it allows us to deal with all edge cases.

#4 The request for a UID is made using a COMMAND_LONG. That way firmwares which don't support it can nack the command. Otherwise you will need yet another capability bit. If the firmware Acks, it then also sends a PLAN_UID mavlink message with the values. Maybe more than once to work around noisy comms. If MAV_CMD_REQUEST_MESSAGE can do the same Nacking from firmwares which don't support PLAN_UID then even better your way.

Yes, I am proposing exactly the model that you are - a command protocol request for the UID. This will either be acked (supported) and then the PLAN_UID emitted, or NACKED as not supported. The only remaining decision here is whether the emitted message always contains all triplets, or you individually request UIDs for each part of plan.

Historically we'd use a dedicated command for requesting that PLAN_UID message. My proposal is to instead use MAV_CMD_REQUEST_MESSAGE - a generic single-shot message requestor in which you can specify the message you want emitted. Note that this is implemented in ArduPilot but not yet in PX4. Should be pretty easy to add, and no harder than just adding support for a specific message.

The only point of clarification with the above is that I never refer to using COMMAND_LONG - but to using the command protocol. The nuance being that the command protocol allows you to send the request in either a COMMAND_LONG or a COMMAND_INT, and generally I much prefer the COMMAND_INT (and wish the COMMAND_LONG had never been born :-) ).

@DonLakeFlyer
Copy link
Contributor

The only remaining decision here is whether the emitted message always contains all triplets, or you individually request UIDs for each part of plan.

You individually request to future proof yourself against additional MAV_MISSION_TYPEs showing up in the future. If you get them all at once then PLAN_UID is locked down to current set. Or limited to as many as you can fill into the structure as it grows.

@hamishwillee
Copy link
Collaborator

Yes, that is what I think too. I'm going to disappear for a few days on another task and to give people a chance to comment. Assuming we're all in broad agreement is everyone happy for me to create a PR for the message changes? (that is an open question to all of you). That would happen late next week. If you want it earlier, then anyone is free to do this work.

@prebenfihl
Copy link

Not mentioned, but I might STILL extend MISSION_COUNT to include the triplet in the main sequence.

If MISSION_COUNT is extended then it might be worth also extending MISSION_ACK to include the UID. It is not necessarily required because of the PLAN_UID broadcast but conceptually it fits very nicely with the mission protocol (in stead of listening for the broadcast in a separate mechanism/process).

@DonLakeFlyer
Copy link
Contributor

Assuming we're all in broad agreement is everyone happy for me to create a PR for the message changes?

Fine by me

@stale
Copy link

stale bot commented Apr 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 23, 2020
@stale
Copy link

stale bot commented Jun 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@stale
Copy link

stale bot commented Aug 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@stale stale bot added the stale label Aug 22, 2020
@hamishwillee hamishwillee removed Dev Call Issues to be discussed during the Dev Call stale labels Aug 23, 2020
@stale
Copy link

stale bot commented Oct 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@stale stale bot added the stale label Oct 23, 2020
@davidbuzz
Copy link
Contributor

davidbuzz commented Dec 28, 2020

I didn't understand the entire discussion here, a bit above my pay-grade, but i just wanted to check if this will also work in the scenario where the drone itself changes its own mission ( such as a lua script loading a mission from the SD-card, or a lua script adjusting things in the mission on-the-fly), as this is also an increasingly common scanario where the GCS's UI can get out-of-sync with the mission thats in the drone. -eg this example.. https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_Scripting/examples/mission-edit-demo.lua

@hamishwillee
Copy link
Collaborator

Hi @davidbuzz

The discussion has moved on and the end result is captured in #1172.
In summary, when a mission, geofence or rally point definition is accepted a message containing a hash of the current mission/geofence/rally point is broadcast. This allows other systems to know that the mission has changed/whether the current version they have is correct. Systems can also specifically request the hash.

For this to work as intended, that broadcast has to happen however the mission is updated - Lua, load from file, whatever (technically MAVLink can only "mandate" this for for the MAVLink mission protocol).

@peterbarker Was making the point that if your scripts make frequent updates to a mission this will mean increased computational burden for calculating the hash and increased bandwidth consumption for the mavlink messages. I'm trying to establish whether this is a "real" concern, and if so whether there are ways to handle this - the obvious one being that a flight stack can choose to turn off support using a parameter.

Would love your thinking on this, but please add to #1172

@stale
Copy link

stale bot commented Jun 20, 2021

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@stale stale bot added the stale label Jun 20, 2021
@hamishwillee
Copy link
Collaborator

This actually delivered into development.xml in #1172 but got pulled out due to toolchain issues. Needs to go back in again and then this can be closed.

@stale stale bot removed the stale label Jun 23, 2021
@stale
Copy link

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests