Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The sequence number for MavLink messages sent from QGC is not *in sequence* #2627

Closed
dogmaphobic opened this issue Jan 15, 2016 · 36 comments
Closed
Assignees
Milestone

Comments

@dogmaphobic
Copy link
Contributor

While trying to process sequence numbers to keep track of packet loss, I run into this issue. I noticed that there were more than one sequence and some where incremented by two. @billbonney pointed out that the issue is the way QGC packs and sends messages.

As an example, in:

https://github.com/mavlink/qgroundcontrol/blob/master/src/AutoPilotPlugins/AutoPilotPlugin.cc#L99-L106

the message is packed (which calls mavlink_finalize_message(), which calls mavlink_finalize_message_chan, which increments the sequence number):

void AutoPilotPlugin::resetAllParametersToDefaults(void)
{
    mavlink_message_t msg;
    MAVLinkProtocol* mavlink = qgcApp()->toolbox()->mavlinkProtocol();
    mavlink_msg_command_long_pack(mavlink->getSystemId(), mavlink->getComponentId(), &msg, _vehicle->uas()->getUASID(), 0, MAV_CMD_PREFLIGHT_STORAGE, 0, 2, -1, 0, 0, 0, 0, 0);
    _vehicle->sendMessage(msg);
}

When the message is sent in:

https://github.com/mavlink/qgroundcontrol/blob/master/src/Vehicle/Vehicle.cc#L451-L469

It is once again processed through mavlink_finalize_message_chan, which increments the sequence again.

void Vehicle::_sendMessageOnLink(LinkInterface* link, mavlink_message_t message)
{
    // Make sure this is still a good link
    if (!link || !_links.contains(link) || !link->isConnected()) {
        return;
    }
    // Give the plugin a chance to adjust
    _firmwarePlugin->adjustMavlinkMessage(this, &message);
    static const uint8_t messageKeys[256] = MAVLINK_MESSAGE_CRCS;
    mavlink_finalize_message_chan(&message, _mavlink->getSystemId(), _mavlink->getComponentId(), link->getMavlinkChannel(), message.len, messageKeys[message.msgid]);
    // Write message into buffer, prepending start sign
    uint8_t buffer[MAVLINK_MAX_PACKET_LEN];
    int len = mavlink_msg_to_send_buffer(buffer, &message);
    link->writeBytes((const char*)buffer, len);
}

That explains why I'm seeing increments by 2 but I have not figured out why I see multiple sequences.

@DonLakeFlyer
Copy link
Contributor

@LorenzMeier Is there some other way to do this? The pack/finalize/send pattern is used pretty much everywhere.

@LorenzMeier
Copy link
Member

I think the pack API is just terribly broken. We need to fix the MAVLink serializing code.

@LorenzMeier
Copy link
Member

I'm too tired to fix it. But if you get to it, here is where it needs to be changed upstream: https://github.com/mavlink/mavlink/tree/master/pymavlink/generator/C/include_v1.0

@DonLakeFlyer
Copy link
Contributor

Is there really any reason to change it now (with V2 coming)? It doesn't cause a bug anywhere does it.

@dogmaphobic
Copy link
Contributor Author

Some more findings. I put a qDebug() for each instance where messages are written out. These are the 2 sequences:

MAVLinkProtocol::_sendMessage(1):  1
Vehicle::_sendMessageOnLink():  0
Vehicle::_sendMessageOnLink():  1
Vehicle::_sendMessageOnLink():  2
MAVLinkProtocol::_sendMessage(1):  3
MAVLinkProtocol::_sendMessage(1):  5
MAVLinkProtocol::_sendMessage(1):  7
MAVLinkProtocol::_sendMessage(1):  9
MAVLinkProtocol::_sendMessage(1):  11
MAVLinkProtocol::_sendMessage(1):  13
Vehicle::_sendMessageOnLink():  3
Vehicle::_sendMessageOnLink():  4
Vehicle::_sendMessageOnLink():  5
Vehicle::_sendMessageOnLink():  6
Vehicle::_sendMessageOnLink():  7
Vehicle::_sendMessageOnLink():  8
Vehicle::_sendMessageOnLink():  9
MAVLinkProtocol::_sendMessage(1):  15
MAVLinkProtocol::_sendMessage(1):  17
MAVLinkProtocol::_sendMessage(1):  19
MAVLinkProtocol::_sendMessage(1):  21
MAVLinkProtocol::_sendMessage(1):  23
MAVLinkProtocol::_sendMessage(1):  25
MAVLinkProtocol::_sendMessage(1):  27
MAVLinkProtocol::_sendMessage(1):  29
MAVLinkProtocol::_sendMessage(1):  31
MAVLinkProtocol::_sendMessage(1):  33
MAVLinkProtocol::_sendMessage(1):  35
MAVLinkProtocol::_sendMessage(1):  37
MAVLinkProtocol::_sendMessage(1):  39
MAVLinkProtocol::_sendMessage(1):  41
MAVLinkProtocol::_sendMessage(1):  43
MAVLinkProtocol::_sendMessage(1):  45
MAVLinkProtocol::_sendMessage(1):  47
MAVLinkProtocol::_sendMessage(1):  49
MAVLinkProtocol::_sendMessage(1):  51
MAVLinkProtocol::_sendMessage(1):  53
MAVLinkProtocol::_sendMessage(1):  55
MAVLinkProtocol::_sendMessage(1):  57
MAVLinkProtocol::_sendMessage(1):  59
Vehicle::_sendMessageOnLink():  10
MAVLinkProtocol::_sendMessage(1):  61
MAVLinkProtocol::_sendMessage(1):  63
MAVLinkProtocol::_sendMessage(1):  65
MAVLinkProtocol::_sendMessage(1):  67
MAVLinkProtocol::_sendMessage(1):  69
MAVLinkProtocol::_sendMessage(1):  71
MAVLinkProtocol::_sendMessage(1):  73
MAVLinkProtocol::_sendMessage(1):  75
MAVLinkProtocol::_sendMessage(1):  77
MAVLinkProtocol::_sendMessage(1):  79
MAVLinkProtocol::_sendMessage(1):  81
MAVLinkProtocol::_sendMessage(1):  83
MAVLinkProtocol::_sendMessage(1):  85

@DonLakeFlyer
Copy link
Contributor

The sequence number you are outputting from Vehicle are bogus and get stomped. The ones that go out on the wire are from MAVLinkProtocol.

@dogmaphobic
Copy link
Contributor Author

Is there really any reason to change it now (with V2 coming)? It doesn't cause a bug anywhere does it.

No, there isn't really. It makes sense to wait. I was more curious about the secondary sequence.

The sequence number you are outputting from Vehicle are bogus and get stomped. The ones that go out on the wire are from MAVLinkProtocol.

But I (the ESP8266 Bridge) get them as above. I do see these two sets of sequences where one is incremented by 1 and the other by 2.

@DonLakeFlyer
Copy link
Contributor

Vehicle::_sendMessageOnLink() doesn't actually send the message. So those values never go out the wire. The likely reason you are seeing multiple sequences is because there is a codepath which doesn't use the pack/finalize/send pattern. Some places use encode which may cause different results. I don't really know what the difference is between pack and encode. Just kept the pattern from old code.

@billbonney
Copy link
Contributor

The advantage of fixing it, is that the MAVESP8266 FIRMWARE can give you an estimate of packet loss from it's perspective.

@DonLakeFlyer I concur that I followed the same pattern in code, it wasn't until I saw an issue where UDP packet send was failing that I noticed an issue. Solo requires a full packet in each UDP packet or it will just ignore them.

I fixed by 'packing' the message first, the calling send function, but I could have (@tridge mentioned this) used these macros
https://github.com/mavlink/mavlink/blob/master/pymavlink/generator/C/include_v1.0/mavlink_helpers.h#L144

I think the pack API (as @LorenzMeier mentioned) should only pack, and not call finalize

(encode just passes a message_t object as the param, and not the individual params for the struct)

@DonLakeFlyer
Copy link
Contributor

I fixed by 'packing' the message first,

I don't quite get what you are saying. What mavlink API sequence did you use? or did you do something custom?

@dogmaphobic
Copy link
Contributor Author

Vehicle::_sendMessageOnLink() doesn't actually send the message.

Yes, it does! Right at the end of the function:

uint8_t buffer[MAVLINK_MAX_PACKET_LEN];
int len = mavlink_msg_to_send_buffer(buffer, &message);
link->writeBytes((const char*)buffer, len);

The advantage of fixing it, is that the MAVESP8266 FIRMWARE can give you an estimate of packet loss from it's perspective.

I'll take a look at those functions and see what I can come up with.

@DonLakeFlyer
Copy link
Contributor

Hmm, forgot about that. I'm going to have to look through that. Must be old UAS code that is calling through mavlink to send.

@DonLakeFlyer
Copy link
Contributor

The skipping sequence numbers from MAVLinkProtocol are coming from the heartbeat output which is handled there. It uses encode/finalize/send_buffer/writeBytes which is a different api pattern than Vehicle uses. If you change that api sequence to be like Vehicle does it then I think you won't skip sequence numbers.

@dogmaphobic
Copy link
Contributor Author

If you change that api sequence to be like Vehicle does it then I think you won't skip sequence numbers.

Yep, I was just following that thread of code. I will fix that and route the Vehicle send as well so it follows the same sequence.

@DonLakeFlyer
Copy link
Contributor

The correct direction is to move the heartbeat and any outtbound messages from mavlink protocol to Vehicle. Vehicle should be the central point of message sending.

@dogmaphobic
Copy link
Contributor Author

Vehicle should be the central point of message sending.

I moved it to LinkManager. All sends are now from LinkManager. The interesting side effect is that I no longer see drops on incoming UDP messages.

The issue with Vehicle is that most calls are within a loop sending to all links. It made more sense to locate the call within LinkManager.

While at it, on current master, I can no longer have more than one vehicle. I will open a new issue for that.

@DonLakeFlyer
Copy link
Contributor

I'm going to have to look at the pull. I would much rather have it in Vehicle. Since message sending interacts with plugins and you need the Vehicle to get the plugin. Not sure what the benefit is to route it to yet another object. Just seems to complicate things more. With it in Vehicle message send starts and stops in Vehicle.

@dogmaphobic
Copy link
Contributor Author

I'm going to have to look at the pull. I would much rather have it in Vehicle. Since message sending interacts with plugins and you need to the Vehicle to get the plugin.

I noticed that. In that case, it should be in MultiVehicleManager as Vehicle is one instance of many. Instead of looping through links, it should loop through Vehicles instead. This will take some careful thought. Let me think about it.

@DonLakeFlyer
Copy link
Contributor

Instead of looping through links, it should loop through Vehicles instead.

That's not the way it works. A message send is directly connected to a vehicle and only goes out on links which are associated with that Vehicle. Not to all links. Hence the send all being scoped to the Vehicle object.

@dogmaphobic
Copy link
Contributor Author

That's what I meant about careful thought. I have not done that yet :)

Suffice to say, just by consolidating all sends not only I got the sequence issue fixed, but also I'm not seeing UPD packet drops. Now it's a matter of having it in the right place.

@DonLakeFlyer
Copy link
Contributor

I think we may be talking about two different things. The only change that should be needed is to move GCS heartbeat to MultiVehicleManager (like you said). Then it loops through Vehicles sending out heartbeats using Vehicle::sendMessage. I think I said QGC heartbeat in Vehicle before, that is wrong.

I was thinking you were talking about changing the location of Vehicle::sendMessage. That should not need to change.

Sorry for the confusion.

@LorenzMeier Is that multiplexing stuff in MAVLinkProtocol still needed?

@dogmaphobic
Copy link
Contributor Author

The Good:

Most messages throughout QGC are sent on a specific link (sendMessage(link, message)). This fits well with the send being centralized within the Vehicle.

Now the things that got me confused:

The Bad:

Besides legacy stuff in UAS.cc and MavLinkProtocol.cc (more on that later), the only place where a message is sent to all connected links is in APMSensorsComponentController::nextClicked(void)

https://github.com/mavlink/qgroundcontrol/blob/master/src/AutoPilotPlugins/APM/APMSensorsComponentController.cc#L471-L481

Is there a reason that message is sent to everything connected?

The Ugly:

MavLinkController.cc:

(all messages sent off its own thread)

It sends the heartbeat and, if enabled, it also sends mavlink_auth_key at the same time. These messages are sent to ALL connected links.

It responds to pings by sending mavlink_msg_ping replies to ALL connected links, not just the one it came from.

Finally, if multiplexing is enabled, it forwards all messages to all links but the one it came from.

UAS.cc has 16 calls to sendMessage() (plus one to executeCommand) and all of those are sent to ALL links. Things like startCalibration(), stopCalibration(), etc. At least they are sent though Vehicle, which makes sure they are sent within the proper thread.

Also note that if multiple vehicles are connected through UDP, every message sent out is broadcast to them all. This is expected behavior but it does't hurt to point that out.

I have some ideas how to clean all this up (yes, by consolidating send message control to Vehicle) but Don seems to be worried so I'm asking, should I do something or do you want to handle it yourself?

  • What is the proper behavior expected from Multiplexing?
  • Are ping replies supposed to be sent out to everything and everyone?
  • Are those auth messages ever used? If so, are they meant to be sent along with every heartbeat?
  • As to UAS.cc, shouldn't calls like startCalibration() and stopCalibration() be sent only to one target vehicle?
  • And finally, out of curiosity, command_long messages are sent but none of them pays attention to command_ack responses. Is that normal?

@DonLakeFlyer
Copy link
Contributor

You are stepping into the realm of many to one and one to many relationship of links and Vehicles. You might want to run as fast as you can away from having to understand this! It can make your head spin. It certainly confuses me everytime I step into it as to how it is supposed to work.

In the last couple days the concept of sending a message out on a specific single link associated with a vehicle was added. Previously all messages went out on all links associated with a Vehicle. Anything that only sends on a single link is new. At this point these single link send message places are only ones which are associated with a "mini-protocol" (param, mission, ftp, ...) where having multiple mavlink connections to a vehicle responding to the protocol would break the protocol. All other messages not associated with a "mini-protocol" are sent to all links associated with a Vehicle. Lorenz and I discussed this last week as being the correct approach. Thining about this again, I think it is likely incorrect. If you send a COMMAND_LONG on all vehicle links what happens? I think badness happens most likely.

If everthing moves to going out on a single link, then why do you need multiple link support inbound? I guess the streaming messages have a better change of getting to you? Other that that, everything else will end up on one link.

The reason I'm worried about you changing anything is that this multi-link stuff has been a bit of a headache and as nasty and full of boundary cases as multi-vehicle. Then put multi-vehicle together with multi-link and you have a many to many Link<->Vehicle relationship which makes your eyes roll up into the back of your head.

You are correct that command acks are disregarded. Normally if a command fails on the firmware side, it spits out a STATUS_TEXT saying something about why the command failed.

This is all stuff that i have never looked at before, which I am not sure if it is even still used. Lorenz will have to comment:

  • What is the proper behavior expected from Multiplexing?
  • Are ping replies supposed to be sent out to everything and everyone?
  • Are those auth messages ever used? If so, are they meant to be sent along with every heartbeat?

@DonLakeFlyer
Copy link
Contributor

My suggestion: Let me fix the heartbeat problem. Then let's see what @LorenzMeier has to say about the rest.

@DonLakeFlyer
Copy link
Contributor

Looked into these things:

  • PING message: Seems to me that should be handled by Vehicle. It has to come inbound from a specific vehicle. Not sure why you would send a response out to all links available. Plan to move to Vehicle and handle there. Or just toss it if nothing ever sends this message.
  • Auth stuff: [UPDATE] This is related to the multiplexing support. Which as far as I can tell is not really hooked up correctly. The only place it exists is in the settings dialog. Nothing at boot references these values. So going to settings does some special setup of UDP ports which seems very odd. Could all of this be removed?

@DonLakeFlyer
Copy link
Contributor

More questions:

  • Does there really need to be an option to turn off GCS hearbeats?

@DonLakeFlyer DonLakeFlyer added this to the Release v3.0 milestone Jan 16, 2016
@DonLakeFlyer DonLakeFlyer self-assigned this Jan 16, 2016
@billbonney
Copy link
Contributor

Turn off heartbeats from GCS. I say yes as it handy to debug sometimes, and test.

@DonLakeFlyer
Copy link
Contributor

Sounds good @billbonney Thanks.

@DonLakeFlyer
Copy link
Contributor

I"m wrong about the authkey. It did go out with the heartbeat. And I just broke that with my last pull!

@DonLakeFlyer
Copy link
Contributor

Searched PX4 and APM source. No use of AUTH_KEY

@DonLakeFlyer
Copy link
Contributor

@LorenzMeier ?

@DonLakeFlyer
Copy link
Contributor

What is the proper behavior expected from Multiplexing?
Are ping replies supposed to be sent out to everything and everyone?
Are those auth messages ever used? If so, are they meant to be sent along with every heartbeat?

@LorenzMeier ?

@DonLakeFlyer
Copy link
Contributor

@LorenzMeier ?

@LorenzMeier
Copy link
Member

The auth messages could become handy in wireless networks with many drones and GCS, but I think more for 3.1.

@NaterGator
Copy link
Contributor

For lossy networks it would also be nice to give consideration to closed loop message delivery specs in future mavlink versions.

@DonLakeFlyer
Copy link
Contributor

I believe this is all dealt with now that I've remove the auth stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants