Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev: add new message for component capability #1724

Merged
merged 4 commits into from
Nov 25, 2021
Merged

Conversation

julianoes
Copy link
Collaborator

This moves the existing but unesd COMPONENT_CAP_FLAGS enum from common.xml to development.xml.

This came up because a gimbal manufacturer would like to have an easy way to know if a system (drone or camera) supports the v2 gimbal protocol.

The alternative is sending a MAV_CMD_REQUEST_MESSAGE for GIMBAL_INFORMATION which might work for this case but not well for any case, e.g. MAVLink FTP or params. The challenge when trying to detect these protocols is usually whether a timeout means it's not supported or whether the link was bad at the time we wanted to find out (e.g. at the beginning when you get a lot of initialization traffic).

This brings back part of what was removed in #1408.

@hamishwillee
Copy link
Collaborator

This reinforces for me that OllieW was at least partially right ;-(.

We turned the COMPONENT_INFORMATION message into a mechanism for component metadata, and our intent was that information like you have added here would be in the metadata file. However there is a strong case that most of the metadata is of most use for a GCS, and that there is a need for at least some of the metadata to be available in messages.
In other words, we should have kept COMPONENT_INFORMATION as-is and created a new message COMPONENT_METADATA for the metadata.

By analogy, the camera information message shares information that is useful to a GCS in the information files, but it keeps the subset of information that is useful to all components in MAVLink, so that they don't have to implement mavftp or anything else for "basic use".

Upshot, I fully support the idea of getting this info back into a MAVLink message, but I think we should take the opportunity to work out whether some of the other information that was in COMPONENT_INFORMATION should also be recovered. Pity that name is no longer available - I don't think we can rename COMPONENT_INFORMATION into COMPONENT_METADATA at this late stage.

Thoughts?

@bkueng
Copy link
Member

bkueng commented Oct 20, 2021

I don't think we can rename COMPONENT_INFORMATION into COMPONENT_METADATA at this late stage.

Renaming is fine with me, assuming it does not change the on-wire bytes, and IMO makes sense.

@hamishwillee
Copy link
Collaborator

hamishwillee commented Oct 21, 2021

EDITED; @bkueng confirmed offline we can't rename.

@julianoes Do you have an opinion otherwise about whether capability information should be part of "general component" info or you prefer the stand alone approach you have here? I assume 64 is enough options.

We typically do aggregate common info together to simplify things for users.

@julianoes
Copy link
Collaborator Author

I would be ok adding it to COMPONENT_INFORMATION but then it would prevent truncating there for the last uri field.

@hamishwillee
Copy link
Collaborator

I would be ok adding it to COMPONENT_INFORMATION but then it would prevent truncating there for the last uri field.

I don't want to add it to COMPONENT_INFORMATION. I am proposing a new message that recreates the COMPONENT_INFORMATION before it was turned into a mechanism for requesting component metadata.

Why? Because there are some things you might reasonably include in the MAVLink layer rather than via metdata. We pushed all those things into the metadata which makes them unusuable to anything that can't implement MAVFTP and a parser. I think that was a mistake.

If we we add this to COMPONENT_INFORMATION the first thing Don will say is "why this metadata and not X, Y, Z". I'd rather do that in a clean message.

We can chat over this if you like on a call.

@julianoes
Copy link
Collaborator Author

If we we add this to COMPONENT_INFORMATION the first thing Don will say is "why this metadata and not X, Y, Z". I'd rather do that in a clean message.

I guess that's why I created a new message in my PR 😄.

@hamishwillee
Copy link
Collaborator

I guess that's why I created a new message in my PR 😄

@julianoes :-)

Yes, and COMPONENT_CAPABILITY is a fine name provided we can agree:

  • this is the only component specific information we will ever need a message for, or
  • if we want more component specific information we can give that information unique named messages too, so that we aren't adding (say) "software version" to a message named COMPONENT_CAPABILITY.

Upshot, we need to decide whether there is any information a component really needs now available in messages. The old component information had: vendorName, modelName, firmwareVersion, hardwareVersion.

Yes, I know this would be duplicating the general metadata. The point is whether a component might reasonably need this which does not implement the component information protocol.

@olliw42 If we were to re-invent the old-style component information message, what fields would your components generically need? Anything other than: capabilities, vendorName, modelName, firmwareVersion, hardwareVersion. Please no "I told you so".

@olliw42
Copy link
Contributor

olliw42 commented Oct 27, 2021

Please no "I told you so".

to the contrary, I think changing an opinion and openly deserves all respect. My respect, sir.

what fields would your components generically need? Anything other than: capabilities, vendorName, modelName, firmwareVersion, hardwareVersion

concerning the design of the messages, I would need more time to think through it, but am afraid I may not find it soon ... I mean, I'm not even managing to respiond to the invalid thing in timely fashion currently :(
many good ideas and concepts were explored and foreshadowed since the first suggestion, and it might be worthwhile to put them all on the table and see what today would be the best approach.

My thought at this point in time would be that it probably could look similar to the first suggestion, including the def file, but maybe extended with a bit flag of what sort of info it may contain. My major concern with the whole def file thing is maintainability, i.e., if one just wants to correct a single line to remove let's say a bug in the def file, how could this be handled properly. Unless the component has sort of a file system which is read/writeable by the user, but which is potentially too "expensive" for many components, I don't see solutions for this so far.

At this point in time I would not be so convinced that putting the indicator that gimbal v2 is supported into a general capability flag. I would do so for mavftp, but I'm not convinced for gimbal v2. This may sound contradictory, and maybe it is, but to me it looks like not cleanly disentangling what is component generic and component type specific. Mavftp to me looks generic, as any component may support mavftp, (for whatever purpose doesn't matter, just the fact itself), while the gimbal v2 would be only for gimbals. A camera or servo or gps or imu and so one would never support it, but may support mavftp. In short, you need to know alread that a thing is a gimbal, but once you know this one also equally can ask for a gimbal related message.
Another concern, but of only technical nature, if the generic cap field would include the component specific ones, it may potentially run out of space quickly. The number of microservices and their versions is increasing rapidly these days as it seems (which is good).
We also could look at MAV_PROTOCOL_CAPABILITY and it's fate ... and flaws, and try to learn from it. I guess it is hard to not argue that it has become a wild collection of flags, with little concept. E.g., I remember that a couple years back I've asked about what some of the flags are meaning and noone knew. However, since it is already there, one could also just argue, ok, let's go on with it. GimbalV2 could then just become MAV_PROTOCOL_CAPABILITY_GIMBAL_V2.
I guess what I'm trying to say is that it probably would be worthwhile to take a step back and look at all this with a wider perspective.

@julianoes
Copy link
Collaborator Author

I can see how it would be convenient to add version and manufacturer/vendor name and co. but I don't need it at this point. Also, it's ok to have more flexibility for that sort of thing, e.g. it's nice to have just a string for the version for things like v1.2.3beta4.

@hamishwillee
Copy link
Collaborator

Maybe we discuss in the next mav call @julianoes. I have no strong opinion other than I'd rather we agreed an approach than knee-jerked.

@julianoes julianoes changed the title dev: add new mesage for component capability dev: add new message for component capability Oct 28, 2021
@hamishwillee
Copy link
Collaborator

As I recall the dev call discussion, action is for you to add the lost component information, and request comment from Ollie. The update is broadly approved, so it can then go in - ideally sometime next week.

@olliw42
Copy link
Contributor

olliw42 commented Nov 13, 2021

time_boot_ms is duplicate (and probably causes the fail ?)

I think gimbal_device_information would be a good template for what fields to add (wo the angles of course).
E.g. I continue to like much the custom fields. The uid should also come in handy.

there could have been (should be?) also a field for accounting for #1727 ;)

128 bits looks massive.
I am not fully sure what the intention with this message is. It could make sense to add also a component type specific cap_flags field, i.e., which would hold different bits depending on whether it is a autopilot, servo, gps, and so on. If you came up with 128 bits to embrace component type specific flags, it might be worthwhile to consider separating them out into an extra field type_specific_cap_flags.

What would be the relation with camera_information and gimbal_device_information?
For camera_information I think it was already concluded that it is too late and it shouldn't be changed. But is this true also for gimbal_device_information, shouldn't/couldn't it be stripped down?

I spent a bit time on researching, but couldn't really figure out the current state of affairs regarding the json meta data file thing. Maybe you could clarify that a bit,
I would think that - while I agree not absolutely mandatory for the working - it could be useful to have that flag which you once had to indicate which info is in the meta data file in this message here. As it would allow an implementation to quickly decide if it is worthwhile to trigger the whole meta data retrieval process at all (which otherwise would add another two steps procedure, first request metadata_info and then the mavftp)

Just for completeness, I also think that there is no need for a specific metadata_info message, as the url and all else could also be included here, similar to the original proposal, but I can see that this is not what you want to do.

:)

@hamishwillee
Copy link
Collaborator

hamishwillee commented Nov 14, 2021

@olliw42 This is essentially parallel to the component metadata - which is progressing - being used for parameters and now for UI config of motors/outputs.

We're adding back the information that was lost when metadata was added, so that components are not forced to implement the metadata protocol for that set of information we originally had in the COMPONENT_INFORMATION message. We're open to considering other fields and use cases.

FYI Julian is on holiday for a week. I'll leave him to comment on the rest. I have no strong opinions, other than the usual one - let's get this right because breaking things is hard.

@olliw42
Copy link
Contributor

olliw42 commented Nov 15, 2021

many thx for the info :)

@julianoes
Copy link
Collaborator Author

@olliw42

My major concern with the whole def file thing is maintainability, i.e., if one just wants to correct a single line to remove let's say a bug in the def file, how could this be handled properly.

I don't understand this problem. If there is a bug in the def file, then it has to be fixed, and the version of it will be incremented. This change then has to ship with whatever hardware this comes with, like any other software update. Whenever this version is incremented, it means that a ground station has to download it again, e.g. using mavlink ftp locally or http from a server.

Mavftp to me looks generic, as any component may support mavftp, (for whatever purpose doesn't matter, just the fact itself), while the gimbal v2 would be only for gimbals.

I disagree here as the gimbal manager can for instance be an autopilot or a camera.

Another concern, but of only technical nature, if the generic cap field would include the component specific ones, it may potentially run out of space quickly. The number of microservices and their versions is increasing rapidly these days as it seems (which is good).

Yes and no. We have added quite a few in the last years but I don't think they will keep coming at this point. I wouldn't know what is next, honestly.

128 bits looks massive.
I am not fully sure what the intention with this message is.

And that's answering your concern above. 128 just in case. My opinion is that 64 is enough but in the call the suggestion was to reserve 64. Personally, I would leave it at 64, and then add another 64 as an extension once required.

What would be the relation with camera_information and gimbal_device_information?

These are specific gimbal capabilities rather than generic MAVLink protocol capabilities.

Just for completeness, I also think that there is no need for a specific metadata_info message, as the url and all else could also be included here, similar to the original proposal, but I can see that this is not what you want to do.

I absolutely agree and that was the first iteration of that message! 😠 However, it was decided that there is no space for it and it was all removed, instead two urls were added which - in my opinion - could be just one, and then the message can be sent multiple times if required.
Anyway, I think that message is what it is, it's implemented and it will be too much of a disruption to remove it now. Hence this additional message.

This moves the existing but unesd COMPONENT_CAP_FLAGS enum from
common.xml to development.xml.
This renames the new message from COMPONENT_CAPABILITY to
COMPONENT_INFORMATION_BASICS and adds software/hardware version,
vendor/product name, as well as a reserved capability field.
Copy link
Collaborator

@hamishwillee hamishwillee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @julianoes (& @olliw42 ).

  • I renamed to remove plural COMPONENT_INFORMATION_BASIC
  • I removed the duplicate timestamp
  • Reordered the flags to the end - just to my taste (of course they will be reordered by the system and can't be truncated).

@hamishwillee
Copy link
Collaborator

@julianoes This looks pretty much as discussed/agreed. I won't merge it until we've chatted.

I was thinking a bit about the flags and the approach you've used. If we're not going to allocate flags2 stuff now, is there any point having the flags reserved? Or to put it another way, why don't we make this a byte and extend it when needed?

Also why we don't do this generally? It seems to me that you waste one bit every flags field, but - but you are never left with this huge mostly unused space in your message. If you add extensions, at most you're costing yourself and extra byte. And of course if we are confident that this is likely to be more than 7 flags we can always make the first one into 16 bits.

So now we have a message with just one field:

<field type="uint8_t" name="component_cap_flags1" enum="COMPONENT_CAP_FLAGS1" display="bitmask">Component capability flags 1</field>

and enum

    <enum name="COMPONENT_CAP_FLAGS1" bitmask="true">
      <description>Component capability flags 1 (Bitmap)</description>
      <entry value="1" name="COMPONENT_CAP_FLAGS1_PARAM">
        <description>Component has parameters, and supports the parameter protocol (PARAM messages).</description>
      </entry>
      ...
      <entry value="128" name="COMPONENT_CAP_FLAGS1_MORE_FLAGS">
        <description>Component supports additional flags in field component_cap_flags2 (may be undefined).</description>
      </entry>
    </enum>

Which we can extend by adding:

<field type="uint8_t" name="component_cap_flags2" enum="COMPONENT_CAP_FLAGS2" display="bitmask">Component capability flags 2</field>

and enum

    <enum name="COMPONENT_CAP_FLAGS2" bitmask="true">
      <description>Component capability flags 2 (Bitmap)</description>
      ...
      <entry value="128" name="COMPONENT_CAP_FLAGS2_MORE_FLAGS">
        <description>Component supports additional flags in field component_cap_flags3 (may not be define.</description>
      </entry>
    </enum>

And of course we'd start setting the COMPONENT_CAP_FLAGS1_MORE_FLAGS bit so that even things that hadn't updated would know they were missing out :-)

Copy link
Collaborator Author

@julianoes julianoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm ok with this 👍

@julianoes
Copy link
Collaborator Author

I was thinking a bit about the flags and the approach you've used. If we're not going to allocate flags2 stuff now, is there any point having the flags reserved? Or to put it another way, why don't we make this a byte and extend it when needed?

I agree that allocating 128 bits now is too much. But I would not go down to one byte, otherwise you have to extend every 8th capability that you want to add. Hence I would suggest to add 64 now, it seems like something that should cover it for a few years.

In my opinion 64 bits are enough for now. If we need more later we can
add another 64 bits in an extension.
@hamishwillee
Copy link
Collaborator

Merging, as discussed in the dev call. Thanks all.

@hamishwillee hamishwillee merged commit 28163a3 into master Nov 25, 2021
@julianoes
Copy link
Collaborator Author

Thanks @hamishwillee!

@olliw42
Copy link
Contributor

olliw42 commented Nov 25, 2021

My major concern with the whole def file thing is maintainability, i.e., if one just wants to correct a single line to remove let's say a bug in the def file, how could this be handled properly.

I don't understand this problem. If there is a bug in the def file, then it has to be fixed, and the version of it will be incremented. This change then has to ship with whatever hardware this comes with, like any other software update. Whenever this version is incremented, it means that a ground station has to download it again, e.g. using mavlink ftp locally or http from a server.

you have only exactly one way fo providing the def file in mind, namely "shipped" with it. However, it also could be in the cloud with an http url ...
anyway :)

Mavftp to me looks generic, as any component may support mavftp, (for whatever purpose doesn't matter, just the fact itself), while the gimbal v2 would be only for gimbals.

I disagree here as the gimbal manager can for instance be an autopilot or a camera.

you missed the argument. The gimbal manager would have to send gimbal_manager_status anyway, and there would also have to be a gimbal heartbeat in order to not break all of mavlink apart.
anyway :)

What would be the relation with camera_information and gimbal_device_information?

These are specific gimbal capabilities rather than generic MAVLink protocol capabilities.

yeah, but I was asking

For camera_information I think it was already concluded that it is too late and it shouldn't be changed. But is this true also for gimbal_device_information, shouldn't/couldn't it be stripped down?

this still would make lots of sense to me, if possible

no uid ?? sad
no custom_name ?? sad
no custom_cap_flags ?? sad
no meta data contents flags ?? sad
no board id ?? sad (would provide a generic means for #1727 and wouldn't need such hacky stuff)

@julianoes
Copy link
Collaborator Author

Thanks @olliw42, that's good points.

you have only exactly one way fo providing the def file in mind, namely "shipped" with it.

But that one you can update with normal firmware updates, right? In my mind this file is very much tied to the firmware version, so I don't see how you would want to change/improve that.

The gimbal manager would have to send gimbal_manager_status anyway,

Yes that's exactly the point that I'm trying solve. If we have this capability flag, then the gimbal manager does not have to send gimbal_manager_status for discovery which can be wasteful and suboptimal. And means a ground station has to wait for quite some time to be sure what's out there. I learnt that this is suboptimal and a quick request for capabilities would probably be better. That's really the whole point of why this PR exists. I realize that I should have spelled that out explicitly.

and there would also have to be a gimbal heartbeat in order to not break all of mavlink apart.
anyway :)

That one is debatable. This goes to the core of component IDs (or rather MAV_TYPEs) and what role they play. Are they for capability, or are they just an identification. At this point I'm starting to think that maybe the way we designed it for the gimbal manager is too confusing. You don't know how many times I have explained "a component is not a gimbal manager, a component implements a gimbal manager. Maybe it's time to give up on that concept and instead use multiple heartbeats to deal with this. However, this would probably mean that we have to just ignore the sequence numbers in the header, at least as an indicator for lost messages. They already do not work for this purpose as soon as a mavlink router is on the link sending non-broadcast messages to only a specific component. (Sorry, I'm opening another can of worms here but I could not explain it without all that context.)

this still would make lots of sense to me, if possible
no uid ?? sad
no custom_name ?? sad
no custom_cap_flags ?? sad
no meta data contents flags ?? sad

Oh, so you want to remove these flags? But then add them for COMPONENT_INFORMATION_BASICS? I guess we could do that. It will create some churn given Gremsy is already implementing v2, but there is a chance this could be done.

@olliw42
Copy link
Contributor

olliw42 commented Nov 26, 2021

The gimbal manager would have to send gimbal_manager_status anyway,

Yes that's exactly the point that I'm trying solve. If we have this capability flag, then the gimbal manager does not have to send gimbal_manager_status for discovery which can be wasteful and suboptimal. And means a ground station has to wait for quite some time to be sure what's out there. I learnt that this is suboptimal and a quick request for capabilities would probably be better. That's really the whole point of why this PR exists. I realize that I should have spelled that out explicitly.

I think I got the intention without you saying so explicitly. Nevertheless, I think you're argument is somewhat flawed because the gimbal manager will send status whatever what. It's not like you first would enable it based on whether what you find in component_info_basic . You just need to spy, as much as you "only" need to spy to heartbeats. That's different to e.g. parameters, Here you first need to actively trigger the action. In fact, by listening to the status you may know sooner about it than with the comp_info_basic thing ;)

Sure, having the flag for v2 doesn't hurt.

BUT, the problem I have with what I'm seeing here is that it rather looks like a quite significant change of principle of operation. The presence of a gimbal manager should be detected by the presence of the status message. But I somehow have a worry that this is changed now to looking at comp_info_basic. I mean, if you suggest that GCSes should now work dual ways, detecting a gimbal manager by both status and comp_info_basic, ok ... but then you can't really argue that it makes life easier, right !! But to replace listening to status by having to ask for comp_inf_basic and then to listen to status just doesn't make any sense to me.

I mean, for things like params, mavftp, and many other the comp_info_basic_ totally makes sense (e.g param ext vs non ext, what a relieve!!), and indeed makes it easier. But cameras and gimbals are not among them.

That one is debatable. This goes to the core of component IDs (or rather MAV_TYPEs) and what role they play.

one could debate it ... but it is not really debatable.

As I've said in another thread, if you are going to debate this (= break it up) you will open a completely new "box of pandora" as mavlink - for as long as it is as it was so far - has no concept for dealing with things which are not existant to it.

Oh, so you want to remove these flags? But then add them for COMPONENT_INFORMATION_BASICS? I guess we could do that. It will create some churn given Gremsy is already implementing v2, but there is a chance this could be done.

I want to suggest to consider if that could be a possibility ;) If possible it could be a significant streamlining (so that only camera_info is the odd duck)

Irrespective of that, in any case, I want to suggest to add the mentioned flags to COMP_INFO_BASIC ... so that one can benefit for all components

@julianoes
Copy link
Collaborator Author

by listening to the status you may know sooner about it than with the comp_info_basic thing ;)

Maybe, if it's sent with 1 Hz. I've recently seen systems where a lot of the bandwidth is consumed by camera and gimbal status (and information 🤦) and so I had to realize I was a bit careless around that. Once you have a network of a few components and stuff is broadcast everywhere it starts to add up.
Anyway, point being I'd rather have a quick: request->response to know what's there than to use more of the status/information broadcasts.

you will open a completely new "box of pandora" as mavlink

Yap 😞.

I want to suggest to consider if that could be a possibility

I'm all for it, but the reality will be that changing the message now, even though it's still marked WIP is likely to cause weird issues where suddenly the gimbal doesn't work anymore at all because the ground station does not "see" the gimbal_manager_information message which means:

  1. The gimbal v2 protocol doesn't work and is bad 💩
  2. I get requests to debug it, just to figure out all the mavlink versions floating around in a distributed system to come to the conlusion: "oh I think component X might be outdated, although it could also be component Y as the message is passing through there". 😱

So this is where my hesitation comes from. If it was a case where something is just wrong and we really have to fix it, then it would make sense but if it's merely for symmetry and having a cleaner spec, it might not be worth the trouble.

Irrespective of that, in any case, I want to suggest to add the mentioned flags to COMP_INFO_BASIC ... so that one can benefit for all components

Makes sense, let's think through it.

no uid ?? sad

So this would be 16 bytes, I assume? Uses up some space but we could have it at the end where it gets truncated.

no custom_name ?? sad

Could the custom name be in the json meta data? I guess that would be a bit annoying to actually do.

no custom_cap_flags ?? sad

Could custom capabilities be in the json meta data? This would allow to ship the description of the flag with it and would prevent random unknown/custom bits.

no meta data contents flags ?? sad

I would say that definitely goes into the json meta data, right?

@olliw42
Copy link
Contributor

olliw42 commented Nov 26, 2021

Maybe, if it's sent with 1 Hz.

for reasons you mentioned, I once suggested a dynamic rate - and that's what I do myself. When something changes I send it 4 times at high rate (can't recall, I think 4 Hz) ... you really need a high rate if you want your GUI to be responsive, but high rate is most of the time waste, so that's a perfect compromise
(a similar thing happens with camera, when you change a setting you want to get back the info on your GCS GUI if it was successful really soon, and not 5 secs after having initiated the change...)

so, my claim: it's not a problem per see and especially not of the concept, it's merely a matter of implementation. For that reason I don't buy the arguments.

(it's really just 5 lines of code to do the rate dynamically)

So this is where my hesitation comes from.

I fully understand that (and I think I did use "careful" language reflecting that LOL)

uid

whatever length, in mavlink there are already some uids with various length. 64 bit is probably the smallest reasonable.
that field can be really useful to identify things if you have several or even plenty of them in your stock

btw, it could also be used for dynamic comp ID allocation ;) (if mavlink ever would do that)(which I would find great)(and which would not be so difficult, see uavcan0.9)

Could the custom name be in the json meta data? I guess that would be a bit annoying to actually do.

IMHO absolutely not. All that effort for just one field?

Could custom capabilities be in the json meta data? This would allow to ship the description of the flag with it and would prevent random unknown/custom bits.

IMHO absolutely not. Yeah, there could be description ion the json. However, the idea of a custom flag would be that you (= mavlink) cannot make any assumptions about what it's being used for ;)

I would say that definitely goes into the json meta data, right?

no, very much to the contrary.
For the exact same reasons why you wanted the PR, just read all of your own words before. Do you really want to first trigger a request-response to know what info you possibly may find, and probably do not find. If you take what you said on the matter before you should say: ohh, yes ... :D:D

@julianoes
Copy link
Collaborator Author

When something changes I send it 4 times at high rate (can't recall, I think 4 Hz)

Sure, that makes sense. But you would still use 1 Hz for normal, and from my recent experience, having a ton of different topics at 1 Hz adds up, hence my caution around this.

no, very much to the contrary.

Ok, I guess I did not understand what you meant with meta data contents flags. Can you give an example?

@olliw42
Copy link
Contributor

olliw42 commented Nov 26, 2021

But you would still use 1 Hz for normal,

I do, but you also could just do at 1/3 Hz per default ... as one needs it

and unless there is a way to enable/disable e.g. the gimbal manager - which there isn't - at least not according to the spec - your arguments are artificial since it is going to send irrespective of that cap flag or not ... you don't gain anything concerning that mater by this

Ok, I guess I did not understand what you meant with meta data contents flags. Can you give an example?

somewhere and sometimes back there had been a flags field which told which meta data types are in the meta data file, this is what I am referring to
it would be basically a flags field holding the COMP_METADATA_TYPE's (i.e. a bit for each type)( i.e. (1<<COMP_METADATA_TYPE_GENERAL) )
so far you would get that info by i) requesting COMP_INFO and ii) reading the general json, just to potentially learn that you did that for nothing because the meta data what you are after isn't provided ...
this goes 1to1 along the lines of your argumentation

@julianoes
Copy link
Collaborator Author

I do, but you also could just do at 1/3

Yes but this means that you wait up to 3 seconds for the gimbal interface to appear in QGroundControl, as an example which is really at the upper limit of patience, or of what is expected.

With the request/response it can be almost immediate, almost as low as ping. That matters for a nice UX.

so far you would get that info by i) requesting COMP_INFO and ii) reading the general json, just to potentially learn that you did that for nothing because the meta data what you are after isn't provided ...

Oh yes, absolutely! I thought this was implemented in COMPONENT_INFORMATION but you're right it's not! Instead we have two URL fields hard-coded to general and peripheral. That doesn't make any sense 😞.

@olliw42
Copy link
Contributor

olliw42 commented Nov 26, 2021

Yes but this means that you wait up to 3 seconds for the gimbal interface to appear in QGroundControl, as an example which is really at the upper limit of patience, or of what is expected.

I well may miss the key point which drives you

yeah, the GCS may draw "something", but it couldn't draw anything which could be working, so it would be something with dead elements which only could become live or disappear as info is coming in

and especially I don't see at all how you would prevent the 1 Hz flooding ... only if you could enable/disable the gimbal manager from sending the status it would help anything (I'm saying this for the 3rd time, and haven't yet heard any counter argument)

Oh yes, absolutely!

even if it were in COMP_INFO, it would be better placed in COMP_INFO_BASIC

@julianoes
Copy link
Collaborator Author

so it would be something with dead elements which only could become live or disappear as info is coming in

Sure but having the elements dead for 3 seconds is not optimal for a UX, and with AMC for instance that's not how it works, and that's not really up to me.

only if you could enable/disable the gimbal manager from sending the status it would help anything

Yes, by default it would not send a status.

even if it were in COMP_INFO, it would be better placed in COMP_INFO_BASIC

Doesn't it need to be related to a URL though?

@olliw42
Copy link
Contributor

olliw42 commented Nov 29, 2021

so it would be something with dead elements which only could become live or disappear as info is coming in

Sure but having the elements dead for 3 seconds is not optimal for a UX, and with AMC for instance that's not how it works, and that's not really up to me.

but this would be how it still would be !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

frankly, I think you haven't yet really thought this out. How should it be possible to be responsive just based on this flag in COMP_INFO_BASIC? You still have to go through all the process of getting all the 1Hz and other data and request-responses to be able to make the GUI do anything ... so all you get is a thing showing to the user "hey, in few seconds there will be this and that available, but please be patient, it still will take still seconds..."

only if you could enable/disable the gimbal manager from sending the status it would help anything

Yes, by default it would not send a status.

so you are completely changing the whole way of how the whole things are working - in a TOTAL NON-STANDARD way !!! at least there is no standard for that nor is there any proposed changes/addition to the standard I would have seen ...

come on, seriously, that's dirty, isn't it ?

I guess you want to use a custom_cap flag, and then you can do all the dirty tricks you seem to have in mind ;) I'm not saying the dirty tricks wouldn't be good, but but unless they are or become standard they are dirty

(ironically, I envisioned such enable/disable function when I made my gimbal protocol v2 1.5 years ago: https://github.com/mavlink/mavlink/blob/master/external/dialects/storm32.xml#L318-L327, https://github.com/mavlink/mavlink/blob/master/external/dialects/storm32.xml#L417-L424 LOL)

even if it were in COMP_INFO, it would be better placed in COMP_INFO_BASIC

Doesn't it need to be related to a URL though?

no

it of course should match what one would find if one would call the general json in that URL

I guess you explained before that adding the URL also to COMP_INFO_BASIC is not an option. So with this flags the process could be

  • get COMP_INFO_BASIC
  • from the info in there check if there is a json which may have info which you want, if so request COMP_INFO
  • from the info in COMP_INFO get the URL and start the process to get general json, and else you want

@julianoes
Copy link
Collaborator Author

Sidenote: I'm trying hard not to be offended over here 😂. What I suggest is apparently not thought through and dirty. But it's only Monday morning and I have some patience, so no worries ;).

That being said, let's think through it.

How should it be possible to be responsive just based on this flag in COMP_INFO_BASIC? You still have to go through all the process of getting all the 1Hz and other data and request-responses to be able to make the GUI do anything

Ok, so let's look at that specific example of having camera and gimbal controls show up (or be active rather than greyed out):

These are the steps I'm imagining, and let's assume a round trip (ping) time of 100ms:

  1. Wait for first heartbeat (up to 1s) from a component with MAV_TYPE_CAMERA.
  2. Once that has arrived, immediately fire a REQUEST_MESSAGE command for CAMERA_INFORMATION, VIDEO_STREAM_INFORMATION, COMPONENT_INFORMATION_BASIC. (These can be sent without waiting for an ack each, so "pipelined", meaning we might have a response to all of them after e.g. 130ms if all goes well and we assume 10ms per command.)
  3. Given the flags in COMPONENT_INFORMATION we can see that the camera also implements a GIMBAL_MANAGER, therefore we send another REQUEST_MESSAGE for GIMBAL_MANAGER_INFORMATION. (This would take another 110ms).
  4. Also at this point the ground station would start downloading the camera definition file (and maybe other component definition files, unless they are already cached. (Wild guess: 500ms?)
  5. At this point the ground station can use the command SET_MESSAGE_INTERVAL to set up the rate that it wants for CAMERA_STATUS, VIDEO_STREAM_STATUS, GIMBAL_MANAGER_STATUS, GIMBAL_DEVICE_ATTITUDE_STATUS, etc.

So, by requesting all the information on connecting we can potentially be faster, in this example 1.24s instead of ~3s. Now, I know there are caveats like too much traffic right on connect, slower ping, all sorts of things can go wrong, sure, however, the broadcasting also has its challenges, let's talk about these next. And maybe that makes my motivation clearer.


so you are completely changing the whole way of how the whole things are working - in a TOTAL NON-STANDARD way !!! at least there is no standard for that nor is there any proposed changes/addition to the standard I would have seen ...

One thing I have learned in the last year or so, is that MAVLink has been designed (and you could say "standardized") around the 1 to 1 model: one ground station talking to one drone. And fixed message rates and "broadcast" messages work really well in that context, even if you add one more ground station or maybe one more drone in the air. However, once you add components, many things don't scale too well anymore. For instance, it's not out of the ordinary in my day to day to have the autopilot connected to a Linux companion computer where we have a camera component running, a logging component running, as well as a MAVSDK instance connected to it trying to do something smart, and then a link to a ground station, and yet another MAVSDK instance on the ground. So that's at least 6 components in many cases, so it's clear that this leads to a lot of broadcast messages being sent all over the place, and it doesn't scale well. Even some small UDP status message at 1 Hz start to add up with all the other status messages, heartbeats, etc. on the network.

From my reading there are two approaches to tame this mess:

  1. Send messages slower (as you suggested but that does not scale).
  2. Filter out messages (e.g. mavlink-router supports message filtering to certain endpoints).
  3. Use some sort of broker where each component registers for messages, or requests certain messages one by one.

Whether we should go towards 2. or 3. (or both) with MAVLink depends on various things, one of them being how much control a developer has over the system. E.g. if you can't control some of the components, a broker model would make sense as each individual component can configure what it wants. If however you have full control over the whole system, then you could also consider just filtering the messages in one place because you know what is sent where and needs to arrive where and at what rate.

Now you might say "well this is obviously a bigger topic that needs to be talked about but has nothing to do with the changes I suggested". Yes and no. I believe with every spec change or addition that we do we move in a certain way. And I don't think this is standardized in either way.

Let's look at the example of how the messages work in the simple autopilot to ground station case:

  • For QGroundControl with PX4, all messages are broadcast at "sensible" rates. These rates are configured on the PX4 side.
  • For MissionPlanner with ArduPilot, only heartbeat messages are sent, and MissionPlanner requests all the streams it needs upon connection. (I'm not 100% sure on this but that's what I remember from when I tried it every now and then.)

I actually prefer the latter way, where a ground station configures the stream because the ground station knows what it needs, and some ground stations might need something else, and so do MAVSDK instances, or OSDs, or whatever you connect.
And actually, for PX4 we talked about changing this but - who would have thought - it's not so easy without breaking things. (And who wants to invest time in something that already "works".)

So given this mismatch I would argue this is actually not standardized overall. Yes, we tried to somewhat give recommendations for the messages of the gimbal protocol, correct, and that's probably the part I'm questioning. So maybe before discussing the specifics here, I should make a bigger issue about how we want to deal with this in the future. And that will inevitable lead to a broader MAVLink 3 discussion.


get COMP_INFO_BASIC
from the info in there check if there is a json which may have info which you want, if so request COMP_INFO

Sounds suboptimal. I would rather add that flag to COMP_INFO as an extension.

@olliw42
Copy link
Contributor

olliw42 commented Nov 29, 2021

sorry if my wording didn't come across well ... my apologies

concerning the long part I'll respond later (I'll try to convince you that it's a logically/conceptually wrong path)

get COMP_INFO_BASIC
from the info in there check if there is a json which may have info which you want, if so request COMP_INFO

Sounds suboptimal. I would rather add that flag to COMP_INFO as an extension.

I really don't understand you here. You constantly argued that the benefit of that g_m_v2 flag is to reduce the time to action, and not having to do or wait for all this request-response or 1Hz stuff ... I think I now could pull out plenty of cites quoting you :)

If you were to take this consideration serious, i.e., not only apply to the one case you have in mind, but allow it to be valid in other cases too, then your response is difficult to understand ... as it does exactly what you are saying you want to generally achieve

Since the differences seem to not be clear , let me make the steps explicit

A) no meta type flags anywhere (like now)

    1. request COMP_INFO
    1. read URL and get general json
    1. digest general json to see what you have available

=> it needs 3 steps to know what meta types are available

comment: this is the fastest route to that info. Since there is a COMP_INFO flag in the cap flags of COMP_INFO_BASIC, on most likely always would do

    1. request COMP_INFO_BASIC
  • check cap flags, if COMP_INFO go on, else end
    1. request COMP_INFO
    1. read URL and get general json
    1. digest general json to see what you have available

=> it needs 4 steps to know what meta types are available

B) meta type flags is in COMP_INFO message (as it once was)

    1. request COMP_INFO
  • check meta type flags, if desired meta data available go on, else end

=> it needs 1 step to know what meta types are available

comment: the same comment as in A applies, so most likely one actually does

    1. request COMP_INFO_BASIC
  • check cap flags, if COMP_INFO go on, else end
    1. request COMP_INFO
  • check meta type flags, if desired meta data available go on, else end
    1. read URL and get general json
    1. digest general json to see what you have available

=> it needs 2 steps to know what meta types are available

C) meta type flags is in COMP_INFO_BASIC message (my suggestion)

    1. request COMP_INFO_BASIC
  • check cap flags and meta types flags, if COMP_INFO and desired meta type available go on, else end

=> it needs 1 step to know what meta types are available

comment: the same comment as in A applies, so most likely one actually does

    1. request COMP_INFO_BASIC
  • check cap flags and meta types flags, if COMP_INFO and desired meta type available go on, else end
    1. request COMP_INFO
  • check meta type flags, if desired meta data available go on, else end
    1. read URL and get general json
    1. digest general json to see what you have available

=> it needs 1 step to know what meta types are available

so, C is the most fastest way to know what meta types are there

@julianoes
Copy link
Collaborator Author

julianoes commented Nov 29, 2021

my apologies

No worries, thanks!

I was optimizing the path to know what meta types are there and how to get them. At which point B and C are the same speed, right?

Sounds suboptimal. I would rather add that flag to COMP_INFO as an extension.

What I meant, and I think you might not have understood me either, with this was really B)! I meant with "as an extension" not that it's in the file but that the enum is added to the message after <extensions/>.

My argument was mostly that it's easier to connect type with url as both would be in the same message. Again, that's how it was initially until the message got messed up.

@olliw42
Copy link
Contributor

olliw42 commented Nov 29, 2021

I was optimizing the path to know what meta types are there and how to get them. At which point B and C are the same speed, right?

no
only when one assumes that one is not going to call COMP_INFO_BASIC, B & C would be same speed. If however that were a reasonable assumption one would not need that COMP_INFO cap flag in COMP_INFO_BASIC in the first place, i.e., the whole logic goes mayhay. Therefore, the reasonable assumption is that one is first calling COMP_INFO_BASIC, and then are taking the next steps based on it's info.

If one is accepting this at reasonable, B is slower than C by a whole request-response cycle (= 3 sec?). And this is the whole point.

Sounds suboptimal. I would rather add that flag to COMP_INFO as an extension.

What I meant, and I think you might not have understood me either, with this was really B)! I meant with "as an extension" not that it's in the file but that the enum is added to the message after <extensions/>.

I had understood this, i.e., that you wanted B with a meta types flag in the message (like it once was)

I guess the ideal would be to have the meta type flags in both COMP_INFO_BASIC and COMP_INFO. But if one were to decide to have it only in one, then IMHO it should definitely go into COMP_INFO_BASIC. For the reasons said.

My argument was mostly that it's easier to connect type with url as both would be in the same message. Again, that's how it was initially until the message got messed up.

yeah, and I would not disagree ... however, that message got screwed up already in other was as was concluded before in one of your previous responses in the above. So, the option does not seem to be "what does make most sense" but only "what does make most sense under the given constraint" ... and under the given constraint I would go for the flag in COMP_INFO_BASIC. ;)

@julianoes
Copy link
Collaborator Author

only when one assumes that one is not going to call COMP_INFO_BASIC

Right, fair.

If one is accepting this at reasonable, B is slower than C by a whole request-response cycle (= 3 sec?). And this is the whole point.

A request-response cycle should be more like ~100 ms, as much or little as a ping, maybe 500ms in practice but certainly not 3s, unless I'm missing something?

I can see that if your assumption is that a request-response takes 3s, then you would go for C). If however, you're requesting COM_INFO and COMP_INFO_BASIC in parallel immediately, then you have all information within less than a second. And at that point B) is more intuitive because data and meta data that belong together are actually in the same message. If they are spread across messages, that's not as intuitive. Anyway, I can live with both. If we don't want (or can't) add them to COMP_INFO because that message is already full, then this discussion is mute anyway. (Unless we consider replacing the existing COM_INFO with something that makes more sense.

@olliw42
Copy link
Contributor

olliw42 commented Nov 29, 2021

A request-response cycle should be more like ~100 ms, as much or little as a ping, maybe 500ms in practice but certainly not 3s, unless I'm missing something?

QGC has timeouts of several seconds, for many things, if I'm not mistaken. So, while it can be true that the cycle will be just few 100 ms if there is a positive response ... but if not, and be it because of a losy low-bandwidth link, it can be up to this time out. And obviously one is not going to get 100 response-requests done in few 100ms either.

I find that objection puzzling. I mean, if the GCs would know that the response will be positive and/or it all would just take 100 ms then there would be no need for all what is the current topic

I can see that if your assumption is that a request-response takes 3s, then you would go for C).

no
I would go for C because for exactly the same reasons you want to go with e.g. gimbal_v2 flag, or any of the other flags !!!
Maybe you briefly remind yourself what these arguments were, and when just apply to the meta type flag here too. That's all of my argument.

yes, if one could do all desired request-responses in few 100 ms and get all 1Hz signals within 1.5 seconds then there would be no issue at all. Again, didn't you argued before that this however is not the practical reality. I think that if the user had to wait only 1.5 sec at most for the GUI to build up, there would be no concern. Pl remind your own arguments :)

@julianoes
Copy link
Collaborator Author

QGC has timeouts of several seconds, for many things, if I'm not mistaken. So, while it can be true that the cycle will be just few 100 ms if there is a positive response ... but if not, and be it because of a losy low-bandwidth link, it can be up to this time out. And obviously one is not going to get 100 response-requests done in few 100ms either.

Good point. Let's look at this.

losy low-bandwidth link

Let's call it a lossy link. It can be lossy if low or high bandwidth, so I wouldn't confuse these terms.
For a lossy link, a retry is sent after 0.5s for QGC if I'm not mistaken, and presumably up to 10 retries. So, if we lose a few messages here and there, hopefully not two in a row, then that would mean ~0.6s (if we continue with the assumed numbers).

If we have a terrible (merely functioning) link then yes, we will wait several seconds, as each command will cause more retries. At this point, there is not much we can do, in my opinion. If we can't communicate properly, then the UX is going to suffer.

Now there is one consideration when discussing (broadcast) messaging/streams vs. request-response, and that's probably latency. If latency is really high, then streams will likely work better. If latency is low, then request-response might be a breeze.

On the other hand request-response is easier to debug because if you get timeouts, then you know it's a problem with the link, however, if you don't get a GIMBAL_MANAGER_STATUS message in 3 seconds you don't know if it is not being sent at all, or if the link is bad.

Maybe you briefly remind yourself what these arguments were, and when just apply to the meta type flag here too. That's all of my argument.

I really don't know how to remind myself of my arguments. If I could I would already understand them 😂.

But if the meta flag is also relevant for COMP_INFO_BASIC, then of course it can be there, sure! I connected the meta data flag to the json file only, that's why I wanted it "shipped in the same message".

@olliw42
Copy link
Contributor

olliw42 commented Nov 29, 2021

Let's call it a lossy link. It can be lossy if low or high bandwidth, so I wouldn't confuse these terms. For a lossy link, a retry is sent after 0.5s for QGC if I'm not mistaken, and presumably up to 10 retries. So, if we lose a few messages here and there, hopefully not two in a row, then that would mean ~0.6s (if we continue with the assumed numbers).

If we have a terrible (merely functioning) link then yes, we will wait several seconds, as each command will cause more retries. At this point, there is not much we can do, in my opinion. If we can't communicate properly, then the UX is going to suffer.

Now there is one consideration when discussing (broadcast) messaging/streams vs. request-response, and that's probably latency. If latency is really high, then streams will likely work better. If latency is low, then request-response might be a breeze.

On the other hand request-response is easier to debug because if you get timeouts, then you know it's a problem with the link, however, if you don't get a GIMBAL_MANAGER_STATUS message in 3 seconds you don't know if it is not being sent at all, or if the link is bad.

I guess I'm a bit lost now

technically, theoretically, you can get all info in 2 secs. In this time span one theoretically can do all requests as well as receive all 1 Hz messages. I continue to argue if it indeed were so that all would be finsihed in 2 secs we wouldn't have this discussion.

If I'm mistaken and you want to cut the 2 secs down to 500 ms, then pl correct me. And everything I said would indeed be obsolete (while I still would not understand how your proposal would allow this).

So, my baseline is to assume that the underlying problem is that in the practical situation that's not the case but the times are longer to an extend that it's annoying users.

We now could discuss all the details for why it is longer, and do this for each message type, each link type, and so on. But I argue that's not so relevant, since what counts is that time to finsih is simply to long, so let's save time where possible.

You seem to put out now an argument that request-responses are about OK, but that the 1 Hz messages are the trouble makers.

While I would not immediately agree, let's not focus on what I think, but let's wonder about this:

If the concern is only about the 1 Hz messages, why are there then so many other cap flags in COMP_INFO_BASIC?

Let's see:

  • PARAM vs PARAM_EXT: that's request-response
  • COMPONENT_INFORMATION: that's request-response
  • MAVLINK_FTP: that's request-response
  • EVENTS_INTERFACE: that's request-response

häää?

why the heck are there these flags?

now, an answer to this is NOT needed, since my argument is much simpler:

since there are these flags one - at least - should be consistent. Ergo, put meta type also into COMP_INFO_BASICS.

All I'm saying. The reason you've put in these flags makes it that also COMP_INFO_BASICS shoudl go in - the actual reason does not matter, just the fact that there seem to have been reasons to put them in.

The only way out for you of this line of argument is to break it at some point, e.g. by saying "yeah, I really like the MAVFTP flag, but don't like a meta types flags, but I have no reason except of that".

@olliw42
Copy link
Contributor

olliw42 commented Nov 29, 2021

btw, your rasied argument with gimbal manager status holds also for any heartbeat ... so if you would take that argument serious you should have voted also for adding "zillions" of flags for each possible component in the system

you don't. That's the bottom of my earlier "dirty procedures" response

@julianoes
Copy link
Collaborator Author

I guess I'm a bit lost now

Well you argued that request-response was slow with QGC because of lossy links, so I took that claim apart and analyzed why.

You seem to put out now an argument that request-responses are about OK, but that the 1 Hz messages are the trouble makers.

Yes, this was my learning in the last year. This goes back to what I elaborated in the long text above. It's not so much that 3s is a problem but rather that it doesn't scale with number of components. My assumption one year ago was that it is not a problem to send status messages at 1 Hz. However, once you have 3+ different status messages being sent all over the place it's not "free" anymore, there is some overhead. Therefore, one solution is to go back to 1/3 Hz, however, that makes the initial UI to appear slower, from up to 1s to up to 3s.
At that point my thought is: well if we used request-response to do this discovery, we have two benefits:

  1. Faster discovery, e.g. less than 1s for some cases
  2. Less wasted traffic.

The only way out for you of this line of argument is to break it at some point, e.g. by saying "yeah, I really like the MAVFTP flag, but don't like a meta types flags, but I have no reason except of that".

I have said above that I don't have a strong opinion about these flags, except that I think they make sense. Then I was worried that they are in the wrong message but I was confused. They are actually fine in COMP_INFO_BASICS as they also apply to the COMP_INFO_BASICS message, so your suggestion makes complete sense, now that I get it.

Now you still seem to be insisting that somehow I'm wrong and contradicting myself and that I'm suggesting dirty procedures. I'm not 100% sure if you're trying to proof that I'm wrong or if you want to find a solution together.

@olliw42
Copy link
Contributor

olliw42 commented Nov 29, 2021

Yes, this was my learning in the last year. This goes back to what I elaborated in the long text above. It's not so much that 3s is a problem but rather that it doesn't scale with number of components. My assumption one year ago was that it is not a problem to send status messages at 1 Hz. However, once you have 3+ different status messages being sent all over the place it's not "free" anymore, there is some overhead. Therefore, one solution is to go back to 1/3 Hz, however, that makes the initial UI to appear slower, from up to 1s to up to 3s.

the problem is that there are two different arguments in the game, and they are constantly flipped.

I cite from the first post:
"This came up because a gimbal manufacturer would like to have an easy way to know if a system (drone or camera) supports the v2 gimbal protocol. ... The alternative is sending a MAV_CMD_REQUEST_MESSAGE for GIMBAL_INFORMATION."

so here it is clearly about the initial stages of getting things up and running.

only and only in this context does the discussion on the gimbal v2 and meta type flags make any sense.

here you however relate to an argument which - IMHO - is used in a quite odd way:

If it would be about bandwidth cost which exists permanently, i.e. after things are up and running, it could be a concern, but I think I have given compelling arguments: the flags in comp_info_basic will not help with that and there is a simple way around by dynamically adjusting the rate

You however apply it as argument for the initial startup phase ("makes the initial UI to appear slower"), which is odd because:

  • of all the traffic which goes on in the initial states, the 1 Hz status messages are really the least concern. I mention all the request-response traffic, parameter traffic, data stream traffic, and heck else. (I think a log of the traffic in the first 15-30 secs will show this even for your system - it surely does for my system)
  • the flags in comp_info_basic won't help at all here. All it would tell you is that such a message might (!!!!) be coming, but you still have to wait for that message in order to have the info in that message. Ergo, you are simply not able to display anything which the user would consider as "responsive GUI" unless you have waited for the status message. That flaw of argument applies to other messages/protocols too.
  • there are more and other 1Hz messages

so it totally fails to achieve your point 1. ("Faster discovery, e.g. less than 1s for some cases")

it also fails your point 2. ("Less wasted traffic."), since - to say it again - it only would help if you had a mechanism to enable/disable the status message.

so, you said, yes you will have such a enable/disable mechansim. However: Where is this mechanism in the standard which you follow? It is not there. So you need to play tricks which are beyond the standard (which is what I called dirty)(not dirty as in morally dirty).

but you don't seem to note the inconsistency in your argument: If the status is disabled, how could you possibly get then the info which it carries sooner to make the GUI appear faster ????? You will enable it much later since you don't want it in the initial phase to avoid the traffic ... how much clearer can a logical flaw be ?

proof that I'm wrong or if you want to find a solution together.

I don't try to proof you wrong. I try to point out the inconsistencies in a line of reasoning which leads to message design which - IMHO - isn't sound. As such I consider it a helpful prerequisite towards a solution. And indeed I tend to try to do that in a form reminescent to proofs, but that's just the way I think.

I think I offered some "solutions" to the matter of reducing the permanent traffic after startup and the meta types, but you are correct that in the particular matter of gimbal status v2 I have only pointed at why I think the current won't be very helpful but have not offered an alternative solution.

In fact, IMHO, there is no solution. One simply can't get the status information before the information is beeing sent out, and knowing it will come is not equal to the information itself (unless it's encodable in 1 bit LOL).

This can be very different to all the "_INFO" type of messages, which are one-time descriptions of particular parts of the system, and which currently are mostly obtained by some request-response types. Here, in principle improvements are possible. Logically this is so because these info could in principle even been known in advance.

@julianoes
Copy link
Collaborator Author

You're right that I started off trying to fix the discovery for the gimbal manufacturer. While thinking through it all I have, however, the learnings also in mind, for instance that the constant broadcasting of several status messages seems inefficient in a setup with several components. So, agreed, I'm trying to solve both problems at once.

of all the traffic which goes on in the initial states, the 1 Hz status messages are really the least concern. I mention all the request-response traffic, parameter traffic, data stream traffic, and heck else. (I think a log of the traffic in the first 15-30 secs will show this even for your system - it surely does for my system)

That's unfortunately true. There is potential for improvement there.

the flags in comp_info_basic won't help at all here. All it would tell you is that such a message might (!!!!) be coming, but you still have to wait for that message in order to have the info in that message.

No you don't have to wait for it. You can request it using REQUEST_MESSAGE. You can anyway request it but without having the flag tell you that gimbal v2 is implemented you don't even know which component to request it from. And if you request it from all components (compid 0) then it gets tricky because some might ack with UNSUPPORTED, and others with ACK, and others with DENIED, and you have to filter through it all. So instead of sending a REQUEST_MESSAGE to everyone for particular messages it would - in my opinion - beneficial to get an overview using these flags.

it also fails your point 2. ("Less wasted traffic."), since - to say it again - it only would help if you had a mechanism to enable/disable the status message.

By default it's disabled. Then it can be requested once using REQUEST_MESSAGE. And if you need it at a certain rate you configure (enable) it using SET_MESSAGE_INTERVAL (I mentioned that above).

so, you said, yes you will have such a enable/disable mechansim. However: Where is this mechanism in the standard which you follow? It is not there.

See above. It's there, and has been for a long time, and is used by ArduPilot extensively if I'm not mistaken (unless maybe the previous but similar command is used).

You will enable it much later since you don't want it in the initial phase to avoid the traffic

A ground station can order these things as it requires them. If gimbal and camera is most important it requests these first, and params and mission later.

I try to point out the inconsistencies in a line of reasoning which leads to message design which - IMHO - isn't sound.

Ok fair, I appreciate that. I guess I was not clear in laying my reasoning out properly.

@olliw42
Copy link
Contributor

olliw42 commented Nov 30, 2021

many thx, this was insightful

indeed, you are correct, I failed to recognize the request_message mechanism. I was indeed thinking in terms of enabling/disabling the gimbal manager, but failed to see that what you have in mind is to simply enable/disable sending the status message only. This certainly a good idea. Thx for explaining this, and I certainly take back what I have said concerning this before.

it is good to hear that there indeed had been a shift of intention. However, what are we then now talking about? Are we talking only about the broadcasted status messages? Are we talking only about getting the GUI response better? Or both?

also, yet, you still circumvent addressing how this flag would provide substantial help. Especially - and I emphasize this - ALL the three bullets in my previous post do still stand !

and especially the logical flaw still applies, which was that you can't get the GUI to be more responsive by delaying reception of the respective messages. Your request-the-status -message approach obviously makes the GUI less responsive.

and also the fact that it is just one 1Hz message in the huge bunch of other startup messages is noteworthy (= all this is, in German, nur ein Tropfen auf den heißen Stein).

and you just argued away the need of that flag yourself. Request-response cycles are considered swift and not much of a burden. Further, it also gives the info of the presence of the gimbal manager. Further, it needs to be done anyways if one wants the message. So, at the point of time where the GUI wants to know about and get the status message it simply could do that request-response. So, the flag is not needed at all for proper operation and achieving the goal of reducing the 1Hz burden.

The only benefit I do see of that flag would be that the GUI could - by some algorithm which potentially needs to be quite sophisticated (and hence difficult to implement) to be effective - arrange the order of what it does in order to achieve a minimal startup time - which seems you also suggest. HOWEVER: this ONLY could work in any meaningful way if you have NEARLY COMPLETE INFO on the system available from the beginning. So, there need to be many many more such flags.

you created just more questions. E.g.:

  • What happens if you have two gimbals in the system ("unfortunately" VERY realistic!!!)?
  • What happens if the gimbal manger is in the gimbal, and not the autopilot, how could the autopilot report that flag in its COMP_INFO_BASIC message? This relates to the 3rd paragraph in your last post. You explained the "issues" with a broadcast request
  • what is actual the role of the "old" methods of heartbeat, mav type, comp id? The flags would make them largely obsolete as a mechanism of discovery.

etc. pp.

Please don't get me wrong. I am not fundamentally against this flag. However, I am very worried that it is going to cause more problems than it "solves", or even will break up things. And this is because - IMHO - it's conceptually the wrong approach. I still owe you a response to your longer Px4/ArduPilot post, I need to do it. I also always for looking for consistent approaches. Like: if there is flag for gimbal manager, one also would want a flag for camera, and if there is a flag for camera, one would ... ... ... ...

@olliw42
Copy link
Contributor

olliw42 commented Nov 30, 2021

so. let me hold my promise and let me share my thoughts (concerns) from a broader perspective.

I will only look at the startup situation (GUI responsiveness), since I (continue to) think that once started up there is no real issue, also not with the 1 Hz messages (and if it should be rarther the heartbeats which should worry). One can request them as needed (as you tought me), one can send them with dynamic rates (which works great for me), etc. IMHO all tools are already there to get along with this.

Furthermore, concerning startup, the issue is "only" that of giving the GCS/GUI the information it needs/wants in some timely fashion. It is not about communication between components of a system. For the latter I do not see a solution to reduce traffic which would fit into the fabric of mavlink. So, let's consider how we can help the GCS/GUI.

The problem which is attempted to be attacked with the flags, and the comp info messages per see, and other similar messages, is to communicate to the GCS/GUI some information on how the system looks like, so that it can produce usefull GUI elements. The new aspect introduced in this PR here is to also communciate this info as early as possible so that the GCS/GUI can do something usefull as well as do so in a timely manner. However, either way, it is about giving that info to the GCS/GUI.

But: I have choosen the words right here when I wrote "information on how the system looks like" and not "information on how the components of the system look like".

That is, while some info might indeed be component-attached, such as if a component supports param ext or not, but other info might be system-attached, and not easily attachable to a single or specific component. As much as I can see, the gimbal_v2 flag would be of the second kind.

To further complicate the situation, some info might not even be attachable to any component in the system, since it is not actually a mavlink component, in the sense that it would emit a heartbeat and hence would be discoverable, but is an "external peripheral". You particularly referred to this as relevant in the above yourself. Such as "dump" gimbal for which there is "only" a gimbal manager on e.g. the autopilot.

What I'm trying to say is that many things would get easier, cleaner, or even only possible, if one would have info of the kind: the system has a gimbal, the system has camera, the system has this and that components, suports this and that things, and blabla.

So far, the GSC/GUI has to piece the info on the system together from the info it can get from the individual parts of it, which can even include info on parts which are not actually existing mavlink components and could speak for themselves.

So, when it comes to the startup topic, the common problem I see with all the attempts surounding comp info, external pheripherals, various status messages, discovery, types, and so on, is that it gives the GCS/GUI the info only in small junks from which it then, buy some more or less tedious and time consuming fashion, needs to piece together the complete picture of the system.

This analysis doesn't suggest a solution by itself. However, and this is my general believe, a correct diagnosis is prerequiste for a successful treatment.

This however may come to mind immediately: You may recall, we had a discussion to distinguish between implementer, intergrator, and user. For many things you voted for "the integrator should set parameters correctly"-type of solutions over "the system should autoconfigure itself"-type solutions. In this spirit, it looks to me that the most simplest and most canonical approach would be to have the implementer setting up the "parameters" correctly, and to not have endless flags distributed over endless messages emited by endless number of components and non-componets. One file, with all info, stored somewhere centrally on the system, into which the integrator writes all info. So, not a component description file, but a system description file. You would get it by a broadcast request to the system, and only one component in the system would response and deliver its url. Simple, clean, straight forward, fast, complete. The only downside, a person has to fill it in by manual labor.

btw, this could be made quite flexible in the sense that one can vary between "having it all in one file" and "having this and that in a file stored on this and that component". So, general.json would be that system_info thing, but there still could be urls on components (which however, for sure, would slow down the GUI; it's the integrators choice what she wants).

sorry for this huge text

@hamishwillee
Copy link
Collaborator

TLDR;

But this message is in development.xml, so it can change and I'm sure there will be a follow PR if I need to engage my brain :-)

@olliw42
Copy link
Contributor

olliw42 commented Dec 8, 2021

yeah, I realize this :)

however, there is a difference between "developing a message" and "putting a thing on different feet".

The outlined thoughts, and "immediate" suggestion suggests to not think in terms of components but in terms of a system of components.

Btw, to me it looks like it would be a quick, general, a quickly to realize and yet not so difficult to realize solution for all systems of which one wants to deploy many or should be a turn-key solution to a customer. I think it has a certain simplicity and power to solve the issue "once and for all" to it which might be appealing. You could amaze your customers by not only giving them what they wanted, a faster way to know if gimbal v2 is supported, but a generally faster way. That's at least how look at it :)

@julianoes
Copy link
Collaborator Author

@olliw42 sorry it took me a while to pick this back up.

So far, the GSC/GUI has to piece the info on the system together from the info it can get from the individual parts of it, which can even include info on parts which are not actually existing mavlink components and could speak for themselves.

True. That is indeed the situation with this distributed system.

In this spirit, it looks to me that the most simplest and most canonical approach would be to have the implementer setting up the "parameters" correctly, and to not have endless flags distributed over endless messages emited by endless number of components and non-componets.

This might look "simple" to you but it does not sound very doable to me. This is because not everything is known at "compile/integration" time. For instance this does not work when you have a drone with an autopilot, and a companion computer which supports swappable camera payloads. The companion computer interfacing to various camera, needs to change the camera definition at "runtime". When you swap from one to another camera payload, you might also swap out the gimbal which changes the availability of a gimbal manager in turn.

For swappable things there are two approaches. And I'm going to make the analogy with a computer peripheral like a printer or USB stick:

  1. You either have a central place where you configure it all properly. (Like you used to install all drivers 15 years ago in Windows.)
  2. You have standards that allow each device to be discovered and configured automatically. (In the same way that most standard devices just work in windows these days.)

Does that make sense? Maybe the analogy is not optimal but what I'm trying to say that we either come up with a standard (so mavlink) how these things can talk to each other.

Simple, clean, straight forward, fast, complete.

This, to me, is how you would describe a Mac. A wonderful black box that you don't get to customize.

But maybe you are right, and we are trying to hard to extend MAVLink to a standard for components and intra-vehicle communication, and we should instead strictly use it between a "system" and use other protocols inside. At that point we lose the benefit of having a standard gimbal and camera protocol that manufacturers can adhere to though which is something I have been invested in for the past couple of years.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants