Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check QoS policy when configuring intraprocess, skip interprocess publish when possible #664

Conversation

ivanpauno
Copy link
Member

@ivanpauno ivanpauno commented Mar 21, 2019

This is part of the intraprocess comm improvement (ros2/ros2#649, https://docs.google.com/document/d/1dM8O34GM-SbFps1vd1rocPMAg_7O9_-OG4NF4axOz7E/edit#)

As the current and proposed IPM doesn't complain with a non-volatile 'Durability' QoS policy, intraprocess comm shouldn't be configured in those cases.

  • Advantage: Solves compliance issues with QoS policies.
    e.g.: If a publisher and subscriber in the same process had been configured with QoS policy
    'durability=transient_local" and "history=keep_last", a late subscriber would receive the last message
    published, as this was not supported by the intra process manager.
  • Disadvantage: Intraprocess comm is not supported in those cases.

Previously, interprocess publish was always done. Now, it is skipped when only having intraprocess subscriptions.

  • Advantage: Less publish-to-subscribe latency, cpu usage, memory usage when the topic is only used for intraprocess comm.
  • Disadvantage: Probably, the time it takes to recognize a interprocess subscription done just before the publishing is longer than before.
    e.g.: Process B subscribes to topic /topic.
    Process A publishes to topic /topic just after process B subscriber.
    Since maybe the count updating time it's not the same as how the publish method checks its subscribers, a message is not received when before it was.
    (This hasn't been checked, it's only a possible drawback of this change)

@ivanpauno ivanpauno self-assigned this Mar 21, 2019
@ivanpauno ivanpauno added the in progress Actively being worked on (Kanban column) label Mar 21, 2019
@@ -231,6 +231,19 @@ PublisherBase::setup_intra_process(
IntraProcessManagerSharedPtr ipm,
const rcl_publisher_options_t & intra_process_options)
{
// Skip intraprocess configuration if the "durability" qos policy is not "volatile".
// TODO(ivanpauno):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good, in general, to have methods to check what is the real value of RMW_QOS_*_SYSTEM_DEFAULT. Or maybe, just delete all these defines and only support explicit configurations. Having unknown default values doesn't seem like a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct snippet of the TODO:

// TODO(ivanpauno):
// RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT is for FastRTPS and Connext also
// equal to "volatile". It would be good to add a
// is_durability_policy_volatile(rcl_publisher_options_t options)
// method to rcl, and corresponging methods in rmw and the rmw vendor
// implementations, which will also checks if "system_default" is "volatile" or not.
if (intra_process_options.qos.durability != RMW_QOS_POLICY_DURABILITY_VOLATILE) {
RCLCPP_WARN(
rclcpp::get_logger("rclcpp"),
"Enabled intraprocess communication with incompatible QoS policy.");
return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I agree with you that we should avoid implicit settings, which is what the SYSTEM_DEFAULT gives you, but there's a use case that some people want supported, which allows you to configure QoS settings from external XML files. That is what this allows for, so I don't know if we could realistically get rid of those.

We definitely should have a way to check the effective QoS settings after the publisher is created.

Also, I think this (re)raises the issue that use_intra_process_comms needs to be an option on publishers/subscriptions, and not on the node itself. Otherwise you'd be unable to use intra process and have a non-volatile durability on one of your publishers.

Once we have that, I'd say it should be an error (not a warning) to have both intra process enabled and transient local durability on the same publisher/subscription.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about having use_intra_process_comms option at publish/subscription level. Switching the warning to an error after that also sounds good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find this point unresolved, until we have the ability to know what the durability is when SYSTEM_DEFAULT is in use and until we have the ability to control which publishers and subscriptions have intra-process enabled.


To be clear about the first point, I think we need a way to inspect a publisher or subscription after creation to see which durability it is using. Something like rcl_ret_t rcl_publisher_get_actual_qos(const rcl_publisher_t *, rmw_qos_profile *), which may need an rmw equivalent.

I know this increases the scope of this issue considerably, but without it the warning is only going to work part of the time and would produce very subtle bugs for the user.

The alternative is an even bigger scope increase, which is to actually support transient-local durability correctly when intra-process is enabled.


For the second point, what I imagine is that the creation publishers and subscriptions have a three-way option which is "explicitly enable intra-process", "explicitly disable intra-process", or "default". Where "default" would be based on the intra-process setting at the node level.

I think this is important to do now (again I know it increases the scope of this issue), because otherwise you have to use a warning rather than an error to let the user know that they have conflicting configurations. Without it, you also have scenarios which are impossible to achieve, i.e. using intra-process and transient-local durability at the same time but on different publishers and/or subscriptions.

@ivanpauno ivanpauno added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Mar 21, 2019
rclcpp/include/rclcpp/publisher.hpp Outdated Show resolved Hide resolved
@@ -231,6 +231,19 @@ PublisherBase::setup_intra_process(
IntraProcessManagerSharedPtr ipm,
const rcl_publisher_options_t & intra_process_options)
{
// Skip intraprocess configuration if the "durability" qos policy is not "volatile".
// TODO(ivanpauno):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I agree with you that we should avoid implicit settings, which is what the SYSTEM_DEFAULT gives you, but there's a use case that some people want supported, which allows you to configure QoS settings from external XML files. That is what this allows for, so I don't know if we could realistically get rid of those.

We definitely should have a way to check the effective QoS settings after the publisher is created.

Also, I think this (re)raises the issue that use_intra_process_comms needs to be an option on publishers/subscriptions, and not on the node itself. Otherwise you'd be unable to use intra process and have a non-volatile durability on one of your publishers.

Once we have that, I'd say it should be an error (not a warning) to have both intra process enabled and transient local durability on the same publisher/subscription.

@@ -169,7 +169,7 @@ class PublisherBase

using IntraProcessManagerWeakPtr =
std::weak_ptr<rclcpp::intra_process_manager::IntraProcessManager>;
bool use_intra_process_;
bool intra_process_is_enabled_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the naming of the variable I see why you don't want to use use_ anymore since the decision about if it is actually being used is delayed. I think the order of the parts of the variable name was better before. I would suggest allow_intra_process_ instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree about this name change suggestion, but the other one is fine.

if (!allow_intra_process_ || has_inter_process_subscriptions) {

Isn't as clear to read in my opinion, as it reads to me one of "not allow intra process" or "allow not intra process", neither of which make sense to me, you have to look at it a while to get to "intra process is not allowed" and even then allow implies to me that it may or may not be enabled, which isn't the case. At this point it is either enable or it is not.

Whereas:

if (!intra_process_is_enabled_ || has_inter_process_subscriptions) {

Is very clear immediately, as it reads one of "not intra process is enabled" or "intra process is not enabled", which are better for me than the alternative you suggested.

So 👎 from me to change this variable name.

@@ -216,7 +216,11 @@ class Publisher : public PublisherBase
virtual void
publish(std::unique_ptr<MessageT, MessageDeleter> & msg)
{
this->do_inter_process_publish(msg.get());
bool inter_process_subscriptions_exist =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above I would suggest naming this one has_inter_process_subscriptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find this name any better or worse.

if (intra_process_options.qos.durability != RMW_QOS_POLICY_DURABILITY_VOLATILE) {
RCLCPP_WARN(
rclcpp::get_logger("rclcpp"),
"Enabled intraprocess communication with incompatible QoS policy.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the warning should be more specific about the consequence:

Skipping intraprocess communication with incompatible QoS policy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanpauno are you not planning to add the publisher/subscription option to enable intra-process and therefore make this an error? If you are planning to do so, then I think this warning does not need to be reworded as it will be removed.

Copy link
Member Author

@ivanpauno ivanpauno Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do it in another PR. For the moment, I will keep the warning reworded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really agree, I think warning isn't really sufficient, as I said in my other comment this basically prevents you from using intra-process and transient-local durability at the same time (even if on different entities). So after merging this pr we'll be potentially breaking existing use cases for a very narrow optimization. I would prefer to see the improvement done completely in this pr than have two pr's.

@ivanpauno
Copy link
Member Author

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@@ -231,6 +231,19 @@ PublisherBase::setup_intra_process(
IntraProcessManagerSharedPtr ipm,
const rcl_publisher_options_t & intra_process_options)
{
// Skip intraprocess configuration if the "durability" qos policy is not "volatile".
// TODO(ivanpauno):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find this point unresolved, until we have the ability to know what the durability is when SYSTEM_DEFAULT is in use and until we have the ability to control which publishers and subscriptions have intra-process enabled.


To be clear about the first point, I think we need a way to inspect a publisher or subscription after creation to see which durability it is using. Something like rcl_ret_t rcl_publisher_get_actual_qos(const rcl_publisher_t *, rmw_qos_profile *), which may need an rmw equivalent.

I know this increases the scope of this issue considerably, but without it the warning is only going to work part of the time and would produce very subtle bugs for the user.

The alternative is an even bigger scope increase, which is to actually support transient-local durability correctly when intra-process is enabled.


For the second point, what I imagine is that the creation publishers and subscriptions have a three-way option which is "explicitly enable intra-process", "explicitly disable intra-process", or "default". Where "default" would be based on the intra-process setting at the node level.

I think this is important to do now (again I know it increases the scope of this issue), because otherwise you have to use a warning rather than an error to let the user know that they have conflicting configurations. Without it, you also have scenarios which are impossible to achieve, i.e. using intra-process and transient-local durability at the same time but on different publishers and/or subscriptions.

@ivanpauno
Copy link
Member Author

To be clear about the first point, I think we need a way to inspect a publisher or subscription after creation to see which durability it is using. Something like rcl_ret_t rcl_publisher_get_actual_qos(const rcl_publisher_t *, rmw_qos_profile *), which may need an rmw equivalent.

Ok, I will add those methods.

The alternative is an even bigger scope increase, which is to actually support transient-local durability correctly when intra-process is enabled.

Supporting non-volatile durability is a really big change, specially in the case HISTORY=KEEP_ALL. I think we can do a big improvement in the intraprocess comm without tackling that.

For the second point, what I imagine is that the creation publishers and subscriptions have a three-way option which is "explicitly enable intra-process", "explicitly disable intra-process", or "default". Where "default" would be based on the intra-process setting at the node level.

I was thinking in adding a bool option to them, but a three-way option sounds quite better.

@wjwwood
Copy link
Member

wjwwood commented Mar 22, 2019

Ok, I will add those methods.

Thank you, I know it's a bit of a task, so if you need help let me know.

Supporting non-volatile durability is a really big change, specially in the case HISTORY=KEEP_ALL. I think we can do a big improvement in the intraprocess comm without tackling that.

I agree, that's why I want to instead properly plug this hole in the API until we can do the improvement.

Which also brings up the point, if there are any other QoS the intra-process are incompatible with at the moment, we should use these new "actual QoS" accessors to disable it in those cases as well.

I was thinking in adding a bool option to them, but a three-way option sounds quite better.

Again, a little more work, but I think it will make it easier for people to transition to from the current state.

@ivanpauno ivanpauno added in progress Actively being worked on (Kanban column) and removed in review Waiting for review (Kanban column) labels Mar 22, 2019
…p interprocess publish when only having intraprocess subscriptions

Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
@ivanpauno ivanpauno force-pushed the ivanpauno/skip-interprocess-publish-when-possible branch from 1ee7c94 to 8a176bf Compare March 27, 2019 16:33
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
@ivanpauno ivanpauno changed the base branch from master to ivanpauno/publisher_get_actual_qos March 27, 2019 17:14
@ivanpauno ivanpauno merged commit 026e947 into ivanpauno/publisher_get_actual_qos Mar 28, 2019
@ivanpauno ivanpauno removed the in progress Actively being worked on (Kanban column) label Mar 28, 2019
@ivanpauno
Copy link
Member Author

ivanpauno commented Mar 29, 2019

I merged this over ivanpauno/publisher_get_actual_qos unintentionally.
I will open a follow-up PR.

@wjwwood
Copy link
Member

wjwwood commented Mar 29, 2019

Wait did you revert it too? Please be careful to only merge things that have been reviewed and pass CI...

@wjwwood
Copy link
Member

wjwwood commented Mar 29, 2019

Wait was it merged into something other than master? That makes more sense 😀

@wjwwood wjwwood deleted the ivanpauno/skip-interprocess-publish-when-possible branch March 29, 2019 22:23
@ivanpauno
Copy link
Member Author

Wait was it merged into something other than master? That makes more sense

Yes, thanks for the correction! I merged this into ivanpauno/publisher_get_actual_qos branch unintentionally. I've already reverted the changes in that branch, and opened a follow-up PR here.

nnmm pushed a commit to ApexAI/rclcpp that referenced this pull request Jul 9, 2022
* Implement a generic way to change logging levels

This allows a per-logger configuration of the logging level,
along with the global level.

Signed-off-by: Chen Lihui <Lihui.Chen@sony.com>
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
Co-authored-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants