Only keep the last sample in the image tools by default. #238
You're using fastrtps right? I wouldn't have expected the publisher to keep the samples if there are no subscribers (unless we use transient local durability, which we aren't). But if this is the behaviour that we observe, then I agree it makes sense to change the default of the demo.
The help text of the demo/title of the PR are misleading because keep last will keep up to the queue size, not just 1 sample. I added e8e401a to improve the docs to more closely match https://github.com/ros2/ros2/wiki/About-Quality-of-Service-Settings
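To make the "up to the queue size" point concrete, here is a minimal sketch of how a history cache might behave under the two policies. `HistoryCache` and `HistoryPolicy` are hypothetical stand-ins written for this illustration; they are not the Fast-RTPS or rmw API, and real middleware stores serialized samples rather than `int`s.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a publisher-side history cache (not a real middleware API).
enum class HistoryPolicy { KeepLast, KeepAll };

class HistoryCache {
public:
  HistoryCache(HistoryPolicy policy, std::size_t depth)
  : policy_(policy), depth_(depth)
  {
    // This sketch assumes a non-zero queue size whenever KEEP_LAST is used.
    assert(policy_ == HistoryPolicy::KeepAll || depth_ > 0);
  }

  // Store a sample. Under KEEP_LAST the oldest samples beyond `depth` are evicted,
  // so up to `depth` samples are retained, not just one.
  void add(int sample)
  {
    samples_.push_back(sample);
    if (policy_ == HistoryPolicy::KeepLast) {
      while (samples_.size() > depth_) {
        samples_.pop_front();
      }
    }
  }

  std::size_t size() const { return samples_.size(); }

  int oldest() const
  {
    assert(!samples_.empty());
    return samples_.front();
  }

private:
  HistoryPolicy policy_;
  std::size_t depth_;
  std::deque<int> samples_;
};
```

With a depth of 10, the KEEP_LAST cache in this sketch never holds more than the 10 most recent samples, while the KEEP_ALL cache grows with every call to `add`.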
We should delay/keep this change in mind wrt ros2/rmw_fastrtps#202 which is referencing the existing defaults
Fast-RTPS, correct. Do you think the publisher keeping the samples even if there are no subscribers is a bug we should report to upstream Fast-RTPS?
👍
Good point. I'll keep this in progress until that is merged, then I'll retest.
I'm not sure if the behaviour is within the spec or not, so don't know that it's a bug. We could ask eProsima.
@richiprosima do you have any input on whether this is the expected behaviour?
This is not a bug but expected behavior in DDS. The purpose is to provide all "historical" data to late joining subscribers.
Even if using durability of volatile as mentioned?
I wouldn't expect it to happen with volatile. Seems like the same thing we discussed earlier: ros2/rmw_fastrtps#68 (comment)
While the publisher was running, was a subscriber running too? If so, was the subscriber's application killed or stopped cleanly? My thought is that the publisher doesn't know the subscriber is no longer there and is storing the samples until the subscriber acknowledges them.
I can confirm that this is reproducible without any subscriber running. Using master, without any other node running, I ran: After 14 sec, the process is using 4.5GB of RAM and crashes when trying to publish the 5001st sample
That is expected when we use KEEP_ALL, from what I understand? ros2/rmw_fastrtps#68 @clalancette The patch should be modified to use the default profile from rmw. This problem was faced and worked around in ros2/rmw#82. So I think that this demo should stop hard-coding the history and reliability policy (but keep defining the queue size) and use the default profiles instead.
Although you use KEEP_ALL, with VOLATILE durability samples are stored until subscribers acknowledge them. As far as I can see, this removal is performed when an ACKNACK message is received. In your scenario, where there are no subscribers, this mechanism is never called and the samples are not removed from the history. This is a bug. I will work on it. Thanks for reporting it.
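The mechanism described in the comment above can be modelled roughly as follows. This is only an illustrative sketch: `ReliableWriterModel`, `publish`, and `on_acknack` are hypothetical names chosen here and do not correspond to actual Fast-RTPS classes or functions. It assumes cleanup is tied exclusively to the acknowledgement path, which is the behaviour the comment identifies as the bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical model of a reliable, volatile writer; names are illustrative only.
class ReliableWriterModel {
public:
  // Publishing always stores the sample until it is acknowledged.
  std::uint64_t publish(int payload)
  {
    const std::uint64_t seq = next_seq_++;
    unacked_[seq] = payload;
    return seq;
  }

  // Cleanup happens only here, that is, only when a reader sends an acknowledgement.
  // Every sample with a sequence number <= acked_up_to is dropped.
  void on_acknack(std::uint64_t acked_up_to)
  {
    // A reader cannot acknowledge a sample that was never published.
    assert(acked_up_to < next_seq_);
    unacked_.erase(unacked_.begin(), unacked_.upper_bound(acked_up_to));
  }

  std::size_t stored() const { return unacked_.size(); }

private:
  std::uint64_t next_seq_ = 1;
  std::map<std::uint64_t, int> unacked_;
};
```

In this model, if no subscriber exists then `on_acknack` is never invoked and `stored()` grows by one with every `publish` call, which matches the unbounded memory growth reported earlier in the thread.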
@mikaelarguedas Thanks for the additional testing; that is exactly what we saw in testing.
I think there is some value in having a demo where users can easily play with the "low-level" settings of history and reliability to see what the effects on the system are. That being said, I think that we should probably use the default profiles as a default, and only modify it if the user passes a flag. What do you think about that?
@richiware Thanks for the confirmation.
Yes, that is what I'm suggesting 👍
Removing the hard-coded defaults will not affect the user's ability to tweak the QoS by passing command line arguments.
Force-pushed from e8e401a to 9ea0a33.
I re-tested this with the latest Fast-RTPS changes, and this still "leaks" memory if there are no subscribers. Therefore, I've gone ahead and rebased this, and changed it to stick with I tested by first running
@mikaelarguedas pointed out that I should test against the
image_tools/src/cam2image.cpp
return 0;
}

// Initialize a ROS 2 node to publish images read from the OpenCV interface to the camera.
auto node = rclcpp::Node::make_shared("cam2image");
rclcpp::Logger node_logger = node->get_logger();

// Set the parameters of the quality of service profile. Initialize as the default profile
// and set the QoS parameters specified on the command line.
I found these comments valuable even if they come at the cost of a few local variables.
Do you mind keeping the reliability_policy and friends variables but initializing them with the default profile values instead of the previously hard-coded ones?
I did basically this, but just re-used the fields of the structure. The code looks slightly weird this way, but at least has the documentation that you liked. I could re-introduce the local variables, but it seems like more of a way to make a mistake than anything else. Let me know what you think.
Force-pushed from 9ea0a33 to 9bea79d.
It's not a very big deal, but while this allows keeping the comments, none of the statements have any effect, so I don't know if it's clearer for users. I'd prefer keeping the temporary variables (initialized with the default values of the QoS profile), modifying them based on user-provided input, and then assigning them to the custom QoS profile with the appropriate comment. That said, I don't see it as a blocker for getting this merged.
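The pattern requested above (temporaries initialized from the default profile, overridden by user input, then assigned to the custom profile) might look roughly like the sketch below. To keep it compilable on its own, it uses stand-in types: `QosProfile`, `kDefaultProfile`, `CliOptions`, and `make_custom_profile` are hypothetical names, and the default values shown are assumptions. The real demo would presumably start from `rmw_qos_profile_default` and fill in an `rmw_qos_profile_t` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Stand-in types so the sketch compiles on its own; the real demo would use
// rmw_qos_profile_t and rmw_qos_profile_default from the rmw headers instead.
enum class History { KeepLast, KeepAll };
enum class Reliability { Reliable, BestEffort };

struct QosProfile {
  History history;
  std::size_t depth;
  Reliability reliability;
};

// Assumed default values for this sketch only.
constexpr QosProfile kDefaultProfile{History::KeepLast, 10, Reliability::Reliable};

// Hypothetical container for parsed command-line options; unset means "flag not passed".
struct CliOptions {
  std::optional<History> history;
  std::optional<std::size_t> depth;
  std::optional<Reliability> reliability;
};

QosProfile make_custom_profile(const CliOptions & options)
{
  // Temporaries start from the default profile rather than hard-coded values.
  History history_policy = kDefaultProfile.history;
  std::size_t depth = kDefaultProfile.depth;
  Reliability reliability_policy = kDefaultProfile.reliability;

  // Override only what the user explicitly asked for.
  if (options.history) {history_policy = *options.history;}
  if (options.depth) {depth = *options.depth;}
  if (options.reliability) {reliability_policy = *options.reliability;}

  // This sketch assumes a non-zero queue size.
  assert(depth > 0);

  QosProfile custom = kDefaultProfile;
  // The history policy determines how samples are stored.
  custom.history = history_policy;
  // The depth is the queue size used with KEEP_LAST.
  custom.depth = depth;
  // The reliability policy can be reliable or best effort.
  custom.reliability = reliability_policy;
  return custom;
}
```

With this shape, every assignment to the custom profile has an effect and a place for its explanatory comment, and passing no flags yields exactly the default profile.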
Force-pushed from 9bea79d to 464d7ce.
Particularly for cam2image, running it without a subscriber and KEEP_ALL will eventually cause it to exhaust virtual memory space. Running it with KEEP_LAST lets it run indefinitely without growing memory usage. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
Force-pushed from 464d7ce to 79ff18b.
lgtm, thanks for iterating
Can you look into the failures on Windows? They look to be due to this change, as they don't appear on master.
Yeah, just looking into it now. Odd failure, I'm not sure why it is happening.
Windows was getting upset at the Unicode ones. Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
So this was machine independent after all? Or is there something different on portable compared to the other machines?
It was not machine dependent. I made a mistake while trying to reproduce it, and finally found the real problem (the quotes).
👍
CI is passing, and I have two approvals, so I'm going to merge this in. Thanks!
Particularly for cam2image, running it without a subscriber
and KEEP_ALL will eventually cause it to exhaust virtual
memory space. Running it with KEEP_LAST lets it run indefinitely
without growing memory usage.
We saw this problem while just running cam2image for this tutorial https://github.com/ros2/ros2/wiki/Rosbag-with-ROS1-Bridge . We had left it running for a little while before running the bridge/rosbag, and it would crash after a few minutes. With this change in place, we can leave it running for as long as we like.