Skip to content

Conversation

@jacobperron
Copy link

Add overloads for spinOnce methods to include a timeout argument and call this when spinning in the future.

This fixes a bug where the future.get(...) call blocks forever, e.g. when waiting on a service response and the service becomes unavailable.

Add overloads for spinOnce methods to include a timeout argument and call this when spinning in the future.

This fixes a bug where the future.get(...) call blocks forever, e.g. when waiting on a service response and the service becomes unavailable.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron jacobperron requested a review from ivanpauno July 10, 2021 00:22
Copy link
Collaborator

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that the Future returned is calling spin() behind the scenes when you call get() is already a problem, because the current executor doesn't support to be spinned in parallel.

IMO the returned future should not call spin().
We could for example use the java intrinsic wait/notify methods.

@jacobperron
Copy link
Author

IMO the returned future should not call spin().
We could for example use the java intrinsic wait/notify methods.

Makes sense to me.

In any case, do you see any issue in adding the new RCLJava spin methods? It'd be nice to get this fix in and then refactor to avoid calling spin in Future altogether.

@jacobperron
Copy link
Author

For instance, without this change the example breaks (hangs forever) if we add the timeout argument to this call:

https://github.com/ros2-java/ros2_java_examples/blob/4305e58493bfa37484457b3fe12525cbc0f87319/rcljava_examples/src/main/java/org/ros2/rcljava/examples/client/AddTwoIntsClient.java#L46

@ivanpauno
Copy link
Collaborator

In any case, do you see any issue in adding the new RCLJava spin methods?

No, this is not adding a new problem, so I think it's fine to merge this PR.

For instance, without this change the example breaks (hangs forever) if we add the timeout argument to this call:

https://github.com/ros2-java/ros2_java_examples/blob/4305e58493bfa37484457b3fe12525cbc0f87319/rcljava_examples/src/main/java/org/ros2/rcljava/examples/client/AddTwoIntsClient.java#L46

How does that happen? Is there no server running?
I would assume that only hangs if the server never replies.
In that case, the timeout helps the user to detect the issue, but the client implementation will still leak memory https://github.com/ros2-java/ros2_java/blob/2ac6f6cecb4e50ea04ffa00b588f434a64799976/rcljava/src/main/java/org/ros2/rcljava/client/ClientImpl.java#L56.

rclcpp has a similar issue ros2/rclcpp#1697.

@jacobperron
Copy link
Author

How does that happen? Is there no server running?

Yes, we can contrive an example where we don't wait for a service and directly send a request.

Even if we call waitForService, we have still observed that sometimes the request or the response (we're not sure) is not received. This seems to depend on the RMW. I'm just speculating, but it seems like there's a race between waitForService returning true and the client actually establishing a connection with the service.

@jacobperron jacobperron merged commit 09c0e68 into galactic-devel Jul 12, 2021
@jacobperron jacobperron deleted the jacob/fix_future_get branch July 12, 2021 22:08
@ivanpauno
Copy link
Collaborator

Even if we call waitForService, we have still observed that sometimes the request or the response (we're not sure) is not received. This seems to depend on the RMW. I'm just speculating, but it seems like there's a race between waitForService returning true and the client actually establishing a connection with the service.

Yes, that used to happen in some rmw implementations.
The problem usually is that the client sends a request and the server receives it, but it has not yet discovered the response reader of the client, so the server doesn't send anything back.

jacobperron added a commit to ros2-java/ros2_java that referenced this pull request May 17, 2022
Add overloads for spinOnce methods to include a timeout argument and call this when spinning in the future.

This fixes a bug where the future.get(...) call blocks forever, e.g. when waiting on a service response and the service becomes unavailable.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@bekleyis95
Copy link

Hello, for my term project I am trying to use ros2_java project. I have a subscriber that I am activating by RCLJava.spin and i am doing this in a new thread. But as soon as spin is called all the other stuff that is being executed stopping, so RCLJava.spin block all the other work and it does not subscribe to the topic without this command. I dont know if this is the right place to post it but I thought i can find an answer here. All help is appreciated, thank you in advanced

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants