Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call destroy() before removing ServerGoalHandle #1070

Conversation

otamachan
Copy link
Contributor

@otamachan otamachan commented Jan 15, 2023

Signed-off-by: Tamaki Nishino otamachan@gmail.com

As reported by @longjie0723 here https://answers.ros.org/question/411394/rclpy-action-server-seems-slowing-down-and-consuming-memory-in-long-term/ ,Python ActionServer could keep Future objects while it's running, which causes memory consumption increase and the performance issue.
It seems that it's because ServerGoalHandler registers Future when it is constructed and unregisters when destroy() is called, but destroy() is not called when the object is removed here .

This PR calls destroy() before ServerGoalHandler is deleted.

I confirmed the the original issue is resolved by this PR.

@otamachan otamachan force-pushed the unregister-future-of-ServerGoalHandle branch 2 times, most recently from f36900a to 7100df5 Compare January 15, 2023 07:07
@otamachan otamachan changed the title Remove future when ServerGoalHandle is destroyed Call destroy() before removing ServerGoalHandle Jan 15, 2023
Signed-off-by: Tamaki Nishino <otamachan@gmail.com>
@otamachan otamachan force-pushed the unregister-future-of-ServerGoalHandle branch from 7100df5 to c417d83 Compare January 15, 2023 07:13
Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@otamachan thanks for the contribution.

@@ -405,6 +405,7 @@ async def _execute_get_result_request(self, request_header_and_message):
async def _execute_expire_goals(self, expired_goals):
for goal in expired_goals:
goal_uuid = bytes(goal.goal_id.uuid)
self._goal_handles[goal_uuid].destroy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this works and addresses memory leak.

how about adding dtor in ServerGoalHandle to call destroy method internally? so that once this object is deleted, it automatically issues dtor to destroy the goal_handle?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC: @sloretz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. I thought so too at the first time, but I fount only a few classes have __del__ method, so I assumed that the most of the classes in rclpy are expected to be called destroy. I'm ok to change to call destroy in __del__ but I'd like to ask maintainers' opinion as well.

$ git grep __del__
rclpy/rclpy/executors.py:    def __del__(self):
rclpy/rclpy/signals.py:    def __del__(self):
rclpy/rclpy/task.py:    def __del__(self):

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not hard requirement, so that we can do the refactor later, but lets wait and see other opinion.

CC: @sloretz @clalancette @ivanpauno

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I assumed that the most of the classes in rclpy are expected to be called destroy

Most classes in rclpy support either being cleaned up by the garbage collector, or having their resources cleaned up before then by destroy(). The purpose of destroy() is to allow stopping external communication, like ceasing to subscribe to topics. It cleans up resources, but it's purpose is more like Socket.close() or File.close(). It just so happens it "closes" those resources by freeing the rcl entities being used.

Ideally here del self._goal_handles[goal_uuid] below would be sufficient as Python's garbage collector would clean up the goal handle. Theirs something larger wrong here with the addition and removal of the future from the Waitable being done in ServerGoalHandle, and maybe the ownership of the future itself. The result future isn't even used by the ServerGoalHandle class.

I would recommend fixing by making the result_future for each goal handle be owned by the ActionServer class, and then here the action server can remove the future from itself directly instead of indirectly via ServerGoalHandle destroy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thank you for your suggestion. I'll try that!

@fujitatomoya fujitatomoya self-assigned this Jan 18, 2023
@fujitatomoya fujitatomoya added the bug Something isn't working label Jan 18, 2023
@apockill
Copy link

apockill commented Mar 1, 2023

I'm running into this memory leak in production- but the concern is less in the memory aspect but the fact that action call rate slows down rather dramatically after a few hundred calls (from 300 calls per second to 50 to 10).

I've put together a small reproducible example that shows that ActionServer._futures is indeed growing at a high rate, but unfortunately for some reason, _execute_expire_goals is never being called by the ActionServer, and this fix is therefor not working.

Could someone help me understand what triggers the calling of _execute_expire_goals, so I can better help debug this (and my) issue? Thanks!

Edit: ignore me! I understand now- there's a timeout wherein execute_expire_goals gets called. It looks like the fix proposed in this PR also solves the problem on my end 🎉

@fujitatomoya
Copy link
Collaborator

@otamachan are you working on #1070 (comment)?

@dcconner
Copy link

dcconner commented Mar 9, 2023

I am also interested in the status of this fix

@apockill
Copy link

Hey folks! This issue seems to be pretty key to any action running in ROS- especially if you rely on ROS actions being relatively fast to call, and if you plan on running a robot for longer periods of time. I understand that #1070 (comment) might not be the ideal fix, but wouldn't it be worth merging it in the interim until the proposed refactor can be complete?

@fujitatomoya
Copy link
Collaborator

@apockill Just FYI, i created #1113.

@clalancette
Copy link
Contributor

Closing this one in favor of #1113

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants