Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xmlrpccpp spins forever when no more file handles #914

Closed
trainman419 opened this issue Oct 19, 2016 · 6 comments
Closed

xmlrpccpp spins forever when no more file handles #914

trainman419 opened this issue Oct 19, 2016 · 6 comments

Comments

@trainman419
Copy link
Contributor

When a ROS node runs out of available file handles, it can go into an infinite spin loop if it gets an incoming xmlrpc request.

We found this when executing the rosprofiler nodes against our system, so this is partially blocking us from running rosprofiler.

I've put a small test program that demonstrates this: https://github.com/trainman419/ros_infinite_loop ; compile and run the test.launch and top to see the loop node spin at 100%

I've done some very informal profiling and it looks like the program is spending all of its time in ros::CallbackQueue::callAvailable.

It also looks like the socket close here should not be commented out, and this may be contributing to leaked sockets:

cc @Knio

@trainman419
Copy link
Contributor Author

This is preventing my company from using the ROS profiling tools, and it's rather annoying. Does someone have time to take a look at this?

@wjwwood
Copy link
Member

wjwwood commented Dec 1, 2016

I'm sorry to say, but I don't have time right now, and I wouldn't count on any of the maintainers looking into it before the end of the year, but I could be wrong about that. Maybe someone out in the community could dig into it, especially since you set up a reproducible example (thanks for that btw).

Since you spent the time to make the reproducible example and make a suggestion of where the issue might be, I guess you guys spent some time trying to fix it and are stumped? Unfortunately, I'm not very familiar with the XMLRPC code that is embedded (😞) in ros_comm, so I probably can't give any suggestions or clues without spending a huge amount of time researching the issue. I think @ruffsl spent some time in that part of the code while he was looking at SROS recently. But he might not have time or insight into it either.

I assume it's not an option for your company to spend more time to dig into the issue and find the solution? If so that's a shame, because it would help your company and everyone else in the ROS community! Alternatively, you could raise the visibility of this by posting it on http://discourse.ros.org/ and seeing if anyone else finds it annoying enough to help try and fix it.

@ruffsl
Copy link

ruffsl commented Dec 1, 2016

Well my exploration of ROS's xmlrpc for SROS has been mainly been confined to rospy, but I just tested example, and was able to reproduce the issue on my local machine:

$ apt-cache policy ros-kinetic-ros-comm 
ros-kinetic-ros-comm:
  Installed: 1.12.6-0xenial-20161026-193907-0700
  Candidate: 1.12.6-0xenial-20161026-193907-0700
  Version table:
 *** 1.12.6-0xenial-20161026-193907-0700 500
        500 http://packages.ros.org/ros/ubuntu xenial/main amd64 Packages
        100 /var/lib/dpkg/status

Needed to make a small change to the cmake to compile:
trainman419/ros_infinite_loop#1

@trainman419 , I tried uncommenting out that line you pointed to in this dockerfile.
I don't see the process pegging out anymore, but I do see the connection was refused.

ERROR: connection refused to [http://0e01582fb6e7:34856/]

image

Here was my build steps:

git clone -b docker https://github.com/ruffsl/ros_infinite_loop.git
cd ros_infinite_loop/docker
docker build -t ros:loop .
docker run --rm -it ros:loop /ros_infinite_loop.sh

@trainman419
Copy link
Contributor Author

We don't have a lot of time to work on this either. I may try to fix this on my company's copy of ros_comm, but the last time I submitted a pull request to ros_comm was not a pleasant experience, so I'm not motivated to submit that fix upstream.

@wjwwood
Copy link
Member

wjwwood commented Jan 10, 2017

I may try to fix this on my company's copy of ros_comm, but the last time I submitted a pull request to ros_comm was not a pleasant experience, so I'm not motivated to submit that fix upstream.

I'm sorry to hear that. If there's anything I can do to motivate you (or your company) to do so when and if you come up with a fix, let me know.

@mgrrx
Copy link
Contributor

mgrrx commented Jan 11, 2017

I reproduced the problem today (Thanks @trainman419 for the reproducible example!). Basically I see here two different issues:

  • running out of available file descriptors and
  • not handling the problem when running out of available file descriptors.

The first problem is another instance of #610, the second is not yet addressed and I'll look into this issue in the coming days. I've an open pull request for #610 in #834, but it's not yet ready to be merged as discussed with @dirk-thomas. Any contribution on #834 is highly welcome and I can confirm that it fixes the first of the two problems (i.e. you are no longer running out of available file descriptors with this fix).

dirk-thomas added a commit that referenced this issue Jan 17, 2017
[xmlrpcpp] close socket when connection can't be accepted. Fixes #914
rsinnet pushed a commit to MisoRobotics/ros_comm that referenced this issue Jun 19, 2017
[xmlrpcpp] close socket when connection can't be accepted. Fixes ros#914
trainman419 added a commit to trainman419/ros_comm that referenced this issue Nov 22, 2017
Add integration tests for the XmlRpcServer interacting with an
XmlRpcClient in the same process, particularly around the behavior when
the server process runs out of available file handles.

Update the XmlRpcServer with two different mitigations for file handle
exhaustion (Fixes ros#914, replaces ros#960 and ros#977):
 * Measure the number of free file handles and reject incoming
connections if the pool of free file descriptors is too small. This
actively rejects incoming clients instead of waiting for complete file
handle exhaustion, which would leave clients in a pending state until a
file descriptor becomes free.
 * If accept fails due to complete file handle exhaustion, temporarily
stop calling accept on this socket. This prevents a busy-loop where
poll() believes the listening socket is readable, but accept() fails to
allocate a file descriptor and leaves the socket in a readable state.
trainman419 added a commit to trainman419/ros_comm that referenced this issue Dec 14, 2017
Add integration tests for the XmlRpcServer interacting with an
XmlRpcClient in the same process, particularly around the behavior when
the server process runs out of available file handles.

Update the XmlRpcServer with two different mitigations for file handle
exhaustion (Fixes ros#914, replaces ros#960 and ros#977):
 * Measure the number of free file handles and reject incoming
connections if the pool of free file descriptors is too small. This
actively rejects incoming clients instead of waiting for complete file
handle exhaustion, which would leave clients in a pending state until a
file descriptor becomes free.
 * If accept fails due to complete file handle exhaustion, temporarily
stop calling accept on this socket. This prevents a busy-loop where
poll() believes the listening socket is readable, but accept() fails to
allocate a file descriptor and leaves the socket in a readable state.
dirk-thomas pushed a commit that referenced this issue Dec 19, 2017
* Tests and bug fixes for XmlRpcServer

Add integration tests for the XmlRpcServer interacting with an
XmlRpcClient in the same process, particularly around the behavior when
the server process runs out of available file handles.

Update the XmlRpcServer with two different mitigations for file handle
exhaustion (Fixes #914, replaces #960 and #977):
 * Measure the number of free file handles and reject incoming
connections if the pool of free file descriptors is too small. This
actively rejects incoming clients instead of waiting for complete file
handle exhaustion, which would leave clients in a pending state until a
file descriptor becomes free.
 * If accept fails due to complete file handle exhaustion, temporarily
stop calling accept on this socket. This prevents a busy-loop where
poll() believes the listening socket is readable, but accept() fails to
allocate a file descriptor and leaves the socket in a readable state.

* Add test depend on boost

* Run astyle
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants