-
Notifications
You must be signed in to change notification settings - Fork 911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using poll() in favor of select() in the XmlRPCDispatcher #833
Conversation
@ros-pull-request-builder retest this please |
Can you please add a description to this pull request describing which exact use case this patch fixes. If possible also with a reproducible example. |
if ( ! src->getKeepOpen()) | ||
src->close(); | ||
} else if (newMask != (unsigned) -1) { | ||
thisIt->getMask() = newMask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're mutating these lines anyway, perhaps switch to ROS style for the braces? http://wiki.ros.org/CppStyleGuide#Formatting
Haven't tested yet, but this change seems reasonable to me, to get out from under the 1024 socket limit of Linux's Of course, fixing the CLOSE_WAIT issue will also have a big mediating effect, so maybe it will turn out not to be as necessary— but I could still see it as valid, especially for nodes which have to make a ton of subscribers or publishers, like rosbag record/play. |
Should we consider this for the "hitlist"? I will add the label tentatively. Please feel free to remove it. |
I've ported this to kinetic-devel, +1 for the hitlist. |
SourceList::iterator it; | ||
for (it=_sources.begin(); it!=_sources.end(); ++it) { | ||
std::size_t i = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the scope of this variable to be limited to the for loop since it isn't used afterwards.
The loop in line 119 can then declare its own local index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree. i
is used the same way as it
. Moving i
into it's own scope would mean to add an extra pair of brackets and indention and breaking the overall flow. Or did I miss something?
I have reverted this change in #981. Please see that PR for more details about the rational. Please create a new PR to apply these changes again. It looks like it will need some modifications to pass CI cleanly. |
It's weird that there's a checkmark on c7c2087, so it seems like it ran clean at least once. Maybe some tests need extra time buffers or something? |
It can also be that since that CI ran something else had changed. While each PR on its own passes it doesn't mean both being merged still passes. Just as one example why this is not a 100% guarantee. |
A temporary fix to avoid buffer overflow in XmlRpcDispatch, since select() only works for file descriptors < FD_SETSIZE (1024). The real solution is to switch to poll as in ros#833, but since that PR was reverted, this avoids catastrophic failure. It throws instead of being silent, since (a) this behaviour is no less intrusive than a buffer overflow, and (b) returning an error was handled silently, and just looked like ros comm not being connected.
A temporary fix to avoid buffer overflow in XmlRpcDispatch, since select() only works for file descriptors < FD_SETSIZE (1024). The real solution is to switch to poll as in ros#833, but since that PR was reverted, this avoids catastrophic failure. It throws instead of being silent, since (a) this behaviour is no less intrusive than a buffer overflow, and (b) returning an error was handled silently, and just looked like ros comm not being connected.
* Using poll() in favor of select() in the XmlRPCDispatcher * ros#832: initialize pollfd event with 0. Added check for POLLHUP and POLLNVAL * Fix syntax
* Using poll() in favor of select() in the XmlRPCDispatcher * #832: initialize pollfd event with 0. Added check for POLLHUP and POLLNVAL * Fix syntax
* Using poll() in favor of select() in the XmlRPCDispatcher * #832: initialize pollfd event with 0. Added check for POLLHUP and POLLNVAL * Fix syntax
* fix open mode on Windows * Fix BZip2 inclusion * Respect if/unless for roslaunch-check. * fix rosmsg show from bag * fix rosbag::View::iterator copy assignment operator (ros#1017) * refactor test_rosbag_storage * fix rosbag::View::iterator copy assignment operator the compiler-generated copy assignment operator did lead to segfault and memory leaks. * Add subscriber to connection log messages. (ros#1023) * ensure cwd exists * Sleep in rospy wait_for_service even if exceptions raised * Avoid deleting XmlRpcClient's while they are being still in use on another thread (ros#1013) * Avoid deleting XmlRpcClient's while they are being still in use on another thread * Acquire clients_mutex_ before deleting the clients * Remove the timed wait for the clients to become not in use * Only delete and erase clients that are not in use * Clients that would still be in use would delete themselves on release * Wait for clients that are in use to finish in XmlRpcManager::shutdown * Abort topic lookup on connection refused In a multimaster environment where a topic has multiple publishers, when a node drops out abruptly (host is shutdown), a single subscriber update on that topic will cause multiple threads to be created (one for each host) in order to resolve the topic location. This cause a thread leak as host which are turned off will not respond and when they come back online, the xmlrpc URI is changed causing a connection refused error at the socket layer. This fix catches the connection refused error and terminate the thread with the understanding that if the connection is refused, the rosnode cannot be reached now or never. This effectively prevents thread leak. Note: if the remote host where the rosnode is thought to be never comes back up, then the thread will still be leaked as the exception received is a host unreachable type. This is intentional to avoid abruptly terminating the thread in case of a temporary DNS failure. * Fix bug in transport_tcp (ros#1050) * Fix bug in transport_tcp It assumes that the `connect` method of non-blocking scoket should return -1 and `last_socket_error()` should return `ROS_SOCKETS_ASYNCHRONOUS_CONNECT_RETURN`(=`EINPROGRESS`). But a non-blocking `connect` can return 0 when TCP connection to 127.0.0.1 (localhost). [http://stackoverflow.com/questions/14027326/can-connect-return-0-with-non-blocing-socket](http://stackoverflow.com/questions/14027326/can-connect-return-0-with-non-blocing-socket) * Modify code format Modify code format * Fix race condition that lead to miss first message (lunar) (ros#1058) * Fix race condition that lead to miss first message Callback queue waits for callback from "callOne" method. When internal queue is not empty this last method succeeded even if id info mapping does not contains related callback's id. In this case, first callback (for one id) is never called since "addCallback" method first push callback into internal queue and *then* set info mapping. So id info mapping has to be set before push callback into internal queue. Otherwise first message could be ignored. * Added test for addCallback race condition * ensure pid file is removed on exit * Changed the check command output to be a little bit more specific. * [roslaunch] Fix pid file removing condition * [rospy] Add option to reset timer when time moved backwards (ros#1083) * Add option to reset timer when time moved backwards * refactor logic * add missing mutex lock for publisher links * [rospy] Improve rospy.logXXX_throttle performance * Added logging output when `roslogging` cannot change permissions (ros#1068) * Added logging output when `roslogging` cannot change permissions Added better differentiated logging output to `roslogging` so that problems with permission is made clear to the user. Accompanying test have also been added. * Removed testing, updated warning message and fixed formatting Removed testing since test folder should not be stored together with tests. Since testing group permission requires intervention outside the test harness the test it self is also removed. Updated the warning message to include `WARNING` and updated to using `%` formatting. * [rostest] Check /clock publication neatly in publishtest (ros#973) * Check /clock publication neatly in publishtest - Use time.sleep because rospy.sleep(0.1) hangs if /clock is not published - Add timeout for clock publication * Add comment explaining use of time.sleep. * Use logwarn_throttle not to flood the console * A fix to a critical stack buffer overflow vulnerability which leads to direct control flow hijacking (ros#1092) * A fix to a critical stack buffer overflow vulnerability which leads to control flow hi-jacking. * Much more simple fix for the stack overflow bug * only launch core nodes if master was launched by roslaunch * Made copying rosbag::Bag a compiler error to prevent crashes and added a swap function instead (ros#1000) * Made copying rosbag::Bag a compiler error to prevent crashes * Added Bag::swap(Bag&) and rosbag::swap(Bag&, Bag&) * Fixed bugs in Bag::swap * Added tests for Bag::swap * [roscpp] add missing header for writev(). After an update of gcc and glibc roscpp started to fail builds with the error: /home/rojkov/work/ros/build/tmp-glibc/work/i586-oe-linux/roscpp/1.11.21-r0/ros_comm-1.11.21/clients/roscpp/src/libros/transport/transport_udp.cpp:579:25: error: 'writev' was not declared in this scope ssize_t num_bytes = writev(sock_, iov, 2); ^~~~~~ According to POSIX.1-2001 the function writev() is declared in sys/uio.h. The patch includes the missing header for POSIX compliant systems. * Add SteadyTimer (ros#1014) * add SteadyTimer based on SteadyTime (which uses the CLOCK_MONOTONIC). This timer is not influenced by time jumps of the system time, so ideal for things like periodic checks of timeout/heartbeat, etc... * fix timer_manager to really use a steady clock when needed This is a bit of a hack, since up to boost version 1.61 the time of the steady clock is always converted to system clock, which is then used for the wait... which obviously doesn't help if we explicitly want the steady clock. So as a workaround, include the backported versions of the boost condition variable if boost version is not recent enough. * add tests for SteadyTimer * [test] add -pthread to make some tests compile not sure if this is only need in my case on ROS indigo... * use SteadyTime for some timeouts * add some checks to make sure the backported boost headers are included if needed * specialize TimerManager threadFunc for SteadyTimer to avoid the typeid check and make really sure the correct boost condition wait_until implementation is used * Revert "[test] add -pthread to make some tests compile" This reverts commit f62a3f2. * set minimum version for rostime * mostly spaces * Close CLOSE_WAIT sockets by default (ros#1104) * Add close_half_closed_sockets function * Call close_half_closed_sockets in xmlrpcapi by default * fix handling connections without indices * fix rostopic prining long integers * update tests to match stringify changes * ignore headers with zero stamp in statistics * Improves the stability of SteadyTimerHelper. Due to scheduling / resource contention, `sleep`s and `wait_until`s may be delayed. The `SteadyTimerHelper` test class was not robust to these delays, which was likely the cause of a failing test (`multipleSteadyTimeCallbacks` in `timer_callbacks.cpp`:220). * Improve steady timer tests (ros#1132) * improve SteadyTimer tests instead of checking when the callback was actually called, check for when it was added to the callback queue. This *should* make the test more reliable. * more tolerant and unified timer tests * xmlrpc_manager: use SteadyTime for timeout * Replaced deprecated lz4 function call * Removed deprecated dynamic exception specifications * only use CLOCK_MONOTONIC if not apple * Improved whitespace to fix g++ 7 warning .../ros_comm/tools/rosbag_storage/src/view.cpp:249:5: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation] if ((bag.getMode() & bagmode::Read) != bagmode::Read) ^~ .../ros_comm/tools/rosbag_storage/src/view.cpp:252:2: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’ boost::function<bool(ConnectionInfo const*)> query = TrueQuery(); ^~~~~ * Using poll() in favor of select() in the XmlRPCDispatcher (ros#833) * Using poll() in favor of select() in the XmlRPCDispatcher * ros#832: initialize pollfd event with 0. Added check for POLLHUP and POLLNVAL * Fix syntax * poll flags emulate select, verify requests, sync/init sources/fds This commit makes sure that the poll flags emulate select (which it replaces). It also double-checks event types to make sure they were requested (e.g. POLLERR might trigger both). It keeps track of fd/src relationship through two parallel arrays, instead of an iterator / array hybrid. * Fix rostopic hz and bw in Python 3 (ros#1126) * string.atoi is not present in Python3, just use int(x) * rostopic bw: set default value of window_size arg to -1 instead of None * Check for window_size < 0 when constructing ROSTopicBandwidth object * Revert "Check for window_size < 0 when constructing ROSTopicBandwidth object" This reverts commit 86a2a29. * Revert "rostopic bw: set default value of window_size arg to -1 instead of None" This reverts commit 4c74df9. * Check for argument != None before calling int(options.window_size) * Properly check for options.window_size != None before converting to int * Don't direct users to build rosout with rosmake. (ros#1140) * Don't direct users to build rosout with rosmake. * Eliminate special case for rosout being missing. * use not deprecated console_bridge macros and undefine the deprecated ones * Fix rosbag API for Python 3 Closes ros#1047 * Sort the output of rosnode info. * Minor fixes for compatibility with both Python 2 & 3 * [bug] fixes ros#1158 (ros#1159) * [bug] fixes ros#1158 XmlLoader and LoaderContext no longer share the param list to child 'node' contexts. This was causing the creation of unintended private parameters when the tilde notation was used. * added test cases for tilde param in launch * added test cases for tilde param in python * fixed tilde param issue for group tags Issue ros#1158 manifested for group tags that appear before (but not containing) node tags. * added one more test case for issue ros#1158 used param tag to make sure we test the proposed fix * Added negative tests for extra parameters Some parameters should not be present at all. * rosconsole: replaced 'while(0)' by 'while(false)' * Change rospy.Rate hz type from int to float (ros#1177) * Don't try to set unknown socket options (ros#1172) * Don't try to set unknown socket options These are not avaible on FreeBSD, for example * individualize ifdefs * fix whitespace * rosnode: Return exit code 1 if there is an error. (ros#1178) * Fixed an out of bounds read in void rosbag::View::iterator::increment() (ros#1191) - Only triggered if reduce_overlap_ = true - When iters_.size() == 1 and iters_.pop_back() gets called in the loop, the next loop condition check would read from iters_.back(), but iters_ would be empty by then. * Test bzfile_ and lz4s_ before reading/writing/closing (ros#1183) * Test bzfile_ before reading/writing/closing * Test lz4stream before reading/writing * More agile demux. (ros#1196) * More agile demux. Publishers in demux are no longer destroyed and recreated when switching, which results in much faster switching behavior. The previous version took even 10 seconds to start publishing on the newly selected topic (all on localhost). Please, comment if you feel the default behavior should stay as the old was, and this new behavior should be triggered by a parameter. * update style * catch exception with `socket.TCP_INFO` on WSL (ros#1212) * catch exception with `socket.TCP_INFO` on WSL fixes issue ros#1207 this only catches socket error 92 * Update util.py * Update util.py * avoid unnecessary changes, change import order * fix path check * fix error message to mention what was actually tested * fix roswtf test when rosdep is not initialized * update changelogs * 1.12.8 * backward compatibility with libconsole-bridge in Jessie * update changelogs * 1.12.9 * backward compatibility with libconsole-bridge in Jessie (take 2) * update changelogs * 1.12.10 * Revert "Replaced deprecated lz4 function call" This reverts commit a31ddd0. * update changelogs * 1.12.11 * backward compatibility with libconsole-bridge in Jessie (take 3) (ros#1235) * backward compatibility with libconsole-bridge in Jessie (take 3) * copy-n-paste error * remove fragile undef * dont rely on existence of short version of the macros, define the long with API calls instead * remove empty line * update changelogs * 1.12.12 * place console_bridge macros definition in header and use it everywhere console_bridge is included (ros#1238)
The XmlRPCDispatcher uses select() for checking the open file descriptors, but this function can't check more than FD_SETSIZE file descriptors in Linux. FD_SETSIZE is set to 1024 by default and increasing that value would require to rebuild the current kernel. I observed this bug during my work on the CLOSE_WAIT problem (#610, #831): Every now and then the rosout process died. I found out that it reached this limit of 1024 open connections (most of them were in the CLOSE_WAIT state) when relaunching our robot's bringup launch file several times. The easiest way to fix this bug is to use select's 'successor' poll() which isn't limited by 1024. From a performance point of view, both implementations are similar.