Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roscpp: method getNumSubscribers() returns wrong number of subscribers? #1122

Open
hujun1413 opened this issue Aug 7, 2017 · 9 comments
Open

Comments

@hujun1413
Copy link

hujun1413 commented Aug 7, 2017

Hi, everyone

We noticed that method getNumSubscribers() returns 1 when there are two subscribers in one procedure listening to the same topic. The following are the code and the running result:

this is the publisher node, which publishing message on the test_topic topic:

#include "ros/ros.h"
#include <sstream>
#include "std_msgs/String.h"

#define MSGLEN (1024 * 1024 * 3)
#define HZ (60)

int main(int argc, char ** argv) {
  ros::init(argc, argv, "talker", ros::init_options::AnonymousName);
  ros::NodeHandle n;
  ros::Publisher p = n.advertise< std_msgs::String >("test_topic", HZ);

  sleep(1);
  ros::Rate loop_rate(HZ);
  int count = 0;
  while (ros::ok()) {
    std::stringstream ss;
    ss << "message #" << count << "...";
    std::string info = ss.str();

    for (int i = 0; i < MSGLEN - 20; i++)
        ss << " ";
    std_msgs::String msg;
    msg.data = ss.str();

    ROS_INFO("subscriber number: %d  info: [%s]", p.getNumSubscribers(), info.c_str());
    p.publish(msg);

    ros::spinOnce();
    loop_rate.sleep();
    count++;
  }
  return 0;
}

and this is the subscriber node, two subscribers in one procedure listening to the same test_topic topic:

#include "ros/ros.h"
#include "std_msgs/String.h"

void chatterCallback(const std_msgs::String::ConstPtr & msg) {
  char str[21] = {'\0'};
  strncpy(str, msg->data.c_str(), 20);
  ROS_INFO("I heard: [%s]", str);
}

int main(int argc, char ** argv) {
  ros::init(argc, argv, "listener", ros::init_options::AnonymousName);
  ros::NodeHandle n;
  ros::Subscriber s = n.subscribe("test_topic", 60, chatterCallback);
  ros::Subscriber s1 = n.subscribe("test_topic", 60, chatterCallback);
  ros::spin();
  return 0;
}

this is the result of running the publisher and the subscriber:

output of the publisher node:
[ INFO] [1502094707.795889330]: subscriber number: 0  info: [message #15...]
[ INFO] [1502094707.860226202]: subscriber number: 0  info: [message #16...]
[ INFO] [1502094707.924455427]: subscriber number: 0  info: [message #17...]
[ INFO] [1502094707.989259088]: subscriber number: 1  info: [message #18...]
[ INFO] [1502094708.060402012]: subscriber number: 1  info: [message #19...]
[ INFO] [1502094708.130473595]: subscriber number: 1  info: [message #20...]
[ INFO] [1502094708.200720050]: subscriber number: 1  info: [message #21...]

output of the subscriber node:
[ INFO] [1502094707.996161976]: I heard: [message #18...      ]
[ INFO] [1502094707.996315771]: I heard: [message #18...      ]
[ INFO] [1502094708.066305320]: I heard: [message #19...      ]
[ INFO] [1502094708.066384036]: I heard: [message #19...      ]
[ INFO] [1502094708.135836806]: I heard: [message #20...      ]
[ INFO] [1502094708.135907461]: I heard: [message #20...      ]
[ INFO] [1502094708.206246607]: I heard: [message #21...      ]
[ INFO] [1502094708.206339077]: I heard: [message #21...      ]

as you can see, method getNumSubscribers() returns 1, though the two subscribers' callback function all get called when a new message has arrived on the test_topic topic.

what confuses me is that the two callback functions all get called means that the two functions has been registered, but why the method getNumSubscribers() only think there is 1 subscriber. I think it should return 2 as there are two callback function waiting for the message.

So can anybody tell me how method getNumSubscribers() count the number of subscribers in such case?

ps: if the two subscribers are in two procedures, of course method getNumSubscribers() returns 2.

@mikepurvis
Copy link
Member

I believe this is by design— getNumSubscribers represents the number of transports being serviced by each message sent to the publisher, and multiple subscribers in the same process get collapsed down to a single socket, where the message is fanned out to all receivers once it arrives.

@hujun1413
Copy link
Author

hujun1413 commented Aug 8, 2017

Thanks for your response. @mikepurvis

Just as you said, if

getNumSubscribers represents the number of transports being serviced...

why not name the method as getNumTransports?


After doing a experiment, I found that getNumSubscribers returns 1 when there are multiple nodelets running on the same nodelet manager and subscribing to the same topic.

Well, then, what API should I call to get the number of subscribers?


ps: Maybe ROS can add a API to get the exact number of subscribers and name it getNumSubscribers.

@evenator
Copy link
Contributor

evenator commented Aug 8, 2017

Unfortunately, the ROS master doesn't (can't) know how many times a node (or nodelet manager) has subscribed to a topic, because it has no introspection into the node or nodelet manager. In order to do this, every node would have to do some kind of bookkeeping and report back to the master the number of callbacks it has attached to each subscriber.

@dirk-thomas
Copy link
Member

why not name the method as getNumTransports?

The name of this function has been chosen a long time ago. Changing it now would imply updating a lot of existing code. Instead the documentation should clarify the semantic of the function. If you think the documentation or docblock should be improved please propose a change to either of them.

Ideally in an anonymous publish / subscribe system you shouldn't rely on the exact number of subscribers anyway. The provided number is mostly useful to determine if there are any subscriber or if not your code might be able to skip some computations to save resources.

@hujun1413
Copy link
Author

So greatful to receive your answers, really thank you all you guys!

I would like to help change the documentation or docblock to help more people who encounter this problem like me.

@abhijit-hota
Copy link

Late to the party but this issue still persists!

And I'm using ros::TopicManager. The method is the same. Only it takes the name of the topic as a parameter.

When I run my node and do a "rostopic info [name of topic]", I can see the subscribers clearly connected. But the method just returns 0!

Any alternative method I can use?

@dirk-thomas
Copy link
Member

When I run my node and do a "rostopic info [name of topic]", I can see the subscribers clearly connected. But the method just returns 0!

@abhijit-hota There are several unit tests in place which ensure that getNumSubscribers() returns the correct count and they all pass. Please provide enough information to reproduce you case (exact platform, versions, environment, reproducible steps).

@ndepal
Copy link

ndepal commented Aug 5, 2021

I'm having a similar problem as @abhijit-hota:
I have a roscpp node that has an actionlib SimpleActionClient. During start up, my node calls ac.waitForServer(). This sometimes hangs indefinitely, sometimes it just takes several minutes, despite the ROS node serving the action already being up.

The debug output of ros.actionlib.ConnectionMonitor of the client node shows:

isServerConnected: Client has not yet connected to feedback topic of server

Which is printed here.

Doing a rostopic echo /my_action/feedback does show the server node as the publisher as well as the client node as a subscriber.

Doing a rostopic pub /my_action/feedback ... will get the ac.waitForServer() to complete right away.

It's worth noting that the server node actually serves multiple actions that the same client node connects to. After the first ac.waitForServer() succeeds (after several minutes), the next one still hangs.

Environment

I am running this on ROS melodic. Both server and client are C++ nodes.
roscore, the server and client are each running inside their own Docker containers and communicating over a bridged Docker network (although it also happens if they are running with network_mode: host). I can rosnode ping each node from every other container, and publishing/subscribing works just fine, so I know the communication between the nodes is fine in principle. The only thing that hangs are the ac.waitForServer() calls.

I have never seen this issue when running the nodes outside of Docker.

@ndepal
Copy link

ndepal commented Aug 6, 2021

I think I found out what was causing the issue for me. The issue was that my Docker containers had a much higher nofiles ulimit. Setting this to 1024 (which is what the Ubuntu host OS also has) seems to resolve the issue.

I found the solution thanks to ros/actionlib#93 (comment).

I'm guessing something like this is happening moby/moby#38814:

In particular, RLIMIT_NOFILE, a number of open files limit, which is set to 2^20 (aka 1048576), causes a slowdown in a number of programs, as they use the upper limit value to iterate over all potentially opened file descriptors

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants