Benchmark is performed incorrectly while transport is enabled and the count of publisher is big (such as 200). #688

Closed
Barry-Xu-2018 opened this issue Mar 22, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@Barry-Xu-2018
Contributor

Barry-Xu-2018 commented Mar 22, 2021

Description

The benchmark fails when transport is enabled and the publisher count is large (such as 200).

Expected Behavior

The benchmark runs to completion when transport is enabled and the publisher count is large (such as 200).

Actual Behavior

The benchmark fails with the following error:

[benchmark_publishers-2] 1616403871.076306 [] benchmark_: create_thread: benchmark_publi: no free slot

Execution then terminates.

To Reproduce

  1. benchmark config file --- 'test.yaml'

    rosbag2_performance_benchmarking:
      benchmark_node:
        ros__parameters:
          benchmark:
            summary_result_file:  "results.csv"
            db_root_folder:       "rosbag2_performance_test_results"
            repeat_each:          1      # How many times to run each configurations (to average results)
            no_transport:         False  # Whether to run storage-only or end-to-end (including transport) benchmark
            preserve_bags:        False  # Whether to leave bag files after experiment (and between runs). Some configurations can take lots of space!
            parameters:                  # Each combination of parameters in this section will be benchmarked
              max_cache_size:         [100000000]
              max_bag_size:           [0]
              compression:            [""]
              compression_queue_size: [1]
              compression_threads:    [0]
              storage_config_file:    [""]
  2. producers config file --- '10k_300inst_100hz.yaml'

    rosbag2_performance_benchmarking_node:
      ros__parameters:
        publishers: # publisher_groups parameter needs to include all the subsequent groups
          publisher_groups: [ "10k_300inst" ]
          wait_for_subscriptions: True
          10k_300inst:
            publishers_count:   300
            topic_root:         "benchmarking_10k_300inst"
            msg_size_bytes:     10000
            msg_count_each:     2000
            rate_hz:            100
  3. Perform benchmark

    $ ros2 launch rosbag2_performance_benchmarking benchmark_launch.py benchmark:=/PATH/TO/test.yaml producers:=/PATH/TO/10k_300inst_100hz.yaml
     ...
     [ros2-1] [INFO] [1616403871.061948147] [rosbag2_transport]: Subscribed to topic '/rosbag2_performance_benchmarking_node/benchmarking_10k_300inst_165'
     [ros2-1] [INFO] [1616403871.067212137] [rosbag2_transport]: Subscribed to topic '/rosbag2_performance_benchmarking_node/benchmarking_10k_300inst_133'
     [ros2-1] [INFO] [1616403871.072642593] [rosbag2_transport]: Subscribed to topic '/rosbag2_performance_benchmarking_node/benchmarking_10k_300inst_198'
     [benchmark_publishers-2] 1616403871.076306 [] benchmark_: create_thread: benchmark_publi: no free slot
     [ros2-1] [INFO] [1616403871.078466443] [rosbag2_transport]: Subscribed to topic '/rosbag2_performance_benchmarking_node/benchmarking_10k_300inst_272'
     [ros2-1] [INFO] [1616403871.083757241] [rosbag2_transport]: Subscribed to topic '/rosbag2_performance_benchmarking_node/benchmarking_10k_300inst_145'
     [ros2-1] [INFO] [1616403871.089160870] [rosbag2_transport]: Subscribed to topic '/rosbag2_performance_benchmarking_node/benchmarking_10k_300inst_134'
     ...
     [ERROR] [benchmark_publishers-2]: process has died [pid 191720, exit code -6, cmd '/home/barry-xu/Work/ROS2/ros2_latest_ws/install/rosbag2_performance_benchmarking/lib/rosbag2_performance_benchmarking/benchmark_publishers --ros-args -r __node:=rosbag2_performance_benchmarking_node --params-file /home/barry-xu/Work/ROS2/ros2_latest_ws/src/ros2/rosbag2/rosbag2_performance/rosbag2_performance_benchmarking/config/producers/10k_100inst_100hz.ymal --params-file /tmp/launch_params_gfhe2ca9 --params-file /tmp/launch_params_22ve8q52 --params-file /tmp/launch_params_5rfwiuzu --params-file /tmp/launch_params_fwupe38p --params-file /tmp/launch_params_rnmr8tel --params-file /tmp/launch_params_o2tvsutj'].
     [INFO] [launch.user]: Writer error. Shutting down benchmark.
     ...
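
When the launch output is long, the single CycloneDDS error line is easy to miss among the subscription messages. The following is a minimal sketch, not part of rosbag2, that scans log lines for the `create_thread ... no free slot` message shown above and reports which launched process emitted it. The function name `find_thread_slot_errors` and the regular expression are illustrative assumptions based only on the log format in this report.

```python
import re

# Matches lines such as:
# [benchmark_publishers-2] 1616403871.076306 [] benchmark_: create_thread: benchmark_publi: no free slot
# The pattern is an assumption derived from the log excerpt in this issue.
NO_SLOT = re.compile(
    r"\[(?P<proc>[^\]]+)\]\s+\d+\.\d+\s+\[\].*create_thread:.*no free slot"
)


def find_thread_slot_errors(log_lines):
    """Return the launch process names that reported 'no free slot'."""
    hits = []
    for line in log_lines:
        match = NO_SLOT.search(line)
        if match:
            hits.append(match.group("proc"))
    return hits


sample = [
    "[ros2-1] [INFO] [1616403871.072642593] [rosbag2_transport]: Subscribed to topic "
    "'/rosbag2_performance_benchmarking_node/benchmarking_10k_300inst_198'",
    "[benchmark_publishers-2] 1616403871.076306 [] benchmark_: create_thread: "
    "benchmark_publi: no free slot",
]
print(find_thread_slot_errors(sample))
```

In practice the lines would come from a saved copy of the launch output rather than from an inline list.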

System (please complete the following information)

  • OS: Ubuntu Bionic
  • ROS 2 Distro: Rolling
  • Version: rosbag2 b64cf74

Additional context

In my environment, the benchmark works when publishers_count is below 119 (for example, 118).
From 119 upward, the above error occurs.

Barry-Xu-2018 added the bug label Mar 22, 2021
@Barry-Xu-2018
Contributor Author

After checking, the problem is related to Cyclone DDS (Fast DDS doesn't have this issue).

The error is emitted here:

https://github.com/eclipse-cyclonedds/cyclonedds/blob/cd2136d9321212bd52fdc613f07bbebfddd90dec/src/core/ddsi/src/q_thread.c#L258-L266

The maximum number of threads is set here, and it seems that this setting cannot be changed through an external config file:

https://github.com/eclipse-cyclonedds/cyclonedds/blob/cd2136d9321212bd52fdc613f07bbebfddd90dec/src/core/ddsc/src/dds_init.c#L115

Cyclone DDS is now the default RMW, so it would be better to document this limitation for users.
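
Since Fast DDS reportedly does not hit this limit, one possible workaround is to run the same benchmark with a different RMW. The sketch below only builds the command and environment, using the standard ROS 2 `RMW_IMPLEMENTATION` environment variable. It assumes the `rmw_fastrtps_cpp` package is installed; the helper name `launch_command_with_rmw` is hypothetical, and the paths are the placeholders from the reproduction steps.

```python
import os


def launch_command_with_rmw(benchmark_yaml, producers_yaml, rmw="rmw_fastrtps_cpp"):
    """Build the benchmark launch command plus an environment selecting an RMW.

    Nothing is executed here; the caller decides whether to run the command.
    """
    env = dict(os.environ)
    env["RMW_IMPLEMENTATION"] = rmw  # standard ROS 2 variable for choosing the RMW
    cmd = [
        "ros2", "launch",
        "rosbag2_performance_benchmarking", "benchmark_launch.py",
        f"benchmark:={benchmark_yaml}",
        f"producers:={producers_yaml}",
    ]
    return cmd, env


cmd, env = launch_command_with_rmw(
    "/PATH/TO/test.yaml", "/PATH/TO/10k_300inst_100hz.yaml"
)
print(" ".join(cmd))
# On a machine with ROS 2 installed, the command could then be run with:
# subprocess.run(cmd, env=env, check=True)
```

This does not remove the Cyclone DDS limit; it only sidesteps it for the duration of the benchmark run.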

@fujitatomoya
Contributor

@Barry-Xu-2018

this is a constraint in the same process space? if i am not mistaken... i agree that it would be nice to have this documented somewhere, but i guess it is unlikely that anyone creates more than 100 threads in the same process space, especially on edge iot devices. (i think it is more likely for enterprise server applications.)

@Barry-Xu-2018
Contributor Author

@fujitatomoya

this is a constraint in the same process space?

Yes.

if i am not mistaken... i agree that it would be nice to have this documented somewhere, but i guess it is unlikely that anyone creates more than 100 threads in the same process space, especially on edge iot devices. (i think it is more likely for enterprise server applications.)

Yeah. Agree.
For flexibility, Cyclone DDS should provide a way for users to adjust this setting.

@adamdbrw
Collaborator

adamdbrw commented Mar 23, 2021

@Barry-Xu-2018 I don't understand why you assumed that the benchmark is the source of error. You can have only 128 threads in CycloneDDS, 11 or so of which are internally used.
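
The numbers in this thread can be cross-checked with a little arithmetic. The sketch below assumes a deliberately simplified model in which every publisher thread occupies exactly one of the 128 slots and the remaining overhead is constant; the function names are illustrative only. Under that assumption, the largest working count reported above (118) implies about 10 slots used elsewhere, which is consistent with "11 or so".

```python
CYCLONEDDS_THREAD_SLOTS = 128  # hardcoded limit quoted in this thread


def implied_reserved_slots(max_working_publishers, total_slots=CYCLONEDDS_THREAD_SLOTS):
    """Slots unavailable to publisher threads, inferred from the largest count that ran."""
    return total_slots - max_working_publishers


def fits(publishers_count, reserved_slots, total_slots=CYCLONEDDS_THREAD_SLOTS):
    """True if one thread per publisher still fits (simplified one-slot-per-thread model)."""
    return publishers_count + reserved_slots <= total_slots


reserved = implied_reserved_slots(118)  # 118 is the largest count reported to work
print(reserved)             # 10
print(fits(118, reserved))  # True
print(fits(119, reserved))  # False
print(fits(300, reserved))  # False
```

The exact overhead may differ between environments and Cyclone DDS versions, so the threshold of 119 should not be read as a fixed constant.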

Yes, the benchmark runs each publisher in a separate thread, which is very uncommon in the actual use cases. We are aware of that. You can modify benchmarks to support thread pooling, a thing we planned but had no time to do. Could you modify the issue description accordingly?
One can also compile CycloneDDS with a changed thread limit (it is a hardcoded value and easy to locate in code).
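
The thread-pooling idea mentioned above can be illustrated with a small sketch. It is not the rosbag2 benchmark code (which is C++) and not a planned design; it only shows, under stated assumptions, how many logical publishers can be driven by a bounded number of threads. The `publish` callback, `run_publishers`, and `fake_publish` are hypothetical stand-ins, and the sketch ignores `rate_hz`: each publisher sends its messages back to back, whereas a real benchmark would need timed scheduling.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_publishers(publisher_count, msg_count_each, pool_size, publish):
    """Drive publisher_count logical publishers using at most pool_size threads."""

    def run_one(index):
        # Each logical publisher sends all of its messages on whichever
        # pool thread picks it up.
        for seq in range(msg_count_each):
            publish(index, seq)
        return index

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(run_one, range(publisher_count)))


# Stand-in for a real publish call: it records what was sent and from which thread.
lock = threading.Lock()
sent = {}
threads_seen = set()


def fake_publish(index, seq):
    with lock:
        sent[index] = sent.get(index, 0) + 1
        threads_seen.add(threading.get_ident())


done = run_publishers(
    publisher_count=300, msg_count_each=5, pool_size=8, publish=fake_publish
)
print(len(done), len(threads_seen) <= 8)
```

With a pool of 8 threads, 300 publishers stay far below the 128-slot limit, at the cost of no longer measuring one-thread-per-publisher behavior.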

@Barry-Xu-2018
Contributor Author

@adamdbrw

I don't understand why you assumed that the benchmark is the source of error.

I created this issue when I first hit the problem.
After further investigation, I found that the cause isn't related to rosbag2, so I don't think this is a rosbag2 bug.

Could you modify the issue description accordingly?

I will close this issue since the cause isn't in rosbag2.
