Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition when saving data under multiple threads #581

Closed
nedbat opened this issue May 11, 2017 · 35 comments
Closed

Race condition when saving data under multiple threads #581

nedbat opened this issue May 11, 2017 · 35 comments
Labels
bug Something isn't working

Comments

@nedbat
Copy link
Owner

nedbat commented May 11, 2017

Originally reported by Anonymous


I do not have a minimum repro for this issue yet, but we're seeing errors in our CI after upgrading coverage from 4.3.4 to 4.4b1 and 4.4.

We have some tests running using rostest. Upon upgrading from coverage 4.3.4 on our CI server, we started seeing the following errors:

Traceback (most recent call last):
  File "/tmp/tmpw8DgrF/src/app/test/integration/test_head_service.py", line 86, in <module>
    main()
  File "/tmp/tmpw8DgrF/src/app/test/integration/test_head_service.py", line 82, in main
    rostest.rosrun(PKG, NAME, TestHeadService)
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 148, in rosrun
    _stop_coverage([package])
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 227, in _stop_coverage
    _cov.stop()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 899, in analysis
    f, s, _, m, mf = self.analysis2(morf)
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 920, in analysis2
    analysis = self._analyze(morf)
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 935, in _analyze
    self.get_data()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 824, in get_data
    if self.collector.save_data(self.data):
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 382, in save_data
    covdata.add_lines(abs_file_dict(self.data))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in abs_file_dict
    return dict((abs_file(k), v) for k, v in iitems(d))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
    return dict((abs_file(k), v) for k, v in iitems(d))

It's possible this is a rostest issue too - I haven't dug down very deeply, but we're certain that this issue does not occur when using coverage 4.3.4 - only 4.4b1 and 4.4.

The source code for the caller is found here. It looks like stop is being called from line 227 and then blowing up internally.

Will post more info once I have it.

Contact info: peter@mayfieldrobotics.com


@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


Original poster here. It looks like I truncated the stack trace and didn't report the actual error. The final three lines should be:

  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
    return dict((abs_file(k), v) for k, v in iitems(d))
RuntimeError: dictionary changed size during iteration

@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

Hmm, this looks interesting. I'll try to reproduce it. Do you think I'll be able to clone your repo and run it myself?

@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


That's not possible - currently this happens in our integration tests which requires a ton of other infrastructure to run. I'm working on a stand-alone test-case that I can share.

@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

OK, then we'll have to share clues here. This looks to me like you are measuring coverage inside of coverage.py itself. That shouldn't happen. Try changing line 198 to:

_cov = coverage.coverage(debug=["config", "sys"])

then capture what appears on stderr, and share it here.

@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


Here you go. I removed some internal product names from the paths, but this is otherwise un-messed-with

-- config ----------------------------------------------------
 attempted_config_files: .coveragerc
                         setup.cfg
                         tox.ini
                 branch: False
            concurrency: None
           config_files: -none-
            cover_pylib: False
              data_file: .coverage
                  debug: config
                         sys
       disable_warnings: -none-
           exclude_list: #\s*(pragma|PRAGMA)[:\s]?\s*(no|NO)\s*(cover|COVER)
              extra_css: None
             fail_under: 0
               html_dir: htmlcov
             html_title: Coverage report
          ignore_errors: False
                include: None
                   note: None
                   omit: None
               parallel: False
    partial_always_list: while (True|1|False|0):
                         if (True|1|False|0):
           partial_list: #\s*(pragma|PRAGMA)[:\s]?\s*(no|NO)\s*(branch|BRANCH)
                  paths: {}
         plugin_options: {}
                plugins: -none-
              precision: 0
           show_missing: False
           skip_covered: False
                 source: None
                  timid: False
             xml_output: coverage.xml
      xml_package_depth: 99
-- sys -------------------------------------------------------
              version: 4.4
             coverage: /usr/local/lib/python2.7/dist-packages/coverage/__init__.pyc
          cover_paths: /usr/local/lib/python2.7/dist-packages/coverage
          pylib_paths: /usr/lib/python2.7
               tracer: CTracer
 plugins.file_tracers: -none-
         config_files: .coveragerc
                       setup.cfg
                       tox.ini
         configs_read: -none-
            data_path: /root/-----/app/test/integration/.coverage
               python: 2.7.6 (default, Oct 26 2016, 20:30:19) [GCC 4.8.4]
             platform: Linux-4.4.0-72-generic-x86_64-with-Ubuntu-14.04-trusty
       implementation: CPython
           executable: /usr/bin/python
                  cwd: /root/-----/app/test/integration
                 path: /tmp/tmpgSRcfQ/src/app/src
                       /root/-----/app/test/integration
                       /tmp/tmpgSRcfQ/devel_isolated/lib/python2.7/dist-packages
                       /opt/-----/lib/python2.7/dist-packages
                       /opt/ros/indigo/lib/python2.7/dist-packages
                       /usr/lib/python2.7
                       /usr/lib/python2.7/plat-x86_64-linux-gnu
                       /usr/lib/python2.7/lib-tk
                       /usr/lib/python2.7/lib-old
                       /usr/lib/python2.7/lib-dynload
                       /usr/local/lib/python2.7/dist-packages
                       /usr/local/lib/python2.7/dist-packages/pip-9.0.1-py2.7.egg
                       /usr/lib/python2.7/dist-packages
                       /usr/lib/python2.7/dist-packages/PILcompat
                       /usr/lib/python2.7/dist-packages/gtk-2.0
                       /usr/lib/pymodules/python2.7
          environment: PYTHONPATH = /tmp/tmpgSRcfQ/devel_isolated/lib/python2.7/dist-packages:/opt/-----/lib/python2.7/dist-packages:/opt/ros/indigo/lib/python2.7/dist-packages
         command_line: /tmp/tmpgSRcfQ/src/app/test/integration/test_head_service.py --cov --gtest_output=xml:/tmp/tmpgSRcfQ/build_isolated/-----/test_results/-----/rosunit-test_head_service.xml __name:=test_he
ad_service __log:=/root/.ros/log/cc97f6b0-3738-11e7-a7d6-0242ac110002/test_head_service-14.log
         source_match: -none-
    source_pkgs_match: -none-
        include_match: -none-
           omit_match: -none-
          cover_match: /usr/local/lib/python2.7/dist-packages/coverage
          pylib_match: /usr/lib/python2.7
-- end -------------------------------------------------------

@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

Hmm, thanks. Now add "trace" to that list of options:

_cov = coverage.coverage(debug=["config", "sys", "trace"])

which will produce more output, one line per file executed.

@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


I removed some lines that say Tracing '/tmp/our-internal-product-stuff/foo'

Coverage is down at the bottom - it looks like it's being skipped. I do see multiprocessing in the list, though. Pure speculation, but that's one of the things I'm investigating to try to repro this in a smaller way.

EDIT I didn't include this at first - but the thing blows up right after the line

Not tracing '/usr/local/lib/python2.7/dist-packages/coverage/collector.py': is part of coverage.py at the end

I've edited the output below to include the stack trace when it blows up

Not tracing '/usr/lib/python2.7/threading.py': is in the stdlib
Tracing '/tmp/tmpKoLMLj/src/app/src/-----/__init__.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/registration.py'
Not tracing '/usr/lib/python2.7/unittest/loader.py': is in the stdlib
Not tracing '/usr/lib/python2.7/functools.py': is in the stdlib
Not tracing '/usr/lib/python2.7/unittest/case.py': is in the stdlib
Not tracing '/usr/lib/python2.7/unittest/suite.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/masterslave.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosunit/core.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/names.py'
Not tracing '/usr/lib/python2.7/posixpath.py': is in the stdlib
Not tracing '/usr/lib/python2.7/genericpath.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/topics.py'
Not tracing '/usr/lib/python2.7/stat.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosunit/xmlrunner.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_pubsub.py'
Not tracing '/usr/lib/python2.7/logging/__init__.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/core.py'
Not tracing '/usr/lib/python2.7/urlparse.py': is in the stdlib
Not tracing '<string>': not a real file name
Not tracing '/usr/lib/python2.7/xmlrpclib.py': is in the stdlib
Not tracing '/usr/lib/python2.7/urllib.py': is in the stdlib
Not tracing '/usr/lib/python2.7/string.py': is in the stdlib
Not tracing '/usr/lib/python2.7/httplib.py': is in the stdlib
Not tracing '/usr/lib/python2.7/socket.py': is in the stdlib
Not tracing '/usr/lib/python2.7/unittest/result.py': is in the stdlib
Tracing '/tmp/tmpKoLMLj/src/app/test/integration/test_joint_states_service.py'
Tracing '/root/-----/app/test/integration/test_base.py'
Not tracing '/usr/lib/python2.7/Queue.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/names.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/statistics.py'
Not tracing '/usr/lib/python2.7/mimetools.py': is in the stdlib
Not tracing '/usr/lib/python2.7/rfc822.py': is in the stdlib
Not tracing '/build/python2.7-t6d6Zr/python2.7-2.7.6/Modules/pyexpat.c': pyexpat lies about itself
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/transport.py'
Not tracing '/usr/lib/python2.7/multiprocessing/process.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/client.py'
Not tracing '/usr/lib/python2.7/logging/handlers.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/msproxy.py'
Not tracing '/usr/lib/python2.7/UserDict.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/network.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/rosenv.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/tf2_ros/buffer.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/tf2_ros/buffer_interface.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/masterapi.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/msg.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph_msgs/msg/_Clock.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/genpy/rostime.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/simtime.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/service.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/rostime.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/tf2_ros/transform_listener.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/action_client.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryAction.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryActionGoal.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/std_msgs/msg/_Header.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib_msgs/msg/_GoalID.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryGoal.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/trajectory_msgs/msg/_JointTrajectory.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryActionResult.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib_msgs/msg/_GoalStatus.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryResult.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryActionFeedback.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryFeedback.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/trajectory_msgs/msg/_JointTrajectoryPoint.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/tf2_msgs/msg/_TFMessage.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_TransformStamped.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Transform.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Vector3.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Quaternion.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/sensor_msgs/msg/_JointState.py'
Not tracing '/usr/lib/python2.7/abc.py': is in the stdlib
Not tracing '/usr/lib/python2.7/_weakrefset.py': is in the stdlib
Not tracing '/usr/lib/python2.7/_abcoll.py': is in the stdlib
Not tracing '/usr/lib/python2.7/SocketServer.py': is in the stdlib
Not tracing '/usr/lib/python2.7/BaseHTTPServer.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Pose.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Point.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Pose2D.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/cv_bridge/core.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/sensor_msgs/msg/_CameraInfo.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/sensor_msgs/msg/_RegionOfInterest.py'
Tracing '/tmp/tmpKoLMLj/src/app/src/app_bringup/params.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/__init__.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/loader.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/reader.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/scanner.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/error.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/tokens.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/parser.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/composer.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/constructor.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/resolver.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/events.py'
Tracing '/usr/local/lib/python2.7/dist-packages/yaml/nodes.py'
Not tracing '/usr/lib/python2.7/SimpleXMLRPCServer.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/validators.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/roslogging.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/rosout.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph_msgs/msg/_Log.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/genpy/message.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslib/packages.py'
Not tracing '/usr/lib/python2.7/dist-packages/rospkg/rospack.py': is in the stdlib
Not tracing '/usr/lib/python2.7/dist-packages/rospkg/environment.py': is in the stdlib
Not tracing '/usr/lib/python2.7/os.py': is in the stdlib
Not tracing '/usr/lib/python2.7/xml/etree/ElementTree.py': is in the stdlib
Not tracing '/usr/lib/python2.7/xml/etree/ElementPath.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/catkin/find_in_workspaces.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/catkin/workspace.py'
Not tracing '/usr/lib/python2.7/dist-packages/catkin_pkg/packages.py': is in the stdlib
Not tracing '/usr/lib/python2.7/dist-packages/catkin_pkg/package.py': is in the stdlib
Not tracing '/usr/lib/python2.7/xml/dom/minidom.py': is in the stdlib
Not tracing '/usr/lib/python2.7/xml/dom/expatbuilder.py': is in the stdlib
Not tracing '/usr/lib/python2.7/StringIO.py': is in the stdlib
Not tracing '/usr/lib/python2.7/copy.py': is in the stdlib
Not tracing '/usr/lib/python2.7/copy_reg.py': is in the stdlib
Not tracing '/usr/lib/python2.7/re.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/rlutil.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/parent.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/config.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/core.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/xmlloader.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/loader.py'
Not tracing '/usr/lib/python2.7/shlex.py': is in the stdlib
Not tracing '/usr/lib/python2.7/subprocess.py': is in the stdlib
Not tracing '/usr/lib/python2.7/platform.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/pmon.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/server.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/xmlrpc.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/launch.py'
Not tracing '/usr/lib/python2.7/gzip.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/nodeprocess.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/node_args.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/substitution_args.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslib/__init__.py'
Not tracing '/usr/lib/python2.7/atexit.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/std_msgs/msg/_String.py'
Not tracing '/usr/lib/python2.7/shutil.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/std_msgs/msg/_Empty.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Twist.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/simple_action_server.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/server_goal_handle.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/action_server.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/goal_id_generator.py'
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib_msgs/msg/_GoalStatusArray.py'
Not tracing '/usr/local/lib/python2.7/dist-packages/coverage/control.py': is part of coverage.py
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/timer.py'
Not tracing '/usr/lib/python2.7/traceback.py': is in the stdlib
Not tracing '/usr/lib/python2.7/linecache.py': is in the stdlib
Not tracing '/usr/lib/python2.7/unittest/util.py': is in the stdlib
Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py'
Not tracing '/usr/local/lib/python2.7/dist-packages/coverage/collector.py': is part of coverage.py
Traceback (most recent call last):
  File "/tmp/tmpE4nq9m/src/app/test/integration/test_bagfile_recorder.py", line 83, in <module>
    main()
  File "/tmp/tmpE4nq9m/src/app/test/integration/test_bagfile_recorder.py", line 79, in main
    rostest.rosrun(PKG, NAME, TestBagfileRecorder)
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 148, in rosrun
    _stop_coverage([package])
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 228, in _stop_coverage
    _cov.stop()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 919, in analysis
    f, s, _, m, mf = self.analysis2(morf)
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 940, in analysis2
    analysis = self._analyze(morf)
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 955, in _analyze
    self.get_data()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 824, in get_data
    if self.collector.save_data(self.data):
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 382, in save_data
    covdata.add_lines(abs_file_dict(self.data))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in abs_file_dict
    return dict((abs_file(k), v) for k, v in iitems(d))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
    return dict((abs_file(k), v) for k, v in iitems(d))
RuntimeError: dictionary changed size during iteration

@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


Also tracking this issue on the rostest side over here: ros/ros_comm#1055 but, but that side appears a little bit less responsive

@nedbat
Copy link
Owner Author

nedbat commented May 12, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


I'm taking my repro-case and cutting it down binary search style. I'm noticing that as I remove more stuff, the likelyhood that coverage blows up is going down. With everything in there, I can get it to blow up every time. With about half of the stuff removed, it's now to the point where it'll blow up about 1 in 3 times.

@nedbat
Copy link
Owner Author

nedbat commented May 13, 2017

Now I'm thinking of something drastic, like copying the dict just before that line, then catching RuntimeError and checking the difference between the original dict and the current dict.

@nedbat
Copy link
Owner Author

nedbat commented May 15, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


Ok, I modified save_data to be the following:

    def save_data(self, covdata):
        """Save the collected data to a `CoverageData`.

        Returns True if there was data to save, False if not.
        """
        if not self._activity():
            return False

        def abs_file_dict(d):
            """Return a dict like d, but with keys modified by `abs_file`."""
            copy_dict = d.copy()
            try:
                return dict((abs_file(k), v) for k, v in iitems(d))
            except RuntimeError:
                new_dict = d.copy()
                added = set(new_dict.keys()) - set(copy_dict.keys())
                removed = set(copy_dict.keys()) - set(new_dict.keys())
                print >> sys.stderr, "**** HERE COMES THE INFO  ****"
                print >> sys.stderr, "Things that were added: {}".format(added)
                print >> sys.stderr, "Things that were removed: {}".format(removed)
                print >> sys.stderr, "**** END OF THE INFO ****"
                raise

And here's what I see (with a little bit of manual formatting added):

Things that were added: set([
    u'/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Vector3.py',
    u'/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Transform.py',
    u'/opt/ros/indigo/lib/python2.7/dist-packages/tf2_ros/transform_listener.py',
    u'/opt/ros/indigo/lib/python2.7/dist-packages/tf2_msgs/msg/_TFMessage.py',
    u'/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_TransformStamped.py',
    u'/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Quaternion.py'])

Things that were removed: set([])

These all seem like legit files that might be getting coverage generated for them. There's also approx 22 python threads hanging around, most of them waiting to publish or receive something on a socket.

@nedbat
Copy link
Owner Author

nedbat commented May 15, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


I have a sort of stand-alone repro case that I need to clean up a little bit, but here's the gist:

  1. Start coverage
  2. Spin up a bunch of python threads (20, in my case). Each thread imports a module, and then imports a class from that module
  3. stop coverage
  4. save coverage
  5. Observe the "dictionary changed size during iteration" exception that comes out of collector.py lin 377.

It's not 100% standalone yet, because I'm still using ROS to get a list of modules and classes to import. I have a really long list of ROS objects of the form:
["module name", "class name"]

for example:

importlist = [
["std_msgs.msg", "Bool",],
["std_msgs.msg", "Byte",],
["std_msgs.msg", "ByteMultiArray",],
["std_msgs.msg", "Char",],
["std_msgs.msg", "ColorRGBA",],
["std_msgs.msg", "Duration",],
["std_msgs.msg", "Empty",],
["std_msgs.msg", "Float32",],
["std_msgs.msg", "Float32MultiArray",],
["std_msgs.msg", "Float64",],
["std_msgs.msg", "Float64MultiArray",],
]

Then my threads all run the following method:

def random_load():
    global _run
    global importlist
    while _run:
        candidate = random.choice(importlist)
        mod = importlib.import_module(candidate[0])
        cls = getattr(mod, candidate[1])

I don't think this is ROS specific - but using the ROS messages was a convenient way to quickly generate a very large list of modules and class names to import.

I think I can give you instructions to fully repro the issue using a docker container that has the ROS stuff in it. Again, I don't believe that ROS is necessary - it's just a convenient way for me to generate a large list of things to import.

Here is the program I'm using right now to reproduce the problem without the importlist:

_cov = None
_run = True

def _start_coverage():
    global _cov
    import coverage
    _cov = coverage.coverage()
    _cov.start()

def _stop_coverage():
    _cov.stop()
    print "YOU WILL NEVER GET HERE"
    # Mega strange.  When the program blows up, it reports that _cov.stop() blows up.
    # The thing is, I see "YOU WILL NEVER GET HERE" printed to the console.  Also the
    # program doesn't blow up without _cov.save()
    _cov.save()

def random_load():
    global _run
    global importlist
    while _run:
        candidate = random.choice(importlist)
        mod = importlib.import_module(candidate[0])
        # cls = getattr(mod, candidate[1]) # Not needed for repro

_start_coverage()
for n in range(20):
    tr = threading.Thread(target=random_load)
    tr.start()
time.sleep(0.1)
_stop_coverage()
_run = False

Here's the stack trace that I see:

peter@docker-dev:~/cov_repro$ python repro.py 
YOU WILL NEVER GET HERE
Traceback (most recent call last):
  File "repro.py", line 440, in <module>
    _stop_coverage()
  File "repro.py", line 422, in _stop_coverage
    _cov.stop()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 772, in save
    self.get_data()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 824, in get_data
    if self.collector.save_data(self.data):
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 382, in save_data
    covdata.add_lines(abs_file_dict(self.data))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in abs_file_dict
    return dict((abs_file(k), v) for k, v in iitems(d))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
    return dict((abs_file(k), v) for k, v in iitems(d))
RuntimeError: dictionary changed size during iteration

@nedbat
Copy link
Owner Author

nedbat commented May 15, 2017

OK, I had wondered about threads. How are you starting them? Do you start them before or after coverage is started?

@nedbat
Copy link
Owner Author

nedbat commented May 15, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


I have edited the above comment with a more complete repro, although not 100% stand-alone yet. Threads are being started after coverage is started.

@nedbat
Copy link
Owner Author

nedbat commented May 15, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


A python program that reproduces the issue when run in a docker container. See the comments below for instructions on how to reproduce

@nedbat
Copy link
Owner Author

nedbat commented May 15, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


repro.py is attached to this issue.
Steps to reproduce using a docker container are below. The ros-indigo-ros-base docker image is used because it contains already all of the classes that are imported to repro the issue. I believe it is possible to reproduce this issue without ROS, too, but do not have a good way of generating a big list of imports without ROS being installed:

Make the directory ~/cov_repro and put the file repro.py in it
> docker run -it --rm -v ~/cov_repro:/root/cov_repro ros:indigo-ros-base /bin/bash
----- You are now inside a docker container -----
> easy_install pip
> pip install coverage==4.4
> cd /root/cov_repro
> python repro.py

Observe the following output:

YOU WILL NEVER GET HERE
Traceback (most recent call last):
  File "repro.py", line 201, in <module>
    _stop_coverage()
  File "repro.py", line 182, in _stop_coverage
    _cov.stop()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 772, in save
    self.get_data()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 824, in get_data
    if self.collector.save_data(self.data):
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 382, in save_data
    covdata.add_lines(abs_file_dict(self.data))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in abs_file_dict
    return dict((abs_file(k), v) for k, v in iitems(d))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
    return dict((abs_file(k), v) for k, v in iitems(d))
RuntimeError: dictionary changed size during iteration

Notes:

  1. You will probably have to terminate the program with ctrl+\ because the threads won't be shut down.
  2. Commenting out the line _cov.save() will prevent the problem from happening, despite the stack trace saying that the issue comes from inside _cov.stop

@nedbat
Copy link
Owner Author

nedbat commented May 16, 2017

@pbaughman Thanks so much for the test case, I am able to reproduce it. And it's nothing to do with your code. I replaced your imports with some synthetic ones, and could still cause the failure. This program does it without needing Docker, etc:

import importlib
import random
import threading
import time

_cov = None
_run = True

for i in xrange(1000):
    with open("m{:04d}.py".format(i), "w") as f:
        f.write("a = 1\n")

def _start_coverage():
    global _cov
    import coverage
    _cov = coverage.coverage()
    _cov.start()

def _stop_coverage():
    _cov.stop()
    print "YOU WILL NEVER GET HERE!!"
    # Mega strange.  When the program blows up, it reports that _cov.stop() blows up.
    # The thing is, I see "YOU WILL NEVER GET HERE" printed to the console.  Also the
    # program doesn't blow up without _cov.save()
    _cov.save()

def random_load():
    global _run
    while _run:
        candidate = "m{:04d}".format(random.randint(0, 999))
        mod = importlib.import_module(candidate)

_start_coverage()
for n in range(20):
    threading.Thread(target=random_load).start()
time.sleep(0.1)
_stop_coverage()
_run = False

Now to work on fixing the problem.... that might be harder...

@nedbat
Copy link
Owner Author

nedbat commented May 16, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


A somewhat cheesy solution that does not address a possible underlying logic issue might be to make a copy of the data in collector.py save_data like:

        dat = self.data.copy()
        if self.branch:
            covdata.add_arcs(abs_file_dict(dat))
        else:
            covdata.add_lines(abs_file_dict(dat))
        covdata.add_file_tracers(abs_file_dict(dat))

I didn't deep-dive, but it seems strange that things are still being added to the dictionary after stop is called

@nedbat
Copy link
Owner Author

nedbat commented Jun 26, 2017

Original comment by Peter Baughman (Bitbucket: pbaughman, GitHub: pbaughman)


@nedbat Should I submit a pull request w/ the cheesy fix? We're getting burned by a different issue now that looks like it was fixed in 4.4, but we can't update to 4.4 because of this problem.

@nedbat
Copy link
Owner Author

nedbat commented Jun 27, 2017

@pbaughman I did some work to investigate fixes, but got sidetracked. Copying the data dictionary makes the problem less likely, but does not fix it robustly. I'm sorry to say this, but for the time being, you should run with your fix from a fork of coverage.py.

@nedbat
Copy link
Owner Author

nedbat commented Aug 3, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


I can reproduce a similar issue on the CI infrastructure of scikit-learn (but did no observe this on my local machine):

https://s3.amazonaws.com/archive.travis-ci.org/jobs/260611942/log.txt?X-Amz-Expires=30&X-Amz-Date=20170803T154403Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJRYRXRSVGNKPKO5A/20170803/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=094dbeee4fd45df1666ecf7df10abf707743864ba1b69821b42cd93b69e94e29

Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/bin/nosetests", line 6, in <module>
    sys.exit(nose.run_exit())
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/core.py", line 121, in __init__
    **extra_args)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/core.py", line 207, in runTests
    result = self.testRunner.run(self.test)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/core.py", line 66, in run
    result.printErrors()
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/result.py", line 110, in printErrors
    self.config.plugins.report(self.stream)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/plugins/manager.py", line 99, in __call__
    return self.call(*arg, **kw)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/plugins/manager.py", line 167, in simple
    result = meth(*arg, **kw)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/plugins/cover.py", line 183, in report
    self.coverInstance.stop()
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/control.py", line 1086, in html_report
    return reporter.report(morfs)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/html.py", line 139, in report
    self.report_files(self.html_file, morfs, self.config.html_dir)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/report.py", line 91, in report_files
    report_fn(fr, self.coverage._analyze(fr))
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/control.py", line 955, in _analyze
    self.get_data()
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/control.py", line 824, in get_data
    if self.collector.save_data(self.data):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/collector.py", line 382, in save_data
    covdata.add_lines(abs_file_dict(self.data))
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/collector.py", line 377, in abs_file_dict
    return dict((abs_file(k), v) for k, v in iitems(d))
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/collector.py", line 377, in <genexpr>
    return dict((abs_file(k), v) for k, v in iitems(d))
RuntimeError: dictionary changed size during iteration

@nedbat
Copy link
Owner Author

nedbat commented Aug 3, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


Note that this was observed on a PR that makes use of loky's implementation of ProcessPoolExecutor that uses several long running threads under the hood.

@nedbat
Copy link
Owner Author

nedbat commented Aug 8, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


I updated the minimal repro script to work in Python 3 with the print function and could therefore enable flush=True and I can see the print function output on stdout:

from __future__ import print_function
import importlib
import random
import threading
import time

_cov = None
_run = True

for i in range(1000):
    with open("m{:04d}.py".format(i), "w") as f:
        f.write("a = 1\n")

def _start_coverage():
    global _cov
    import coverage
    _cov = coverage.coverage()
    _cov.start()

def _stop_coverage():
    _cov.stop()
    print("YOU WILL NEVER GET HERE!!", flush=True)
    # Mega strange.  When the program blows up, it reports that _cov.stop() blows up.
    # The thing is, I see "YOU WILL NEVER GET HERE" printed to the console.  Also the
    # program doesn't blow up without _cov.save()
    _cov.save()

def random_load():
    global _run
    while _run:
        candidate = "m{:04d}".format(random.randint(0, 999))
        mod = importlib.import_module(candidate)

_start_coverage()
for n in range(20):
    threading.Thread(target=random_load).start()
time.sleep(0.1)
_stop_coverage()
_run = False

I get the following output:

YOU WILL NEVER GET HERE!!
Traceback (most recent call last):
  File "coverage_bug.py", line 38, in <module>
    _stop_coverage()
  File "coverage_bug.py", line 21, in _stop_coverage
    _cov.stop()
  File "/volatile/ogrisel/code/coveragepy/coverage/control.py", line 772, in save
    self.get_data()
  File "/volatile/ogrisel/code/coveragepy/coverage/control.py", line 824, in get_data
    if self.collector.save_data(self.data):
  File "/volatile/ogrisel/code/coveragepy/coverage/collector.py", line 382, in save_data
    covdata.add_lines(abs_file_dict(self.data))
  File "/volatile/ogrisel/code/coveragepy/coverage/collector.py", line 377, in abs_file_dict
    return dict((abs_file(k), v) for k, v in iitems(d))
  File "/volatile/ogrisel/code/coveragepy/coverage/collector.py", line 377, in <genexpr>
    return dict((abs_file(k), v) for k, v in iitems(d))
RuntimeError: dictionary changed size during iteration

and then the program is waiting indefinitely (since the _run flag is never set to False).

@nedbat
Copy link
Owner Author

nedbat commented Aug 8, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


The following patch fixes the bug for me:

diff -r [12d026f6db21 (bb)](https://bitbucket.org/ned/coveragepy/commits/12d026f6db21) coverage/collector.py
--- a/coverage/collector.py	Sat Jul 08 11:03:31 2017 -0400
+++ b/coverage/collector.py	Tue Aug 08 16:25:10 2017 +0200
@@ -374,7 +374,7 @@
 
         def abs_file_dict(d):
             """Return a dict like d, but with keys modified by `abs_file`."""
-            return dict((abs_file(k), v) for k, v in iitems(d))
+            return dict((abs_file(k), v) for k, v in list(iitems(d)))
 
         if self.branch:
             covdata.add_arcs(abs_file_dict(self.data))

This makes the iteration thread-safe. But maybe writing into covdata is not thread safe?

@nedbat
Copy link
Owner Author

nedbat commented Aug 8, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


Apparently there is only one thread (the main thread) calling into save_data while the other thread is concurrently mutating the self.data dict. I therefore thing the proposed fixed is correct.

@nedbat
Copy link
Owner Author

nedbat commented Aug 8, 2017

I will have to get back to this soon. I am not just adding a list() call for a few reasons: first, it doesn't solve the race condition, it just narrows the window from the time to execute the entire line to the time to execute the list(). Adding a lock around this line might be a solution, but I think there's also technically a race condition in the trace function where we add keys to the data dictionary in the first place. That is an even narrower window for problems, but at least at an academic level, is still a problem. Locking on every line recorded doesn't seem feasible.

At the very least, I'd like to understand the scope of the problem before adding this simple pragmatic 99% solution.

@nedbat
Copy link
Owner Author

nedbat commented Aug 9, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


Alright. Just a remark: list is a builtin type in Python so I believe that the GIL will protect against the dictionary iteration concurrency issue although I am not 100% sure.

@nedbat
Copy link
Owner Author

nedbat commented Aug 9, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


I think the fundamental problem comes from the fact that threading.settrace can only be used to enable tracing on future threads and cannot be used to enable or disable tracing on running threads. Actually I am reading the source code of the PyTracer class and the stopped flag should be able to deal with this constraint.

@nedbat
Copy link
Owner Author

nedbat commented Aug 9, 2017

The thread-safety of list() is something to consider. I don't understand your point about past vs future threads? I thought the problem here was one thread recording data into the shared dictionary while another thread was trying to save the data?

@nedbat
Copy link
Owner Author

nedbat commented Aug 9, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


The problem is that the main thread is trying to call save_data after stopping the coverage collection but the thread still running in the background is appending data into the dict concurrently.

However I just noticed something inconsistent in the traceback I reported in: https://bitbucket.org/ned/coveragepy/issues/581/44b1-44-breaking-in-ci#comment-38963003

The traceback is about an exception in the main thread in the _cov.stop() call (in the _stop_coverage() function) while the print statement has already printed the "YOU WILL NEVER GET HERE!!" message which should only happen after the call to _cov.stop() based on the source code of the script...

@nedbat
Copy link
Owner Author

nedbat commented Aug 10, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


I opened a PR for this issue: https://bitbucket.org/ned/coveragepy/pull-requests/127/fix-thread-safe-collectorsave_data

@nedbat
Copy link
Owner Author

nedbat commented Oct 21, 2017

Fixed in the commits leading up to 6ba2af43cfdc (bb)

@nedbat
Copy link
Owner Author

nedbat commented Oct 21, 2017

BTW, @ogrisel @pbaughman I added you both to the CONTRIBUTORS file. Thanks!

@nedbat
Copy link
Owner Author

nedbat commented Nov 5, 2017

I up the concurrency protection in d781dd6eb3d1dc05d99dc04ecf37000464484446 (bb). With the previous list() call, I still got a "dict changed size" error on a CI server once.

This has shipped in version 4.4.2

@nedbat
Copy link
Owner Author

nedbat commented Nov 6, 2017

Original comment by Olivier Grisel (Bitbucket: ogrisel, GitHub: ogrisel)


Thanks!

@nedbat nedbat closed this as completed Nov 6, 2017
@nedbat nedbat added major bug Something isn't working labels Jun 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant