Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

retry rtiddsgen(_server) invocation if it fails with a return code #38

Merged
merged 2 commits into from Dec 4, 2019

Conversation

dirk-thomas
Copy link
Member

When the buildfarm rebuilds the majority of the Debian packages of a ROS distro it frequently happens that packages containing messages fail to build.

The invocation of rtiddsgen_server returns with a non-zero code and the following error message (from http://build.ros2.org/view/Ebin_uB64/job/Ebin_uB64__action_msgs__ubuntu_bionic_amd64__binary/35/consoleFull):

ERROR com.rti.ndds.nddsgen.Main Fail: com.rti.ndds.nddsgen.Main$ArgumentException: File /tmp/binarydeb/ros-eloquent-action-msgs-0.8.0/obj-x86_64-linux-gnu/rosidl_generator_dds_idl/action_msgs/msg/dds_connext/GoalInfo_.idl doesn't exist. Please make sure that you have specified the correct path.

The logic already retries the invocation it finished with a rc 0 but didn't generate the desired files (

count = 1
max_count = 5
while True:
subprocess.check_call(cmd)
# fail safe if the generator does not work as expected
any_missing = False
for suffix in ['.h', '.cxx', 'Plugin.h', 'Plugin.cxx', 'Support.h', 'Support.cxx']:
filename = os.path.join(output_path, msg_name + suffix)
if not os.path.exists(filename):
any_missing = True
break
if not any_missing:
break
print("'%s' failed to generate the expected files for '%s/%s'" %
(idl_pp, pkg_name, msg_name), file=sys.stderr)
if count < max_count:
count += 1
print('Running code generator again (retry %d of %d)...' %
(count, max_count), file=sys.stderr)
continue
raise RuntimeError('failed to generate the expected files')
). But that doesn't seem to be the problem here. The command fails with a non-zero return code.

The logic already explicitly checks that the IDL file passed to the generator exists (

assert os.path.exists(idl_file), 'Could not find IDL file: ' + idl_file
) but the invocation still results in the above error message.

I can't reproduce the problem locally and newly triggered builds of the same job usually pass. So this patch is mostly an attempt to recover from this error condition and hopefully will make the binarydeb jobs pass reliably on the first attempt.

It could be a race deleting the IDL after the initial check (hence the new additional check after an invocation failed) but it is unclear why the file would be deleted. The other possibility is that the code generator fails for whatever other reason but reports a bogus error message...

Signed-off-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
@dirk-thomas dirk-thomas added bug Something isn't working in review Waiting for review (Kanban column) labels Dec 3, 2019
@dirk-thomas dirk-thomas self-assigned this Dec 3, 2019
@jacobperron
Copy link
Member

I think it would good to open a issue tracking the problem and reference it in the code.

Signed-off-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
@dirk-thomas
Copy link
Member Author

I think it would good to open a issue tracking the problem and reference it in the code.

I added HACK(...) to the new retry logic in f1ff333. I would only open a new ticket after seeing the results of the rebuild depending if they confirm or debunk the hunch.

@jacobperron
Copy link
Member

Ah, I think this is the relevant ticket: #7

Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see what happens I suppose.

@dirk-thomas
Copy link
Member Author

Let's see what happens I suppose.

(Assuming there is no logic bug in this patch) worst case should be a no-op when the generator passes and take extra time when it repeatedly fails.

Sanity check that it still builds on all platforms:

  • Linux Build Status
  • macOS Build Status
  • Windows Build Status

@dirk-thomas
Copy link
Member Author

@mjcarroll Can you approve this being backported and released into Eloquent?

Copy link
Member

@mjcarroll mjcarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable, and CI looks good. (also yes to eloquent)

@dirk-thomas dirk-thomas added this to Needs Backport in Eloquent Patch Release 1 Dec 4, 2019
@dirk-thomas dirk-thomas merged commit a98335b into master Dec 4, 2019
@delete-merged-branch delete-merged-branch bot deleted the dirk-thomas/retry-failing-rtiddsgen-invocation branch December 4, 2019 16:41
@dirk-thomas
Copy link
Member Author

Fast forwarded the commit from master to eloquent, released 0.8.3, and fast forwarded the release from the eloquent branch back to master. So both branches are still in sync.

dirk-thomas added a commit that referenced this pull request Dec 4, 2019
@dirk-thomas
Copy link
Member Author

A build where the anticipated failure happened: http://build.ros2.org/view/Ebin_uB64/job/Ebin_uB64__rcl_interfaces__ubuntu_bionic_amd64__binary/42/consoleFull

But all retries failed with the same error message even though they passed the assertion that the passed IDL file does exist between each attempt 😟

dirk-thomas added a commit that referenced this pull request Dec 4, 2019
… code (#38)"

This reverts commit a98335b.

Signed-off-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
@dirk-thomas dirk-thomas removed this from Needs Backport in Eloquent Patch Release 1 Dec 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working in review Waiting for review (Kanban column)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants