
End-to-end test coverage for CLI commands output #304

Merged: 11 commits merged into master from hidmic/cli-test-coverage on Oct 22, 2019

Conversation

@hidmic (Contributor) commented Jul 19, 2019

This pull request aims to add full end-to-end test coverage for several ros2 CLI commands:

topic, service, action, param, msg, srv, interface, node, run, component, lifecycle, pkg, daemon, launch

For each command, the tests check:

  • Correct output
  • Graceful termination
  • Error handling

Note: this list will be updated as progress is made. Test coverage for commands that have been struck through (i.e. param, component, lifecycle, etc.) will not be provided here but in follow-up PRs. These commands have side effects beyond their execution lifetime that have to be checked in ways that are specific to each command, so they don't lend themselves well to being integrated with the "standardized" testing device used for the other commands.

Connected to ros2/ros2#739.

@hidmic added this to In progress in Eloquent via automation on Jul 19, 2019

@hidmic (Contributor Author) commented Jul 19, 2019

There's still the discussion as to whether this is enough for end-to-end testing (let alone unit tests, which are not covered by this PR), or even whether the behavior currently shown is appropriate. We'll have to iterate on it as we move forward.

@hidmic (Contributor Author) commented Jul 19, 2019

Running CI to uncover non-Linux, platform-specific issues (build up to test_ros2cli, test test_ros2cli, including OpenSplice):

  • Linux (Linux-only first to save CI time)
  • Linux-aarch64
  • macOS
  • Windows

@hidmic changed the title [WIP] End-to-end test coverage for CLI commands → End-to-end test coverage for CLI commands output on Jul 26, 2019
@hidmic (Contributor Author) commented Jul 26, 2019

Ok, I'll stop here for the reasons explained in this PR's description, and because this is getting larger than I'm usually comfortable with.

@dirk-thomas (Member) left a comment

I haven't read the full patch yet - just a few high level comments:

  • The size of the tests makes me wonder if we really can't write them more compactly. Is there anything we could do API / feature wise that would help reduce this?
  • What is the reason that all the tests are bundled in the separate package test_ros2cli? What is preventing them from being implemented in the package whose command/verb they are testing?

ros2node/ros2node/verb/list.py (resolved review thread)
@@ -12,11 +12,21 @@
<test_depend>action_tutorials</test_depend>
<test_depend>ament_lint_auto</test_depend>
<test_depend>ament_lint_common</test_depend>
<test_depend>demo_nodes_py</test_depend>
Member commented:

This will create a repository-level cycle, since packages in the repository containing demo_nodes_py depend on packages in this repo. IMO we shouldn't do that.

@hidmic (Contributor Author) commented Jul 29, 2019:


Ok, that is true for ros2run. It's another instance of an issue we have elsewhere (e.g. launch_ros), and one that I was consciously avoiding to prevent scope creep. IMO we shouldn't be using demo nodes for system or integration testing, nor duplicating ros2/system_tests talkers, listeners, servers, etc. We need a package with test nodes that ros2/ros2cli, ros2/launch*, ros2/system_tests, etc. can all depend on. Probably in its own repo, like we did for test_interface_files.

I can put this on hold, do that and come back, or fix it afterwards. What would you rather do?

Member replied:

IMO a package/repo like test_nodes sounds like a good idea.

if("${TEST_RMW_IMPLEMENTATION}" STREQUAL "rmw_connext_cpp")
# Connext startup is too slow. It needs a few extra seconds till
# discovery completes.
set(TEST_SETUP_DELAY 5.0)
Member commented:

That looks problematic because it:

  • is fragile in case Connext might just take a second longer
  • slows down the tests in case Connext happens to be faster than the threshold
  • potentially needs to be done for other RMW implementations

Instead we should find a way to write the test to work without this delay.

@hidmic (Contributor Author) commented Jul 29, 2019:

It is problematic, I know. Truth be told, all these tests have a delay to cope with CPU load fluctuations, uneven discovery times, etc., which makes them all flaky. I had to remove publishing/echoing rate tests because they're completely inconsistent, time-wise.

Now, to fix this. Many tests don't need this delay at all, e.g. ros2 msg list, so we can cut down time there. For those that do need it, I see two kinds: those that are bound by communication latency and those that are bound by discovery time.

For the first kind, we can wait for specific output or events, e.g. for ros2 topic bw, wait until we have any (non-filtered) output before starting to test it.

For the second kind, e.g. ros2 topic list, it gets harder. Assuming the daemon is there, we may repeatedly execute the command, though that isn't really less fragile than a delay (and I can't figure out how to pull that off using ros_testing as it is today). We may also wait for the ROS graph to show some nodes, but since the CLI command, the CLI daemon and the launch test are three separate processes, it may not work as intended.

Alternatively, I think we could make our CLI more predictable. Just as ros2 service call waits for the server to become available, ros2 service list could take an optional ROS graph condition to wait for, e.g. ros2 service list --when "'my_server' in nodes". The same goes for all discovery-time-bound commands. What do you think?
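For the graph-waiting idea mentioned above, a minimal sketch of what a test-side helper could look like, assuming an rclpy observer node is available to the test; the node name and timings are hypothetical placeholders:

```python
import time

import rclpy


def wait_for_node_in_graph(observer, expected_node, timeout=10.0):
    """Poll the ROS graph until expected_node shows up or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if expected_node in observer.get_node_names():
            return True
        time.sleep(0.25)
    return False


rclpy.init()
observer = rclpy.create_node('graph_observer')  # hypothetical helper node
try:
    # Only run the discovery-bound command once the fixture node is visible.
    assert wait_for_node_in_graph(observer, 'my_server'), 'node never showed up in the graph'
finally:
    observer.destroy_node()
    rclpy.shutdown()
```

As noted above, this only observes the graph from the test process, so the daemon backing the CLI command may still lag behind.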

Member replied:

Assuming the daemon is there, we may repeatedly execute the command, though that isn't really less fragile than a delay.

Why would that not be less fragile? It will pass the test as soon as the command returns the desired output and only runs to a maximum timeout if the command doesn't do so.

Just like ros2 service call awaits for the server to be available, ros2 service list could have an optional ROS graph condition to wait for e.g. ros2 service list --when "'my_server' in nodes".

This is a good example of how not to add conditions, imo. If you want to wait for X (services of a node being available) you can't use a condition Y (check whether a node has been discovered), simply because there is still a race: the node might become available shortly before the services are actually advertised.

@hidmic (Contributor Author) replied:

Why would that not be less fragile? It will pass the test as soon as the command returns the desired output and only runs to a maximum timeout if the command doesn't do so.

I meant that neither option makes it not fragile. You may not delay enough or you may not wait enough. Sure, we can increase the timeout a lot to increase the likelihood of discovery finding all peers in time, but that doesn't make it not fragile. If CI nodes are running under enough load, tests may still flake.

Anyways, even if we go down that route, we lack support in launch testing to do it, so we'd have to go back to that first.

Simply because there is a still a race since the node might become available first shortly before the services are actually advertised.

Yes, it doesn't make it not fragile either. I should've said that. I'm not well versed enough in DDS discovery mechanisms: is it atomic in any respect, e.g. are some information chunks guaranteed to arrive in a single block?

Member replied:

Sure, we can increase the timeout a lot to increase the likelihood of discovery to find all peers for free, but that doesn't make it not fragile. If CI nodes are running with enough load, tests may still flake.

I don't think that is the definition of flaky. The fact that we have to define an upper bound on how long we are willing to wait is fine.

test_ros2cli/CMakeLists.txt (resolved review thread)
@hidmic (Contributor Author) commented Jul 29, 2019

The size of the tests makes me wonder if we really can't write them more compact?

What do you mean by size? These tests leverage the commonalities in the CLI commands to have a single test_cli.py.in instance per command, parameterized N times, hence the logic in test_cli.py.in. We could find other ways to reuse code and instead have N tests. That would yield shorter tests, but the sheer size of this PR would increase due to boilerplate code duplication. Is that what you had in mind @dirk-thomas?
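For context, a rough sketch of what one of those parameterizations looks like, using the CLITestConfiguration fields that appear elsewhere in this PR (command, arguments, expected_output); the exact constructor and the sample cases below are illustrative, not taken verbatim from the patch:

```python
import re

from cli_test_configuration import CLITestConfiguration

# test_cli.py.in is templated once per command and iterates over a list like this,
# asserting on output and on graceful termination for every entry.
some_test_configurations = [
    CLITestConfiguration(
        command='msg',
        arguments=['list'],
        expected_output=[re.compile(r'.*/msg/.*')],
    ),
    CLITestConfiguration(
        command='msg',
        arguments=['show', 'std_msgs/msg/String'],
        expected_output=['string data'],
    ),
]
```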

What is the reason that all the tests are bundled in the separate package test_ros2cli?

There's currently no way to set up launch tests in pure Python packages (see ros2/launch#237).

OpaqueFunction(function=lambda context: ready_fn())
)
launch_description = LaunchDescription([
# Always restart daemon to isolate tests.
Member commented:

Maybe instead of doing this it would be simpler to add a --no-daemon option to all the commands, so they don't start the daemon on demand?

@hidmic (Contributor Author) replied:

Hmm, that is a good idea, but won't it result in test instances affecting each other through daemon state?
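For reference, a minimal sketch of the daemon-restart isolation that the launch description snippet above hints at, using plain ExecuteProcess actions; the event handlers that sequence the stop/start and the rest of the test description are omitted:

```python
from launch import LaunchDescription
from launch.actions import ExecuteProcess


def generate_test_description():
    return LaunchDescription([
        # Stop any daemon left over from a previous test and start a fresh one,
        # so that cached graph state does not leak between test instances.
        ExecuteProcess(cmd=['ros2', 'daemon', 'stop'], name='daemon_stop'),
        ExecuteProcess(cmd=['ros2', 'daemon', 'start'], name='daemon_start'),
        # ... fixture nodes, the command under test and the ready signal go here ...
    ])
```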

add_custom_cli_test(
test_ros2interface
CONFIG_MODULE ros2interface_test_configuration
TIMEOUT 240)
Member commented:

The sum of all timeouts is 31 min. Is that really necessary?

@hidmic (Contributor Author) replied Jul 29, 2019:

It's insane. I was going to bring it up in tomorrow's meeting. And yes, it used to be half of that and tests would time out in CI. By cutting down delays we may be able to reduce it, but I wouldn't expect a large reduction. Connext is very slow.

@dirk-thomas (Member) commented:

What do you mean by size?

I was wondering why TestCommandFinishesInAFiniteAmountOfTime as well as TestCommandOutput are necessary at all. I would expect that to be provided by launch_testing. Otherwise each package implementing tests would need to implement that logic.

Each actual test file is also super long and verbose. I don't see an obvious way to reduce that, though.

There's currently no way to setup launch tests in pure Python packages

I think it would be very important to make it possible to write a test function which can run a launch test. The fact that this PR moves all the tests into a separate package is a good indicator that not having that functionality results in new code being written, which immediately introduces tech debt.

@hidmic (Contributor Author) commented Jul 29, 2019

I was wondering why are TestCommandFinishesInAFiniteAmountOfTime as well as TestCommandOutput necessary at all? I would expect that to be provided by launch_testing. Otherwise each package implementing test would need to implement that logic.

If we like this way of specifying tests we may standardize it, but no, it's not provided by launch_testing out of the box.

I think it would be very important to make it possible to write a test function which can run a launch test.

I agree with that, though it's not a blocker for these tests to land.

@ivanpauno (Member) left a comment

I left some minimal comments.

@@ -37,4 +37,4 @@ def main(self, *, args):
if args.count_nodes:
print(len(node_names))
elif node_names:
print(*[n.full_name for n in node_names], sep='\n')
print(*sorted(n.full_name for n in node_names), sep='\n')
Member commented:

Why sorting them? Was the order random?

@hidmic (Contributor Author) replied:

I believe they show in discovery order, which is non-deterministic, yes.

output.extend((' ' + yaml.dump({
'partial_sequence': sequence
})).splitlines())
output.append('')
Member commented:

same


sys.path.append(os.path.dirname(__file__))

from cli_test_configuration import CLITestConfiguration # noqa
Member commented:

Prefer noqa: I100 over a wildcard noqa (I know the existing tests were doing that, and that I wrote it 😄).

Member commented:

Same in many places.

'Waiting for an action server to become available...',
'Sending goal:',
' order: {}'.format(order),
'',
Member commented:

Why?

' order: {}'.format(order),
'',
re.compile('Goal accepted with ID: [a-f0-9]+'),
'',
Member commented:

same

),
CLITestConfiguration(
command='interface',
arguments=['show', 'std_msgs/msg/String'],
Member commented:

show should be tested with an action and a srv too.

),
CLITestConfiguration(
command='msg',
arguments=['show', 'std_msgs/msg/NotAMessageType'],
Member commented:

I would add a test case doing:

arguments=['show', 'std_msgs/srv/NotAMessageType']

and another doing:

arguments=['show', 'std_msgs/action/NotAMessageType']

Actually, I'm curious if they work correctly.

Member followed up:

Well, this is my result:

ros2 msg show std_msgs/action/String
string data

... it isn't working fine.

Member followed up:

and this one is fun:

ros2 msg show std_msgs/asd/String
string data

@hidmic (Contributor Author) replied:

That's interesting.

@hidmic (Contributor Author) followed up:

@ivanpauno I can confirm that the interface introspection CLIs ignore the interface type namespace. I've commented out those tests; we should address that in a follow-up PR.

def @TEST_NAME@(self, proc_info, command_under_test, test_configuration):
"""Test that the command under test finished in a finite amount of time."""
success = proc_info.waitForShutdown(
process=command_under_test, timeout=test_configuration.timeout
Member commented:

waitForShutdown could take an optional keyword argument, e.g. assert: bool = False. That way we wouldn't need the extra if ...: assert ... here. I think it may be useful.
Note: we also have assertWaitForShutdown, but it's less versatile than an extra argument.
Note: I wouldn't delete the original waitForShutdown.
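A hedged sketch of how the call site above might collapse if such a keyword existed; the keyword is named check here, since assert is a reserved word in Python, and it does not exist in launch_testing today:

```python
def test_command_finishes(self, proc_info, command_under_test, test_configuration):
    """Same check as above, with the assertion folded into the wait call."""
    proc_info.waitForShutdown(
        process=command_under_test,
        timeout=test_configuration.timeout,
        check=True)  # hypothetical keyword; would raise AssertionError on timeout
```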

get_add_two_ints_server_action(service_name='_add_two_ints')],
expected_output=itertools.chain(
# Cope with launch internal ROS 2 node.
itertools.repeat(re.compile(r'/launch_ros/.*parameter.*'), 6),
Member commented:

We are checking for an exact number of lines here (6), of something that may easily change.
I don't know whether it's better to check that some lines are in the output, or to check for exact output.
This applies in many places, but I don't have a strong opinion about which is better.

Check-lines-in-output pro: the test needs updating less often.
Check-lines-in-output con: maybe the test should have been updated (adding extra checks) after a change elsewhere, and it wasn't.

@hidmic (Contributor Author) replied:

Check lines in output con: Maybe, the test should have been updated (adding extra checks) after a change in other place, and it wasn't.

That's the main reason why I went for a restrictive rather than a permissive test. If we're going to trust CLI release testing to these tests, I'm more comfortable being as precise as possible about how they should behave.

),
CLITestConfiguration(
command='topic',
arguments=['pub', '/my_ns/chatter', 'std_msgs/msg/String', '{data: test}'],
Member commented:

Should the test also subscribe to the topic and check that the message is received? I think that checking output is not enough in this case.

@hidmic (Contributor Author) replied Jul 31, 2019:

That's why there's a listener: to check its output. Yes, we're trusting the listener in the process...
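A minimal sketch of what that listener-side check could look like in a launch test, assuming launch_testing's assertInStdout helper and a listener fixture started by the launch description; the fixture name and expected string are placeholders:

```python
import unittest

import launch_testing.asserts


class TestTopicPubDelivery(unittest.TestCase):

    def test_listener_receives_message(self, proc_output, listener_node):
        # `ros2 topic pub /my_ns/chatter std_msgs/msg/String '{data: test}'` runs
        # alongside a listener fixture; verify the listener actually printed the data.
        launch_testing.asserts.assertInStdout(proc_output, 'test', listener_node)
```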

@ivanpauno (Member) commented:

The size of the tests makes me wonder if we really can't write them more compact? Is there anything we could do API / feature wise which would help reduce this?

I think most of the code is just test cases for parameterizing the test.
It will be difficult to make it more compact.

Instead we should find a way to write the test to work without this delay.

I agree with this point. It's probably a bit difficult to implement.

I was wondering why are TestCommandFinishesInAFiniteAmountOfTime as well as TestCommandOutput necessary at all?

I made a comment about improving waitForShutdown, so TestCommandFinishesInAFiniteAmountOfTime can get shorter.
launch_testing's methods for asserting process output could also be better; something similar to TestCommandOutput could probably be provided by launch_testing.

@hidmic (Contributor Author) commented Jul 31, 2019

It will be difficult to do it more compact.

That is true @dirk-thomas. Test parameterizations are as few and as compact as they can be; we simply have way too many combinations that need to be tested. But the test itself could use an API simplification.

@hidmic (Contributor Author) commented Oct 15, 2019

@ivanpauno @dirk-thomas rebased and ready for re-review. bf24f0d is a major refactor of this PR, so anything before that commit is likely outdated. PTAL when you have time. Splitting is also an option if you find this PR hard to grok.

@hidmic (Contributor Author) commented Oct 18, 2019

Considering how CI looks at the moment, I don't think we'll get to merge this patch today. There are test fixture executables failing on specific platforms in a variety of ways, some even segfaulting on macOS, while the nightlies are all green. That means this is going to require further investigation on each platform.

CC @mjcarroll @dirk-thomas

@ivanpauno (Member) commented:

IMO, this can be merged after the feature freeze. It's just adding tests, no features.
It would be good to merge ros2/launch#279 if it's ready, though.

@hidmic (Contributor Author) commented Oct 18, 2019

It would be good to merge ros2/launch#279 if it's ready, though.

The same applies to that one. Test failures are mostly unrelated to this PR, but come from other packages that make use of launch_testing and now fail in ways that are apparently unrelated to ros2/launch#279. But since it's the testing system itself that is under scrutiny, the only thing we can do is go over each test in the right environment.

@hidmic (Contributor Author) commented Oct 21, 2019

Alright, recent builds from master show that the segmentation faults on macOS w/ OpenSplice (http://ci.ros2.org/job/ci_osx/6892/) and the flaky pendulum control tests on Linux-aarch64 w/ OpenSplice (http://ci.ros2.org/job/ci_linux-aarch64/4368/) are already there and unrelated to this patch.

CI also experienced some issues, hence the current failed builds. I'll re-trigger after solving some other failures related to ros2cli tests on macOS.

@hidmic (Contributor Author) commented Oct 21, 2019

Rebased to resolve conflicts.

@hidmic (Contributor Author) commented Oct 21, 2019

CI (full, against Fast-RTPS, Connext and OpenSplice):

  • Linux
  • Linux-aarch64
  • macOS
  • Windows
    • (only up to ros2topic after 2feb1dd)

Commits added:

- action command
- service command
- topic command
- interface command
- node command
- msg command
- srv command

Improved test coverage for:

- ros2action
- ros2service
- ros2topic
- ros2msg
- ros2srv
- ros2interface
- ros2node
- ros2pkg

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic (Contributor Author) commented Oct 22, 2019

Alright, @ivanpauno @mjcarroll, may I ask for a final review? Test failures on Linux and macOS are all unrelated to this patch, and I had to re-trigger Windows CI because it failed for unrelated reasons (yet again). I'd like to merge this ASAP to avoid further conflicts with any new changes to ros2/ros2cli.

@ivanpauno (Member) left a comment

LGTM!

It would be good to immediately follow up on the TODOs (which are bugs unrelated to this PR that should be fixed).

@ivanpauno (Member) commented:

Please, double check that all CI failures are unrelated.

@hidmic (Contributor Author) commented Oct 22, 2019

Alright, finally green! Though that means there's some flakiness that needs to be addressed. @mjcarroll may I proceed with this PR and any follow-up ones after it?

@hidmic merged commit 309f9c5 into master on Oct 22, 2019.
Eloquent automation moved this from Reviewer approved to Done on Oct 22, 2019.
The hidmic/cli-test-coverage branch was deleted on October 22, 2019.