
Entry point, plugin system, daemon, existing tools #1

Merged
merged 11 commits
Jun 14, 2017

Conversation

dirk-thomas
Member

The PR adds some command line tools and their infrastructure. The pieces are structured in separate commits to make it easier to review:

  • The entry-point based plugin system, interfaces for commands and verbs, the cli ros2
  • Some "not advertised" commands to introspect the plugin system: extension_points, extensions
  • Using argcomplete for completion in bash
  • The interface to use rclpy nodes (either directly or via a daemon using xml-rpc for communicating)
  • Linter tests for copyrights and flake8
  • Add the command ros2 pkg with the verbs list and prefix
  • Add the command ros2 node with the verb list (similar to previous prototypes)
  • Add the command ros2 topic with the verbs echo, list, and pub (similar to previous prototypes)
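The entry-point based plugin system listed above can be sketched roughly as follows. The group name `ros2cli.command` matches this repo; the discovery helper itself is illustrative (the actual 2017 code may use `pkg_resources` rather than `importlib.metadata`):

```python
from importlib.metadata import entry_points

def load_commands(group='ros2cli.command'):
    # Collect every command class that installed packages advertise under
    # the given entry point group, mapping advertised name -> loaded class.
    try:
        eps = entry_points(group=group)  # Python 3.10+
    except TypeError:
        eps = entry_points().get(group, [])  # Python 3.8/3.9 returned a dict
    return {ep.name: ep.load() for ep in eps}

# A package contributes a command purely declaratively in its setup.py:
#
#   entry_points={
#       'ros2cli.command': [
#           'pkg = ros2pkg.command.pkg:PkgCommand',  # illustrative entry
#       ],
#   }
```

This is what lets arbitrary packages extend the `ros2` CLI without the CLI knowing about them in advance.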

The CI builds are not really saying much (besides showing that the linter tests for the new code are passing):

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Since this covers all the features provided in https://github.com/ros2/cli_tools I suggest deleting that repo after this PR has been merged and is referenced in the ros2.repos file.

@dirk-thomas dirk-thomas self-assigned this Jun 12, 2017
@dirk-thomas dirk-thomas added the in progress Actively being worked on (Kanban column) label Jun 12, 2017
@dirk-thomas dirk-thomas added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Jun 12, 2017
@dirk-thomas
Member Author

Ready for review.

@dirk-thomas dirk-thomas force-pushed the initial_features branch 4 times, most recently from c144ae8 to 043520b Compare June 13, 2017 17:19
@wjwwood wjwwood left a comment

The code looks OK to me, but I have several logistical comments throughout the PR that should be addressed or explained before I give a +1.


Also there are some design choices that I wish there had been consensus on before doing this PR. I don't think this should block this PR at all (because of the timeline for the beta), but I think it's a poor approach to development on a team (program first, discuss later). These items could have easily been settled with a design doc, no matter how minimal:

  • why use xmlrpc for the daemon?
  • why use long running daemon?
    • when does it shutdown?
  • why not use (or extend) osrf_pycommon's verb pattern (which is used by ament_tools)?
  • since you're not using osrf_pycommon's verb pattern, why extension points?
  • why not iterate on the existing cli tools repo and rename it?
    • we're losing the history from there

There seems to be a pattern of starting over, which I think is going to mean more technical debt in the long run, as we've punted the consolidation of the verb pattern further down the road. If you think there are improvements to be made, then that's fine, but imo we should be building on existing implementations where appropriate rather than continuously restarting from scratch just because it's an option.

For example, you could have iterated on osrf_pycommon.verb_pattern either on a branch or by duplicating the module as osrf_pycommon.verb_pattern_v2 or even by just copying the code into this repo. With the intention of reintegrating in the future, but without incurring the cost of doing so now (having to update ament_tools, et al). Perhaps there is a reason not to start with osrf_pycommon, but again that's lost here because there was no discussion in the first place.

This pattern of starting over is echoed by completely avoiding the existing cli_tools repo, for which I cannot come up with a technical reason and which has the consequence of discarding contributions from others on the team unnecessarily. Personally, I really feel that this PR should be applied to that repository, even if the result is just deleting those packages in that repo. At least the contributor stats would have been preserved. But again, I won't make my review of this PR contingent on that.

@@ -0,0 +1,69 @@
# Copyright 2016-2017 Dirk Thomas
Member

Why are your copyrights in here?

Member Author

Because the code was written by me in my spare time for a different project.


def main(*, script_name='ros2', argv=None, description=None, extension=None):
    if description is None:
        description = 'ros2 is an extensible command-line tool for ROS 2.'
Member

Why pass in ros2 as the script name and not use it here?

Member Author

The argument allows other packages to pass a different script_name to this function (e.g. if you want an executable called ros2topic).

and not use it here

I am not sure what you mean with this. The variable is used later.

Member

The description string has ros2 hard coded rather than using script_name.

Member Author

Updated to use script_name in the description.


from ros2cli.daemon import PORT

import xmlrpc.client
Member

Why XML-RPC? Seems like a decision that should have been specified in a design doc before implementing. Especially since the daemon was not in the critical path.

Member Author

Using XML-RPC gets us going quickly since it is very easy to use in Python.

In the future it might be replaced with any other protocol. E.g. even with DDS once we are able to connect to a service without the overhead of the discovery.
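A minimal sketch of why XML-RPC is so low effort in Python: a daemon and a client fit in a handful of standard-library lines. The RPC name `get_topic_names` and the ephemeral port are illustrative, not the PR's actual interface:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Daemon side: bind an XML-RPC server (ephemeral port here; a real daemon
# would use a well-known port) and expose one function by name.
server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
server.register_function(lambda: ['/chatter', '/rosout'], 'get_topic_names')
port = server.server_address[1]
thread = threading.Thread(target=server.handle_request, daemon=True)
thread.start()

# Command line tool side: one proxy object, one call, data comes back
# already marshalled into Python types.
client = ServerProxy('http://127.0.0.1:%d' % port)
topics = client.get_topic_names()
thread.join()
server.server_close()
```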


def spawn_daemon(args):
    subprocess.Popen(
        ['_ros2_daemon'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
Member

A long lived daemon process is another pattern that ought to have been decided in a design before implementing imo. I think it is probably the right approach, but there's no traceability to a rationale or listing of alternatives.

@dirk-thomas dirk-thomas Jun 13, 2017

From my understanding a daemon was the idea to speed up the response time.

Do you imagine the daemon to have a "max-time-to-live"?

Member

Do you imagine the daemon to have a "max-time-to-live"?

Do the tools spawn a daemon if one is not available or does the user have to spawn one?

If it is the latter case, then I wouldn't expect this function to be necessary (no need to fork).

If the tool forks one itself, then I would expect a max-time-to-live, like how it works for rti's code gen or bazel's build engine.

Member

Also, what happens if I do the following?

$ ROS_DOMAIN_ID=42 ros2 topic list
$ ROS_DOMAIN_ID=43 ros2 topic list

Member Author

Do the tools spawn a daemon if one is not available or does the user have to spawn one?

The tools spawn a daemon on demand.

I would expect a max-time-to-live, like how it works for rti's code gen or bazel's build engine.

This can certainly be implemented. What default timeout of inactivity are you proposing?

Member

This can certainly be implemented. What default timeout of inactivity are you proposing?

I don't know. I think bazel's long lived daemon runs until something like two hours after no activity.

At the moment the user has no way to shut it down without kill (or the task list on windows), or to even know if it is running. So I think it should be visible and easy to shutdown portably or it should have a ttl.

Member Author

Also, what happens if I do the following?

$ ROS_DOMAIN_ID=42 ros2 topic list
$ ROS_DOMAIN_ID=43 ros2 topic list

The daemon and the command line tools currently don't know anything about ROS_DOMAIN_ID. So this use case is currently not covered. I can certainly add this though - different ways are possible:

  • E.g. whenever a RPC is being made to the daemon the current domain id is being passed. The daemon only answers the request if the domain id matches. Otherwise it should probably shut itself down.

  • The daemon chooses a port number which depends on the domain id. That way multiple daemons would happily coexist when a user uses different domain ids from different shells.
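The second option amounts to a few lines. The base port value here is an assumption for illustration, not necessarily the constant the PR uses:

```python
import os

BASE_PORT = 11511  # hypothetical base port; the PR picks its own constant

def daemon_port(domain_id=None):
    # Derive a per-domain port so daemons for different ROS_DOMAIN_IDs can
    # coexist; each shell talks to the daemon matching its own domain id.
    if domain_id is None:
        domain_id = int(os.environ.get('ROS_DOMAIN_ID', 0))
    return BASE_PORT + domain_id
```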

Member

I was thinking more like, every request had the domain id embedded and the daemon could create a node for each domain id it encounters. And the function that detects if a daemon is available would additionally see that the daemon had a node running for the current domain id already.

Member

I just saw your other commit, port is fine too.

@@ -0,0 +1,20 @@
# Copyright 2015 Open Source Robotics Foundation, Inc.
Member

The copyright here is 2015, but other test_*.py files that also appear to be copied use 2017.

Member Author

Updated.

HIDDEN_TOPIC_PREFIX = '_'


def get_topic_names_and_types(*, node, include_hidden_topics=False):
Member

What's the purpose of having node as a keyword argument? It's required, so that doesn't seem to follow standard Python style. Having all calls to this function be (node=node, ...) seems wasteful to me.

Member Author

Since this is public API intended to be used from other code as well, it ensures that callers pass the arguments by keyword. That way, if we add additional arguments in the future, existing code is guaranteed to keep working correctly.

Member

Only new parameters need to be kwargs to avoid any issues. node should always be first, even if new parameters are added. There's no reason to make it a kwarg like this imo.

This seems like something we should do everywhere or only when required. As it stands it's in contrast to all of our existing code.

Member Author

Imo a function call like this:

func('one', True, 23)

is much less readable than:

func(package_name='foo', show_hidden=True, timeout=23)

Member

Those are literals; there is no way that get_topic_names_and_types(node, ...) is ambiguous, unless the node parameter name is bad, but that's its own problem.

I could be convinced that this would be a good thing to do for literals, but it is still the case that this code is in direct contradiction to most of our existing code. If this is the "new way forward" it needs to be documented in our developer guide after we discuss it as a team.
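For context, the signature under discussion uses Python's bare `*` to make every parameter keyword-only. A self-contained sketch (FakeNode and the filtering body are simplified stand-ins, not the repo's exact code):

```python
def get_topic_names_and_types(*, node, include_hidden_topics=False):
    # The bare '*' forces callers to pass arguments by keyword, so adding
    # parameters later can never silently shift existing positional calls.
    names_and_types = node.get_topic_names_and_types()
    if not include_hidden_topics:
        names_and_types = [
            (name, types) for name, types in names_and_types
            if not name.lstrip('/').startswith('_')]
    return names_and_types

class FakeNode:
    # Stand-in for an rclpy node, just for demonstration.
    def get_topic_names_and_types(self):
        return [('/chatter', ['std_msgs/String']),
                ('/_private', ['std_msgs/Empty'])]

topics = get_topic_names_and_types(node=FakeNode())
# get_topic_names_and_types(FakeNode())  # TypeError: positional not allowed
```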

from ros2cli.plugin_system import satisfies_version


class VerbExtension(object):
Member

Again, I think it's confusing to have lots of VerbExtension classes in different modules. A more descriptive name would be nice, like TopicVerbExtension.

Member Author

See previous comment about the namespace of the module clarifying the context / name.

# an OrderedDict.
# Inspired by:
# http://stackoverflow.com/a/16782282/7169408
def represent_ordereddict(dumper, data):
Member

This seems to be duplicated across several places too. Maybe it should be put in a common place.

Member Author

I don't see this function more than once. Can you please point to the duplicate locations?

Member

Sorry, I guess I passed over it twice while reviewing. I don't see one.

    except ValueError:
        raise RuntimeError('The passed message type is invalid')
    module = importlib.import_module(package_name + '.msg')
    msg_module = getattr(module, message_name)
Member

This code has shown up elsewhere too. Perhaps this should be factored into rclpy, a function like rclpy.get_message_module_by_name(str)?

Member Author

This could indeed be provided by a different Python package. I don't think rclpy is a good place for this. Maybe some rosidl related package. But since we currently don't have a better place the code exists here. It can be moved somewhere else in the future.

Member

Please put a TODO then to deduplicate this code.

Member Author

Sure. Done.
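The snippet under discussion amounts to a small helper like the following. The name get_message_class is hypothetical (the PR keeps the code inline with a TODO; wjwwood suggests something like rclpy.get_message_module_by_name):

```python
import importlib

def get_message_class(message_type):
    # Split 'package_name/MessageName', import the package's generated
    # .msg module, and return the message class; malformed input raises.
    try:
        package_name, message_name = message_type.split('/')
    except ValueError:
        raise RuntimeError('The passed message type is invalid')
    module = importlib.import_module(package_name + '.msg')
    return getattr(module, message_name)

# get_message_class('std_msgs/String') would return the String class in a
# ROS 2 Python environment; outside one the import above fails instead.
```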

        nonlocal shutdown
        print('Remote shutdown requested')
        shutdown = True
    server.register_function(shutdown_handler, 'system.shutdown')
Member

When is the daemon shut down? I couldn't find anywhere it is shut down from.

Member Author

The RPC shutdown is currently not being called. But it allows any client to trigger a shutdown of an already running daemon (for whatever reason).

Member

So again, when does the daemon shut down, if it is forked off ("spawned")? Or how does the user shut it down?
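The registered system.shutdown handler can be exercised from any XML-RPC client. A self-contained sketch of the round trip (the serve loop here is illustrative, not the daemon's exact code):

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def run_daemon():
    # Illustrative serve loop: keep handling requests until some client
    # calls the registered 'system.shutdown' RPC.
    server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
    shutdown = False

    def shutdown_handler():
        nonlocal shutdown
        shutdown = True
        return True

    server.register_function(shutdown_handler, 'system.shutdown')
    port = server.server_address[1]

    def loop():
        while not shutdown:
            server.handle_request()  # blocks until the next RPC arrives
        server.server_close()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return port, thread

port, thread = run_daemon()
client = ServerProxy('http://127.0.0.1:%d' % port)
result = client.system.shutdown()  # ServerProxy resolves dotted RPC names
thread.join(timeout=5)
```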

            self.node = DaemonNode(args)
        else:
            spawn_daemon(args)
            self.node = DirectNode(args)
Member

Why spawn the daemon if you're not going to use it?

Member Author

In the case where the daemon is not running yet, using the just-spawned daemon doesn't provide any benefit: using the direct node is at least as fast, and it allows the user to control e.g. the timeouts using command line arguments.

Member

Does the DirectNode always wait? If so, this is fine. The only case I was worried about was ros2 topic list returning the wrong data on the first shot because it didn't wait for enough input before returning.

See my other comment about the timeout being pretty short by default.

Member Author

The command line has a timeout option to specify how long the user wants to wait: https://github.com/ros2/ros2cli/pull/1/files#diff-0db5b41e1e3495a1cb622aa6f10bdf78R25


def spawn_daemon(args):
    subprocess.Popen(
        ['_ros2_daemon'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
Member

Also, what happens if I do the following?

$ ROS_DOMAIN_ID=42 ros2 topic list
$ ROS_DOMAIN_ID=43 ros2 topic list

@dirk-thomas
Member Author

dirk-thomas commented Jun 13, 2017

Also there are some design choices that I wish there had been consensus on before doing this pr. I don't think this should block this pr at all (because of the timeline for the beta), but I think it's a poor approach to development on a team (program first, discuss later). These items could have easily been settled with a design doc, no matter how minimal:

I absolutely agree that writing a design document beforehand would have been the better choice. But the decision to skip the design document for now (because of the goal to integrate these changes for beta 2) was made in the ROS 2 team meeting. So the "poor approach" (to cite your words) was chosen there - not by me.

why use xmlrpc for the daemon?

As mentioned inline XML-RPC was chosen for now since it is directly supported in Python and enabled the communication between the command line tool and the daemon with very little code and effort. In the future the protocol can be replaced with any alternative suitable. Even DDS if we have the ability to avoid the overhead of the distributed discovery phase.

why use long running daemon?
when does it shutdown?

The goal of the daemon is to provide faster results to queries which require being part of the DDS network. Hence the daemon needs to be running before the command line script is invoked in order to have this information already. Currently the daemon runs indefinitely. We could also make the daemon shut down automatically, e.g. after a certain period of inactivity (obviously with the downside that another command line invocation after that timeout will be slower again)?

why not use (or extend) osrf_pycommon's verb pattern (which is used by ament_tools)?

I think the code in this repo is much shorter, easier to read and maintain, and doesn't need some of the "extra" features implemented in osrf_pycommon. Therefore I didn't see the benefit of depending on that package. Maybe you can point to any significant code block in osrf_pycommon which could be used here and would reduce the code size or complexity in this repo.

since you're not using osrf_pycommon's verb pattern, why extension points?

I am not sure I understand the question. The extension points are being used to enable arbitrary packages to contribute commands and/or verbs to the command line tools.

they may seem like the obvious solution, but
they're known to be slow and make issues for tab completion: catkin/catkin_tools#297

I haven't experienced any unacceptable slowdown and tab completion seemed snappy enough when I tried it. Since the command line tools are designed to be extensible I chose the default mechanism in Python to realize that: entry points. Alternatives like "fast entry points" can always be considered if necessary and sufficiently mature / supported.

we could have considered using ament_index, which I think ought to be faster

I don't think that the ament index should be used instead of Python entry points. While the ament index is certainly very valuable for various cases I consider Python entry points to be the "native" way of doing stuff in Python. Replacing a commonly known thing with a custom concept imo increases the bar for external users / developers. Also, when installing these Python packages I would rather avoid needing an ament index (and e.g. an AMENT_PREFIX_PATH environment variable).

why not iterate on the existing cli tools repo and rename it?

The current tools are standalone scripts. I didn't see any benefit (like saving time) in reusing them. I did reuse the code snippets containing the logic though when creating this PR.

we're losing the history from there

The repo doesn't have any valuable history imo. The Python scripts have just been created from what was C++ code before (maybe plus a single commit for some parts).

There seems to be a pattern of starting over, which I think is going to mean more technical debt in the long run, as we've punted the consolidation of the verb pattern further down the road. If you think there are improvements to be made, then that's fine, but imo we should be building on existing implementations where appropriate rather than continuously restarting from scratch just because it's an option.
For example, you could have iterated on osrf_pycommon.verb_pattern either on a branch or by duplicating the module as osrf_pycommon.verb_pattern_v2 or even by just copying the code into this repo. With the intention of reintegrating in the future, but without incurring the cost of doing so now (having to update ament_tools, et al). Perhaps there is a reason not to start with osrf_pycommon, but again that's lost here because there was no discussion in the first place.

I have reused existing code (like the plugin system, the logic of the CLI tools, etc.) for exactly that reason. I don't see the value of depending on osrf_pycommon though. And regarding ament_tools, we have already concluded that we don't want to spend time developing that package any further but at some point focus on a build tool which can satisfy all current use cases (ROS 2, ROS 1, non-ROS packages).

This pattern of starting over is echoed by completely avoiding the existing cli_tools repo, for which I cannot come up with a technical reason and which has the consequence of discarding contributions from others on the team unnecessarily. Personally, I really feel that this PR should be applied to that repository, even if the result is just deleting those packages in that repo. At least the contributor stats would have been preserved. But again, I won't make my review of this PR contingent on that.

Imo reuse is beneficial if you gain efficiency from it. Just reusing an existing repo for the sake of reusing it doesn't provide any (since you mentioned removing all content before adding the new code anyway).

I am not sure how the contributor stats play in here. Are you worried that users who have committed to cli_tools will feel their contributions "disappear" when the repo is being deleted? We are talking about less than a dozen commits of history here. If that is the only concern I am happy to make a new PR to cli_tools which first wipes the repo clean and then applies the commits from this repo.

@dirk-thomas dirk-thomas force-pushed the initial_features branch 4 times, most recently from dfd174b to ae8a638 Compare June 13, 2017 23:51
@dirk-thomas
Member Author

I made the port of the forked daemon dependent on the ROS_DOMAIN_ID: https://github.com/ros2/ros2cli/pull/1/files#diff-0db5b41e1e3495a1cb622aa6f10bdf78R54

I will add a timeout to the daemon to self destruct after a certain amount of inactivity... Suggestions for a default value are welcome.
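An inactivity timeout can be layered on the XML-RPC serve loop with the server's built-in per-request timeout. This sketch uses a deliberately short value for demonstration (the review later suggests two hours as a default); the `ping` RPC and the loop structure are illustrative:

```python
import time
from xmlrpc.server import SimpleXMLRPCServer

def serve_until_idle(timeout):
    # Daemon sketch that shuts itself down after `timeout` seconds without
    # any incoming RPC; each handled request resets the idle clock.
    server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
    last_activity = time.monotonic()

    def ping():
        nonlocal last_activity
        last_activity = time.monotonic()
        return 'pong'

    server.register_function(ping, 'ping')
    while True:
        remaining = timeout - (time.monotonic() - last_activity)
        if remaining <= 0:
            break  # idle for too long: self destruct
        server.timeout = remaining  # handle_request() gives up after this
        server.handle_request()
    server.server_close()
    return time.monotonic() - last_activity

idle_for = serve_until_idle(timeout=0.2)  # short value just for the demo
```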

@wjwwood
Member

wjwwood commented Jun 14, 2017

So the "poor approach" (to cite your words) was chosen there - not by me.

You don't have to do a full design doc to communicate your intentions to the rest of the team. Take my rviz plan I posted to the mailing list. It's not a full design doc, but it hits the high points on the plan for doing the work as well as some key decisions. The purpose of which is to coordinate with others, inform them so they can collaborate if they want to, or raise concerns if they see them.

My understanding of the meeting was to implement something that was the bare minimum for the requirements since we didn't have time to do a design first, not to make key design decisions without any discussion or design doc.

As mentioned inline XML-RPC was chosen for now since it is directly supported in Python and enabled the communication between the command line tool and the daemon with very little code and effort. In the future the protocol can be replaced with any alternative suitable. Even DDS if we have the ability to avoid the overhead of the distributed discovery phase.

👍

The goal of the daemon is to provide faster results to queries which require being part of the DDS network. Hence the daemon needs to be running before the command line script is invoked in order to have this information already. Currently the daemon runs indefinitely. We could also make the daemon shut down automatically, e.g. after a certain period of inactivity (obviously with the downside that another command line invocation after that timeout will be slower again)?

That all sounds fine. I don't think it should live for ever, but also it should live long enough to make the issue of having to spin it up again not that common.

I think the code in this repo is much shorter, easier to read and maintain, and doesn't need some of the "extra" features implemented in osrf_pycommon. Therefore I didn't see the benefit of depending on that package.

Perhaps instead you could list the "extra" features that osrf_pycommon has that you don't need and why that matters? The point is that it implements exactly what you've reimplemented here. And since you were starting from scratch, all of the features were useful until you rewrote them here.

I agree that this code is easy to read and has some advantages over the cli_utils in osrf_pycommon, but these are all improvements that could have been put into osrf_pycommon as well. I don't see a reason not to do that.

Maybe you can point to any significant code block in osrf_pycommon which could be used here which would reduce the code size or complexity in this repo.

Well it handles the registration, listing, adding of arguments to parsers for verbs (all of which is done in this repo and would not need to be). It also provides utilities for sub parsers and for extracting subsets of arguments using -- as a delimiter.

Your code does more, it has a cleaner interface for registering extensions, and goes beyond verbs to commands. It also integrates with argcomplete, which is awesome. I just wish you would push that into osrf_pycommon so it could be used by catkin_tools, ament_tools, or other python command line tools we might need to make, rather than just plop it here.

Despite the fact that we want to replace catkin_tools and ament_tools with something better, there is no plan for that and I think it's a weak reason to avoid improving osrf_pycommon.

I am not sure I understand the question. The extension points are being used to enable arbitrary packages to contribute commands and/or verbs to the command line tools.

Sorry I meant entry_points.

I haven't experienced any unacceptable slowdown and tab completion seemed snappy enough when I tried it. Since the command line tools are designed to be extensible I chose the default mechanism in Python to realize that: entry points. Alternatives like "fast entry points" can always be considered if necessary and sufficiently mature / supported.

I believe it is a matter of scale. As you increase the number of entry_points and as the code which each entry_point imports becomes more complex, it gets a lot slower. It's also possible that later versions of Python have since improved the performance.

I'm fine with the decision, but again it's just one of the things that ideally would have been discussed first.

I don't think that the ament index should be used instead of Python entry points. While the ament index is certainly very valuable for various cases I consider Python entry points to be the "native" way of doing stuff in Python. Replacing a commonly known thing with a custom concept imo increases the bar for external users / developers. Also, when installing these Python packages I would rather avoid needing an ament index (and e.g. an AMENT_PREFIX_PATH environment variable).

That's fine. I agree adding the dependency is unnecessary (even though ament_flake8 is already a dependency).

This is just another thing that should be in the "alternatives" section of a design doc.

The repo doesn't have any valuable history imo.

Are you worried that users who have committed to cli_tools will feel their contributions "disappear" when the repo is being deleted?

Well, perhaps @mikaelarguedas and @Karsten1987 don't mind, but I think it would be the considerate thing to do to try and preserve their commits in the history. They spent time on these things and deserve credit for them.

I have reused existing code (like the plugin system, the logic of the CLI tools, etc.) for exactly that reason.

👍, but it would be better with commit history to show that.

I don't see the value of depending on osrf_pycommon though.

The value is in collaborating, and improving things for all of our command line tools, not just this set of tools in this one place. The benefits of collaborating are shared time on documentation and testing (for which osrf_pycommon has much more already) and fixing common bugs (rather than running into the same issues in N places).

And regarding ament_tools, we have already concluded that we don't want to spend time developing that package any further but at some point focus on a build tool which can satisfy all current use cases (ROS 2, ROS 1, non-ROS packages).

That really doesn't have anything to do with it. catkin_tools also uses it, and this other build tool you're talking about should also, ideally, make use of it (with any necessary improvements or changes to make that acceptable). This is all part of trying to arrive at common solutions for similar problems. Instead we're constantly forking away from each other. This is not the ideal collaboration we should be having.


For me, the default case should be to try and use what is already there, improving it if necessary. And only with justification should we opt to build something new that does the same thing.

Please @ros2/team agree or disagree with me here, because I think this is a core creative decision that we should all have a consensus on.

wjwwood commented Jun 14, 2017

I will add a timeout to the daemon to self destruct after a certain amount of inactivity... Suggestions for a default value are welcome.

Thanks, two hours?

@dirk-thomas

You don't have to do a full design doc to communicate your intentions to the rest of the team. Take my rviz plan I posted to the mailing list. It's not a full design doc, but it hits the high points on the plan for doing the work as well as some key decisions. The purpose of which is to coordinate with others, inform them so they can collaborate if they want to, or raise concerns if they see them.
My understanding of the meeting was to implement something that was the bare minimum for the requirements since we didn't have time to do a design first, not to make key design decisions without any discussion or design doc.

As mentioned before, that decision was not made by me. I also want to mention that sharing more information before the implementation phase is valuable - especially for larger development efforts. But in this case the work only took a couple of days, so it is a balance of efficiency and overhead. Also, nothing is set in stone and everything can be iterated on.

Perhaps instead you could list the "extra" features that osrf_pycommon has that you don't need and why that matters?

I tend to approach this the other way around. Instead of arguing about what a possible dependency has which I don't need, I ask myself how much using the dependency would save me (tested code, reduction of code, etc.). Since I don't "see" the benefit I decided not to use it. We might have different perspectives on how much benefit the extra dependency would provide. Without a patch showing the potential difference it is difficult to objectively judge the pros and cons. And as mentioned before, I already had a code base which provided the functionality and therefore I reused that.

I believe it is a matter of scale. As you increase the number of entry_points and as the code which each entry_point imports becomes more complex, it gets a lot slower. It's also possible that later versions of Python have since improved the performance.
I'm fine with the decision, but again it's just one of the things that ideally would have been discussed first.

Something like using entry points is in my opinion an implementation detail. I didn't even consider that using the one standard mechanism in Python for implementing a plugin system would result in any discussion. If there is evidence that entry points are not suitable for this use case they can easily be replaced with something better.

@dirk-thomas

I will add a timeout to the daemon to self destruct after a certain amount of inactivity... Suggestions for a default value are welcome.

Thanks, two hours?

I added a default timeout which shuts down the daemon after 2h of inactivity, see https://github.com/ros2/ros2cli/pull/1/files#diff-0db5b41e1e3495a1cb622aa6f10bdf78R23
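The self-destruct behavior could look roughly like the following minimal sketch of an XML-RPC daemon that exits after a stretch of inactivity; the `ping` method and the polling structure are assumptions for illustration, not the actual ros2cli daemon code:

```python
# Illustrative sketch of a daemon that exits after a period of inactivity.
# The 'ping' method and the polling loop are assumptions, not the actual
# ros2cli daemon implementation.
import time
from xmlrpc.server import SimpleXMLRPCServer

DEFAULT_INACTIVITY_TIMEOUT = 2 * 60 * 60  # shut down after 2h of inactivity


def serve_until_idle(timeout=DEFAULT_INACTIVITY_TIMEOUT):
    server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
    server.timeout = 1.0  # handle_request() returns after 1s even when idle

    last_activity = time.monotonic()

    def ping():
        nonlocal last_activity
        last_activity = time.monotonic()  # any request resets the clock
        return 'pong'

    server.register_function(ping)
    while time.monotonic() - last_activity < timeout:
        server.handle_request()
    server.server_close()
```

Each serviced request pushes the deadline out; once no request arrives for the full timeout window, the loop falls through and the process can exit cleanly.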

@wjwwood wjwwood left a comment

lgtm

There are some lingering comments about the daemon, but I think it's ok or only needs minor adjustment.

self.node = DaemonNode(args)
else:
spawn_daemon(args)
self.node = DirectNode(args)
Member

Does the DirectNode always wait? If so, this is fine. The only case I was worried about was ros2 topic list returning the wrong data on the first shot because it didn't wait for enough input before returning.

See my other comment about the timeout being pretty short by default.

# TODO(mikaelarguedas) revisit this once it's specified
HIDDEN_NODE_PREFIX = '_'

DEFAULT_TIMEOUT = 0.1
Member

This doesn't seem very long, why did you arrive at this value?

Member Author

The previous incarnation of the tool used 0.01s (see https://github.com/ros2/cli_tools/blob/de404fdf6fc18deedf247c2ab2cabff4f532731f/cli_tools_py/rosnode_list.py#L27). When I tested the command line tools I sometimes didn't get the complete result with that short timeout and therefore increased it. With 0.1s I got complete results for all of my tests. That certainly depends on the network topology and the size of the graph. If you think we should use a different default value please propose one.

Member

I don't have any less arbitrary number in mind. 0.5, 1.0 seconds? I guess we can just wait and see. Someone should run for a while without the daemon to see what number may be most appropriate with a realistic system, like the turtlebot demo.

Member

I didn't benchmark it, but I often used more than a second on the not-very-beefy ARM boards to be sure to have all topics. It will also vary a lot according to the nature of the system, so I'm fine with whatever value as long as it's easy to override.

Member Author

I updated the default timeout to 0.5s.
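Since the appropriate value depends on the system, exposing the timeout as a command line option keeps it easy to override; a hedged sketch of what that could look like (the option name `--spin-time` is an assumption, not necessarily the real flag):

```python
# Hedged sketch of exposing the discovery timeout on the command line so
# users on slower networks can raise it. The option name '--spin-time'
# is an assumption, not necessarily the real flag.
import argparse


def build_parser(default_timeout=0.5):
    parser = argparse.ArgumentParser(prog='ros2 node list')
    parser.add_argument(
        '--spin-time', type=float, default=default_timeout,
        help='Seconds to spin discovering the graph (default: %(default)s)')
    return parser
```

With something like this, a user on a slow network could simply run the verb with `--spin-time 2.0` to wait longer before the graph is reported.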

wjwwood commented Jun 14, 2017

argcomplete needs to be added to the installation instructions and CI I think. When I first tried this out I got:

% ros2 --help
Traceback (most recent call last):
  File "/Users/william/ros2_ws/install/bin/ros2", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3036, in <module>
    @_call_aside
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3020, in _call_aside
    f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3049, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 654, in _build_master
    ws.require(__requires__)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 968, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 854, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'argcomplete' distribution was not found and is required by ros2cli

dirk-thomas commented Jun 14, 2017

argcomplete needs to be added to the installation instructions and CI I think.

Should I make argcomplete a "hard" dependency or remove it from the setup.py and make sure that it is used only when available (like here https://github.com/ros2/ros2cli/pull/1/files#diff-67470fc4e48a44a62467192fc8be262bR47)? I would prefer the latter...

wjwwood commented Jun 14, 2017

I don't mind either way. I don't have a problem making it a hard dependency so long as there is a new enough apt package, or you're willing to make it optional just for the deb.

@dirk-thomas

I will go with making it optional then. If the user doesn't have argcomplete there is simply no completion (gracefully falling back, of course). Once argcomplete is available the tools will provide completion. This will take a little bit...
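The graceful fallback described here is a common pattern; a minimal illustration (not the exact ros2cli code):

```python
# Sketch of the graceful fallback: completion is enabled only when
# argcomplete is importable; without it the CLI still works normally.
import argparse

try:
    import argcomplete
except ImportError:
    argcomplete = None


def main(argv=None):
    parser = argparse.ArgumentParser(prog='ros2')
    parser.add_argument('command', nargs='?')
    if argcomplete is not None:
        # No-op unless invoked by the bash completion machinery.
        argcomplete.autocomplete(parser)
    args = parser.parse_args(argv)
    return args.command
```

Since `argcomplete.autocomplete()` returns immediately when the shell completion environment variables are absent, the only user-visible difference without the package is the missing tab completion.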

dirk-thomas commented Jun 14, 2017

@wjwwood There is something wrong with the last commit: 917fca3. Did you commit something else before which was overwritten?

wjwwood commented Jun 14, 2017

@dirk-thomas sorry, I meant for just the gitignore.

Want me to force push, or make a new commit?

wjwwood commented Jun 14, 2017

I'm seeing some strange behavior on my machine with respect to the daemon. (that's the debugging stuff I accidentally committed to the .gitignore)

@dirk-thomas

Make a new commit for now since I already committed additional stuff. I will squash / rebase before the merge.

wjwwood commented Jun 14, 2017

I think the check for the daemon is not working on my machine. I ran the daemon in a separate terminal, added a suffix to the node name when the daemon creates it, and added a print message when the tool tries to spawn a daemon. I get this:

william@farl ~/ros2_ws (git:master:f5c69a4)
% time ros2 node list -a
spawning daemon
_ros2cli_node
_ros2cli_node_daemon
_ros2cli_node_daemon
ros2 node list -a  0.40s user 0.06s system 47% cpu 0.973 total

william@farl ~/ros2_ws (git:master:f5c69a4)
% time ros2 node list -a
spawning daemon
_ros2cli_node
_ros2cli_node_daemon
_ros2cli_node_daemon
ros2 node list -a  0.39s user 0.05s system 46% cpu 0.953 total

william@farl ~/ros2_ws (git:master:f5c69a4)
% time ros2 node list -a
spawning daemon
_ros2cli_node
_ros2cli_node_daemon
_ros2cli_node_daemon
ros2 node list -a  0.40s user 0.06s system 47% cpu 0.968 total

william@farl ~/ros2_ws (git:master:f5c69a4)
% time ros2 node list -a
spawning daemon
_ros2cli_node
_ros2cli_node_daemon
_ros2cli_node_daemon
ros2 node list -a  0.38s user 0.06s system 46% cpu 0.948 total

I haven't looked into why the daemon is not properly detected.
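A daemon liveness check over XML-RPC generally amounts to attempting a cheap call and treating connection errors as "not running", spawning the daemon only on failure; a rough sketch, with the URL and port as assumptions:

```python
# Rough sketch of a daemon liveness check: connect over XML-RPC and make
# a cheap call; spawn the daemon only when that fails. The URL and port
# are assumptions for illustration.
import xmlrpc.client


def daemon_is_running(url='http://127.0.0.1:11511'):
    proxy = xmlrpc.client.ServerProxy(url)
    try:
        proxy.system.listMethods()  # any successful call proves liveness
        return True
    except OSError:
        return False
```

A subtle failure mode with this kind of check is a mismatch between the address the daemon binds and the one the client probes (or a per-domain port offset), which would make every invocation conclude the daemon is absent and try to spawn a new one.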

wjwwood commented Jun 14, 2017

That seems to have fixed it.

@dirk-thomas

I passed the domain id explicitly to the daemon (efb834de7aca25462d0c263fa4ba9ea6f29c154e). That makes it more visible (e.g. in ps) which domain id the daemon belongs to.

After merging in #2 which enables passing a node name suffix I appended the pid to the direct node name / the domain id to the node name in the daemon (43028ff4441b325229a1cd1f09d979fc1922d803).

@dirk-thomas

This should be ready for re-re-review 😉

Once I have a 👍 for merging I will rebase some of the commits to keep a clean history.

@dirk-thomas dirk-thomas merged commit 9fce6a0 into master Jun 14, 2017
@dirk-thomas dirk-thomas deleted the initial_features branch June 14, 2017 21:10
@dirk-thomas dirk-thomas removed the in review Waiting for review (Kanban column) label Jun 14, 2017
esteve pushed a commit to esteve/ros2cli that referenced this pull request Dec 16, 2022
…ros2#1)

Signed-off-by: William Woodall <william@osrfoundation.org>