Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timestamps to message info #214

Merged
merged 3 commits into from
Apr 23, 2020
Merged

Add timestamps to message info #214

merged 3 commits into from
Apr 23, 2020

Conversation

iluetkeb
Copy link
Contributor

See ros2/design#259

This replaces #200 as discussed.

See ros2/design#259

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
rotu added a commit to RoverRobotics-forks/rmw_cyclonedds that referenced this pull request Apr 16, 2020
rotu added a commit to RoverRobotics-forks/rmw_cyclonedds that referenced this pull request Apr 16, 2020
rotu added a commit to RoverRobotics-forks/rmw_cyclonedds that referenced this pull request Apr 18, 2020
@rotu
Copy link

rotu commented Apr 19, 2020

BTW, I think rmw_time_t represents a duration - and it might be more appropriate to use rcutils_time_point_value_t.

rmw/include/rmw/types.h Outdated Show resolved Hide resolved
@rotu
Copy link

rotu commented Apr 20, 2020

Agree on send_response, but why should they not be applicable to take_response?

Because these are request ID timestamps (being in the rmw_request_id_t struct). If you mean timestamps for the response message in a rmw_message_info_t, I support that.

Using rmw_message_info_t would duplicate the sender ID

I have to think about duplicating the sender ID. I think the intent is to identify the service, which might or might not correspond with a publisher ID. AFAICT, nobody defined how “services” map to DDS entities, and I don’t know what sender IDs are actually used for nor how this is supposed to behave if a service has multiple servers available

Personally, I'm not a big fan of the with_info methods.

👍 maybe there’s a principled reason they exist, but I hate them too. I’d rather keep the number of RMW methods down. Leave them null or just ignore the value. (But please document if the argument is nullable).

@rotu
Copy link

rotu commented Apr 20, 2020

I’m implementing, it occurs that this breaks the time abstraction https://design.ros2.org/articles/clock_and_time.html since ROS Time is not defined at the RMW level. I’m not sure what if anything should be done about this.

@iluetkeb
Copy link
Contributor Author

I’m implementing, it occurs that this breaks the time abstraction

Yeah, I've worried a bit about that as well. It's not so easy to fix that.

However, since we're not planning on giving these timestamps to the application, it should not really matter. They are only used by the executor and there, only ordering does matter, really.

In any case, if we intend to implement the clock abstraction at some point, we should be able to do so without breaking ABI. Therefore, right now I'm more thinking about whether we need to change from rmw_time_t to rcutils_time_point_value_t.

@wjwwood wjwwood added this to In review in Foxy API Freeze Apr 20, 2020
@wjwwood wjwwood self-assigned this Apr 20, 2020
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the changes to services separated into a different pr, so we don't put the message info changes at schedule risk.

rmw/include/rmw/types.h Outdated Show resolved Hide resolved
rmw/include/rmw/types.h Outdated Show resolved Hide resolved
@rotu
Copy link

rotu commented Apr 21, 2020

so we don't put the message info changes at schedule risk

@wjwwood for Earth 2.0, let’s enable use_sim_time, okay?

@iluetkeb
Copy link
Contributor Author

iluetkeb commented Apr 21, 2020

for Earth 2.0, let’s enable use_sim_time, okay?

Can you start a design issue on that? Otherwise we probably forget. Personally I'm not convinced it's necessary, so I'm not going to drive it, but I will gladly participate in a discussion.

@wjwwood
Copy link
Member

wjwwood commented Apr 21, 2020

AFAICT, nobody defined how “services” map to DDS entities, and I don’t know what sender IDs are actually used for nor how this is supposed to behave if a service has multiple servers available

Yeah, I didn't write that part of the rmw interface, but my guess is that "writer_guid" should be "requester_guid" and it should identify the sender of the request, this might be a DDS entity guid, like a data writer, or it might be something else, but the goal is that with the writer/requester id and a sequence number that it uniquely identifies a single request.

Separately we might need a responder id so the requester can know who responded, but that's not strictly needed I think. The response does need to include the writer/requester id and the sequence number so it can ensure the response is intended for themselves and to which request the response should be delivered, in the case of concurrent requests.

I’m implementing, it occurs that this breaks the time abstraction https://design.ros2.org/articles/clock_and_time.html since ROS Time is not defined at the RMW level. I’m not sure what if anything should be done about this.

I don't think we need source and received timestamps in ROS time, I think system time is ok instead of ROS time.

However, we have to keep in mind that system time can slew and jump, and also that it has leap seconds. Perhaps there should be a third timestamp which is steady_received_timestamp which uses a monotonically increasing steady clock? You could imagine a fourth steady_source_timestamp, but that would only be useful for comparing time deltas in multiple messages as it would not be associated with a specific system time stamp, though it should be "close" to the source_timestamp.

I've long wanted to have an option in our software stack for TAI time, which is both a time point (uses unix epoch) but also is monotonically increasing (no leap seconds).

In any case, if we intend to implement the clock abstraction at some point, we should be able to do so without breaking ABI. Therefore, right now I'm more thinking about whether we need to change from rmw_time_t to rcutils_time_point_value_t.

Since these times are time points and not durations (because we're using system time), using rcutils_time_point_value_t is ok. I'm not sure why we are still using the split struct style in rmw, maybe because that's what the messages are using, or maybe I just never had time to push down my "single variable timestamp" ideas into rmw from rcl.

I would recommend doing a #define rmw_time_point_value_t rcutils_time_point_value_t and similar aliases for the related time functions, and then use rmw_time_point_value_t in the struct. Just in case we need to refactor and not use rcutils (or change the name or something) in the future.

Signed-off-by: Luetkebohle Ingo (CR/AEX3) <ingo.luetkebohle@de.bosch.com>
@iluetkeb
Copy link
Contributor Author

Since these times are time points and not durations (because we're using system time), using rcutils_time_point_value_t is ok.

Done in b5d636a

@wjwwood
Copy link
Member

wjwwood commented Apr 22, 2020

Ok, here's CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

This CI includes these pull requests:

And builds up to demo_nodes_cpp and demo_nodes_py, just to check things out quickly.

Also, it only builds with Fast-RTPS static, as the cyclone pr doesn't look to be updated with the new timestamp type.

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

I was going to try and do the message info along with the service info pr's, but there's too much missing still for the service info, e.g. connext is not working, and I cannot edit the bosch forks, so I have been spending a lot of time making my own. Also, I cannot edit the cyclone pr that @iluetkeb put forward, but there are issues with it, so I won't take it as-is either.

Instead, here is CI building and testing everything including only this pr, the associated ones above it, and the fast rtps changes:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

To be specific this includes:

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

So, maybe services with Connext is further along than I thought, but I was missing this pull request: ros2/rosidl_typesupport_connext#53

Either way, I think trying to get messages tested before nightly's run was the best bet.

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

Ok another round of CI, building and testing with --packages-up-to test_rclcpp demo_nodes_cpp demo_nodes_py, including cyclone and Connext too:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

This one now includes @iluetkeb's cyclone pr:

@iluetkeb
Copy link
Contributor Author

This has a patch failure with connext. Is that my failure?

@iluetkeb
Copy link
Contributor Author

@wjwwood This fails with a connext patch failure, but I didn't change rmw_connext for these PRs, so it shouldn't be me...

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

Not sure I’m checking.

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

I don’t think it was my ci setting or your changes. I’m raising it up the chain.

@iluetkeb
Copy link
Contributor Author

#212 seemed to have a similar issue.

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

@iluetkeb there are linter failures...

https://ci.ros2.org/job/ci_linux-aarch64/5943/testReport/junit/(root)/projectroot/flake8/
https://ci.ros2.org/job/ci_linux-aarch64/5943/testReport/junit/(root)/projectroot/uncrustify/

Are you still around to have a look, or should I?

Please try to run the tests (at least the linters) locally before pushing things. Our CI turn around is very long to catch things like this which can easily be caught locally.

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

The failure on Linux is apparently some kind of known flaky build issue. I'll just have to re-run it after we fix the linter issues.

@iluetkeb
Copy link
Contributor Author

I'm still around.

there are linter failures...

The links you provided don't contain errors... Are they the right ones?

Please try to run the tests (at least the linters) locally before pushing things.

I do, at least before asking for CI. Sometimes I forget, but I think in this case there may be something else going on.

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

No, sorry that's yet another bug in Jenkins, sometimes it links you to a different result, here you can check the arm64 linux job directly, under failing tests:

https://ci.ros2.org/job/ci_linux-aarch64/5943/

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

I do, at least before asking for CI. Sometimes I forget, but I think in this case there may be something else going on.

Of course, I just want to remind you in case you weren't. It's not obvious to everyone.

@iluetkeb
Copy link
Contributor Author

So, the rclpy style issues should be addressed now, and I also added returning 0 for received timestamp in rmw_cyclonedds.

@wjwwood
Copy link
Member

wjwwood commented Apr 23, 2020

New CI with everything, but with --packages-up-to rclpy on Linux to see that we fixed linting issues:

Build Status

Windows and macOS looked ok though.

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros2-middleware-change-proposal/15863/7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants