Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topic fix rcl lifecycle test issue #715

Merged

Conversation

Barry-Xu-2018
Copy link
Contributor

Related issue is #713

Copy link
Contributor

@Karsten1987 Karsten1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could tackle the two issues reported in #713 individually. That makes reviewing/merging a tick faster.

I would recommend having this PR tackle the rcutils_reset_error findings and a follow up PR with the memory leaks.

@@ -161,17 +161,21 @@ rcl_lifecycle_transition_fini(

rcl_ret_t ret = RCL_RET_OK;

if (rcl_lifecycle_state_fini(transition->start, allocator) != RCL_RET_OK) {
ret = RCL_RET_ERROR;
if (transition->start) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that needed? Calling fini/deallocate on a nullptr should be totally fine.

Copy link
Contributor Author

@Barry-Xu-2018 Barry-Xu-2018 Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Behavior is correct.
But I think it is better for rcl_lifecycle_transition_fini() to take charge of checking its members transition->start instead of depending on other function. ( Of course, this is not good to be included this commit)

Like below codes. It doesn't depend on rcl_yaml_node_struct_fini() checking.

rcl/rcl/src/rcl/arguments.c

Lines 626 to 630 in 944068f

// Drop parameter overrides if none was found.
if (0U == args_impl->parameter_overrides->num_nodes) {
rcl_yaml_node_struct_fini(args_impl->parameter_overrides);
args_impl->parameter_overrides = NULL;
}

@@ -326,6 +332,7 @@ TEST(TestRclLifecycle, state_machine) {
// Node is null
ret = rcl_lifecycle_state_machine_fini(&state_machine, nullptr, &allocator);
EXPECT_EQ(ret, RCL_RET_ERROR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect to see << rcl_get_error_string().str here as well.

Copy link
Contributor Author

@Barry-Xu-2018 Barry-Xu-2018 Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXPECT_EQ(ret, RCL_RET_ERROR) << rcl_get_error_string().str;

It means rcl_get_error_string().str is output if ret != RCL_RET_ERROR.
That is, ret is RCL_RET_OK. There is no error message.
So I think we need not to add <<rcl_get_error_string().str.
What do you think ?

rcl_lifecycle/test/test_rcl_lifecycle.cpp Show resolved Hide resolved
rcl_lifecycle/test/test_rcl_lifecycle.cpp Show resolved Hide resolved
@Barry-Xu-2018
Copy link
Contributor Author

It would be great if you could tackle the two issues reported in #713 individually. That makes reviewing/merging a tick faster.

I would recommend having this PR tackle the rcutils_reset_error findings and a follow up PR with the memory leaks.

Since rcutils_reset_error() should be called as your said, I update title of #713. #713 is for one issue on memory leaks since missing call *_fini().

@Barry-Xu-2018 Barry-Xu-2018 force-pushed the topic-fix-rcl-lifecycle-test-issue branch from e5162a4 to 73de356 Compare July 15, 2020 09:47
@Barry-Xu-2018
Copy link
Contributor Author

@Karsten1987

Please check again.
Since change is a little big, I use 'force-pushed'.

@Barry-Xu-2018 Barry-Xu-2018 force-pushed the topic-fix-rcl-lifecycle-test-issue branch from 73de356 to 7718ecb Compare July 16, 2020 01:27
@Karsten1987
Copy link
Contributor

A couple of things:

When running this, I still see the following output:

1: [ RUN      ] TestDefaultStateMachine.default_init
1: 
1: >>> [rcutils|error_handling.c:108] rcutils_set_error_state()
1: This error state is being overwritten:
1: 
1:   'invalid allocator, at /home/karsten/ros2_ws/src/ros2/rcl/rcl_lifecycle/src/transition_map.c:92'
1: 
1: with this new error message:
1: 
1:   'can't free transition map, no allocator given
1: , at /home/karsten/ros2_ws/src/ros2/rcl/rcl_lifecycle/src/transition_map.c:57'
1: 
1: rcutils_reset_error() should be called after error handling to avoid this.
1: <<<
1: [       OK ] TestDefaultStateMachine.default_init (4 ms)

Is there a chance you could address these as well? Looks like there's another rcutils_reset_error missing.

But then further, when run with valgrind I see that this test is leaking around 360 bytes on Linux:

==5764== LEAK SUMMARY:
==5764==    definitely lost: 360 bytes in 5 blocks
==5764==    indirectly lost: 264 bytes in 3 blocks
==5764==      possibly lost: 0 bytes in 0 blocks
==5764==    still reachable: 26,347 bytes in 79 blocks
==5764==         suppressed: 0 bytes in 0 blocks

Granted, looking at the output, I can't really claim that all of them were introduced by this patch. But when running the same test on master I see only 284 bytes lost. That we leak memory is a problem by itself, but I would like you to verify that this patch indeed introduced another memory leak.

@Barry-Xu-2018
Copy link
Contributor Author

A couple of things:

When running this, I still see the following output:

1: [ RUN      ] TestDefaultStateMachine.default_init
1: 
1: >>> [rcutils|error_handling.c:108] rcutils_set_error_state()
1: This error state is being overwritten:
1: 
1:   'invalid allocator, at /home/karsten/ros2_ws/src/ros2/rcl/rcl_lifecycle/src/transition_map.c:92'
1: 
1: with this new error message:
1: 
1:   'can't free transition map, no allocator given
1: , at /home/karsten/ros2_ws/src/ros2/rcl/rcl_lifecycle/src/transition_map.c:57'
1: 
1: rcutils_reset_error() should be called after error handling to avoid this.
1: <<<
1: [       OK ] TestDefaultStateMachine.default_init (4 ms)

Is there a chance you could address these as well? Looks like there's another rcutils_reset_error missing.

About the cause of this problem, please look at #713 (comment).
I provide one solution. Please look at #713 (comment).
Do you think this fixing is suitable for this commit ?

But then further, when run with valgrind I see that this test is leaking around 360 bytes on Linux:

==5764== LEAK SUMMARY:
==5764==    definitely lost: 360 bytes in 5 blocks
==5764==    indirectly lost: 264 bytes in 3 blocks
==5764==      possibly lost: 0 bytes in 0 blocks
==5764==    still reachable: 26,347 bytes in 79 blocks
==5764==         suppressed: 0 bytes in 0 blocks

Granted, looking at the output, I can't really claim that all of them were introduced by this patch. But when running the same test on master I see only 284 bytes lost. That we leak memory is a problem by itself, but I would like you to verify that this patch indeed introduced another memory leak.

Thanks for your information.
OK. I will investigate it.

@Barry-Xu-2018
Copy link
Contributor Author

@Karsten1987

In my environment, I execute valgrind before and apply this change.
Besides, I find the result is unstable. So I execute 5 times for each state.

  • Before (commit: 944068f)

    root@81614b33588d:~/ros2_rolling# tail 1.log
    ==1184== LEAK SUMMARY:
    ==1184==    definitely lost: 700 bytes in 14 blocks
    ==1184==    indirectly lost: 846 bytes in 20 blocks
    ==1184==      possibly lost: 133,920 bytes in 124 blocks
    ==1184==    still reachable: 2,567,391 bytes in 1,905 blocks
    ==1184==         suppressed: 0 bytes in 0 blocks
    ==1184== Rerun with --leak-check=full to see details of leaked memory
    ==1184== 
    ==1184== For lists of detected and suppressed errors, rerun with: -s
    ==1184== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
    root@81614b33588d:~/ros2_rolling# tail 2.log
    ==1198== LEAK SUMMARY:
    ==1198==    definitely lost: 628 bytes in 11 blocks
    ==1198==    indirectly lost: 582 bytes in 17 blocks
    ==1198==      possibly lost: 133,920 bytes in 124 blocks
    ==1198==    still reachable: 2,567,304 bytes in 1,905 blocks
    ==1198==         suppressed: 0 bytes in 0 blocks
    ==1198== Rerun with --leak-check=full to see details of leaked memory
    ==1198== 
    ==1198== For lists of detected and suppressed errors, rerun with: -s
    ==1198== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
    root@81614b33588d:~/ros2_rolling# tail 3.log
    ==1211== LEAK SUMMARY:
    ==1211==    definitely lost: 700 bytes in 14 blocks
    ==1211==    indirectly lost: 846 bytes in 20 blocks
    ==1211==      possibly lost: 115,928 bytes in 117 blocks
    ==1211==    still reachable: 2,564,433 bytes in 1,873 blocks
    ==1211==         suppressed: 0 bytes in 0 blocks
    ==1211== Rerun with --leak-check=full to see details of leaked memory
    ==1211== 
    ==1211== For lists of detected and suppressed errors, rerun with: -s
    ==1211== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
    root@81614b33588d:~/ros2_rolling# tail 4.log
    ==1240== LEAK SUMMARY:
    ==1240==    definitely lost: 676 bytes in 13 blocks
    ==1240==    indirectly lost: 758 bytes in 19 blocks
    ==1240==      possibly lost: 137,280 bytes in 129 blocks
    ==1240==    still reachable: 2,567,369 bytes in 1,902 blocks
    ==1240==         suppressed: 0 bytes in 0 blocks
    ==1240== Rerun with --leak-check=full to see details of leaked memory
    ==1240== 
    ==1240== For lists of detected and suppressed errors, rerun with: -s
    ==1240== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
    root@81614b33588d:~/ros2_rolling# tail 5.log
    ==1226== LEAK SUMMARY:
    ==1226==    definitely lost: 748 bytes in 16 blocks
    ==1226==    indirectly lost: 1,022 bytes in 22 blocks
    ==1226==      possibly lost: 135,968 bytes in 128 blocks
    ==1226==    still reachable: 2,575,836 bytes in 1,927 blocks
    ==1226==         suppressed: 0 bytes in 0 blocks
    ==1226== Rerun with --leak-check=full to see details of leaked memory
    ==1226== 
    ==1226== For lists of detected and suppressed errors, rerun with: -s
    ==1226== ERROR SUMMARY: 7 errors from 3 contexts (suppressed: 0 from 0)

    Definitely lost: 628~748
    Indirectly lost: 582~1022

  • After (apply this change)

    root@81614b33588d:~/ros2_rolling# tail 1.log 
    ==1617== LEAK SUMMARY:
    ==1617==    definitely lost: 360 bytes in 5 blocks
    ==1617==    indirectly lost: 264 bytes in 3 blocks
    ==1617==      possibly lost: 0 bytes in 0 blocks
    ==1617==    still reachable: 23,073 bytes in 70 blocks
    ==1617==         suppressed: 0 bytes in 0 blocks
    ==1617== Rerun with --leak-check=full to see details of leaked memory
    ==1617== 
    ==1617== For lists of detected and suppressed errors, rerun with: -s
    ==1617== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    root@81614b33588d:~/ros2_rolling# tail 2.log 
    ==1630== LEAK SUMMARY:
    ==1630==    definitely lost: 360 bytes in 5 blocks
    ==1630==    indirectly lost: 264 bytes in 3 blocks
    ==1630==      possibly lost: 0 bytes in 0 blocks
    ==1630==    still reachable: 23,073 bytes in 70 blocks
    ==1630==         suppressed: 0 bytes in 0 blocks
    ==1630== Rerun with --leak-check=full to see details of leaked memory
    ==1630== 
    ==1630== For lists of detected and suppressed errors, rerun with: -s
    ==1630== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    root@81614b33588d:~/ros2_rolling# tail 3.log 
    ==1643== LEAK SUMMARY:
    ==1643==    definitely lost: 432 bytes in 8 blocks
    ==1643==    indirectly lost: 528 bytes in 6 blocks
    ==1643==      possibly lost: 0 bytes in 0 blocks
    ==1643==    still reachable: 23,073 bytes in 70 blocks
    ==1643==         suppressed: 0 bytes in 0 blocks
    ==1643== Rerun with --leak-check=full to see details of leaked memory
    ==1643== 
    ==1643== For lists of detected and suppressed errors, rerun with: -s
    ==1643== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    root@81614b33588d:~/ros2_rolling# tail 4.log 
    ==1657== LEAK SUMMARY:
    ==1657==    definitely lost: 408 bytes in 7 blocks
    ==1657==    indirectly lost: 440 bytes in 5 blocks
    ==1657==      possibly lost: 0 bytes in 0 blocks
    ==1657==    still reachable: 23,073 bytes in 70 blocks
    ==1657==         suppressed: 0 bytes in 0 blocks
    ==1657== Rerun with --leak-check=full to see details of leaked memory
    ==1657== 
    ==1657== For lists of detected and suppressed errors, rerun with: -s
    ==1657== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    root@81614b33588d:~/ros2_rolling# tail 5.log 
    ==1670== LEAK SUMMARY:
    ==1670==    definitely lost: 384 bytes in 6 blocks
    ==1670==    indirectly lost: 352 bytes in 4 blocks
    ==1670==      possibly lost: 0 bytes in 0 blocks
    ==1670==    still reachable: 23,073 bytes in 70 blocks
    ==1670==         suppressed: 0 bytes in 0 blocks
    ==1670== Rerun with --leak-check=full to see details of leaked memory
    ==1670== 
    ==1670== For lists of detected and suppressed errors, rerun with: -s
    ==1670== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

    Definitely lost: 360~432
    Indirectly lost: 264~528

Based on above test result, memory leak is decreased obviously.
I don't know why there is big difference on original checking.
About the remaining memory leak issue, I think it is better to create another issue to further check.
What do you think ?

@Karsten1987
Copy link
Contributor

Karsten1987 commented Jul 20, 2020

@Barry-Xu-2018 thanks for verifying. I think it makes sense what you propose, to open up a separate issue for the memory leaks.

As for the prints (overwriting the rcutils error), I think it makes sense to apply your changes here as well. It's best to address that issue completely here within the same PR. Also having it in a PR allows me to review it more easily.

@Barry-Xu-2018
Copy link
Contributor Author

@Karsten1987

I think it makes sense what you propose, to open up a separate issue for the memory leaks.

Thank you.

As for the prints (overwriting the rcutils error), I think it makes sense to apply your changes here as well. It's best to address that issue completely here within the same PR. Also having it in a PR allows me to review it more easily.

OK. I submit new commit 871601e to this PR.
Please review it.

Comment on lines 663 to 664
char * init_error_string = "";
char * fini_error_string = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider renaming this to init_error_message and fini_error_message to stay consistent with fail_error_message.
Also I would recommend to add a bit of a docstring on top of each of the three error messages to get an idea on what's going on here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why do we need 3 strings here?
I would assume 2 should be sufficient.

  • init_error_message: The original cause or error why we land in fail:
  • fini_error_message: A conglomerate of failures happening within fail:.

fini_error = rcl_get_error_string().str;
fini_error_string = rcutils_strdup(rcl_get_error_string().str, default_allocator);
if (fini_error_string != NULL) {
fini_error = rcl_get_error_string().str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stay consistent with the logic above

Suggested change
fini_error = rcl_get_error_string().str;
fini_error = fini_error_string;

Copy link
Contributor Author

@Barry-Xu-2018 Barry-Xu-2018 Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.
But as above your suggestion, we will only use 2 variables.
So need not to this assignment-action.

@@ -659,6 +660,9 @@ rcl_lifecycle_init_default_state_machine(
rcl_ret_t ret = RCL_RET_ERROR;
// Used for concatenating error messages in the fail: block.
const char * fail_error_message = "";
char * init_error_string = "";
char * fini_error_string = "";
rcl_allocator_t default_allocator;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not initializing it directly? That would say it later on.

Suggested change
rcl_allocator_t default_allocator;
rcl_allocator_t default_allocator = rcl_get_default_allocator();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_allocator is only used in failure case.
I think it happens seldom.
So need not call rcl_get_default_allocator() each time.
What do you think ?

Comment on lines 706 to 708
if (init_error_string != NULL) {
fail_error_message = init_error_string;
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary. If init_error_string is null, then so be it. Only set the fini_error_message to the strdup failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If init_error_string is null, then so be it. Only set the fini_error_message to the strdup failure.

Sorry. I don't understand your comments.
Your comments describe what to do for else code.

    if (init_error_string != NULL) {
      fail_error_message = init_error_string;
    } else {
      fail_error_message = "Error while duplicating strings by rcutils_strdup() !";
    }

}

if (rcl_lifecycle_transition_map_fini(&state_machine->transition_map, allocator) != RCL_RET_OK) {
const char * fini_error = "";
if (rcl_error_is_set()) {
fini_error = rcl_get_error_string().str;
fini_error_string = rcutils_strdup(rcl_get_error_string().str, default_allocator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this should not be a strdup but rather a snprintf concatenating two error messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The below code has concatenating two error messages. So not do this here.

RCL_SET_ERROR_MSG_WITH_FORMAT_STRING(
"Freeing transition map failed while handling a previous error. Leaking memory!"
"\nOriginal error:\n\t%s\nError encountered in rcl_lifecycle_transition_map_fini():\n\t%s\n",
fail_error_message, fini_error);

@Barry-Xu-2018 Barry-Xu-2018 force-pushed the topic-fix-rcl-lifecycle-test-issue branch from f6dfe65 to cbb0132 Compare July 22, 2020 08:22
…_lifecycle

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
@Barry-Xu-2018 Barry-Xu-2018 force-pushed the topic-fix-rcl-lifecycle-test-issue branch from cbb0132 to e861e2f Compare July 22, 2020 09:01
@Barry-Xu-2018
Copy link
Contributor Author

@Karsten1987

I rebase branch.
Based on your review comments, please look at commit e861e2f.

@Karsten1987
Copy link
Contributor

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Copy link
Contributor

@Karsten1987 Karsten1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating with me over this.

@Karsten1987 Karsten1987 merged commit edf7cdf into ros2:master Jul 22, 2020
@Barry-Xu-2018
Copy link
Contributor Author

@Karsten1987

Thank you.

sloretz pushed a commit that referenced this pull request Sep 9, 2020
* Fix missing call fini() for lifecycle_transition and node in test_rcl_lifecycle

Signed-off-by: Barry Xu <barry.xu@sony.com>

* Fix error overwritten while allocator is Nullptr.

Signed-off-by: Barry Xu <barry.xu@sony.com>

* Optimize used variables

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>
sloretz added a commit that referenced this pull request Sep 9, 2020
* Fix missing call fini() for lifecycle_transition and node in test_rcl_lifecycle

Signed-off-by: Barry Xu <barry.xu@sony.com>

* Fix error overwritten while allocator is Nullptr.

Signed-off-by: Barry Xu <barry.xu@sony.com>

* Optimize used variables

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>

Co-authored-by: Barry Xu <barry.xu@sony.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants