Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error handling when copying C sequence messages #671

Merged
merged 2 commits into from
Apr 1, 2022

Conversation

hidmic
Copy link
Contributor

@hidmic hidmic commented Mar 21, 2022

This is an issue I spotted while reviewing #669. We shouldn't hold reallocated (i.e. invalidated) data pointers if any new sequence item initialization fails. It's quite hard to run into this problem (e.g. RAM exhaustion), but it is theoretically possible.

CI up to rosidl_generator_c and rosidl_runtime_c:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Do not hold reallocated (i.e. invalidated) data pointer
if any new sequence item initialization fails

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic hidmic added the bug Something isn't working label Mar 21, 2022
@hidmic hidmic requested a review from sloretz March 22, 2022 11:07
@clalancette
Copy link
Contributor

@ros-pull-request-builder retest this please

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a couple of comments on how to clean things up here. Let me know what you think.

return false;
}
}
output->data = data;
output->capacity = input->size;
}
output->size = input->size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't directly related to what you are fixing here, but should we move this after the loop below that copies the individual fields? That way if one of them fails, we aren't updating output->size erroneously (this is probably never a problem in practice, but I still thought I would point it out).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clalancette even more so, we should probably be incrementing it as we copy.

}
allocator.deallocate(data, allocator.state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree with the error handling here. Let's assume for a minute that we started copying the elements here one at a time, and one of them failed, so this method returned false. In that case, the most reasonable thing for a caller of these functions to do is to call the __fini method on this array. The problem with that sequence, though, is that we've already freed up some of the elements here, and the __fini method will try to free those elements again, leading to a double-free. What I think we should do here is probably set output->size to 0 on error, and then have the __fini method free only up to output->size; that way we won't double-free things. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, the most reasonable thing for a caller of these functions to do is to call the __fini method on this array.

fini the input or the output?

What I think we should do here is probably set output->size to 0 on error

I think this loop is only finalizing the things beyond output's capacity before copy was called, so if size was set to 0 then all of output needed to be finalized in this function too.

and then have the __fini method free only up to output->size; that way we won't double-free things. What do you think?

It looks like all elements up to capacity can be initialized at the moment. Say there are two arrays foo and bar. foo has size and capacity 5 and bar has size and capacity 7. Say foo is copied into bar. Because bar has excess capacity, the whole if(output->capacity < input->size) block is skipped. foo's 5 elements get copied into bar and bar now has size 5 and capacity 7. If fini only free's up to output->size then the last two elements in bar will be leaked because copy didn't free them.

Finalizing up to size seems like it could be made to work, but everything needs to make sure only elements up to size are initialized. There would need to be another loop in copy to fini output's elements that go beyond the new size when input->size < output->size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this loop is only finalizing the things beyond output's capacity before copy was called, so if size was set to 0 then all of output needed to be finalized in this function too.

That's correct.

The problem with that sequence, though, is that we've already freed up some of the elements here, and the __fini method will try to free those elements again, leading to a double-free.

A double-free isn't a possible failure mode, as message finalization functions are typically idempotent. We're not consistent about that across core packages though.

What I think we should do here is probably set output->size to 0 on error, and then have the __fini method free only up to output->size; that way we won't double-free things.

Finalizing up to size seems like it could be made to work, but everything needs to make sure only elements up to size are initialized.

__fini functions operate under the assumption that elements up to capacity will have been default initialized. __copy functions simply follow suit. We can revisit that assumption though, to allocate up to capacity and initialize up to size instead.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hidmic was kind enough to work through this with me, and I now understand a lot more about what we are doing here. In short, if the realloc on line 469 succeeds, we must assign it to output->data, because that memory may have been moved and now the original pointer is bogus. So that explains the first line of the patch.

The rest falls out of that. __fini expects that up to ->capacity is valid memory that has been default initialized. If we failed to initialize one of the elements during the copy, then we roll back the new elements we've copied, leaving the existing ones alone. We also do not update ->capacity. That fulfills __fini's requirements (we end up wasting the additional memory that we just allocated, but we don't lose it or leak it).

So with all of that said, I think this refactor is the correct thing. The only thing I'll request is a small comment somewhat explaining the situation, since it is not obvious (at least, it wasn't obvious to me).

I'm still going to approve, since the code itself seems correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hidmic was kind enough to work through this with me

Thank you @clalancette for sparing the time to look at this.

Added comments in 3a09837. PTAL!

}
allocator.deallocate(data, allocator.state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hidmic was kind enough to work through this with me, and I now understand a lot more about what we are doing here. In short, if the realloc on line 469 succeeds, we must assign it to output->data, because that memory may have been moved and now the original pointer is bogus. So that explains the first line of the patch.

The rest falls out of that. __fini expects that up to ->capacity is valid memory that has been default initialized. If we failed to initialize one of the elements during the copy, then we roll back the new elements we've copied, leaving the existing ones alone. We also do not update ->capacity. That fulfills __fini's requirements (we end up wasting the additional memory that we just allocated, but we don't lose it or leak it).

So with all of that said, I think this refactor is the correct thing. The only thing I'll request is a small comment somewhat explaining the situation, since it is not obvious (at least, it wasn't obvious to me).

I'm still going to approve, since the code itself seems correct.

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic
Copy link
Contributor Author

hidmic commented Apr 1, 2022

Since 3a09837 is comments only and the Rpr job passed, I'll merge.

@hidmic hidmic merged commit 6486f6b into master Apr 1, 2022
@delete-merged-branch delete-merged-branch bot deleted the hidmic/fix-c-sequence-copies-on-error branch April 1, 2022 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants