[[clang::unsafe_buffer_usage]] in libc++ #107904
Comments
I'd like to see this happen so that:
|
CC @var-const I could see this being useful, however we'd have to agree on what needs to be marked. For example, we support an ABI configuration under which iterators are bounded and will trap on OOB access. Would we consider an iterator-based API to be "unsafe" under that configuration? |
IMO this would need an RFC, since it's a major change within libc++. That RFC should mention
Also some things that will probably come up:
|
A few things here:
|
Is iterator hardening always sufficient to prevent UB?
void f(std::random_access_iterator auto begin, std::random_access_iterator auto end) {
  if (std::distance(begin, end) < 10) { ... }
}
In this example, if the iterators come from different containers (point to different allocations), we get UB, and the iterator checks don't help, right? llvm-project/libcxx/include/__iterator/bounded_iter.h Lines 191 to 194 in b3d2d50
|
They actually help quite a bit because an iterator from a different container will naturally be out-of-bounds for the other container. It's just a matter of remembering to check that. The only way this doesn't work is when it's two overlapping spans of the same buffer. But even then, it's more of a technical UB than a practical UB, because the underlying operation is kind of valid anyway given that it's the same buffer(?) In any case, looks like there's a TODO about that specifically: https://libcxx.llvm.org/Hardening.html#assertion-categories
|
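The check described above (an iterator from a different container is out of bounds for the other container) can be sketched roughly as follows. This is a minimal illustration, not libc++'s `__bounded_iter`: the names `BoundedIter`, `within_bounds_of`, `checked_distance`, `bounded_begin`, and `bounded_end` are hypothetical, and a hardened implementation would presumably trap rather than return an empty optional.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical bounded iterator for illustration; not libc++'s __bounded_iter.
struct BoundedIter {
    const int* current;  // current position
    const int* begin;    // start of the owning buffer
    const int* end;      // one past the end of the owning buffer
};

// True if p lies within [it.begin, it.end]. std::less_equal provides a total
// order on pointers, so the comparison is well defined across allocations.
inline bool within_bounds_of(const BoundedIter& it, const int* p) {
    std::less_equal<const int*> le;
    return le(it.begin, p) && le(p, it.end);
}

// Returns last - first only if each iterator lies inside the other's buffer;
// otherwise returns std::nullopt instead of subtracting unrelated pointers.
inline std::optional<std::ptrdiff_t> checked_distance(const BoundedIter& first,
                                                      const BoundedIter& last) {
    if (!within_bounds_of(first, last.current) ||
        !within_bounds_of(last, first.current)) {
        return std::nullopt;
    }
    return last.current - first.current;
}

inline BoundedIter bounded_begin(const std::vector<int>& v) {
    return {v.data(), v.data(), v.data() + v.size()};
}

inline BoundedIter bounded_end(const std::vector<int>& v) {
    return {v.data() + v.size(), v.data(), v.data() + v.size()};
}
```

As noted in the comment above, a check of this shape cannot distinguish two overlapping spans of the same buffer, since both iterators then fall inside each other's bounds.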
Keep in mind that there are still some gaps in bounded iterators. They're fixable gaps, but someone needs to go through and fix them. #78771 comes to mind. But they're also simply bugs to fix in bounded iterators, not a problem in the approach. To that end, I think the issue @danakj points out is simply a bug in bounded iterators, not a reason to reject the idea. |
TBH I can probably just cite all the bugs I've opened, as almost all of them have been motivated in some form by bounded iterators. 😄 (But I don't have all that much time to hack on stuff, so I file more bugs than I have time to send patches for.) |
@danakj pointed out a case where we're kinda stuck with unbounded iterators. (Or perhaps we get |
With the standard library, it's possible for the compiler to do this in a nice way, I agree. What about for other library authors? From what I can see with current APIs, this would require every method that takes an iterator to have an overload taking a pointer, so that the latter can be annotated. For example:
template <std::input_iterator I>
void insert(I begin, I end);
Would need to become something like:
template <std::input_iterator I>
void insert(I begin, I end);
[[clang::unsafe_buffer_usage]]
void insert(value_type* begin, value_type* end);
Having to double the number of functions in APIs like this to get the unsafe-buffers warning only when working with a pointer is really unfortunate. And it seems like the compiler could do better here for other libraries beyond the stdlib. One possibility: an optional boolean on the attribute.
template <std::input_iterator I>
[[clang::unsafe_buffer_usage(std::is_pointer_v<I>)]]
void insert(I begin, I end);
How else could we get warnings for pointers in these cases, in a consistent way between std and elsewhere? |
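A sketch of the two-overload workaround described above, under some assumptions: `IntBag` and `pointer_calls` are hypothetical names used only for illustration, and the `std::input_iterator` constraint is dropped so the sketch also compiles as C++17. The pointer overload is written as a template over `T*` so that overload resolution prefers it for both `int*` and `const int*` arguments. Compilers that do not know the attribute are expected to ignore it, possibly with a warning.

```cpp
#include <cstddef>
#include <vector>

class IntBag {
public:
    // Generic overload, used for non-pointer iterators such as
    // std::vector<int>::iterator.
    template <class InputIt>
    void insert(InputIt first, InputIt last) {
        for (; first != last; ++first) items_.push_back(*first);
    }

    // Pointer overload. T* is more specialized than InputIt, so overload
    // resolution selects this one whenever raw pointers are passed.
    template <class T>
    [[clang::unsafe_buffer_usage]]
    void insert(T* first, T* last) {
        ++pointer_calls_;  // bookkeeping for the illustration only
        for (; first != last; ++first) items_.push_back(*first);
    }

    std::size_t size() const { return items_.size(); }
    std::size_t pointer_calls() const { return pointer_calls_; }

private:
    std::vector<int> items_;
    std::size_t pointer_calls_ = 0;
};
```

With this shape, a call such as `bag.insert(v.begin(), v.end())` goes to the generic overload, while `bag.insert(p, p + n)` goes to the annotated one and should be reported at the call site when `-Wunsafe-buffer-usage` is enabled.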
It'd actually need to become something like:
template <std::input_iterator I>
void insert(I begin, I end) {
  // body
}
void insert(value_type* begin, value_type* end) {
#pragma clang unsafe_buffer_usage begin // or address some of those warnings
  // body
#pragma clang unsafe_buffer_usage end
}
because the attribute doesn't suppress the warning inside the function. It only notifies the caller that this function should not be used at all. Addressing the warnings inside the function may still make your code strictly safer, even if it doesn't eliminate the unsafety entirely. Side note, On the other hand, if the user of the library includes the library through So in order to display any warning at all in this scenario, we'll need to do something very unusual: emit the warning in user code and unwrap the instantiation stack in reverse in order to put a note where the actual unsafe operation is. (It's probably not enough to simply put a note at the unsafe operation. Without the instantiation stack it may become incomprehensible.) Once we implement that, it may be true that the header doesn't need But the point still stands: if the function is unsafe, there are often a few ways to make it slightly safer, even if it continues to be unsafe. In this sense, a separate instantiation is often necessary. Not always though. So:
I still really like this. It may be exactly what we need in some cases. We could make the pragma conditional too?
|
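To make the pragma usage above concrete, here is a small sketch of a function that opts its own body out of the warning. `sum_raw` is a hypothetical helper, and the `__clang__` guards are an assumption added so that other compilers do not see the pragma at all.

```cpp
#include <cstddef>

// Sums n ints starting at data. The caller must guarantee that
// [data, data + n) is a valid range; nothing here can verify it.
inline long sum_raw(const int* data, std::size_t n) {
    long total = 0;
#if defined(__clang__)
#pragma clang unsafe_buffer_usage begin
#endif
    // Raw indexing that -Wunsafe-buffer-usage would otherwise flag.
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
#if defined(__clang__)
#pragma clang unsafe_buffer_usage end
#endif
    return total;
}
```

Keeping the suppressed region as small as possible leaves the rest of the function subject to the warning, in line with the point above that addressing some warnings inside an unsafe function can still make it safer.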
I like the boolean on the attribute too. Should it be a type trait or something, so we can distinguish I mean really what we want is to say something like "this is safe as long as |
Closing, since the only actionable task here is to create an RFC, which isn't the task of libc++ contributors (necessarily). |
@philnik777 I think this should be reopened. The problem stands and it seems there is plenty of discussion on how we might solve it. It is a thorny problem and will likely require some deeper discussion on how best to solve it, but thorny problems are still problems. Particularly when they impact memory safety. |
@davidben These are all valid points, but nothing that could come out of that discussion is actionable for us. This is much too big of a project for this to be part of regular maintenance, which means that there has to be an RFC and a volunteer to actually implement this. For that reason I don't think this is the right forum for such a discussion. A discourse thread would be a much better option, and reaches a far wider audience. |
I see. So, to confirm, libc++'s stance is that you all do not consistently track problems in GitHub issues and instead move discussions between GitHub and Discourse threads depending on how difficult they end up being to solve? Is this stance documented somewhere? @ldionne, is this consistent with your understanding of the project's process? This definitely would have been useful context for me in figuring out how best to track safety gaps (a key, user-facing, security-critical problem in C++ implementations today) in libc++'s STL implementation. This also seems like a problematic strategy for making progress on issues, which should ultimately be the goal. When a user-facing problem is discovered, it is often not obvious from the start whether it will end up being simple or require significant changes. That means issues will naturally start in GitHub. Closing them in the hope that someone else manually transfers them to Discourse means losing track of a lot of past discussion. If libc++ wishes to use this strategy, do you all have an automatic process for synchronizing the two? Asking people to manually transfer issues between GitHub and Discourse is a good way for them to slip between the cracks. For every GitHub issue someone files, there are many more people who go to file a GitHub issue, search for existing ones, see it's already been filed, and skip filing one. All those people's needs end up slipping through as a result of this strategy. I would suggest a better strategy might be to keep the GitHub issues open, as they remain issues. This one's next step is simply "come up with a proposal and start a discourse thread". But in that critical time period in between, it's important to keep the ticket open so people can find it, add thoughts to it, and hopefully eventually lead to a solution. |
My understanding of our process is that Discourse RFCs are used to "cement" decisions that have wide impact, such as this one. This is kind of documented here, but to be fair it's not suuuuper clear. That being said, I personally don't see a problem with a discussion taking place in a Github issue if it is making progress and if the relevant people are in the thread. All that I would ask is that at the end of said discussion, a Discourse RFC be written to follow our consensus-seeking process: that RFC could explain what is being proposed (which would be the result of the discussion here), and could also rationalize the proposed design by summarizing the conclusions drawn here. I am speculating a bit, but I think @philnik777 is coming at this from a slightly different angle, where issues are supposed to track concrete actionable things, otherwise we end up accumulating a ton of "issues" and our bug tracker becomes a mess. I think that is also a valid point, and so we should try not to make issues like this linger for too long. We could also perhaps have a label like So, in summary, I'm fine with this discussion continuing here if it's the most productive way of moving forward, but if it loses momentum, that's probably a signal that it's time for a concrete Discourse RFC where we can get consensus on an actual approach and get started. Going back to technical bits. About
My understanding is that the compiler should flag the point at which a raw pointer is obtained from a data structure (way before it gets passed to one of our APIs like If we go for something like this:
then that means that basically every single iterator-based API in the library needs to be annotated, since all these APIs accept pointers, which are a kind of iterator. That raises a flag in my mind, since there ought to be a mechanism that doesn't require something so mechanical and repetitive. If we made the assumption that iterator-based APIs are in fact safe (since we can harden iterators), we would only have to flag uses of raw pointer arithmetic in the compiler, no? We'd mark such arithmetic inside libc++ hardened iterators as "trust me that's safe", and the compiler would do all the rest. Where am I going astray here (I'm certain this approach has been considered before)? |
libc++'s headers are system headers, so the unsafe-buffers warning does not fire inside them. We need the warning to happen on the call to the libc++ function when the caller is passing something unbounded (I think the discussion here is assuming iterators are bounded, so a pointer). |
Yes, but the user code that creates the pointer in the first place is not in a system header, right?
I think there's a typo here. You either mean "iterators are not bounded, so a pointer", or "iterators are bounded, so not a pointer". Which one did you mean? |
I mean that the discussion is assuming all iterators are bounded, so the warning should fire when passing a pointer to a function that will do arithmetic on it within. That's the It feels a bit wishful to think that code using libc++ will be unable to create a pointer pair that would go OOB without hitting a warning, thanks to the vast ecosystem of pointer code. The function that would go OOB if given bad pointers is responsible for marking its API as such and generating a warning.
|
Definitely a bit wishful, but then again what's the benefit to the user code if the only unsafe usage of raw pointers that we flag are the ones in the standard library, but none of their own code using raw pointers unsafely gets flagged (unless they too adopt the attribute)? Stepping back, I think there's ambiguity in
Basically every function that accesses a buffer via raw pointers can lead to an OOB access if it gets passed invalid information (e.g. an invalid length, an invalid pointer, etc). Is that really useful to flag though? At that point, shouldn't we be flagging the access to the buffer itself, and then we're back to my suggestion above? Another (much more targeted) take on this would be that we only want to flag APIs that are fundamentally unsafe, for example, the 3-legged
template <class InputIterator1, class InputIterator2>
bool equal(InputIterator1 first1, InputIterator1 last1, InputIterator2 first2);
That API should basically not be used since a 4-iterators overload now exists:
template <class InputIterator1, class InputIterator2>
bool equal(InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, InputIterator2 last2);
The 3-legged variant is a clear example of an unsafe API, no matter what kind of iterators we have. I think I'm not convinced (yet) about the value of applying this attribute to all Standard library APIs that accept iterators:
It could be that I'm not seeing the big picture. I'll note, however, that if we were to do this, it could probably tie into a solution for #78771 since every algorithm will likely have to be taught about the concept of "iterator unwrapping" to solve that issue, and this could be a place where we diagnose that we actually unwrapped from something unsafe. Just a thought. |
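For the 3-legged versus 4-iterator `equal` point above, a short sketch of why the 4-iterator overload is the safer choice: it knows where the second range ends, so ranges of different lengths simply compare unequal. `same_contents` is a hypothetical wrapper name used for illustration.

```cpp
#include <algorithm>
#include <vector>

// Compares two vectors with the 4-iterator overload of std::equal.
// Because both ends of b are supplied, the algorithm never reads past b,
// even when b is shorter than a.
inline bool same_contents(const std::vector<int>& a, const std::vector<int>& b) {
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
```

With the 3-legged form, `std::equal(a.begin(), a.end(), b.begin())` would read past the end of `b` whenever `b` is shorter than `a` and all of `b`'s elements match.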
There are many methods in libc++ which can cause out-of-bounds issues when given incorrect inputs, such as any method that takes one or more iterators as its inputs, or that takes a pointer input.
Will libc++ be annotating such methods with [[clang::unsafe_buffer_usage]]? Is the project open to adding such annotations on methods that receive iterators (instead of ranges)?
Concrete example: std::ranges::subrange::subrange(iterator, sentinel), if given invalid inputs, will create a subrange that goes out of bounds. This is similar to std::span(first, size), which is currently hard-coded in the compiler as if it were marked with [[clang::unsafe_buffer_usage]]. Other examples: std::span::span(first, last), std::vector::insert(pos, first, last), std::memcpy(dest, src, count).
Putting such annotations in libc++ will help callers avoid unsafe APIs and transition to safer ones.
We would need all [[clang::unsafe_buffer_usage]] annotations to live behind a config define, though, so that they can be enabled separately from rolling libc++.
Thoughts? Is this something we could do now? At some future time? Explicitly undesirable?
cc: @haoNoQ @ziqingluo-90 @jkorous-apple @ldionne
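One way the "config define" idea above might look, as a rough sketch only: the macro names `MYLIB_ENABLE_UNSAFE_BUFFER_ANNOTATIONS` and `MYLIB_UNSAFE_BUFFER_USAGE` and the function `copy_ints` are hypothetical and are not existing libc++ configuration macros or APIs.

```cpp
#include <cstddef>

// Hypothetical opt-in switch: the annotation is emitted only when the user
// defines MYLIB_ENABLE_UNSAFE_BUFFER_ANNOTATIONS and the compiler reports
// support for the attribute.
#if defined(MYLIB_ENABLE_UNSAFE_BUFFER_ANNOTATIONS) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::unsafe_buffer_usage)
#    define MYLIB_UNSAFE_BUFFER_USAGE [[clang::unsafe_buffer_usage]]
#  endif
#endif
#ifndef MYLIB_UNSAFE_BUFFER_USAGE
#  define MYLIB_UNSAFE_BUFFER_USAGE  // expands to nothing when disabled
#endif

// A memcpy-like function: it goes out of bounds if count exceeds either
// buffer, so it is a candidate for the annotation.
MYLIB_UNSAFE_BUFFER_USAGE
inline void copy_ints(int* dest, const int* src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dest[i] = src[i];
    }
}
```

Because the macro expands to nothing by default, a library update alone would not introduce new warnings; a project would opt in by defining the switch once it is ready to migrate.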