Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upZero first byte of CString on drop #36264
Conversation
rust-highfive
assigned
aturon
Sep 4, 2016
This comment has been minimized.
This comment has been minimized.
|
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override) |
matklad
force-pushed the
matklad:zeroing-cstring
branch
from
817b00c
to
c5d1a6e
Sep 4, 2016
This comment has been minimized.
This comment has been minimized.
|
I don't know what stability attribute I should put here :) |
This comment has been minimized.
This comment has been minimized.
|
@matklad stability attributes on impls don't do anything anyway, so I believe they are usually marked stable. |
durka
reviewed
Sep 4, 2016
| // memory unsafe code from working by accident. | ||
| impl Drop for CString { | ||
| fn drop(&mut self) { | ||
| unsafe { *self.inner.get_unchecked_mut(0) = 0; } |
This comment has been minimized.
This comment has been minimized.
durka
Sep 4, 2016
Contributor
Is it guaranteed that a CString always points to something? It can't be empty?
This comment has been minimized.
This comment has been minimized.
matklad
Sep 4, 2016
Author
Member
Yes, even unsafe constructors like from_raw can't break this invariant.
This comment has been minimized.
This comment has been minimized.
|
I'm worried that this will mask the failure, because accessing the empty string won't crash. Would it be better to set the pointer to a known invalid value so as to cause a segfault as soon as possible? |
matklad
force-pushed the
matklad:zeroing-cstring
branch
from
c5d1a6e
to
3e9fe0c
Sep 4, 2016
This comment has been minimized.
This comment has been minimized.
|
You've already given the pointer away - there's nothing you can do to it anymore. |
This comment has been minimized.
This comment has been minimized.
I wish we could do that, but looks like we can't? It's to late to change the pointer in the Drop, because when we created a |
matklad
force-pushed the
matklad:zeroing-cstring
branch
from
3e9fe0c
to
d20fa49
Sep 4, 2016
This comment has been minimized.
This comment has been minimized.
|
Ah, good point. If we do this it should be tagged relnotes because it's a silent behavior change. |
matklad
force-pushed the
matklad:zeroing-cstring
branch
from
d20fa49
to
e9923f7
Sep 4, 2016
This comment has been minimized.
This comment has been minimized.
Behaviour change wrt. what? This memory is deallocated right after and it is invalid to use the pointer after that. All things considered, I kinda like this PR. |
This comment has been minimized.
This comment has been minimized.
|
This seems like a neat trick to me to help a footgun in many cases. My only worry would be about performance where this is adding code where previously there was none (and may not be easy to optimize out?) @matklad could you try benchmarking the time it takes to deallocate a |
This comment has been minimized.
This comment has been minimized.
|
I made some benchmarks, the difference is within the error:
benchmarking just drop is somewhat difficult, because our benchmarking framework does not provide a way to initialize iteration state without counting time for it. |
This comment has been minimized.
This comment has been minimized.
|
I think the only way this could cause performance to suffer is if the memory for the CString has been evicted from the cache or paged out and freeing that memory does not involve paging it back in or pulling it into cache, so that writing that single null byte would result in a cache miss or page fault that wouldn't otherwise occur. However, this sort of situation is kinda rare, and I think the benefits of this PR definitely outweigh the performance costs of writing a single byte. |
matklad
force-pushed the
matklad:zeroing-cstring
branch
from
e9923f7
to
a4e0630
Sep 6, 2016
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton I sure hope it doesn't get optimized out :) |
This comment has been minimized.
This comment has been minimized.
|
Should we add a note to the |
This comment has been minimized.
This comment has been minimized.
This is my concern as well. I've verified manually that without this impl you are able to recover the contents of the string after the drop and that with the impl you get an empty string back. But I think that any such test will definitely invoke dereferencing a freed pointer, and I assume that we should not add such tests.
I don't think so because I can't think of a proper formulation for the docs, this is too hacky to be explained nicely. And hopefully if you have read the docs for |
This comment has been minimized.
This comment has been minimized.
|
I've tried adding a run-fail test. It fails with the current compiler, as expected, but I don't have time to wait for |
matklad
force-pushed the
matklad:zeroing-cstring
branch
from
8cc7691
to
c25447f
Sep 7, 2016
alexcrichton
reviewed
Sep 7, 2016
| fn is_hello(s: *const c_char) -> bool { | ||
| let s = unsafe { CStr::from_ptr(s) }; | ||
| let hello = CString::new("Hello").unwrap(); | ||
| s == hello.as_ref() |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Sep 7, 2016
Member
I think this is technically undefined behavior, so perhaps this test could be isolated into a separate process? That is, it's fine if the process segfaults or otherwise aborts abnormally, but it's not fine for it to return successfully.
This comment has been minimized.
This comment has been minimized.
matklad
Sep 7, 2016
Author
Member
Hm, I though that run-fail tests are already executed as a separate processes?
This comment has been minimized.
This comment has been minimized.
alexcrichton
Sep 7, 2016
Member
They are, yeah, but it's asserted that this test's output will always contain the text "assertion failed" which I don't think is guaranteed to happen (due to this being UB)
This comment has been minimized.
This comment has been minimized.
matklad
Sep 7, 2016
Author
Member
I would like to omit error-pattern here precisely for this reason, but the test infra does not allow me to do so :(
This comment has been minimized.
This comment has been minimized.
alexcrichton
Sep 7, 2016
Member
Yeah it's fine to just ship this over to a run-pass test which re-executes itself (there's a few other tests that do that)
This comment has been minimized.
This comment has been minimized.
matklad
Sep 7, 2016
Author
Member
there's a few other tests that do that
Can you point me to a particular example?
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
matklad
Sep 7, 2016
Author
Member
Wow, Command::new(&args[0]) is stunningly beautiful, never heard of this trick!
ollie27
reviewed
Sep 7, 2016
| fn into_inner(mut self) -> Box<[u8]> { | ||
| unsafe { | ||
| let mut result = mem::uninitialized(); | ||
| mem::swap(&mut self.inner, &mut result); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
matklad
added some commits
Sep 4, 2016
matklad
force-pushed the
matklad:zeroing-cstring
branch
from
c25447f
to
f9a3408
Sep 7, 2016
alexcrichton
added
the
T-libs
label
Sep 7, 2016
This comment has been minimized.
This comment has been minimized.
|
Looks good to me, thanks @matklad! |
This comment has been minimized.
This comment has been minimized.
|
And I've verified that the new test fails if the |
bluss
added
the
relnotes
label
Sep 8, 2016
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
Sep 13, 2016
This comment has been minimized.
This comment has been minimized.
bors
merged commit f9a3408
into
rust-lang:master
Sep 13, 2016
This comment has been minimized.
This comment has been minimized.
|
@matklad |
This comment has been minimized.
This comment has been minimized.
|
Would it make sense to do this exclusively in debug mode, rather than in release mode? That would avoid even the minor performance hit, while ensuring that tests run in debug mode would fail. |
This comment has been minimized.
This comment has been minimized.
|
@joshtriplett std is compiled in release mode, so it would always be disabled, regardless of whether the Rust program you're running is using debug or release, so that idea won't work. |
This comment has been minimized.
This comment has been minimized.
It is formally possible just because it invokes UB. I repeat test ten times just because it fails on my machine if I use only one string. Is there perhaps some way to add a hook to rust memory deallocation function to make this test UB-free? |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
I think it would probably be a good idea to make do some benchmarking to make sure that the cost of writing a single byte is noticeable in any way compared to the other work going on to e.g. deallocate the CString and do whatever work the CString was created for in the first place. |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
@sfackler see #36264 (comment) for benchmarks. We probably should remove the test as is. If we had a way to specify allocator for data inside CString, then it would be possible to use some Vec-based ptr-bump allocator which makes sure that data, once allocated, never gets overwritten, making sure the the whole test is not one big glaring UB hole. |
This comment has been minimized.
This comment has been minimized.
Would something like this work: |
This comment has been minimized.
This comment has been minimized.
|
Yes, it would. The only thing you really need to do is to override |
This comment has been minimized.
This comment has been minimized.
|
If the purpose of the test is to check the destructor behavior, then
This fixes the test too. |
This comment has been minimized.
This comment has been minimized.
|
@petrochenkov the drop glue of |
This comment has been minimized.
This comment has been minimized.
|
Yep. Less UB is still UB. Allocator override would certainly be better. |
This comment has been minimized.
This comment has been minimized.
|
I'll send a PR with a test fixed later this week, unless someone wants to step in ;) |
This comment has been minimized.
This comment has been minimized.
|
LLVM can, in theory, optimize the write away: https://godbolt.org/g/mfFfKu This doesn't affect us currently since LLVM specifically looks for the One possible workaround would be to use |
This comment has been minimized.
This comment has been minimized.
|
This appears to have succeeded at catching at least one bug in the wild: https://www.reddit.com/r/rust/comments/5r6ccm/an_unsafe_gotcha/ |
This comment has been minimized.
This comment has been minimized.
|
|
matklad commentedSep 4, 2016
•
edited
Hi! This is one more attempt to ameliorate
CString::new("...").unwrap().as_ptr()problem (related RFC: rust-lang/rfcs#1642).One of the biggest problems with this code is that it may actually work in practice, so the idea of this PR is to proactively break such invalid code.
Looks like writing a
nullbyte at the start of the CString should do the trick, and I think is an affordable cost: zeroing a single byte inDropshould be cheap enough compared to actual memory deallocation which would follow.I would actually prefer to do something like
because Cthulhu error should be much easier to google, but unfortunately this would be too expensive in release builds, and we can't implement things
cfg(debug_assertions)conditionally in stdlib.Not sure if the whole idea or my implementation (I've used
transmutemem::unitializedto workaround move out of Drop thing) makes sense :)