Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove str trailing nulls #8296

Closed
wants to merge 32 commits into from

Conversation

erickt
Copy link
Contributor

@erickt erickt commented Aug 5, 2013

This PR fixes #7235 and #3371, which removes trailing nulls from str types. Instead, it replaces the creation of c strings with a new type, std::c_str::CString, which wraps a malloced byte array, and respects:

  • No interior nulls
  • Ends with a trailing null

@thestinger
Copy link
Contributor

I agree 100% with removing the legacy trailing \0 from our strings for the reasons given in my comments on the issue report.

It doesn't have to happen as part of this pull request, but we should really switch Path to using [u8] internally. At the moment it's not possible to handle files without Unicode paths, despite many filesystems allowing it. The same thing applies to environment variables, as they are not necessarily UTF-8.

I think we'll find most places using str for interaction with C were doing so incorrectly, especially since it's impossible for a function using \0-termination to support all of UTF-8. File paths are one of the few things really guaranteed to never contain \0.

I'll hold off on an r+ because this deserves more attention than most pull requests.

/// # Failure
///
/// Fails if the CString is null.
pub fn as_bytes(&self) -> &'self [u8] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be like this?

fn as_bytes<'r>(&'r self) -> &'r [u8]

@brson
Copy link
Contributor

brson commented Aug 5, 2013

Really awesome effort.

With the CString type, when it doesn't own it's buffer is it true that any operation on it is effectively unsafe? So CString has a sort of safe/unsafe modality? If so that's kind of frightening.

@erickt
Copy link
Contributor Author

erickt commented Aug 5, 2013

@brson: Maybe I didn't document my intentions of CString owning or not owning the returned pointer well enough. One of the more obnoxious things to deal with in C is who owns the pointer. strcpy obviously returns a pointer that should be freed, but getenv returns a pointer that should not be freed. I wanted to allow library writers embed this information into CString.

That said, this api may be a little half baked at the moment and I'd be okay with removing this feature for now. Perhaps instead I should follow extra::c_vec and instead allow a user to pass in a destructor function to allow CString to work with other allocators.

@erickt erickt mentioned this pull request Aug 5, 2013
@brson
Copy link
Contributor

brson commented Aug 5, 2013

@erickt this is great progress, so even if the design needs to continue to evolve I'm pretty happy merging it. I think we probably need to bring it up in the meeting tomorrow though since this is a major language change.

@sfackler
Copy link
Member

sfackler commented Aug 6, 2013

This will probably need to be rebased on top of #8289 and str::raw::push_byte tweaked or the test that that commit added will fail.

@brson
Copy link
Contributor

brson commented Aug 6, 2013

@erickt can you merge again?

@graydon
Copy link
Contributor

graydon commented Aug 7, 2013

Merging because this is a big annoying bitrotty change and feels like the main API surface of it won't change much in the future.

I too am a little concerned about ownership being controlled by a flag, and by the dtor calling free. I think this difference may be significant enough to represent at the type level? I'm unsure. I can think of a number of other ways of representing it, few of which seem like they are particularly safe / footgun-proof. I'd love to be able to leverage lifetimes for this, such that (say) a CStringOwner can produce a CStringRef<'r> that points to the former and couples their lifetimes together. Or something? This is basically akin to the slice problem we had in normal strings, but for strings-that-C-will-like. I'm not the best at rust API design, sadly. But it seems to me the shorter the duration raw pointers emerge from the API surface, the better.

Also somewhat concerned about str::to_c_str: it calls as_bytes().to_c_str() which doesn't wind up verifying that the string in question lacks internal nulls, or is valid in the target locale-encoding, both of which are motivating concerns for this work.

Also ergonomically a bit concerned with to_c_str().with_ref since it's a long and chatty invocation, and hits a path that definitely allocates an owning buffer even though it's clearly a temporary use. Could we add a to_c_str_ref() combination function that allocates into a temporary buffer (and say, uses a small stack-local array for strings less than 128 bytes? that's likely most of them..)

bors added a commit that referenced this pull request Aug 10, 2013
This PR fixes #7235 and #3371, which removes trailing nulls from `str` types. Instead, it replaces the creation of c strings with a new type, `std::c_str::CString`, which wraps a malloced byte array, and respects:

*  No interior nulls
* Ends with a trailing null
@bluss
Copy link
Member

bluss commented Aug 10, 2013

mini-regression, the recent function signature change of std::str::raw::slice_bytes is reverted. Can be fixed later.

@erickt
Copy link
Contributor Author

erickt commented Aug 10, 2013

@blake2-ppc: thanks for catching that. I folded a fix for it into #8430 to cut down on the bors queue.

fn drop(&self) {
if self.owns_buffer_ {
unsafe {
libc::free(self.buf as *libc::c_void)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Windows there is no guarantee that "free" is the correct function to use to deallocate an OS-supplied buffer. Typically each API or set of APIs will have its own free-equivalent (but you have to call the correct one, or everything may break).

Raymond Chen (at the Old New Thing blog) has lots more details: http://blogs.msdn.com/b/oldnewthing/archive/2006/09/15/755966.aspx

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Erik-S I'm afraid that nobody will see a comment on a closed pull request, can you file an issue for this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I opened issue #8465

bors added a commit that referenced this pull request Sep 27, 2013
One of the downsides with `c_str` is that it always forces an allocation, and so this could add unnecessary overhead to various calls. This PR implements one of the suggestions @graydon made in #8296 for `vec.with_c_str` in that for a short string can use a small stack array instead of a malloced array for our temporary c string. This ends up being twice as fast for small strings.

There are two things to consider before landing this commit though. First off, I arbitrarily picked the stack array as 32 bytes, and I'm not sure if this a reasonable amount or not. Second, there is a risk that someone can keep a reference to the interior stack pointer, which could cause mayhem if someone were to dereference the pointer. Since we also can easily grab and return interior pointers to vecs though, I don't think this is that much of an issue.
@alexisvl
Copy link

"Oh, you can just malloc a new string with the null termination", said the developers who have never seen a microcontroller. "Screw anybody using Rust in an embedded environment and needing to interoperate with C."

Seriously, sometimes RAM is literally that tight. This does nothing but go out of your way to hassle people using Rust in low memory environments. Thanks, now I have to go write a special allocator to copy small strings and terminate them without fragmenting my handful of kB.

flip1995 pushed a commit to flip1995/rust that referenced this pull request Sep 7, 2023
new lint: `missing_asserts_for_indexing`

Fixes rust-lang#8296

This lint looks for repeated slice indexing and suggests adding an `assert!` beforehand that helps LLVM elide bounds checks. The lint documentation has an example.

I'm not really sure what category this should be in. It seems like a nice lint for the `perf` category but I suspect this has a pretty high FP rate, so it might have to be a pedantic lint or something.
I'm also not sure about the name. If someone knows a better name for this lint, I'd be fine with changing it.

changelog: new lint [`missing_asserts_for_indexing`]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove the trailing null from strings
10 participants