implement `From<Vec<char>>` and `From<&'a [char]>` for `String` #35054

Merged
merged 1 commit into from Aug 2, 2016

8 participants
@pwoolcoc
Contributor

pwoolcoc commented Jul 26, 2016

Though there are ways to convert a slice or vec of chars into a string,
it would be nice to be able to just do String::from(&['a', 'b', 'c']),
so this PR implements From<Vec<char>> and From<&'a [char]> for
String.
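For context, the existing "ways to convert" mentioned above can be sketched with iterator collection, which relies on `String` implementing `FromIterator<char>` and `FromIterator<&char>`. The helper names `string_from_slice` and `string_from_vec` are hypothetical and used only for illustration; they are not part of this PR.

```rust
// Hypothetical helper names; only the std calls inside them are real APIs.

/// Borrowing conversion: `String` implements `FromIterator<&char>`.
fn string_from_slice(chars: &[char]) -> String {
    chars.iter().collect()
}

/// Consuming conversion: `String` implements `FromIterator<char>`.
fn string_from_vec(chars: Vec<char>) -> String {
    chars.into_iter().collect()
}

fn main() {
    let s1 = string_from_slice(&['a', 'b', 'c']);
    let s2 = string_from_vec(vec!['d', 'e', 'f']);
    println!("{} {}", s1, s2);
}
```

Both helpers work without any new `From` impls; the PR is about offering the shorter `String::from(...)` spelling.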

@rust-highfive

Collaborator

rust-highfive commented Jul 26, 2016

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @brson (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@pwoolcoc

Contributor Author

pwoolcoc commented Jul 26, 2016

I know that String::with_capacity(v.len() * size_of::<char>()) will probably over-allocate in most cases, but I thought that it would be better to do that than to just use ::new() and have to reallocate a lot in the loop. Let me know if there is something smarter I could do here.

@tbu-

Contributor

tbu- commented Jul 26, 2016

String::with_capacity(v.len()) is probably more reasonable: it will allocate the minimum amount, and the allocation strategy of String should deal with the non-minimal case quite well (it doesn't allocate for each character added).
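A minimal sketch of the suggestion, assuming the conversion is written as a simple push loop. `chars_to_string` is a hypothetical free function standing in for the impl body, not the code in the PR.

```rust
/// Hypothetical free function standing in for the proposed impl body.
fn chars_to_string(v: &[char]) -> String {
    // Lower bound: every char needs at least one byte in UTF-8.
    let mut s = String::with_capacity(v.len());
    for &c in v {
        // `push` grows the buffer as needed when a char takes more than one byte.
        s.push(c);
    }
    s
}

fn main() {
    let s = chars_to_string(&['h', 'é', 'y']);
    println!("{} (len {} bytes, capacity {})", s, s.len(), s.capacity());
}
```

With `v.len()` as the initial capacity, purely ASCII input never reallocates, while multi-byte input falls back to the normal growth of `String`.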

@brson

Contributor

brson commented Jul 26, 2016

Agree that v.len() is probably a better initial capacity. @tbu- can you update it? Edit: Sorry, I meant @pwoolcoc.

This patch makes sense to me. f? @rust-lang/libs

s.push(c);
}
s
}

@brson

brson Jul 26, 2016

Contributor

It probably makes sense to delegate to the &[char] impl here instead of duplicating the code.

It's possible we could do something clever to reuse the Vec's buffer, though I don't think it's worth thinking about in this PR.
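The delegation idea can be sketched as follows. Because `From` impls for `String` can only live in the standard library, this sketch uses a hypothetical local wrapper type `Text`; the shape of the two impls is the point, not the names.

```rust
/// Hypothetical local wrapper: outside std we cannot add `From` impls to `String` itself.
struct Text(String);

impl<'a> From<&'a [char]> for Text {
    fn from(v: &'a [char]) -> Text {
        let mut s = String::with_capacity(v.len());
        for &c in v {
            s.push(c);
        }
        Text(s)
    }
}

impl From<Vec<char>> for Text {
    fn from(v: Vec<char>) -> Text {
        // Delegate to the slice impl instead of repeating the loop.
        Text::from(&v[..])
    }
}

fn main() {
    let t = Text::from(vec!['o', 'k']);
    println!("{}", t.0);
}
```

The `Vec<char>` impl here still allocates a fresh buffer; it only removes the duplicated loop.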

@petrochenkov

Contributor

petrochenkov commented Jul 26, 2016

Agree that v.len() is probably a better initial capacity.

That's an unstable optimization for purely ASCII text. A single non-ASCII character in a string of any size (think of © or «) and the buffer is guaranteed to be reallocated. It may be reasonable to use some small correction (1.1 or so, needs estimation) to make the buffer resistant to small non-ASCII noise.
(The coefficient is still text-dependent; for example, 2.0 is perfect for Cyrillic texts. I have actually used this as an optimization a couple of times.)
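To make the byte counts concrete, here is a small sketch using `char::len_utf8`. The helper `utf8_len` is hypothetical; it computes the exact byte length, which is an alternative to a fixed coefficient at the cost of an extra pass over the input.

```rust
/// Hypothetical helper: exact number of UTF-8 bytes needed to encode `chars`.
fn utf8_len(chars: &[char]) -> usize {
    chars.iter().map(|c| c.len_utf8()).sum()
}

fn main() {
    let ascii = ['a', 'b', 'c'];
    let noisy = ['a', '©', 'c'];
    let cyrillic = ['д', 'а'];

    // ASCII chars take one byte each; '©' and Cyrillic letters take two.
    println!("ascii:    {} chars, {} bytes", ascii.len(), utf8_len(&ascii));
    println!("noisy:    {} chars, {} bytes", noisy.len(), utf8_len(&noisy));
    println!("cyrillic: {} chars, {} bytes", cyrillic.len(), utf8_len(&cyrillic));
}
```

A capacity of `v.len()` is therefore exact for the first input, one byte short for the second, and half of what the third needs, which matches the 2.0 coefficient mentioned for Cyrillic text.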

@tbu-

Contributor

tbu- commented Jul 26, 2016

@petrochenkov Yes, a single non-ASCII character will change that. However, this is what we do everywhere: allocations are always based on the minimum capacity that will be necessary (it's always the Iterator::size_hint().0 that is used for initial capacity).
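As an illustration of the lower bound being referred to, the sketch below prints `size_hint().0` for an iterator over chars next to the byte length of the collected string. `lower_bound` is a hypothetical helper, and how std uses the hint internally is taken from the comment above rather than shown here.

```rust
/// Hypothetical helper: the lower bound reported by an iterator over `chars`.
fn lower_bound(chars: &[char]) -> usize {
    chars.iter().size_hint().0
}

fn main() {
    let chars = ['a', '©', 'c'];
    let lower = lower_bound(&chars);
    let s: String = chars.iter().collect();
    // The hint counts items (chars), while the string length counts bytes.
    println!("lower bound = {}, byte length = {}", lower, s.len());
}
```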

@pwoolcoc

Contributor Author

pwoolcoc commented Jul 26, 2016

Thanks for the suggestions everyone. I'll be back online in a couple hours & I'll push some changes.


@alexcrichton

Member

alexcrichton commented Jul 26, 2016

Sounds reasonable to me!

@pwoolcoc pwoolcoc force-pushed the pwoolcoc:stringfromchars branch from 4f775cc to b3f2a2f Jul 27, 2016

@pwoolcoc

Contributor Author

pwoolcoc commented Jul 27, 2016

Is it ok to leave the impl as impl<'a> From<&'a [char]> or should it be something more flexible like impl<T: AsRef<[char]>> From<T>?

@pwoolcoc

Contributor Author

pwoolcoc commented Jul 27, 2016

r? @brson

@nagisa

Contributor

nagisa commented Jul 27, 2016

I feel like the From<Vec<char>> implementation is very naive. At the very least it could be made to do no allocations at all:

impl From<Vec<char>> for String {
    fn from(mut v: Vec<char>) -> String {
        unsafe {
            let ptr = v.as_mut_ptr() as *mut u8;
            let mut bytes = 0;
            {
            let mut rest = v.as_mut_slice();
            while let Some((chr, rest_)) = {rest}.split_first_mut() {
                for byte in chr.encode_utf8() {
                    *ptr.offset(bytes) = byte;
                    bytes += 1;
                }
                rest = rest_;
            }
            }
            let cap = v.capacity();
            ::std::mem::forget(v);
            String::from_raw_parts(ptr, bytes as usize, cap)
        }
    }
} 
// Perhaps this code could be made better, I didn’t ponder much on it.
@alexcrichton

Member

alexcrichton commented Jul 27, 2016

@nagisa as @brson mentioned earlier we could indeed do things like reuse the buffer, but for now it doesn't seem worth the unsafe complexity when no one's clamoring for it.

@pwoolcoc yeah I think it's best to stay concrete and avoid generics for From impls where conflicts are sometimes difficult to avoid.
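For comparison, the flexibility of an `AsRef<[char]>` bound can be had in a free generic function without adding a blanket `From` impl. This is only a sketch; `collect_chars` is a hypothetical name and not something proposed in this thread.

```rust
/// Hypothetical free function; it is not part of std.
fn collect_chars<T: AsRef<[char]>>(input: T) -> String {
    input.as_ref().iter().collect()
}

fn main() {
    let from_vec = collect_chars(vec!['a', 'b']);
    let from_array = collect_chars(['c', 'd']);
    let slice: &[char] = &['e', 'f'];
    let from_slice = collect_chars(slice);
    println!("{} {} {}", from_vec, from_array, from_slice);
}
```

A function like this accepts vectors, arrays, and slices alike, while the concrete `From` impls keep the set of `String::from` conversions explicit.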

@pwoolcoc

Contributor Author

pwoolcoc commented Jul 27, 2016

thanks @alexcrichton, I think it is ready to go but I am unable to replicate the test failure that travis is reporting

@alexcrichton

Member

alexcrichton commented Jul 27, 2016

Ah yeah that's ok. If you rebase on master it should fix it, as the PR to solve that problem went in a few hours ago.

implement `From<Vec<char>>` and `From<&'a [char]>` for `String`
Though there are ways to convert a slice or vec of chars into a string,
it would be nice to be able to just do `String::from(['a', 'b', 'c'])`,
so this PR implements `From<Vec<char>>` and `From<&'a [char]>` for
String.

@pwoolcoc pwoolcoc force-pushed the pwoolcoc:stringfromchars branch from b3f2a2f to ac73335 Jul 27, 2016

@nagisa

Contributor

nagisa commented Jul 27, 2016

but for now it doesn't seem worth the unsafe complexity when no one's clamoring for it.

That seems to me to be at odds with the philosophy of From conversions being cheap.

I’d like to point out that I have implemented the code I pasted above manually two or three times already in various locations, and so far the desire to reuse the allocation was pretty strong in each use case. The implementation as proposed by the PR is plain useless as far as my code and I are concerned, which is exactly why I am complaining.

Shall I send a PR against this PR?

@brson

Contributor

brson commented Jul 27, 2016

@pwoolcoc The next step is to wait for the libs team to approve the new APIs. Typically this takes until next Tuesday, though if enough of them chime in here it could go faster.

@nagisa I recognize that optimization is desirable, but still prefer to do it as a follow-up, for a few reasons: unsafe optimizations require a different set of eyes and more thorough review; I want to lower barriers to contribution, not frustrate contributors by expanding the scope of PRs, and make landing small contributions faster; generally, I'd like to hold up fewer issues by waiting for perfection, and be more willing to settle for incremental progress.

(On the subject of making small / first-time contributions faster - it is quite frustrating that any minor lib feature enhancements have a week turnaround waiting for the libs team to meet face-to-face. Very bad contributor experience.)

@pwoolcoc

Contributor Author

pwoolcoc commented Jul 27, 2016

@brson ok, thanks!

@brson

Contributor

brson commented Aug 1, 2016

@bors r+ libs team is happy

@bors

Contributor

bors commented Aug 1, 2016

📌 Commit ac73335 has been approved by brson

@bors

Contributor

bors commented Aug 2, 2016

⌛️ Testing commit ac73335 with merge 19765f2...

bors added a commit that referenced this pull request Aug 2, 2016

Auto merge of #35054 - pwoolcoc:stringfromchars, r=brson
implement `From<Vec<char>>` and `From<&'a [char]>` for `String`

Though there are ways to convert a slice or vec of chars into a string,
it would be nice to be able to just do `String::from(&['a', 'b', 'c'])`,
so this PR implements `From<Vec<char>>` and `From<&'a [char]>` for
String.

@bors bors merged commit ac73335 into rust-lang:master Aug 2, 2016

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful

bors added a commit that referenced this pull request Sep 24, 2016

Auto merge of #36685 - brson:rev-string-from, r=sfackler
Revert "implement `From<Vec<char>>` and `From<&'a [char]>` for `String`"

This reverts commit ac73335.

This is a revert of #35054, which resulted in at least 7 known regressions, reported [here](https://internals.rust-lang.org/t/regression-report-stable-2016-08-16-vs-beta-2016-09-21/4119) and [here](#36352), which will hit stable next week.

I think this breakage was somewhat unanticipated, and we did not realize so many crates were broken until this week, so reverting is the conservative thing to do until we figure out how not to cause so much breakage. I've run crater on the revert and did not find any new breakage from the revert.

Fixes #36352

cc @pwoolcoc @rust-lang/libs