Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"".to_string() is slower than String::new() #18404

Closed
cgaebel opened this issue Oct 28, 2014 · 19 comments
Closed

"".to_string() is slower than String::new() #18404

cgaebel opened this issue Oct 28, 2014 · 19 comments
Labels
A-collections Area: std::collections. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@cgaebel
Copy link
Contributor

cgaebel commented Oct 28, 2014

In a lot of rust code, I see "".to_string() being used as the default method of creating a new string. This would normally be fine, but it's actually noticeably slower than String::new() because it goes through the formatting infrastructure.

It'd be nice if this pattern could be special-cased in String::to_string's implementation, or the optimizer made to do this transformation better.

@thestinger
Copy link
Contributor

It also allocates a lot of memory (128 bytes minimum) even for small / empty strings.

#16415

@thestinger thestinger added I-slow Issue: Problems and improvements with respect to performance of generated code. A-libs labels Oct 28, 2014
@reem
Copy link
Contributor

reem commented Dec 2, 2014

This appears to be fixed. "".to_string().capacity() == 0.

@dotdash
Copy link
Contributor

dotdash commented Dec 14, 2014

Only the capacity part of it. The code that creates the String instance is still awful. Apparently there's more inlining going on now, so the code for to_string from an old rustc (0.13.0-dev (7e43f41 2014-11-15 13:22:24 +0000)) and from a current one (0.13.0-dev (3a9305c 2014-12-14 09:22:24 +0000) isn't really comparable, but it's far far off from what you get from either String::from_str("") or String::new() (those two produce the same asm).

pub fn from_str() -> String {
    String::from_str("")
}
    .section    .text._ZN8from_str20h167891935c11e273laaE,"ax",@progbits
    .globl  _ZN8from_str20h167891935c11e273laaE
    .align  16, 0x90
    .type   _ZN8from_str20h167891935c11e273laaE,@function
_ZN8from_str20h167891935c11e273laaE:
    .cfi_startproc
    movq    $1, (%rdi)
    movq    $0, 16(%rdi)
    movq    $0, 8(%rdi)
    movq    %rdi, %rax
    retq
.Ltmp44:
    .size   _ZN8from_str20h167891935c11e273laaE, .Ltmp44-_ZN8from_str20h167891935c11e273laaE
    .cfi_endproc

OTOH, this

pub fn to_string() -> String {
    "".to_string()
}

gives you almost 200 lines of asm (including labels), and that still calls into fmt().

@dotdash
Copy link
Contributor

dotdash commented Dec 14, 2014

extern crate test;

use test::{Bencher, black_box};

#[bench]
pub fn to_string(b: &mut Bencher) {
    b.iter(|| {
        black_box("".to_string());
    });
}

#[bench]
pub fn from_str(b: &mut Bencher) {
    b.iter(|| {
        black_box(String::from_str(""));
    });
}
running 2 tests
test from_str  ... bench:         0 ns/iter (+/- 1)
test to_string ... bench:        35 ns/iter (+/- 3)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

@thestinger
Copy link
Contributor

A naive micro-benchmark won't be measuring the cost of reallocations. In the real world, it will rarely have room to expand in-place every time.

@dotdash
Copy link
Contributor

dotdash commented Dec 14, 2014

Neither of those should allocate. You get a String without any allocated capacity in both cases.

@thestinger
Copy link
Contributor

I'm aware. I'm pointing it out in advance so it can be done properly, because I'm sure measuring the cost of the repeated reallocations is going to be done too.

@kmcallister kmcallister added the A-collections Area: std::collections. label Jan 16, 2015
@SimonSapin
Copy link
Contributor

What is the preferred way to do this these days? servo/servo#4682 is changing Servo to use to_owned, but it implies importing the ToOwned trait all over the place. Such imports for such a common operation should really be in the prelude.

@lambda-fairy
Copy link
Contributor

Note that we can't just special case ToString for str, since that would lead to an overlapping instance.

I have another idea though. How about we implement to_string() in std::fmt instead? By putting it in std::fmt we can deal with the internal Formatter directly, cutting out all that indirection.

That won't solve the problem of allocating too much, but it might lessen the code bloat a bit.

@lambda-fairy
Copy link
Contributor

After some thought, I came up with another, simpler, idea.

First, add a method to Display:

pub trait Display {
    // ...
    #[inline]
    fn as_str_slice(&self) -> Option<&str> { None }
}

This method will return None for all types except str, where it returns Some(self):

impl Display for str {
    // ...
    #[inline]
    fn as_str_slice(&self) -> Option<&str> { Some(self) }
}

Now, here's where the magic lies. In ToString, before invoking the format machinery we check as_str_slice first. If this returns Some, we simply copy the slice, skipping the machinery altogether:

impl<T: Display + ?Sized> ToString for T {
    fn to_string(&self) -> String {
        match self.as_str_slice() {
            Some(s) => s.to_owned(),
            None => // as before
        }
    }
}

Since .as_str_slice() is inlined, the branch should be optimized out (I'm not an expert on this, so correct me if I'm wrong). So it'll work the same as before for non-string types, but for str it'll use the fast .to_owned().

Thoughts? I guess it's a bit icky having an extra method there, but if we hide it from the docs it should be unobtrusive enough.

@lambda-fairy
Copy link
Contributor

Okay, so I prototyped my idea. Initial benchmarks look very promising, with "".to_string() optimizing down to single call to .to_owned(), and 123.to_string() almost unaffected by the change.

There's a problem though with smart pointer types, specifically Arc<T>. Arc implements Display, which gives it a ToString instance via the blanket impl. Problem is, if T is not a string, then .to_string() will now borrow the Arc twice: once when calling .as_str_slice(), and once when calling format!(). The compiler may be able to optimize that out, but it still feels icky.

This decision should really be made on the type level, using either associated types or negative bounds. Unfortunately the associated types solution doesn't work yet (#20400), and negative bounds are still in the pipeline (rust-lang/rfcs#586).

So I guess we'll have to wait.

@huonw
Copy link
Member

huonw commented Jan 30, 2015

Problem is, if T is not a string, then .to_string() will now borrow the Arc twice: once when calling .as_str_slice(), and once when calling format!().

Borrowing a pointer out of an Arc is very cheap (only a pointer offset compared to a plain Box or &); the expensive part is construction/copying (.clone()) and destruction when the reference counts are modified atomically.

@lambda-fairy
Copy link
Contributor

Borrowing a pointer out of an Arc is very cheap (only a pointer offset compared to a plain Box or &); the expensive part is construction/copying (.clone()) and destruction when the reference counts are modified atomically.

Ah, I stand corrected! That makes my solution practical, though it still feels like a hack.

I guess if we treat this as a temporary solution, to be removed when negative bounds land, it should be okay.

@ranma42
Copy link
Contributor

ranma42 commented Jan 22, 2016

cc me

@ahmedcharles
Copy link
Contributor

Is this still relevant/reproducible?

@frewsxcv
Copy link
Member

Specialization has landed. This can be worked on now?

@japaric
Copy link
Member

japaric commented Mar 30, 2016

@frewsxcv there's already a PR: #32586

@apasel422
Copy link
Contributor

Was this effectively fixed by #32586?

@steveklabnik
Copy link
Member

It should have been, yes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: std::collections. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests