Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fmt size_hints #583

Closed
wants to merge 5 commits into from

Conversation

Projects
None yet
@seanmonstar
Copy link
Contributor

seanmonstar commented Jan 14, 2015

Add a size_hint method to each of the fmt traits, allowing a buffer to allocate with the correct size before writing.

Rendered

cc @alexcrichton

@erickt

This comment has been minimized.

Copy link

erickt commented Jan 14, 2015

I'm all for adding a hint for formatting. It would be nice in this RFC to mention unifying this with iterator size hints.

cc-ing @kballard and @thestinger, who I believe collaborated on adding Iterator::size_hint.

Does anyone know of anything using the Iterator::size_hint upper bound? I did a scan through the rust code base, and the only users I could find was other size_hint methods and tests. While semantically it'd be nice to have the lower and upper bounds of a collection, it's a bit of a shame we might be doing work that no one ever uses. Perhaps we should split things into a size_hint(&self) -> uint and a size_bounds(&self) -> (uint, Option<uint>)?

#46 is somewhat related to this RFC.

@lilyball

This comment has been minimized.

Copy link
Contributor

lilyball commented Jan 14, 2015

I don't know offhand of anyone using the Iterator::size_hint upper bound. I have a recollection a discussion a while back in which someone claimed that, after looking through the rust source, there were no users of it. At the time I defended its existence with theoretical use-cases that wouldn't be found in the standard libraries but might be found in third-party code, such as providing progress estimates for an operation that processes the results of an iterator chain. Or perhaps more applicable, there may be some use in having an Extend implementation that allocates more than the lower bound ahead of time, with the expectation that the "true" iterator count lies somewhere between the lower and upper bounds.

All that said, if the size_hint for fmt is just used for the allocation created by fmt itself, then if it's only using the lower bound, there doesn't seem to be much use in having the upper bound. Or is there a use-case for calling this size_hint in third-party code? Alternatively, would the fmt module perhaps want to experiment with pre-allocating some size in between the lower and upper bounds? I'm thinking some simple rule like allocating min(lower_bound * 1.5, upper_bound).

@lilyball

This comment has been minimized.

Copy link
Contributor

lilyball commented Jan 14, 2015

A drawback to this proposal: all manual implementations of fmt traits now have a second method to implement that must be kept in sync with the fmt() method.

Something to consider: A macro like write_size_hint!() that takes a format string and arguments, just like write!(), does, and generates a SizeHint from it. You'd need to keep the format string in sync with the write!() call but it would make implementing size_hint() easier in most cases.

SizeHint {
min: self.min + other.min,
max: match (self.max, other.max) {
(Some(left), Some(right)) => Some(left + right),

This comment has been minimized.

@lilyball

lilyball Jan 14, 2015

Contributor

This implementation doesn't handle overflow. It needs to look something like

SizeHint {
    min: self.min.saturating_add(other.min),
    max: if let (Some(left), Some(right)) = (self.max, other.max) {
        Some(left.checked_add(right))
    } else {
        None
    }
}

This comment has been minimized.

@seanmonstar

seanmonstar Jan 14, 2015

Author Contributor

Ah yes, will adjust.

@lilyball

This comment has been minimized.

Copy link
Contributor

lilyball commented Jan 14, 2015

Offhand, I am in favor of making formatting strings more efficient, especially if that makes "foo".to_string() more efficient.

@seanmonstar

This comment has been minimized.

Copy link
Contributor Author

seanmonstar commented Jan 14, 2015

That's exactly what I had thought with the range: if the range is large, it
might perform faster picking somewhere in the middle.

I looked at Iters size_hint for design, but they're a little different.
Iters hint changes depending on the position of the Iter. Also, fmt can
have nested calls, so those must be added.

@kballard perhaps debug_asserts could be added to ensure at the end that
the string is within the bounds, to catch when you forget to change both?

On Wed, Jan 14, 2015, 11:12 AM Kevin Ballard notifications@github.com
wrote:

A drawback to this proposal: all manual implementations of fmt traits now
have a second method to implement that must be kept in sync with the fmt()
method.

Something to consider: A macro like write_size_hint!() that takes a
format string and arguments, just like write!(), does, and generates a
SizeHint from it. You'd need to keep the format string in sync with the
write!() call but it would make implementing size_hint() easier in most
cases.


Reply to this email directly or view it on GitHub
#583 (comment).

}
```

Add a `SizeHint` type, with named properties, instead of using tuple indexing. Include an `Add` implementation for `SizeHint`, so they can be easily added together from nested properties.

This comment has been minimized.

@alexcrichton

alexcrichton Jan 14, 2015

Member

This is an interesting deviation from the size hints of iterators. I would personally expect the two to have the same signature, and this may want to at least discuss the discrepancy between the two return values.

This comment has been minimized.

@seanmonstar

seanmonstar Jan 14, 2015

Author Contributor

It started because I wanted to implement Add, and it felt odd to impl on such a generic tuple.


# Drawbacks

I can't think of a reason to stop this.

This comment has been minimized.

@alexcrichton

alexcrichton Jan 14, 2015

Member

There's always at least a drawback of "this is extra backwards-compatible work before 1.0" as this RFC will require both implementation effort as well as design/review effort to push through.

test bench_short_memcpy ... bench: 33 ns/iter (+/- 3)
```

# Detailed design

This comment has been minimized.

@alexcrichton

alexcrichton Jan 14, 2015

Member

I think this section is currently missing details about how this is actually going to be implemented in relation to format_args! unless we're only optimizing the .to_string() case. For example, these two calls are fairly different in the paths that they take:

foo.to_string();
format!("{}", foo);

In the first case, we call to_string on the ToString trait which has a reference to the concrete type T, allowing an invocation of size_hint to preallocate the String. In the latter case we call std::fmt::format with an Arguments structure. In Arguments the type T has been erased, and there is not an obvious location to call .size_hint().

I would be curious to see more details about how this is handled and how far these size hints are being propagated.

This comment has been minimized.

@pczarn

pczarn Feb 5, 2015

The format_args! macro can get size hints of all template pieces and pass the sum to Arguments. I think it's not worthwhile. format! is often used for making short-lived strings, for which size hints would only add considerable bloat. In other cases, better write! to a preallocated buffer or call shrink_to_fit afterwards.

This comment has been minimized.

@seanmonstar

seanmonstar Feb 6, 2015

Author Contributor

@pczarn the benchmarks say differently. Even smaller strings receive a noticeable improvement.

@lilyball

This comment has been minimized.

Copy link
Contributor

lilyball commented Jan 15, 2015

I've got a different approach that does not change the API of Show or String (or anything else that third-party code is expected to implement). It's focused around optimizing the case of "foo".to_string(). The approach here is to add a parameter to std::fmt::Writer::write_str() called more_coming that is true if the caller believes it will be calling write_str() again. The flag is tracked in the Formatter struct so calls to write!() inside an implementation of Show/String don't incorrectly claim there will be no writes if their caller still has more to write. Ultimately, the flag is used in the implementation of fmt::Writer on String to use reserve_exact() if nothing more is expected to be written.

With this approach I'm seeing similar numbers for your to_string() benchmarks that I get with your code (once I re-enable the call to shrink_to_fit() that you commented out in your code). The downside is the "nested" benchmark ends up about 3% slower with my code compared to the stdlib version, although I haven't tried to figure out why yet.

test test::bench_long          ... bench:        64 ns/iter (+/- 3)
test test::bench_long_memcpy   ... bench:        36 ns/iter (+/- 3)
test test::bench_long_stdlib   ... bench:       110 ns/iter (+/- 5)
test test::bench_med           ... bench:        53 ns/iter (+/- 3)
test test::bench_med_memcpy    ... bench:        27 ns/iter (+/- 2)
test test::bench_med_stdlib    ... bench:        96 ns/iter (+/- 5)
test test::bench_nested        ... bench:       211 ns/iter (+/- 11)
test test::bench_nested_stdlib ... bench:       204 ns/iter (+/- 21)
test test::bench_short         ... bench:        53 ns/iter (+/- 3)
test test::bench_short_memcpy  ... bench:        28 ns/iter (+/- 1)
test test::bench_short_stdlib  ... bench:        76 ns/iter (+/- 1)
@seanmonstar

This comment has been minimized.

Copy link
Contributor Author

seanmonstar commented Jan 15, 2015

@kballard @alexcrichton I've updated the RFC with your comments.

@lilyball

This comment has been minimized.

Copy link

lilyball commented on text/0000-fmt-size-hint.md in 5e0ad26 Jan 15, 2015

Well, it focuses on improvements only when the formatting operation ends up making a single call to write_str() on the fmt::Writer. Though it does have the potential for better edge case behavior, e.g. when the final format piece causes the fmt::Writer to reallocate, my approach allows it to allocate the exact size.

Ultimately, the idea is that, if the format destination is going to shrink_to_fit(), then we want to avoid overallocating in the first place if we can do that without requiring advance knowledge of the sizes of subsequent format pieces.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 16, 2015

Please benchmark number formatting. rust-lang/rust#19218

value: &'a Void,
formatter: fn(&Void, &mut Formatter) -> Result,
hint: fn(&Void) -> SizeHint,
}

This comment has been minimized.

@alexcrichton

alexcrichton Jan 16, 2015

Member

It may be worth profiling the impact of this representation as it's inflating the size from 2 words to 3 words. Formatting is normally not necessarily on hot code paths, but it does affect code size and stack size and we've had to optimize it in the past. In theory an Argument is just a trait object so the formatter/hint pair could point to a "vtable" which could just be constructed manually instead of as a trait object itself.

Regardless though, it's a pretty minor point, just something to think about :)

This comment has been minimized.

@seanmonstar

seanmonstar Jan 16, 2015

Author Contributor

I was thinking about this as well. Currently, any function can be passed here, but if we didn't want that flexibility, we could create an enum Format { Show, String, LowerHex, ... } and change Argument to be struct Argument { formatter: Format, value: &Void }.

This comment has been minimized.

@alexcrichton

alexcrichton Jan 16, 2015

Member

At some point we do need to force rustc to monomorphize an implementation of Show as how would you actually call the fmt function given just formatter and value? (e.g. a function pointer needs to be somewhere)

This comment has been minimized.

@seanmonstar

seanmonstar Jan 16, 2015

Author Contributor

I imagined being able to look them up from the enum:

impl Format {
    fn get_fns(&self) -> (fn(&Void, &mut Formatter) -> Result, fn(&Void) -> SizeHint) {
        match *self {
            Format::Show => (Show::fmt, Show::size_hint),
            // ...
        }
    }
}

This comment has been minimized.

@alexcrichton

alexcrichton Jan 16, 2015

Member

You'll need type inference to drive inference for the Self type of Show::fmt:

use std::fmt::Show;

fn main() {
    let f = Show::fmt;
}
foo.rs:4:13: 4:22 error: type annotations required: cannot resolve `_ : core::fmt::Show`
foo.rs:4     let f = Show::fmt;
                     ^~~~~~~~~
foo.rs:4:13: 4:22 note: required by `core::fmt::Show::fmt`
foo.rs:4     let f = Show::fmt;
                     ^~~~~~~~~
error: aborting due to previous error
@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jan 16, 2015

Thanks for the additions @seanmonstar! Could you also add a part explicitly saying that everything will be #[unstable] to start out with? In theory that should allow us to immediately improve "foo".to_string() while preventing locking us in to SizeHint immediately as defining your own size_hint would be unstable by default.

@reem

This comment has been minimized.

Copy link

reem commented Jan 16, 2015

I've also wondered if we could just add length hinting methods to fmt::Formatter, like hint_size(&mut self, usize) which avoids changing the trait at all.

@seanmonstar

This comment has been minimized.

Copy link
Contributor Author

seanmonstar commented Jan 16, 2015

@reem By the time you can call that, the buffer has already been allocated. And if the object has properties that are also going to be formatted, those properties can't tell the Formatter their hint until part way through the formatting.

@seanmonstar

This comment has been minimized.

Copy link
Contributor Author

seanmonstar commented Jan 16, 2015

@alexcrichton I added unstable attributes to the struct and method.

```rust
impl<T: fmt::String> ToString for T {
fn to_string(&self) -> String {
format!("{}", self)

This comment has been minimized.

@seanmonstar

seanmonstar Jan 17, 2015

Author Contributor

I started implementing this, and ran into that the format macro is defined in libstd, and this trait is in libcollections. The reason it's in std is because it needs String to use as a buffer.

What if I moved this macro lib collections, and had it call String::format instead of std::fmt::format?

This comment has been minimized.

@alexcrichton

alexcrichton Jan 18, 2015

Member

I think it's fine to define the format! macro in libcollections and then use macro_reexport in the standard library (like it does for vec! already)

This comment has been minimized.

@kmcallister

kmcallister Feb 4, 2015

Contributor

That is exactly my plan, c.f. rust-lang/rust#21912 and #810.

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Feb 5, 2015

@seanmonstar

  1. Could you add "all manual implementations of fmt traits now have a second method to implement that must be kept in sync with the fmt() method." to the Drawbacks section, as mentioned by @kballard above? (At least, as I understand it, your choice is either to inherit the default impl, which is always correct but maximally inefficient, or to maintain this second method.)
  2. Would it be possible for you to add number formatting to your benchmark suite, as requested by @mahkoh above? (See also rust-lang/rust#19218 )

Though really, I don't think either of these modifications really needs to block this RFC. I'll try to ping @alexcrichton about getting it in front of the core team.

@seanmonstar

This comment has been minimized.

Copy link
Contributor Author

seanmonstar commented Feb 6, 2015

@pnkfelix I added the drawback. Regarding numbers, I imagine a minor improvement, since this reduces the number of allocations for the buffer. However, it seems the slowness in formatting numbers has more to do with that implementation, no?

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Feb 6, 2015

@seanmonstar yeah probably

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Mar 5, 2015

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Mar 10, 2015

Hm I've been thinking about this recently, and I'm somewhat worried about the code size implications here. For example the size_hint on an iterator is almost never a concern because it's rarely used in a trait object so we can eliminate those that are never used.

With formatting, however, trait objects are its bread and butter, so we're forced to keep all size_hint implementations as well as have a little extra code to move them into place during format_args!. Could you do some analysis about the code size implications here? Some specific questions I have are:

  • What's the impact on the format_args! macro for moving size_hint onto the stack and passing it?
  • What's the average impact for a crate that just uses #[derive(Debug)]? Presumably these size_hint implementations cannot be optimized away and it's quite a common trait to implement.

I don't think that this will turn up any showstoppers, but I'd like to have a handle on what we're getting into. Otherwise I think that this is basically good to go so I'll see what we can do to get it merged soon afterwards.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Apr 3, 2015

@rprichard

This comment has been minimized.

Copy link

rprichard commented Apr 12, 2015

The RFC's current size_hint function does not know the formatting parameters, so presumably it must assume they are default. For a str, the width and precision can truncate or extend the output.

If there are format parameters, then the size_hint function for Arguments defined in the RFC can produce an incorrect hint, AFAICT. Either the lower or upper bound can be wrong. It can be fixed by returning (0, None) if Arguments::fmt is Some(..).

I wonder if size_hint should have a parameter giving it the format parameters. It could have &Formatter type or perhaps some new struct type.

I'm inclined to think the size hint should be Option<usize>, because as the RFC mentions, it's not clear how to use both bounds.

@pczarn

This comment has been minimized.

Copy link

pczarn commented Apr 12, 2015

I tried to sum size hints in the expansion of format_args!. Most libraries were bloated by a few %. I think doing it with dynamic dispatch could be a slowdown with less bloat. Implementing size_hint for Arguments is not worth it either way. Take a look at rust-lang/rust#16544.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Apr 15, 2015

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Apr 15, 2015

@seanmonstar do you have any updates on the impact on code size here?

@seanmonstar

This comment has been minimized.

Copy link
Contributor Author

seanmonstar commented Apr 15, 2015

I do not. The branch I started is several months old, and I haven't had time to rebase. @rprichard raises some points worth considering, though.

@Gankro

This comment has been minimized.

Copy link
Contributor

Gankro commented May 27, 2015

Note: this RFC entered its Final Comment Period as of Yesterday; 6 days remain before a final decision will be made.

@bluss

This comment has been minimized.

Copy link

bluss commented May 30, 2015

Don't make the decision until the discussion about size_hint deprecation for iterators is done. It would be silly to tag along on a deprecated name, if it goes that way.

@Gankro

This comment has been minimized.

Copy link
Contributor

Gankro commented May 30, 2015

Accepting this RFC does not require setting the name in stone. Though we all love to bikeshed names, I think we can all agree they're not really the most important part of this proposal.

Implementing this of course takes time, and landing it in stable even more.

@seanmonstar

This comment has been minimized.

Copy link
Contributor Author

seanmonstar commented Jun 1, 2015

Another alternative would be instead of including this method, for fmt to use a thread-local LruCache of sizes when formatting types. This would result in the first case not getting the speed benefit, but would mean that all implementations of a fmt trait can use a size hint, without requiring the size_hint method to be kept in sync with the implementation.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Jun 2, 2015

Having now gone through the final comment period, it's time to make a decision on this RFC. For now I'm going to close this RFC for the following reasons:

Overall it seems best to start this RFC fresh again with a new look beyond microbenchmarks to the statistics measured here, and also perhaps await the outcome of #1034. Thanks regardless for the RFC @seanmonstar!

@bluss

This comment has been minimized.

Copy link

bluss commented Jun 2, 2015

like gankro said, the name is a non-issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.