Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fmt: add Append, Appendf, Appendln #47579

Closed
seebs opened this issue Aug 6, 2021 · 59 comments
Closed

fmt: add Append, Appendf, Appendln #47579

seebs opened this issue Aug 6, 2021 · 59 comments

Comments

@seebs
Copy link
Contributor

@seebs seebs commented Aug 6, 2021

What version of Go are you using (go version)?

1.16

Does this issue reproduce with the latest release?

Sure.

What operating system and processor architecture are you using (go env)?

N/A

What did you do?

Profiled something that was using Sprintf in a minor way.

What did you expect to see?

Less Sprintf in the profile.

What did you see instead?

So much Sprintf.

The issue here is that most of what's happening is WAY cheaper than allocations. strconv has AppendInt, but that can't do formatting.

It makes sense, for most uses, that fmt.Sprintf is an allocator that produces a string, but there are times when you want formatted-output, but want to write into a []byte. And you can do that with bytes.Buffer and fmt.Fprintf, but creating a bytes.Buffer around a []byte is... an allocation. And fmt.Fprintf does an allocation. (Curiously, if I do fmt.Fprintf into a bytes.Buffer I just made for that purpose, I only get the one allocation.)

What I want: Something like Sprintf, but that can write into a []byte, and can fail gracefully if there's not enough space to write things.

Proposed name: fmt.Snprintf, because snprintf is what you call when you already have a buffer you want written to and you have a length limit in mind.

So, fmt.Snprintf(dest []byte, fmt string, args ...interface{}) (int, error), perhaps. The C standard's answer to "what if n isn't big enough" is "you report the n you would have used if n had been big enough", which exists to allow a single-pass process to figure out how much space you actually need. Alternatively, it could return number of bytes actually written, and if it didn't fit, an error with a concrete type that indicates space needed.

I note, browsing source, that fmt already has fmtBytes, although this doesn't do quite the thing this would need.

@gopherbot gopherbot added this to the Proposal milestone Aug 6, 2021
@ianlancetaylor ianlancetaylor changed the title proposal: Sprintf-into-bytes proposal: fmt: Sprintf-into-bytes Aug 6, 2021
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 6, 2021

Edited to add: this doesn't work, as the fmt package pools buffers and will reuse the byte slice.

You could do this instead of adding new API;

package x_test

import (
	"bytes"
	"fmt"
	"testing"
)

type sliceStealer struct {
	b []byte
}

func (ss *sliceStealer) Write(s []byte) (int, error) {
	if len(ss.b) > 0 {
		panic("multiple calls to Write")
	}
	ss.b = s
	return len(s), nil
}

func FprintfToBytes(format string, a ...interface{}) []byte {
	var ss sliceStealer
	fmt.Fprintf(&ss, format, a...)
	return ss.b
}

func TestFprintf(t *testing.T) {
	s := FprintfToBytes("%d", 10)
	if !bytes.Equal(s, []byte("10")) {
		t.Errorf(`FprintfToBytes("%%d", 10) = %q, want %q`, s, "10")
	}
}

func BenchmarkFprintf(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		FprintfToBytes("%d", 10)
	}
}

@DeedleFake
Copy link

@DeedleFake DeedleFake commented Aug 6, 2021

~~It looks like an allocation can be eliminated by creating a simpler type locally instead of using bytes.Buffer. Does the allocation with bytes.Buffer happen because of the return of a pointer from inside of bytes.NewBuffer()?

Edit: Ah, darn. @ianlancetaylor beat me to it.~~

Edit 2: Never mind. See #47579 (comment) below.

@seebs
Copy link
Contributor Author

@seebs seebs commented Aug 7, 2021

Is there any guarantee that Fprintf does only one write? Also, while that allows me to steal the buffer fprintf used, it doesn't let me write into an existing buffer that I have, which is the really interesting case. For instance, to put a 0-padded value in filename[5:9] or something. The distinction being, with that, BenchmarkFprintf could report 0 allocs/op, theoretically.

So right now, I don't think it's possible to loop on Fprintf without at least one allocation per call...

@josharian
Copy link
Contributor

@josharian josharian commented Aug 7, 2021

Also, while that allows me to steal the buffer fprintf used, it doesn't let me write into an existing buffer that I have, which is the really interesting case.

The Write method of your io.Writer could copy the buffer's contents into your existing buffer.

It does mean an extra copy of the data, which is cheap but not free. In any case that should let you avoid the allocation.

Also, I'm not sure the slice stealer is safe. IIRC package fmt pools its buffers, so the buffer passed to Write will be used again by subsequent fmt work.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 7, 2021

Also, I'm not sure the slice stealer is safe. IIRC package fmt pools its buffers, so the buffer passed to Write will be used again by subsequent fmt work.

Good point.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 7, 2021

fmt.Snprintf(dest []byte, fmt string, args ...interface{}) (int, error)

I think we can simplify the issues about exceeding the size of the buffer by making this

fmt.Snprintf(dest []byte, fmt string, args ...interface{}) ([]byte, error)

That is, just return the byte slice, which will be a new slice if the original cap is not large enough. People who need to limit the size of the output, which I suspect is less common, can use Fprintf with a custom Writer. (But with this modification Snprintf is not the right name.)

As it happens, I think this might be fairly simple to implement: replace the pp.buf field with the passed in buffer, and then remove it and restore the original pp.buf field before releasing the pp.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Aug 7, 2021
@josharian
Copy link
Contributor

@josharian josharian commented Aug 7, 2021

If Sprintf is string-printf, then it seems to me that bytes-printf should be Bprintf.

@seebs
Copy link
Contributor Author

@seebs seebs commented Aug 8, 2021

Bprintf would also make sense to me. Hmm. "if cap isn't large enough, then allocate" seems like a reasonable choice, but I don't see a pretty/clean way to express "if this didn't fit in the space I provided". I guess something like if &provided[0] != &returned[0] maybe? (This isn't quite safe in all theoretical cases, because the returned len could be 0, but if the returned len is 0, I don't have to worry about the allocation, do I?)

I'd thought about creating a thing which just copies into an existing buffer, but then fmt's still allocating -- but if it's pooling those, I guess I care less about that than I thought I did.

got it! fmt.Apprendf

@josharian
Copy link
Contributor

@josharian josharian commented Aug 8, 2021

I don’t think the function would return an error; fmt functions only return errors when writes fail. So the signature would be

func Bprintf(dest []byte, format string, args ...interface{}) []byte

@josharian
Copy link
Contributor

@josharian josharian commented Aug 8, 2021

I don't see a pretty/clean way to express "if this didn't fit in the space I provided".

How about cap(provided) != cap(returned)? Can’t grow without modifying cap.

@seebs
Copy link
Contributor Author

@seebs seebs commented Aug 8, 2021

Yeah. I just dislike the lack of a way to express "is this a different thing" separately from "does this thing have different characteristics". And I suppose I could imagine a case where the desired outcome would be "if for some reason it doesn't fit, fail rather than allocating", but that feels like a different thing.

@tdakkota
Copy link

@tdakkota tdakkota commented Aug 9, 2021

@DeedleFake I think your benchmark is not correct

        var buf [128]byte
	allocs := testing.AllocsPerRun(128, func() {
		fmt.Fprintf(bytes.NewBuffer(buf[:0]), "This is a test.") // buf[:] -> buf[:0]
	})
	fmt.Println(allocs)

makes only one allocation (bytes.Buffer leaks to heap).

fmt.Fprintf(bytes.NewBuffer(buf[:]), "This is a test.")

does not re-use buf .
It creates a new 128-byte buffer and appends This is a test..

@DeedleFake
Copy link

@DeedleFake DeedleFake commented Aug 9, 2021

@tdakkota

You're right, I don't know how I missed that. Whoops.

@bcmills
Copy link
Member

@bcmills bcmills commented Aug 9, 2021

@ianlancetaylor

fmt.Snprintf(dest []byte, fmt string, args ...interface{}) ([]byte, error)

I'm confused by this signature. Does the proposed function append to dest (as the strconv.Append* functions do) up to cap(dest), or does it write to the slice from 0 to len(dest) and truncate output if the length is too small (as copy does), or does it do a third thing (like writing from 0 to cap(dest) and returning the final length)?

If it is like copy or strconv.Append*, how is that different from using a bytes.Buffer today, perhaps with a LimitedWriter as discussed in #17688 (comment)?

If it is identical except for allocation behavior, would it be better to improve the compiler's inlining and escape analysis instead? Then the only new API surface we would need is io.LimitedWriter, which seems much more generally useful than just a new fmt function.

(https://play.golang.org/p/aBpCGXDSmmw)

@bcmills
Copy link
Member

@bcmills bcmills commented Aug 9, 2021

@josharian

If Sprintf is string-printf, then it seems to me that bytes-printf should be Bprintf.

Sprintf is “print to a string”, which must return the string because Go strings are immutable.

Since slices are mutable, I would expect a function named Bprintf to print to the passed-in slice and return the number of bytes “printed”, in the same way that Printf and Fprintf do:

// Bprintf prints up to len(dst) bytes to dst, returning the number of bytes printed.
// If the output does not fit within len(dst), Bprintf returns len(dst), io.ErrShortBuffer.
func Bprintf(dst []byte, format string, args ...interface{}) (int, error)

For a function with the meaning “append the formatted bytes to the passed-in slice”, I would expect the name Appendf, because “appending” is not “printing”.

// Appendf appends the formatted args to dst and returns the extended buffer.
func Appendf(dst []byte, format string, args ...interface{}) []byte

@rsc
Copy link
Contributor

@rsc rsc commented Aug 11, 2021

fmt.Fprintf guarantees to do a single write to the writer.
Why is it not good enough to pass a bytes.Buffer (or your own custom writer) to Fprintf?

@rsc
Copy link
Contributor

@rsc rsc commented Aug 11, 2021

/cc @robpike

@rsc rsc moved this from Incoming to Active in Proposals Aug 11, 2021
@rsc
Copy link
Contributor

@rsc rsc commented Aug 11, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@robpike
Copy link
Contributor

@robpike robpike commented Aug 11, 2021

I agree with @rsc's comment that Fprintf already provides enough flexibility.

@rsc
Copy link
Contributor

@rsc rsc commented Aug 18, 2021

It seems like we are still waiting on an answer to why fmt.Fprintf into an appropriate appending writer is not good enough.

@tdakkota
Copy link

@tdakkota tdakkota commented Aug 19, 2021

Passing a byte slice using bytes.Buffer as io.Writer causes allocation due to type-to-interface conversion, but fmt.Appendf([]byte, ...) []byte will not.

Also Appendf does not need fmt pools buffers at all.

@clausecker
Copy link

@clausecker clausecker commented Aug 22, 2021

Independently of the escape analysis issue, I like the idea of a Bprintf function for convenience.

@josharian
Copy link
Contributor

@josharian josharian commented Aug 22, 2021

It'd also help with buffer re-use. Right now there's no way to get the slice back out of a bytes.Buffer when you're done with it, which makes it hard to re-use.

@rsc rsc moved this from Active to Likely Accept in Proposals Sep 22, 2021
@rsc
Copy link
Contributor

@rsc rsc commented Sep 22, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@seebs
Copy link
Contributor Author

@seebs seebs commented Sep 23, 2021

I don't object to the spelled-out names. That said, I do like the option of a limited write, even though I only sometimes want it.

Perhaps something like
func AppendPrintfN(dst []byte, int n, format string, a ...interface{}) ([]byte, int)

to allow a specified overall maximum number of bytes written, even if that means truncating.

... except actually, come to think of it, n is useless here because we could just use the buffer's length. But there's no good way to distinguish between "I want the usual append semantics" and "I would actually like this to truncate and stop".

The utility of the "truncate at this point" is when the slice you pass it is in the middle of a larger slice, mostly. It's not that you're writing at the end, it's that you specifically want the bytes to go into this one location here. On the other hand, I suppose you can check whether the returned slice fits, and if not, copy parts of it back in, if you really want that, and it's a rare case, or should be.

@tdakkota
Copy link

@tdakkota tdakkota commented Sep 23, 2021

AppendPrintfN

Maybe FillPrintf?

@seebs
Copy link
Contributor Author

@seebs seebs commented Sep 23, 2021

So, thinking about it more, these feel like two slightly different use cases. One is that I know where I want something, and I'm handing out a slice that may be in the middle of a larger slice, and it's absolutely useless to me to allocate-and-write-elsewhere. The other is just the standard append behavior.

The case I was originally thinking of was closer to a LimitWriter or something, where there was a definite specific limit and if the data can't fit, I'd rather get an error of some sort than get the data put somewhere else. Like, the concern isn't just "I want to avoid allocation", it's "I want the data written to be in exactly this location which is probably next to other data that it's going to be used with".

I think Append implies "first byte written happens at dst[len(dst)], and if cap(dst) isn't larger, that means an allocation", and Fill implies "bytes can be written only into dst[0:len(dst)]. So, Fill is more like what an io.Reader does with a slice, rather than like an Append operation. It's true that other APIs that append always expand, but I think on reflection that AppendF` may not be the right name for what I'm suggesting.

Possibly fmt.WritePrintf(dst []byte, ...) and fmt.AppendPrintf(dst []byte...), where WritePrintf writes to dst[0:len(dst)] only, and does not allocate or go further, and starts at 0 rather than appending. Or possibly a LimitedWriter, or a bytes.FixedBuffer.

Oh!

fmt.CopyF(dst, ...)

That sort of makes sense to me. copy is the thing which writes to dst until it runs out of space, then stops.

@seebs
Copy link
Contributor Author

@seebs seebs commented Sep 23, 2021

I think I've got a better illustration of part of my concern about an append-semantics thing for the use case I had in mind. If I'm trying to overwrite part of an existing slice, and the appender decides it's going to grow the slice, it has now created a new slice.

So what happens to the original storage that I handed it? Does anything? Does it write some of the bytes into the buffer I had available? Does it always write bytes into that buffer until it runs out of room, and only then allocate? Does it reallocate and stop writing into that buffer as soon as it can tell that it will need more room?

Using a quick and dirty faked AppendPrintf implementation of one sort:

https://play.golang.org/p/PSwABNdOcUP

In this case, the answer is that if the new data won't fit, it doesn't modify the existing buffer at all. But also, there's no way for me to express the "but stop at this point" part of the write; I have to use [4:4] because otherwise append will start after the existing material. I guess I could use [4:4:N] where N is the point I want it to stop at. So I could do something like [4:4:12] to express that I want to write at most 8 characters -- but then it can actually end up not copying anything in, which turns out not to be what I want either.

For instance, I could have imagined that the last case would actually write the year and month into the field, and then leave the rest of the data untouched because it could, at that point, tell that it had run out of room and needed to grow.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 23, 2021

For a limited write to an existing []byte, you can use a LimitedWriter as @bcmills suggests in #47579 (comment). Do we need fmt support for that? It seems like a less common case.

@seebs
Copy link
Contributor Author

@seebs seebs commented Sep 23, 2021

I think LimitedWriter is a better conceptual fix, although I guess it'd need to actually get made. I sort of want a bytes.Writer parallel to bytes.Reader, for the same reason, but I think that either of those, plus Fprintf, is probably as good a fit as a hypothetical CopyPrintf would be, and may be easier to generalize from.

I do think, though, that I would probably prefer that AppendPrintf make an explicit statement about what if anything will have been done to the previous buffer if it does in fact allocate. Even if the statement is "it is unspecified". But otherwise I think that some people will assume that it will have written right up to the end, and some will assume it will have changed nothing if it ended up allocating, and then they'll be sad. And they'll blame me because it was my proposal. cries

@as
Copy link
Contributor

@as as commented Sep 24, 2021

What is the problem with just calling this:

func Append(dst []byte, a ...interface{}) []byte
func Appendf(dst []byte, format string, a ...interface{}) []byte
func Appendln(dst []byte, a ...interface{}) []byte

Or

func Bprint(dst []byte, a ...interface{}) []byte
func Bprintf(dst []byte, format string, a ...interface{}) []byte
func Bprintln(dst []byte, a ...interface{}) []byte

Are the words Print and Append really adding that much clarity that they must be used together in the name?

@akavel
Copy link
Contributor

@akavel akavel commented Sep 25, 2021

Personally, fmt.Append and fmt.Appendf do indeed look lovingly sweet to me: the fmt. prefix seems to be clarifying enough already, and the name seems to also fit with the other "modern" addition already in the package, fmt.Errorf. And on the other hand it's as clearly as possible alluding to the builtin append — basically, what it means to me is: "like builtin append, but with fmt magic on top"... which seems to perfectly encapsulate exactly what it does.

edit: On top of this, in fact fmt.AppendPrint is sincerely harder for me to understand. More meanings clumped together, more to decode, analyze and understand. I was somewhat confused as to whether I really get what's going on in this PR, until I finally landed on the fmt.Append suggestion, which was the a-ha moment for me: it's just like builtin append + fmt "magic", nothing more, nothing less.

@rsc
Copy link
Contributor

@rsc rsc commented Oct 6, 2021

I do think, though, that I would probably prefer that AppendPrintf make an explicit statement about what if anything will have been done to the previous buffer if it does in fact allocate. Even if the statement is "it is unspecified".

I'm quite confident that's what the statement will be - it is unspecified.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Oct 6, 2021
@rsc
Copy link
Contributor

@rsc rsc commented Oct 6, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: fmt: add AppendPrint, AppendPrintf, AppendPrintln fmt: add AppendPrint, AppendPrintf, AppendPrintln Oct 6, 2021
@rsc rsc removed this from the Proposal milestone Oct 6, 2021
@rsc rsc added this to the Backlog milestone Oct 6, 2021
@mvdan
Copy link
Member

@mvdan mvdan commented Oct 6, 2021

Was the point about shorter func names missed, perhaps? #47579 (comment)

@robpike
Copy link
Contributor

@robpike robpike commented Oct 6, 2021

I am a fan of Append, Appendf, and Appendln.

@robpike robpike self-assigned this Feb 22, 2022
@rsc rsc removed this from the Backlog milestone Feb 22, 2022
@rsc rsc added this to the Go1.19 milestone Feb 22, 2022
@rsc rsc changed the title fmt: add AppendPrint, AppendPrintf, AppendPrintln fmt: add Append, Appendf, Appendln Feb 22, 2022
@rsc
Copy link
Contributor

@rsc rsc commented Feb 22, 2022

Spoke to @robpike about this and I agree that Append, Appendf, Appendln fit better with fmt's overall API.
Rob is going to look into this for Go 1.19.

@gopherbot
Copy link

@gopherbot gopherbot commented May 15, 2022

Change https://go.dev/cl/406177 mentions this issue: fmt: add Append, Appenln, Appendf

@gopherbot
Copy link

@gopherbot gopherbot commented May 16, 2022

Change https://go.dev/cl/406357 mentions this issue: go/gcexportdata: don't assume that fmt names never change

gopherbot pushed a commit to golang/tools that referenced this issue May 17, 2022
For golang/go#47579

Change-Id: I25a873fb6da216d885c8faefda98c7fe027b6a4f
Reviewed-on: https://go-review.googlesource.com/c/tools/+/406357
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Rob Pike <r@golang.org>
gopls-CI: kokoro <noreply+kokoro@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Development

No branches or pull requests