Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hash, crypto: add WriteByte, WriteString method to hash implementations #38776

Open
geraldss opened this issue Apr 30, 2020 · 23 comments
Open

hash, crypto: add WriteByte, WriteString method to hash implementations #38776

geraldss opened this issue Apr 30, 2020 · 23 comments

Comments

@geraldss
Copy link

@geraldss geraldss commented Apr 30, 2020

This proposal was initially for embedding io.ByteWriter in hash.Hash, or adding a WriteByte() method with the same signature.

This method is already added in the new maphash.Hash. Adding it elsewhere will extend the benefits in performance and usability to the other Hash implementations.

Per feedback of @ianlancetaylor below, I'm instead proposing the addition WriteByte() from io.ByteWriter to the standard library hash.Hash implementations, including:

adler32
crc32
crc64
fnv

@gopherbot gopherbot added this to the Proposal milestone Apr 30, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Apr 30, 2020

Unfortunately, the proposed change would not be backward compatible. It would mean that existing types that satisfy the hash.Hash interface would no longer implement the interface. That could break working code, and violates the Go 1 compatibility guarantee (https://golang.org/doc/go1compat).

So, in short, we can't do this.

Loading

@geraldss
Copy link
Author

@geraldss geraldss commented Apr 30, 2020

Got it. How about adding another interface, or just adding the WriteByte() method to the standard library's hash implementations?

Hashing is sometimes part of performance-sensitive code paths, and it would be beneficial to avoid conversions to byte slices whose only purpose is to satisfy the API.

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Apr 30, 2020

I'm not sure we need another interface, since people can always do a type assertion to io.ByteWriter.

Do you want to repurpose this proposal for adding WriteByte methods to various hash implementations?

Loading

@geraldss geraldss changed the title proposal: Add io.ByteWriter to hash.Hash proposal: Add io.ByteWriter to hash.Hash implementations Apr 30, 2020
@ulikunitz
Copy link
Contributor

@ulikunitz ulikunitz commented May 1, 2020

How does performance benefit? My experience with WriteByte is that it is slower than appending to a byte slice and use the classic Write method every 256 or 512 bytes.

Loading

@geraldss
Copy link
Author

@geraldss geraldss commented May 1, 2020

The performance benefit isn't for hashing byte slices. It's for hashing everything else: primitives, structs, maps, arrays, and combinations thereof.

Loading

@ulikunitz
Copy link
Contributor

@ulikunitz ulikunitz commented May 1, 2020

Can you provide some example code helping me to understand your statement?

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Jun 10, 2020

Usually hashes can operate much faster on a block of data than a single byte at a time.
One potential problem with adding WriteByte is that using it would be inherently slower
than passing in a larger slice of data.

What is the use case where WriteByte would be preferable over constructing a (presumably larger than one byte) slice and calling Write?

Loading

@rsc rsc added this to Incoming in Proposals Jun 10, 2020
@rsc rsc changed the title proposal: Add io.ByteWriter to hash.Hash implementations proposal: hash, crypto: add WriteByte method to hash implementations Jun 10, 2020
@geraldss
Copy link
Author

@geraldss geraldss commented Jun 10, 2020

type T struct {
    A byte
    B string
    C byte
    D string
}

func HashT(h hash.Hash, t *T) { ... }

To implement HashT(), it would be convenient if there were no conversions to byte slices. The current option is to use encoding/binary, but that API doesn't express the avoidance of byte slices when it calls a generic io.Writer. Ditto for supporting WriteString().

Loading

@ulikunitz
Copy link
Contributor

@ulikunitz ulikunitz commented Jun 11, 2020

I have combined bufio.Writer and hash.Hash to create a buffered hash

Test here: https://play.golang.org/p/IHx5GcvLW1v

package main

import (
	"bufio"
	"crypto/sha256"
	"fmt"
	"hash"
)

type BufferedHash struct {
	h hash.Hash
	*bufio.Writer
}

func NewBufferedHash(h hash.Hash) *BufferedHash {
	return &BufferedHash{
		h:      h,
		Writer: bufio.NewWriter(h),
	}
}

func (bh *BufferedHash) Sum(p []byte) []byte {
	if err := bh.Flush(); err != nil {
		panic(err)
	}
	return bh.h.Sum(p)
}

func (bh *BufferedHash) Reset() {
	bh.h.Reset()
	bh.Writer.Reset(bh.h)
}

func (bh *BufferedHash) Size() int {
	return bh.h.Size()
}

func (bh *BufferedHash) BlockSize() int {
	return bh.h.BlockSize()
}

type T struct {
	A byte
	B string
	C byte
	D string
}

func HashT(bh *BufferedHash, t T) {
	bh.WriteByte(t.A)
	bh.WriteString(t.B)
	bh.WriteByte(t.C)
	bh.WriteString(t.D)
}

func main() {
	bh := NewBufferedHash(sha256.New())

	t := T{A: 'A', B: "B", C: 'C', D: "D"}
	HashT(bh, t)

	fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
	bh.Reset()

	t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
	HashT(bh, t)
	fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
}

Loading

@ulikunitz
Copy link
Contributor

@ulikunitz ulikunitz commented Jun 11, 2020

The proverb "If I Had More Time, I Would Have Written a Shorter Letter" applies here. There is no need for creating an extra type: https://play.golang.org/p/Pp6GVhLpEx_9

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
)

type T struct {
	A byte
	B string
	C byte
	D string
}

func SerializeT(w io.Writer, t T) {
	fmt.Fprintf(w, "%c%s%c%s", t.A, t.B, t.C, t.D)
}

func main() {
	h := sha256.New()

	t := T{A: 'A', B: "B", C: 'C', D: "D"}
	SerializeT(h, t)

	fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
	h.Reset()

	t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
	SerializeT(h, t)
	fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
}

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Jun 24, 2020

What's the context where you are hashing non-byte-slices with functions like sha256?
If you are building a hash table, hash/maphash is the package to use, and maphash.Hash does have WriteByte.
If you need a well-defined fixed hash function, that's almost always for use with a specific byte sequence.

I suppose the crypto/* hashes all buffer already and the hash/* function all operate byte at a time. But they all still run faster with large sequences.

Loading

@geraldss
Copy link
Author

@geraldss geraldss commented Jun 24, 2020

I'm building a relational database. I understand the reservations about changes / additions, but at high scale and high performance, it's important for APIs to not require avoidable overhead.

Loading

@rsc rsc moved this from Incoming to Active in Proposals Jul 8, 2020
@rsc
Copy link
Contributor

@rsc rsc commented Jul 15, 2020

but at high scale and high performance, it's important for APIs to not require avoidable overhead.

The argument I was trying to make against adding WriteByte is precisely that it really can't be very high performance. Arranging for larger Writes is always going to beat a WriteByte loop. The reservation about provided WriteByte is exactly that it would tempt people toward a less efficient path.

We may still want to add it for convenience, especially for cases that don't care about "high scale and high performance", but I don't think you'd want to use it in your relational database.

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Aug 5, 2020

All the hashes have buffers underneath, so they can all implement WriteByte efficiently - well, as efficiently as anyone can implement WriteByte.

It's still more efficient to call Write with many bytes than to call WriteByte in a loop, but given that io.ByteWriter exists, it seems reasonable to make the hash.Hash implementations implement it.
(To be clear, we can't modify hash.Hash itself, as was originally proposed.)

Earlier this year we declined #14757 because the implementation would have to use unsafe, but @bradfitz points out that the buffer that enables WriteByte would also enable a safe implementation of WriteString. So maybe we should add WriteString at the same time, using safe code. (If passed a long string, WriteString would have to copy into the buffer, process the buffer, and repeat. That would still be a bit of copying, but not more than converting to a []byte.)

Will retitle this issue to be WriteByte and WriteString and leave open for another week, but this seems headed for likely accept.

Loading

@rsc rsc changed the title proposal: hash, crypto: add WriteByte method to hash implementations proposal: hash, crypto: add WriteByte, WriteString method to hash implementations Aug 5, 2020
@ulikunitz
Copy link
Contributor

@ulikunitz ulikunitz commented Aug 6, 2020

The premise that all hashes have buffers underneath is not correct. The non-cryptographic hashes in the adler32, crc32, crc64 and fnv packages in the hash directory of the standard library don't have buffers. It is of course possible to implement WriteByte and WriteString for those hashes based on the Write logic.

The cost of the proposal, implement WriteByte and WriteString for all hashes in the standard library, is increased code size and additional test code. Implementation will require the replication of the Write logic in
both new methods unless the methods are implemented as wrappers around Write.

The convenience argument for WriteByte still doesn't convince me. Why is it necessary to add a method to each hash function to do something that will result in slow code. Beginners will still struggle because they
have to know that they need to convert the hash to a ByteWriter and experienced developers will be able to use fmt.Fprintf(w, "%c", c) or write their own wrapper since performance cannot be the concern in that
case.

I wonder whether we should look at the more general problem: How can WriteByte and WriteString be supported for an io.Writer?

One option is to use bufio.Writer as a wrapper. But it complicates the program logic by requiring calls to Flush to ensure all data is written to the underlying writer.

For WriteString there is the io.WriteString function, which has the disadvantage that it allocates a new byte slice and copies the data from string. The package unsafe is probably not used because the requirement
that a Write method should not modify the slice cannot be enforced by the compiler. I suggest to provide an unsafe.WriteString function that assumes that the Writer doesn't modify the slice. It may be used in
cases where performance is critical.

For WriteByte an io.WriteByte convenience function would address the problem. Performance is not a concern here.

Both proposals still allow the implementation of WriteByte and WriteString by hashes, but wouldn't make it mandatory.

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Aug 12, 2020

adler32, crc32, crc64, and fnv have no buffer because they are byte-at-a-time algorithms (the chunk size is 1 byte).
They can implement WriteByte and WriteString by calling the single-byte update function.
(That may involve refactoring Write, but inlining should be good enough now to keep from slowing down Write.)

An io.WriteByte convenience function would have to allocate on every byte in the fallback, like io.WriteString allocates on every call (but with many fewer calls in typical cases!). That's too expensive to hide in an innocuous-looking function.

Loading

@ulikunitz
Copy link
Contributor

@ulikunitz ulikunitz commented Aug 13, 2020

Thanks Ross for the response. I agree and I stated already it is possible to implement WriteByte and WriteString for adler32, etc. If a type supports the Write method it is always possible to implement
WriteByte and WriteString regardless whether the Write operation is buffered or not.

While I understand the performance argument for WriteString, I'm still not convinced about WriteByte. What is the actual use case requiring the implementation for all hashes? The original proposal cited the direct marshaling or serialization of a struct value into the hash. But that doesn't convince because the struct may include other types like larger integers and hashes will not support those directly.

There is also the question about consistency for writers in the standard library. After this proposal is implemented all hashes will support WriteByte and WriteString, but os..File supports WriteString but not WriteByte and net.TCPConn supports only Write. Shouldn't there be a general rule for supporting WriteByte and WriteString?

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Aug 26, 2020

% go doc io.ByteWriter
package io // import "io"

type ByteWriter interface {
	WriteByte(c byte) error
}
    ByteWriter is the interface that wraps the WriteByte method.

% go doc io.ByteReader
package io // import "io"

type ByteReader interface {
	ReadByte() (byte, error)
}
    ByteReader is the interface that wraps the ReadByte method.

    ReadByte reads and returns the next byte from the input or any error
    encountered. If ReadByte returns an error, no input byte was consumed, and
    the returned byte value is undefined.

    ReadByte provides an efficient interface for byte-at-time processing. A
    Reader that does not implement ByteReader can be wrapped using
    bufio.NewReader to add this method.

% 

The ByteWriter docs are not very helpful - there's nothing anywhere about what WriteByte means.
It's possible we should say the same in ByteWriter as in ByteReader: if you can implement it efficiently, then it's okay to have one. If not, then not.

These hashes can implement it efficiently enough and so it's probably worth doing.

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Aug 26, 2020

Based on the discussion above, this seems like a likely accept.

Loading

@rsc rsc moved this from Active to Likely Accept in Proposals Aug 26, 2020
@ulikunitz
Copy link
Contributor

@ulikunitz ulikunitz commented Aug 31, 2020

Extending the ByteWriter document is surely helpful. The bufio Writer remark must however be modified because it requires the additional use of the Flush method.

I wrote a package that provides a wrioter wrapper returning a Writer having all three Write methods: Write, WriteByte and WriteString. All writes are direct so new flusing is required.

github.com/ulikunitz/xio.

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Sep 2, 2020

No change in consensus, so accepted.

Loading

@rsc rsc moved this from Likely Accept to Accepted in Proposals Sep 2, 2020
@rsc rsc removed this from the Proposal milestone Sep 2, 2020
@rsc rsc added this to the Backlog milestone Sep 2, 2020
@rsc rsc changed the title proposal: hash, crypto: add WriteByte, WriteString method to hash implementations hash, crypto: add WriteByte, WriteString method to hash implementations Sep 2, 2020
@gopherbot
Copy link

@gopherbot gopherbot commented Mar 12, 2021

Change https://golang.org/cl/301189 mentions this issue: crypto/*, hash: add WriteString method to hash.Hash and all the algorithms

Loading

@odeke-em odeke-em self-assigned this Mar 20, 2021
josharian added a commit to tailscale/tailscale that referenced this issue May 11, 2021
The sha256 hash writer doesn't implement WriteString.
(See golang/go#38776.)
As a consequence, we end up converting many strings to []byte.

Wrapping a bufio.Writer around the hash writer lets us
avoid these conversions by using WriteString.

Using a bufio.Writer is, perhaps surprisingly, almost as cheap as using unsafe.
The reason is that the sha256 writer does internal buffering,
but doesn't do any when handed larger writers.
Using a bufio.Writer merely shifts the data copying from one buffer
to a different one.

Using a concrete type for Print and print cuts 10% off of the execution time.

name    old time/op    new time/op    delta
Hash-8    15.3µs ± 0%    11.5µs ± 0%  -24.84%  (p=0.000 n=10+10)

name    old alloc/op   new alloc/op   delta
Hash-8    2.82kB ± 0%    1.98kB ± 0%  -29.57%  (p=0.000 n=10+10)

name    old allocs/op  new allocs/op  delta
Hash-8       140 ± 0%        82 ± 0%  -41.43%  (p=0.000 n=10+10)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
josharian added a commit to tailscale/tailscale that referenced this issue May 11, 2021
The sha256 hash writer doesn't implement WriteString.
(See golang/go#38776.)
As a consequence, we end up converting many strings to []byte.

Wrapping a bufio.Writer around the hash writer lets us
avoid these conversions by using WriteString.

Using a bufio.Writer is, perhaps surprisingly, almost as cheap as using unsafe.
The reason is that the sha256 writer does internal buffering,
but doesn't do any when handed larger writers.
Using a bufio.Writer merely shifts the data copying from one buffer
to a different one.

Using a concrete type for Print and print cuts 10% off of the execution time.

name    old time/op    new time/op    delta
Hash-8    15.3µs ± 0%    11.5µs ± 0%  -24.84%  (p=0.000 n=10+10)

name    old alloc/op   new alloc/op   delta
Hash-8    2.82kB ± 0%    1.98kB ± 0%  -29.57%  (p=0.000 n=10+10)

name    old allocs/op  new allocs/op  delta
Hash-8       140 ± 0%        82 ± 0%  -41.43%  (p=0.000 n=10+10)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
josharian added a commit to tailscale/tailscale that referenced this issue May 11, 2021
The sha256 hash writer doesn't implement WriteString.
(See golang/go#38776.)
As a consequence, we end up converting many strings to []byte.

Wrapping a bufio.Writer around the hash writer lets us
avoid these conversions by using WriteString.

Using a bufio.Writer is, perhaps surprisingly, almost as cheap as using unsafe.
The reason is that the sha256 writer does internal buffering,
but doesn't do any when handed larger writers.
Using a bufio.Writer merely shifts the data copying from one buffer
to a different one.

Using a concrete type for Print and print cuts 10% off of the execution time.

name    old time/op    new time/op    delta
Hash-8    15.3µs ± 0%    11.5µs ± 0%  -24.84%  (p=0.000 n=10+10)

name    old alloc/op   new alloc/op   delta
Hash-8    2.82kB ± 0%    1.98kB ± 0%  -29.57%  (p=0.000 n=10+10)

name    old allocs/op  new allocs/op  delta
Hash-8       140 ± 0%        82 ± 0%  -41.43%  (p=0.000 n=10+10)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
@tdakkota
Copy link

@tdakkota tdakkota commented Sep 1, 2021

Earlier this year we declined #14757 because the implementation would have to use unsafe, but @bradfitz points out that the buffer that enables WriteByte would also enable a safe implementation of WriteString.

We can make a fast safe implementation of WriteString using generics.

func write[T interface{ string | []byte }](d *digest, p T) (int, error) {
          // Write implementation...
}

func (d *digest) Write(p []byte) (int, error) { return write[[]byte](d, p) }
func (d *digest) WriteString(p string) (int, error) { return write[string](d, p) }

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants