hash, crypto: add WriteByte, WriteString method to hash implementations #38776
Comments
Unfortunately, the proposed change would not be backward compatible. It would mean that existing types that satisfy the hash.Hash interface would no longer implement the interface. That could break working code, and violates the Go 1 compatibility guarantee (https://golang.org/doc/go1compat). So, in short, we can't do this.
Got it. How about adding another interface, or just adding the WriteByte() method to the standard library's hash implementations? Hashing is sometimes part of performance-sensitive code paths, and it would be beneficial to avoid conversions to byte slices whose only purpose is to satisfy the API.
I'm not sure we need another interface, since people can always do a type assertion to io.ByteWriter. Do you want to repurpose this proposal for adding WriteByte to the standard library's hash implementations?
How does performance benefit? My experience with WriteByte is that it is slower than appending to a byte slice and calling the classic Write method every 256 or 512 bytes.
The performance benefit isn't for hashing byte slices. It's for hashing everything else: primitives, structs, maps, arrays, and combinations thereof.
Can you provide some example code to help me understand your statement?
Usually hashes can operate much faster on a block of data than a single byte at a time. What is the use case where WriteByte would be preferable over constructing a (presumably larger than one byte) slice and calling Write?
To implement
I have combined bufio.Writer and hash.Hash to create a buffered hash. Test here: https://play.golang.org/p/IHx5GcvLW1v

package main

import (
	"bufio"
	"crypto/sha256"
	"fmt"
	"hash"
)

// BufferedHash wraps a hash.Hash in a bufio.Writer so that
// WriteByte and WriteString calls are buffered before reaching
// the underlying hash.
type BufferedHash struct {
	h hash.Hash
	*bufio.Writer
}

func NewBufferedHash(h hash.Hash) *BufferedHash {
	return &BufferedHash{
		h:      h,
		Writer: bufio.NewWriter(h),
	}
}

// Sum flushes any buffered bytes into the hash before summing.
func (bh *BufferedHash) Sum(p []byte) []byte {
	if err := bh.Flush(); err != nil {
		panic(err)
	}
	return bh.h.Sum(p)
}

func (bh *BufferedHash) Reset() {
	bh.h.Reset()
	bh.Writer.Reset(bh.h)
}

func (bh *BufferedHash) Size() int {
	return bh.h.Size()
}

func (bh *BufferedHash) BlockSize() int {
	return bh.h.BlockSize()
}

type T struct {
	A byte
	B string
	C byte
	D string
}

func HashT(bh *BufferedHash, t T) {
	bh.WriteByte(t.A)
	bh.WriteString(t.B)
	bh.WriteByte(t.C)
	bh.WriteString(t.D)
}

func main() {
	bh := NewBufferedHash(sha256.New())
	t := T{A: 'A', B: "B", C: 'C', D: "D"}
	HashT(bh, t)
	fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))

	bh.Reset()
	t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
	HashT(bh, t)
	fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
}
The proverb "If I Had More Time, I Would Have Written a Shorter Letter" applies here. There is no need to create an extra type: https://play.golang.org/p/Pp6GVhLpEx_9

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
)

type T struct {
	A byte
	B string
	C byte
	D string
}

// SerializeT writes t's fields into any io.Writer, including a
// hash.Hash, without an intermediate buffering type.
func SerializeT(w io.Writer, t T) {
	fmt.Fprintf(w, "%c%s%c%s", t.A, t.B, t.C, t.D)
}

func main() {
	h := sha256.New()
	t := T{A: 'A', B: "B", C: 'C', D: "D"}
	SerializeT(h, t)
	fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))

	h.Reset()
	t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
	SerializeT(h, t)
	fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
}
What's the context where you are hashing non-byte-slices with functions like sha256? I suppose the crypto/* hashes all buffer already and the hash/* functions all operate byte at a time. But they all still run faster with large sequences.
I'm building a relational database. I understand the reservations about changes / additions, but at high scale and high performance, it's important for APIs to not require avoidable overhead.
The argument I was trying to make against adding WriteByte is precisely that it really can't be very high performance. Arranging for larger Writes is always going to beat a WriteByte loop. The reservation about providing WriteByte is exactly that it would tempt people toward a less efficient path. We may still want to add it for convenience, especially for cases that don't care about "high scale and high performance", but I don't think you'd want to use it in your relational database.
All the hashes have buffers underneath, so they can all implement WriteByte efficiently - well, as efficiently as anyone can implement WriteByte. It's still more efficient to call Write with many bytes than to call WriteByte in a loop, but given that io.ByteWriter exists, it seems reasonable to make the hash.Hash implementations implement it.

Earlier this year we declined #14757 because the implementation would have to use unsafe, but @bradfitz points out that the buffer that enables WriteByte would also enable a safe implementation of WriteString. So maybe we should add WriteString at the same time, using safe code. (If passed a long string, WriteString would have to copy into the buffer, process the buffer, and repeat. That would still be a bit of copying, but not more than converting to a []byte.)

Will retitle this issue to be WriteByte and WriteString and leave open for another week, but this seems headed for likely accept.
The premise that all hashes have buffers underneath is not correct. The non-cryptographic hashes in the adler32, crc32, crc64 and fnv packages in the hash directory of the standard library don't have buffers. It is of course possible to implement WriteByte and WriteString for those hashes based on the Write logic. The cost of the proposal, implementing WriteByte and WriteString for all hashes in the standard library, is increased code size and additional test code, since the implementation will require replicating the Write logic.

The convenience argument for WriteByte still doesn't convince me. Why is it necessary to add a method to each hash function to do something that will result in slow code? Beginners will still struggle.

I wonder whether we should look at the more general problem: how can WriteByte and WriteString be supported for an io.Writer? One option is to use bufio.Writer as a wrapper, but that complicates the program logic by requiring calls to Flush to ensure all data is written to the underlying writer. For WriteString there is the io.WriteString function, which has the disadvantage that it allocates a new byte slice and copies the data from the string; the package unsafe is probably not an option, as was already noted for #14757. For WriteByte an io.WriteByte convenience function would address the problem. Performance is not a concern here. Both proposals still allow the implementation of WriteByte and WriteString by hashes, but wouldn't make it mandatory.
adler32, crc32, crc64, and fnv have no buffer because they are byte-at-a-time algorithms (the chunk size is 1 byte). An io.WriteByte convenience function would have to allocate on every byte in the fallback, like io.WriteString allocates on every call (but with many fewer calls in typical cases!). That's too expensive to hide in an innocuous-looking function.
Thanks Ross for the response. I agree, and I stated already that it is possible to implement WriteByte and WriteString for adler32, etc. If a type supports the Write method it is always possible to implement WriteByte and WriteString.

While I understand the performance argument for WriteString, I'm still not convinced about WriteByte. What is the actual use case requiring the implementation for all hashes? The original proposal cited the direct marshaling or serialization of a struct value into the hash, but that doesn't convince me, because the struct may include other types like larger integers, and hashes will not support those directly.

There is also the question of consistency for writers in the standard library. After this proposal is implemented all hashes will support WriteByte and WriteString, but os.File supports WriteString but not WriteByte, and net.TCPConn supports only Write. Shouldn't there be a general rule for supporting WriteByte and WriteString?
The ByteWriter docs are not very helpful - there's nothing anywhere about what WriteByte means. These hashes can implement it efficiently enough and so it's probably worth doing.
Based on the discussion above, this seems like a likely accept.
Extending the ByteWriter documentation is surely helpful. The bufio.Writer remark must, however, be modified, because that approach requires an additional call to the Flush method. I wrote a package that provides a writer wrapper returning a Writer with all three write methods: Write, WriteByte and WriteString. All writes are direct, so no flushing is required.
No change in consensus, so accepted.
Change https://golang.org/cl/301189 mentions this issue: |
This proposal was initially for embedding io.ByteWriter in hash.Hash, or adding a WriteByte() method with the same signature.
This method is already added in the new maphash.Hash. Adding it elsewhere will extend the benefits in performance and usability to the other Hash implementations.
Per feedback of @ianlancetaylor below, I'm instead proposing the addition of the WriteByte() method from io.ByteWriter to the standard library hash.Hash implementations, including:

adler32
crc32
crc64
fnv