proposal: hash, crypto: add WriteByte method to hash implementations #38776

Open
geraldss opened this issue Apr 30, 2020 · 13 comments

@geraldss commented Apr 30, 2020

This proposal was initially for embedding io.ByteWriter in hash.Hash, or adding a WriteByte() method with the same signature.

This method already exists on the new maphash.Hash. Adding it to the other Hash implementations would extend the same performance and usability benefits to them.

Following @ianlancetaylor's feedback below, I'm instead proposing adding the WriteByte() method from io.ByteWriter to the standard library hash.Hash implementations, including:

adler32
crc32
crc64
fnv

@gopherbot gopherbot added this to the Proposal milestone Apr 30, 2020
@gopherbot gopherbot added the Proposal label Apr 30, 2020
@ianlancetaylor (Contributor) commented Apr 30, 2020

Unfortunately, the proposed change would not be backward compatible. It would mean that existing types that satisfy the hash.Hash interface would no longer implement the interface. That could break working code, and violates the Go 1 compatibility guarantee (https://golang.org/doc/go1compat).

So, in short, we can't do this.

@geraldss (Author) commented Apr 30, 2020

Got it. How about adding another interface, or just adding the WriteByte() method to the standard library's hash implementations?

Hashing is sometimes part of performance-sensitive code paths, and it would be beneficial to avoid conversions to byte slices whose only purpose is to satisfy the API.

@ianlancetaylor (Contributor) commented Apr 30, 2020

I'm not sure we need another interface, since people can always do a type assertion to io.ByteWriter.

Do you want to repurpose this proposal for adding WriteByte methods to various hash implementations?

@geraldss geraldss changed the title proposal: Add io.ByteWriter to hash.Hash proposal: Add io.ByteWriter to hash.Hash implementations Apr 30, 2020
@ulikunitz (Contributor) commented May 1, 2020

How would performance benefit? My experience with WriteByte is that it is slower than appending to a byte slice and calling the classic Write method every 256 or 512 bytes.

@geraldss (Author) commented May 1, 2020

The performance benefit isn't for hashing byte slices. It's for hashing everything else: primitives, structs, maps, arrays, and combinations thereof.

@ulikunitz (Contributor) commented May 1, 2020

Can you provide some example code helping me to understand your statement?

@rsc (Contributor) commented Jun 10, 2020

Usually hashes can operate much faster on a block of data than on a single byte at a time. One potential problem with adding WriteByte is that using it would be inherently slower than passing in a larger slice of data.

What is the use case where WriteByte would be preferable over constructing a (presumably larger than one byte) slice and calling Write?
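Batching is safe because hash.Hash is a streaming interface: its Write method adds more data to the running hash, so how the input is split across calls does not change the digest. A minimal check, using crypto/sha256 as an arbitrary example:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// sameDigest reports whether hashing data with one Write call and hashing it
// with one Write call per byte yield the same SHA-256 digest.
func sameDigest(data []byte) bool {
	whole := sha256.New()
	whole.Write(data)

	perByte := sha256.New()
	for i := range data {
		perByte.Write(data[i : i+1])
	}
	return bytes.Equal(whole.Sum(nil), perByte.Sum(nil))
}

func main() {
	fmt.Println(sameDigest([]byte("hello, hash"))) // prints true
}
```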

@rsc rsc added this to Incoming in Proposals Jun 10, 2020
@rsc rsc changed the title proposal: Add io.ByteWriter to hash.Hash implementations proposal: hash, crypto: add WriteByte method to hash implementations Jun 10, 2020
@geraldss (Author) commented Jun 10, 2020

```go
type T struct {
    A byte
    B string
    C byte
    D string
}

func HashT(h hash.Hash, t *T) { ... }
```

To implement HashT(), it would be convenient to avoid conversions to byte slices. The current option is encoding/binary, but because it writes through a generic io.Writer, its API cannot express the avoidance of byte slices. The same applies to supporting WriteString().

@ulikunitz (Contributor) commented Jun 11, 2020

I have combined bufio.Writer and hash.Hash to create a buffered hash.

Test here: https://play.golang.org/p/IHx5GcvLW1v

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"fmt"
	"hash"
)

type BufferedHash struct {
	h hash.Hash
	*bufio.Writer
}

func NewBufferedHash(h hash.Hash) *BufferedHash {
	return &BufferedHash{
		h:      h,
		Writer: bufio.NewWriter(h),
	}
}

func (bh *BufferedHash) Sum(p []byte) []byte {
	if err := bh.Flush(); err != nil {
		panic(err)
	}
	return bh.h.Sum(p)
}

func (bh *BufferedHash) Reset() {
	bh.h.Reset()
	bh.Writer.Reset(bh.h)
}

func (bh *BufferedHash) Size() int {
	return bh.h.Size()
}

func (bh *BufferedHash) BlockSize() int {
	return bh.h.BlockSize()
}

type T struct {
	A byte
	B string
	C byte
	D string
}

func HashT(bh *BufferedHash, t T) {
	bh.WriteByte(t.A)
	bh.WriteString(t.B)
	bh.WriteByte(t.C)
	bh.WriteString(t.D)
}

func main() {
	bh := NewBufferedHash(sha256.New())

	t := T{A: 'A', B: "B", C: 'C', D: "D"}
	HashT(bh, t)

	fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
	bh.Reset()

	t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
	HashT(bh, t)
	fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
}
```

@ulikunitz (Contributor) commented Jun 11, 2020

The proverb "If I Had More Time, I Would Have Written a Shorter Letter" applies here. There is no need to create an extra type: https://play.golang.org/p/Pp6GVhLpEx_9

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
)

type T struct {
	A byte
	B string
	C byte
	D string
}

func SerializeT(w io.Writer, t T) {
	fmt.Fprintf(w, "%c%s%c%s", t.A, t.B, t.C, t.D)
}

func main() {
	h := sha256.New()

	t := T{A: 'A', B: "B", C: 'C', D: "D"}
	SerializeT(h, t)

	fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
	h.Reset()

	t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
	SerializeT(h, t)
	fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
}
```

@rsc (Contributor) commented Jun 24, 2020

What's the context where you are hashing non-byte-slices with functions like sha256?
If you are building a hash table, hash/maphash is the package to use, and maphash.Hash does have WriteByte.
If you need a well-defined fixed hash function, that's almost always for use with a specific byte sequence.

I suppose the crypto/* hashes all buffer already, and the hash/* functions all operate a byte at a time. But they all still run faster with large sequences.

@geraldss (Author) commented Jun 24, 2020

I'm building a relational database. I understand the reservations about changes and additions, but at high scale and high performance, it's important for APIs not to require avoidable overhead.

@rsc rsc moved this from Incoming to Active in Proposals Jul 8, 2020
@rsc (Contributor) commented Jul 15, 2020

> but at high scale and high performance, it's important for APIs to not require avoidable overhead.

The argument I was trying to make against adding WriteByte is precisely that it really can't be very high performance. Arranging for larger Writes is always going to beat a WriteByte loop. The reservation about providing WriteByte is exactly that it would tempt people toward a less efficient path.

We may still want to add it for convenience, especially for cases that don't care about "high scale and high performance", but I don't think you'd want to use it in your relational database.
