
Make UUID an array type #3

Closed
GoogleCodeExporter opened this issue Jun 3, 2015 · 22 comments

Comments

@GoogleCodeExporter

This is more of a question than a bug, but why not make the UUID a `[16]byte` 
type, instead of the slice, `[]byte`?

Alternatively, to make it backwards-compatible, it could be a pointer type, 
`*[16]byte`.

Original issue reported on code.google.com by attilaolah on 16 Dec 2014 at 2:58

@GoogleCodeExporter
Author

Actually, it would even be sufficient to implement this method:

func (u UUID) Array() [16]byte {
    var b [16]byte
    copy(b[:], u)
    return b
}

This would allow people to efficiently use UUIDs as map keys, by converting 
them to an array, instead of wasting space by converting them to a string.

Original comment by attilaolah on 2 Jun 2015 at 9:03
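
As an illustration of that use case, here is a minimal sketch of deduplicating UUIDs via array keys, assuming the Array() method above exists (dedup is a made-up helper, not part of the package):

// dedup returns the unique UUIDs in us, using the array form as the map key.
func dedup(us []UUID) []UUID {
    seen := make(map[[16]byte]bool, len(us))
    var out []UUID
    for _, u := range us {
        k := u.Array()
        if !seen[k] {
            seen[k] = true
            out = append(out, u)
        }
    }
    return out
}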

@kbullaughey

+1 I agree that being able to use UUIDs as map keys would greatly aid many algorithms that need to build lookup tables of objects based on ID.

@bmatsuo
Contributor

bmatsuo commented Feb 9, 2016

I would like to see benchmarks using [16]byte map keys instead of string keys (in the context of UUIDs). I don't know what the result will be, but arrays may be slower. One thing about arrays in Go is that they are passed by value, not by reference as one might expect. To pass by reference, a pointer is used (e.g. *[16]byte). But, obviously, the value can then no longer be used as a map key. Strings have the distinct benefit of being a pass-by-reference map key.

This playground link demonstrates what I mean here: http://play.golang.org/p/nItcogmGee

Anyway, I really would be interested in a benchmark if someone wanted to put one together. It's all about quantifying the overhead because, if it is slower, that may just mean the developer has to think about the trade-off of parsing the strings and whether they really need to translate keys back to UUIDs.
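
Roughly what the playground example is getting at (a stand-in sketch, not the linked code):

// clearFirst operates on a copy; the caller's array is unchanged.
func clearFirst(a [16]byte) { a[0] = 0 }

// clearFirstPtr writes through the pointer; the caller's array is modified.
func clearFirstPtr(a *[16]byte) { a[0] = 0 }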

@pborman pborman closed this as completed Feb 10, 2016
@pborman pborman reopened this Feb 10, 2016
@pborman
Owner

pborman commented Feb 10, 2016

On a 64-bit machine, a string is represented by two 64-bit values, so 16 bytes, the number of bytes in a UUID. This would be a breaking change, however, unless we have a parallel set of functions and make the standard functions simply return slices into the underlying array. A slice is actually 3 values, so 24 bytes on a 64-bit system. I will think about this some more.
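
A quick way to verify those header sizes (prints "16 24 16" on a 64-bit machine):

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    var s string   // string header: pointer + length
    var b []byte   // slice header: pointer + length + capacity
    var a [16]byte // the 16 bytes themselves
    fmt.Println(unsafe.Sizeof(s), unsafe.Sizeof(b), unsafe.Sizeof(a))
}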

@bmatsuo
Contributor

bmatsuo commented Feb 10, 2016

Quite right. My mistake. It does seem like a [16]byte map key would be fast then.

In terms of maintaining the API, maybe the proposal of a method, func (uuid UUID) Array() [16]byte, is closer to the mark. It's minimal and really just requires (IMO) one other function to easily turn the array back into a UUID (slice).

func Array(raw [16]byte) UUID {
    return UUID(raw[:])
}

You could actually define a `type Array [16]byte`. Meh. Not sure it's a big deal.
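
One possible shape of such a type, just as a sketch (this is not in the package):

type Array [16]byte

// UUID returns the slice form of the array.
func (a Array) UUID() UUID {
    return UUID(a[:])
}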

@pborman
Owner

pborman commented Feb 10, 2016

For proof of concept, I created a new package called uu, whose basic type is "type ID [16]byte". It has nearly the same signatures as the current uuid package, though it returns errors rather than nil for bad UUIDs. It is as fast as or faster than the current uuid package with the benchmarks in uuid_test. I then rewrote the existing uuid package to more or less be a wrapper for the array version (for example, all the node and time information is only found in the new uu package). It was pretty close in performance to the existing package.

I named the new type ID so packages importing uu would use the type uu.ID vs the existing uuid.UUID.
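
At call sites the difference would look roughly like this (a sketch based on the description above, not the actual uu source):

// Current slice-based package: a bad UUID yields nil.
u := uuid.Parse(s)
if u == nil {
    log.Fatalf("bad UUID %q", s)
}

// Array-based uu package: a bad UUID yields an error.
id, err := uu.Parse(s)
if err != nil {
    log.Fatalf("bad UUID %q: %v", s, err)
}
_ = id // use the parsed UUID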

Here are the numbers I got from 5 runs of each, sorted by benchmark and time:

array__: BenchmarkNew-12                1000000  1507 ns/op 
array__: BenchmarkNew-12                1000000  1536 ns/op 
array__: BenchmarkNew-12                1000000  1547 ns/op 
array__: BenchmarkNew-12                1000000  1562 ns/op 
wrapped: BenchmarkNew-12                1000000  1562 ns/op 
wrapped: BenchmarkNew-12                1000000  1570 ns/op 
array__: BenchmarkNew-12                1000000  1585 ns/op 
current: BenchmarkNew-12                1000000  1592 ns/op 
wrapped: BenchmarkNew-12                1000000  1597 ns/op 
wrapped: BenchmarkNew-12                1000000  1598 ns/op 
current: BenchmarkNew-12                1000000  1608 ns/op 
current: BenchmarkNew-12                1000000  1627 ns/op 
wrapped: BenchmarkNew-12                1000000  1631 ns/op 
current: BenchmarkNew-12                1000000  1638 ns/op 
current: BenchmarkNew-12                1000000  1640 ns/op 

array__: BenchmarkParse-12              20000000 58.3 ns/op 
array__: BenchmarkParse-12              20000000 58.4 ns/op 
array__: BenchmarkParse-12              20000000 58.5 ns/op 
array__: BenchmarkParse-12              20000000 58.6 ns/op 
array__: BenchmarkParse-12              20000000 58.8 ns/op 
current: BenchmarkParse-12              20000000 86.4 ns/op 
current: BenchmarkParse-12              20000000 87.4 ns/op 
current: BenchmarkParse-12              20000000 88.6 ns/op 
current: BenchmarkParse-12              20000000 89.0 ns/op 
current: BenchmarkParse-12              20000000 90.6 ns/op 
wrapped: BenchmarkParse-12              20000000 94.8 ns/op 
wrapped: BenchmarkParse-12              20000000 97.2 ns/op 
wrapped: BenchmarkParse-12              20000000 97.9 ns/op 
wrapped: BenchmarkParse-12              20000000 99.2 ns/op 
wrapped: BenchmarkParse-12              20000000 99.4 ns/op 

wrapped: BenchmarkUUID_MarshalJSON-12   1000000  1046 ns/op 
wrapped: BenchmarkUUID_MarshalJSON-12   1000000  1068 ns/op 
array__: BenchmarkUUID_MarshalJSON-12   1000000  1069 ns/op 
current: BenchmarkUUID_MarshalJSON-12   1000000  1069 ns/op 
wrapped: BenchmarkUUID_MarshalJSON-12   1000000  1075 ns/op 
current: BenchmarkUUID_MarshalJSON-12   1000000  1086 ns/op 
array__: BenchmarkUUID_MarshalJSON-12   1000000  1088 ns/op 
array__: BenchmarkUUID_MarshalJSON-12   1000000  1091 ns/op 
array__: BenchmarkUUID_MarshalJSON-12   1000000  1095 ns/op 
array__: BenchmarkUUID_MarshalJSON-12   1000000  1108 ns/op 
wrapped: BenchmarkUUID_MarshalJSON-12   1000000  1108 ns/op 
current: BenchmarkUUID_MarshalJSON-12   1000000  1114 ns/op 
wrapped: BenchmarkUUID_MarshalJSON-12   1000000  1114 ns/op 
current: BenchmarkUUID_MarshalJSON-12   1000000  1120 ns/op 
current: BenchmarkUUID_MarshalJSON-12   1000000  1139 ns/op 

array__: BenchmarkUUID_String-12        10000000 131  ns/op 
current: BenchmarkUUID_String-12        10000000 132  ns/op 
current: BenchmarkUUID_String-12        10000000 133  ns/op 
current: BenchmarkUUID_String-12        20000000 133  ns/op 
array__: BenchmarkUUID_String-12        10000000 134  ns/op 
current: BenchmarkUUID_String-12        10000000 136  ns/op 
array__: BenchmarkUUID_String-12        10000000 137  ns/op 
current: BenchmarkUUID_String-12        10000000 137  ns/op 
array__: BenchmarkUUID_String-12        10000000 138  ns/op 
array__: BenchmarkUUID_String-12        10000000 140  ns/op 
wrapped: BenchmarkUUID_String-12        10000000 141  ns/op 
wrapped: BenchmarkUUID_String-12        10000000 141  ns/op 
wrapped: BenchmarkUUID_String-12        10000000 143  ns/op 
wrapped: BenchmarkUUID_String-12        10000000 143  ns/op 
wrapped: BenchmarkUUID_String-12        10000000 145  ns/op 

current: BenchmarkUUID_URN-12           10000000 137  ns/op 
array__: BenchmarkUUID_URN-12           10000000 138  ns/op 
current: BenchmarkUUID_URN-12           10000000 138  ns/op 
array__: BenchmarkUUID_URN-12           10000000 140  ns/op 
array__: BenchmarkUUID_URN-12           10000000 140  ns/op 
current: BenchmarkUUID_URN-12           10000000 140  ns/op 
array__: BenchmarkUUID_URN-12           10000000 141  ns/op 
current: BenchmarkUUID_URN-12           10000000 142  ns/op 
current: BenchmarkUUID_URN-12           10000000 142  ns/op 
array__: BenchmarkUUID_URN-12           10000000 143  ns/op 
wrapped: BenchmarkUUID_URN-12           10000000 145  ns/op 
wrapped: BenchmarkUUID_URN-12           10000000 146  ns/op 
wrapped: BenchmarkUUID_URN-12           10000000 146  ns/op 
wrapped: BenchmarkUUID_URN-12           10000000 149  ns/op 
wrapped: BenchmarkUUID_URN-12           10000000 153  ns/op 

array__: BenchmarkUUID_UnmarshalJSON-12 1000000  1273 ns/op 
current: BenchmarkUUID_UnmarshalJSON-12 1000000  1301 ns/op 
array__: BenchmarkUUID_UnmarshalJSON-12 1000000  1311 ns/op 
array__: BenchmarkUUID_UnmarshalJSON-12 1000000  1318 ns/op 
wrapped: BenchmarkUUID_UnmarshalJSON-12 1000000  1331 ns/op 
wrapped: BenchmarkUUID_UnmarshalJSON-12 1000000  1337 ns/op 
current: BenchmarkUUID_UnmarshalJSON-12 1000000  1339 ns/op 
current: BenchmarkUUID_UnmarshalJSON-12 1000000  1341 ns/op 
wrapped: BenchmarkUUID_UnmarshalJSON-12 1000000  1342 ns/op 
array__: BenchmarkUUID_UnmarshalJSON-12 1000000  1350 ns/op 
current: BenchmarkUUID_UnmarshalJSON-12 1000000  1360 ns/op 
current: BenchmarkUUID_UnmarshalJSON-12 1000000  1362 ns/op 
wrapped: BenchmarkUUID_UnmarshalJSON-12 1000000  1385 ns/op 
array__: BenchmarkUUID_UnmarshalJSON-12 1000000  1398 ns/op 
wrapped: BenchmarkUUID_UnmarshalJSON-12 1000000  1410 ns/op 

@bmatsuo
Contributor

bmatsuo commented Feb 11, 2016

The biggest improvement seems to be in uuid.Parse(), probably because the number of allocations has gone down to zero (might be nice to run with -test.benchmem to see allocations and size). uuid.New() should also have one less allocation, but I think that it is somewhat washed out by I/O and string allocation (6% speedup overall).

Did you know that you can also increase the number of iterations in a benchmark by setting -test.benchtime=5s (instead of the default of 1s)? It could reduce the number of runs you need.
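
Combined with the usual flags, that might look something like:

go test -bench=. -benchmem -benchtime=5s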

@bmatsuo
Contributor

bmatsuo commented Feb 11, 2016

I do wonder if it is a mistake to have uu.Parse() return an array and not a pointer to an array. Likewise if a struct holds the array by value rather than behind a reference:

type Record struct {
    UUID uu.ID
}

r := &Record{}
// ...

I think a nil zero value for r.UUID is more idiomatic than a zero value of "00000000-0000-0000-0000-000000000000", even though the latter may make some sense under mathematical reasoning. I just think it's more practical to use a reference. Of course, if that were the case, I think the benefits seen in BenchmarkNew and BenchmarkParse would go away to some degree.
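
A sketch of the two struct shapes being contrasted, using the hypothetical uu package from above:

// Array held by value: the zero value of the field is the NIL UUID
// "00000000-0000-0000-0000-000000000000", indistinguishable from "unset".
type RecordByValue struct {
    UUID uu.ID
}

// Array held behind a pointer: the zero value of the field is nil,
// which reads as "no UUID yet".
type RecordByPointer struct {
    UUID *uu.ID
}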

@pborman
Owner

pborman commented Feb 11, 2016

After a brief exchange with Russ (rsc) I have renamed uu to uuid and ID to UUID so it looks like the current package; github.com/pborman/uuid/uuid is now the name of the package. Russ also suggested just changing the API and having only a single library with the new API, as people should be using vendoring if they want the old behavior. I think vendoring is still too new and most people are not using it, and the new API would absolutely break almost all uses of the package (mostly due to the change in Parse).

A slice is a reference (though it cannot be converted back to the underlying array). It does support the concept of a missing UUID and the NIL UUID (Note that a NIL UUID is defined as a UUID with all 0's).

Parse in the new package returns a UUID and an error. Here is the implementation of Parse in the wrapped (old) package:

func Parse(s string) UUID {
        id, err := uuid.Parse(s)
        if err != nil {
                return nil
        }
        return id[:]
}

On the negative side, the following code (from json_test.go)

        type S struct {
                ID1 UUID
                ID2 UUID
        }
        s1 := S{ID1: testUUID}
        data, err := json.Marshal(&s1)

Produces

{
  "ID1":"f47ac10b-58cc-0372-8567-0e02b2c3d479",
  "ID2":"00000000-0000-0000-0000-000000000000" 
}

with the new package and

{
  "ID1":"f47ac10b-58cc-0372-8567-0e02b2c3d479",
  "ID2":""
}

with the old package. I am not sure which is overall better at the moment.

@bmatsuo
Contributor

bmatsuo commented Feb 11, 2016

I agree that vendoring is probably still too new, if only for the fact that some people think the Go compiler has gotten too slow and are stuck on go1.4 until it improves. Vendoring will always be enabled in go1.6 iirc.

I think the biggest problem with the JSON example is that the omitempty tag will no longer work. That is a demonstration of what I think is problematic about using arrays without indirecting through a pointer.
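
A sketch of why omitempty behaves differently for the two representations (struct names are illustrative):

// Slice-based UUID: a nil slice counts as empty, so the field is omitted.
type WithSlice struct {
    ID UUID `json:"id,omitempty"`
}

// Array-based UUID: a [16]byte is never empty (its length is always 16),
// so the NIL UUID is always encoded.
type WithArray struct {
    ID [16]byte `json:"id,omitempty"`
}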

@pborman
Owner

pborman commented Feb 11, 2016

Take a look at https://github.com/pborman/uuid2, it has all the new code.

I think people using JSON are better off with the slice-based version. Everyone else is probably better off with the array version.

@bmatsuo
Contributor

bmatsuo commented Feb 11, 2016

I would rather there just be one version. But this is your project and I will critique your branch if this is how you would like to proceed.

  • Consider a function func Must(uuid UUID, err error) UUID so that you don't need Must* variants for each type of UUID (sketched below). For example:
u := uuid.Must(uuid.NewUUID())
// ...
  • I think you could probably have a NewRandom function that returns an error. Then you can document that New() is an alias for uuid.Must(uuid.NewRandom()). I don't really know under what conditions rand.Reader.Read() returns an error but I am a fan of symmetry.
  • It makes sense for New() to return a UUID instead of a string in the array package.

I don't really notice anything else for now.
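
For reference, a minimal sketch of the Must helper proposed in the first bullet:

// Must panics if err is non-nil; otherwise it returns uuid unchanged.
func Must(uuid UUID, err error) UUID {
    if err != nil {
        panic(err)
    }
    return uuid
}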

@freeformz

@pborman The JSON encoding differences could also be "solved" by implementing the json.Marshaler interface.

@pborman
Owner

pborman commented Feb 11, 2016

@freeformz: In the array case it cannot be solved because there is nothing to differentiate between a bad UUID and the NIL UUID.

@bmatsuo: I agree with having just one package, but I feel locked into the current API for the current package name. I did a brief search internally and I would have to fix a lot of code at Google if Parse changed and UUID was no longer a slice. Just adding Array() might end up being the best compromise.

@bmatsuo
Contributor

bmatsuo commented Feb 11, 2016

I threw together some benchmarks of an Array() method on my branch, bmatsuo/array-conversions. The numbers are pretty bonkers (bottom two numbers, <20ns). I'm not sure if they are skewed due to the micro nature of the benchmarks.

PASS
BenchmarkUUID_MarshalJSON-4      2000000          1260 ns/op
BenchmarkUUID_UnmarshalJSON-4    2000000          1883 ns/op
BenchmarkParse-4                30000000           104 ns/op
BenchmarkNew-4                   2000000          1634 ns/op
BenchmarkUUID_String-4          20000000           138 ns/op
BenchmarkUUID_URN-4             20000000           144 ns/op
BenchmarkUUID_Array-4           200000000           17.1 ns/op
BenchmarkArray_UUID-4           300000000            8.65 ns/op
ok      github.com/pborman/uuid 32.356s

edit: The reason there are so many iterations is because I ran the benchmarks with -test.benchtime=2s

edit 2: The UUID_Array benchmark is higher because it makes an allocation slicing the resulting array to test its value. Strangely, removing the allocation increases the time slightly to ~18ns/op.

@bmatsuo
Contributor

bmatsuo commented Feb 11, 2016

These benchmarks have made me a little concerned that byte-array comparison may not be as optimized as string/[]byte comparison, and that it may indeed not be as great for map keys as one could imagine (it still might be better, IDK).

@pborman
Owner

pborman commented Feb 12, 2016

I simplified your benchmark to:

type A [16]byte // assumed: the benchmarks compare 16-byte array values

func BenchmarkEqual(b *testing.B) {
        a1 := A{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
        a2 := A{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                if a1 != a2 {
                        b.Fatal("not equal")
                }
        }
}

func BenchmarkBytesEqual(b *testing.B) {
        a1 := A{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
        a2 := A{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
                if !bytes.Equal(a1[:], a2[:]) {
                        b.Fatal("not equal")
                }
        }
}

The results were:

BenchmarkEqual-12           200000000            6.02 ns/op
BenchmarkBytesEqual-12      300000000            4.90 ns/op

That corresponds with what you saw, though the difference is not as extreme. I tried running the same test built with gccgo and the results are:

BenchmarkEqual          300000000            4.37 ns/op
BenchmarkBytesEqual     200000000            6.59 ns/op

which is exactly the opposite!

@pborman
Owner

pborman commented Feb 12, 2016

I did a benchmark of inserting 128 random UUIDs into a map (the same 128 for all tests). In one test I used the string form, in the other a 16-byte array. I did not include the time to generate them in the benchmark, just the insertion time. The results were:

BenchmarkTossInsertString-12      200000          5992 ns/op
BenchmarkTossInsertArray-12       300000          5311 ns/op

So there is a benefit to using arrays as map keys.
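
A sketch of roughly how such a pair of benchmarks could be written (a reconstruction from the description above, not the actual code; it assumes it lives in the package's own _test file so it can use NewRandom directly):

// 128 random UUIDs, generated once and shared by both benchmarks.
var (
    stringKeys []string
    arrayKeys  [][16]byte
)

func init() {
    for i := 0; i < 128; i++ {
        u := NewRandom()
        stringKeys = append(stringKeys, u.String())
        var a [16]byte
        copy(a[:], u)
        arrayKeys = append(arrayKeys, a)
    }
}

func BenchmarkTossInsertString(b *testing.B) {
    for i := 0; i < b.N; i++ {
        m := make(map[string]bool)
        for _, k := range stringKeys {
            m[k] = true
        }
    }
}

func BenchmarkTossInsertArray(b *testing.B) {
    for i := 0; i < b.N; i++ {
        m := make(map[[16]byte]bool)
        for _, k := range arrayKeys {
            m[k] = true
        }
    }
}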

@bmatsuo
Contributor

bmatsuo commented Feb 12, 2016

Thanks for running these sets of benchmarks. I think they are conclusive as well.

Would you like for me to open a PR for the Array() method? You could just merge or reimplement it though. Whatever.

@pborman
Owner

pborman commented Feb 12, 2016

Sure, why don't you put one together, along with the String method, of course. Thanks!

@bmatsuo
Contributor

bmatsuo commented Feb 12, 2016

Good call on Array.String().

@pborman
Owner

pborman commented Feb 19, 2016

It turns out I was not supposed to migrate from code.google.com to my personal github.com repository; rather, it was supposed to go into github.com/google.

This package has been forked into https://github.com/google/uuid. Since the new location is not really in use, it is a perfect time to change the API, including moving to an array type. I have done this and I welcome any comments you might have. Now is the time to change the API if it is going to be changed. If you would like to see changes in the API (no promises), please add them to this new issue.

I am not removing this repository as many people are using it. It also will continue to have the same API. I probably will, at some point, make this version a wrapper around the forked version in
https://github.com/google/uuid (probably when I find I need to make a change to both of them).

Thanks again to everyone who has used and/or contributed to this package.
