Skip to content
{{ message }}

# golang / go

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

# proposal: spec: disallow NaN keys in maps #20660

Open
opened this issue Jun 13, 2017 · 30 comments
Open

# proposal: spec: disallow NaN keys in maps#20660

opened this issue Jun 13, 2017 · 30 comments
Labels
Milestone

## Comments

### bcmills commented Jun 13, 2017 • edited

 Motivation Go currently supports NaN keys in maps through some clever tricks. The current spec is ambiguous on this behavior. It says: [I]f the map contains an entry with key `x`, `a[x]` is the map value with key `x` and the type of `a[x]` is the value […] but "with key `x`" could plausibly mean "with key represented by `x`". (In practice, today it means "with key equal to `x`".) That leads to surprising and subtle bugs (such as #14427) and adds complexity even in correct code (#20138 (comment)). For example, the following two functions appear to be equivalent, but the latter version silently copies zeroes instead of the intended values for NaN keys: ```func copy(m map[float64]int) map[float64]int { c := make(map[float64]int, len(m)) for k, v := range m { c[k] = v } }``` ```func copy(m map[float64]int) map[float64]int { c := make(map[float64]int, len(m)) for k := range m { c[k] = m[k] } }``` Proposal We already reject incomparable values as map keys: If the key type is an interface type, these comparison operators must be defined for the dynamic key values; failure will cause a run-time panic. I propose that we should apply that standard to NaNs as well. Specifically, I propose that we add the following sentence to https://golang.org/ref/spec#Index_expressions: Assigning to an element of a map with a key that is not equal to itself causes a run-time panic. This proposal is along the lines of #19624, in that it produces panics for arithmetic operations which are otherwise likely to result in silent corruption in typical user code. Workarounds Users can easily avoid the proposed panics by calling `math.IsNaN(k)` checking whether the key is equal to itself before inserting a map entry. Users who expect NaNs with the same representation to map to the same element can transform the keys using `math.Float64bits` or `math.Float32bits`: ``` if k == 0 { // -0.0 or +0.0 m[math.Float64bits(0)] = v } else { m[math.Float64bits(k)] = v }``` Users who really do intend to store all of the entries for NaN keys separately can explicitly pair the map with a slice: ``` type nanElem struct { k float64 v V } var ( m map[float64]V nans []nanElem ) if k != k { nans = append(nans, nanElem{k, v}) } else { m[k] = v }```
added this to the Proposal milestone Jun 13, 2017
added the label Jun 13, 2017
added the label Jun 13, 2017
changed the title proposal: Go 2: disallow NaN keys in maps proposal: disallow NaN keys in maps Jun 13, 2017
added the label Jun 13, 2017

### randall77 commented Jun 13, 2017

 FWIW, there's a fair amount of code in the hashmap implementation that deals with the possibility of k!=k for keys k (search for reflexivekey in runtime/hashmap.go). It would be nice to strip that out. That said, I don't think it is a big performance impediment. In the common case it's just a load/compare/predictable branch in a few places.

### dsnet commented Jun 13, 2017

 The strange behavior with maps only because == is not symmetric when applied on NaNs. An alternative proposal would be to define == to be symmetric for NaNs only for maps. Technically there are multiple types of NaNs (signaling and quiet NaNs and a variety of valid bit sequences that represent NaNs). I similarly argue that they should all be equal to each other for maps in the same way that `-0.0 == +0.0`.

### bcmills commented Jun 13, 2017

 @randall77 It may be that this proposal would simplify the hashmap implementation, but I would prefer to focus on the correctness and complexity implications for user code.

### bcmills commented Jun 13, 2017 • edited

 @dsnet As @rsc notes in the blog post, allowing `m[k1]` and `m[k2]` to be same element when `k1 != k2` would also be surprising. It is straightforward to equate NaNs in user code if desired using the `math.FloatNbits` approach outlined above, and I would argue that the explicit code makes the change in equality less surprising. ```m := map[uint64]V … switch { case math.IsNan(k): m[math.Float64bits(math.NaN())] = v case k == 0: // +0.0 or -0.0 m[math.Float64bits(0)] = v default: m[math.Float64bits(k)] = v }```

### randall77 commented Jun 13, 2017

 @bcmills : I agree, just wanted to let people know what the effects on the implementation would be (tl;dr a small net positive).

### cznic commented Jun 14, 2017

 We already reject incomparable values as map keys: If the key type is an interface type, these comparison operators must be defined for the dynamic key values; failure will cause a run-time panic. All and every value of type {float32,float64} is comparable (guaranteed by the specs). The result of the comparison is irrelevant here. Map keys must be comparable is a simple, non-complicated rule. I object to making it not only more complex, but suprisingly non-orthogonal.

### rsc commented Jun 14, 2017

 Users can easily avoid the proposed panics by calling math.IsNaN(k) before inserting a map entry for k. Not true. NaNs can be inside arrays or structs. All our current restrictions are type-based, not value-based. Even the interface panic restriction is about the type inside the interface, not the value. I'd really like to avoid introducing and explaining new value-based restrictions.

### bcmills commented Jun 14, 2017

 Users can easily avoid the proposed panics by calling math.IsNaN(k) before inserting a map entry for k. Not true. NaNs can be inside arrays or structs. Good point. But it's still easy to check, especially given the "not equal to itself" phrasing I've proposed: ```if k != k { … // Can't be a map key. Do something else with it. } else { m[k] = v }```

### bcmills commented Jun 14, 2017

 The main issue I want to address is the surprising inability to access elements by key in an irreflexively-keyed map. If you don't want to introduce value-based restrictions, perhaps we could at least change map entries from "with key equal to `x`" to "with key represented by `x`"?

### rsc commented Jun 14, 2017

 It's certainly an idea worth evaluating when we get to language changes again (hence the Go2 label). Like all the Go2-labelled things, I'm not going to try to evaluate them now.
changed the title proposal: disallow NaN keys in maps proposal: spec: disallow NaN keys in maps Jun 16, 2017
mentioned this issue Jul 28, 2017

### zombiezen commented Dec 21, 2017

 One other way to address the specific case of copying maps would be to extend the `copy` builtin function to support maps.

### bcmills commented Dec 21, 2017 • edited

 That would still give surprising behavior for functions that are not `copy` exactly. For example, consider this snippet: ``` m := map[float64]int{k: 0} m[k] += 1 m[k] += 1``` It appears to increment the entry for `m[k]` twice, but what it actually does (when `k` is a NaN) is add two new entries mapping `k` to `1` (https://play.golang.org/p/gIUlA0lg-9b). Similarly, there is no way to delete a NaN entry from a map: https://play.golang.org/p/Rz2TLn336c8 So special-casing `copy` isn't, IMO, a viable solution.

### bcmills commented Jan 2, 2018

 For an example where `+=` matters, consider the case of aggregating two maps into one (https://play.golang.org/p/iwKyUWV3m_U): ```func inc(dst, src map[float64]int) { for k, v := range src { dst[k] += v } }``` There is currently no way to implement such a function correctly in the presence of NaNs: because `delete` doesn't work with NaN keys, there is no way to remove or update the existing entries in the `dst` map even if you use the `Float64bits` trick to aggregate into some non-float-keyed intermediate.

### ianlancetaylor commented Feb 20, 2018

 I agree with @bcmills that a map with NaN key values is hard and perhaps even impossible to use. However, the current definition of maps is entirely typed based, and I don't see a clear reason to change that. The current behavior is consistent and predictable even though surprising. @bcmills suggests that you can check for an irreflexive value before adding it to the map, but that is true whether we make this change or not. Closing.

### griesemer commented Feb 20, 2018

 cc: griesemer

### bcmills commented Feb 21, 2018 • edited

 The current behavior is consistent and predictable even though surprising. I would argue that “surprising” is exactly the opposite of “predictable”: you can only predict the current behavior if you stop to think about it, and (empirically) most users don't. So what about using representation rather than equality (per #20660 (comment))? That would also be consistent and predictable, and would arguably be a better match for the behavior people already assume for maps with floating-point keys. (For example, it would give reasonable behavior for all of the examples of subtly-broken loops I've listed above.) The only downside I can see is that it would not equate the entries for `+0.0` and `-0.0`, but it's not at all obvious to me that that matters in practice: given that lookups are already sensitive to rounding, it seems much less surprising to have separate entries for +0 and -0 than to have arbitrarily many entries for `math.NaN()`. A quick sample of float-keyed maps at Google (because that's an easy corpus for me to search) gives some interesting data: The vast majority are inversions or indexes of other floating-point data, where the set of keys used for lookup are drawn from some fixed data set and lookups from that set are assumed to succeed. This is a fairly common pattern in unit tests. A public example in this category is in @aclements' `moremath` package: https://github.com/aclements/go-moremath/blob/033754ab1fee508c9f98f2785eec2365964e0b05/internal/mathtest/mathtest.go#L42-L49 Several of the non-test examples are aggregating loops that pretty much exactly mirror the examples above. Most of the rest are operating on data sets with no negative sign bits, such as percentiles, probabilities in the range [0, 1], or scalar distances. One does operate with arbitrary keys, but special-cases keys equal to 0 before performing the lookup. One is a `float64`-to-`*float64` interning table, which doesn't particularly care about rounding or ±0 but will consume much more memory than intended if you feed it NaNs.

### ianlancetaylor commented Feb 22, 2018

 Right now maps keys use the same operation as the `==` operator in the language. If we use representation instead, we break that simple rule. I don't think that is a good idea. In general, the current implementation minimizes special cases in the language. I think that is a good thing.

### bcmills commented Feb 22, 2018 • edited

 In general, the current implementation minimizes special cases in the language. I think that is a good thing. I agree that minimizing special cases is a good thing, but I don't think it's accurate to say that the current implementation minimizes special cases. The existing semantics already imposes a special case: it just isn't documented. The spec for `range` says: For each iteration, iteration values are produced as follows if the respective iteration variables are present: ``````Range expression 1st value 2nd value […] map m map[K]V key k K m[k] V `````` But today, `range` does not produce the iteration value `m[k]` if `k != k`. Instead, it produces the value for an element with key represented by `k`. It's also worth noting that the current spec does not use the `==` operator to define map semantics: `delete` "removes the element with key `k`" (not "the element with key equal to `k`"). `a[x]` is "the map element with key `x`" (not "the map element with key equal to `x`). For map types, "[t]he comparison operators `==` and `!=` must be fully defined", but the spec does not specify how they are used. If we summarize map semantics in terms of equality, the existing special cases become clear: `a[x]` is the value of the element with key equal to `x`, unless it is the left-hand operand of an assignment. If there is an existing element with key equal to `x`, `a[x] = v` sets the value of that element to `v`. Otherwise, it adds a new element with key `x` and value `v`. `delete(m, k)` removes the element with key equal to `k`. `range m` yields the key and value of each element in `m`. Using representation instead, the special cases disappear: `a[x]` is the value of the element with key represented by `x`. Assigning to `a[x]` creates that element if it does not yet exist. `delete(m, k)` removes the element with key represented by `k`. `range m` yields the key and value of each element in `m`. Enumerating the "equal" rules without the special cases gives the proposed panic semantics: `a[x]` is the value of the element with key equal to `x`. Assigning to `a[x]` creates that element if it does not yet exist. (If such an element cannot exist, attempting to create it causes a run-time panic.) `delete(m, k)` removes the element with key equal to `k`. `range m` yields the key and value of each element in `m`.
mentioned this issue Feb 22, 2018

### mdempsky commented Feb 22, 2018

 All our current restrictions are type-based, not value-based. Even the interface panic restriction is about the type inside the interface, not the value. It's about the dynamic type of the value, which I think is still arguably a value-based restriction from the perspective of the map key type, even if the spec isn't currently worded to emphasize that. Currently we incompletely support NaN keys (e.g., `delete` doesn't work, and package reflect can't access NaN-keyed values). I'd argue we should either explicitly disallow them or we should completely support them (e.g., @bcmills's representation-based proposal above). That said, I'm not particularly fond of the representation-based approach. It feels like an abstraction violation to me, but that's an admittedly rather subjective concern. @ianlancetaylor This was tagged for Go2, but now it's closed. Does that mean it's been rejected for consideration from Go2?

### cznic commented Feb 22, 2018

 Currently we incompletely support NaN keys (e.g., delete doesn't work, ... I don't think `delete` does not work with `NaN` keys. I think that `delete` with a `NaN` key works actually exactly as specified. Related comment.

### bcmills commented Feb 22, 2018 • edited

 I think that `delete` with a `NaN` key works actually exactly as specified. “Works in a way that is not useful to anybody” is the same as “doesn't work” in terms of user value. I have yet to see a program in which the current behavior of `delete` for `NaN` keys is useful and desired: if you have one in mind, that would be a great example to look at.

### bcmills commented Feb 22, 2018

 Hmm, it appears that `-0.0` is a special case for constant evaluation rather than maps (https://play.golang.org/p/W2K_FYMPGs9). Guess I need to update my comments.

### nhooyr commented Feb 22, 2018

 I don't follow. How does the linked snippet show that `-0.0` is a special case for constant evaluation?

### cznic commented Feb 22, 2018

 I have yet to see a program in which the current behavior of delete for NaN keys is useful and desired: if you have one in mind, that would be a great example to look at. Then probably also division by zero, division/multiplication by 1, adding/subtracting zero - all those operations are "not useful to anybody". Yet they are fully supported by the language (modulo division by a constant zero). I consider `delete(m, math.NaN())` to be very much the same as `x /= 1`. It's statically guaranteed to do nothing and should be considered by the DCE gears of the compiler. But other than that I don't see something special about it. It's just a neutral operand as any other.

### bcmills commented Feb 22, 2018

 Then probably also division by zero, division/multiplication by 1, adding/subtracting zero - all those operations are "not useful to anybody". Yet they are fully supported by the language (modulo division by a constant zero). Division by zero causes a run-time panic. The remaining operations are identity operations that look like identity operations. `delete(m, k)` looks like it deletes the entry for the key, but doesn't. That matters when the actual value of `k` is not obvious (such as when it is obtained from a `range` loop over a passed-in map).

### cznic commented Feb 22, 2018

 The remaining operations are identity operations that look like identity operations. delete(m, k) looks like it deletes the entry for the key, but doesn't. It works and looks exactly like ```m := map[int]struct{}{} m[42] = struct{}{} delete(m, 314) // Looks "like it deletes the entry for 314"...?``` Nothing happens because `314` is not in `m` wrt to map indexing. Same as `math.NaN()` is not.

### mdempsky commented Feb 22, 2018

 The point under discussion is that (among other examples): ``````m := make(map[float64]int) var k float64 = f() m[k] = 1 // insert k delete(m, k) // delete k `````` looks like it should work as the comments describe, but doesn't if k is NaN. I expect most Go users would be surprised that this code can run without panicking, but might leave the map entry behind.

### bcmills commented Feb 22, 2018 • edited

 Here's an example for `delete`: a function that looks like it prunes elements into disjoint sets, but actually only duplicates them if the keys are NaN. (https://play.golang.org/p/5tZsxi5uazg) ```func prune(m map[float64]bool, threshold float64) (pruned map[float64]bool) { pruned = make(map[float64]bool) for k, v := range m { if k >= threshold { continue } pruned[k] = v delete(m, k) } return pruned }```

### ianlancetaylor commented Feb 23, 2018

 Yes, this was closed as a change that we will not make for Go 2. Since there is so much discussion, I will reopen it. But it will most likely be closed again. I am strongly against changing maps to work on the NaN representation. I care less about having map operations panic on NaN, but I continue to find the arguments above to be unconvincing. Yes, using NaNs as map keys can be surprising. It is also surprising that `x == x` is `false` if `x` is a NaN. Yet people manage to deal with that. The exact code that people will have to write if map operations panic on NaN will work today without changing the language.
added the label Feb 23, 2018
mentioned this issue Feb 23, 2018

### bcmills commented Feb 23, 2018 • edited

 I am strongly against changing maps to work on the NaN representation. I care less about having map operations panic on NaN, but I continue to find the arguments above to be unconvincing. I don't have a strong preference between panicking on NaNs or using representations instead. If we decide to do neither and maintain the status quo, I do think we would need to address the other NaN-key issues in the standard library (#11104, #14427, and #24075 that I know of). The exact code that people will have to write if map operations panic on NaN will work today without changing the language. That's also true of a number of other run-time panics: for example, division by zero, out-of-bounds slice access, failed type-assertions, races on map operations, and attempts to put incomparable keys into interface-keyed maps. We panic on behaviors because they indicate likely bugs, not because those bugs cannot be avoided.
mentioned this issue Mar 6, 2018
mentioned this issue Jun 20, 2018
removed the label Aug 16, 2019
added the label Sep 3, 2019
to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.