core data types: IEEE floating point (in)compatibility #58

Open
jar398 opened this issue May 24, 2023 · 36 comments

@jar398
Contributor

jar398 commented May 24, 2023

(Moved here from #5 (comment))

It's proposed (sorry, can't find the exact reference now) that -0 be excluded from OCapN core data values. While this value may not be used often in real programs, it is used and is potentially important. It seems unfortunate to be compatible with IEEE 754 floating point while deviating in this one small aspect. The difference could lead to program failures that would be very difficult to find, test for, and diagnose and could happen at inopportune moments.

I don't remember the rationale for this exclusion.

I'm less concerned about NaNs but it's conceivable they might matter as well.

@erights
Collaborator

erights commented May 26, 2023

The reason we'd prefer to omit -0 is to avoid the proliferation of similar but different equality operators. JavaScript has

  • Object.is which distinguishes -0 from 0, and judges all NaNs to be the same.
  • SameValueZero is the spec name for the equality test that Sets use to test membership and Maps use to look up a key. It judges -0 to be the same as 0, and all NaNs to be the same.
  • === which judges -0 to be the same as 0, and every NaN to be different from every NaN, including itself. This is based on IEEE arithmetic equality, which is indeed non-reflexive on NaNs.
  • Casting of floats to bit representations, which reveals that not all NaNs are the same.
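The first three notions of equality in the list above can be observed directly. This is a small sketch; `sameValueZero` is a hypothetical helper name (the spec operation is not exposed under that name), borrowed here from `Array.prototype.includes`, which uses SameValueZero.

```javascript
// SameValueZero is spec-internal; Array.prototype.includes uses it,
// so this hypothetical helper borrows that behavior.
const sameValueZero = (a, b) => [a].includes(b);

const results = {
  objectIs: { zeros: Object.is(-0, 0), nans: Object.is(NaN, NaN) },
  sameValueZero: { zeros: sameValueZero(-0, 0), nans: sameValueZero(NaN, NaN) },
  strictEquals: { zeros: -0 === 0, nans: NaN === NaN },
};

console.log(results);
// objectIs:      zeros false, nans true
// sameValueZero: zeros true,  nans true
// strictEquals:  zeros true,  nans false
```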

We spent years on the Records and Tuples proposal, which started out simple, remained simple in most ways, but could never resolve how to generalize JS equality ops to operate on containers containing -0 or NaNs. Ultimately, despite literally years of effort, this "minor detail" caused the proposal to fail.

FWIW, in JS, JSON.stringify(-0) === '0'. I'm curious what it does in other languages, and whether there is any evidence that this loss of precision has ever actually caused any problems. OTOH, it's not a great test, because JSON itself cannot encode NaN or either Infinity, and so already has more severe losses wrt IEEE.
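For reference, the JSON losses described above look like this in JavaScript. The variable names are illustrative only.

```javascript
// JSON.stringify drops the sign of zero and cannot represent NaN or the infinities.
const encoded = [-0, 0, NaN, Infinity, -Infinity].map((x) => JSON.stringify(x));
console.log(encoded); // [ '0', '0', 'null', 'null', 'null' ]

// The loss is on the stringify side: parsing the text "-0" does yield -0.
const parsedNegativeZero = JSON.parse('-0');
console.log(Object.is(parsedNegativeZero, -0)); // true
```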

The stance of the JS language on the observable NaN bits when casting numbers to bits is interesting: The JS language clearly states that there is only one NaN in the JS type number. But the JS language does not specify which NaN bit representation should be revealed by casting. Rather, it considers the bit representations for all IEEE NaNs to be in an equivalence class. A NaN can cast into any of them. All of them must cast back to a NaN. Thus, NaN does faithfully round trip through the bit representations, but no one bit representation is guaranteed to round trip through NaN.
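A minimal sketch of the "many bit patterns, one NaN" direction of that round trip. `bitsToNumber` is a hypothetical helper. The sketch deliberately does not read the bits back out of the resulting NaN, since the language leaves that pattern up to the implementation.

```javascript
// Reinterpret a 64-bit pattern as a float64 by writing and reading one buffer.
function bitsToNumber(bits) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, bits);
  return view.getFloat64(0);
}

// Three distinct IEEE 754 NaN bit patterns:
// quiet NaN, quiet NaN with a payload, and quiet NaN with the sign bit set.
const nanPatterns = [0x7ff8000000000000n, 0x7ff8000000000123n, 0xfff8000000000000n];
const decoded = nanPatterns.map(bitsToNumber);

// Every pattern casts to the single JS NaN.
console.log(decoded.every((x) => Number.isNaN(x))); // true
```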

WASM has a similar stance on NaN bits. In both cases, this is spec-level nondeterminism. Those WASM systems that need strong determinism, like all WASM-on-blockchain systems, refine the WASM spec to require one particular bit encoding. But this is just a conforming refinement of the spec. Any WASM system that conforms to the refined spec also conforms to the WASM spec itself.

From the perspective of defining the OCapN abstract data model, what it means to omit -0 can equivalently be described as treating -0 and 0 as in the same equivalence class. Either one, round tripping through the protocol, coming back as the other, is a successful round trip. In exchange for this loss of precision, there would be no difference between Object.is and SameValueZero.

As with deterministic WASM, if OCapN treats -0 and 0 in the same equivalence class, particular implementations of OCapN can state that they obey a refinement of the OCapN spec which

  • preserves the difference when communicating only among such refined systems, or
  • always normalizes -0 to 0 on transmission, as JS's JSON.stringify does.

Agoric would do the second.
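The second refinement amounts to a one-line normalization on the sending side. This is only a sketch; `normalizeZero` is a hypothetical name, not an existing Agoric or OCapN function.

```javascript
// Hypothetical sender-side hook: collapse -0 to 0, leave everything else alone.
// `x === 0` is true for both zeros, so the literal 0 replaces either one.
const normalizeZero = (x) => (x === 0 ? 0 : x);

console.log(Object.is(normalizeZero(-0), 0)); // true
console.log(normalizeZero(-1.5)); // -1.5
```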

None of this is evidence that the loss would not still be problematic. It is currently just my guess that the pain of this loss of precision is less than the pain of building abstractions over number that want to consider -0 and 0 equivalent for some purposes, and distinct for other purposes.

I would want to see some real evidence that such a loss in transmission between systems would actually cause, or has actually caused, real problems in practice.


Complicating my own position, I want OCapN to guarantee that denormals round trip. There are strong reasons why some platforms want to lose the extra precision of denormals when doing internal high speed computation. But OCapN doesn't do any arithmetic over numbers. I see no benefit from allowing OCapN to reduce a denormal to a nearby non-denormal merely because the numbers were transmitted between systems.
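As a sketch of what "denormals round trip" means in practice, assume (purely for illustration) that a float is carried as its 8 big-endian IEEE 754 bytes. `encodeFloat64` and `decodeFloat64` are hypothetical names, not part of any OCapN codec.

```javascript
// Hypothetical wire encoding: the 8 bytes of the IEEE 754 binary64 value.
function encodeFloat64(x) {
  const bytes = new Uint8Array(8);
  new DataView(bytes.buffer).setFloat64(0, x); // big-endian by default
  return bytes;
}

function decodeFloat64(bytes) {
  return new DataView(bytes.buffer, bytes.byteOffset, 8).getFloat64(0);
}

// Number.MIN_VALUE is the smallest positive denormal (2^-1074).
const denormal = Number.MIN_VALUE;
const roundTripped = decodeFloat64(encodeFloat64(denormal));
console.log(roundTripped === denormal && roundTripped > 0); // true: no flush to zero
```

Since no arithmetic happens between encoding and decoding, nothing in this path has a reason to flush the value to zero.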

I'm not suggesting we should have any controversy about denormals. I bring this up because it seems to be a bit on the opposite side of the argument I'm making about -0, possibly weakening my position.

@dckc
Collaborator

dckc commented May 26, 2023

To keep the ball moving: do Spritely and capnp agree? Is it ok to omit the distinction between -0 and 0 in the float64 type (in the sense of: #3 (comment) )?

@zenhack ? @cwebber ? @tsyesika ?

@jar398
Contributor Author

jar398 commented May 26, 2023

Forgive me for pushing on this... this may be the only chance to review the question for a long time.

Note that my comment has nothing to do with existing OCapN-like implementations; they could all agree on eliminating -0, and all still be incorrect from an IEEE 754 point of view, which has to do with portability of programs that use floating point, not ocap interoperability. Taking an existing program that uses IEEE floating point and making it run in a distributed manner over OCapN seems a perfectly plausible and important use case, and this becomes much much harder if the program has to first be reviewed globally for its potential significant use of -0.

The philosophy of IEEE 754 is that every possible kind of floating point hardware is going to have warts, and the only way to get portability of floating point programs is to just decide once and for all what those warts are and what the behavior has to be for each one. Once this is done, and the hardware is validated, we don't have to worry about getting different results on different platforms. This philosophy has worked beautifully. If you want to eliminate -0 you are saying there is one 'wart' that is behaving at variance to the standard, so the whole system is no longer really IEEE 754.

Remember that most programmers don't understand floating point and won't care about this, but a certain important community (including scientific programming and perhaps some game developers?) cares enormously about the fine details of floating point. I'm talking about satisfying the latter, not the former.

What JSON does with regard to reading and printing floating point does not seem relevant to me. Serious floating point programmers are already wary of reading and printing, which are almost always lossy. FWIW Python supports a distinct -0.0, but has 0.0 == -0.0, arguing that == addresses certain kinds of bugs (obviously different kinds of bugs than what I'm talking about). (I'm pretty sure no one should ever be comparing floating point numbers for equality, by the way.)

I sort of see the rationale now, so thanks for the explanation. Looks like a rock and a hard place.

"I want OCapN to guarantee that denormals round trip" sounds as if it might address my concern, if we broaden it a little to "I want OCapN to guarantee that IEEE 754 floating point entities (numbers whether normalized or not, infinities, zeroes, NaNs) round trip". That is, we should leave any interpretation of floats up to the host system(s), with the possible exception of ordering, which IEEE 754 prescribes bitwise in a simple manner.

@erights
Collaborator

erights commented May 26, 2023

@zenhack
Collaborator

zenhack commented May 26, 2023

I don't think capnp has a dog in this fight; every implementation I know of just casts the bits and calls it a day, so we can be as faithful when round-tripping as any proposal we might come up with.

@cwebber
Contributor

cwebber commented May 31, 2023

> Remember that most programmers don't understand floating point and won't care about this, but a certain important community (including scientific programming and perhaps some game developers?) cares enormously about the fine details of floating point. I'm talking about satisfying the latter, not the former.

This is a really good point. I will say that from the perspective of Spritely's high-level work, which targets social network issues, faithfulness to IEEE floating point strangeness is not something that is likely to matter to us. However, I do have many friends in the gamedev and scientific computing world, and I know they care deeply about these issues. Since, in effect, we really are using IEEE floats, I think we should support all their needs fully. Again, not for Spritely's own high-level goals, but so that OCapN is most useful to a broad set of possible users.

@tsyesika
Contributor

I think what @jar398 brought up in #58 (comment) makes a lot of sense and if we can be faithful to the spec, that'd be good.

@erights obviously you've expressed a preference to not have -0 round trip or normalize it to zero. How much of a problem would it be supporting negative zero over CapTP?

@erights
Collaborator

erights commented May 31, 2023

> How much of a problem would it be supporting negative zero over CapTP?

I don't know, but we can explore it. Agoric would at least need to

  • Extend smallcaps to encode -0

  • Extend encodePassable to encode -0 in an order-preserving manner. Attn @gibson042

  • Have OCapN -0 round trip through a JS -0

  • Ensure all JS handling of passable data does not accidentally lose the distinction between -0 and 0. Since this is not an existing correctness constraint, it does require checking.

  • compareRank(-0, 0) === 0, i.e., they are tied for the same rank. No surprise.

  • compareKeys(-0, 0) === 0

    • Therefore sameKey(-0, 0), like JS SameValueZero
    • Therefore mapStore.set(-0, 7); mapStore.get(0) === 7. Likewise CopyMaps. Like JS Maps and Sets
    • Therefore match(-0, 0) === true and vice versa. No current pattern matching operator would see these as different. Possibly, no pattern matching operator ever would?

This one directly touches on the unpleasantness we experience in JS, with too many equality operators. (Though still fewer than many Lisps!) Until this step, if two JS values are sameKey, they are equivalent in the distributed object semantics defined by @endo/pass-style.

The representational trick that some of our Maps use for keys (and likewise Sets and Bags for elements) is to use encodePassable and then store and look up by the corresponding string encoding. The string encoding must be distinct though, to guarantee round tripping. JS Maps and Sets implement their SameValueZero semantics by normalizing a -0 key to 0 on entry into a Map or Set, so enumerating the keys/elements will never see a -0 key. We could do likewise with CopySets, CopyBags, CopyMaps, SetStores, and MapStores.
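The normalization that JS Maps and Sets perform on entry can be seen directly. This only illustrates the built-in collections; the Agoric stores named above are not shown.

```javascript
const set = new Set([-0]);
const map = new Map([[-0, 7]]);

// Lookup treats the two zeros as the same key (SameValueZero).
console.log(map.get(0)); // 7
console.log(set.has(0)); // true

// Enumeration never yields -0: the key was normalized to +0 when stored.
const storedElement = [...set][0];
const storedKey = [...map.keys()][0];
console.log(Object.is(storedElement, 0), Object.is(storedKey, 0)); // true true
```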

Thus, -0 would still not round trip through being stored into a Map/Set and then being retrieved. But since Maps, Sets, Keys, and pattern matching are not concepts until above @endo/pass-style, the lack of round tripping at that level doesn't cause any agreement problems with the OCapN data model. It is also consistent with @endo/pass-style itself not having a strong theory of equality, and leaving such theories to higher levels like @endo/patterns. By this account, we could say that the @endo/patterns equality theory holds -0 to be equal to 0, consistent with its sameKey and match behavior.

This is all plausible enough that we should give it a serious try. I think it is also plausible enough that OCapN can proceed tentatively assuming that we will succeed unless/until we report an unexpected problem.

Only one NaN though, please!
Or, more precisely, any NaN round tripping into any other NaN is considered a valid round trip. Please!

I cannot imagine anyone has a problem requiring denormals to round trip. But just reiterating in case anyone sees a problem. Anyone?

@erights
Collaborator

erights commented May 31, 2023

See endojs/endo#1602

@jar398
Contributor Author

jar398 commented Jun 23, 2023

Recognizing that NaNs seem less important than -0, but also recognizing I don't understand how the community uses NaNs so it's possible they're important too:
Treating floats similarly to bit strings (but with a slightly different most-granular total ordering, the one in the IEEE spec) seems really simple and easy to understand (see @zenhack above). Remind me what benefit there is (@erights "Only one NaN though, please!") in allowing OCapN nodes to swap one NaN for another as a float passes through them?
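The bitwise ordering mentioned here can be sketched with the usual sign-flip trick on the 64-bit pattern, which, as far as I know, agrees with the IEEE 754 totalOrder predicate for binary64. `totalOrderKey` and `compareTotalOrder` are hypothetical names. NaNs are left out of the example because a JS engine may change their bit patterns.

```javascript
const SIGN_BIT = 1n << 63n;
const ALL_BITS = (1n << 64n) - 1n;

// Map a float64 to an unsigned 64-bit key whose integer order is the float order:
// negative values have all bits inverted, non-negative values get the sign bit set.
function totalOrderKey(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const bits = view.getBigUint64(0);
  return bits & SIGN_BIT ? ~bits & ALL_BITS : bits | SIGN_BIT;
}

function compareTotalOrder(a, b) {
  const ka = totalOrderKey(a);
  const kb = totalOrderKey(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}

// Unlike `<`, this ordering places -0 strictly before +0.
const sorted = [0, 1, -0, -Infinity, -1, Infinity].sort(compareTotalOrder);
console.log(sorted); // -Infinity, -1, -0, 0, 1, Infinity
```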

@erights
Collaborator

erights commented Jun 23, 2023

JS and wasm both allow NaN canonicalization, and some implementations do so. Thus, only one NaN could round trip through these languages and back as a NaN.

@erights
Collaborator

erights commented Jun 23, 2023

As a fellow language implementor, you may be amused that one of the uses of NaN canonicalization is NaN boxing, so that non-canonical NaN values can be interpreted as something else.
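For readers unfamiliar with NaN boxing, here is a toy sketch of the idea. The layout (a tag bit at position 48 and a 32-bit payload) is an assumption made up for illustration; real engines use their own layouts. It works on 64-bit words held as BigInts, because a JS number cannot be relied on to carry a NaN payload.

```javascript
const QUIET_NAN = 0x7ff8000000000000n; // exponent all ones plus the quiet bit
const TAG_INT = 1n << 48n;             // hypothetical tag marking "boxed int32"
const PAYLOAD_MASK = 0xffffffffn;      // low 32 bits hold the boxed value

// Box a 32-bit integer inside a non-canonical NaN bit pattern.
const boxInt = (n) => QUIET_NAN | TAG_INT | BigInt(n >>> 0);

// A word is a boxed int only if it is a quiet NaN that also carries the tag.
const isBoxedInt = (word) => (word & (QUIET_NAN | TAG_INT)) === (QUIET_NAN | TAG_INT);

// Recover the signed 32-bit integer from the payload.
const unboxInt = (word) => Number(word & PAYLOAD_MASK) | 0;

const word = boxInt(-5);
console.log(isBoxedInt(word), unboxInt(word)); // true -5
console.log(isBoxedInt(QUIET_NAN)); // false: the canonical NaN stays a plain NaN
```

The scheme only works if every genuine NaN is kept in the one canonical pattern, which is why such runtimes canonicalize.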

@jar398
Contributor Author

jar398 commented Jun 23, 2023

There are lots of search results for "NaN canonicalization". The first one I looked at (WebAssembly/design#1463) seemed quite informative, but I didn't have the energy to digest it thoroughly (maybe later). However, the topic seems to be what happens when operators are applied, not when a value is passed through (which doesn't really come up). As far as I can tell, if you were to pass a noncanonical NaN into WebAssembly code, which then passes it through and spits it out somewhere (no operators like max or + 0 applied), it comes out unchanged. Reading between the lines, it sounds like the same would be true of ECMAScript. And that's the situation I think is important in OCapN: if you pass a NaN in a message through an identity function in the vat, sending it onwards (either to an object or to another node) unchanged, will the NaN be swapped out for a different NaN? Or: is an OCapN node permitted to canonicalize a NaN on receipt or on transmission? I think we can say no, without compromising the ability of the objects connected to the OCapN layer to canonicalize NaNs (especially when they perform operations like max). That is, leave it up to the objects when and whether to canonicalize or internally use boxed NaNs.

@erights
Collaborator

erights commented Jun 23, 2023 via email

@jar398
Contributor Author

jar398 commented Jun 23, 2023

couldn't you unbox when the NaN exits whatever environment boxed it?

@erights
Collaborator

erights commented Jun 23, 2023 via email

@jar398
Contributor Author

jar398 commented Jun 23, 2023

What I was thinking was that logically we have a circle with OCapN on the outside and WebAssembly or JS on the inside (some sort of logical 'container'). If a NaN gets boxed as it goes from outside to inside the circle (which, as far as I can tell from my reading so far, never happens), then you could just invert that operation (unbox) as it goes from inside to outside the circle, and from an external perspective nothing has happened to the data.

If there are any operations involved causing the NaN to be replaced by a different NaN, or by a box, that's not a problem, it's just part of the circle's (object's) behavior.

But I think this is moot because a NaN that is not subject to any operation is left untouched (and unboxed?) by wasm or JS. Maybe I am wrong.

This goes to my hypothetical use case which was using OCapN to decompose and 'distribute' a complex program, perhaps even a C++ program, that uses floating point, and is connected together using generic modules (routing, scheduling, scatter/gather, etc) built using OCapN, written perhaps in some other language like JS. The connector code doesn't know what's being done with the floating point and itself does no operations on floating point, but the code being connected cares that the data is transmitted through the connectors faithfully.

@mhofman

mhofman commented Jun 23, 2023

Can someone clarify something: what does it mean for a JS node to "read" a NaN? JS doesn't have a native way of serializing NaN values, which means the serialization / deserialization has to use a "custom" encoding. From what I understand, the suggestion above is that the encoding is simply the 64 bits of the IEEE floating point value. I assume that means the serialization / deserialization step would use Float64Array then? My understanding of ECMA-262 is that the implementation is free to canonicalize a NaN when reading / writing through Float64Array, but is not obligated to.

@zenhack
Collaborator

zenhack commented Jun 26, 2023

I assume "read" means "convert into a native JS number." How we do that is probably up to us, since we'll have to implement the decoders.

I realized a couple of small caveats wrt capnp: the Node implementation parses things into native data structures, so it probably has the same constraints as Agoric is dealing with re: round-tripping.

I have a couple suspicions:

  • Other languages will have similar constraints. E.g. I think Lua does some trickery with NaN representations that would preclude being faithful about which NaN things get decoded/encoded as.
  • However we specify this, I have no real faith that implementers will pay special attention to NaN; I think if we say you're not allowed to pick a different NaN, we'll just end up with a number of "wrong" implementations out there, and the right ones will just have to obey Postel's law, rather than the spec as written.

@mhofman

mhofman commented Jun 26, 2023

If the ocapn spec says that any NaN value when decoded or encoded is canonicalized, then even if some implementation out there does not canonicalize when encoding, receiving from that implementation will be compatible, as long as the receiving implementation is able to decode a non canonical NaN value.

As I mentioned, there is no guarantee in JS that the language engine will not do this canonicalization, so I don't believe any guaranteed preservation through a JS node is in fact possible.
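A sketch of the "canonicalize on encode, accept anything on decode" arrangement described above. The choice of `0x7ff8000000000000` as the canonical pattern, the 8-byte big-endian layout, and both function names are assumptions for illustration, not anything the OCapN spec states.

```javascript
// Assumed canonical pattern: positive quiet NaN with an empty payload.
const CANONICAL_NAN_BITS = 0x7ff8000000000000n;

// Encoder: never relies on the engine's own NaN bits; writes the canonical ones.
function encodeFloat64Canonical(x) {
  const bytes = new Uint8Array(8);
  const view = new DataView(bytes.buffer);
  if (Number.isNaN(x)) {
    view.setBigUint64(0, CANONICAL_NAN_BITS);
  } else {
    view.setFloat64(0, x);
  }
  return bytes;
}

// Decoder: any NaN bit pattern from a non-canonicalizing peer still reads as NaN.
function decodeFloat64Lenient(bytes) {
  return new DataView(bytes.buffer, bytes.byteOffset, 8).getFloat64(0);
}

const fromPeer = Uint8Array.of(0xff, 0xf8, 0, 0, 0, 0, 0, 1); // non-canonical NaN
console.log(Number.isNaN(decodeFloat64Lenient(fromPeer))); // true
```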

@jar398
Contributor Author

jar398 commented Jun 27, 2023

This has been helpful. For me the issue goes to the structure of the OCapN specs, which seem to have a lot of moving parts.

A traditional protocol spec like HTTP is purely syntactic: it says what messages are OK to send, perhaps in the context of previous messages. Any semantics is purely motivational or advisory. It gives a concrete syntax, i.e. no notion of an abstract syntax with 'bindings' to a set of concrete syntaxes. It does not talk about how one programs an endpoint at all - programming language, data model, anything like that.

We seem to be talking about the 'bindings' to various languages and what is supposed to happen to data when that happens (e.g. NaN changes or 0/-0 canonicalization). Those questions just don't come up in an IETF protocol spec. They would be relegated to a language-specific API for using the protocol.

We know the protocol concrete syntax is orthogonal to the language used by the endpoint, since otherwise we could not get interoperation.

I think making these separations might make talking about this issue a little easier. E.g. we could say that floating point in the protocol is IEEE floating point with no substitutions on transmission, even if a particular language binding did not use IEEE floating point, or "modified" it via canonicalization or boxing.

@zenhack
Collaborator

zenhack commented Jun 27, 2023

I think what we're really trying to do is define a set of values and an equivalence relation on that set, and the present questions are:

  1. Should -0 and +0 be in the same equivalence class?
  2. Should all NaN values be in the same equivalence class?

...I actually think that the fact that some specs only specify syntax formally is not necessarily something to emulate. There are counterexamples to this, e.g. The Definition of Standard ML, and if you look far enough back you can find examples of specifications that described even syntax in plain English, which is unthinkable today.

I think what we do/don't decide to formalize ought to be motivated by what will make building interoperable software easiest, which is going to require talking about possible & likely implementations, even if those discussions don't make it into the spec proper.
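The two questions above can be made concrete as a parameterized equivalence predicate. This is a sketch; `sameFloat` and its option name are hypothetical. Question 2 cannot really be answered "no" from inside JS, since the language exposes only one NaN, so the sketch always treats NaNs as equivalent.

```javascript
// Hypothetical equivalence relation over float64 values as seen from JS.
function sameFloat(a, b, { distinguishZeros = true } = {}) {
  // All NaNs fall in one class; JS cannot tell them apart anyway.
  if (Number.isNaN(a) && Number.isNaN(b)) return true;
  // Object.is separates -0 from +0; === merges them.
  return distinguishZeros ? Object.is(a, b) : a === b;
}

console.log(sameFloat(-0, 0)); // false
console.log(sameFloat(-0, 0, { distinguishZeros: false })); // true
console.log(sameFloat(NaN, NaN)); // true
```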

@zenhack
Collaborator

zenhack commented Jun 27, 2023

I guess an operational way to specify this type of requirement is, for the Echo Gc object specified in the test suite, If e.g. the test suite passes in -0 and gets back +0, should it fail?

@erights
Collaborator

erights commented Jun 27, 2023

> I guess an operational way to specify this type of requirement is, for the Echo Gc object specified in the test suite, If e.g. the test suite passes in -0 and gets back +0, should it fail?

If we decide that floating point -0 and 0 are distinct equivalence classes, yes, such a test should fail. The first test for maintenance of equivalence classes should be round trip tests through combinations of concrete representation conversions.
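A sketch of such a round trip test, under the assumption that -0 and 0 are distinct classes and all NaNs form one class (which is what `Object.is` checks). The `echo` functions here are local stand-ins, not the actual Echo Gc object from the test suite, which would be remote and asynchronous.

```javascript
// Equivalence with distinct zeros and a single NaN class (Object.is semantics).
const equivalent = (a, b) => Object.is(a, b);

// Returns the samples whose echoed value left the sample's equivalence class.
function failingEchoes(echo, samples) {
  return samples.filter((x) => !equivalent(x, echo(x)));
}

const samples = [0, -0, 1.5, Number.MIN_VALUE, Infinity, -Infinity, NaN];

// Stand-ins for an endpoint under test.
const faithfulEcho = (x) => x;
const zeroNormalizingEcho = (x) => (x === 0 ? 0 : x);

console.log(failingEchoes(faithfulEcho, samples).length); // 0
console.log(failingEchoes(zeroNormalizingEcho, samples).length); // 1 (the -0 sample)
```

If the group instead decided that the two zeros share a class, swapping `equivalent` for a predicate that merges them would make the second endpoint pass as well.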

@jar398
Contributor Author

jar398 commented Jun 27, 2023

Re specs and syntax, a protocol spec is very different from a programming language spec. IETF has been very successful with its style and I'd be reluctant to innovate. But maybe there is another protocol spec we can emulate.

I don't think what you're saying is that different from what I said. You're saying that the bindings in the various languages should be designed, perhaps even coordinated, to promote interoperation. That is hard to argue with. But it does not preclude saying something stronger at the protocol syntax level, such as that all IEEE floats are expressible 'on the wire', even if some language bindings choose to normalize them on receipt. That way two bindings that don't want to normalize can talk to one another without normalization.

Maybe that's an interoperability risk, and it would be better if those two endpoints represented their IEEE floats as a data type distinct from OCapN floats.

@erights
Collaborator

erights commented Jun 27, 2023

> ...I actually think that the fact that some specs only specify syntax formally is not necessarily something to emulate.

I agree. Or rather, I think we need to think in terms of layers of spec, and dependencies between them. The lowest data layer is the abstract data model, which can equally well be described as an abstract syntax. It defines the equivalence classes. For Agoric, @endo/pass-style must be consistent with that.

Concrete language bindings specs are layered on top of the abstract spec. Hopefully but not necessarily one per endpoint language. For Agoric, this is the remaining responsibility of @endo/pass-style for JavaScript bindings.

Concrete syntax specs (hopefully but not necessarily singular) are also, and separately, layered on top of the abstract spec. For Agoric, @endo/marshal provides bindings for smallcaps and encodePassable concrete syntaxes.

Abstract tagged-interpretation layers are also, and separately, layered on top of the abstract spec. For Agoric, @endo/patterns defines such a higher level interpretation. By virtue of tagged being the extension point defined by the base abstract layer for such extensions, we must assume that there is an open-ended set of such extensions, which are generally ignorant of each other and must co-exist smoothly.

Concrete language-bindings for taggeds are then layered on the abstract tagged interpretation. For Agoric, the concrete APIs provided by @endo/patterns are the JS language binding for the abstract tagged interpretation also represented by the @endo/patterns package.

[Image: slide from the talk linked below]

This is the most relevant slide for this part of the layering. From https://ocapn.org/files/ocapn-layers-orders-ocapn-talk.pdf, from the talk at https://youtu.be/htS-2gvY3Cs?t=722

@erights
Collaborator

erights commented Jun 27, 2023

> Re specs and syntax, a protocol spec is very different from a programming language spec. IETF has been very successful with its style and I'd be reluctant to innovate. But maybe there is another protocol spec we can emulate.

Maybe programming language specs are simply the better precedent to emulate. The advantage of starting with the abstract syntax / abstract data model is that both protocol and language are concrete syntaxes of the same abstract syntax. The whole premise of what we're doing is that we at least have adapters at each endpoint that translate between a concrete language implementation and a concrete protocol implementation. If the equivalence classes do not round trip across such adapters between concrete syntaxes, then there is a bug somewhere.

This same perspective enables concrete language bindings to be coupled to each other by non-protocol means. We would want this case to be equivalent, from the pov of the code in each respective language, to being coupled via the protocol. For the same spec to govern this non-protocol-based interoperation, it cannot fundamentally be a concrete protocol spec.

@erights
Collaborator

erights commented Jun 27, 2023

> I don't think what you're saying is that different from what I said. You're saying that the bindings in the various languages should be designed, perhaps even coordinated, to promote interoperation. That is hard to argue with.

I think I'm saying this, but also something much stronger: That the fundamental spec is the definition of equivalence classes over the abstract data. Promoting interoperation is the point! It is not simply a nice-to-have additional property.

> But it does not preclude saying something stronger at the protocol syntax level, such as that all IEEE floats are expressible 'on the wire', even if some language bindings choose to normalize them on receipt.

This can be accommodated within the abstract-first layering for any given concrete protocol. Let's take the NaN example. A protocol that just transmits the IEEE bit representation will naturally be able to represent all IEEE representable NaNs. But the binding of this concrete syntax to the abstract data model would be to say that all of these represent the one NaN equivalence class. Thus, an adapter, perhaps a membrane that converts between one instance of this concrete syntax and another instance of this concrete syntax, if it substituted one NaN representation for another in the conversion, would still preserve the abstract requirements, and would still pass all round trip tests.

> That way two bindings that don't want to normalize can talk to one another without normalization.

If their correctness depends on not normalizing, then I'd say they depend on a stronger spec than ocapn. Placing the above ocapn-correctness-preserving adapter between them would violate their correctness. The spec they count on can be a refinement of the ocapn spec, in that any correct implementation of their refined spec is necessarily a correct implementation of ocapn. But this hypothetical adapter would be ocapn conformant while violating the refined spec. This is a perfectly sensible layering.

@erights
Collaborator

erights commented Jun 27, 2023

Cross referencing #47 (comment) for the application of this abstract-syntax / equivalence-class / round-trip perspective to the outstanding Unicode string questions.

@jar398
Contributor Author

jar398 commented Jun 28, 2023

@erights I don't agree that the fundamental spec is a definition of equivalence; that doesn't feel right to me, ontologically. I think any spec for anything has to be an actionable definition of conformance of a given artifact to some set of criteria, and an equivalence predicate (or even a 'data model') doesn't provide this on its own. What I'm trying to get my head around now is what kind of artifact we're talking about (an endpoint, I presume, but what exactly constitutes an endpoint) and how conformance is to be tested in general, given that the 'data model' (or equivalence relation etc.) seems to be so important while at the same time the language and protocol details can vary. I'm going to go off and think about this for a while (it's outside the scope of this issue), but I'm happy if others continue to try to develop consensus on this issue. At least I'm starting to understand the non-IEEE (single zero and/or single NaN) position now, which I didn't before.

@erights
Collaborator

erights commented Jun 30, 2023

Just quickly googling for an explanation of NaN boxing, I found WebAssembly/design#1463 (comment) which I like.

@zenhack
Collaborator

zenhack commented Jun 30, 2023

@jar398, I think informally my comment above (#58 (comment)) captures how I think we will test this: fuzz an echo server and make sure the return values obey the equivalence relation. I think the harder question is what does the spec language look like, and that is non-obvious to me as well -- but it does seem like something that's out of scope here, and more generally probably can be figured out independently from agreeing on what that equivalence relation is.

@erights
Collaborator

erights commented Jun 30, 2023

> A traditional protocol spec like HTTP is purely syntactic: it says what messages are OK to send

> actionable definition of conformance of a given artifact

I am trying to understand your position, but I do not yet. What artifact does one test to test conformance to a syntactic protocol spec?

@zenhack
Collaborator

zenhack commented Jun 30, 2023

Let's move the discussions about what a spec should look like over to #71, and keep this issue confined to floats.
