Should binary operations be defined on float types? #177
Comments
I'd be ok restricting them to integer types. We have neg and abs already, so I've lately been pondering proposing adding copySign as well, which would give us the complete IEEE-754 set of quiet computational operations. That would satisfy most of the use cases for floating-point and/or/xor/not that don't need to be done on integers for other reasons anyway.
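For reference, copySign is just a sign-bit splice, which is why it rounds out the set of quiet bit-level operations. Here's a minimal scalar sketch for float32 using a shared buffer; the function name and approach are illustrative, not part of the SIMD.js spec:

```javascript
// Hypothetical scalar copySign for float32: keep the magnitude bits of
// one value and take only the sign bit from the other.
const buf = new ArrayBuffer(8);
const f32 = new Float32Array(buf);
const u32 = new Uint32Array(buf);

function copySignFloat32(magnitude, sign) {
  f32[0] = magnitude;
  f32[1] = sign;
  // Low 31 bits (exponent + mantissa) from |magnitude|, bit 31 from |sign|.
  u32[0] = (u32[0] & 0x7fffffff) | (u32[1] & 0x80000000);
  return f32[0];
}

console.log(copySignFloat32(3.5, -1.0)); // -3.5
```

A SIMD version would do the same splice lane-wise, which is exactly the kind of operation that is awkward to express without either bitops on floats or an explicit bit-cast.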
I was the one requesting the logical operations on float32x4 (see #40). I found them useful when coding up the sinx4 benchmark. Looking at the code now, though, it seems that I ended up not using them after all.
Will any architecture permit an implementation faster than the polyfill? If not, maybe we should just leave it out, and let users compose the functions themselves.
Requiring the developer to explicitly do a bitCast from a float to an int before doing any bit operations seems reasonable. Existing users would just have to do: SIMD.int32x4.fromFloat32x4Bits(input). The JIT will strip away the bit-casting operations (no performance hit) and the code will be more readable.
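The cast-op-cast pattern described here can be sketched without the SIMD types at all, using plain typed arrays over a shared buffer to stand in for the bit casts. A classic use is abs via an AND that clears the sign bit (the function name and array-based lane representation are my own for illustration):

```javascript
// Illustrative stand-in for "bit-cast float32x4 to int32x4, AND, cast back".
// Clearing bit 31 of each lane with 0x7fffffff computes lane-wise abs.
function absFloat32x4(a) { // a: array of 4 numbers
  const ia = new Int32Array(new Float32Array(a).buffer);
  const out = new Int32Array(4);
  for (let i = 0; i < 4; i++) out[i] = ia[i] & 0x7fffffff; // clear sign bit
  return Array.from(new Float32Array(out.buffer));
}

console.log(absFloat32x4([-1.5, 2, -0, -8])); // [ 1.5, 2, 0, 8 ]
```

With explicit casts in the source, a JIT that recognizes the reinterpretation as free can still emit a single masked AND on the machine level.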
Well... I hope the JIT can strip it out, though with the current spec language, it'll have to do the sNaN->qNaN conversion.
Maybe this can be fixed just for casts while leaving it in for stores.
The sNaN->qNaN conversion is dead code, so the compiler can easily strip away the casts.
Good, so I'll just remove the float32 binary operations from the polyfill then?
@littledan We'll discuss it in the next call, but, likely, yes.
Note that performing an and on integer data is not the same, performance-wise, as performing an and on floating-point data. There is generally a one-cycle stall when moving data between execution ports. See https://software.intel.com/en-us/forums/topic/279382 and section 3.5.2.3 in http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html .
@juj Thanks for pointing this out; that sounds like a real issue. Do you know anything about the NEON behavior? Is the same instruction used on float32x4 and float64x2, removing the same cycle stall in both cases? I see a few ways to go from here:
Out of minimalism, I'd personally lean more towards the second or third option, but with more data I could be persuaded that it's best to add them back in. We are already planning on continuing to develop the SIMD spec (with the Universal API and the Long SIMD API), so it would not be so bad to add them into a later spec (and nothing stops implementations from having the functions before they are standardized). |
I think option 3 seems like the best path forward. I doubt this is going to be a performance issue in real world programs, and if I'm wrong, we can always add it back later. |
Don't want to be harsh or anything, but I don't want to play the "please prove this is a problem by finding a real world program which is hurt by this" game anymore. I find that the test cases I provide are rejected as synthetic and theoretical. I don't currently have a codebase at hand that would block on this performance limitation, and I doubt that I should spend many cycles harvesting one. When developing native SSE code, I can routinely observe the latency difference of routing float ops via the int pipe. If the bitops on floats were added, then the spec would be simpler and would allow removing the special-case notes "This operation exists only on integer and boolean SIMD types." here: http://littledan.github.io/simd.html#simd-and . The lack of bitops on floats is already worked around in Emscripten, at https://github.com/juj/emscripten/blob/sse2_emmintrin_h/system/include/emscripten/vector.h#L58 . In general, this seems to boil down to the same as #59.
@juj Optimizing for synthetic micro-benchmarks isn't a good approach. I'm impressed you can observe a fraction of a nanosecond (aka one cycle). I would love to see a real world program that demonstrates that. Anyways, I'm not sure how having an explicit cast at the JavaScript level would stop a motivated JIT from avoiding the cast at the machine level. It's an incredibly simple pattern to match. |
Sounds like we're resolved that we aren't including this. |
It seems that operations like and, or, not, etc are defined in the polyfill on float32x4 but not float64x2. Should they be defined on float types? Seems to me like they should be on both of those or neither. For float32x4 they are just defined as casting bits to int32x4, doing the binary operation, and casting back; seems to me like this would extend pretty cleanly to float64x2.
Naively, I'd suggest restricting these binary operations to integer types, but I don't have much background about use cases.
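To make the "extends pretty cleanly to float64x2" point concrete, here is a sketch of a float64x2 and defined the same way the polyfill defines it for float32x4: bit-cast to integer lanes, AND, bit-cast back. BigInt64Array stands in for a 64-bit integer lane type; the function name and array-based lane representation are mine, not from the polyfill:

```javascript
// Hypothetical float64x2 "and", mirroring the polyfill's float32x4
// definition: reinterpret the two 64-bit lanes as integers, AND them,
// and reinterpret back.
function andFloat64x2(a, b) { // a, b: arrays of 2 numbers
  const ia = new BigInt64Array(new Float64Array(a).buffer);
  const ib = new BigInt64Array(new Float64Array(b).buffer);
  const out = new BigInt64Array(2);
  for (let i = 0; i < 2; i++) out[i] = ia[i] & ib[i];
  return Array.from(new Float64Array(out.buffer));
}

// Example: ANDing a value's bits with themselves is the identity.
console.log(andFloat64x2([3.25, -7.5], [3.25, -7.5])); // [ 3.25, -7.5 ]
```

The only difference from the float32x4 version is the lane width, which supports the observation that the operations should exist on both float types or neither.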