This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Should binary operations be defined on float types? #177

Closed
littledan opened this issue May 27, 2015 · 15 comments

Comments

@littledan
Member

It seems that operations like and, or, not, etc. are defined in the polyfill on float32x4 but not on float64x2. Should they be defined on float types at all? It seems to me they should be on both of those or on neither. For float32x4 they are just defined as casting the bits to int32x4, doing the binary operation, and casting back; it seems to me this would extend pretty cleanly to float64x2.
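For concreteness, here is a minimal sketch of that bit-cast pattern, assuming the polyfill's lowercase SIMD.js API; the float64x2 variant (and its Bits conversions) is the hypothetical extension under discussion, not something the polyfill is confirmed to define:

// float32x4 "and" as the polyfill defines it: reinterpret the 128 bits
// as int32x4, do the integer and, then reinterpret back.
function float32x4And(a, b) {
  return SIMD.float32x4.fromInt32x4Bits(
      SIMD.int32x4.and(SIMD.int32x4.fromFloat32x4Bits(a),
                       SIMD.int32x4.fromFloat32x4Bits(b)));
}

// Hypothetical float64x2 version using the same reinterpretation trick;
// the float64x2 Bits conversions are assumed here for illustration.
function float64x2And(a, b) {
  return SIMD.float64x2.fromInt32x4Bits(
      SIMD.int32x4.and(SIMD.int32x4.fromFloat64x2Bits(a),
                       SIMD.int32x4.fromFloat64x2Bits(b)));
}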

Naively, I'd suggest restricting these binary operations to integer types, but I don't have much background about use cases.

@sunfishcode
Member

I'd be OK with restricting them to integer types. We have neg and abs already, so I've lately been pondering proposing copySign as well, which would give us the complete IEEE-754 set of quiet computational operations. That would satisfy most of the use cases for floating-point and/or/xor/not that don't need to be done on integers for other reasons anyway.
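For illustration, copySign is classically built from exactly these bit operations; here is a sketch on float32x4, assuming the polyfill API (copySign itself is not in the spec, only being pondered above):

// copySign(x, y): combine x's magnitude bits with y's sign bit.
var SIGN_MASK = SIMD.int32x4.splat(0x80000000 | 0);
function copySign(x, y) {
  var xb = SIMD.int32x4.fromFloat32x4Bits(x);
  var yb = SIMD.int32x4.fromFloat32x4Bits(y);
  var bits = SIMD.int32x4.or(
      SIMD.int32x4.and(xb, SIMD.int32x4.not(SIGN_MASK)),
      SIMD.int32x4.and(yb, SIGN_MASK));
  return SIMD.float32x4.fromInt32x4Bits(bits);
}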

@PeterJensen
Collaborator

I was the one requesting the logical operations on float32x4 (see #40). I found them useful when coding up the sinx4 benchmark. Looking at the code now, though, it seems that I ended up not using them after all.

@littledan
Member Author

Will any architecture permit an implementation faster than the polyfill? If not, maybe we should just leave it out, and let users compose the functions themselves.

@johnmccutchan
Collaborator

Requiring the developer to explicitly do a bitCast from a float to an int before doing any bit operations seems reasonable.

Existing users would just have to do:

var bits = SIMD.int32x4.fromFloat32x4Bits(input);
var result = SIMD.int32x4.and(bits, maskBits); // the desired bitwise op; "and" and maskBits are illustrative
result = SIMD.float32x4.fromInt32x4Bits(result);

The JIT will strip away the bit-casting operations (no performance hit), and the code will be more readable.

@littledan
Member Author

Well... I hope the JIT can strip it out, though with the current spec language, it'll have to do the sNaN->qNaN conversion.

@littledan
Member Author

Maybe this can be fixed just for casts while leaving it in for stores.

@johnmccutchan
Collaborator

The sNaN->qNaN conversion is dead, so the compiler can easily strip away the casts.

@littledan
Member Author

Good, so I'll just remove the float32 binary operations from the polyfill then?

@johnmccutchan
Collaborator

@littledan We'll discuss it in the next call, but likely, yes.

@juj

juj commented Jun 20, 2015

Note that performing an and on integer data is not the same as performing an and on floating-point data. There is generally a one-cycle stall involved when moving data between execution domains. See https://software.intel.com/en-us/forums/topic/279382 and section 3.5.2.3 in http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html .
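To make the concern concrete (an illustrative aside, with v and mask as hypothetical values): SSE provides the same bitwise operation in both domains, e.g. andps in the float domain and pand in the integer domain, and the bypass delay is paid when a value produced in one domain is consumed in the other. If the cast-based JavaScript pattern is lowered literally, the engine emits the integer-domain form between float-domain producers and consumers:

// Lowered naively, the "and" below becomes an integer-domain pand
// sandwiched between float-domain producers and consumers, paying the
// bypass delay described above; a native float32x4 and could be an andps.
var masked = SIMD.float32x4.fromInt32x4Bits(
    SIMD.int32x4.and(SIMD.int32x4.fromFloat32x4Bits(v),
                     SIMD.int32x4.fromFloat32x4Bits(mask)));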

@littledan
Member Author

@juj Thanks for pointing this out; that sounds like a real issue. Do you know anything about the NEON behavior? Is the same instruction used on float32x4 and float64x2, avoiding the stall in both cases? I see a few ways to go from here:

  • Add this back in for both float32x4 and float64x2.
  • Don't add it in, and in some implementations, teach the compiler to transform the pattern of casting to an int type, performing an and, and casting back to float into this SSE instruction (a sketch follows this list). Depending on how your compiler is laid out, this could be done in either value numbering or register allocation.
  • Don't add it back in, and don't worry about it in implementations until more data is collected from real programs, using that data to prioritize what to optimize rather than trying to optimize everything where the instruction set offers a possibility.
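A rough sketch of what the second option's pattern match might look like in an engine's IR; every node name and method here is invented for illustration:

// Fold float32x4.fromInt32x4Bits(int32x4.and(toBits(a), toBits(b)))
// into a single float-domain And node so no domain crossing is emitted.
function tryFoldFloatAnd(node) {
  if (node.op !== 'Float32x4FromInt32x4Bits') return false;
  var and32 = node.input(0);
  if (and32.op !== 'Int32x4And') return false;
  var lhs = and32.input(0), rhs = and32.input(1);
  if (lhs.op !== 'Int32x4FromFloat32x4Bits' ||
      rhs.op !== 'Int32x4FromFloat32x4Bits') return false;
  node.replaceWith({ op: 'Float32x4And', inputs: [lhs.input(0), rhs.input(0)] });
  return true;
}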

Out of minimalism, I'd personally lean more towards the second or third option, but with more data I could be persuaded that it's best to add them back in. We are already planning on continuing to develop the SIMD spec (with the Universal API and the Long SIMD API), so it would not be so bad to add them into a later spec (and nothing stops implementations from having the functions before they are standardized).

@johnmccutchan
Collaborator

I think option 3 seems like the best path forward. I doubt this is going to be a performance issue in real world programs, and if I'm wrong, we can always add it back later.

@juj

juj commented Jun 23, 2015

Don't want to be harsh or anything, but I don't want to play the "please prove this is a problem by finding a real-world program that is hurt by it" game anymore. I find that the test cases I provide get rejected as synthetic and theoretical. I don't currently have a codebase at hand that would block on this performance limitation, and I doubt I should spend many cycles harvesting one. When developing native SSE code, I can routinely observe the latency difference of routing float ops through the int pipe.

If the bitops on floats were added, the spec would be simpler, and the special-case note "This operation exists only on integer and boolean SIMD types." could be removed here: http://littledan.github.io/simd.html#simd-and .

The lack of bitops on floats is already worked around in Emscripten, at https://github.com/juj/emscripten/blob/sse2_emmintrin_h/system/include/emscripten/vector.h#L58 . In general, this seems to boil down to the same issue as #59.

@johnmccutchan
Collaborator

@juj Optimizing for synthetic micro-benchmarks isn't a good approach. I'm impressed you can observe a fraction of a nanosecond (a.k.a. one cycle). I would love to see a real-world program that demonstrates that.

Anyways, I'm not sure how having an explicit cast at the JavaScript level would stop a motivated JIT from avoiding the cast at the machine level. It's an incredibly simple pattern to match.

@littledan
Member Author

Sounds like we've resolved that we aren't including this.
