This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Instructions that can be emulated by other instructions #68

Closed
bnjbvr opened this issue Sep 23, 2014 · 5 comments

@bnjbvr
Contributor

bnjbvr commented Sep 23, 2014

I don't follow the rationale behind these 4 functions (SIMD.int32x4.withFlag{X,Y,Z,W}):

  • they don't map to a single instruction on x86, x64, or ARM.
  • their behavior can be emulated with:
    SIMD.int32x4.withFlagX = function(vec, val) {
        // 0 - (!!val | 0) is -1 (all bits set) when val is truthy, 0 otherwise
        return SIMD.int32x4.withX(vec, 0 - (!!val | 0));
    };
    // or (less optimized)
    SIMD.int32x4.withFlagX = function(vec, val) {
        return SIMD.int32x4.withX(vec, val ? 0xFFFFFFFF : 0x0);
    };

Unless there is an obvious use case I am missing, could we delete this set of functions from the spec?
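As a quick sanity check of the claim that the two emulations agree, here is a minimal sketch that models an int32x4 as a plain Int32Array. The `withX` helper and both `withFlagX*` names are hypothetical stand-ins written for this illustration; they are not the real SIMD.int32x4 API, which current engines do not ship.

```javascript
// Hypothetical stand-in for int32x4: an Int32Array of four lanes.
// withX only models "return a copy with lane X replaced".
const withX = (vec, val) => {
  const out = Int32Array.from(vec);
  out[0] = val; // Int32Array stores apply ToInt32, so 0xFFFFFFFF becomes -1
  return out;
};

// The two emulations proposed in the comment above.
const withFlagXArith = (vec, val) => withX(vec, 0 - (!!val | 0));
const withFlagXTernary = (vec, val) => withX(vec, val ? 0xFFFFFFFF : 0x0);

const v = Int32Array.of(7, 8, 9, 10);
for (const val of [true, false, 1, 0, "x", "", null, {}]) {
  const a = withFlagXArith(v, val);
  const b = withFlagXTernary(v, val);
  // Lane X is -1 for truthy inputs and 0 for falsy ones, in both variants.
  console.log(String(val), a[0], b[0]);
}
```

In this model both variants write the same 32-bit pattern into lane X and leave the other lanes alone, which is the sense in which withFlagX adds nothing beyond withX.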

@bnjbvr
Contributor Author

bnjbvr commented Sep 25, 2014

Actually, the issue is broader:

  • SIMD.type.zero() is equivalent to SIMD.type(0, ..., 0). For float32x4 and int32x4 it saves 2 chars, but for float64x2 it wastes 5 chars. If the reason for having it in the spec is that it maps directly to x86 instructions (xorps, xorpd, pxor), shouldn't any good JIT compiler be able to recognize all-zero SIMD literals and generate the same optimized code?
  • SIMD.float32x4.clamp(x, low, high) is equivalent to SIMD.float32x4.min(SIMD.float32x4.max(x, low), high). SIMD implementations tend to use minps and maxps for this one, so what's the rationale behind this operator?
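The clamp equivalence above can be sketched on a scalar model. This assumes a float32x4 is represented by a plain 4-element array and that min and max act lane by lane; `lanewise`, `min`, `max`, and `clamp` below are hypothetical helpers for illustration, not the SIMD.float32x4 functions themselves.

```javascript
// Hypothetical scalar model of float32x4: a plain array of four numbers.
const lanewise = (f) => (a, b) => a.map((x, i) => f(x, b[i]));
const min = lanewise(Math.min);
const max = lanewise(Math.max);

// clamp written as min(max(x, low), high), as in the comment above.
const clamp = (x, low, high) => min(max(x, low), high);

const x = [-2, 0.5, 3, 10];
const low = [0, 0, 0, 0];
const high = [1, 1, 1, 1];
console.log(clamp(x, low, high)); // lanes: 0, 0.5, 1, 1
```

This only shows that the composition gives the expected lanes for ordinary inputs; it says nothing about whether a JIT would fuse the pattern.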

@bnjbvr changed the title from "SIMD.int32x4.withFlagX and such" to "Instructions that can be emulated by other instructions" on Sep 25, 2014
@ghost

ghost commented Sep 25, 2014

If I understand correctly, we have three different cases:

  1. withFlagX - does not have a single target instruction to be optimized to
  2. zero - should be just as optimizable as typex4(0,0,0,0)
  3. clamp - guaranteed optimization, whereas the min(max) pattern could potentially be missed (maybe the user writes it obtusely, maybe GVN breaks it apart accidentally)

My gut feeling is that 3 deserves a builtin function but 1 and 2 don't.

@bnjbvr
Contributor Author

bnjbvr commented Sep 25, 2014

To these 3 instructions, we can add a fourth one: float32x4.scale. It doesn't have any SSE2 counterpart and it can be implemented as SIMD.float32x4.mul(SIMD.float32x4.splat(scalar), vec).
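A minimal sketch of that emulation, again on a plain-array model of float32x4. The helpers `splat`, `mul`, and `scale` are hypothetical stand-ins, and the `(vec, scalar)` argument order for `scale` is an assumption; `Math.fround` is used only to approximate single-precision rounding of each lane.

```javascript
// Hypothetical scalar model of float32x4: a plain array of four numbers.
const splat = (s) => [s, s, s, s];
const mul = (a, b) => a.map((x, i) => Math.fround(x * b[i]));

// scale expressed as mul(splat(scalar), vec), as suggested above.
const scale = (vec, scalar) => mul(splat(scalar), vec);

console.log(scale([1, 2, 3, 4], 0.5)); // lanes: 0.5, 1, 1.5, 2
```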

@bnjbvr
Contributor Author

bnjbvr commented Oct 17, 2014

Along the same lines, do we need:

  • int32x4.bool? It can be emulated with int32x4(0xFFFFFFFF, 0x0, etc.); afaik there is no matching SSE2 instruction.
  • int32x4.flag{X,Y,Z,W}? They can be emulated with (int32x4.X != 0x0); afaict there is no matching SSE2 instruction.
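Both emulations can be sketched with an Int32Array standing in for int32x4. The names `int32x4`, `bool`, and `flagX` below are hypothetical helpers defined for this example only; the sketch assumes a "true" lane means all 32 bits set.

```javascript
// Hypothetical model of int32x4: an Int32Array of four lanes.
// Int32Array stores apply ToInt32, so 0xFFFFFFFF is stored as -1.
const int32x4 = (x, y, z, w) => Int32Array.of(x, y, z, w);

// bool(...) emulated by building the lane masks directly.
const bool = (x, y, z, w) =>
  int32x4(
    x ? 0xFFFFFFFF : 0x0,
    y ? 0xFFFFFFFF : 0x0,
    z ? 0xFFFFFFFF : 0x0,
    w ? 0xFFFFFFFF : 0x0
  );

// flagX emulated by comparing lane X against zero.
const flagX = (v) => v[0] !== 0x0;

const mask = bool(true, false, true, false);
console.log(Array.from(mask)); // lanes: -1, 0, -1, 0
console.log(flagX(mask));      // true
```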

@bnjbvr
Contributor Author

bnjbvr commented Oct 17, 2014

Also, if we keep clamp, how should it behave with respect to NaN, and with -0 / +0?
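To make the question concrete, here is a scalar sketch contrasting two plausible definitions of a min(max(...)) clamp. `clampMath` uses JavaScript's Math.min / Math.max. `clampCmp` uses comparison-based helpers that return the second operand whenever the comparison is false; that is, as far as I understand it, how x86 minps and maxps are documented to treat NaN and equal-zero operands. All four names are hypothetical, and neither variant is claimed to be what the spec does or should do.

```javascript
// Comparison-based min/max: fall through to the second operand when the
// comparison is false, which covers any NaN operand and the -0 / +0 pair.
const cmpMin = (a, b) => (a < b ? a : b);
const cmpMax = (a, b) => (a > b ? a : b);

const clampMath = (x, lo, hi) => Math.min(Math.max(x, lo), hi);
const clampCmp = (x, lo, hi) => cmpMin(cmpMax(x, lo), hi);

// NaN input: Math-based clamp propagates NaN, comparison-based returns lo.
console.log(clampMath(NaN, 0, 1)); // NaN
console.log(clampCmp(NaN, 0, 1));  // 0

// Signed zeros: Math.max treats +0 as larger than -0; the comparison-based
// version just returns whichever zero is the second operand.
console.log(Object.is(clampMath(0, -0, 1), 0)); // true
console.log(Object.is(clampCmp(0, -0, 1), -0)); // true
```

The two definitions agree on ordinary values but diverge on exactly the inputs raised here, so a spec for clamp would have to pick one behavior explicitly.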
