Quaternions #27

Open
dicepd opened this issue Jan 26, 2018 · 66 comments

@dicepd
Collaborator

dicepd commented Jan 26, 2018

What fun this has been so far.

Just to let you know the changes:

Made Mul conform to first-then-second order, as there is a MulFirst (not tested yet) which does the non-math order.
Commented out large parts of CreateEuler while trying to hit the exceptions they guard against, but I have not managed to yet.
Rotations conform to existing rotations.
Euler create gives correct numbers.

The default rpy (roll, pitch, yaw) Create is not done, as we will probably have to decide on a world up for the system first; this is meant to be the quick quat that defaults to world.

@dicepd
Collaborator Author

dicepd commented Jan 26, 2018

I found this about quats
https://blog.molecular-matters.com/2013/05/24/a-faster-quaternion-vector-multiplication/
I may code this up and test it when I have finished the quat tests and added to/from Matrix.

@jdelauney
Owner

It also has some interesting articles. It will not be difficult to implement.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Right, I have got to Slerp, and ordinary Slerp works fine.
However, Slerp with spin is fundamentally flawed from the first line of code onward.
It may give increasing angles, but not in a defined manner.

cost:=VectorAngleCosine(QStart.ImagPart, QEnd.ImagPart);  <--- Imag part of quat holds half angles 

Again, no one has used this in Scene; another one for the bin? I will check it in as is, so you can play with the behaviour to see if I am 'doing it wrong'.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Ok maybe you can scale with quat, something I had not seen before

You don’t seem to find it in writing anywhere, but quaternions support uniform scaling just fine: p’ = q p q* (simplified rotation form)…multiply q by non-zero scalar ‘s’: (qs) p (qs)* = ss (q p q*) = ss p’. So sqrt(s) is a uniform scale factor and these obviously naturally compose.
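
A small Python sketch (not the project's Pascal) of the scaling claim quoted above. It assumes quaternions stored as (x, y, z, w) and a plain Hamilton product; the helper names `qmul`, `conj` and `sandwich` are made up for this illustration. Because the sandwich product multiplies by q twice, q is multiplied by the square root of the wanted scale.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )

def conj(q):
    x, y, z, w = q
    return (-x, -y, -z, w)

def sandwich(q, p):
    """Apply q p q* to the 3D point p; q does not have to be unit length."""
    x, y, z, _ = qmul(qmul(q, (p[0], p[1], p[2], 0.0)), conj(q))
    return (x, y, z)

# Unit quaternion: 90 degrees about Z.
half = math.radians(90.0) / 2.0
q = (0.0, 0.0, math.sin(half), math.cos(half))

scale = 3.0                       # wanted uniform scale factor
k = math.sqrt(scale)              # the sandwich squares the factor applied to q
qs = tuple(k * c for c in q)

rotated = sandwich(q, (1.0, 0.0, 0.0))   # approximately (0, 1, 0)
scaled = sandwich(qs, (1.0, 0.0, 0.0))   # approximately (0, 3, 0)
```

Note that a scaled quaternion is no longer unit length, so any later Normalize call would throw the scale away.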

@jdelauney
Owner

Many things don't work here with Quaternion :

  • "CreateTwoUnitAffine:Sub3 Z failed " expected: <0,7071> but was: <1>

  • "CreateTwoUnitHmg:Sub3 Z failed " expected: <0,7071> but was: <1>

  • "TestOpMul:Sub5 X failed " expected: <-0,5> but was: <0,5>

  • "Normalize:Sub1 X failed " expected: <0,5> but was: <2>

  • "MultiplyAsSecond:Sub10 Y failed " expected: <0,5> but was: <-0,5>

  • "SlerpSingle:Sub3 Z failed " expected: <0,3827> but was: <1>

  • "SlerpSingle:Sub3 Z failed " expected: <0,9239> but was: <0,3536>

  • Quaternion * Quaternion no match(ImagePart.X: -117,29015 ,ImagePart.Y: -18,92155 ,ImagePart.Z: 77,51254 , RealPart.W: 54,67465) --> (ImagePart.X: 133,66414 ,ImagePart.Y: 7,49755 ,ImagePart.Z: -46,91354 , RealPart.W: 54,67465)

  • Quaternion Normalize no match(ImagePart.X: 0,00000 ,ImagePart.Y: 0,00000 ,ImagePart.Z: 0,00000 , RealPart.W: 0,00000) --> (ImagePart.X: 0,00000 ,ImagePart.Y: 0,00000 ,ImagePart.Z: 0,00000 , RealPart.W: 0,00000)

What exactly did you change? Normalize and Mul worked before.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Are you normalising 3 elements or 4? With quats you have to normalise all four, not just the imaginary vector.

"CreateTwoUnitAffine:Sub3 Z failed " expected: <0,7071> but was: <1>

This result would point to that. Quats are 4-dimensional beasts, so they are normalised as such. A quat is not a vector + flag or a vector + length, so if you copied the Vector normalise then this would be the cause.
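
To make the difference concrete, here is a minimal Python sketch (not the project's code). It assumes (x, y, z, w) storage, and `normalize3_only` is a hypothetical stand-in for a copied vector normalise that leaves W alone.

```python
import math

def normalize4(q):
    """Normalise a quaternion (x, y, z, w) over all four components."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def normalize3_only(q):
    """Assumed faulty variant: scales x, y, z by the 3D length and ignores w."""
    x, y, z, w = q
    n = math.sqrt(x * x + y * y + z * z)
    return (x / n, y / n, z / n, w)

q = (0.0, 0.0, 1.0, 1.0)        # unnormalised 90 degree rotation about Z
good = normalize4(q)            # z is about 0.7071
bad = normalize3_only(q)        # z comes out as 1
```

The 0.7071 versus 1 pair is the same shape as the Z failures listed earlier in the thread, which is consistent with the explanation above but does not prove it.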

@jdelauney
Owner

I'm doing nothing, I just ran the tests.

@jdelauney
Owner

I found an article in French, http://mecaspa.cannes-aero-patrimoine.net/SCAO/QUATERN/complements/som_quat.htm#note_calcul_perso but I don't understand all of it well yet. I need a little more time.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

OK, I just looked at the win64 SSE and the above is exactly the reason for most of the errors.
The only error in win64 native is Slerp with spin.

@jdelauney
Owner

By commenting out andps xmm2, [RIP+cSSE_MASK_NO_W] in Normalize, the SlerpSingle func now works, but not the others.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Now on to the gimbal lock code in Euler Create. What I think has happened here is that the code in the sub function was written first; then someone read about gimbal lock and decided to 'protect' the original code with some code found somewhere, not realising that the original code was already gimbal lock safe. It creates 3 quats and returns the product of the three, which is inherently safe from gimbal lock because it combines quats, not Euler angles.
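
A rough Python sketch of the "three quats, one product" construction described above. The (x, y, z, w) layout, the X-then-Y-then-Z order and the names `axis_quat` and `from_euler` are assumptions for illustration, not the actual CreateEuler code.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )

def axis_quat(angle_deg, axis):
    """Unit quaternion for a rotation of angle_deg about a unit axis."""
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))

def from_euler(x_deg, y_deg, z_deg):
    qx = axis_quat(x_deg, (1.0, 0.0, 0.0))
    qy = axis_quat(y_deg, (0.0, 1.0, 0.0))
    qz = axis_quat(z_deg, (0.0, 0.0, 1.0))
    # Assumed order: X is applied first, then Y, then Z.
    return qmul(qmul(qz, qy), qx)

def norm(q):
    return math.sqrt(sum(c * c for c in q))

q = from_euler(10.0, 90.0, 20.0)   # pitch of 90 degrees, the classic gimbal lock pose
z90 = from_euler(0.0, 0.0, 90.0)   # approximately (0, 0, 0.7071, 0.7071)
```

The construction only uses sines, cosines and products, so nothing in it becomes undefined at a pitch of 90 degrees; the ambiguity only appears when converting back to Euler angles.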

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

maybe this

Made mul conform to first second as there is a MulAsSecond (not tested yet) which does non Math order.

This cured a lot of ills in the Pascal, so the tests reflect what it naturally broke. Try swapping the A and B registers.
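
Why operand order matters here, as a short Python sketch with an assumed (x, y, z, w) layout: the quaternion product is not commutative, and swapping the operands flips the sign of part of the result.

```python
import math

def qmul(a, b):
    """Hamilton product a * b of two quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )

k = math.sqrt(0.5)
a = (k, 0.0, 0.0, k)    # 90 degrees about X
b = (0.0, k, 0.0, k)    # 90 degrees about Y

ab = qmul(a, b)         # approximately (0.5, 0.5,  0.5, 0.5)
ba = qmul(b, a)         # approximately (0.5, 0.5, -0.5, 0.5)
```

A 0.5 that turns into -0.5 is the kind of mismatch the TestOpMul and MultiplyAsSecond failures show, which fits the suggestion to swap the A and B registers.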

@jdelauney
Owner

jdelauney commented Jan 27, 2018

Try swapping A and B registers

In which function? I'm lost: Normalize or Mul?

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

I have checked in the fixes

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Normalize could do with reworking and simplifying but it does work

@jdelauney
Owner

jdelauney commented Jan 27, 2018

OK, I'll compare with the old code; I don't see what you're doing. So now there is only 1 failure:

  • "SlerpSingle:Sub3 Z failed " expected: <0,9239> but was: <0,3536>

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

That one is to be expected; that is the routine that makes no sense to me.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Thinking ahead a bit while I finish Quat and Matrix: what functionality are we missing from frustum? That, I think, is the last piece of the jigsaw for a simple software world line render. We have plane and the ability to test for in plane. Do we need a create from Quat/EyeVector or similar?

@jdelauney
Owner

jdelauney commented Jan 27, 2018

Yes, you're right, some functions are missing for a 3D world render, but not many. After learning a few things about quaternions, I think it would be good to add RotateWithQuaternion functions to TGLZVectorHelper. From what I read, we would have several advantages in using quaternions for rotations rather than matrices.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

The one issue I had with quat was needing to scale.
Take boidz as an example with a simple 3D mesh.
Place the master boid at the center of the frustum at a suitable scale.
For each boid:
work out where the center of boid[i] is in relation to the master boid
locally orient the boid
work out the scale, near to far
add boid orient + boid scale + boid pos to a matrix and apply

Without quats that is a lot of Mat manipulation for each item.
With quats we have delta orient, and if scale works then a quat move and scale from two vectors: the center of the viewplane (fixed for all) and the point where the boid center projects onto the viewplane. So two quat creates and two quat mul arrays. No idea if this is more efficient for low poly, but it may be; for high poly maybe not.
But then it may be more efficient to create the two quats, convert to Mat and add the two mats.

@jdelauney
Owner

jdelauney commented Jan 27, 2018

I found this for Unity (it's in French): https://www.youtube.com/watch?v=EWQCsjp3NvY and it describes very well how Unity works with quaternions and why.
I understand quaternions much better with this, and can make the link with the maths formulas.
In terms of performance, computing with quaternions is less expensive than with matrices (nearly half as many operations, according to some info I found).
I also came across that and this in an article. What do you think about the quaternion structure and functions?

@jdelauney
Owner

I'm trying to fix slerpSpin

  • "SlerpSingle:Sub3 Z failed :
    (ImagePart.X: 0,00000 ,ImagePart.Y: 0,00000 ,ImagePart.Z: -0,38268 , RealPart.W: -0,92388)" expected: <0,9239> but was: <-0,3827>

As we can see, the result is in W.

@jdelauney
Owner

OK, by accessing the value with aq4.Realpart instead of aq4.Z, I get:

  • "SlerpSingle:Sub3 Z failed : (ImagePart.X: 0,00000 ,ImagePart.Y: 0,00000 ,ImagePart.Z: -0,38268 , RealPart.W: -0,92388)" expected: <0,9239> but was: <-0,9239>

@jdelauney
Owner

jdelauney commented Jan 27, 2018

After some minor changes, I now have:

  • "SlerpSingle:Sub7 Z failed (ImagePart.X: 0,00000 ,ImagePart.Y: 0,00000 ,ImagePart.Z: 1,00000 , RealPart.W: 0,00000)" expected: <0,766> but was: <1>

Have you swapped the component results in the functional test, or is something wrong?

@jdelauney
Owner

jdelauney commented Jan 27, 2018

After some minor changes, the problem always comes from Z:

  • "SlerpSingle:Sub7 Z failed (ImagePart.X: 0,00000 ,ImagePart.Y: 0,00000 ,ImagePart.Z: -0,17365 , RealPart.W: 0,98481)" expected: <0,766> but was: <-0,1737>

Now the results don't match. It's a pain.

@jdelauney
Owner

If I normalize the result I get:

  • "SlerpSingle:Sub4 Z failed " expected: <-0,3827> but was: <0,3827>

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Sorry, just back from shopping. The tests may not be correct, as I had no idea if spin was 180 or 360; that's what I was trying to determine when nothing worked in a sane manner.

@jdelauney
Owner

Now if I negate result.Z (and I renamed SlerpSingle to SlerpSpin in the test, to avoid confusion):

  • "SlerpSpin:Sub7 Z failed (ImagePart.X: 0,00000 ,ImagePart.Y: 0,00000 ,ImagePart.Z: -0,17365 , RealPart.W: -0,98481)" expected: <0,766> but was: <-0,1736>

I don't understand why the result is wrong at this stage.

This is the change I've made for testing:

function TGLZQuaternion.Slerp(const QEnd: TGLZQuaternion; Spin: Integer; t: Single): TGLZQuaternion;
var
   to1: array[0..4] of Single;
   phi,omega, cosom, sinom, scale0, scale1: Extended;
// t goes from 0 to 1
// absolute rotations
begin
   // calc cosine
   cosom:= Self.ImagePart.X*QEnd.ImagePart.X
          +Self.ImagePart.Y*QEnd.ImagePart.Y
          +Self.ImagePart.Z*QEnd.ImagePart.Z
          +Self.RealPart*QEnd.RealPart;
   // adjust signs (if necessary)
   if cosom<0 then
   begin
      cosom := -cosom;
      to1[0] := - QEnd.ImagePart.X;
      to1[1] := - QEnd.ImagePart.Y;
      to1[2] := - QEnd.ImagePart.Z;
      to1[3] := - QEnd.RealPart;
   end
   else
   begin
      to1[0] := QEnd.ImagePart.X;
      to1[1] := QEnd.ImagePart.Y;
      to1[2] := QEnd.ImagePart.Z;
      to1[3] := QEnd.RealPart;
   end;
   // calculate coefficients
   if ((1.0-cosom)>cEpsilon) then
   begin
      //Cosom := Cosom
      omega:=GLZMath.ArcCos(cosom);
      Omega :=(Omega + DegToRadian(Spin*360));
      sinom:=1/Sin(omega);
      phi:=omega;// + Spin * cPi;
      scale0:=sin(omega - t * phi) * sinom;
      scale1:=sin(t * phi) * sinom;
      //scale0:=Sin((1.0-t)*omega)*sinom;
      //scale1:=Sin(t*omega)*sinom;
   end
   else  // "from" and "to" quaternions are very close
   begin
      //  ... so we can do a linear interpolation
      scale0:=1.0-t;
      scale1:=t;
   end;
   // calculate final values
   Result.ImagePart.V[0] := scale0 * Self.ImagePart.V[0] + scale1 * to1[0];
   Result.ImagePart.V[1] := scale0 * Self.ImagePart.V[1] + scale1 * to1[1];
   Result.ImagePart.V[2] := -(scale0 * Self.ImagePart.V[2] + scale1 * to1[2]); // Don't understand why we need negate here
   Result.RealPart := (scale0 * Self.RealPart + scale1 * to1[3]);
   Result.Normalize;
end; 

@jdelauney
Owner

sorry just back from shopping

No problem, Peter :) 👍

@jdelauney
Owner

OK, I changed Omega :=(Omega + DegToRadian(Spin*360)); to Omega :=(Omega + DegToRadian(Spin*180)); which removes the need to negate Z, but now it fails on the sign of W :'(

  • "SlerpSpin:Sub3 W failed : (ImagePart.X: 0,00000 ,ImagePart.Y: 0,00000 ,ImagePart.Z: -0,38268 , RealPart.W: -0,92388)" expected: <0,9239> but was: <-0,9239>

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

(720 - 90) / 2 = 315, so perhaps we are doubling the spin rotation and negating the initial rotation. This is still from the unchanged code you posted; I will plug some more numbers in to test.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

OK, that is it. Changing 90 to 20 we get 720 - 20 = 700, then 700 / 2 = 350, and I get an Euler answer of -10.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

So we need to halve the spin value and negate the initial rotation somehow.

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Quat to Mat4 done and checked in. Now for Mat4 to Quat.

@jdelauney
Owner

jdelauney commented Jan 27, 2018

I still don't understand SlerpSpin.

I don't understand "(720 - 90) / 2 = 315" = ((360*2) - 90) / 2

Why 360? Our destination is at 90°.

I don't understand SlerpSpin in the same way as you.

Why, in

if ((1.0-cosom)>cEpsilon) then
   begin
      //Cosom := Cosom
      omega:=GLZMath.ArcCos(cosom); 

is RadiansToDeg(Omega) = 45 and not 90, like the initial aqt2.Create(90,ZVector)?

At first I understood SlerpSpin like this:

  • we are at A(x:0,y:0,z:0,w:1)
  • we want to go to position B by a rotation of 90° twice (Spin = 2) about the Z axis,
  • we are here at the halfway point (t = 0.5)

so normally we should have made a rotation of 90° = (90° * 2) * 0.5, or something escapes me.

And for Matrix test

  • "ConvertToMatrix:Sub1 m33 failed " expected: <0,8936> but was: <1,0026>
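
On the question of why Omega comes out as 45 and not 90: a minimal Python sketch, assuming (x, y, z, w) storage and a unit axis. The dot product of two unit quaternions is the cosine of half the rotation angle between them, because each quaternion stores the sine and cosine of half its angle.

```python
import math

def from_angle_axis(angle_deg, axis):
    """Unit quaternion (x, y, z, w) for angle_deg about a unit axis."""
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))

z_axis = (0.0, 0.0, 1.0)
q_start = from_angle_axis(0.0, z_axis)     # identity rotation
q_end = from_angle_axis(90.0, z_axis)

cosom = sum(a * b for a, b in zip(q_start, q_end))
omega_deg = math.degrees(math.acos(cosom))  # about 45
rotation_deg = 2.0 * omega_deg              # about 90, the actual rotation
```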

@jdelauney
Owner

If this is the case, an angular lerp is needed here.

@jdelauney
Owner

jdelauney commented Jan 27, 2018

OK, by doing it the way I think it should work, SlerpSpin is fine.

First change:

procedure TGLZQuaternion.Create(const angle  : Single; const axis : TGLZAffineVector);
//procedure TGLZQuaternion.Create(const angle  : Single; const axis : TGLZVector);
var
   f, s, c : Single;
   vaxis : TGLZVector;
begin
   GLZMath.SinCos(DegToRadian(angle*cOneHalf), s, c); //--> Angle div by 2 here
   Self.RealPart:=c;
   vaxis.AsVector3f := axis;
   vAxis.w :=1; // -----> Need set as affine
   f:=s/vAxis.Length;
   Self.ImagePart.V[0]:=axis.V[0]*f;
   Self.ImagePart.V[1]:=axis.V[1]*f;
   Self.ImagePart.V[2]:=axis.V[2]*f;
end; 

then:

function TGLZQuaternion.Slerp(const QEnd: TGLZQuaternion; Spin: Integer; t: Single): TGLZQuaternion;
begin
   Result := Self.Slerp(QEnd, t*Spin);
end;

and finally:

procedure TQuaternionFunctionalTestCase.TestSlerpSpin;
begin
//   aqt1.AsVector4f := WHmgVector;  // null rotation as start point.
   aqt1.create(1e-14,ZVector);  // null rotation as start point.
   aqt2.Create(90,ZVector); // 90  = 90
   aqt4 := aqt1.Slerp(aqt2, 2, 0.5); //  90 [ 0, 0, 0.7071068, 0.7071068 ]
   AssertEquals('SlerpSpin:Sub1 X failed ', 0.0, aqt4.X);
   AssertEquals('SlerpSpin:Sub2 Y failed ', 0.0, aqt4.Y);
   AssertEquals('SlerpSpin:Sub3 Z failed ', 0.7071068, aqt4.Z);
   AssertEquals('SlerpSpin:Sub4 W failed ', 0.7071068, aqt4.W);
   aqt4 := aqt1.Slerp(aqt2,2,2/9); // 40  [ 0, 0, 0.3420185, 0.9396932 ]
   AssertEquals('SlerpSpin:Sub5 X failed ', 0.0, aqt4.X);
   AssertEquals('SlerpSpin:Sub6 Y failed ', 0.0, aqt4.Y);
   AssertEquals('SlerpSpin:Sub7 Z failed ',  0.3420185, aqt4.Z);
   AssertEquals('SlerpSpin:Sub8 W failed ', 0.9396932, aqt4.W);
   aqt4 := aqt1.Slerp(aqt2,2,8/9); // 160  [ 0, 0, 0.9848078, 0.1736482 ]
   AssertEquals('SlerpSpin:Sub9 X failed ',   0.0, aqt4.X);
   AssertEquals('SlerpSpin:Sub10 Y failed ',  0.0, aqt4.Y);
   AssertEquals('SlerpSpin:Sub11 Z failed ', 0.9848078, aqt4.Z);
   AssertEquals('SlerpSpin:Sub12 W failed ', 0.1736482, aqt4.W);
end;

Is this the behaviour you want?

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

OK, I just did the last two parts and it works. You have changed spin to mean rotation * spin, which is fine by me. The previous code looked like it was adding some multiple of pi.

vAxis.w :=1; // -----> Need set as affine

This has no effect, with it or without it. Best not to put it in, as it may make us add some assembler code where it is not required.
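
The agreed meaning of spin (plain Slerp evaluated at t * spin) can be checked with a short Python sketch. It assumes (x, y, z, w) storage and a textbook slerp; `slerp_spin` is a hypothetical name mirroring the Pascal overload, not a port of it.

```python
import math

def from_angle_axis(angle_deg, axis):
    """Unit quaternion (x, y, z, w) for angle_deg about a unit axis."""
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))

def slerp(q0, q1, t):
    """Textbook slerp; t outside 0..1 extrapolates along the same arc."""
    cosom = sum(a * b for a, b in zip(q0, q1))
    if cosom < 0.0:                       # take the shorter arc
        cosom = -cosom
        q1 = tuple(-c for c in q1)
    if 1.0 - cosom > 1e-6:
        omega = math.acos(cosom)
        sinom = math.sin(omega)
        s0 = math.sin((1.0 - t) * omega) / sinom
        s1 = math.sin(t * omega) / sinom
    else:                                 # nearly identical: linear blend
        s0, s1 = 1.0 - t, t
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

def slerp_spin(q0, q1, spin, t):
    return slerp(q0, q1, t * spin)

z_axis = (0.0, 0.0, 1.0)
q0 = from_angle_axis(0.0, z_axis)
q1 = from_angle_axis(90.0, z_axis)

r90 = slerp_spin(q0, q1, 2, 0.5)          # 90 degrees:  z, w about 0.7071, 0.7071
r40 = slerp_spin(q0, q1, 2, 2.0 / 9.0)    # 40 degrees:  z, w about 0.3420, 0.9397
r160 = slerp_spin(q0, q1, 2, 8.0 / 9.0)   # 160 degrees: z, w about 0.9848, 0.1736
```

These agree with the expected values in TestSlerpSpin to about five decimal places. With larger spin values the effective t grows well past 1, so the 'spirals' rely entirely on the sine terms wrapping around.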

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

And for Matrix test
"ConvertToMatrix:Sub1 m33 failed " expected: <0,8936> but was: <1,0026>

Fixed

@dicepd
Collaborator Author

dicepd commented Jan 27, 2018

Thinking about it, I need to test a large number of 'spirals' and see what the limits are, if any.

@jdelauney
Owner

If it is not working, just put it in the bin; it's not really necessary. If you see other functions like this, don't hesitate to trash them.

@dicepd
Collaborator Author

dicepd commented Jan 28, 2018

Have you seen these
quad_tess4x3
tri_tess3

patterns before for creating tri-strips without degenerates?

@jdelauney
Owner

jdelauney commented Jan 28, 2018

I see the problem, but I do not understand what you are referring to. Can you be more precise?

@dicepd
Collaborator Author

dicepd commented Jan 28, 2018

None of the grids I have come across in GLScene has a single tri strip for a rectangular mesh. I was looking for such an algo myself, but all I ever found before were algos that created degenerate tris at the end of rows. With those grid patterns I now see a way of creating a tri strip with no degenerate triangles in it.
When I originally looked I got too hung up on normals and winding, and completely forgot I was going to do my own norms anyway and these were to be in a VBO. I just wondered if you had come across this winding scheme before.

@jdelauney
Owner

jdelauney commented Jan 28, 2018

OK, I understand now; it's not a problem as I had supposed. We can use this scheme for terrain rendering (not implemented for terrain in GLScene, I believe). I already tried this from an article a long time ago, but never finished. This scheme can also be used with a tessellation geometry shader, if I remember correctly.

@dicepd
Collaborator Author

dicepd commented Jan 29, 2018

I presume you have seen this, this looks like the 'bible' on shuffles as it was written by Peter Cordes, a name that comes up a lot when looking at SSE et al.
I am just digesting it all atm and re reading.
https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86

@dicepd
Collaborator Author

dicepd commented Jan 29, 2018

OK, I have just applied those to quat normalise. It's not the only optimisation, I'm afraid, so no test of the code there yet. But I have taken Quat.Normalize from a SF of 1.9 to a SF of 2.88.
Comments are in the code until you pick it up, then delete the block in unix64. Will check in in a few mins. I have to sort out epsilons for SSE.

@dicepd
Collaborator Author

dicepd commented Jan 29, 2018

The best test for the article above will be Magnitude of Quat.

@dicepd
Collaborator Author

dicepd commented Jan 29, 2018

Ok I have tested Magnitude. Here are the speed up results from that testing

  • base before opts 0.939547, slightly slower than Pascal
  • add nostackframe 1.311267
  • use Peter Cordes code 1.762932

This is for SSE 3

@jdelauney
Owner

I presume you have seen this, this looks like the 'bible' on shuffles as it was written by Peter Cordes, a name that comes up a lot when looking at SSE et al.
I am just digesting it all atm and re reading.
https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86

Yes, but not the same thread :) If you look in the win64 SSE you can see some horizontal sum tests surrounded by {$ifdef TEST}..{$endif}

For me the most performant hsum is (SSE3):

   movshdup xmm1, xmm0
   addps    xmm0, xmm1
   movhlps  xmm1, xmm0
   addss    xmm0, xmm1  
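
To follow what the four instructions do lane by lane, here is a small Python emulation. It is only a model of the data movement (lists stand in for XMM registers, index 0 is the lowest lane) and says nothing about speed.

```python
def movshdup(src):
    """MOVSHDUP: duplicate the odd lanes, giving [s1, s1, s3, s3]."""
    return [src[1], src[1], src[3], src[3]]

def addps(dst, src):
    """ADDPS: add all four lanes."""
    return [d + s for d, s in zip(dst, src)]

def movhlps(dst, src):
    """MOVHLPS: copy the high two lanes of src into the low two lanes of dst."""
    return [src[2], src[3], dst[2], dst[3]]

def addss(dst, src):
    """ADDSS: add lane 0 only, the other lanes of dst are kept."""
    return [dst[0] + src[0], dst[1], dst[2], dst[3]]

def hsum_sse3(v):
    xmm0 = list(v)
    xmm1 = movshdup(xmm0)         # [v1, v1, v3, v3]
    xmm0 = addps(xmm0, xmm1)      # [v0+v1, 2*v1, v2+v3, 2*v3]
    xmm1 = movhlps(xmm1, xmm0)    # lane 0 is now v2+v3
    xmm0 = addss(xmm0, xmm1)      # lane 0 is v0+v1+v2+v3
    return xmm0[0]

total = hsum_sse3([1.0, 2.0, 3.0, 4.0])   # 10.0
```

For Magnitude the input would be the squared components, and the square root of lane 0 is taken afterwards.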

@jdelauney
Owner

jdelauney commented Jan 29, 2018

The code I wrote above is for Magnitude. Strangely, with Normalize this:

  pshufd  xmm1, xmm2, $0E
  addps   xmm2, xmm1
  pshufd  xmm1, xmm2, $01
  addss   xmm2, xmm1  

is better
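
The immediates $0E and $01 are easier to read once decoded. A Python model of the lane selection (index 0 is the lowest lane; the function names are just for this sketch):

```python
def pshufd(src, imm8):
    """PSHUFD: lane i of the result is src[(imm8 >> (2 * i)) & 3]."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]

def hsum_pshufd(v):
    xmm2 = list(v)
    xmm1 = pshufd(xmm2, 0x0E)                    # [v2, v3, v0, v0]
    xmm2 = [a + b for a, b in zip(xmm2, xmm1)]   # addps: lanes 0, 1 hold v0+v2, v1+v3
    xmm1 = pshufd(xmm2, 0x01)                    # lane 0 now holds v1+v3
    xmm2[0] += xmm1[0]                           # addss: only lane 0 is updated
    return xmm2[0]

picked = pshufd([10, 11, 12, 13], 0x0E)          # [12, 13, 10, 10]
total = hsum_pshufd([1.0, 2.0, 3.0, 4.0])        # 10.0
```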

@jdelauney
Owner

I also ran some small tests with TGLZVector4f Length and Normalize.

In Normalize

//haddps  xmm2, xmm2
  addss   xmm1, xmm2         //    |z^2+x^2*|1|1|
  shufps  xmm2, xmm2, 01010101b 

is better (around 4%)

in length :

        //haddps xmm0, xmm0
        //haddps xmm0, xmm0
        movshdup    xmm1, xmm0
        addps       xmm0, xmm1
        movhlps     xmm1, xmm0
        addss       xmm0, xmm1     

is better (around 2%)

and with this :

        pshufd  xmm1, xmm0, $0E
        addps   xmm0, xmm1
        pshufd  xmm1, xmm0, $01
        addss   xmm0, xmm1 

is better around 1%

@dicepd
Collaborator Author

dicepd commented Jan 29, 2018

Lots of warnings about using pshufd when processing packed singles. It does not affect all processors, but on those it does, it hurts.

On old CPUs with slow shuffles:

  • movhlps (Merom: 1uop) is significantly faster than shufps (Merom: 3uops). On Pentium-M, cheaper than movaps. Also, it runs in the FP domain on Core2, avoiding the bypass delays from other shuffles.
  • unpcklpd is faster than unpcklps.
  • pshufd is slow, pshuflw/pshufhw are fast (because they only shuffle a 64bit half)
  • pshufb mm0 (MMX) is fast, pshufb xmm0 is slow.
  • haddps is very slow (6uops on Merom and Pentium M)
  • movshdup (Merom: 1uop) is interesting: It's the only 1uop insn that shuffles within 64b elements.

@dicepd
Collaborator Author

dicepd commented Jan 29, 2018

One other item that came up in that was using ss where possible so it does not use the other gates. I suppose when you are multithreading and hammering all processors that the heat generated forces the CPU to throttle its clock speed to stay within its working thermal envelope.

@jdelauney
Owner

In Distance with :

          movaps xmm0,[RCX]
          movaps  xmm1, [A]
          subps   xmm0, xmm1
          andps xmm0, [RIP+cSSE_MASK_NO_W]
          mulps   xmm0, xmm0
          //haddps xmm0, xmm0
          //haddps xmm0, xmm0
          // Instead of haddps :
          movshdup    xmm1, xmm0
          addps       xmm0, xmm1
          movhlps     xmm1, xmm0
          addss       xmm0, xmm1 

is better by around 7.8%

With :

        movq xmm0, [RCX]         // move 64 bits and clear top  x,y,0,0   ** Not working on Win10 64bit
        movq xmm1, [A]           // move 64 bits and clear top  x1,y1,0,0
        subps xmm0, xmm1   // x-x1,y-y1,0,0
        mulps xmm0, xmm0   // (x-x1)^2,(y-y1)^2,0,0
        movss xmm1, [rcx]8      // z,0,0,0
        movss xmm2, [A]8        //z1,0,0,0
        subps  xmm1, xmm2   //z-z1,0,0,0
        mulps  xmm1, xmm1   //(z-z1)^2,0,0,0
        addps  xmm0, xmm1   //(x-x1)^2+(z-z1)^2, (y-y1)^2, 0, 0
        haddps xmm0, xmm0  //(x-x1)^2+(z-z1)^2 + (y-y1)^2, 0, 0
        sqrtss xmm0, xmm0  

this is better by around 9%

and replacing haddps xmm0, xmm0 //(x-x1)^2+(z-z1)^2 + (y-y1)^2, 0, 0
by

   movshdup  xmm1, xmm0
   addss  xmm0, xmm1  

is better by around 13.5% 👍
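
A lane-by-lane Python model of this last Distance variant, to show why a single movshdup plus addss is enough once Z has been folded into lane 0. Lists stand in for XMM registers and `distance3` is a made-up name for the sketch.

```python
import math

def distance3(a, b):
    """3D distance between a and b; any fourth (W) component is never loaded."""
    xmm0 = [a[0] - b[0], a[1] - b[1], 0.0, 0.0]   # movq loads x, y; subps
    xmm0 = [c * c for c in xmm0]                  # mulps: dx^2, dy^2, 0, 0
    dz = a[2] - b[2]                              # movss loads z; subps
    xmm1 = [dz * dz, 0.0, 0.0, 0.0]               # mulps: dz^2, 0, 0, 0
    xmm0 = [p + q for p, q in zip(xmm0, xmm1)]    # addps: dx^2+dz^2, dy^2, 0, 0
    xmm1 = [xmm0[1], xmm0[1], xmm0[3], xmm0[3]]   # movshdup: dy^2 moves to lane 0
    xmm0[0] += xmm1[0]                            # addss: dx^2+dz^2+dy^2 in lane 0
    return math.sqrt(xmm0[0])                     # sqrtss

d = distance3((1.0, 2.0, 3.0, 99.0), (4.0, 6.0, 3.0, -5.0))   # 5.0
```

Since W is never loaded, no mask such as cSSE_MASK_NO_W is needed in this variant.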

@jdelauney
Owner

lots of warnings about using pshufd

Yes, as I read it, many recommend using 'Shufps' instead

@jdelauney
Owner

In Normalize, replacing
shufps xmm2, xmm2, 01010101b by movshdup xmm2, xmm2
makes no big difference; shufps is better by nearly 0.3%.

@jdelauney
Owner

In AngleCosine, by replacing all the haddps the gain is around 42% instead of 21%.

@jdelauney
Owner

Hi Peter, I hope you are fine. I haven't had much time for the last few days. I have some small problems I need to solve, so I'm finding a little time to code and clear my head. Following this thread:
https://sourceforge.net/p/glscene/discussion/93606/thread/95f2322b/
I've made some corrections and added ConvertToEuler and ConvertToAngleAxis functions.
To do this I've introduced a simple new record, TGLZEulerAngles.
ConvertToEuler seems to work, but I have a doubt about singularities like those described in the links in the comments.
I've also added a Docrefs folder with some technical papers.

See you soon
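
On the singularity doubt, one common way to handle it, sketched in Python under stated assumptions: (x, y, z, w) storage, a Z-Y-X (yaw, pitch, roll) convention, and roll pinned to 0 when pitch reaches ±90°. None of this is taken from the actual ConvertToEuler code, and `to_euler_deg` is a hypothetical name.

```python
import math

def to_euler_deg(q, eps=1e-9):
    """Unit quaternion (x, y, z, w) -> (roll, pitch, yaw) in degrees, Z-Y-X order assumed."""
    x, y, z, w = q
    sinp = 2.0 * (w * y - z * x)
    sinp = max(-1.0, min(1.0, sinp))      # rounding can push this just past +-1
    pitch = math.asin(sinp)
    if abs(sinp) > 1.0 - eps:
        # Singularity: roll and yaw describe the same axis, so pin roll to 0
        # and put the whole remaining rotation into yaw.
        roll = 0.0
        yaw = 2.0 * math.atan2(z, w)
    else:
        roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
        yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return (math.degrees(roll), math.degrees(pitch), math.degrees(yaw))

k = math.sqrt(0.5)
plain = to_euler_deg((0.0, 0.0, k, k))    # 90 degrees about Z: about (0, 0, 90)

# Yaw 30, pitch 90, roll 0 built by hand from the half-angle formulas.
c15, s15 = math.cos(math.radians(15.0)), math.sin(math.radians(15.0))
locked = to_euler_deg((-k * s15, k * c15, k * s15, k * c15))   # about (0, 90, 30)
```

Without the clamp and the special branch, tiny rounding errors at pitch ±90° can make asin fail or make atan2 return an arbitrary angle.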
