Perhaps switch to pointers for matrices and vectors #29

Closed
UserAB1236872 opened this issue Aug 22, 2014 · 10 comments

@UserAB1236872
Member

I got an email recently with some benchmarks. The long and short is: once you get to Mat4 size, it is a significant speed improvement (about 13 ns/op for multiplication) to use a pointer for Mat4. This is a significant API change and would break a lot of existing code. Doing

m := mgl32.Ident4()
m[0] = 5

would break. Any code that currently uses any of At and Set for matrices, or X, Y, Z, or W for vectors, is safe (SetX and the like would likely need to be added for vectors).

The way I see it, there are two potential solutions here:

  1. Redo everything to use pointers. This, as mentioned, breaks existing code.
  2. Add functions like func (m *Mat4) MulTo(dst *Mat4, m2 *Mat4) which would have similar semantics to *MatMN.MulMN(dst, m2).

I like option 2, since there's no compatibility breakage, but it does clutter the API, and people may mistakenly use the slower code without realizing it.

Thoughts?

@UserAB1236872
Member Author

Another snag with wholesale switching to pointers is the Quaternion type. Right now it's

type Quat struct {
    W float32
    V Vec3
}

Should this be changed to

type Quat struct {
    W float32
    V *Vec3
}

or kept the same and have a pointer to the Quat struct (assuming we go with option 1 above)?

@dmitshur
Member

I was gonna comment on this, but I forgot what I wanted to say.

Honestly, I think it's a hard question and I don't know what's the best thing to do.

@ghost

ghost commented Aug 28, 2014

(i initially brought up the problem)

BenchmarkValue 50000000 45.5 ns/op
BenchmarkReference 50000000 40.5 ns/op
BenchmarkDestination 100000000 29.2 ns/op
BenchmarkBoth 100000000 24.2 ns/op

multiplication by reference: 11% faster
multiplication by value with destination matrix: 36% faster
both pass by ref and have dst matrix: 53% faster

"The long and short is: once you get to Mat4 size, it is a significant speed improvement (about 13 ns/op for multiplication)" @Jragonmiris
13 ns/op doesn't mean anything if you don't also give the speed before :P

you can find the implementation of the benchmark here

as for whether or not to break the API, the library seems to be six months old (according to the first commit); might as well make it right today than pay for it later. There's only so much you can break. Also, since the code is mostly generated, you could offer the new (pointer and dst) version as the main version and a deprecated version with the pass-by-value, something like

import "github.com/go-gl/mathgl/mgl32"
or
import "github.com/go-gl/mathgl/dep/mgl32"

for the quaternions, the right answer is "whatever is fastest" but i suspect it's the 2nd option (with the pointer) because quaternion multiplication probably suffers a lot from pass-by-value
quat.go:L92

func (q1 Quat) Mul(q2 Quat) Quat {
    return Quat{q1.W*q2.W - q1.V.Dot(q2.V), q1.V.Cross(q2.V).Add(q2.V.Mul(q1.W)).Add(q1.V.Mul(q2.W))}
}

@dmitshur
Member

as for whether or not to break the API, the library seems to be six months old (according to the first commit); might as well make it right today than pay for it later. There's only so much you can break. Also, since the code is mostly generated, you could offer the new (pointer and dst) version as the main version and a deprecated version with the pass-by-value, something like

import "github.com/go-gl/mathgl/mgl32"
or
import "github.com/go-gl/mathgl/dep/mgl32"

If you're going to do that, you might as well use gopkg.in with semantic versioning. I have to admit, it's starting to grow on me because it allows one to improve APIs with less fear of breaking dependent code.

@UserAB1236872
Member Author

I've done some benchmarking, though with Vec4 rather than Mat4. I was playing around with SIMD (oh, yeah, I'm going to update the package to use SIMD, surprise) and I've found:

With pointers, a destination, and SIMD, performing element-wise multiplication of two float32 vectors is only barely slower than multiplying two plain float32s. But exactly how the pointers are used matters a great deal:

Two float32s: 0.5ns/op
Two Vec4s SIMD by pointer with preallocated dst: 1.5 ns/op
Two Vec4s SIMD by pointer returned by value: 4.0 ns/op
Two Vec4s SIMD by pointer, allocate new dst every time: 40 ns/op
Two Vec4s SIMD by value, return by value: 7.3 ns/op
What we have now (no SIMD, all by value): 10 ns/op

Reallocating a new dst is over 25x slower than having a preallocated dst. More importantly, it's about 4x slower than what we have now.

What do I mean by "reallocating a new dst"? I mean, essentially, this:

func (v1 *Vec4) ElMul(v2 *Vec4) *Vec4 {
    out := &Vec4{}
    simdElMul(v1, v2, out)
    return out
}

That single step of creating the new Vec4 single-handedly wreaks havoc on performance. It seems irresponsible to have a function like that, since people will likely use it due to the much nicer syntax than a dst function. More notably: if we simply made every function return pointers now, all existing code would immediately be changed to that. It's possible a sync.Pool may help, but given the extra type conversions I'm dubious.

Thoughts?

@ghost

ghost commented Sep 3, 2014

well, according to those results, keep returning by value and/or add a dst function on top; that way people can choose. I know my app takes care of creating Mat4s in advance, and people who don't wish to care about performance can still use the old non-dst version:

Mul4(in *Vec4) Vec4
and
Mul4dst(in, out *Vec4)

@UserAB1236872
Member Author

I think the direction to go for the time being (ignoring any theoretical code changes we can do due to things possibly happening in December with this package) would be to change all functions to have a pointer receiver. This is possible and will result in minimal code breakage due to auto-addressing. Functions will still take vectors as arguments and return vectors.

MulIn functions will be added, e.g.

func (m1 *Mat4) Mul4In(dst, m2 *Mat4) {
    *dst = // do stuff
    // alternatively: simdMul4In(m1, m2, dst)
}

@ghost

ghost commented Apr 8, 2016

holy crap, these results are wrong: what happens is the CPU cache kicks in and makes pointers look faster than they really are; memory access is super expensive. I don't know what exactly the right solution is, I just know these results shouldn't be considered

@dominikh

Doing

m := mgl32.Ident4()
m[0] = 5

Would break.

As long as Ident4() returns a unique copy, not a pointer to a globally defined variable, this will continue to work fine.

Indexing, as well as the built-in len function, is defined for pointers to arrays (not to be confused with slices, for which the same is not true).

What would break are conversions between mathgl types and arrays or f32/f64 types.

@UserAB1236872
Member Author

I think the package has been stable for too long to risk a breaking change this big now.
