This repository has been archived by the owner on May 6, 2024. It is now read-only.

How about letting the xy() and xyz() functions return references instead of values? #11

Closed
FrankStain opened this issue Dec 9, 2015 · 5 comments

Comments

@FrankStain

I noticed that mathfu::Vector<T, 3> has a memory representation compatible with mathfu::Vector<T, 2>.
This means a mathfu::vec3 can be safely cast to mathfu::vec2 via reinterpret_cast<mathfu::vec2&>( vec3_inst ).
The same holds for the mathfu::vec4 and mathfu::vec3 classes.

So how about making mathfu::vec3::xy(), mathfu::vec4::xy() and mathfu::vec4::xyz() return references instead of values?

Something like:

inline Vector<T, 2>& xy() { return *reinterpret_cast<Vector<T, 2>*>( this ); }
inline const Vector<T, 2>& xy() const { return *reinterpret_cast<const Vector<T, 2>*>( this ); }

Such a conversion would be very helpful!

@VladSerhiienko

Hi, FrankStain!
Did you try running the code with the suggested changes in a Release configuration? I think you might hit memory-access crashes on some platforms (most likely on Windows x64 as well). Also, if you need that sort of functionality, you could try the Eigen library; as I recall, it has a head<N>() template function that returns a reference.

@FrankStain
Author

@VladSerhiienko, hey :)

It's been a while, but yes, I ran some tests around this issue.
Running on an x64 platform with SIMD processing of vectors enabled, we face one major restriction: the memory of any vector must be 16-byte aligned. This restriction is common to all platforms where SIMD processing is enabled.
Casting a vec3 pointer to vec2 doesn't change the pointer value; it only declares that the pointer refers to a smaller region of memory.
Also, every SIMD representation of mathfu::Vector<T, N> contains the same __m128 member and covers the same amount of memory.

So just requesting the xy component from xyz or xyzw causes no problems. The situation changes if we need to access the yz component by reference; there we would indeed have problems.

Did I miss something important?

The point of accessing the xy component by reference is a performance optimization: it avoids the allocation and construction of another vector. When a read-only vector of lower dimension must be passed as an argument or used in a calculation, this can save a few cycles for more useful work. :)

@stewartmiles
Contributor

Hi, regarding returning references rather than values for subcomponents,
check out
https://github.com/google/mathfu/blob/master/include/mathfu/vector_3.h#L323

Then take a look at the simd4f data type on each platform here:
https://github.com/scoopr/vectorial/tree/ae7dc88215e14c812786dc7a1f214610ae8540f5/include/vectorial

It's quite different: in some cases (e.g. NEON) it uses a type like float32x4_t, which requires the user to load and store values using intrinsics (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491f/BABGABJH.html) to avoid undefined behavior. Now, in practice, once the NEON value is stored in memory we're probably going to be OK, but it's very likely, if you're using a chain of operations, that the compiler will keep the vector in a register, in which case you're out of luck with the proposed optimization.

In general, I really don't recommend using reinterpret_cast with mathfu data types, for exactly this kind of reason: you may end up with code that works fine on your beefy x86_64 CPU but produces some really strange behavior elsewhere.

Cheers,
Stewart


@FrankStain
Author

FrankStain commented May 20, 2016

@stewartmiles, hi!

Sadly, I can't quite follow some of your points.

Did you mean that float32x4_t can't be safely reinterpreted as float32x2_t?
I also see the same union inside vec3 and vec4:
https://github.com/google/mathfu/blob/master/include/mathfu/vector_2.h#L225
https://github.com/google/mathfu/blob/master/include/mathfu/vector_4.h#L37
It means the memory layout is the same for vec3 and vec4 on each supported platform.

Now I know in practice when the NEON value is
then stored in memory we're probably going to be ok but it's very likely -
if you're using a chain of operations - the compiler will try to keep the
vector in a register in which case you're going to be out of luck with the
proposed optimization.

Could you expand on this point?

@stewartmiles
Contributor

That's correct: float32x4_t can't safely be reinterpreted as float32x2_t. vectorial does have a union that maps the type's lanes to floats, but it isn't actually used, since the compiler is likely to keep the value in a NEON register rather than write the data back out to memory where it could be reinterpreted as floats. All operations on the simd* types need to be performed using compiler intrinsics.

The compiler intrinsics for NEON's simd2f are located in https://github.com/scoopr/vectorial/blob/ae7dc88215e14c812786dc7a1f214610ae8540f5/include/vectorial/simd2f_neon.h#L17. See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0205g/BABGABJH.html for more information.

