Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH, SIMD: Add new NPYV intrinsics pack(0) #17789

Merged
merged 2 commits into from
Dec 9, 2020

Conversation

seiko2plus
Copy link
Member

@seiko2plus seiko2plus commented Nov 16, 2020

Add new NPYV intrinsics pack(0)

TODO:

  • test it on armhf

  - add bitfield conversion for boolean vectors
  - add reverse elements of each 64-bit lane
  - add testing cases
@Qiyu8
Copy link
Member

Qiyu8 commented Nov 17, 2020

Looks good to me, I will refactor #17102 and give a new benchmark test by using new intrinsics here.

{
return _mm256_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not add reverse f64 api here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this intrinsic only reverse vector elements on each 64-bit width, same as neon vrev64q_*

#define npyv_cvt_b64_f64 _mm256_castpd_si256

// convert boolean vector to integer bitfield
NPY_FINLINE npy_uint64 npyv_tobits_b8(npyv_b8 a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is a litter confused, how about signmask or movemask?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this intrinsic isn't replaceable for movemask, since it accepts only boolean vector where the value must be
(1 << lane_size_inbits) - 1// 0xff**** or 0 and thats why we should get a better perfomance on aarch64.

}
#define npyv_rev64_s16 npyv_rev64_u16

NPY_FINLINE npyv_u8 npyv_rev64_u8(npyv_u8 a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

npyv_rev64_u8 should be in front of npyv_rev64_u16

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we need to reuse npyv_rev64_u16() within npyv_rev64_u8() for emulating when SSSE3 isn't available

@mattip
Copy link
Member

mattip commented Nov 17, 2020

I think we should wait till after the 1.20 release is branched off master to put this in.

@seiko2plus
Copy link
Member Author

@mattip, no problem

@seiko2plus
Copy link
Member Author

@mattip, Is there any possibility to merge it soon?

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks OK from what I understand. Just a minor nit about formatting comments.

*************************************************************************/
//#########################################################################
//## Defining NPYV intrinsics as module functions
//#########################################################################
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? This is C code and should use C comments

Copy link
Member Author

@seiko2plus seiko2plus Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still C99 comment, I had to add a special mark to separate the parent sections of the main template loops to reduce the dispersion. this source file contains two main loops one for defining c functions of module methods and another one to attach it into a static array of PyMethodDef.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Is there something special about the parsing here that is different from the other *.c.src files?

Copy link
Member Author

@seiko2plus seiko2plus Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no there's no something special about it, except that we're dealing with a huge number of functions that need to be organized in a proper format.

//#########################################################################
//## A comment separating the main sections
//#########################################################################

/*********************************************************************************************************
 ** A comment separating the subsections of SIMD operations, such as memory, reorder ..etc
 ********************************************************************************************************/

// a comment for each group of intriniscs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks for the explanation.

*************************************************************************/
//#########################################################################
//## Defining a separate module for each target
//#########################################################################
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change the comment?

@mattip mattip merged commit fb5aa9d into numpy:master Dec 9, 2020
@mattip
Copy link
Member

mattip commented Dec 9, 2020

Thanks @seiko2plus

@rgommers rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants