-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added BMI1, BMI2, F16C, MOVEBE, and aligned-allocation/free plus some minor additions. #25
Conversation
Following the sources by Felix Cloutier, i tried to read the Cacheline-size from EAX leg-1. It's giving me a value of 8, that needs to be multiplied by 8. And that gives me 64-byte - but it may be just a coincidence ;) So pls. take a look - it' rather q&d...
Just a little test to how the package is structured. This supports F16C a.k.a Float16 or 'half precision floats'. Not to mix up with 'brain-floats' - thats also 16-bit, but comes with a different encoding.
needs some some testing...
added the {.pop.} at the end..
just begun to look into it...
Some older prbly. not so important tags. Though the CompareExchange16-Tag is interesting. And some of the BMI2 operations, as well.
the missing bmi2.nim
Three functions from SSE1-times - i guess they fit in nice here. I just started to use malloc and free and had no issues so far. It's seem a nice way to get aligned allocations, which had been discussed on the nim-forum..
finished Tags BM11 and BMI2
MOVBE fns in movbe.nim
added Tags BMI1, BMI2, F16C, MOVBE
added older AVX2 broadcasts `mm_broadcast<x>_<type>`and movbe.nim into the right folder.
movbe.nim belongs in /src/nimsimd/
missing 'mm_blend_epi32' in AVX2
Hi and thanks for the PR. Taking a quick look, one thing I noticed is the use of A static pointer would need to be known at compile time which I do not see as compatible with heap memory and the use of these operations. |
changed `static pointer` into `pointer`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thx
- changed from ":static pointer" to ":pointer"
MOVEBE was smth. i considered using, but never did. As you already found out, its not tested.
fixed return vector(s) from 'M256d/M256' in : - `mm_permutevar_pd() :M128d` - `mm_permutevar_ps() :M128`
fixed signatures in : - storebe_i16() - storebe_i32() - storebe_i64()
I took a closer look this time and did notice a couple things to check on:
It will be great to have a bunch more instruction sets available when we've got them looking good. |
edit signature : - bextr_u64()
edited signatures : - bzhi_u32 - bzhi_u64 - mulx_u32 - mulx_u64
ok, BMI1 & BMI2 should be fine now. thx for the review. greets and have a nice weekend :) |
Hi guzba, i've noticed that the something did not pass the GH-pipeline ? Greets Andreas |
Hello and thanks again for this PR. There were still some issues here so instead of continuing to tweak this over time, which may be time consuming, I chose to extend these commits in a new PR and do the remaining fixes here #26 Note that I did remove the queryCacheLineSize call due to the implementation not looking correct to me (should be an unsigned shift, I think incorrect register used, need to check a feature flag to see if the data is present and valid, etc). More info here https://www.felixcloutier.com/x86/clflush if you'd like to work on that in a separate PR. |
Very nice and thx for including the important tags. |
If you have a different source for how to get the cache flush size, that'd be totally fine with me. I don't mean to imply the website I linked is the only authority. If you have a solid one, please just point me to it so I can properly ensure I learn and agree the implementation is correct and ready to go. |
Hello Guzba,
this pull-request adds some new Tags :
Moste of the functionality of BMI 1+2 is already in
std/bitops
. But for demo-purposes it might be ok, to have them explicit here.More important to me is the addition of aligned-allocations and prefetching to
nimsimd/sse2.nim
:I've used the aligned-allocations in two projects of mine. So consider them tested and usable, except the
sfence()
-thing.Some functions were missing from
nimsimd/avx2.nim
. According to the Intel-Intrinsics-Guide, they should be avail.The CacheLineWidth-detection works for me. I have not found a similar way to inquire this info for Arm or AppleSilicon or RISC-V cpus. Its just a starting point. Since i believe the CacheLineWidth to be the crucial-information when dealing with SIMD, i'd love to see this information availiable for all commonly cpu-architecture used by nim ? What do you think ?
Same could be said for the L1,L2,L3-sizes.
best regards, Andreas (a.k.a. bosinski)