I have read the roadmap and priorities and I believe this request falls within the priorities.
What is your request?
Say I have a List[UInt8] that I want to process. Let it have 16 items, each the UTF-8 (ASCII) code of a decimal digit, so XORing a byte with 0x30 gives the digit's numeric value.
I found no intuitive and simple way to cast a DTypePointer[DType.uint8] to a SIMD and do an xor on it.
First related issue I raised was #2381 because I didn't have an entrypoint into simd or really understand it.
In issue #2695, I tried doing
var ptr = list_unsafe_ptr.bitcast[DType.uint64]()
ptr[offset] ^= 0x3030.. (8 times)
for offset in range(2)
but it doesn't edit the buffer.
What is proposed?
fn unsafe_simd[size: Int, T: DType](owned self: DTypePointer[T]) -> SIMD[T, size]:
    # somehow steal data into SIMD
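To make the intended operation concrete, here is a minimal sketch in Python rather than Mojo, since it does not rely on any Mojo stdlib API. It reinterprets a 16-byte buffer as two 64-bit words and XORs each word with 0x30 repeated eight times, editing the buffer in place. The function name `xor_digits_in_place` is hypothetical and only serves the illustration.

```python
def xor_digits_in_place(buf: bytearray) -> None:
    """XOR every byte of buf with 0x30, eight bytes at a time, in place."""
    if len(buf) % 8 != 0:
        raise ValueError("buffer length must be a multiple of 8")
    mask = 0x3030303030303030  # 0x30 repeated in all eight byte lanes
    # Reinterpret the bytes as native unsigned 64-bit words; no copy is made,
    # so writes through `words` land in `buf`.
    words = memoryview(buf).cast("Q")
    for i in range(len(words)):
        words[i] ^= mask
    words.release()


digits = bytearray(b"1234567890123456")  # 16 ASCII digits
xor_digits_in_place(digits)
print(list(digits))
```

Because the mask is the same in every byte lane, the result does not depend on the machine's byte order.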
What is your motivation for this change?
Right now many interfaces use List[DType] for many operations. If we provide an intuitive API to go from there to SIMD vectors, it'll be much easier to get higher performance, since people will actually be using SIMD instead of iterating over a List for everything.
Also, I'm not sure, but I think __contains__ methods could potentially become much faster if a DTypePointer can be turned into a SIMD and searched in a vectorized loop instead of element by element (?). Though the copy overhead would only pay off if len(iterable) is large or the iterable can be consumed when used.
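As a rough illustration of what a vectorized membership test looks like, the sketch below uses the classic "has zero byte" bit trick on 64-bit words. It is plain Python, not Mojo, and the names `word_has_byte` and `contains_byte` are hypothetical. A real SIMD version would compare all lanes against the needle in a single instruction. This only shows the shape of the loop: whole words first, then a scalar tail.

```python
ONES = 0x0101010101010101   # 0x01 in every byte lane
HIGHS = 0x8080808080808080  # 0x80 in every byte lane


def word_has_byte(word: int, needle: int) -> bool:
    """Return True if any of the 8 bytes packed in `word` equals `needle`."""
    x = word ^ (ONES * needle)  # lanes equal to needle become 0x00
    # Zero-byte test: a lane's high bit survives only if that lane was zero
    # (or a borrow from a zero lane reached it), so the result is nonzero
    # exactly when at least one lane is zero.
    return ((x - ONES) & ~x & HIGHS) != 0


def contains_byte(data: bytes, needle: int) -> bool:
    """Membership test that checks eight bytes per step, then the tail."""
    full = len(data) - len(data) % 8
    for start in range(0, full, 8):
        word = int.from_bytes(data[start:start + 8], "little")
        if word_has_byte(word, needle):
            return True
    return needle in data[full:]  # leftover bytes, checked one by one
```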
Any other details?
No response
@martinvuyk Not sure if that is what you are referring to, but if you want to process the elements of a list in chunks using SIMD, you can divide the list into equal portions and use SIMD operations on each.
var a = List[UInt8](1, 2, 3, 4, ...)  # list of size 16
for i in range(4):
    # load a simd object with size 4
    var tmp = a.data.load[4](i*4)
    # do some operation using SIMD
    # ......
    a.data.store[4](i*4, tmp)
I don't think it makes sense to convert an entire pointer array to SIMD, since the SIMD registers of a CPU usually only have a width of 4 (different story for GPUs, but that's a different programming model and Mojo GPU support is not available yet). The compiler will probably help break down your SIMD size into an ISA-compatible SIMD width, but using a small SIMD width for parallelized operations still seems to be the better practice.
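The chunked pattern above generalizes to lengths that are not a multiple of the chunk width by finishing with a scalar tail loop. A minimal Python sketch of that structure follows. It is not Mojo code, and `xor_chunked` and its parameters are hypothetical names. Each `width`-byte chunk is packed into one integer as a stand-in for a SIMD load and store.

```python
def xor_chunked(data: bytearray, width: int = 4, mask_byte: int = 0x30) -> None:
    """XOR `data` with `mask_byte` in `width`-byte chunks, then handle the tail."""
    mask = int.from_bytes(bytes([mask_byte]) * width, "little")
    full = len(data) - len(data) % width  # bytes covered by whole chunks
    for start in range(0, full, width):
        # "load": pack one chunk into a single integer
        chunk = int.from_bytes(data[start:start + width], "little")
        # "store": write the processed chunk back over the same bytes
        data[start:start + width] = (chunk ^ mask).to_bytes(width, "little")
    for i in range(full, len(data)):  # scalar tail for leftover elements
        data[i] ^= mask_byte
```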
I didn't know you could use a strided load with a pointer like that, pretty neat.
I think the stdlib itself should allow for huge SIMD vector use whatever the underlying architecture, and let the function itself be sent to the CPU/accelerator and let the compiler optimize there.
This still requires a for loop and index access. What I meant was to do the equivalent of C's memcpy from one buffer to the other directly without any loops or index access. I have no idea if the underlying layout in memory for DTypePointer's pointee is the same as the SIMD vector's, so a simple API to unsafely go from one to the other would be useful IMO.
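For comparison, this is roughly what "no loops or index access" looks like when the whole buffer is treated as one wide value. The sketch is in Python, where an arbitrary-precision integer stands in for a hypothetical buffer-sized vector. `xor_whole` is an illustrative name, not an existing API. Unlike an in-place reinterpretation, it copies the data in and out, which is the copy overhead mentioned earlier in the thread.

```python
def xor_whole(data: bytes, mask_byte: int = 0x30) -> bytes:
    """XOR every byte of `data` with `mask_byte` using one wide integer."""
    n = len(data)
    mask = int.from_bytes(bytes([mask_byte]) * n, "big")  # mask as wide as data
    # One conversion in, one XOR over the whole value, one conversion out.
    return (int.from_bytes(data, "big") ^ mask).to_bytes(n, "big")


print(list(xor_whole(b"1234567890123456")))
```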