[stdlib] Add Array type #2805

Draft: martinvuyk wants to merge 96 commits into nightly

martinvuyk (Contributor) commented May 23, 2024

Closes #2804

Since the proposal to rename InlineList to Array fell through (#2773), I thought of adding a dynamic Array that follows most of Python's behavior but lets the developer decide whether, and how much, small buffer optimization to use for DType subtypes.

The current implementation has basically become a user-friendly wrapper around SIMD.

The current implementation lets the programmer decide how much "capacity" to allocate for the Array. Under the hood it rounds that capacity up to the next power of two for the underlying SIMD vector, so every method then has to take that delta into account (everything is parametrized).
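As a rough illustration of the rounding described above, here is a minimal plain-Python sketch (not the Mojo code from this PR). The names `next_power_of_two` and `padding_lanes` are hypothetical and only model the arithmetic: the requested capacity is rounded up to a power of two, and the difference is the "delta" of unused lanes that each method has to account for.

```python
def next_power_of_two(capacity: int) -> int:
    """Smallest power of two >= capacity (hypothetical helper, not from the PR)."""
    if capacity < 1:
        raise ValueError("capacity must be positive")
    # (capacity - 1).bit_length() is the exponent of the next power of two.
    return 1 << (capacity - 1).bit_length()


def padding_lanes(capacity: int) -> int:
    """Number of unused lanes between the requested and the rounded size."""
    return next_power_of_two(capacity) - capacity


for cap in (1, 3, 4, 5, 9):
    print(cap, next_power_of_two(cap), padding_lanes(cap))
```

With a requested capacity of 3, as in the examples further down, this model gives a width of 4 and one padding lane.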

Every operation on the Array is vectorized, except extending, appending, concatenating, etc., so array.__contains__ can be a lot faster, depending on the hardware it's running on.
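To make the idea concrete, here is a small plain-Python model of a lane-wise membership test. It is a sketch under assumptions, not the PR's implementation: `simd_contains` is a hypothetical name, the list stands in for a fixed-width SIMD vector, and the padding lanes are assumed to hold zeros. On real hardware the comparison and the reduction would each be a single vector operation instead of a loop.

```python
def simd_contains(lanes, length, value):
    """Model of a vectorized `in` check on a padded, fixed-width vector.

    lanes  -- list whose size is a power of two (stand-in for a SIMD vector)
    length -- number of valid elements; the remaining lanes are padding
    """
    # Step 1: compare every lane against the value (one vector compare).
    matches = [lane == value for lane in lanes]
    # Step 2: mask out the padding lanes so they can never produce a hit.
    valid = [i < length for i in range(len(lanes))]
    # Step 3: OR-reduce the masked comparison result.
    return any(m and v for m, v in zip(matches, valid))


lanes = [1, 2, 3, 0]  # three real elements plus one zero-filled padding lane
print(simd_contains(lanes, 3, 2))  # True
print(simd_contains(lanes, 3, 0))  # False: the 0 is padding, not an element
```

The second call shows why the padding has to be masked: without step 2, the padding value would be reported as a member.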

You then have calculations like the dot product, the cosine between arrays, and applying a function to the array, plus many other ease-of-use features that can be added later and that vectorize the ops wherever possible.
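For reference, the two calculations mentioned above can be written down in a few lines of plain Python. This is only a sketch of the math, with hypothetical function names `dot` and `cosine`; it says nothing about how the Array type implements them, and a vectorized version would do the multiply and the sum as lane-wise operations followed by a reduction.

```python
import math


def dot(a, b):
    """Dot product: sum of element-wise products."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot(a, b) / norm


print(dot([1, 2, 3], [1, 2, 3]))  # 14
print(cosine([1, 0], [0, 1]))     # 0.0 (orthogonal vectors)
```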

Arrays of different lengths can interact with each other in many methods where it makes sense (concatenation, appending values from other to self, etc.).

Another important aspect for the future is making it easy to convert List[T] <-> Array[T] while taking SBO into account.

It would also be awesome to add some benchmarks.

Examples:

from collections import Array
alias Arr = Array[DType.uint8, 3]
var a = Arr(1, 2, 3)
var b = Arr(1, 2, 3)
print((a - b).sum()) # prints 0
print(a.avg()) # prints 2
print(a * b) # dot product: 14
print(a.cross(b)) # cross product: Array(0, 0, 0)
print(2 in a) # prints True
print(a.index(2).value() if a.index(2) else -1) # prints 1
print((Arr(2, 2, 2) % 2).sum()) # 0
print((Arr(2, 2, 2) // 2).sum()) # 3
print((Arr(2, 2, 2) ** 2).sum()) # 12

martinvuyk and others added 19 commits May 23, 2024 16:26
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
gabrieldemarmiesse (Contributor) commented May 24, 2024

@martinvuyk If possible, I would recommend not starting work on a new function/struct that's not present in Python and that has not yet been approved by the Mojo maintainers in the issues.

While I'm sure that the new structs/functions contributors are writing have value, there is information internal to Modular that we don't have. It's possible that the Modular staff has already thought about some alternative, or has a plan for another API. Since we're in the dark here, let's proceed with caution.

I know it's annoying as it can delay your work, but it may also avoid some wasted work on PRs. (Well, hopefully not really wasted, since I'm sure you learned a lot while writing those PRs!)

I'm currently waiting for the go-ahead from the maintainers to implement small buffer optimization in List, in which case this pull request would not be needed anymore. You can follow the progress here: #2467 (comment)

martinvuyk (Contributor, Author) commented

Thank you for the comment. I'm waiting for confirmation, but it's really fun to tinker in this language. The main difference between this type and SBO is that this lives 100% on the stack. It tries to follow Python's Array as well, but using only SIMD. My hope is that this becomes the type used for high-performance IO, or the user-friendly interface to SIMD, since List lives on the heap and SBO will only get you so far when you want Arrays of different widths and vectorized ops on them.

If, for example, someone wants to do high-performance operations on strings that parallelize well, like uppercasing, SBO will give you faster access but not really faster ops, whereas SIMD does.
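To illustrate why uppercasing maps well onto SIMD, here is a plain-Python model of the branch-free, per-byte form of the operation. It is a sketch under assumptions (ASCII only, hypothetical name `ascii_upper`), not code from this PR: each byte is handled independently with a compare and a masked subtract, which is the shape a vector unit can apply to many bytes at once.

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase ASCII letters using a per-byte mask instead of a branch."""
    out = bytearray(len(data))
    for i, b in enumerate(data):
        # Mask: True (1) for bytes in 'a'..'z', False (0) otherwise.
        is_lower = 0x61 <= b <= 0x7A
        # Lowercase and uppercase ASCII letters differ by exactly 32.
        out[i] = b - 32 * is_lower
    return bytes(out)


print(ascii_upper(b"Mojo 24!"))  # b'MOJO 24!'
```

The loop body has no dependency between iterations, which is what lets a SIMD version process a whole vector of bytes per step.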

I've seen your PR and do think it is necessary and will have a big impact on List performance, but I don't think List should be the be-all and end-all. The main reason people will come to Mojo is out-of-the-box support for the most innovative CPU and GPU ops; to get that, people will expect an Array to be there and be performant. If we stay only with SBO and iterating over arrays in the classic way, every high-performance C++ lib will be faster than the Mojo stdlib. Mojo should be the place for easy-to-use SIMD ops, IMO.

There are some things that I'm not even sure are possible, since I don't know the memory layout of SIMD and of DTypePointer's pointee, nor many other Mojo internals (I don't understand MLIR), so this PR is really just an experiment. I have no problem killing it if the Mojo team tells me to. Code is just code.

martinvuyk and others added 9 commits May 24, 2024 22:43