[stdlib] Add `Array` type #2805

martinvuyk · 2024-05-23T20:51:17Z

Since the proposal to rename InlineList to Array fell through (#2773), I'd like to add a dynamic Array that follows most of Python's behavior but lets the developer decide whether, and how much, small buffer optimization (SBO) to use for `DType` element types.

The current implementation has basically become a user-friendly wrapper around SIMD.

The programmer decides how much "capacity" to allocate for the Array. Under the hood, that capacity is rounded up to the next power of two, because the width of the underlying SIMD vector must be a power of two. Every method then has to take the difference between the requested capacity and the SIMD width into account (everything is parameterized).
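To make the rounding concrete, here is a plain-Python sketch (not the PR's Mojo code) of how a requested capacity could map to a power-of-two SIMD width, and how many padding lanes every method then has to ignore. The helper names `next_power_of_two` and `simd_layout` are hypothetical.

```python
# Plain-Python model: map a requested capacity onto a power-of-two SIMD width.
# Hypothetical helpers for illustration only; not the PR's implementation.

def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (requires n >= 1)."""
    if n < 1:
        raise ValueError("capacity must be at least 1")
    return 1 << (n - 1).bit_length()


def simd_layout(capacity: int) -> tuple:
    """Return (simd_width, padding_lanes) for a requested capacity.

    padding_lanes is the "delta" every method must account for:
    lanes that exist in the SIMD vector but hold no real element.
    """
    width = next_power_of_two(capacity)
    return width, width - capacity


for cap in (1, 3, 4, 5, 17):
    width, pad = simd_layout(cap)
    print(f"capacity={cap} -> SIMD width={width}, padding lanes={pad}")
```

With a capacity of 3, for example, the data sits in a 4-lane vector with one padding lane, which is why reductions and comparisons have to mask that lane out.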

Every operation on the Array is vectorized except extending, appending, concatenating, and the like. So `array.__contains__`, for example, can be a lot faster than an element-by-element loop, depending on the hardware it runs on.
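The following plain-Python sketch models why a vectorized `__contains__` can avoid a per-element branch. One lane-wise compare produces a mask, the padding lanes are masked out, and a single OR reduction gives the answer. The list `lanes` stands in for one SIMD register, the function names are hypothetical, and any real speedup depends on the target hardware.

```python
# Plain-Python model of a branch-free "contains" over a SIMD-sized buffer.
# `lanes` stands in for one SIMD register; `length` is the number of live
# elements, and the remaining lanes are padding. Hypothetical sketch.

def lanes_equal(lanes: list, value: int) -> list:
    # Stands in for one lane-wise SIMD compare: True where lane == value.
    return [x == value for x in lanes]


def contains(lanes: list, length: int, value: int) -> bool:
    eq_mask = lanes_equal(lanes, value)
    # Padding lanes may hold anything (zeros here), so mask them out.
    live_mask = [i < length for i in range(len(lanes))]
    # Stands in for a horizontal OR reduction across all lanes.
    return any(e and live for e, live in zip(eq_mask, live_mask))


# Capacity 3 stored in a 4-lane register; the 4th lane is padding.
lanes = [1, 2, 3, 0]
print(contains(lanes, 3, 2))
print(contains(lanes, 3, 0))  # the padding zero must not count as a match
```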

On top of that come calculations such as the dot product, the cosine similarity between arrays, and applying a function to every element, plus many future ease-of-use features that can be added and that vectorize the ops wherever possible.
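For reference, here are plain-Python versions of the dot product, cosine similarity and 3-D cross product. They are written as an elementwise multiply followed by a horizontal sum, so they mirror how a SIMD implementation would structure the work. These are illustrative helpers, not the PR's API.

```python
import math

# Plain-Python reference versions of the vector math described above.
# Illustrative helpers only; the PR vectorizes these over SIMD lanes.

def dot(a: list, b: list) -> float:
    if len(a) != len(b):
        raise ValueError("dot product needs equal lengths")
    products = [x * y for x, y in zip(a, b)]  # stands in for one lane-wise multiply
    return sum(products)                       # stands in for one horizontal add


def cosine(a: list, b: list) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def cross(a: list, b: list) -> list:
    # The cross product is only defined for 3-element vectors.
    if len(a) != 3 or len(b) != 3:
        raise ValueError("cross product needs 3-element vectors")
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


a = [1, 2, 3]
b = [1, 2, 3]
print(dot(a, b))    # 14
print(cross(a, b))  # [0, 0, 0]
print(cosine(a, b))
```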

Arrays of different lengths can interact with each other in the methods where that makes sense (concatenation, appending another array's values to self, etc.).
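Here is a rough plain-Python model of how arrays of different capacities might interact. The `FixedArray` class and its methods are hypothetical. The sketch assumes two policies: concatenation returns an array whose capacity is the sum of both inputs' capacities, and appending from another array copies only as many items as still fit. The PR may handle overflow differently.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical model of a fixed-capacity array; not the PR's Array type.
@dataclass
class FixedArray:
    capacity: int   # max elements; would be a compile-time parameter in Mojo
    items: list     # live elements, len(items) <= capacity

    def __post_init__(self) -> None:
        if len(self.items) > self.capacity:
            raise ValueError("more items than capacity")

    def concat(self, other: FixedArray) -> FixedArray:
        # Assumed policy: the result is sized to hold both inputs in full.
        return FixedArray(self.capacity + other.capacity, self.items + other.items)

    def append_from(self, other: FixedArray) -> int:
        # Assumed policy: copy only as many of `other`'s items as still fit,
        # and report how many were copied.
        free = self.capacity - len(self.items)
        taken = other.items[:free]
        self.items.extend(taken)
        return len(taken)


a = FixedArray(3, [1, 2, 3])
b = FixedArray(5, [4, 5])
c = a.concat(b)
print(c.capacity, c.items)  # 8 [1, 2, 3, 4, 5]
```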

Another important aspect for the future is making it easy to convert between `List[T]` and `Array[T]` while taking SBO into account.

It would also be awesome to add some benchmarks.

Examples:

```mojo
from collections import Array
alias Arr = Array[DType.uint8, 3]
var a = Arr(1, 2, 3)
var b = Arr(1, 2, 3)
print((a - b).sum()) # prints 0
print(a.avg()) # prints 2
print(a * b) # dot product: 14
print(a.cross(b)) # cross product: Array(0, 0, 0)
print(2 in a) # prints True
print(a.index(2).value() if a.index(2) else -1) # prints 1
print((Arr(2, 2, 2) % 2).sum()) # 0
print((Arr(2, 2, 2) // 2).sum()) # 3
print((Arr(2, 2, 2) ** 2).sum()) # 12
```
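The `%`, `//` and `**` examples broadcast a scalar across every element. Here is a plain-Python sketch of that pattern: splat the scalar to the array's width, then apply one elementwise operation. The `broadcast` helper is hypothetical. The sums it prints match the ones in the examples.

```python
import operator

# Plain-Python model of scalar broadcasting, e.g. `Arr(2, 2, 2) % 2`.
# Hypothetical helper; not the PR's API.

def broadcast(lanes: list, scalar: int, op) -> list:
    splat = [scalar] * len(lanes)                   # scalar -> vector of equal width
    return [op(x, s) for x, s in zip(lanes, splat)]  # one elementwise op


arr = [2, 2, 2]
print(sum(broadcast(arr, 2, operator.mod)))       # 0
print(sum(broadcast(arr, 2, operator.floordiv)))  # 3
print(sum(broadcast(arr, 2, operator.pow)))       # 12
```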

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

gabrieldemarmiesse · 2024-05-24T20:36:19Z

@martinvuyk If possible, I would recommend not starting work on a new function/struct that isn't present in Python and hasn't yet been approved by the Mojo maintainers in the issues.

While I'm sure the new structs/functions that contributors are writing have value, there is information internal to Modular that we don't have: the Modular staff may already have considered an alternative, or may have a plan for a different API. Since we're in the dark here, let's proceed with caution.

I know it's annoying, since it can delay your work, but it may also avoid some wasted work on PRs (well, hopefully not really wasted, since I'm sure you learned a lot while writing those PRs!).

I'm currently waiting for the maintainers' go-ahead to implement small buffer optimization in List, in which case this pull request would no longer be needed. You can follow the progress here: #2467 (comment)

martinvuyk · 2024-05-24T21:43:49Z

Thank you for the comment. I'm waiting for confirmation, but it's really fun to tinker in this language. The main difference between this type and SBO is that this type lives 100% on the stack. It also tries to follow Python's array behavior, but using only SIMD. My hope is that it becomes the type used for high-performance IO, or the user-friendly interface to SIMD, since List lives on the heap and SBO will only get you so far when you want arrays of different widths and vectorized ops on them.

For example, if someone wants to run data-parallel operations on strings, such as uppercasing, SBO will give faster access but not really faster operations, whereas SIMD will.
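To illustrate the uppercasing case, here is a plain-Python model of branch-free ASCII uppercasing. Every byte goes through the same compare, mask and subtract steps, and that uniform shape is what maps onto SIMD lanes. The loop stands in for the lanes. An actual SIMD version would handle a whole vector of bytes per instruction (for example 16 or 32, depending on the vector width). `upper_ascii` is a hypothetical name, and only the ASCII letters a to z are converted.

```python
# Plain-Python model of branch-free ASCII uppercasing. Each iteration does
# what one SIMD lane would do: compare, build a mask, subtract under the mask.
# Hypothetical helper; only bytes 'a'..'z' are changed.

def upper_ascii(data: bytes) -> bytes:
    out = bytearray(len(data))
    for i, c in enumerate(data):
        is_lower = (c >= 0x61) & (c <= 0x7A)  # lane mask: True for 'a'..'z'
        out[i] = c - (is_lower * 0x20)         # subtract 0x20 only where masked
    return bytes(out)


print(upper_ascii(b"hello, Mojo 2024!"))
```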

I've seen your PR and do think it is necessary and will have a big impact on List performance, but I don't think List should be the be-all and end-all. The main reason people will come to Mojo is out-of-the-box support for the most innovative CPU and GPU ops, and to get that, they will expect an Array type to be there and to be performant. If we stick with only SBO and iterating over arrays element by element, every high-performance C++ library will be faster than the Mojo stdlib. IMO, Mojo should be the place for easy-to-use SIMD ops.

There are some things I'm not even sure are possible, since I don't know the in-memory layout of SIMD or of a DTypePointer's pointee, among many other Mojo internals (I don't understand MLIR), so this PR is really just an experiment. I have no problem killing it if the Mojo team tells me to. Code is just code.

martinvuyk and others added 19 commits May 23, 2024 16:26

- add Array type & tests (e928a87)
- add unsafe_get and unsafe_set and test (2433aa1)
- fix unsafe_set (a74b46a)
- fix formatting (aba1062)
- fix docstrings (29c8e61)
- fix formatting (7ce7e5f)
- fix sutff (a13e39b)
- fix stuff (f274641)
- fix stuff (29cf747)
- fix stuff (d581b2b)
- fix stuff (93d9ef6)
- fix stuff (9e37b6d)
- fix stuff (33f81f1)
- add structure with SIMD (8d2f6fe)
- fix details (7761ff8)
- Merge branch 'nightly' into add-array (6cba477)
- fix details (2bdf735)
- fix details (9aee64d)
- Merge branch 'nightly' into add-array (3796605)

martinvuyk and others added 9 commits May 24, 2024 22:43

- add a few ideas (a3a3901)
- fix details (982cf22)
- fix details (76f103b)
- Merge branch 'nightly' into add-array (eb01f98)
- remove stack expansion, will not work (6c5e3c8)
- Merge branch 'nightly' into add-array (d4c1c39)
- fix index (4657b7b)
- add broadcast ops (ca7b6db)
- I think array is ready :) (e236808)

martinvuyk added 30 commits June 5, 2024 10:26

- add a few ideas (0ceed6a)
- fix details (8fda752)
- fix details (718d694)
- remove stack expansion, will not work (42ed2d0)
- fix index (afa039f)
- add broadcast ops (02e894a)
- I think array is ready :) (336cef2)
- fix details (ea137fa)
- add logical operations (395edac)
- fix equal sign (9e2fc85)
- add map and filter (74ecf8c)
- fix docstrings (bce3fc6)
- add todo tests (db54681)
- fix mod truediv and floordiv (aca4fab)
- add unsafe pointer constructors (ea16a81)
- fix typos (1b70cf7)
- fix docstring (ca00817)
- add math funcs (a3414b0)
- fix cross product (838c990)
- fix cross product (66299b1)
- Merge branch 'add-array' of github.com:martinvuyk/mojo into add-array (d9c2842)
- what is happening (9e2f029)
- what is happening (84b6b43)
- Merge remote-tracking branch 'upstream/nightly' into add-array (37b100c)
- Merge remote-tracking branch 'upstream/nightly' into add-array (f8d6b10)
- fix stuff, still no idea what the problem is (41b4d35)
- fix trying to stop weird error (ea006b9)
- fix array (da878a1)
- Merge remote-tracking branch 'upstream/nightly' into add-array (7eecca6)
- add examples (e26806f)

Every non-merge commit above carries `Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>`.
