Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Parameterize __getitem__ and __setitem__ in stdlib types #2384

Closed
wants to merge 1 commit into from

Conversation

bgreni
Copy link
Contributor

@bgreni bgreni commented Apr 23, 2024

Fixes #2337

As mentioned in the referenced issue, the __index__() method allows types other than the base Int to be used when indexing containers without allowing inappropriate types that happen to implement Intable from being used (such as float point scalars).

Introduce the Indexer trait that requires the type to implement the __index__() method, and change all instances of __getitem__ and __setitem__ to accept any type that has the Indexer trait.

@bgreni bgreni changed the title [stdlib][Draft] Parameterize __getitem__ and __setitem__ in stdlib types [stdlib][Blocked] Parameterize __getitem__ and __setitem__ in stdlib types Apr 23, 2024
Returns:
The value as an integer
"""
constrained[type.is_integral(), "expected integral type"]()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOICE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be a good idea, for SIMD at least. This way, SIMD[bool, 1] will be cast to a Int but we might want to index with the original mask (which is size 1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be a good idea, for SIMD at least. This way, SIMD[bool, 1] will be cast to a Int but we might want to index with the original mask (which is size 1).

Now that you mention it Scalar[bool] actually won't be accepted here, which is inconsistent as I've currently given Bool the Indexer trait, but I was hoping for feedback from the team on whether that was acceptable or not.

We need some support for this in SIMD, however, since it would be expected that one could use any of the integral scalars to index the stdlib containers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think we need a more complete design before adding the support to stdlib.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__index__ is an established pattern in Python, and so is indexing with bool so it seems reasonable to me to implement it this way for Mojo.

@bgreni bgreni force-pushed the parameterized-indexing branch 3 times, most recently from c4962ae to 8902489 Compare April 30, 2024 18:33
@bgreni bgreni force-pushed the parameterized-indexing branch 5 times, most recently from be29ebc to 5c0c215 Compare May 6, 2024 02:52
@ematejska ematejska added the mojo-repo Tag all issues with this label label May 6, 2024
@bgreni bgreni force-pushed the parameterized-indexing branch 5 times, most recently from af3b58f to aa76d86 Compare May 8, 2024 05:36
@bgreni bgreni marked this pull request as ready for review May 8, 2024 05:45
@bgreni bgreni requested a review from a team as a code owner May 8, 2024 05:45
@bgreni bgreni changed the title [stdlib][Blocked] Parameterize __getitem__ and __setitem__ in stdlib types [stdlib] Parameterize __getitem__ and __setitem__ in stdlib types May 8, 2024
@bgreni bgreni force-pushed the parameterized-indexing branch 2 times, most recently from 06e9052 to d4bd2d8 Compare May 9, 2024 18:30
@bgreni bgreni force-pushed the parameterized-indexing branch 2 times, most recently from 14b4235 to 76b2833 Compare May 14, 2024 19:55
When indexing stdlib containers we should accept a generic type that calls
on the  __index__ method to allow types other than Int to be used but
doesn't allow Intable types that should not be used for such purposes
(such as Float)

Signed-off-by: Brian Grenier <grenierb96@gmail.com>
Copy link
Contributor

@laszlokindrat laszlokindrat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great improvement, thank you! Internally, we have been discussing related issues, so your timing to do this is perfect. I have a bunch of mostly nit comments, and one big ask: could you please split this into two patches: one that introduces the Indexer trait and makes the types conform to it (along with tests for their __index__ methods if they don't have them yet), and a second one that changes __getitem__ et al to take Indexer instead of Int?

@@ -322,6 +327,15 @@ struct Bool(
"""
return __mlir_op.`index.casts`[_type = __mlir_type.index](self.value)

@always_inline("nodebug")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for this (we test these implementations directly using explicit calls to __index__ instead of through the free functions).

@@ -138,16 +138,19 @@ struct VariadicList[type: AnyRegType](Sized):
return __mlir_op.`pop.variadic.size`(self.value)

@always_inline
fn __getitem__(self, index: Int) -> type:
fn __getitem__[indexer: Indexer](self, idx: indexer) -> type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: here and elsewhere: we try to make type parameters PascalCase, could you please change these?

Suggested change
fn __getitem__[indexer: Indexer](self, idx: indexer) -> type:
fn __getitem__[IndexerType: Indexer](self, idx: IndexerType) -> type:

If you want, I'm also okay with calling these just T for brevity.

@@ -322,6 +327,15 @@ struct Bool(
"""
return __mlir_op.`index.casts`[_type = __mlir_type.index](self.value)

@always_inline("nodebug")
fn __index__(self) -> Int:
"""Convert this Bool to an integer for indexing purposes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
"""Convert this Bool to an integer for indexing purposes
"""Convert this Bool to an integer for indexing purposes.

"""Convert this Bool to an integer for indexing purposes

Returns:
Bool as Int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a more details description here, e.g. "1 if True and 0 otherwise."

@@ -472,6 +473,22 @@ struct SIMD[type: DType, size: Int = simdwidthof[type]()](
rebind[Scalar[type]](self).value
)

@always_inline("nodebug")
fn __index__(self) -> Int:
"""Returns the value as an int if it is an integral value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit here and below: missing periods

Suggested change
"""Returns the value as an int if it is an integral value
"""Returns the value as an int if it is an integral value.

Comment on lines 62 to 69
return self.src[].__get_ref[list_mutability, list_lifetime](
self.index - 1
)
else:
self.index -= 1
return self.src[].__get_ref(self.index)
return self.src[].__get_ref[list_mutability, list_lifetime](
self.index
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why are these changes needed?

self: Reference[Self, _, _],
) -> _ListIter[T, self.is_mutable, self.lifetime]:
fn __iter__[
mutability: __mlir_type.`i1`, self_life: AnyLifetime[mutability].type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below: why are these changes needed? Also, if you have to keep them, could you please provide docstrings for the parameters?

@@ -717,7 +717,7 @@ struct DTypePointer[
return LegacyPointer.address_of(arg[])

@always_inline("nodebug")
fn __getitem__[T: Intable](self, offset: T) -> Scalar[type]:
fn __getitem__[T: Indexer](self, offset: T) -> Scalar[type]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please update the docstring below.

fn __setitem__(inout self, index: Int, owned value: ElementType):
fn __setitem__[
indexer: Indexer
](inout self, idx: indexer, owned value: ElementType):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're changing this, could you please also add a docstring?

@@ -719,6 +719,14 @@ def test_list_mult():
assert_equal(len(List[Int](1, 2, 3) * 0), 0)


def test_indexer():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: to be consistent with the other tests

Suggested change
def test_indexer():
def test_indexing():
Suggested change
def test_indexer():
def test_indexer():

@laszlokindrat
Copy link
Contributor

Also, please add a changelog entry!

@bgreni
Copy link
Contributor Author

bgreni commented May 16, 2024

This is a great improvement, thank you! Internally, we have been discussing related issues, so your timing to do this is perfect. I have a bunch of mostly nit comments, and one big ask: could you please split this into two patches: one that introduces the Indexer trait and makes the types conform to it (along with tests for their __index__ methods if they don't have them yet), and a second one that changes __getitem__ et al to take Indexer instead of Int?

@laszlokindrat Sounds good thanks for the feedback. Do you mean split this into two PRs, or simply two commits within this PR?

@gabrieldemarmiesse
Copy link
Contributor

@bgreni I fused part of this PR with the one from @StandinKP ( #2386 ) as they are very much related. Maybe let's merge #2677 first and then both of you can rebase your PRs on nightly afterwards?

@laszlokindrat
Copy link
Contributor

@laszlokindrat Sounds good thanks for the feedback. Do you mean split this into two PRs, or simply two commits within this PR?

Sorry, I should have been clearer: two separate PRs, please (PRs get squashed before merging internally)! Thanks!

@laszlokindrat
Copy link
Contributor

laszlokindrat commented May 16, 2024

@bgreni I fused part of this PR with the one from @StandinKP ( #2386 ) as they are very much related. Maybe let's merge #2677 first and then both of you can rebase your PRs on nightly afterwards?

I would like to wait for the split of this to happen first. Then you can rebase #2677 once Indexer is delivered separately.

@gabrieldemarmiesse
Copy link
Contributor

I would like to wait for the split of this to happen first. Then you can rebase #2677 once Indexer is delivered separately.

Sounds good!

@bgreni
Copy link
Contributor Author

bgreni commented May 16, 2024

@laszlokindrat Sounds good thanks for the feedback. Do you mean split this into two PRs, or simply two commits within this PR?

Sorry, I should have been clearer: two separate PRs, please (PRs get squashed before merging internally)! Thanks!

Ok thanks, I'll do that

@bgreni
Copy link
Contributor Author

bgreni commented May 16, 2024

@laszlokindrat The PR for the introduction of the trait has been created #2685

modularbot pushed a commit that referenced this pull request May 17, 2024
[External] [stdlib] Add Indexer trait

First half of the split off from #2384.
Partially addresses #2337

Introduces the `Indexer` trait and applies to `Int`, `IntLiteral`,
`Bool`, and integral scalar `SIMD` types.

---------

Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com>
Closes #2685
MODULAR_ORIG_COMMIT_REV_ID: 224c700e9ff9b28abf9e7d72ff3b3271d7e9de09
@bgreni
Copy link
Contributor Author

bgreni commented May 17, 2024

@laszlokindrat #2722 has been created for the second part of applying Indexer to __getitem__ __setitem__ and __refitem__

@laszlokindrat
Copy link
Contributor

@laszlokindrat #2722 has been created for the second part of applying Indexer to __getitem__ __setitem__ and __refitem__

@bgreni cool, thanks! Does that mean that we can close this PR?

@bgreni
Copy link
Contributor Author

bgreni commented May 17, 2024

@laszlokindrat #2722 has been created for the second part of applying Indexer to __getitem__ __setitem__ and __refitem__

@bgreni cool, thanks! Does that mean that we can close this PR?

Yes definitely, I will close it now

@bgreni bgreni closed this May 17, 2024
msaelices pushed a commit to msaelices/mojo that referenced this pull request May 18, 2024
[External] [stdlib] Add Indexer trait

First half of the split off from modularml#2384.
Partially addresses modularml#2337

Introduces the `Indexer` trait and applies to `Int`, `IntLiteral`,
`Bool`, and integral scalar `SIMD` types.

---------

Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com>
Closes modularml#2685
MODULAR_ORIG_COMMIT_REV_ID: 224c700e9ff9b28abf9e7d72ff3b3271d7e9de09
modularbot pushed a commit that referenced this pull request May 21, 2024
[External] [stdlib] Apply Indexer trait to stdlib containers

The second half of #2384

I have intentionally omitted `static_tuple.mojo` to avoid conflicting
with #2677

Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com>
Closes #2722
MODULAR_ORIG_COMMIT_REV_ID: 365298938b36c3f4f9ad0067dfcb3856de369875
martinvuyk pushed a commit to martinvuyk/mojo that referenced this pull request May 24, 2024
[External] [stdlib] Add Indexer trait

First half of the split off from modularml#2384.
Partially addresses modularml#2337

Introduces the `Indexer` trait and applies to `Int`, `IntLiteral`,
`Bool`, and integral scalar `SIMD` types.

---------

Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com>
Closes modularml#2685
MODULAR_ORIG_COMMIT_REV_ID: 224c700e9ff9b28abf9e7d72ff3b3271d7e9de09
martinvuyk pushed a commit to martinvuyk/mojo that referenced this pull request May 24, 2024
[External] [stdlib] Apply Indexer trait to stdlib containers

The second half of modularml#2384

I have intentionally omitted `static_tuple.mojo` to avoid conflicting
with modularml#2677

Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com>
Closes modularml#2722
MODULAR_ORIG_COMMIT_REV_ID: 365298938b36c3f4f9ad0067dfcb3856de369875
modularbot pushed a commit that referenced this pull request May 28, 2024
[External] [stdlib] Add the `normalize_index` function

This PR is some kind of mix between
#2386 and
#2384 which have issues (aborting)
or have too many conflicts because the PR is too big.

This PR solves part of #2251 and
#2337

We try here to give the ground work for indexing correctly. This
function added can then be used wherever we work with sequences.

Two things I noticed during development:
1) The `debug_assert` does not run in unit tests. Is there any way to
enable it? We currently have out-of-bounds bugs in our test suite.
2) The null terminator is causing pain, again, again, and again. Do we
have any plans to make it optional when working with String? I opened
#2678 to discuss this.

To avoid to fix those issues in this PR, I used the `normalize_index` on
the `__refitem__` of `InlineArray` which doesn't have widespread use yet
and isn't impacted by out-of-bounds bugs.

My recommendation would be to merge this PR then to rebase
#2386 and
#2384 on it. We should also
afterwards fix the out of bounds bugs that can be triggered in the test
suite by enabling debug_assert.

The diff might seem big, but no worries, it's mostly the licenses and
docstrings :)

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2677
MODULAR_ORIG_COMMIT_REV_ID: 66e7121a6a333c16284eb33a89eb85c034c296c3
modularbot pushed a commit that referenced this pull request Jun 7, 2024
[External] [stdlib] Add Indexer trait

First half of the split off from #2384.
Partially addresses #2337

Introduces the `Indexer` trait and applies to `Int`, `IntLiteral`,
`Bool`, and integral scalar `SIMD` types.

---------

Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com>
Closes #2685
MODULAR_ORIG_COMMIT_REV_ID: 224c700e9ff9b28abf9e7d72ff3b3271d7e9de09
modularbot pushed a commit that referenced this pull request Jun 7, 2024
[External] [stdlib] Apply Indexer trait to stdlib containers

The second half of #2384

I have intentionally omitted `static_tuple.mojo` to avoid conflicting
with #2677

Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com>
Closes #2722
MODULAR_ORIG_COMMIT_REV_ID: 365298938b36c3f4f9ad0067dfcb3856de369875
modularbot pushed a commit that referenced this pull request Jun 7, 2024
[External] [stdlib] Add the `normalize_index` function

This PR is some kind of mix between
#2386 and
#2384 which have issues (aborting)
or have too many conflicts because the PR is too big.

This PR solves part of #2251 and
#2337

We try here to give the ground work for indexing correctly. This
function added can then be used wherever we work with sequences.

Two things I noticed during development:
1) The `debug_assert` does not run in unit tests. Is there any way to
enable it? We currently have out-of-bounds bugs in our test suite.
2) The null terminator is causing pain, again, again, and again. Do we
have any plans to make it optional when working with String? I opened
#2678 to discuss this.

To avoid to fix those issues in this PR, I used the `normalize_index` on
the `__refitem__` of `InlineArray` which doesn't have widespread use yet
and isn't impacted by out-of-bounds bugs.

My recommendation would be to merge this PR then to rebase
#2386 and
#2384 on it. We should also
afterwards fix the out of bounds bugs that can be triggered in the test
suite by enabling debug_assert.

The diff might seem big, but no worries, it's mostly the licenses and
docstrings :)

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2677
MODULAR_ORIG_COMMIT_REV_ID: 66e7121a6a333c16284eb33a89eb85c034c296c3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mojo-repo Tag all issues with this label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants