Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Make InlineArray call its elements' destructor #2965

Open
wants to merge 2 commits into
base: nightly
Choose a base branch
from

Conversation

gabrieldemarmiesse
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse commented Jun 6, 2024

Fix this issue:

A little explanation of what is going on here, this should make the review easier.

Modifications to InlineArray

InlineArray made use of the @value decorator, which generates, in addition to a constructor, the following methods: __moveinit__, __copyinit__ and __del__. While it's easy to understand what is generated when working with attributes that are values, it's less clear what is happenning when working with plain MLIR types like we do here.

The next step is to manually define those to make sure they do what we want. We will always assume that the InlineArray is completely initialized. Thus when copying, moving or deleting, we must copy, move and delete each element in the InlineArray too.

To make the implementation easier, instead of using an UnsafePointer and its methods to control the lifecycle of the values (move_from_pointee & friends), we fill the array with UnsafeMaybeUninitialized. This struct has methods that allows us to control the lifecycle without necessarily work with pointers.

What do we do with InlineArray[...](unsafe_uninitialized=True)?

This method is a massive footgun. This is because throughout the methods of InlineArray, we always assume that the values are initialized. So doing any moving, copying or deleting with uninitialized memory is UB.

This is not much of an issue for very trivial types where moving and copying are just copying bits and the destructor does nothing. This can still cause UB if the user uses some uninitialized memory but it's fine if the caller is careful. For example, in the stdlib, we use unsafe_uninitialized to store some UInt8 here and there and it's fine.

As such, the constructor was kept, but an extensive docstring was added to explain in which situation it is okay to use it.

How InlineArray is now used by InlineList

InlineList wants to carefully manage the lifecycle of each element in the InlineArray while still making the InlineArray believe that it owns the lifecycle of its elements. In those kind of situation, UnsafeMaybeUninitialized is the right struct to use.

So we use an InlineArray[UnsafeMaybeUninitialized[ElementType]] inside InlineList. I know it can be confusing since InlineArray itself also uses UnsafeMaybeUninitialized but this shouldn't be visible from outside InlineArray so it should be fine. Basically it disables all lifecycle methods, and __del__ is a no-op.

This allows InlineList to manually control the lifecycle of each elements which has an index below the size (above the size of the InlineList, it's truly uninitialized memory).

Some guideline for the code review:

I advise to read the files in this order:

  • The files stdlib/test/test_utils/types.mojo and stdlib/test/collections/test_inline_list.mojo have been refactored to make the tests of InlineArray use the ValueDestructorRecorder. This could have been a separate PR but I was too lazy to split this PR even further.
  • The file stdlib/test/utils/test_tuple.mojo has a new test that ensure that all the elements in the InlineArray have their destructors called.
  • The file stdlib/src/utils/static_tuple.mojo has all the changes that makes InlineArray use UnsafeMaybeUninitialized internally. While we could have technically use pointers, it makes writing code easier when controlling lifecycle.
  • Finally, we switch the internal type of InlineList in stdlib/src/collections/inline_list.mojo. We now use InlineArray[UnsafeMaybeUninitialized[ElementType]] which allows the InlineList to control exactly the lifecycle. I repeat here: yes in can be confusing because overall, we use two UnsafeMaybeUninitialized on top of each other when using InlineArray[UnsafeMaybeUninitialized[ElementType]], but the code is still readable, because one "belongs" to InlineArray and one "belongs" to InlineList.

@gabrieldemarmiesse gabrieldemarmiesse changed the title [stdlib] Fix InlineArray not calling destructor and remove unsafe_uninitialized [stdlib] Fix InlineArray not calling its elements' destructor Jun 6, 2024
@gabrieldemarmiesse gabrieldemarmiesse changed the title [stdlib] Fix InlineArray not calling its elements' destructor [stdlib] Make InlineArray call its elements' destructor Jun 6, 2024
modularbot pushed a commit that referenced this pull request Jun 13, 2024
…ized.move_from` (#41744)

[External] [stdlib] Add convenience overload of
`UnsafeMaybeUninitialized.move_from`

This is part of #2965 .

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2972
MODULAR_ORIG_COMMIT_REV_ID: fa49109cb7ac5d5af9bd4e54c582df7ed8677c5b
modularbot pushed a commit that referenced this pull request Jun 14, 2024
…aybeUninitalized` (#41760)

[External] [stdlib] Give the `CollectionElementNew` trait to
`UnsafeMaybeUninitalized`

This is part of the PR #2965

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2971
MODULAR_ORIG_COMMIT_REV_ID: 994417a3cb7a2b160fb44f4455cd8f71de03af8b
@gabrieldemarmiesse gabrieldemarmiesse marked this pull request as ready for review June 14, 2024 11:01
@gabrieldemarmiesse gabrieldemarmiesse requested a review from a team as a code owner June 14, 2024 11:01
@gabrieldemarmiesse
Copy link
Contributor Author

@ConnorGray it's ready for review, please take a look when you have the time :)

@rd4com
Copy link
Contributor

rd4com commented Jun 16, 2024

good job @gabrieldemarmiesse ,
do you think it would be a proper opportunity to add a taking move to InlineList?

fn __moveinit__(inout self, owned other: Self):
        self._size = other._size
        other.size = 0 # taking move (see ._del)
        self._array = other._array^

@gabrieldemarmiesse
Copy link
Contributor Author

gabrieldemarmiesse commented Jun 16, 2024

@rd4com I think it can be added in another PR, this one is already (too?) big. Furthermore, the __moveinit__ is not as trivial as written because only some elements in the underlying array must be moved (the one below the self.size)

@ConnorGray
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Jun 20, 2024
size.value,
`, `,
UnsafeMaybeUninitialized[Self.ElementType],
`>`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the intention here, but I think this may not actually be required.

I think the state we want to aim for is:

  • InlineArray.__init__(unsafe_uninitialized) is replaced with:

    @staticmethod
    fn unsafe_uninitialized() -> InlineArray[UnsafeMaybeUninitialized[T]]:
        ...
  • All the remaining InlineArray.__init__(...) constructors should (and currently do) always fully initialize the InlineArray.

    (This is okay to do without making the pop.array storage type use UnsafeMaybeUninitialized because the compiler knows that inside an __init__ method self and the fields of self won't be fully initialized anyway.

With this approach, all the methods off InlineArray will be able to assume one of the following two things is true:

  1. The current InlineArray is fully initialized
  2. The current InlineArray may not be fully initialized, but its element types are definitely wrapped in UnsafeMaybeUninitialized[..], so that's okay.

We'd then also want to add this constructor for the programmer to "assert" that an uninitialized InlineArray is now fully initialized.

struct InlineArray[T]:
    fn __init__(
        inout self,
        owned unsafe_assume_initialized: InlineArray[UnsafeMaybeUninitialized[T]]
    ):
        # transmute the argument `InlineArray` into one that is now considered fully initialized
        ...

@gabrieldemarmiesse What do you think of this?

I think it would eliminate the need for some of the changes that had to be made in the current iteration of this PR throughout the InlineArray implementation.

Copy link
Contributor Author

@gabrieldemarmiesse gabrieldemarmiesse Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using UnsafeMaybeUninitialized inside InlineArray is indeed not necessary, but it was intentional to modify the struct like this. The idea was to move away from pointers and use only UnsafeMaybeUninitialized which has a restricted use compared to pointers (notably, no pointer arithmetic). We still want to carefully control the lifecycle, and UnsafeMaybeUninitialized allows it. With pointers the notion of lifecycle and memory address are mixed up in the same struct, using UnsafeMaybeUninitialized allows to manage both independently.

I can change it back and use plain pointers again, if that's really what you want, though I believe the code would be less readable. Let me know what you want there.

For the other changes you requested fn unsafe_uninitialized() -> InlineArray[UnsafeMaybeUninitialized[T]]: and the new constructor, can it be done in another pull request? The CI is currently passing, so I believe those methods are more in the realm of "nice to have" rather than "necessary to fix the bug".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a pointer-based implementation is still my personal preference, since it limits the number of types involved. But I see the merits of your argument too, and I don't want to hold up this excellent work on a small difference in perspective like that 🙂

So I'm going to resync this and merge it, and let this shake out—we can always revisit and cleanup later if there's a readability or maintainability problem in practice.

And I agree, we can / should add @staticmethod fn unsafe_uninitialized() in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind reverting one day to pointers if we see that this is causing some issues down the road :)

@@ -20,6 +20,7 @@ from collections import InlineList
"""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how clean the changes are in this file with the introduction of UnsafeMaybeUninitialized 🙂

@ConnorGray
Copy link
Collaborator

ConnorGray commented Jun 20, 2024

Thank you for putting together such an excellent explanation for this work @gabrieldemarmiesse. This was a treat to review 🙂

My main suggestion boils down to basically I think we can keep the implementation of InlineArray itself mostly unchanged, but other than that this PR is straightforward and looks great 🔥

Thank you again for pushing on this work Gabriel, this will make a huge difference in how precisely we can model the invariants of our collection types.

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@ConnorGray
Copy link
Collaborator

!sync

@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added merged-internally Indicates that this pull request has been merged internally merged-externally Merged externally in public mojo repo labels Jun 24, 2024
@modularbot
Copy link
Collaborator

Landed in 990c776! Thank you for your contribution 🎉

modularbot pushed a commit that referenced this pull request Jun 25, 2024
…(#42041)

[External] [stdlib] Make `InlineArray` call its elements' destructor

Fix this issue:
*  #2869

A little explanation of what is going on here, this should make the
review easier.

### Modifications to `InlineArray`
`InlineArray` made use of the `@value` decorator, which generates, in
addition to a constructor, the following methods: `__moveinit__`,
`__copyinit__` and `__del__`. While it's easy to understand what is
generated when working with attributes that are values, it's less clear
what is happenning when working with plain MLIR types like we do here.

The next step is to manually define those to make sure they do what we
want. **We will always assume that the `InlineArray` is completely
initialized**. Thus when copying, moving or deleting, we must copy, move
and delete each element in the `InlineArray` too.

To make the implementation easier, instead of using an `UnsafePointer`
and its methods to control the lifecycle of the values
(`move_from_pointee` & friends), we fill the array with
`UnsafeMaybeUninitialized`. This struct has methods that allows us to
control the lifecycle without necessarily work with pointers.

### What do we do with `InlineArray[...](unsafe_uninitialized=True)`?

This method is a massive footgun. This is because throughout the methods
of `InlineArray`, we always assume that the values are initialized. So
doing any moving, copying or deleting with uninitialized memory is UB.

This is not much of an issue for very trivial types where moving and
copying are just copying bits and the destructor does nothing. This can
still cause UB if the user uses some uninitialized memory but it's fine
if the caller is careful. For example, in the stdlib, we use
`unsafe_uninitialized` to store some `UInt8` here and there and it's
fine.

As such, the constructor was kept, but an extensive docstring was added
to explain in which situation it is okay to use it.

### How `InlineArray` is now used by `InlineList`

`InlineList` wants to carefully manage the lifecycle of each element in
the `InlineArray` while still making the `InlineArray` believe that it
owns the lifecycle of its elements. In those kind of situation,
`UnsafeMaybeUninitialized` is the right struct to use.

So we use an `InlineArray[UnsafeMaybeUninitialized[ElementType]]` inside
`InlineList`. I know it can be confusing since `InlineArray` itself also
uses `UnsafeMaybeUninitialized` but this shouldn't be visible from
outside `InlineArray` so it should be fine. Basically it disables all
lifecycle methods, and `__del__` is a no-op.

This allows `InlineList` to manually control the lifecycle of each
elements which has an index below the size (above the size of the
`InlineList`, it's truly uninitialized memory).

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2965
MODULAR_ORIG_COMMIT_REV_ID: 5d98bc8f2a9bfc35ee0a004b8ee6a36f4a475452
@modularbot modularbot closed this Jun 25, 2024
@ConnorGray
Copy link
Collaborator

Re-opening to try to merge this internally again.

@ConnorGray ConnorGray reopened this Jun 27, 2024
@ConnorGray
Copy link
Collaborator

!sync

@gabrieldemarmiesse
Copy link
Contributor Author

Re-opening to try to merge this internally again.

I saw that it got reverted for performance reasons. If you can pinpoint the reason or have tools to help pinpoint the issue, I'd be happy to know, just to satisfy my curiosity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants