Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Add optional small buffer optimization to List, take 2 #2825

Open
wants to merge 103 commits into
base: nightly
Choose a base branch
from

Conversation

gabrieldemarmiesse
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse commented May 25, 2024

This PR solves part of #2467

This PR is part of three PRs to read and merge in the following order

The small buffer behavior

We use InlineArray to store up to small_buffer_size elements on the stack.
When the user requests a capacity of N, if N <= small_buffer_size, then the small buffer is used. We make the pointer of the List point to the InlineArray. And we set the self.capacity to small_buffer_size since we can store up to small_buffer_size elements before doing a reallocation.

Constructing from existing pointers

When constructing from an existing pointer, to avoid copying the values, the pointer is saved directly in self.data without moving the values, even if the size of the pointer is smaller than the buffer. This is to avoid moves for performance reasons. To reallocate the values on the small buffer (if it is big enough) one can simply do a copy of the List.

Alternative considered: Variant

You might wonder, why not using a Variant since we don't need to store the capacity and the pointer data (and the size could be smaller, like an UInt8) when using the small buffer? This is because by setting the pointer to the stack allocated InlineArray, there is no overhead introduced compared to a normal list when doing to following operations: reading, checking bounds, getting the size, getting the capacity. The exact same steps are done, with just different values inside the variables. You might get some speedups when reading the values in the buffer due to cache locality.

Checking where are the elements stored

Sometimes we need to know if the InlineArray is used or if we allocated data with UnsafePointer.alloc. When this is the case, we can do the following check if self.data == self._small_buffer.unsafe_ptr(). Note that this produces a compiler bug when materializing. See the next PR.

Backward compatibility

By default, the small buffer size is 0. Some @parameter if self.sbo_enabled here and there make sure that we don't add a single instruction when the small buffer size is null. So the default behavior of List doesn't change at all.
A notable difference is that List[Int] now means a list with no buffer optimization. Using List[Int, _] can be used in a signature to signify that the function works with any small buffer sizes.

Unit tests

I take all the tests in test_list.mojo and run them with 10 different small buffer sizes. I also sometimes mix buffer sizes to make sure that methods like extends work with different SBO sizes.

I added a materialization test. But it only works with a SBO size of 0 due to the following compiler bug: #2637

Since the bug won't be solved anytime soon, I made #2826 as a workaround

@gabrieldemarmiesse gabrieldemarmiesse changed the title [stdlib] Add optional small buffer optimization to List [stdlib] Add optional small buffer optimization to List, take 2 May 25, 2024
@JoeLoser
Copy link
Collaborator

Any initial perf numbers with this compiler workaround approach?

@gabrieldemarmiesse
Copy link
Contributor Author

gabrieldemarmiesse commented May 25, 2024

I'm not there yet, I still need some more work before being able to run the benchmarks :) I'll ping you with a summary when everything is done. If that is possible to review quickly afterwards, I would be grateful, as I may get a lot of conflicts otherwise.

@JoeLoser
Copy link
Collaborator

I'm not there yet, I still need some more work before being able to run the benchmarks :) I'll ping you with a summary when everything is done. If that is possible to review quickly afterwards, I would be grateful, as I may get a lot of conflicts otherwise.

Sounds good, thanks! I'll review ASAP once it's ready.

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
gabrieldemarmiesse added a commit to gabrieldemarmiesse/mojo that referenced this pull request May 25, 2024
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@gabrieldemarmiesse gabrieldemarmiesse marked this pull request as ready for review May 25, 2024 22:41
@gabrieldemarmiesse gabrieldemarmiesse requested a review from a team as a code owner May 25, 2024 22:41
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@JoeLoser
Copy link
Collaborator

!sync

@JoeLoser
Copy link
Collaborator

!sync

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@JoeLoser
Copy link
Collaborator

!sync

fn count[T: ComparableCollectionElement](self: List[T], value: T) -> Int:
fn count[
U: ComparableCollectionElement
](self: Reference[List[U, Self.small_buffer_size], _, _], value: U) -> Int:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to use references in those situations otherwise the compiler wasn't happy at all with the extra parameter. The behavior didn't change. Same for other methods like __str__

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you file compiler bugs please? Your PRs are definitely hitting a few and I want to make sure they're all captured! Perhaps a comment in the code too referencing such bugs?

Copy link
Contributor Author

@gabrieldemarmiesse gabrieldemarmiesse May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind filing those bugs but can it be after the PR is merged? Making a minimal reproducible example can sometimes take a few minutes but it can also sometimes take a lot of time. I can make another PRs to add the TODOs later if that's okay with you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, totally fine.

@JoeLoser
Copy link
Collaborator

JoeLoser commented May 26, 2024

Unfortunately this is hitting a nasty compiler crash for internal things with 500+ stack frame trace 😞 .

So it won't land today. I'll have to investigate this (won't have time until the work week) and file compiler repros, especially since it's not something repro'ing in the public stdlib alone.

@nmsmith
Copy link
Contributor

nmsmith commented May 27, 2024

The parameter name small_buffer_size isn't very illuminating. When I read it, I think "okay, the buffer is small, but why should I care?" IMO what you want to convey is that the list has an inline buffer. This is the property that matters for performance. If you tell somebody the buffer is inline—and if they know anything about how memory works—then they will conclude that having an inline buffer might help with performance.

TL;DR: Maybe the parameter should be called inline_buffer_size. And a similar tweak to the names self._small_buffer and self._small_buffer_type would make sense.

@nmsmith
Copy link
Contributor

nmsmith commented May 27, 2024

Also if I'm reading the code right, the "buffer size" is actually the capacity of the list, when it is stored inline. So something like inline_capacity would be even more accurate.

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
…nstructor

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
…_branch

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
… remove_copyable_from_collectionelement

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@gabrieldemarmiesse
Copy link
Contributor Author

gabrieldemarmiesse commented Sep 14, 2024

I've hit an issue in the unit test. Not sure if it's a compiler bug or an issue with my code. I'll take a look at it later on.

EDIT: I get the strangest behavior. When calling reverse (all other tests work well) and printing the list inside the reverse function, the values are correctly swapped. But as soon as we exit the reverse function, printing the list shows the original values. Since it's an inout argument, I don't understand why it would do this. This triggers only when SBO is in use and writing to the InlineArray.

I really have no idea what is happening. I tried printing the addresses at different points in the stack but everything looks correct. I could use some help there.

@JoeLoser
Copy link
Collaborator

!sync

@JoeLoser JoeLoser added the blocked Blocked by another issue that must be resolved first label Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Blocked by another issue that must be resolved first imported-internally Signals that a given pull request has been imported internally.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants