Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Add small string optimization to the String struct #2507

Closed

Conversation

gabrieldemarmiesse
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse commented May 4, 2024

Fix part of #2467

This PR will stay in draft as it's too big to be merged at once, I'll split it further into multiple PRs. I also have some cleanup to do and some benchmarking. But it can give the general idea of how it works.

In a nutshell:

  • I created a type _InlineBytesList which is like InlineArray but specialized in Int8 and has a size and capacity (list-like)
  • I created a type _BytesListWithSmallSizeOptimization which has the interface of a List, but can fall back to stack allocated memory with _InlineBytesList for small sizes.
  • Replaced List[Int8] in String by _BytesListWithSmallSizeOptimization. Adapt a few lines here and there.

The tests are passing locally. I'm open to feedback as I work on the cleanup+split.

Benchmark

See #2467 (comment)

@gabrieldemarmiesse gabrieldemarmiesse requested a review from a team as a code owner May 4, 2024 18:29
@gabrieldemarmiesse gabrieldemarmiesse marked this pull request as draft May 4, 2024 18:29


@value
struct _InlineBytesList[capacity: Int](Sized, CollectionElement):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought I'm not convinced this is a better solution than making InlineFixedVector use an InlineVector instead of StaticTuple, but that's also less trivial an update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea! I didn't know InlineFixedVector existed actually. There is not much code to change and can be a good way to split this PR into multiple ones. Definitly in my TODO list.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI InlineFixedVector is going to switch over to AnyType or CollectionElement after we fix the compiler bug(s) blocking #2377

@@ -400,3 +400,11 @@ struct InlineArray[ElementType: CollectionElement, size: Int](Sized):
return Reference(self)[]._get_reference_unsafe[mutability, self_life](
normalized_idx
)

fn get_storage_unsafe_pointer(self) -> UnsafePointer[ElementType]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We explicitly chose not to reveal this in the safe API in the original PR. If you really think we should be adding it, we should at least prefix it with an _, but the general stance is that getting an unsafe pointer is not recommended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are wayyyy too many places in the stdlib that uses a string pointer, so I don't mind removing it but this PR is not the place to do it.

Concerning the naming, I'm just following the instructions given in the contributing guide. There is nothing stating that unsafe functions should be private, but we should be explicit about the unsafe part (like in swift):
https://github.com/modularml/mojo/blob/main/stdlib/docs/vision.md#objectives-and-goals

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by comment: we have an internal PR which is harmonizing all the "get an unsafe pointer" APIs to unsafe_ptr().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty rare in python to abbreviate words, but I guess that folks coming from low level languages too used to write "ptr" everywhere XD I don't want to bikeshed too much, but just saying that there is no reason to use "ptr" instead of "pointer"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I don't feel strongly on the eventual unified/final name. Mostly care about unifying the various different namings we have now across the types.

@ematejska ematejska added the mojo-repo Tag all issues with this label label May 6, 2024
JoeLoser pushed a commit that referenced this pull request May 8, 2024
…plicit (#39465)

[External] [stdlib] Make the use of the unsafe constructor of List
explicit

This PR is a small piece of #2507

We want unsafe things to be explicit, as stated in [the vision
guide](https://github.com/modularml/mojo/blob/nightly/stdlib/docs/vision.md#objectives-and-goals)

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2523
MODULAR_ORIG_COMMIT_REV_ID: 861433431c0fc7fd6ab1d905dfe09f3e3a5792c1
JoeLoser pushed a commit that referenced this pull request May 11, 2024
…39560)

[External] [stdlib] Add `InlineList` struct (stack-allocated List)

This struc is very useful to implement SSO, it's related to
* #2467
* #2507

If this is merged, I can take advantage of this in my PR that has the
SSO POC

About `InlineFixedVector`: `InlineList` is different. Notably,
`InlineList` have its capacity decided at compile-time, and there is no
heap allocation (unless the elements have heap-allocated data of
course).

`InlineFixedVector` stores the first N element on the stack, and the
next elements on the heap. Since not all elements are not in the same
spot, it makes it hard to work with pointers there as the data is not
contiguous.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2587
MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554
@JoeLoser
Copy link
Collaborator

ea1f2a5 landed in today's nightly, so feel free to rebase with that if you'd like.

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
@gabrieldemarmiesse
Copy link
Contributor Author

gabrieldemarmiesse commented May 11, 2024

FYI, we got huge slowdowns (x2 in some cases) in nightly for the benchmarks presented above. The numbers for my branch didn't change si it makes my SSO implementation look better XD

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
lsh pushed a commit to lsh/mojo that referenced this pull request May 17, 2024
…plicit (#39465)

[External] [stdlib] Make the use of the unsafe constructor of List
explicit

This PR is a small piece of modularml#2507

We want unsafe things to be explicit, as stated in [the vision
guide](https://github.com/modularml/mojo/blob/nightly/stdlib/docs/vision.md#objectives-and-goals)

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes modularml#2523
MODULAR_ORIG_COMMIT_REV_ID: 861433431c0fc7fd6ab1d905dfe09f3e3a5792c1

Signed-off-by: Lukas Hermann <lukashermann28@gmail.com>
lsh pushed a commit to lsh/mojo that referenced this pull request May 17, 2024
…39560)

[External] [stdlib] Add `InlineList` struct (stack-allocated List)

This struc is very useful to implement SSO, it's related to
* modularml#2467
* modularml#2507

If this is merged, I can take advantage of this in my PR that has the
SSO POC

About `InlineFixedVector`: `InlineList` is different. Notably,
`InlineList` have its capacity decided at compile-time, and there is no
heap allocation (unless the elements have heap-allocated data of
course).

`InlineFixedVector` stores the first N element on the stack, and the
next elements on the heap. Since not all elements are not in the same
spot, it makes it hard to work with pointers there as the data is not
contiguous.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes modularml#2587
MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554

Signed-off-by: Lukas Hermann <lukashermann28@gmail.com>
@gabrieldemarmiesse
Copy link
Contributor Author

Closing in favor of #2827

modularbot pushed a commit that referenced this pull request Jun 7, 2024
…plicit (#39465)

[External] [stdlib] Make the use of the unsafe constructor of List
explicit

This PR is a small piece of #2507

We want unsafe things to be explicit, as stated in [the vision
guide](https://github.com/modularml/mojo/blob/nightly/stdlib/docs/vision.md#objectives-and-goals)

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2523
MODULAR_ORIG_COMMIT_REV_ID: 861433431c0fc7fd6ab1d905dfe09f3e3a5792c1
modularbot pushed a commit that referenced this pull request Jun 7, 2024
…39560)

[External] [stdlib] Add `InlineList` struct (stack-allocated List)

This struc is very useful to implement SSO, it's related to
* #2467
* #2507

If this is merged, I can take advantage of this in my PR that has the
SSO POC

About `InlineFixedVector`: `InlineList` is different. Notably,
`InlineList` have its capacity decided at compile-time, and there is no
heap allocation (unless the elements have heap-allocated data of
course).

`InlineFixedVector` stores the first N element on the stack, and the
next elements on the heap. Since not all elements are not in the same
spot, it makes it hard to work with pointers there as the data is not
contiguous.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2587
MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mojo-repo Tag all issues with this label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants