-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Add small string optimization to the String
struct
#2507
Conversation
|
||
|
||
@value | ||
struct _InlineBytesList[capacity: Int](Sized, CollectionElement): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought I'm not convinced this is a better solution than making InlineFixedVector
use an InlineVector
instead of StaticTuple
, but that's also less trivial an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great idea! I didn't know InlineFixedVector existed actually. There is not much code to change and can be a good way to split this PR into multiple ones. Definitly in my TODO list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI InlineFixedVector
is going to switch over to AnyType
or CollectionElement
after we fix the compiler bug(s) blocking #2377
stdlib/src/utils/static_tuple.mojo
Outdated
@@ -400,3 +400,11 @@ struct InlineArray[ElementType: CollectionElement, size: Int](Sized): | |||
return Reference(self)[]._get_reference_unsafe[mutability, self_life]( | |||
normalized_idx | |||
) | |||
|
|||
fn get_storage_unsafe_pointer(self) -> UnsafePointer[ElementType]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We explicitly chose not to reveal this in the safe API in the original PR. If you really think we should be adding it, we should at least prefix it with an _
, but the general stance is that getting an unsafe pointer is not recommended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are wayyyy too many places in the stdlib that uses a string pointer, so I don't mind removing it but this PR is not the place to do it.
Concerning the naming, I'm just following the instructions given in the contributing guide. There is nothing stating that unsafe functions should be private, but we should be explicit about the unsafe part (like in swift):
https://github.com/modularml/mojo/blob/main/stdlib/docs/vision.md#objectives-and-goals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by comment: we have an internal PR which is harmonizing all the "get an unsafe pointer" APIs to unsafe_ptr()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty rare in python to abbreviate words, but I guess that folks coming from low level languages too used to write "ptr" everywhere XD I don't want to bikeshed too much, but just saying that there is no reason to use "ptr" instead of "pointer"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I don't feel strongly on the eventual unified/final name. Mostly care about unifying the various different namings we have now across the types.
…plicit (#39465) [External] [stdlib] Make the use of the unsafe constructor of List explicit This PR is a small piece of #2507 We want unsafe things to be explicit, as stated in [the vision guide](https://github.com/modularml/mojo/blob/nightly/stdlib/docs/vision.md#objectives-and-goals) Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2523 MODULAR_ORIG_COMMIT_REV_ID: 861433431c0fc7fd6ab1d905dfe09f3e3a5792c1
…39560) [External] [stdlib] Add `InlineList` struct (stack-allocated List) This struc is very useful to implement SSO, it's related to * #2467 * #2507 If this is merged, I can take advantage of this in my PR that has the SSO POC About `InlineFixedVector`: `InlineList` is different. Notably, `InlineList` have its capacity decided at compile-time, and there is no heap allocation (unless the elements have heap-allocated data of course). `InlineFixedVector` stores the first N element on the stack, and the next elements on the heap. Since not all elements are not in the same spot, it makes it hard to work with pointers there as the data is not contiguous. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2587 MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554
ea1f2a5 landed in today's nightly, so feel free to rebase with that if you'd like. |
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
1da59d4
to
3271101
Compare
FYI, we got huge slowdowns (x2 in some cases) in nightly for the benchmarks presented above. The numbers for my branch didn't change si it makes my SSO implementation look better XD |
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
…plicit (#39465) [External] [stdlib] Make the use of the unsafe constructor of List explicit This PR is a small piece of modularml#2507 We want unsafe things to be explicit, as stated in [the vision guide](https://github.com/modularml/mojo/blob/nightly/stdlib/docs/vision.md#objectives-and-goals) Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes modularml#2523 MODULAR_ORIG_COMMIT_REV_ID: 861433431c0fc7fd6ab1d905dfe09f3e3a5792c1 Signed-off-by: Lukas Hermann <lukashermann28@gmail.com>
…39560) [External] [stdlib] Add `InlineList` struct (stack-allocated List) This struc is very useful to implement SSO, it's related to * modularml#2467 * modularml#2507 If this is merged, I can take advantage of this in my PR that has the SSO POC About `InlineFixedVector`: `InlineList` is different. Notably, `InlineList` have its capacity decided at compile-time, and there is no heap allocation (unless the elements have heap-allocated data of course). `InlineFixedVector` stores the first N element on the stack, and the next elements on the heap. Since not all elements are not in the same spot, it makes it hard to work with pointers there as the data is not contiguous. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes modularml#2587 MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554 Signed-off-by: Lukas Hermann <lukashermann28@gmail.com>
Closing in favor of #2827 |
…plicit (#39465) [External] [stdlib] Make the use of the unsafe constructor of List explicit This PR is a small piece of #2507 We want unsafe things to be explicit, as stated in [the vision guide](https://github.com/modularml/mojo/blob/nightly/stdlib/docs/vision.md#objectives-and-goals) Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2523 MODULAR_ORIG_COMMIT_REV_ID: 861433431c0fc7fd6ab1d905dfe09f3e3a5792c1
…39560) [External] [stdlib] Add `InlineList` struct (stack-allocated List) This struc is very useful to implement SSO, it's related to * #2467 * #2507 If this is merged, I can take advantage of this in my PR that has the SSO POC About `InlineFixedVector`: `InlineList` is different. Notably, `InlineList` have its capacity decided at compile-time, and there is no heap allocation (unless the elements have heap-allocated data of course). `InlineFixedVector` stores the first N element on the stack, and the next elements on the heap. Since not all elements are not in the same spot, it makes it hard to work with pointers there as the data is not contiguous. Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com> Closes #2587 MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554
Fix part of #2467
This PR will stay in draft as it's too big to be merged at once, I'll split it further into multiple PRs. I also have some cleanup to do and some benchmarking. But it can give the general idea of how it works.
In a nutshell:
_InlineBytesList
which is likeInlineArray
but specialized inInt8
and has a size and capacity (list-like)_BytesListWithSmallSizeOptimization
which has the interface of a List, but can fall back to stack allocated memory with_InlineBytesList
for small sizes.List[Int8]
inString
by_BytesListWithSmallSizeOptimization
. Adapt a few lines here and there.The tests are passing locally. I'm open to feedback as I work on the cleanup+split.
Benchmark
See #2467 (comment)