Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upAvoid double strlen for string operator+ and implement P1165R1 #467
Conversation
This comment has been minimized.
This comment has been minimized.
|
Co-authored by: https://github.com/ArtemSarmini |
…p\regress\objsize test.
This comment has been minimized.
This comment has been minimized.
|
Thanks for catching my number scramble @StephanTLavavej :) |
|
This approach looks correct to me, after resolving these issues. |
| explicit basic_string(_String_constructor_concat_tag, const basic_string& _Alsource, const _Elem* const _Left_ptr, | ||
| const size_type _Left_size, const _Elem* const _Right_ptr, const size_type _Right_size) | ||
| : _Mypair( | ||
| _One_then_variadic_args_t(), _Alty_traits::select_on_container_copy_construction(_Alsource._Getal())) { |
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
We're currently inconsistent, but in new code, I'd like to use {} empty braces for tags like _One_then_variadic_args_t. I'll file an issue about cleaning up our sources; I don't think we have one yet.
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 28, 2020
Author
Member
Do you want me to fix up the tags in this file or just this one? (I'm not a fan of making it inconsistent with all the other ctors over this but if we're at least consistent within a file that'd be OK)
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
I filed #468 for the STL-wide overhaul, so I think I'd prefer to see parens used here for consistency and avoiding making this PR bigger.
| @@ -2432,6 +2438,33 @@ public: | |||
| _Proxy._Release(); | |||
| } | |||
|
|
|||
| explicit basic_string(_String_constructor_concat_tag, const basic_string& _Alsource, const _Elem* const _Left_ptr, | |||
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
The naming _Alsource is a little weird; we generally use _Almeow to refer to different flavors of allocators, but this is a basic_string serving as the source of an allocator. Any other naming would be fine, e.g. _Source_of_alloc or something.
| const size_type _Left_size, const _Elem* const _Right_ptr, const size_type _Right_size) | ||
| : _Mypair( | ||
| _One_then_variadic_args_t(), _Alty_traits::select_on_container_copy_construction(_Alsource._Getal())) { | ||
| _STL_INTERNAL_CHECK(_Left_size < max_size()); |
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
I believe our usual convention is to permit sizes equal to max_size() as suggested by the name. It should be acceptable to concatenate maximum and zero. Note that your third check permits equality.
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 28, 2020
Author
Member
Hmm interesting. This is a bug in the _STL_INTERNAL_CHECK but not in the product and suggests I need to add zero-length appends to the test I added.
| _STL_INTERNAL_CHECK(max_size() - _Left_size >= _Right_size); | ||
| const auto _New_size = _Left_size + _Right_size; | ||
| size_type _New_capacity = _BUF_SIZE - 1; | ||
| _Elem* _Ptr = _Mypair._Myval2._Bx._Buf; |
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
You repeat _Mypair._Myval2 several times; I believe we usually factor this out into _My_data which is zero-overhead.
| _One_then_variadic_args_t(), _Alty_traits::select_on_container_copy_construction(_Alsource._Getal())) { | ||
| _STL_INTERNAL_CHECK(_Left_size < max_size()); | ||
| _STL_INTERNAL_CHECK(_Right_size < max_size()); | ||
| _STL_INTERNAL_CHECK(max_size() - _Left_size >= _Right_size); |
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
Style: I would prefer for all three comparisons to be <=; flipping the sense of the third one is more work to reason through.
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 28, 2020
Author
Member
Right, I didn't try to make them consistent because I was using < for some and <= for others, but as you point out in the other comment, that was incorrect. I'll make them consistent.
| _Xlen_string(); | ||
| } | ||
|
|
||
| return basic_string<_Elem, _Traits, _Alloc>( |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 28, 2020
Author
Member
Actually going to eliminate string_type and apply the de-explicit you suggested in the other comment.
| _Xlen_string(); | ||
| } | ||
|
|
||
| return basic_string<_Elem, _Traits, _Alloc>( |
This comment has been minimized.
This comment has been minimized.
| @@ -2195,6 +2195,12 @@ public: | |||
| template <class _Ty> | |||
| constexpr size_t _Size_after_ebco_v = is_empty_v<_Ty> ? 0 : sizeof(_Ty); // get _Ty's size after being EBCO'd | |||
|
|
|||
| struct _String_constructor_concat_tag {}; | |||
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
Idea: I observe that you've made this implicitly default constructible, while the basic_string ctor is explicit. What if you gave this tag an explicit default constructor (as some tags in the Standard do), forbidding {} from being used to construct the tag, and then made the basic_string ctor implicit? That would allow you to say return { _String_constructor_concat_tag{}, ARGS } below, while not seriously affecting safety in any way. Feel free to disregard though.
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 28, 2020
Author
Member
I made the basic_string ctor explicit because I want to be as paranoid as possible about conversion sequences affecting basic_string's public interface, and I dislike giving these tags ctors because our ABI won't pass such a tag in a register. However, it looks like if we make an inline constexpr variable in release mode that gets eaten so that's OK: https://gcc.godbolt.org/z/jAU-cr
Hmmm and for some reason it looks like in debug mode that's actually an improvement: I find that surprising. https://gcc.godbolt.org/z/0IHz4F Maybe we should start doing this for all tags (including one_then_variadic_args and friends)?
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 28, 2020
Author
Member
(That is to say, my initial reaction was "I don't want to do that because codegen", then I actually checked the codegen, and now my reaction is "I want to do that because codegen" :)
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
Can you add that suggestion (inline constexpr) to #468 and explain your codegen discovery? We didn't have inline constexpr back when we implemented the old tags. (Also, we should check whether C++14 without inline variables tolerates this as well.)
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
CaseyCarter
Jan 28, 2020
Member
However, it looks like if we make an inline constexpr variable in release mode that gets eaten so that's OK
You need to uncheck the ".LX0" and ".text" filters in CE to see the definition of the variable; it's not emitted in the non-USE_INLINE cases since it isn't ODR-used, but is emitted for /DUSE_INLINE. The compiler apparently doesn't realize that the object representation is only padding bits that it need not copy when making a copy of the variable; it copies a byte of uninitialized stack space for non-USE_INLINE, and copies the representation tagInstance for USE_INLINE.
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 29, 2020
Author
Member
Ah, Casey is right. In that case release codegen for creating the temporary (what we already do) is best.
| _Ans += _Left; | ||
| _Ans += _Right; | ||
| return _Ans; | ||
| if (_Left.max_size() - _Left.size() < _Right.size()) { |
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 28, 2020
Member
Debug codegen question for all operators: Should you factor out the calls to each .size() since they are repeated in both the overflow check and the actual construction?
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
I wish I had a better way to put benchmark results here than screenshots as they aren't great for anyone but fully sighted users, consider this a vote for google/benchmark#441 |
This comment has been minimized.
This comment has been minimized.
|
There are currently multiple PRs for libc++ that try to optimize the SSO case of the basic_string contructors through a tail-call optimization e.g. for the copy constructor. I have no idea how the SSO case is currently handeled in the STL machinery, but it might be worth it to consider that kind of optimization when touching all those operators anyway |
This comment has been minimized.
This comment has been minimized.
I think this sounds like a separate consideration and we would want a pattern to fix all the constructors to follow probably. It should be noted that we are different than libc++ in a major respect here; namely, that we can reuse the initialization code between our small and large representations, whereas for the most part they might find that difficult because their layouts are totally different in the different modes. If someone wanted to investigate that area I think they would want to start with the Line 2560 in fd04f77 I do note that for copies we do do the "just copy over the entire string data structure" optimization; see 19-28 here: https://gcc.godbolt.org/z/hP9jo9 |
This comment has been minimized.
This comment has been minimized.
Filed #472 |
…stently use <= and fix <= that was <, extract _My_data in the ctor, add throws annotation to _Proxy construction, memoize size.
…ffer" branch. * Turn on alias check for CONTAINER_DEBUG_LEVEL rather than ITERATOR_DEBUG_LEVEL * Fix off-by-1 error in capacity check when attempting to use left buffer.
| _My_data._Alloc_proxy(_GET_PROXY_ALLOCATOR(_Alty, _Getal())); // throws, hereafter nothrow in this block | ||
| _Take_contents(_Right, bool_constant<_Can_memcpy_val>{}); | ||
| const auto _Ptr = _Unfancy(_My_data._Bx._Ptr); | ||
| _Traits::move(_Ptr + _Left_size, _Ptr, _My_data._Mysize); |
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 31, 2020
Member
Same comment about avoiding a null terminator assignment at the end - you could simply move down the freshly taken null terminator here. This will also improve locality a little, hopefully.
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 31, 2020
Author
Member
Shouldn't improve locality at all; after all we just wrote the character before that null terminator so that cache location is hot.
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 31, 2020
Member
You were performing the following actions: take right's buffer, move down right's data (without the null terminator), copy in left's data to the beginning of the buffer, then go back to the end of the buffer and write the null terminator. If you had scribbled the null terminator immediately after moving down right's data, then I would agree that there is no locality change.
This comment has been minimized.
This comment has been minimized.
* Introduce _Fits_in_left and _Fits_in_right which clarify proof block * Use +1 instead of explicit null termination * Reuse _Left_size and _Right_size more * Go back to IDL for alias check.
| _NODISCARD constexpr bool _Allocators_equal(const _Alloc& _Lhs, const _Alloc& _Rhs) noexcept { | ||
| if | ||
| _CONSTEXPR_IF(allocator_traits<_Alloc>::is_always_equal::value) { | ||
| (void) _Lhs; |
This comment has been minimized.
This comment has been minimized.
barcharcraz
Jan 31, 2020
Contributor
why do we need to cast these to void here? is it just because the compiler will complain about them being unused without realizing that they are used in the other branch of the constexpr if?
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 31, 2020
Member
Yes, this is (Microsoft-internal) VSO-486357, which is so frequently encountered that we don't mark every occurrence as TRANSITION (although we could, I suppose).
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 31, 2020
Author
Member
I would prefer an STL-wide update like tagging all of them be its own change.
This comment has been minimized.
This comment has been minimized.
miscco
Feb 1, 2020
Contributor
I think the whole purpose of if constexpr is that the other branches are not processed by the Compiler. Adding scanning for uses of such variables defeats its purpose
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Feb 1, 2020
Member
They still have to be parsed (it’s not “token soup”) and other compilers avoid warning in this scenario.
| @@ -2195,6 +2195,14 @@ public: | |||
| template <class _Ty> | |||
| constexpr size_t _Size_after_ebco_v = is_empty_v<_Ty> ? 0 : sizeof(_Ty); // get _Ty's size after being EBCO'd | |||
|
|
|||
| struct _String_constructor_concat_tag { | |||
This comment has been minimized.
This comment has been minimized.
barcharcraz
Jan 31, 2020
Contributor
I'd like a comment saying what this tag does.
Something like:
// used in operator+ when we've already computed the size, to avoid another strlen check
(not sure if that's a 100% accurate description of what it does)
| // therefore: (by the distributive property) | ||
| // (!_Fits_in_left && _Fits_in_right) // implying _Right has more capacity | ||
| // || (_Right_capacity > _Left_capacity && _Fits_in_right) // tests that _Right has more capacity | ||
| // therefore: _Right must have more than the minimum capacity, so it must be _Large_string_engaged() |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| _My_data._Alloc_proxy(_GET_PROXY_ALLOCATOR(_Alty, _Getal())); // throws, hereafter nothrow in this block | ||
| _Take_contents(_Right, bool_constant<_Can_memcpy_val>{}); | ||
| const auto _Ptr = _Unfancy(_My_data._Bx._Ptr); | ||
| _Traits::move(_Ptr + _Left_size, _Ptr, _My_data._Mysize + 1); |
This comment has been minimized.
This comment has been minimized.
StephanTLavavej
Jan 31, 2020
Member
Apologies for not noticing earlier - _My_data._Mysize can/should be _Right_size after taking the contents of _Right.
This comment has been minimized.
This comment has been minimized.
BillyONeal
Jan 31, 2020
Author
Member
No need to apologize -- I thought I got all of them but I guess I missed one.
BillyONeal commentedJan 28, 2020
•
edited
This change is a competing proposal to GH-419.
Resolves GH-53.
Resolves GH-456.
This change adds a bespoke constructor to
basic_stringto handle string concat use cases, removing any EH states we previously emitted in our operator+s, avoiding double strlen in our operator+s,The EH states problem comes from our old pattern:
Here, the compiler does not know that the append operation can't throw, because it doesn't understand
basic_stringand doesn't know thereservehas made that always safe. As a result, the compiler emitted EH handing code to callresult's destructor after each of the reserve andoperator+=calls.Using a bespoke concatenating constructor avoids these problems because there is only one throwing operation (in IDL0 mode). As expected, this results in a small performance win in all concats due to avoiding needing to set up EH stuff, and a large performance win for the
const char*concats due to the avoided secondstrlen:Performance:
Times are in NS on a Ryzen Threadripper 3970X, improvements are
((Old/New)-1)*100Code size:
Sizes are in bytes for the
.obj, "Times Original" is New/Old,cl /EHsc /W4 /WX /c /O2 .\code_size.cpp:Checklist
Be sure you've read README.md and understand the scope of this repo.
If you're unsure about a box, leave it unchecked. A maintainer will help you.
_Uglyas perhttps://eel.is/c++draft/lex.name#3.1 or there are no product code changes.
verified by an STL maintainer before automated testing is enabled on GitHub,
leave this unchecked for initial submission).
members, adding virtual functions, changing whether a type is an aggregate
or trivially copyable, etc.).
the C++ Working Draft (including any cited standards), other WG21 papers
(excluding reference implementations outside of proposed standard wording),
and LWG issues as reference material. If they were derived from a project
that's already listed in NOTICE.txt, that's fine, but please mention it.
If they were derived from any other project (including Boost and libc++,
which are not yet listed in NOTICE.txt), you must mention it here,
so we can determine whether the license is compatible and what else needs
to be done.