Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce cost of resetting row attributes #15497

Merged
merged 3 commits into from Jun 15, 2023

Conversation

lhecker
Copy link
Member

@lhecker lhecker commented Jun 2, 2023

Performance of printing enwik8.txt at the following block sizes:
4KiB (printf): 54MB/s -> 54MB/s
128KiB (cat): 101MB/s -> 104MB/s

Validation Steps Performed

This change is easily verifiable via review.

@lhecker lhecker added Product-Conhost For issues in the Console codebase Area-Performance Performance-related issue labels Jun 2, 2023
Comment on lines +625 to +627
// This is a very unsafe shortcut to free the buffer and get a direct
// hold to the _buffer. The caller can then fill it with `size` items.
[[nodiscard]] T* unsafe_shrink_to_size(size_t size) noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why this is unsafe and why it is okay for us to do it anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an excellent question. I would love to have it in a comment.

My best guess is, "it returns a raw pointer to the internal storage of the small_vector"... which literally lets us poke at its innards.

Leonard, quick question: why is it faster for us to do this (poke at the innards) than to have an operator= or a .resize_to_one_and_reset() or something on rle?

Does this introduce a primitive that is used in future PRs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading from uninitialized memory with automatic storage duration (= allocated on the stack) is technically undefined behavior, even for trivial structs and arrays. Trivial structs are basically structs that consist entirely of primitive types like char or int. The pointer this function returns points into the small-vector's internal buffer that contains exactly such uninitialized memory and so the caller has to technically make sure to not read from it, or otherwise the CPU might trap / throw an exception, etc. (This doesn't really apply to any modern CPU architecture.)

More importantly however, since the array is not initialized, the returned pointer cannot be used for non-trivial structs. For instance, imagine I have this code:

// A vector with space for 4 items on the stack, without the need for heap allocations.
til::small_vector<std::string, 4> vec;
// Make room for 1 item on the stack.
const auto ptr = vec.unsafe_shrink_to_size(1);
// Initialize that first item in the vector.
*ptr = std::string{ "foobar" };

This would fail in terrible ways, because the backing buffer on the stack is uninitialized. Its memory may contain any potential garbage, leftover data. When you then assign a std::string to it, the string's operator=() will try to free any potential memory the target (*ptr) may have had, in order to assign the new string contents. When it then reads the uninitialized/garbage values, it'll think that the target / *ptr had something allocated and try to free it. But that's wrong. It didn't have anything.

For cases like this the right choice is usually to use std::uninitialized_default_construct_n, which would initialize the memory and ensure that the returned pointer always points to valid data, in case of non-trivial structs. And even better, for trivial structs uninitialized_default_construct_n will leave the memory uninitialized, which is great for performance. (If this behavior is surprising, you can compare it with how in C/C++ you need to always write char c = 0; and int x = 123;, because int x; by itself will leave x with an undefined value. If this behavior is unwanted, the alternative is std::uninitialized_value_construct_n.)

But there's a problem... TextAttributes is not a trivial struct. Any struct that has a constructor - or even worse, destructor - cannot be trivial anymore. After all, if it has a constructor it surely wants to be initialized to some value. For instance, this struct:

struct Foo {
    int x = 0;
};

is not trivial anymore, because the = 0 forces the addition of an implicit constructor, which initializes the value of x during construction to 0. Now you can safely write Foo foo; in some function and rest assured that reading foo.x will always return some initialized value. But while this is great for the general case, "97%" of all code basically, this is not great for the "3%" case, the code that's the hot path. Because then this added initialization does have a noticeable impact, if the compiler fails to optimize it away. (And in this case, it does.)

Copy link
Member Author

@lhecker lhecker Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leonard, quick question: why is it faster for us to do this (poke at the innards) than to have an operator= or a .resize_to_one_and_reset() or something on rle?

resize_to_one_and_reset would only work for one specific use case: Writing one value. A operator= would have the same limitation unless we have it accept a std::span or similar, but a std::span is not a trivial struct and thus not cost free (especially not on the Windows x64 ABI and especially-especially not with MSVC's still poor ability to remove struct parameter copies after inlining a call). The unsafe_shrink_to_size function on the other hand works with arbitrary size parameter values and can be fully inlined into the caller / optimized away.

Basically, IMO we should either keep code simple or fast. So, if an optimization can't be simple, it better be fast. I'm basically min-maxing. 😄

_attr = { _columnCount, attr };
// Constructing and then moving objects into place isn't free.
// Modifying the existing object is _much_ faster.
*_attr.runs().unsafe_shrink_to_size(1) = til::rle_pair{ attr, _columnCount };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clever

Comment on lines +625 to +627
// This is a very unsafe shortcut to free the buffer and get a direct
// hold to the _buffer. The caller can then fill it with `size` items.
[[nodiscard]] T* unsafe_shrink_to_size(size_t size) noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an excellent question. I would love to have it in a comment.

My best guess is, "it returns a raw pointer to the internal storage of the small_vector"... which literally lets us poke at its innards.

Leonard, quick question: why is it faster for us to do this (poke at the innards) than to have an operator= or a .resize_to_one_and_reset() or something on rle?

Does this introduce a primitive that is used in future PRs?

@carlos-zamora carlos-zamora added the Needs-Second It's a PR that needs another sign-off label Jun 5, 2023
@microsoft-github-policy-service microsoft-github-policy-service bot removed the Needs-Second It's a PR that needs another sign-off label Jun 14, 2023
@DHowett
Copy link
Member

DHowett commented Jun 14, 2023

@lhecker if you want to update the numbers before merging, I'm cool with. otherwise, leave em! :)

@lhecker
Copy link
Member Author

lhecker commented Jun 15, 2023

@carlos-zamora Could you give this another ✅?
Edit: Seems like it wasn't necessary? Weird.

@lhecker lhecker added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jun 15, 2023
@microsoft-github-policy-service microsoft-github-policy-service bot merged commit b8f402f into main Jun 15, 2023
14 of 15 checks passed
@microsoft-github-policy-service microsoft-github-policy-service bot deleted the dev/lhecker/vt-perf1 branch June 15, 2023 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Performance Performance-related issue AutoMerge Marked for automatic merge by the bot when requirements are met Product-Conhost For issues in the Console codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants