Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Transient Allocations during Bulk Text Output #8617

Merged
7 commits merged into from Jan 5, 2021
Merged

Conversation

miniksa
Copy link
Member

@miniksa miniksa commented Dec 18, 2020

Make a few changes to memory usage throughout the application to reduce transient allocations from the big.txt test from ~213,000 to ~53,000.

PR Checklist

Detailed Description of the Pull Request / Additional comments

Transient allocations are those that are new'd, used, then delete'd. Going back and forth to the system allocator for things we're just going to throw away or use rapidly again is a performance detriment. Not only is it a bunch of time to go ask the system with a syscall, it also hits a whole bunch of locks on the allocators. This PR identifies a few places where we were accidentally allocating and didn't mean to or were allocating and freeing just to turn around and allocate again. I chose other strategies to avoid this.

Validation Steps Performed

  • Ran big.txt sample (~6MB file) before and after. Observed heap allocations with WPR.

…ace that it is a counted string. (Could also just make sure both strings fit within small string optimization so no allocs... but this is better.)
…n there are no links found. Update caller to handle appropriately.
@miniksa
Copy link
Member Author

miniksa commented Dec 18, 2020

Before:
image

After:
image

(Note that the ConsoleIoThread is pretty much gone from Transient allocations. The Render thread has bulked up in how many it has done... probably just a matter of circumstance changing the timing behavior between it and the IO thread.)

src/server/ApiMessage.h Outdated Show resolved Hide resolved
@miniksa miniksa added Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Area-Performance Performance-related issue Area-Server Down in the muck of API call servicing, interprocess communication, eventing, etc. Product-Conhost For issues in the Console codebase Product-Conpty For console issues specifically related to conpty Issue-Bug It either shouldn't be doing this or needs an investigation. labels Dec 19, 2020
src/buffer/out/AttrRow.cpp Outdated Show resolved Hide resolved
Co-authored-by: Carlos Zamora <carlos.zamora@microsoft.com>
@miniksa miniksa added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jan 5, 2021
@ghost
Copy link

ghost commented Jan 5, 2021

Hello @miniksa!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit f087d03 into main Jan 5, 2021
@ghost ghost deleted the dev/miniksa/zipzoom branch January 5, 2021 18:06
@ghost
Copy link

ghost commented Jan 28, 2021

🎉Windows Terminal Preview v1.6.10272.0 has been released which incorporates this pull request.:tada:

Handy links:

mpela81 pushed a commit to mpela81/terminal that referenced this pull request Jan 28, 2021
Make a few changes to memory usage throughout the application to reduce transient allocations from the `big.txt` test from ~213,000 to ~53,000.

## PR Checklist
* [x] Supports microsoft#3075
* [x] I work here.
* [x] Tested manually and WPR'd. Test suite should still pass.
* [x] Am core contributor

## Detailed Description of the Pull Request / Additional comments

Transient allocations are those that are new'd, used, then delete'd. Going back and forth to the system allocator for things we're just going to throw away or use rapidly again is a performance detriment. Not only is it a bunch of time to go ask the system with a syscall, it also hits a whole bunch of locks on the allocators. This PR identifies a few places where we were accidentally allocating and didn't mean to or were allocating and freeing just to turn around and allocate again. I chose other strategies to avoid this.

## Validation Steps Performed
- Ran `big.txt` sample (~6MB file) before and after. Observed heap allocations with WPR.
ghost pushed a commit that referenced this pull request Feb 16, 2021
…nd bitmap runs and utf8 conversions (#8621)

## References
* See also #8617 

## PR Checklist
* [x] Supports #3075
* [x] I work here.
* [x] Manual test.

## Detailed Description of the Pull Request / Additional comments

### Window Title Generation
Every time the renderer checks the title, it's doing two bad things that
I've fixed:
1. It's assembling the prefix to the full title doing a concatenation.
   No one ever gets just the prefix ever after it is set besides the
   concat. So instead of storing prefix and the title, I store the
   assembled prefix + title and the bare title.
2. A copy must be made because it was returning `std::wstring` instead
   of `std::wstring&`. Now it returns the ref.

### Dirty Area Return
Every time the renderer checks the dirty area, which is sometimes
multiple times per pass (regular text printing, again for selection,
etc.), a vector is created off the heap to return the rectangles. The
consumers only ever iterate this data. Now we return a span over a
rectangle or rectangles that the engine must store itself.
1. For some renderers, it's always a constant 1 element. They update
   that 1 element when dirty is queried and return it in the span with a
   span size of 1.
2. For other renderers with more complex behavior, they're already
   holding a cached vector of rectangles. Now it's effectively giving
   out the ref to those in the span for iteration.

### Bitmap Runs
The `til::bitmap` used a `std::optional<std::vector<til::rectangle>>`
inside itself to cache its runs and would clear the optional when the
runs became invalidated. Unfortunately doing `.reset()` to clear the
optional will destroy the underlying vector and have it release its
memory. We know it's about to get reallocated again, so we're just going
to make it a `std::pmr::vector` and give it a memory pool. 

The alternative solution here was to use a `bool` and
`std::vector<til::rectangle>` and just flag when the vector was invalid,
but that was honestly more code changes and I love excuses to try out
PMR now.

Also, instead of returning the ref to the vector... I'm just returning a
span now. Everyone just iterates it anyway, may as well not share the
implementation detail.

### UTF-8 conversions
When testing with Terminal and looking at the `conhost.exe`'s PTY
renderer, it spends a TON of allocation time on converting all the
UTF-16 stuff inside to UTF-8 before it sends it out the PTY. This was
because `ConvertToA` was allocating a string inside itself and returning
it just to have it freed after printing and looping back around again...
as a PTY does.

The change here is to use `til::u16u8` that accepts a buffer out
parameter so the caller can just hold onto it.

## Validation Steps Performed
- [x] `big.txt` in conhost.exe (GDI renderer)
- [x] `big.txt` in Terminal (DX, PTY renderer)
- [x] Ensure WDDM and BGFX build under Razzle with this change.
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Area-Performance Performance-related issue Area-Server Down in the muck of API call servicing, interprocess communication, eventing, etc. AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase Product-Conpty For console issues specifically related to conpty
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants