High memory usage caused by hardware rendering #16983

Closed
wdscxsj opened this issue Apr 1, 2024 · 10 comments
Labels
Area-Performance: Performance-related issue
Issue-Bug: It either shouldn't be doing this or needs an investigation.
Needs-Tag-Fix: Doesn't match tag requirements
Product-Terminal: The new Windows Terminal.
Milestone

Comments


wdscxsj commented Apr 1, 2024

Windows Terminal version

1.19.10821.0

Windows build number

10.0.22631.3296

Other Software

No response

Steps to reproduce

  1. Turn off "Use software rendering", and close Terminal.
  2. Launch Terminal, with only one tab open. The memory usage of WindowsTerminal.exe is 185 MB.
  3. Increase to 10 tabs. The memory usage increases roughly linearly, with about 90 MB added for each tab. The end result is 1002 MB.
  4. Turn on "Use software rendering" and try again, the results are 98 MB and 124 MB.
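As a rough check on the figures in the steps above, the per-tab cost can be worked out from the one-tab and ten-tab readings. This is only arithmetic on the reported numbers; the helper name `per_tab_cost_mb` is made up for illustration.

```python
def per_tab_cost_mb(one_tab_mb, ten_tabs_mb, extra_tabs=9):
    """Average memory added by each tab beyond the first, in MB."""
    return (ten_tabs_mb - one_tab_mb) / extra_tabs


# Readings reported in the steps above.
hardware = per_tab_cost_mb(185, 1002)
software = per_tab_cost_mb(98, 124)

print(f"hardware rendering: {hardware:.1f} MB per extra tab")
print(f"software rendering: {software:.1f} MB per extra tab")
```

The hardware figure comes out close to the "about 90 MB" per tab quoted in step 3, while software rendering adds only a few MB per tab.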

Expected Behavior

The hardware rendering option should use much less memory.

Actual Behavior

As described in the steps above. I suspect this issue may be hardware related. It happens on a new ThinkBook 16 Gen 6+ laptop, with an Intel Core Ultra 7 CPU and Intel Arc graphics card. The driver is up to date (v31.0.101.5008). It's not affected by which shell is opened.

On the same machine, no other programs (e.g. Chrome and VSCode) have this issue. It doesn't happen on other machines I've tested, with exactly the same Windows Terminal configuration.

In VMMap, it can be observed that the Private Data takes 9/10 of the committed memory. It contains multiple regions of 65,536 KB marked as Thread Environment Block.

@wdscxsj wdscxsj added Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting labels Apr 1, 2024

github-actions bot commented Apr 1, 2024

Hi I'm an AI powered bot that finds similar issues based off the issue title.

Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it. Thank you!

Closed similar issues:

Note: You can give me feedback by thumbs upping or thumbs downing this comment.

Member

lhecker commented Apr 1, 2024

Can you please install our nightly ("canary") release? You can find it here: https://aka.ms/terminal-canary-installer

Afterwards please take the following steps:

  • Open 10 tabs as you did before
  • Open the settings tab (Ctrl+,) and its "Rendering" page, then change the Graphics API to Direct3D 11:
    image
    Click save and wait >10s.
  • Then enable the WARP setting
    image
    Click save again and wait another >10s. How did the memory usage change?

If the memory usage drops after the last step and only after the last step, we can already be extremely certain that it's due to your graphics driver.

However, we can debug it further if you'd like. There are two ways to do so:

  • Send us a full memory dump!
    You can do that in Task Manager: https://github.com/microsoft/terminal/wiki/Troubleshooting-Tips#capture-with-task-manager
    I'm not 100% sure whether Task Manager does a full memory dump, but I think it does. Afterwards, you can share it via email with us. You can find my email address in my GitHub profile. The dump file will be very large so you'll have to use some kind of file hoster.
  • Correlate the Thread Environment Block (TEB) with the thread!
    To do so open the TEB section and find one of the IDs:
    image
    Then use your favorite application to inspect threads. Since you're using VMMap, you may also be familiar with Process Explorer. In Process Explorer double click WindowsTerminal.exe, navigate to the Threads tab and find your ID in the TID column:
    image
    Then tell us the Start Address of that TID. A screenshot would be preferable. 🙂

@lhecker lhecker added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Apr 1, 2024
Author

wdscxsj commented Apr 2, 2024

Thanks for your detailed response! I've tried again with Terminal Canary, and the result is roughly the same as before. With 10 tabs open:

  • Graphics API = Automatic: 1012 MB
  • Graphics API = Direct3D 11 (after 10s): 1028 MB
  • Use software rendering (WARP) (after 10s): 165 MB

I also suspect it's due to the graphics driver. This Intel Arc graphics card is not yet recognized by the latest GPU-Z...

A full memory dump would be around 100 GB. So I ran VMMap as admin, and this is a screenshot of the Total memory:

image

None of the Private Data regions shows a thread ID (otherwise I would have noticed that yesterday). After 1 hour of waiting and some activity in Terminal Canary, a refreshed view shows that each of the 65,536 KB Private Data regions still has 1 Read/Write.

The top ASLR Image is igc64.dll (65.7 MB of file size) from the graphics driver, the Intel Graphics Shader Compiler for Intel(R) Graphics Accelerator. It's also loaded by dwm, IGCC, Chrome, VSCode, etc.

I guess it's better to stay with WARP until an updated driver brings good luck, right?

@microsoft-github-policy-service microsoft-github-policy-service bot added Needs-Attention The core contributors need to come back around and look at this ASAP. and removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something labels Apr 2, 2024
Member

lhecker commented Apr 2, 2024

A full memory dump would be around 100 GB.

I only meant a dump of WindowsTerminal.exe. It should only be ~1028MB as you noted. However, I sort of realized that this is not needed anyways. There are better ways to investigate the issue...

If you have Windows Performance Recorder (WPR) installed, you can

  • select the checkbox for "VirtualAlloc usage"
  • unselect all other checkboxes
  • click "Start"
  • launch Windows Terminal
  • click "Save"

It can then be debugged in the Windows Performance Analyzer (WPA). You can find the latter in the app store, which I believe should also install the former. In any case, this will net us something like this:
image

It would tell us exactly where it's coming from. I probably don't have access to the symbols for Intel's drivers, but I know people who do, so I could send it to them. If you want to do such a WPR trace, I'd be happy to check it out!

None of the Private Data regions shows a thread ID (otherwise I would have noticed that yesterday). After 1 hour of waiting and some activity in Terminal Canary, a refreshed view shows that each of the 65,536 KB Private Data regions still has 1 Read/Write.

Any allocation via VirtualAlloc is labeled with "Thread Environment Block" for whatever reason. Only those with an ID next to them are actual TEBs and refer to stack memory. Since your allocations don't have an ID, they must be VirtualAlloc calls with a 64MiB size. If I had to take a guess, I suspect that Intel's driver is using arena/linear allocators and forgot that you're supposed to MEM_RESERVE the address space and only then gradually MEM_COMMIT it. 😅
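To illustrate the reserve-then-commit pattern mentioned above, here is a minimal bookkeeping model in Python. It makes no Win32 calls: the class names, the 64 KiB commit step and the 1 MiB-per-tab workload are assumptions for illustration, and nothing here is confirmed to match what Intel's driver does. In real code the two steps would be `VirtualAlloc` with `MEM_RESERVE`, followed by `VirtualAlloc` with `MEM_COMMIT` on the pages actually needed.

```python
MIB = 1024 * 1024
ARENA_SIZE = 64 * MIB      # region size observed in VMMap
COMMIT_STEP = 64 * 1024    # assumed commit granularity for this model


class EagerArena:
    """Linear allocator that commits its whole region up front."""

    def __init__(self, size=ARENA_SIZE):
        self.reserved = size
        self.committed = size  # models MEM_RESERVE | MEM_COMMIT in one call
        self.used = 0

    def alloc(self, nbytes):
        if self.used + nbytes > self.reserved:
            raise MemoryError("arena exhausted")
        offset = self.used
        self.used += nbytes
        return offset


class LazyArena(EagerArena):
    """Linear allocator that reserves address space and commits on demand."""

    def __init__(self, size=ARENA_SIZE, step=COMMIT_STEP):
        super().__init__(size)
        self.committed = 0     # models MEM_RESERVE only
        self.step = step

    def alloc(self, nbytes):
        offset = super().alloc(nbytes)
        # Round the high-water mark up to the next commit step.
        needed = -(-self.used // self.step) * self.step
        if needed > self.committed:
            self.committed = needed  # models a later MEM_COMMIT
        return offset


def committed_for_tabs(arena_cls, tabs=10, bytes_per_tab=1 * MIB):
    """Total committed bytes when each tab owns one arena (hypothetical workload)."""
    total = 0
    for _ in range(tabs):
        arena = arena_cls()
        arena.alloc(bytes_per_tab)
        total += arena.committed
    return total


eager_mib = committed_for_tabs(EagerArena) // MIB
lazy_mib = committed_for_tabs(LazyArena) // MIB
print(f"eager: {eager_mib} MiB committed, lazy: {lazy_mib} MiB committed")
```

In this model both allocators reserve the same address space, but only the eager one is charged the full 64 MiB per arena against committed memory, which is the kind of growth VMMap showed.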

However, it's very suspicious that only we're affected and no one else. One thing you could try instead of using WARP is to set the "Graphics API" to "Direct2D" (and with WARP disabled).

Author

wdscxsj commented Apr 3, 2024

Thank you very much! I've learned a lot again. The download link for a WPR trace file with a full memory dump has been sent to your email.

Member

lhecker commented Apr 3, 2024

To explain how WPA is commonly used... When you open it, it'll look something like this:
image

Each tab can contain an arbitrary number of panes. When you click on the graph types on the left, new panes will be added to the current tab. As such, I usually first close all tabs and then open the graph that I want. In this case we want the "Virtual Allocations" graph which is in the "Others" section on the left. This will list all the processes that were recorded:

image

In the table view, everything to the left of the vertical yellow line is a column that groups data, and everything to the right of the yellow line is an aggregate. It's a little bit like working with a database. Basically, everything to the left "maps" / "groups" and everything to the right "reduces" / "sums".
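The group-and-sum idea can be sketched in a few lines of Python. The rows, stack labels and the `aggregate` helper below are hypothetical and only mimic the shape of the table; they are not WPA's data model or API.

```python
from collections import defaultdict

# Hypothetical rows: (process, stack, size in MB).
rows = [
    ("WindowsTerminal.exe", "stack A", 64),
    ("WindowsTerminal.exe", "stack A", 64),
    ("WindowsTerminal.exe", "stack B", 2),
    ("other.exe", "stack C", 8),
]

COLUMNS = {"process": 0, "stack": 1}


def aggregate(rows, group_by):
    """Group rows by the named columns and sum the size column."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[COLUMNS[name]] for name in group_by)
        totals[key] += row[2]
    return dict(totals)


# Columns "left of the yellow line" are the grouping keys;
# the summed size is the aggregate "right of the line".
by_process = aggregate(rows, ["process"])
by_stack = aggregate(rows, ["process", "stack"])
print(by_process)
print(by_stack)
```

Adding the "stack" column to the grouping keys splits each process total into one total per stack, which is what moving a column to the left of the yellow line does in the table.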

Columns in WPA are special however: They can have complex rules and configurations to customize everything to your liking. If you're interested in this, click on the wheel icon at the top of the pane (next to the red "3" marker).

Here you can do a couple things, which I've marked with the red numbers:

  1. By right clicking on the process(es) that you actually want, you get to choose "Filter to Selection", which will remove all the noise.
  2. By right clicking on the headers of the table you can choose which columns to see. Here you can choose "Stack". The "Stack" column is configured by default to be to the left of the yellow line. That way you get a cumulative amount of allocations per stack trace.
  3. The graph button allows you to switch to a "Flame" (-graph) which is IMO the most useful view. If you aren't familiar with flame graphs: Each bar is a function call and it represents your call stack. The width of each bar represents the percentage it makes up compared to the total. For instance if your app allocates 1GB of memory and 1 function allocates 100MB, then its bar will be exactly 10% wide.
  4. Here you can change the view to hide the table and only show the graph. There's a "maximize" button which you can click to resize the pane to fit the tab size.
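The bar widths described in item 3 can be reproduced with a small sketch. The call stacks and sizes here are invented for illustration, and `flame_widths` is a hypothetical helper, not something WPA exposes.

```python
from collections import defaultdict

# Hypothetical samples: (call stack from outermost to innermost, MB allocated).
samples = [
    (("main", "render", "upload"), 100),
    (("main", "render", "draw"), 300),
    (("main", "parse"), 600),
]


def flame_widths(samples):
    """Return each frame's share of the total, counting everything below it."""
    inclusive = defaultdict(int)
    total = 0
    for stack, size in samples:
        total += size
        # Every prefix of the stack is a bar that contains this allocation.
        for depth in range(1, len(stack) + 1):
            inclusive[stack[:depth]] += size
    return {frames: size / total for frames, size in inclusive.items()}


widths = flame_widths(samples)
for frames, width in sorted(widths.items()):
    print(f"{' > '.join(frames):<26} {width:.0%}")
```

With 1000 MB in total, the `upload` frame that allocated 100 MB gets a bar 10% wide, and its parent `render` covers the combined share of both of its children.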

To get function names you have to load symbols. Unfortunately, even if you use a "Filter" it'll load symbols for all applications by default. This takes a long time. So what you can do is add a filter for symbol loading:

image

At the end it'll look something like this:
image

Member

lhecker commented Apr 3, 2024

For some reason my WinDbg can't search any heaps anymore, but I need that because otherwise I can't find the addresses of the AtlasEngine instances in the memory. So, I'll have to unfortunately respond later when it comes to the dump.

However, given the stack trace in the WPR I believe it's likely that the driver allocates a 64MiB ring buffer for uploading Direct3D resources that have D3D11_CPU_ACCESS_WRITE.

In any case, I believe this may be another indication why we need #15186 much more urgently than it may seem.

Author

wdscxsj commented Apr 3, 2024

Thanks a lot for your help and detailed explanation. Now I have my stack view and flame graph, too!

The laptop is using the latest graphics driver from the OEM (Lenovo), but there is a newer version from Intel released on March 27. After a public holiday leave of 3 days, I can try my luck again with an update.

Author

wdscxsj commented Apr 7, 2024

I'm glad to report that this issue is solved by the latest Intel graphics driver (31.0.101.5382), updated from the OEM-provided driver 31.0.101.5008. It works in both the latest release (1.21.921.0) of Terminal and the Canary.

On the same laptop, Canary with 10 tabs uses about 380 MB (Automatic or Direct3D 11). Software rendering uses a much lower 100 MB, but it's quite acceptable. Roughly the same numbers for the latest release with the AtlasEngine.

Frankly I didn't expect the new driver to work so well, since its release notes don't mention a word about this issue. There are multiple releases in between, so one of them must have introduced the fix.

Here is how a 10-tab Canary process now looks in VMMap. The 64 MB regions of 1 Read/Write are gone, replaced by roughly 32 MB for each tab with some activities.

Screenshot338

Huge thanks to @lhecker again. Your help and guidance are truly invaluable!

@carlos-zamora
Member

Thank you so much for following up! We'll close this and keep it around for anybody who asks this question. 😊

@carlos-zamora carlos-zamora closed this as not planned Won't fix, can't repro, duplicate, stale Apr 10, 2024
@carlos-zamora carlos-zamora added this to the Backlog milestone Apr 10, 2024
@carlos-zamora carlos-zamora added Area-Performance Performance-related issue Product-Terminal The new Windows Terminal. and removed Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Attention The core contributors need to come back around and look at this ASAP. labels Apr 10, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Tag-Fix Doesn't match tag requirements label Apr 10, 2024

3 participants