Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

track type of native allocations when available #857

Merged
merged 3 commits into from Mar 20, 2019

Conversation

@kayle
Copy link

@kayle kayle commented Jan 17, 2019

This PR demonstrates how to retrieve the type of native allocations as described in #780

This would be useful functionality to have in the TraceEvent nuget package to allow for scenarios like dotnet/BenchmarkDotNet#457

There's a few questions in the PR on where certain functionality belongs.

src/PerfView/PerfViewData.cs Outdated Show resolved Hide resolved
if (allocationSites.TryGetValue(module, out var heapAllocationSites) &&
heapAllocationSites.FirstOrDefault(x => x.Rva == cs.CodeAddress.Address - module.ImageBase)?.TypeName is string name)
{
typeName = name;
Copy link
Author

@kayle kayle Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this currently resolves most allocations from STL containers as type "void" because the std::allocator implementation uses void* as the return type. I think the Visual Studio implementation of the feature may special case this to extract the T, but I wanted to keep this simple for a first pass.

src/PerfView/PerfViewData.cs Outdated Show resolved Hide resolved
src/PerfView/PerfViewData.cs Outdated Show resolved Hide resolved
src/PerfView/PerfViewData.cs Outdated Show resolved Hide resolved
src/PerfView/PerfViewData.cs Outdated Show resolved Hide resolved
src/PerfView/PerfViewData.cs Outdated Show resolved Hide resolved
src/PerfView/PerfViewData.cs Outdated Show resolved Hide resolved
src/PerfView/PerfViewData.cs Outdated Show resolved Hide resolved
Copy link
Contributor

@vancem vancem left a comment

I left some proposed changes in the line-by-line comments

@vancem
Copy link
Contributor

@vancem vancem commented Jan 22, 2019

Thanks for following up to make this PR. I like the general ideal, and commented on some of the details.

@codecov-io
Copy link

@codecov-io codecov-io commented Jan 23, 2019

Codecov Report

Merging #857 into master will increase coverage by 0.13%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #857      +/-   ##
==========================================
+ Coverage   17.38%   17.51%   +0.13%     
==========================================
  Files         222      225       +3     
  Lines      137987   138989    +1002     
  Branches    12219    12316      +97     
==========================================
+ Hits        23984    24350     +366     
- Misses     113065   113696     +631     
- Partials      938      943       +5
Flag Coverage Δ
#2017 17.51% <0%> (+0.13%) ⬆️
#Debug 17.51% <0%> (+0.13%) ⬆️
Impacted Files Coverage Δ
src/PerfView/PerfViewData.cs 3.85% <0%> (-0.03%) ⬇️
src/TraceEvent/Symbols/NativeSymbolModule.cs 0% <0%> (ø) ⬆️
src/PerfView/StackViewer/StackWindow.xaml.cs 24.8% <0%> (-0.04%) ⬇️
src/MemoryGraph/graph.cs 0% <0%> (ø) ⬆️
src/PerfView/GuiUtilities/WindowBase.cs 83.33% <0%> (ø) ⬆️
src/PerfView/UserCommands.cs 0% <0%> (ø) ⬆️
src/MemoryGraph/MemoryGraph.cs 0% <0%> (ø) ⬆️
src/FastSerialization/StreamReaderWriter.cs 75.19% <0%> (ø) ⬆️
src/FastSerialization/GrowableArray.cs 59.74% <0%> (ø) ⬆️
...c/TraceEvent/Stacks/SpeedScopeStackSourceWriter.cs 60.73% <0%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eca7c00...61bb21b. Read the comment docs.

@@ -5713,6 +5713,10 @@ protected internal override StackSource OpenStackSourceImpl(string streamName, T
var heapParser = new HeapTraceProviderTraceEventParser(eventSource);
Dictionary<Address, StackSourceSample> lastHeapAllocs = null;

Dictionary<TraceModuleFile, NativeSymbolModule> loadedModules = new Dictionary<TraceModuleFile, NativeSymbolModule>();
var allocationTypeNames = new Dictionary<CallStackIndex, string>();
var symReader = GetSymbolReader(log);
Copy link
Author

@kayle kayle Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the appropriate way to get a SymbolReader? The CacheOnly argument can skip pdbs, which results in not finding any typenames for allocations from those modules. Even if I later load symbols for a module, I don't see a way to refresh the entire stackSource.

if (!loadedModules.TryGetValue(module, out var symbolModule))
{
loadedModules[module] = symbolModule =
(module.PdbSignature != Guid.Empty
Copy link
Author

@kayle kayle Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected there to be a direct call to get a SymbolModule from a TraceModuleFile, but I didn't find one. This is my best guess based on some other areas of the code.

private IDiaDataSource3 m_source;
private IDiaEnumSymbolsByAddr m_symbolsByAddr;
internal readonly IDiaSession m_session;
private readonly SymbolReader m_reader;
Copy link
Author

@kayle kayle Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few style questions for this area:

  1. Is it good to switch to constructors to allow for readonly fields?
  2. Any reason for the bool/data field combination instead of just using Lazy
  3. Should the m_heapAllocationSites initializer be embedded in constructor or extracted to static method?

@kayle
Copy link
Author

@kayle kayle commented Jan 23, 2019

Thanks for the feedback. I left a few more questions as comments in the updated files.

{
if (!allocationTypeNames.TryGetValue(csi, out var typeName))
{
for (var current = csi; current != CallStackIndex.Invalid; current = eventLog.CallStacks.Caller(current))
Copy link
Contributor

@vancem vancem Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reasonable bound we can put on number of frames of the stack crawl? (e.g. are the number of frames below the alloc allways less than say 10?) That would be nice as it cuts down search by potentially a large number (stacks are often 50 or more, and we are searching every unique stack).

Copy link
Author

@kayle kayle Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but unfortunately I think 10 is a bit too small. From my limited testing, 15 frames was the longest chain I saw, so I'm thinking of starting with a limit of 20.

Name
wchar_t

  • wow64!Wow64SystemServiceEx
  • wow64cpu!ServiceNoTurbo
  • wow64cpu!BTCpuSimulate
  • wow64!RunCpuSimulation
    + wow64!Wow64LdrpInitialize
    • ntdll!LdrpInitializeProcess
    • ntdll!??_LdrpInitialize
    • ntdll!LdrpInitialize
    • ntdll!LdrInitializeThunk
      + ntdll!NtTraceEvent
      • ntdll!RtlpLogHeapAllocateEvent
      • ntdll!??RtlpAllocateHeapInternal
      • ntdll!RtlAllocateHeap
      • combase!utInitModuleNameCache

Copy link
Contributor

@vancem vancem Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

20 is fine, indeed 25 is fine (that still avoids 1/2 the cost for almost nothing).

@@ -5713,6 +5713,10 @@ protected internal override StackSource OpenStackSourceImpl(string streamName, T
var heapParser = new HeapTraceProviderTraceEventParser(eventSource);
Dictionary<Address, StackSourceSample> lastHeapAllocs = null;

Dictionary<TraceModuleFile, NativeSymbolModule> loadedModules = new Dictionary<TraceModuleFile, NativeSymbolModule>();
Copy link
Contributor

@vancem vancem Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't require this, but you can turn loadedModules into an array of NativeSymbolModule. Every TraceModuleFile has a Index property which is a small integer whose bound is given by TraceModuleFiles.Count. Thus you can turn the dictionary into an array.

Copy link
Author

@kayle kayle Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit awkward to keep track of initialization with an array. Would you use something like Optional[] or (NativeSymbolModule, bool)[]?

@vancem
Copy link
Contributor

@vancem vancem commented Jan 23, 2019

Only a few minor things (mentioned in line comments) left.

@adamsitnik
Copy link
Member

@adamsitnik adamsitnik commented Mar 19, 2019

@vancem what is the status of this PR? I can see that @kayle has pushed some code cleanup commit after your review.

@vancem
Copy link
Contributor

@vancem vancem commented Mar 19, 2019

@adamsitnik - I will take a look later today or tomorrow.

@vancem
Copy link
Contributor

@vancem vancem commented Mar 20, 2019

I looked this over. It still has the big non-pay-for-play issue (basically it will try to download every PDB in the trace (often 100s since it includes every OS dll from every process). Thus the window will not open for typically 10s of seconds (maybe 100s of seconds).

But I took some time to implement the work-around that I suggested in the PR. Thus I will check this in and immediately follow up with my PR to bring back pay-for-play.

@vancem vancem merged commit b5f7fd9 into microsoft:master Mar 20, 2019
2 checks passed
@vancem
Copy link
Contributor

@vancem vancem commented Mar 20, 2019

A big thank you to @kayle for implementing this. It is very nice to have the C++ types available and it was pretty non-trivial to track down the new information in the PDBS that enabled this and then learn enough about PerfView to incorporate it. Thanks!

vancem added a commit to vancem/perfview that referenced this issue Mar 20, 2019
This improves microsoft#857.
In particulare it does not look up every PDB agressively.
@kayle
Copy link
Author

@kayle kayle commented Mar 21, 2019

Thanks @vancem for getting this in and adding the fix to allow refreshing after loading symbols.

@kayle kayle deleted the native-allocation-type-tracking branch Mar 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants