Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TraceLog streaming/in-memory EventPipe support #1867

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

vaind
Copy link

@vaind vaind commented May 4, 2023

I'm trying to make changes to enable TraceLog use without creating an ETLX file from an EventPipe (nettrace). I've taken the existing ETW streaming support as a baseline and tried to adapt that. However, I'm having trouble getting the newly added tests to pass because the events coming from TraceLog don't match what I'd get from EventPipeEventSource (the previous test case, Streaming in the same file).

@brianrob maybe if you could have a look at my changes, you could spot an issue what I'm doing wrong (or not doing and I should be) to get this working properly? Or is there some bigger problem I'm missing completely that prevents this from happening?

The use-case I'm going after is a continuously running sampling profiler that only samples as needed (because actually starting the profiler takes a lot of time in the CLR so I want to do it only once).

This is a follow up, though not strictly an implementation of, #1829

@vaind
Copy link
Author

vaind commented May 8, 2023

@microsoft-github-policy-service agree

@brianrob
Copy link
Member

brianrob commented May 8, 2023

@vaind, thanks for working on this. I think at a high level this should work. A couple of thoughts:

  1. It would be worthwhile to architect this similiar to how we do with TraceEventSession for live sessions. This allows the TraceLog to know that it is being used in a live mode instead of a file-based mode. There are some differences in how one can use the TraceLog when its live, because it hasn't seen the "end" of the trace, unlike in the file-based mode.
  2. Can you share what differences you're seeing? It's possible that it's just some piece of data that hasn't made it over to the TraceLog.

@vaind
Copy link
Author

vaind commented May 9, 2023

@vaind, thanks for working on this. I think at a high level this should work. A couple of thoughts:

Thanks for getting back to me, it indeed seems to work from my testing in the Sentry profiling SDK, with some quirks so far (I didn't get stack trace method resolution working yet, but from my understanding it should work because all the info has already come from the runtime). Also the ActivityComputer (and others) don't seem to work when streaming at the moment so that would also need some tuning I guess. I was able to get around them for now but I think they're useful so I'll likely get back to them in the future.

  1. It would be worthwhile to architect this similiar to how we do with TraceEventSession for live sessions. This allows the TraceLog to know that it is being used in a live mode instead of a file-based mode. There are some differences in how one can use the TraceLog when its live, because it hasn't seen the "end" of the trace, unlike in the file-based mode.

Yes, I've made the change by reusing the same code, really, with a minor change of setting up the kernelSession in the TraceEventSession factory instead of the common constructor.

  1. Can you share what differences you're seeing? It's possible that it's just some piece of data that hasn't made it over to the TraceLog.

e.g the following line is missing from the captured output but is present in the eventpipe-dotnetcore2.1-linux-x64-objver3.netperf.baseline.txt:

System.Threading.Tasks.TplEventSource/SetActivityId, 1, <Event MSec=   "111.4517" PID="80749" PName="Process(80749)" TID="80749" EventName="SetActivityId" ProviderName="System.Threading.Tasks.TplEventSource" NewId="10000000-0000-0000-0000-000000000001"/>

Also, all the custom events are missing, as present in eventpipe-dotnetcore2.1-linux-x64-tracelogging.netperf.baseline.txt

I'm pretty sure it's a test-code issue, because as I've mentioned earlier, the overall approach seems to work in general, based on my changes to the profiling code in the Sentry SDK which now starts the profiler early in the process and slices profiles when actually needed (drops the other samples if not currently needed). I think this functionality will need better tests here, instead of trying to reuse what was already present in EventPipeParsing/Streaming.

@vaind
Copy link
Author

vaind commented May 11, 2023

So the issue I'm having with stacktrace info not being available is due to missing modules & method info because the rundown on a session only happens when a session is stopped, while I'm trying to access this info while the sample-profiler is running (on each sample). I've raised an inquiry for this issue: dotnet/runtime#86103

I was wondering whether I could start & stop a session, in the beginning, to force the rundown to happen and then use these events to feed TraceLog. Afterwards, the runtime information would rely on the runtime provider to get updates. @brianrob do you think this could work, conceptually or do you see any issues that would need to be overcome (or prevent this completely)?

@vaind vaind force-pushed the feat/eventpipe-tracelog-streaming branch from 3140653 to 031ce2d Compare May 19, 2023 17:57
@vaind
Copy link
Author

vaind commented May 19, 2023

I was wondering whether I could start & stop a session, in the beginning, to force the rundown to happen and then use these events to feed TraceLog. Afterwards, the runtime information would rely on the runtime provider to get updates. @brianrob do you think this could work, conceptually or do you see any issues that would need to be overcome (or prevent this completely)?

So I've verified this works fine, I'll just need to add test cases to this repo so that it's covered here.

@vaind
Copy link
Author

vaind commented May 24, 2023

@brianrob Any chance you could have a first look at TraceLog.cs changes if they seem reasonable? If so, I'd update tests to cover the changes & mark this PR as Ready for review

@brianrob
Copy link
Member

@vaind, apologies for not getting to this sooner. I am reviewing your code.

@brianrob
Copy link
Member

@vaind, the concept here looks good. One thing I think we will need to resolve before this is ready for review is the rundown design. Your instinct to trigger a rundown at the beginning of the trace is the right one. I was going to recommend that you use the StartRundown keyword to do this, but it doesn't look like this is plumbed through the runtime, so it wouldn't have any impact. If I recall correctly, EventPipe has a relatively hardcoded rundown setup, in which case we may not have many options here.

@davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.

@vaind, once this is resolved, then I think it will make sense to review this more. I saw some diffs that I want to examine further, but I'm not on a great network connection today and so I don't have access to a quality diff tool.

@vaind
Copy link
Author

vaind commented May 25, 2023

@vaind, the concept here looks good. One thing I think we will need to resolve before this is ready for review is the rundown design. Your instinct to trigger a rundown at the beginning of the trace is the right one. I was going to recommend that you use the StartRundown keyword to do this, but it doesn't look like this is plumbed through the runtime, so it wouldn't have any impact. If I recall correctly, EventPipe has a relatively hardcoded rundown setup, in which case we may not have many options here.

Thanks for getting back to me @brianrob. And yes, there doesn't seem a way to trigger a rundown on the same session. I've discussed that in the runtime issue, I've created two weeks ago: dotnet/runtime#86103

@vaind vaind force-pushed the feat/eventpipe-tracelog-streaming branch from 8a3ac13 to 0aeb827 Compare May 30, 2023 20:19
@davmason
Copy link
Member

@davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.

Rundown is not currently configurable with EventPipe. You are remembering correctly, it is hardcoded to run at the end of the session. The internal code has the option to configure whether to do rundown or not, but we don't expose it over the IPC commands so customers can't turn it off or on. It would be fairly easy to expose the option to have rundown or not, and I don't think it would be that much work to have it at the start of a trace. I think we could reuse all the existing code

@brianrob
Copy link
Member

@davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.

Rundown is not currently configurable with EventPipe. You are remembering correctly, it is hardcoded to run at the end of the session. The internal code has the option to configure whether to do rundown or not, but we don't expose it over the IPC commands so customers can't turn it off or on. It would be fairly easy to expose the option to have rundown or not, and I don't think it would be that much work to have it at the start of a trace. I think we could reuse all the existing code

Thanks @davmason. Sounds like we should keep dotnet/runtime#86103 open to track making it possible to configure rundown via IPC, but for now, hacking things a bit here to use EndRundown sounds fine to me.

@vaind, once you're ready, go ahead and mark this as ready for review and I'll take a more thorough pass on it.

@vaind vaind marked this pull request as ready for review May 31, 2023 15:20
@vaind
Copy link
Author

vaind commented May 31, 2023

@vaind, once you're ready, go ahead and mark this as ready for review and I'll take a more thorough pass on it.

I've done so. It does need tests but I want to make sure the approach and APIs are OK-ish before trying to devise tests for them.

@vaind
Copy link
Author

vaind commented May 31, 2023

P.S. I've tested it as a consumer in Sentry .NET SDK. What I meant by needing tests is that it needs tests in this repo.

Copy link
Member

@brianrob brianrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience. Here's a first cut at the review. I think we may have a few iterations, but overall, this is looking good.

src/TraceEvent/TraceEvent.csproj Outdated Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Outdated Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Outdated Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Outdated Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Outdated Show resolved Hide resolved
Copy link
Author

@vaind vaind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made some changes as requested and replied to some questions.

Keeping the testing open until the questions are resolved.

src/TraceEvent/TraceEvent.csproj Outdated Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Outdated Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Outdated Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Outdated Show resolved Hide resolved
@vaind
Copy link
Author

vaind commented Jul 6, 2023

I've made some changes as requested and replied to some questions.

Keeping the testing open until the questions are resolved.

@brianrob Have you been able to finish reviewing this? Any more concerns or do you think it would be acceptable after adding tests for the changes/new APIs? Also, do you have any preferences for the testing approach?

@bruno-garcia
Copy link

@brianrob this would really help unblock some work we want to do for .NET here at Sentry. Another round of review much appreciated

@vaind vaind force-pushed the feat/eventpipe-tracelog-streaming branch from 4433321 to 53024ea Compare August 9, 2023 16:41
@vaind
Copy link
Author

vaind commented Aug 9, 2023

FYI, I am in the middle of reviewing this, but need to stop right now. Things are looking good though - I need to look at the test changes and the CI failures.

I've tried a couple of things to get this to work but am always hitting a roadblock.
Compilation is fixed by switching to .NET 6.0 in the UseDotNet task - it still seems to compile for net462 just fine. Not sure what the task does exactly as I don't have experience with Azure pipelines.
However, tests then start to fail (test runner crashes). I've figured out, this is caused by WPF tests in StackWindowTests.cs. Since I can't find a clean solution without drawbacks, I think it will need some input from you which way you'd want to go from here.

@brianrob
Copy link
Member

FYI I am doing some experiments to see if I can help unblock the tests: #1894.

@brianrob
Copy link
Member

In my testing over in #1894 I discovered that running each of the test assemblies separately works. It seems that the combination of all of the test runs is an issue. I'm wondering if the mix of .NET Framework and .NET Core is the issue - I don't have a clear picture of exactly how that's happening on the test side. I wonder if we need to find a way to do an explicit .NET Core test run and an explicit .NET Framework test run, and choose the specific assemblies that should run for each. This would allow us to avoid targeting the WPF tests at .NET Core, and would allow us to make sure that we're not attempting to run .NET Framework code against .NET Core and vice versa. Thoughts?

@vaind
Copy link
Author

vaind commented Aug 11, 2023

In my testing over in #1894 I discovered that running each of the test assemblies separately works. It seems that the combination of all of the test runs is an issue. I'm wondering if the mix of .NET Framework and .NET Core is the issue - I don't have a clear picture of exactly how that's happening on the test side. I wonder if we need to find a way to do an explicit .NET Core test run and an explicit .NET Framework test run, and choose the specific assemblies that should run for each. This would allow us to avoid targeting the WPF tests at .NET Core, and would allow us to make sure that we're not attempting to run .NET Framework code against .NET Core and vice versa. Thoughts?

In that case, how about merging this PR first and then there's not so much duplication going on: #1896

@bruno-garcia
Copy link

@brianrob @vaind wondering if there's any update on this? Thanks

@ericsampson
Copy link

bump?

@vaind
Copy link
Author

vaind commented Mar 20, 2024

Hi @brianrob, I've updated this PR with the latest changes from the main branch and all the CI jobs pass. Is there anything I should do to so that these changes can eventually land?

Copy link
Member

@brianrob brianrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaind thank you for your patience on this and for all of the work that you've done thus far. I think we're getting close. I've added a few comments here, but I don't think any are big.

I see the following as the path to merge:

  1. @noahfalk, can I get you to give this a review please? This is for livesession support for EventPipe.
  2. I would like to validate that we don't regress live session support for ETW traces. Have you done any testing in this space? I know we don't have tests here in the repo, but presumably, we can do some adhoc testing, similar to https://github.com/brianrob/examples/tree/main/src/realtime-session-with-stacks.

Assuming we can resolve this, I think we'll be ready to merge (or at least know what we need to do).

src/PerfView.Tests/StackViewer/StackWindowTests.cs Outdated Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Show resolved Hide resolved
src/TraceEvent/TraceLog.cs Outdated Show resolved Hide resolved
noahfalk
noahfalk previously approved these changes Apr 2, 2024
Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looked fine to me, but I warn my knowledge of the TraceLog portion of TraceEvent is limited so I wouldn't read too much into it. If there were anything subtly wrong I don't expect I would catch it without spending a while researching the internals of TraceLog.

@brianrob
Copy link
Member

brianrob commented Apr 3, 2024

This looked fine to me, but I warn my knowledge of the TraceLog portion of TraceEvent is limited so I wouldn't read too much into it. If there were anything subtly wrong I don't expect I would catch it without spending a while researching the internals of TraceLog.

Thanks @noahfalk. No worries - I just wanted to get another set of eyes on this, and also to make you aware of the functionality.

@bruno-garcia
Copy link

wooo thanks @noahfalk !

Getting there

@vaind vaind force-pushed the feat/eventpipe-tracelog-streaming branch from 418dd93 to 5153769 Compare May 23, 2024 10:47
@vaind
Copy link
Author

vaind commented May 23, 2024

@brianrob thanks for the review, somehow I've missed it until now.

I've addressed your requests and updated the branch. Let me know if there's anything else needed.

@vaind
Copy link
Author

vaind commented May 23, 2024

P. S. besides review changes, there was one more fix 36d2e2c (#1867) - these structures would grow indefinitely otherwise (getsentry/sentry-dotnet#3375). Similar is done for the other realtime session (when queue is used), but in that scenario it keeps a buffer of events - here, we don't need it because events are processed one by one as they come so after each event, all the callbacks have already been called.

@bruno-garcia
Copy link

This PR had its 1 year anniversary 😅

It's got all the review points addressed. Could we please get some help pushing this through 🙏

@ericsampson
Copy link

@bruno-garcia too bad that Elon tanked Twitter 😝

Copy link
Member

@brianrob brianrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. One oustanding issue and then I think we're about ready to merge.

@@ -51,6 +51,7 @@
</PropertyGroup>

<ItemGroup>
<PackageReference Include="Microsoft.Diagnostics.NETCore.Client" Version="$(MicrosoftDiagnosticsNETCoreClientVersion)" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also require a corresponding addition to <NuspecProperties> above, as well as an entry in the dependencies section of the nuspec file. This is what ensures that NuGet restores the dependency.

@brianrob
Copy link
Member

This PR had its 1 year anniversary 😅

It's got all the review points addressed. Could we please get some help pushing this through 🙏

Yup! Happy Anniversary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants