Performance Tracing
I have tried several options for tracing execution in order to visualize performance. The process has two separate steps, which can be run in different environments:
- Collect a trace while running the game. See How to capture an ETW trace for VR performance analysis
- Analyze in one of the ETL visualization tools
XPerf is a highly configurable command line tool to launch captures, usually invoked with batch files (for example, log.cmd packaged with GPUView or Oculus ovrlog.cmd).
WPR is a GUI version of Xperf, using XML profiles instead of batch files (for example, log.wprp packaged with GPUView).
https://developer.oculus.com/documentation/native/pc/dg-performance-tutorial/
- Tool from oculus based on XPerf
- The launching script includes specific provider GUIDs to capture their events. It can also be modified to capture OpenVR/SteamVR events by adding one line, as in the SteamVR instructions for installing GPUView.
- The launching script needs to be fixed to run in Spanish language versions of Windows, or it will fail to detect the available memory.
- It includes VR vsync events that can be displayed later in GPUView (see the end of the tutorial linked above).
https://github.com/google/UIforETW
- Open-source GUI for XPerf that simplifies capturing traces. It has some advantages over WPR, but it's not adapted to get GPUView captures.
https://graphics.stanford.edu/~mdfisher/GPUView.html
https://docs.microsoft.com/en-us/windows-hardware/drivers/display/using-gpuview
- Microsoft tool to visualize graphics-related events captured with XPerf.
- The SteamVR install instructions include the specific provider GUID for VR events generated by SteamVR.
- CLI script (log.cmd) to launch and stop tracing. Must be run in CMD.exe (not PowerShell!)
- The launching script needs to be fixed to run in Spanish language versions of Windows, or it will fail to detect the available memory.
- Generates a Merged.etl file that can also be opened in the standard Windows Performance Analyzer tool for statistics.
- GPUView GUI tool to analyze the trace frame by frame, call by call.
- The VSync signal (enabled with F8) is the one from the monitor. To display the HMD VSync you need to capture a specific provider and highlight its events (see the "How to analyze a VR trace in GPUView" section).
Mostly for statistical analysis of CPU usage: what functions the program spends most time on.
https://randomascii.wordpress.com/2012/06/19/wpaxperf-trace-analysis-reimagined/
https://docs.nvidia.com/nsight-systems/tracing/
- You need a (free) NVIDIA developer account to download it
- GUI based capture and configuration
- Proprietary storage format that can only be visualized with Nsight Systems. It can be exported to some other formats (SQLite, JSON...), but not to .etl (the standard used by Windows tools).
https://software.intel.com/content/www/us/en/develop/tools/graphics-performance-analyzers.html
All the tools are based on Event Tracing for Windows, which registers events with a very low impact on performance. They can also sample the stack by periodically checking which function is running and collecting the stack trace.
The default sampling rate is 1 kHz, that is, one sample every 1 ms.
If more resolution is needed to analyze individual frames (ideally lasting up to 11.1 ms), the rate can be raised to about 8 kHz, but this can impact performance.
- UIforETW: enable "Fast sampling" in the main window.
- XPerf: run
Xperf -setprofint 1221
(or add it to the launch script).
- WPR: run
wpr -setprofint 1221
as Administrator before launching WPR.
- NVIDIA Nsight: configured graphically in the profile project.
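To make the numbers above concrete, here is a minimal Python sketch that converts a -setprofint value into a sampling rate and into the number of samples that land in one frame. It assumes the value is expressed in units of 100 ns, which is consistent with the 1 kHz default (10000) and the roughly 8 kHz maximum (1221) mentioned above. The function names are hypothetical helpers, not part of any tool.

```python
# Assumption: -setprofint takes the sampling interval in units of 100 ns.
UNITS_PER_SECOND = 10_000_000  # number of 100 ns units in one second
UNITS_PER_MS = 10_000          # number of 100 ns units in one millisecond


def sampling_rate_hz(profint: int) -> float:
    """Sampling frequency in Hz for a given profiling interval."""
    return UNITS_PER_SECOND / profint


def samples_per_frame(profint: int, frame_ms: float = 11.1) -> float:
    """Average number of stack samples taken during one frame."""
    interval_ms = profint / UNITS_PER_MS
    return frame_ms / interval_ms


if __name__ == "__main__":
    # 10000 is the default interval, 1221 is the value used above.
    for profint in (10_000, 1221):
        rate = sampling_rate_hz(profint)
        per_frame = samples_per_frame(profint)
        print(f"profint={profint}: {rate:.0f} Hz, {per_frame:.1f} samples per 11.1 ms frame")
```

With the default interval an 11.1 ms frame gets only about 11 samples, which is too coarse to attribute time to individual functions. With 1221 it gets about 91.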
To identify the functions that are being executed, you need the PDB files for the executable modules (EXE, DLL). Since we don't have one for XwingAlliance.exe, we can only see the functions running inside ddraw.dll, and optionally the Hook_*.dll files if we have compiled them locally.
It's possible to configure the symbols in WPA and GPUView. I failed to load them in NVIDIA Nsight Systems, which is why that tool ended up being of little use in my case.
It should be possible to reuse the IDA reverse-engineered database by converting it into a PDB:
https://github.com/Mixaill/FakePDB
- Trace->Configure symbol paths.
  - Add the paths to the target compilation of ddraw.dll and any hooks you may have.
  - (Optional) Add the path to DirectXTK build folder to see what D3D calls take longer.
- Trace->Load Symbols. It will take some time.
Options->Symbol Path...
- The symbol paths are set in the capture configuration
Sample target process->Collect call stacks of executing threads->Symbol locations
Collect thread activity->Collect call stacks of blocked threads->Symbol locations
https://developer.nvidia.com/nvidia-driver-symbol-server
It's possible to instrument the code with markers to group different actions in a hierarchy. For D3D11 you can use the ID3DUserDefinedAnnotation interface (BeginEvent, EndEvent and SetMarker).
These markers can later be used in some visualization tools:
- https://docs.microsoft.com/en-us/visualstudio/profiling/gpu-usage?view=vs-2019
- https://docs.nvidia.com/nsight-graphics/2018.4/content/nsight_graphics/performance_markers_d3d11.htm
https://docs.microsoft.com/en-us/windows/win32/tracelogging/trace-logging-portal
https://knarkowicz.wordpress.com/2013/05/25/simple-gpuview-custom-event-markers/
From ovrlog_win10.cmd:
- TRACE_OVR_USBVID=091292F9-4F6C-47E1-B483-35D399D45C4C
- TRACE_OVR_LIB=553787FC-D3D7-4F5E-ACB2-1597C7209B3C
- TRACE_OVR_UE4_LATELATCHING=B3E9FB28-DD14-477C-8FEC-24FE806D32CF
- TRACE_NV_DIRECT_MODE=9FC6A966-F8CE-4488-9438-38A247ADEE3C
- TRACE_AMD_DIRECT_MODE=33AEC352-AA8D-4905-B5AE-DBFF3B5F369D
From Valve's log.cmd for SteamVR (https://pastebin.com/T0nbnPvK):
- TRACE_VR=8c8f13b1-60eb-4b6a-a433-de86104115ac
- TRACE_D3D11=db6f6ddb-ac77-4e88-8253-819df9bbf140:0xffffffffffffffff:6:'stack'
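The TRACE_D3D11 entry carries more than a GUID. As far as I understand the xperf syntax, a user-mode provider is written as GUID[:Flags[:Level[:'stack']]]. The following Python sketch splits such a string into its fields under that assumption. ProviderSpec and parse_provider are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class ProviderSpec(NamedTuple):
    guid: str
    keywords: Optional[int]  # flags/keywords bitmask, None if omitted
    level: Optional[int]     # verbosity level, None if omitted
    stack: bool              # True if stack capture was requested


def parse_provider(spec: str) -> ProviderSpec:
    """Split a provider string, assuming GUID[:Flags[:Level[:'stack']]]."""
    parts = spec.split(":")
    guid = parts[0]
    # Flags are treated as hexadecimal, with or without the 0x prefix.
    keywords = int(parts[1], 16) if len(parts) > 1 and parts[1] else None
    # Base 0 accepts both decimal ("6") and prefixed hex ("0x6").
    level = int(parts[2], 0) if len(parts) > 2 and parts[2] else None
    stack = len(parts) > 3 and parts[3].strip("'\"").lower() == "stack"
    return ProviderSpec(guid, keywords, level, stack)


if __name__ == "__main__":
    d3d11 = "db6f6ddb-ac77-4e88-8253-819df9bbf140:0xffffffffffffffff:6:'stack'"
    print(parse_provider(d3d11))
    print(parse_provider("8c8f13b1-60eb-4b6a-a433-de86104115ac"))
```

Read this way, TRACE_D3D11 enables every keyword bit, requests level 6, and asks for a stack trace with each event, while TRACE_VR uses the defaults.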
From my own PC (logman query providers):
- NVFT-ETW-OPENVR {B37F4CA5-5507-42CF-B8C7-BABE280601D2}
From Mattifestation's big list of ETW providers:
- Microsoft.Windows.Holographic.MixedRealityMode {60d6e217-d25b-504f-83d5-c2deb6a854e5}
From PresentMon:
- Microsoft.Windows.Analog.SpectrumContinuous {356e1338-04ad-420e-8b8a-a2eb678541cf}
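The GUIDs above come in two notations (with and without braces, upper and lower case), while the launch scripts chain providers with "+" in the -on argument. As a rough sketch, the Python helper below normalizes a selection of them and joins them into one string that could be pasted into a launch script. build_on_argument is a hypothetical name, and the choice of providers is only an example.

```python
import uuid

# A few of the providers listed above, kept in their original notation.
PROVIDERS = {
    "TRACE_OVR_LIB": "553787FC-D3D7-4F5E-ACB2-1597C7209B3C",
    "TRACE_VR": "8c8f13b1-60eb-4b6a-a433-de86104115ac",
    "NVFT-ETW-OPENVR": "{B37F4CA5-5507-42CF-B8C7-BABE280601D2}",
}


def build_on_argument(guids) -> str:
    """Normalize GUIDs (lower case, no braces) and join them with '+'."""
    # uuid.UUID accepts braces and either case, and raises ValueError
    # on a malformed GUID, so typos are caught before tracing starts.
    return "+".join(str(uuid.UUID(g)) for g in guids)


if __name__ == "__main__":
    print(build_on_argument(PROVIDERS.values()))
```

Validating with uuid.UUID is a design choice for this sketch. A mistyped GUID would otherwise silently produce a trace with no events from that provider.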
https://ikrima.dev/ue4guide/performance-optimization/gpu-perf-optimization/gpuview/
https://gist.github.com/pixelmager/7e5bb79f106287d7d353e14a7d0cfa47