Performance Tracing

Marcos Orallo edited this page Jul 2, 2023 · 17 revisions

I have tried several options for tracing execution in order to visualize performance. The process has two separate steps, which can be run in different environments.

  1. Collect a trace while running the game. See "How to capture an ETW trace for VR performance analysis".
  2. Analyze the trace in one of the ETL visualization tools.

Tools

Trace capture

XPerf and Windows Performance Recorder

XPerf is a highly configurable command-line tool for launching captures, usually invoked from batch files (for example, log.cmd packaged with GPUView or Oculus ovrlog.cmd).

Windows Performance Recorder (WPR) is a GUI alternative to XPerf that uses XML profiles instead of batch files (for example, log.wprp packaged with GPUView).

ovrlog

https://developer.oculus.com/documentation/native/pc/dg-performance-tutorial/

  • Tool from Oculus based on XPerf.
  • The launching script includes specific provider GUIDs to capture their events. It can also be modified to capture OpenVR/SteamVR events by adding one line, as in the SteamVR instructions for installing GPUView.
  • The launching script needs to be fixed to run in Spanish language versions of Windows, or it will fail to detect the available memory.
  • It includes VR VSync events that can be displayed later in GPUView (see the end of the tutorial linked above).

UIforETW

https://github.com/google/UIforETW

Analysis and visualization

GPUView

https://graphics.stanford.edu/~mdfisher/GPUView.html

https://docs.microsoft.com/en-us/windows-hardware/drivers/display/using-gpuview

  • Microsoft tool to visualize graphics-related events captured with XPerf.
  • The SteamVR install instructions include the specific provider GUID for VR events generated by SteamVR.
  • CLI script (log.cmd) to launch and stop tracing. It must be run in CMD.exe (not PowerShell!).
  • The launching script needs to be fixed to run in Spanish language versions of Windows, or it will fail to detect the available memory.
  • Generates a Merged.etl file that can be opened in the standard Windows Performance Analyzer tool for statistics.
  • The GPUView GUI lets you analyze the trace frame by frame, call by call.
  • The VSync signal (enabled with F8) is the one from the monitor. To display the HMD VSync you need to capture a specific provider and highlight its events (see the "How to analyze a VR trace in GPUView" section).

Windows Performance Analyzer

Mostly for statistical analysis of CPU usage: which functions the program spends the most time in.

https://randomascii.wordpress.com/2012/06/19/wpaxperf-trace-analysis-reimagined/

NVIDIA Nsight Systems

https://docs.nvidia.com/nsight-systems/tracing/

  • You need a (free) NVIDIA developer account to download it
  • GUI-based capture and configuration
  • Proprietary storage format: traces can only be visualized in Nsight Systems. They can be exported to some other formats (SQLite, JSON...), but not to .etl (the standard used by Windows tools).

Intel Graphics Performance Analyzers

See https://github.com/morallo/xwa_ddraw_d3d11/wiki/Debugging-XWA#frame-analysis-with-intel-graphics-performance-analyzer

https://software.intel.com/content/www/us/en/develop/tools/graphics-performance-analyzers.html

https://software.intel.com/content/www/us/en/develop/documentation/gpa-getting-started/top/get-started-with-intel-gpa-for-windows-host.html

Sampling rate

All the tools are based on Event Tracing for Windows (ETW), which registers events with a very low impact on performance. They can also sample the CPU by periodically checking which function is running and collecting the stack trace.

The default sampling rate is 1 kHz, that is, 1 sample every 1 ms.

If more resolution is needed to analyze individual frames (ideally lasting up to 11.1 ms, the frame time at 90 Hz), the rate can be raised to about 8 kHz, but this can impact performance.

  • UIforETW: enable "Fast sampling" in the main window.
  • XPerf: run xperf -setprofint 1221 (or add it to the launch script). The interval is given in units of 100 ns, so 1221 corresponds to about 8 kHz.
  • WPR: run wpr -setprofint 1221 as Administrator before launching WPR.
  • NVIDIA Nsight: configured graphically in the profile project.
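
To get a feel for what these values mean, here is a small sketch that converts a -setprofint value into a sampling rate and into the approximate number of samples that fall within one frame. It assumes the interval is expressed in units of 100 ns (so 10000 would be the 1 ms default) and uses the 11.1 ms frame time mentioned above. The function names are my own and are not part of any tool.

```python
# Assumption: -setprofint takes the sampling interval in units of 100 ns.
HUNDRED_NS_PER_SECOND = 10_000_000
HUNDRED_NS_PER_MS = 10_000


def sampling_rate_hz(profint: int) -> float:
    """Sampling frequency in Hz for a profile interval given in 100 ns units."""
    return HUNDRED_NS_PER_SECOND / profint


def samples_per_frame(profint: int, frame_ms: float = 11.1) -> float:
    """Approximate number of CPU samples taken during one frame of frame_ms."""
    interval_ms = profint / HUNDRED_NS_PER_MS
    return frame_ms / interval_ms


# Compare the default interval with the fast-sampling one.
for profint in (10_000, 1221):
    rate = sampling_rate_hz(profint)
    per_frame = samples_per_frame(profint)
    print(f"profint={profint}: {rate:.0f} Hz, {per_frame:.1f} samples per frame")
```

With the default interval a single frame only gets about 11 samples, which is usually too coarse to attribute time to individual functions. The faster interval gives roughly eight times as many.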

Debugging Symbols

To be able to identify the functions that are being executed, you need to have the PDB file available for the executable modules (EXE, DLL). Since we don't have one for XwingAlliance.exe, we can only see the functions inside ddraw.dll, and optionally the Hook_*.dll files if we have compiled them locally.

It's possible to configure the symbols in WPA and GPUView. I have failed to load them in NVIDIA Nsight Systems, which is why that tool ended up being quite useless in my case.

Symbols from xwingalliance.exe

It should be possible to convert the reverse-engineered IDA database into a PDB and reuse it:

https://github.com/Mixaill/FakePDB

Symbols in WPA

  1. Trace->Configure symbol paths.
    • Add the paths to the build output folders of ddraw.dll and any hooks you may have.
    • (Optional) Add the path to the DirectXTK build folder to see which D3D calls take longest.
  2. Trace->Load Symbols. It will take some time.

Symbols in GPUView

  • Options->Symbol Path...

Symbols in NVIDIA Nsight Systems

  • The symbol paths are set in the capture configuration
  • Sample target process->Collect call stacks of executing threads->Symbol locations
  • Collect thread activity->Collect call stacks of blocked threads->Symbol locations

https://developer.nvidia.com/nvidia-driver-symbol-server

Code instrumentation

Performance markers

It's possible to instrument the code with markers to group different actions in a hierarchy. For D3D11 you can use the ID3DUserDefinedAnnotation interface (BeginEvent, EndEvent and SetMarker).

These markers can later be displayed in some visualization tools.

Custom ETW events

https://docs.microsoft.com/en-us/windows/win32/tracelogging/trace-logging-portal

https://knarkowicz.wordpress.com/2013/05/25/simple-gpuview-custom-event-markers/

ETW providers reference

From ovrlog_win10.cmd:

  • TRACE_OVR_USBVID=091292F9-4F6C-47E1-B483-35D399D45C4C
  • TRACE_OVR_LIB=553787FC-D3D7-4F5E-ACB2-1597C7209B3C
  • TRACE_OVR_UE4_LATELATCHING=B3E9FB28-DD14-477C-8FEC-24FE806D32CF
  • TRACE_NV_DIRECT_MODE=9FC6A966-F8CE-4488-9438-38A247ADEE3C
  • TRACE_AMD_DIRECT_MODE=33AEC352-AA8D-4905-B5AE-DBFF3B5F369D

From Valve's log.cmd for SteamVR (https://pastebin.com/T0nbnPvK):

  • TRACE_VR=8c8f13b1-60eb-4b6a-a433-de86104115ac
  • TRACE_D3D11=db6f6ddb-ac77-4e88-8253-819df9bbf140:0xffffffffffffffff:6:'stack'
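
The TRACE_D3D11 entry above packs several fields into one string. As a rough sketch, assuming the layout is GUID:keywords:level:'stack' with every field after the GUID optional, it can be split like this. The ProviderSpec type and parse_provider function are hypothetical helpers for illustration, not part of XPerf or of the scripts.

```python
import uuid
from typing import NamedTuple, Optional


class ProviderSpec(NamedTuple):
    guid: uuid.UUID
    keywords: Optional[int]  # bitmask of enabled keywords, None if omitted
    level: Optional[int]     # maximum event level, None if omitted
    stack: bool              # True if stack capture was requested


def parse_provider(spec: str) -> ProviderSpec:
    """Split a provider string as it appears in the log.cmd variables.

    Assumed layout: GUID[:keywords[:level[:'stack']]]
    """
    parts = spec.split(":")
    guid = uuid.UUID(parts[0])
    # Base 0 lets int() accept both "0x..." hex and plain decimal values.
    keywords = int(parts[1], 0) if len(parts) > 1 and parts[1] else None
    level = int(parts[2], 0) if len(parts) > 2 and parts[2] else None
    stack = len(parts) > 3 and parts[3].strip("'") == "stack"
    return ProviderSpec(guid, keywords, level, stack)


d3d11 = parse_provider(
    "db6f6ddb-ac77-4e88-8253-819df9bbf140:0xffffffffffffffff:6:'stack'"
)
print(d3d11.guid, hex(d3d11.keywords), d3d11.level, d3d11.stack)
```

Read this way, the D3D11 provider is captured with all keyword bits set, at level 6, with stacks. The TRACE_VR provider is given as a bare GUID.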

From my own PC (logman query providers):

  • NVFT-ETW-OPENVR {B37F4CA5-5507-42CF-B8C7-BABE280601D2}

From Mattifestation's big list of ETW providers:

  • Microsoft.Windows.Holographic.MixedRealityMode {60d6e217-d25b-504f-83d5-c2deb6a854e5}

From PresentMon:

  • Microsoft.Windows.Analog.SpectrumContinuous {356e1338-04ad-420e-8b8a-a2eb678541cf}

Other references

https://ikrima.dev/ue4guide/performance-optimization/gpu-perf-optimization/gpuview/

https://gist.github.com/pixelmager/7e5bb79f106287d7d353e14a7d0cfa47