@nikita-pletnev

Description

Make the AMF encoder use all VCNs (the HW encoding units in AMD GPUs) present on the GPU in use; currently only one VCN is used. Reduce the number of D3D devices created by sharing a single device across all AMF encode sessions.

Motivation and Context

When the obs-multi-rtmp plugin is used to run multiple streams at once, this change significantly improves performance, allowing more streams to run simultaneously.

How Has This Been Tested?

Ran OBS on Windows 10 with an AMD Radeon RX 6800 and ran multiple streams with the obs-multi-rtmp plugin.

Types of changes

  • Performance enhancement (non-breaking change which improves efficiency)

Checklist:

  • My code has been run through clang-format.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

VCNs are HW units that can run encode sessions in parallel, boosting
performance when several encoders run simultaneously. Added a controller
to balance load across VCNs. Made the AMF context and the D3D11 device
and context shared between all encoder instances.

Using a KeyedMutex to synchronize the copy from the OBS core texture to
the texture allocated and used by the encoder submitted many HW fences
to all HW queues. Parallel encode queues then waited for jobs on other
queues to finish, which significantly degraded performance.
To work around this, the encoder's texture is created as shared, then
opened on the device used by OBS core, where the copy is done with
synchronization on the CPU.
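
The load-balancing controller mentioned in the commit message could look roughly like the sketch below. It is not the code from this PR: the class name `VcnBalancer` and its methods are hypothetical, and it assumes the only balancing signal is the number of active encode sessions per VCN, with ties going to the lowest index.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical controller (not the PR's implementation): counts the encode
// sessions bound to each VCN instance and hands out the least-loaded one.
class VcnBalancer {
public:
    explicit VcnBalancer(std::size_t vcn_count) : sessions_(vcn_count, 0)
    {
        assert(vcn_count > 0);
    }

    // Pick the VCN with the fewest active sessions; the lowest index wins ties.
    std::size_t acquire()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t best = 0;
        for (std::size_t i = 1; i < sessions_.size(); ++i) {
            if (sessions_[i] < sessions_[best])
                best = i;
        }
        ++sessions_[best];
        return best;
    }

    // Give the slot back when an encoder instance is destroyed.
    void release(std::size_t vcn)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(vcn < sessions_.size() && sessions_[vcn] > 0);
        --sessions_[vcn];
    }

    // Number of sessions currently assigned to the given VCN.
    std::size_t active(std::size_t vcn) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return sessions_.at(vcn);
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::size_t> sessions_;
};
```

An encoder would call `acquire()` once at creation, pass the returned index to whatever mechanism selects the VCN instance, and call `release()` on teardown. The mutex is there because encoder instances may be created from different threads.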
@nikita-pletnev
Author

Addressed the comments above, squashed the commits, and rebased on the latest master.

@WizardCM added the Enhancement (Improvement to existing functionality) label on Feb 2, 2025
@derrod
Member

derrod commented Mar 11, 2025

This really should be two separate commits, if not PRs entirely. Because there are two independent changes:

  1. Move to shared DX context
  2. VCN engine election

I do also have to wonder if this is even necessary anymore. At least on my machine, with a 7700 XT and the latest driver, I can observe that multiple FFmpeg sessions are automatically balanced among VCN units without manual intervention.

Additionally, the number of sessions per VCN should be global, not per-codec, as all codecs share throughput on AMD cards. Ideally it should also track the configured pixels per second to balance more smartly across instances. Though of course the driver/SDK should ideally do this automatically.

@nikita-pletnev
Author

It will be separated into two PRs.

Auto-balancing only works if hardware-accelerated GPU scheduling (HAGS) is enabled. That is not always the case, and in-app balancing gives the app more control. In the future, the app can choose a VCN instance based on more parameters, such as projected VCN load derived from resolution, framerate, etc.
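
As an illustration of load-based selection, the sketch below projects each session's load as pixels per second and keeps a single global total per VCN, shared by all codecs. None of this is in the PR: `SessionConfig`, `pixel_rate` and `assign_session` are hypothetical names, and treating pixel rate as the only load metric is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one encode session (field names are illustrative).
struct SessionConfig {
    uint32_t width;
    uint32_t height;
    uint32_t fps_num; // framerate numerator
    uint32_t fps_den; // framerate denominator, assumed non-zero
};

// Projected load of a session in pixels per second.
inline uint64_t pixel_rate(const SessionConfig &cfg)
{
    return static_cast<uint64_t>(cfg.width) * cfg.height * cfg.fps_num / cfg.fps_den;
}

// Assign a session to the VCN with the lowest projected load and record it.
// `load` holds one global entry per VCN (shared by all codecs) and must be
// non-empty; the lowest index wins ties.
inline std::size_t assign_session(std::vector<uint64_t> &load, const SessionConfig &cfg)
{
    auto it = std::min_element(load.begin(), load.end());
    *it += pixel_rate(cfg);
    return static_cast<std::size_t>(it - load.begin());
}
```

With two VCNs, a 3840x2160 at 60 fps session followed by two 1920x1080 at 60 fps sessions would put the large session on one VCN and both smaller ones on the other, which a plain session count would not do. A real implementation would also need to subtract the load when a session ends and guard the vector against concurrent access.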

@Wallboy

Wallboy commented Feb 1, 2026

Any update on getting this merged?

Unfortunately, HAGS does not seem to be available on Win 10 with AMD cards.
