
[UR][L0v2] Add graph support for batched queue #21324

Open
KFilipek wants to merge 15 commits into intel:sycl from KFilipek:07-graph_for_batched

Conversation

@KFilipek
Contributor

This PR adds support for graph capture and execution in the Level Zero v2 batched queue implementation.

Changes:

  • Add command list determination mechanism that switches between immediate and regular command lists based on graph
    capture state
  • Implement previously unsupported graph API methods:
    • queueBeginGraphCaptureExp() - begin graph capture
    • queueBeginCaptureIntoGraphExp() - begin capture into an existing graph
    • queueEndGraphCaptureExp() - end graph capture
    • queueIsGraphCaptureEnabledExp() - check capture status
    • enqueueGraphExp() - execute captured graph
  • Update operations to use appropriate command list and event pool during graph capture
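The command list determination mechanism described above can be sketched as a minimal standalone model. The names `BatchedQueueModel` and `ListKind` are illustrative, not the adapter's actual types:

```cpp
#include <cassert>

// Hypothetical model of the determination mechanism: the graph-capture
// flag decides whether new commands go to an immediate or a regular
// (batched) command list.
enum class ListKind { Regular, Immediate };

class BatchedQueueModel {
public:
  void setGraphCapture(bool active) { captureActive = active; }
  bool isGraphCaptureActive() const { return captureActive; }

  // During capture, hand out the immediate list; otherwise batch as usual.
  ListKind getListManager() const {
    return captureActive ? ListKind::Immediate : ListKind::Regular;
  }

private:
  bool captureActive = false;
};
```

Begin/end capture then reduce to flipping the flag, and every enqueue path picks its list through the single `getListManager()` query.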

@KFilipek KFilipek requested a review from a team as a code owner February 19, 2026 13:07
@KFilipek KFilipek marked this pull request as draft February 19, 2026 13:07
@KFilipek KFilipek self-assigned this Feb 19, 2026
@KFilipek KFilipek force-pushed the 07-graph_for_batched branch from 560edb4 to 7c9779b on February 19, 2026 13:12
@KFilipek KFilipek force-pushed the 07-graph_for_batched branch from 09eaf7d to 2200aa5 on February 20, 2026 10:43
@KFilipek KFilipek force-pushed the 07-graph_for_batched branch from 2200aa5 to 3f6298f on February 20, 2026 10:53
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.hpp
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.hpp
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.hpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.hpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.hpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp
Comment on lines +569 to +574
ur_bool_t graph_supported = false;
ASSERT_SUCCESS(urDeviceGetInfo(
device, UR_DEVICE_INFO_GRAPH_RECORD_AND_REPLAY_SUPPORT_EXP,
sizeof(graph_supported), &graph_supported, nullptr));

if (!graph_supported) {
Contributor

I think this part could be moved to a helper method.

Contributor Author

Done.
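Such a helper might look like the following sketch; `deviceSupportsGraphCapture` is a hypothetical name, and `urDeviceGetInfo` is stubbed with minimal stand-in types so the example compiles standalone (the real test calls the actual UR entry point):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Minimal stand-ins for the UR types used by the snippet above.
using ur_bool_t = bool;
enum ur_device_info_t { UR_DEVICE_INFO_GRAPH_RECORD_AND_REPLAY_SUPPORT_EXP };
struct ur_device_handle_t_ { bool graphSupport; };
using ur_device_handle_t = ur_device_handle_t_ *;

// Stub standing in for the real urDeviceGetInfo entry point.
int urDeviceGetInfo(ur_device_handle_t device, ur_device_info_t /*prop*/,
                    size_t size, void *value, size_t * /*retSize*/) {
  std::memcpy(value, &device->graphSupport, size);
  return 0;
}

// The helper the review asked for: one place to query graph support,
// so each test can skip early on devices without it.
bool deviceSupportsGraphCapture(ur_device_handle_t device) {
  ur_bool_t supported = false;
  urDeviceGetInfo(device, UR_DEVICE_INFO_GRAPH_RECORD_AND_REPLAY_SUPPORT_EXP,
                  sizeof(supported), &supported, nullptr);
  return supported;
}
```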

Comment thread unified-runtime/test/adapters/level_zero/v2/batched_queue_test.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/batched_queue_test.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp Outdated
@KFilipek KFilipek force-pushed the 07-graph_for_batched branch from 116ac26 to c975178 on March 27, 2026 13:32
Contributor

setGraphCapture(true) is not called here, but setGraphCapture(false) is called in ur_queue_batched_t::queueEndGraphCaptureExp(). Is that intentional?

Contributor Author

Fixed: setGraphCapture(true) is now called at line 1048 in queueBeginCaptureIntoGraphExp, symmetrically with the setGraphCapture(false) in queueEndGraphCaptureExp. Both begin variants set the flag.

Contributor

markIssuedCommandInBatch() is intended to be used only with regular lists. Are we sure that getListManager() does not return an immediate list? According to this comment and this comment, it is likely that we work on an immediate list here.

Contributor Author

markIssuedCommandInBatch() now returns early when isGraphCaptureActive() is true (lines 164–168). During capture, commands go to the immediate list, so there's nothing to track in the batch counter.
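The guard can be modeled standalone along these lines (class and member names are illustrative, not the adapter's real ones):

```cpp
#include <cassert>

// Toy model of the batch counter: marking an issued command is skipped
// while graph capture is active, because captured commands go to the
// immediate list and bypass the regular batch entirely.
class BatchCounterModel {
public:
  void setGraphCapture(bool active) { captureActive = active; }
  bool isGraphCaptureActive() const { return captureActive; }

  void markIssuedCommandInBatch() {
    if (isGraphCaptureActive())
      return; // nothing batched during capture; leave the counter alone
    ++issuedInBatch;
  }

  int issuedCount() const { return issuedInBatch; }

private:
  bool captureActive = false;
  int issuedInBatch = 0;
};
```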

Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp Outdated
Contributor

Is it possible to record queueFlush() or queueFinish() on the graph?

Contributor Author

I believe not.

Contributor

We could verify whether the data is correctly written to the buffer.

@KFilipek KFilipek force-pushed the 07-graph_for_batched branch 3 times, most recently from 5bb0193 to 24efe33 on April 2, 2026 10:23
ur_result_t ur_queue_batched_t::queueFlush() {
auto batchLocked = currentCmdLists.lock();

if (batchLocked->isActiveBatchEmpty()) {
Contributor

I think we should first check whether graph capture mode is active and, if it is, enqueue the graph and return from the function.

Contributor

I might be wrong, see the comment for queueFinish

}

ur_result_t
ur_queue_batched_t::queueFinishUnlocked(locked<batch_manager> &batchLocked) {
Contributor

Firstly, we should check whether the graph capture is active and enqueue the graph if active.

Contributor

I don't think the recorded graph should be enqueued when graph capture is active and queueFinish is called. There's a separate enqueueGraph function for enqueueing graphs, so with graph capture active, queueFinish should probably be a no-op.

Contributor Author

@kswiecicki That's what I've added: a no-op when graph capture is active.

Contributor

Why should this be a no-op? I don't mean that it's bad; I just want to understand the reasoning behind this decision.
Maybe I don't understand the consequences of active graph mode. Are operations submitted immediately then?

Contributor

We can't synchronize the current queue with the host, since the queue is in graph capture mode and the graph hasn't been finalized yet. Hence I think this should either return some INVALID error or be a no-op.
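Either choice can be expressed as a small sketch. The names are hypothetical (the real function is ur_queue_batched_t::queueFinish), and the early return implements the no-op option; returning an error code instead would implement the alternative:

```cpp
#include <cassert>

enum class Result { Success, InvalidOperation };

// Toy queue: queueFinish() is a no-op while graph capture is active,
// because the queue can't be host-synchronized before the graph is
// finalized. Swapping the early return for Result::InvalidOperation
// would implement the error-returning alternative discussed above.
struct QueueFinishModel {
  bool captureActive = false;
  bool hostSynchronized = false;

  Result queueFinish() {
    if (captureActive)
      return Result::Success; // no-op during capture
    hostSynchronized = true;  // normal path: wait for submitted work
    return Result::Success;
  }
};
```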

@KFilipek KFilipek force-pushed the 07-graph_for_batched branch from e59a819 to 3109250 on April 3, 2026 09:50
// Firstly, enqueue the current batch (a regular list), then enqueue the
// command buffer batch (also a regular list) to preserve the order of
// operations
if (!lockedBatch->isActiveBatchEmpty()) {
Contributor

Does it work with the current implementation and the graph capture being enabled?

@KFilipek KFilipek force-pushed the 07-graph_for_batched branch 2 times, most recently from cb286b4 to 375088d on April 13, 2026 10:36
@KFilipek
Contributor Author

Rebased

KFilipek added 13 commits April 15, 2026 11:46
During graph capture, commands are appended to an immediate command list
instead of the regular batch.
Before beginning graph capture, enqueue the current batch (regular command
list) to preserve operation order. This ensures the queue is empty before
switching to immediate list mode for graph capture, similar to command
buffer handling.

Apply to both queueBeginGraphCaptureExp and queueBeginCaptureIntoGraphExp.
During graph capture, operations are recorded to the immediate command
list. Synchronization and flushing operations don't apply to graph
recording, so return early when graph capture is active.

Also added a clarifying comment in queueIsGraphCaptureEnabledExp about
the returned command list.
This ensures consistency with other enqueue methods and provides proper
context for event handling during graph capture.
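The ordering rule from the commit messages (flush the pending regular batch before switching to capture mode) can be sketched as a standalone model; all names here are illustrative:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of "enqueue the current batch before beginning capture":
// pre-capture work is submitted in order before the queue switches to
// recording on the immediate list.
struct CaptureOrderingModel {
  std::vector<std::string> submitted;    // work handed to the device
  std::vector<std::string> pendingBatch; // regular list, not yet flushed
  std::vector<std::string> captured;     // commands recorded into the graph
  bool captureActive = false;

  void enqueue(const std::string &op) {
    if (captureActive)
      captured.push_back(op); // immediate list during capture
    else
      pendingBatch.push_back(op);
  }

  void flushBatch() {
    submitted.insert(submitted.end(), pendingBatch.begin(),
                     pendingBatch.end());
    pendingBatch.clear();
  }

  void beginGraphCapture() {
    flushBatch(); // preserve ordering of pre-capture operations
    captureActive = true;
  }
};
```

The same flush-first step applies to both begin-capture variants, mirroring the command buffer handling the commit message compares it to.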
@KFilipek KFilipek force-pushed the 07-graph_for_batched branch from 375088d to a1f61c9 on April 15, 2026 09:46