
[UR][L0v2] Fix double zeCommandListClose() in batched queue flush #21660

Merged
kswiecicki merged 2 commits into intel:sycl from
ldorau:URL0v2_Fix_double_zeCommandListClose_in_batched_queue_flush
Apr 15, 2026

Conversation

@ldorau
Contributor

@ldorau ldorau commented Mar 31, 2026

Root cause

ur_queue_batched_t::queueFlushUnlocked is called after every
enqueueUSMHostAllocExp / enqueueUSMSharedAllocExp / enqueueUSMDeviceAllocExp
/ enqueueUSMFreeExp to eagerly submit the current batch. The call sequence
is:

  queueFlushUnlocked
    enqueueCurrentBatchUnlocked()   <- (1) zeCommandListClose
                                       + zeCommandListImmediateAppendCommandListsExp
    renewBatchUnlocked()
      if runBatches.size() >= initialSlotsForBatches (10):
        queueFinishUnlocked()
          if !isActiveBatchEmpty(): <- true: enqueuedOperationsCounter > 0
            enqueueCurrentBatchUnlocked() <- (2) BUG: double zeCommandListClose
                                                  + double submit
          hostSynchronize()         <- may hang on some driver versions

enqueuedOperationsCounter is incremented by markIssuedCommandInBatch before
queueFlushUnlocked is called, but it is not cleared by
enqueueCurrentBatchUnlocked, so queueFinishUnlocked's !isActiveBatchEmpty()
guard does not protect against the re-entry.

With 2 queues and 256 iterations the bug fires ~92 times. On certain GPU
driver versions the immediate command list enters a state from which
zeCommandListHostSynchronize never returns, hanging the test:

  UR_ADAPTERS_FORCE_LOAD=lib/libur_adapter_level_zero_v2.so \
  UR_L0_V2_FORCE_BATCHED=1 \
  ./test/adapters/level_zero/enqueue_alloc-test \
  --gtest_filter=*urL0EnqueueAllocMultiQueueSameDeviceTest.SuccessMt*

Fix

Remove the pre-close of the active batch from queueFlushUnlocked and move
enqueueCurrentBatchUnlocked() into renewBatchUnlocked's else branch.
This ensures that when the batch-slot limit is reached the active batch is
still open, so the existing delegation to queueFinishUnlocked closes and
submits it exactly once via its !isActiveBatchEmpty() guard:

  renewBatchUnlocked()
    if runBatches.size() >= initialSlotsForBatches (10):
      queueFinishUnlocked()
        if !isActiveBatchEmpty(): <- closes + submits exactly once
          enqueueCurrentBatchUnlocked()
        hostSynchronize()
        queueFinishPoolsUnlocked()
        batchFinish()
    else:
      enqueueCurrentBatchUnlocked()  <- normal path
      renewRegularUnlocked()

This keeps all finish logic inside queueFinishUnlocked, making the code
easier to maintain and less prone to bugs if queueFinishUnlocked changes.

With queueFlushUnlocked reduced to a single call to renewBatchUnlocked,
the wrapper is no longer needed. Remove it and call renewBatchUnlocked
directly at all former call sites.

Tested-by:

  UR_ADAPTERS_FORCE_LOAD=lib/libur_adapter_level_zero_v2.so \
  UR_L0_V2_FORCE_BATCHED=1 \
  ./test/adapters/level_zero/enqueue_alloc-test \
  (81 passed, 6 skipped, 0 failed)

@ldorau
Contributor Author

ldorau commented Mar 31, 2026

Please review @pbalcer @EuphoricThinking

Copilot AI left a comment

Pull request overview

Fixes a Level Zero v2 batched-queue flush/renew corner case where reaching the submitted-batch slot limit could re-enter submission and call zeCommandListClose()/submit twice on the same command list, potentially hanging in zeCommandListHostSynchronize() on some driver versions.

Changes:

  • Update ur_queue_batched_t::renewBatchUnlocked() to avoid delegating to queueFinishUnlocked() when the batch-slot limit is reached; instead directly synchronize, clean pools, and batchFinish() to reset state without re-submitting.
  • Clarify batch_manager::batchFinish() comments to reflect that the active batch may already have been submitted via queueFlushUnlocked().
  • Unskip batched-queue execution in the multi-queue USM alloc test to allow exercising this path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp | Prevents double close/double submit on slot-limit renew by changing the synchronization/reset sequence.
unified-runtime/test/adapters/level_zero/enqueue_alloc.cpp | Removes batched-queue skips so the multi-queue USM alloc test can run under batched submission.

Comment thread unified-runtime/test/adapters/level_zero/enqueue_alloc.cpp Outdated
@ldorau ldorau force-pushed the URL0v2_Fix_double_zeCommandListClose_in_batched_queue_flush branch from ce528a7 to 111db9f Compare March 31, 2026 10:31
@ldorau ldorau requested a review from Copilot March 31, 2026 10:31
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@ldorau
Contributor Author

ldorau commented Mar 31, 2026

The "E2E (Preview Mode, ["Linux", "gen12"])" CI job failed because of #21023.

@EuphoricThinking
Contributor

Good catch. I would also say that we could replace the content of queueFlushUnlocked with renewBatchUnlocked, with minor adjustments in order to avoid re-enqueueing the batch, since renewBatchUnlocked is used only in queueFlushUnlocked:

ur_result_t
ur_queue_batched_t::renewBatchUnlocked(locked<batch_manager> &batchLocked) {
  if (batchLocked->isLimitOfUsedCommandListsReached()) {
    // enqueue already in queueFinish
    return queueFinishUnlocked(batchLocked);
  } else {
    // Add enqueue here - maybe with checking for emptiness?
    UR_CALL(batchLocked->enqueueCurrentBatchUnlocked());
    ////

    return batchLocked->renewRegularUnlocked(getNewRegularCmdList());
  }
}

This way, we don't scatter the implementation of queueFinish across different parts of the code, reducing the possibility of bugs if we change the queueFinish implementation and forget to update the other components.

@ldorau ldorau force-pushed the URL0v2_Fix_double_zeCommandListClose_in_batched_queue_flush branch from 111db9f to 27761c6 Compare March 31, 2026 16:03
@ldorau
Contributor Author

ldorau commented Mar 31, 2026

> Good catch. I would also say that we could replace the content of queueFlushUnlocked with renewBatchUnlocked […]

Thanks! Fixed.

@ldorau ldorau requested a review from Copilot March 31, 2026 16:07
@ldorau
Contributor Author

ldorau commented Mar 31, 2026

Please review @EuphoricThinking @pbalcer

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp
@ldorau ldorau requested a review from Copilot April 1, 2026 08:06
@ldorau ldorau marked this pull request as ready for review April 1, 2026 08:07
@ldorau ldorau requested a review from a team as a code owner April 1, 2026 08:07
@ldorau ldorau changed the title [DRAFT] [UR][L0v2] Fix double zeCommandListClose() in batched queue flush [UR][L0v2] Fix double zeCommandListClose() in batched queue flush Apr 1, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@ldorau
Contributor Author

ldorau commented Apr 1, 2026

Please review @pbalcer @EuphoricThinking @intel/unified-runtime-reviewers-level-zero

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


struct urL0EnqueueAllocMultiQueueSameDeviceTest
    : uur::urContextTestWithParam<EnqueueAllocMultiQueueTestParam> {
Contributor

There is a fixture urMultiQueueTypeTestWithParam, which is defined as urContextTestWithParam<MultiQueueParam<T>> (which is exactly what is defined here).

Contributor Author

Thanks!

Contributor Author

@EuphoricThinking @pbalcer I think the proposed replacement is not viable, because urMultiQueueTypeTestWithParam<T> creates exactly one ur_queue_handle_t queue in its SetUp.
But urL0EnqueueAllocMultiQueueSameDeviceTest creates a variable-length vector of queues
(std::vector<ur_queue_handle_t> queues) sized by param.numQueues. If the test inherited from urMultiQueueTypeTestWithParam, the base SetUp would create a single unused queue handle in addition to the test's own queues vector, which IMO would be wasteful and misleading.

@ldorau ldorau force-pushed the URL0v2_Fix_double_zeCommandListClose_in_batched_queue_flush branch from 27761c6 to 1e4274b Compare April 1, 2026 12:57
@ldorau ldorau requested a review from EuphoricThinking April 1, 2026 12:57
@ldorau ldorau force-pushed the URL0v2_Fix_double_zeCommandListClose_in_batched_queue_flush branch from 6d299f3 to d421dc7 Compare April 1, 2026 15:21
@ldorau ldorau requested a review from Copilot April 1, 2026 15:25
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@ldorau ldorau requested a review from Copilot April 2, 2026 08:21
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread unified-runtime/test/adapters/level_zero/enqueue_alloc.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/enqueue_alloc.cpp Outdated
@ldorau ldorau force-pushed the URL0v2_Fix_double_zeCommandListClose_in_batched_queue_flush branch from d421dc7 to 7507428 Compare April 2, 2026 10:05
@ldorau ldorau requested a review from Copilot April 2, 2026 10:06
@ldorau ldorau force-pushed the URL0v2_Fix_double_zeCommandListClose_in_batched_queue_flush branch from 7507428 to c7c9a6f Compare April 2, 2026 10:08
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp
@ldorau ldorau requested a review from EuphoricThinking April 14, 2026 12:35
@ldorau
Contributor Author

ldorau commented Apr 14, 2026

@pbalcer @EuphoricThinking Please re-review: #21660 (comment)

@ldorau ldorau force-pushed the URL0v2_Fix_double_zeCommandListClose_in_batched_queue_flush branch from c7c9a6f to 23d701a Compare April 14, 2026 12:51
… all queue types

Enable the urL0EnqueueAllocMultiQueueSameDeviceTest and parameterize it
over all queue submission modes (UR_QUEUE_FLAG_SUBMISSION_BATCHED and
UR_QUEUE_FLAG_SUBMISSION_IMMEDIATE) by:

- Removing SKIP_IF_BATCHED_QUEUE to enable the test for batched queues.
- Changing the base class template parameter from
  EnqueueAllocMultiQueueTestParam to
  uur::MultiQueueParam<EnqueueAllocMultiQueueTestParam> so that the
  queue mode becomes part of the test parameter.
- Adding getAllocParam() and getQueueFlags() helpers to the fixture for
  clean access to the two parts of the parameter tuple. Both call
  this->getParam() directly, consistent with the pattern used in
  urMultiQueueTypeTestWithParam in fixtures.h.
- Creating queues with the parameterized flag via ur_queue_properties_t
  instead of a hardcoded UR_QUEUE_FLAG_SUBMISSION_BATCHED.
- Switching the test suite macro from UUR_DEVICE_TEST_SUITE_WITH_PARAM
  to UUR_MULTI_QUEUE_TYPE_TEST_SUITE_WITH_PARAM and the printer to
  deviceTestWithParamPrinterMulti, which expands the suite to cover
  both queue modes automatically.
- Updating all three test bodies (SuccessMt, SuccessReuse,
  SuccessDependantMt) to use getAllocParam() instead of
  std::get<1>(this->GetParam()), and restoring the numQueues parameter
  in SuccessMt to getAllocParam().numQueues.

This ensures both batched and immediate queues are covered by default
test runs without requiring UR_L0_V2_FORCE_BATCHED=1.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@ldorau ldorau dismissed EuphoricThinking’s stale review April 15, 2026 09:57

Re-review requested

@ldorau ldorau requested a review from kswiecicki April 15, 2026 09:59
@ldorau
Contributor Author

ldorau commented Apr 15, 2026

@kswiecicki Review please

@kswiecicki kswiecicki merged commit 7d7012f into intel:sycl Apr 15, 2026
58 of 61 checks passed