Skip to content

[SYCL][L0] urEventWait must not be called for an internal event #21704

@al42and

Description

@al42and

Describe the bug

A mix of several in-order queues and ext_oneapi_submit_barrier can trigger ur_die: urEventWait must not be called for an internal event from a valid code.

To reproduce

$ clang++ -fsycl -g sycl_event_barrier_shenanigans.cpp && ONEAPI_DEVICE_SELECTOR=level_zero:0 ./a.out 
Device: Intel(R) Arc(TM) A770 Graphics
Driver: 1.14.37020+3
ur_die: urEventWait must not be called for an internal event
terminate called without an active exception
Aborted (core dumped)

UR_L0_SERIALIZE=2 makes the code pass; UR_L0_DISABLE_EVENTS_CACHING=1 makes the code segfault.

sycl_event_barrier_shenanigans.cpp

#include <sycl/sycl.hpp>
#include <optional>
#include <cstdio>

int main() {
    sycl::device dev;
    printf("Device: %s\n", dev.get_info<sycl::info::device::name>().c_str());
    printf("Driver: %s\n", dev.get_info<sycl::info::device::driver_version>().c_str());

    sycl::context ctx(dev);
    sycl::property_list props{sycl::property::queue::in_order()};
    sycl::queue computeQ(ctx, dev, props);
    sycl::queue auxQ(ctx, dev, props);
    sycl::queue drainQ(ctx, dev, props);

    constexpr int N = 64;
    int *d_buf = sycl::malloc_device<int>(N, dev, ctx);

    std::optional<sycl::event> stale_event;

    for (int iter = 0; iter < 2000; iter++) {
        if (stale_event) {
            stale_event->wait();
            stale_event.reset();
        }

        computeQ.memset(d_buf, 0, N * sizeof(int));

        sycl::event auxEv = auxQ.memset(d_buf, 0, sizeof(int));
        sycl::event barrier = computeQ.ext_oneapi_submit_barrier({auxEv});

        sycl::event opt1 = computeQ.ext_oneapi_submit_barrier();
        sycl::event opt2 = computeQ.ext_oneapi_submit_barrier();
        sycl::event opt3 = computeQ.ext_oneapi_submit_barrier();

        barrier.wait();

        stale_event = std::move(opt3);

        { [[maybe_unused]] auto _ = std::move(barrier); }
        { [[maybe_unused]] auto _ = std::move(opt1); }
        { [[maybe_unused]] auto _ = std::move(opt2); }
        { [[maybe_unused]] auto _ = std::move(auxEv); }

        for (int i = 0; i < 4; i++) {
            sycl::event dep = drainQ.memset(d_buf, 0, sizeof(int));
            sycl::event flush = computeQ.ext_oneapi_submit_barrier({dep});
            flush.wait();
        }
    }

    if (stale_event) {
        stale_event->wait();
        stale_event.reset();
    }

    sycl::free(d_buf, ctx);
    printf("PASSED (no crash)\n");
    return 0;
}

Environment

  • OS: Ubuntu 24.04
  • Target device and vendor: Arc A770
  • DPC++ version: a826f6a
  • Dependencies version: [level_zero:gpu] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A770 Graphics 12.55.8 [1.14.37020+3]

Additional context

The bug originally found in GROMACS which uses this pattern heavily.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinglevel-zeroIssues related to the Level Zero backend

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions