
run_opencl_fft: NVIDIA CUDA queue check is inverted (raises on same queue) #270

@xywei


Summary

In sumpy.tools.run_opencl_fft, the NVIDIA CUDA queue validation condition appears inverted.

Current code on main:

for evt in wait_for:
    if not evt.command_queue != queue:
        raise RuntimeError("Different queues not supported with NVIDIA CUDA")

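Because not binds more loosely than !=, the condition parses as not (evt.command_queue != queue), which is equivalent to evt.command_queue == queue. The check therefore raises when the event is on the same queue (the safe, expected case) and does not raise when the queues differ (the unsafe case).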
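The inversion can be seen without a GPU. The following is a minimal sketch that uses plain Python objects as stand-ins for command queues; the function names buggy_check_raises and fixed_check_raises are hypothetical and exist only for this illustration, they are not part of sumpy.

```python
# Plain objects stand in for command queues (assumption: only
# equality/inequality between queues matters for this check).
queue = object()
other_queue = object()


def buggy_check_raises(evt_queue, queue):
    # Condition as quoted from main. Python parses this as
    # `not (evt_queue != queue)`, i.e. "queues are equal".
    return not evt_queue != queue


def fixed_check_raises(evt_queue, queue):
    # Proposed condition: true only when the queues differ.
    return evt_queue != queue


# Collect which (check, case) combinations would raise.
results = {
    "buggy_same": buggy_check_raises(queue, queue),
    "buggy_different": buggy_check_raises(other_queue, queue),
    "fixed_same": fixed_check_raises(queue, queue),
    "fixed_different": fixed_check_raises(other_queue, queue),
}
print(results)
```

The buggy condition is true exactly where the fixed one is false, which matches the behavior described above.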

Expected behavior

Raise only when an event in wait_for belongs to a different queue:

if evt.command_queue != queue:
    raise RuntimeError("Different queues not supported with NVIDIA CUDA")
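As a rough sketch of the expected behavior, the corrected loop can be exercised in isolation. FakeEvent and check_wait_for_queues are hypothetical names introduced here for illustration; FakeEvent only mimics the command_queue attribute that the quoted code reads, and the helper is not an existing sumpy function.

```python
from dataclasses import dataclass


@dataclass
class FakeEvent:
    # Hypothetical stand-in for an OpenCL event; carries only the
    # attribute that the queue check reads.
    command_queue: object


def check_wait_for_queues(wait_for, queue):
    """Raise if any event in wait_for belongs to a queue other than `queue`."""
    for evt in wait_for:
        if evt.command_queue != queue:
            raise RuntimeError("Different queues not supported with NVIDIA CUDA")


q1, q2 = object(), object()

# Same-queue events pass silently.
check_wait_for_queues([FakeEvent(q1), FakeEvent(q1)], q1)

# An event from a different queue triggers the error.
try:
    check_wait_for_queues([FakeEvent(q1), FakeEvent(q2)], q1)
except RuntimeError as exc:
    message = str(exc)
else:
    message = None
print(message)
```

With the condition written this way, same-queue wait_for lists are accepted and only mixed-queue lists are rejected.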

Why this matters

On NVIDIA CUDA OpenCL, valid FFT calls with same-queue wait_for events fail with:

RuntimeError: Different queues not supported with NVIDIA CUDA

This also breaks downstream CUDA tests (example observed in volumential):

  • test/test_volume_fmm.py::test_volume_fmm_laplace fails before the fix
  • passes after changing the condition to evt.command_queue != queue

Additional context

This looks like an unintended logic typo introduced in the NVIDIA marker handling path (commit 9d1d611, "Fix enqueue_marker for NVIDIA CUDA").
