CUDA Host API

Device Management

Device detection and enquiry

The following functions are available for querying the available hardware:

numba.cuda.is_available

numba.cuda.detect

Context management

CUDA Python functions execute within a CUDA context. Each CUDA device in a system has an associated CUDA context, and Numba presently allows only one context per thread. For further details on CUDA contexts, refer to the CUDA Driver API Documentation on Context Management and the CUDA C Programming Guide Context Documentation. CUDA contexts are instances of the numba.cuda.cudadrv.driver.Context class:

numba.cuda.cudadrv.driver.Context

The following functions can be used to get or select the context:

numba.cuda.current_context

numba.cuda.require_context

The following functions affect the current context:

numba.cuda.synchronize

numba.cuda.close

Device management

Numba maintains a list of supported CUDA-capable devices:

numba.cuda.gpus

An indexable list of supported CUDA devices. This list is indexed by integer device ID.

Alternatively, the current device can be obtained:

numba.cuda.gpus.current

Return the currently-selected device.

Getting a device through numba.cuda.gpus always provides an instance of numba.cuda.cudadrv.devices._DeviceContextManager, which acts as a context manager for the selected device:

numba.cuda.cudadrv.devices._DeviceContextManager

One may also select a context and device or get the current device using the following three functions:

numba.cuda.select_device

numba.cuda.get_current_device

numba.cuda.list_devices

The numba.cuda.cudadrv.driver.Device class can be used to enquire about the functionality of the selected device:

numba.cuda.cudadrv.driver.Device

The device associated with a particular context.

compute_capability

A tuple, (major, minor) indicating the supported compute capability.

id

The integer ID of the device.

name

The name of the device (e.g. "GeForce GTX 970").

uuid

The UUID of the device (e.g. "GPU-e6489c45-5b68-3b03-bab7-0e7c8e809643").

reset

Delete the context for the device. This will destroy all memory allocations, events, and streams created within the context.

supports_float16

Return True if the device supports float16 operations, False otherwise.

Compilation

Numba provides an entry point for compiling a Python function to PTX without making any driver API calls. This can be useful for:

Generating PTX that is to be inlined into other PTX code (e.g. from outside the Numba / Python ecosystem).
Generating code when there is no device present.
Generating code prior to a fork without initializing CUDA.

Note

It is the user's responsibility to manage any ABI issues arising from the use of compilation to PTX.

numba.cuda.compile_ptx

The environment variable NUMBA_CUDA_DEFAULT_PTX_CC can be set to control the default compute capability targeted by compile_ptx (see the CUDA Environment Variables section). If PTX for the compute capability of the current device is required, the compile_ptx_for_current_device function can be used:

numba.cuda.compile_ptx_for_current_device

Measurement

Profiling

The NVIDIA Visual Profiler can be used directly on executing CUDA Python code; it is not a requirement to insert calls to these functions into user code. However, these functions can be used to allow profiling to be performed selectively on specific portions of the code. For further information on profiling, see the NVIDIA Profiler User's Guide.

numba.cuda.profile_start

numba.cuda.profile_stop

numba.cuda.profiling

Events

Events can be used to monitor the progress of execution and to record the timestamps of specific points being reached. Event creation returns immediately, and the created event can be queried to determine if it has been reached. For further information, see the CUDA C Programming Guide Events section.

The following functions are used for creating and measuring the time between events:

numba.cuda.event

numba.cuda.event_elapsed_time

Events are instances of the numba.cuda.cudadrv.driver.Event class:

numba.cuda.cudadrv.driver.Event

Stream Management

Streams allow concurrency of execution on a single device within a given context. Queued work items in the same stream execute sequentially, but work items in different streams may execute concurrently. Most operations involving a CUDA device can be performed asynchronously using streams, including data transfers and kernel execution. For further details on streams, see the CUDA C Programming Guide Streams section.

Numba defaults to using the legacy default stream as the default stream. The per-thread default stream can be made the default stream by setting the environment variable NUMBA_CUDA_PER_THREAD_DEFAULT_STREAM to 1 (see the CUDA Environment Variables section). Regardless of this setting, the objects representing the legacy and per-thread default streams can be constructed using the functions below.

Streams are instances of numba.cuda.cudadrv.driver.Stream:

numba.cuda.cudadrv.driver.Stream

To create a new stream:

numba.cuda.stream

To get the default stream:

numba.cuda.default_stream

To get the default stream with an explicit choice of whether it is the legacy or per-thread default stream:

numba.cuda.legacy_default_stream

numba.cuda.per_thread_default_stream

To construct a Numba Stream object using a stream allocated elsewhere, the external_stream function is provided. Note that the lifetime of external streams must be managed by the user: Numba will not deallocate an external stream, and the stream must remain valid whilst the Numba Stream object is in use.

numba.cuda.external_stream

Runtime

Numba generally uses the Driver API, but it provides a simple wrapper to the Runtime API so that the version of the runtime in use can be queried. This is accessed through cuda.runtime, which is an instance of the numba.cuda.cudadrv.runtime.Runtime class:

numba.cuda.cudadrv.runtime.Runtime

Whether the current runtime is officially supported and tested with the current version of Numba can also be queried:

numba.cuda.is_supported_version
