As noted in :ref:`fast-math`, for certain classes of applications that utilize floating point, strict IEEE-754 conformance is not required. For this subset of applications, performance speedups may be possible.
The CUDA target implements :ref:`fast-math` behavior with two differences.
First, the ``fastmath`` argument to the :func:`@jit decorator <numba.cuda.jit>` is limited to the values ``True`` and ``False``. When ``True``, the following optimizations are enabled:

- Flushing of denormals to zero.
- Use of a fast approximation to the square root function.
- Use of a fast approximation to the division operation.
- Contraction of multiply and add operations into single fused multiply-add operations.
See the documentation for ``nvvmCompileProgram`` for more details of these optimizations.
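As a sketch of how these optimizations are requested, the following hypothetical kernel is decorated with ``fastmath=True``; under that flag, the division and square root below may be lowered to fast approximate instructions, and the multiply-add may be contracted into a single fused multiply-add (the kernel itself is an illustrative example, not from the library documentation):

.. code-block:: python

   import math
   from numba import cuda

   # Hypothetical example kernel. With fastmath=True, the compiler may:
   # - flush denormal intermediates to zero,
   # - use a fast approximate square root and division,
   # - contract x[i] * x[i] + 1.0 into a fused multiply-add.
   @cuda.jit(fastmath=True)
   def normalize(x, out):
       i = cuda.grid(1)
       if i < x.size:
           out[i] = x[i] / math.sqrt(x[i] * x[i] + 1.0)

Note that these transformations trade accuracy for speed, so results may differ slightly from the same kernel compiled with ``fastmath=False``.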
Secondly, calls to a subset of :mod:`math` module functions on ``float32`` operands will be implemented using fast approximate implementations from the libdevice library:

- :func:`math.cos`: Implemented using ``__nv_fast_cosf``.
- :func:`math.sin`: Implemented using ``__nv_fast_sinf``.
- :func:`math.tan`: Implemented using ``__nv_fast_tanf``.
- :func:`math.exp`: Implemented using ``__nv_fast_expf``.
- :func:`math.log2`: Implemented using ``__nv_fast_log2f``.
- :func:`math.log10`: Implemented using ``__nv_fast_log10f``.
- :func:`math.log`: Implemented using ``__nv_fast_logf``.
- :func:`math.pow`: Implemented using ``__nv_fast_powf``.
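For example, in the hypothetical kernel below the operands are ``float32``, so with ``fastmath=True`` the calls to :func:`math.exp` and :func:`math.sin` would use the fast libdevice approximations listed above (``__nv_fast_expf``, ``__nv_fast_sinf``); with ``float64`` operands the full-precision implementations are used instead:

.. code-block:: python

   import math
   import numpy as np
   from numba import cuda

   # Hypothetical example kernel. When t and out hold float32 values,
   # math.exp and math.sin lower to __nv_fast_expf and __nv_fast_sinf.
   @cuda.jit(fastmath=True)
   def damped_wave(t, out):
       i = cuda.grid(1)
       if i < t.size:
           out[i] = math.exp(-t[i]) * math.sin(t[i])

   # Passing float32 arrays selects the fast single-precision paths:
   # t = np.linspace(0, 10, 1024, dtype=np.float32)
   # out = np.zeros_like(t)
   # damped_wave.forall(t.size)(t, out)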