As noted in :ref:`fast-math`, for certain classes of applications that utilize floating point, strict IEEE-754 conformance is not required. For this subset of applications, performance speedups may be possible.
The CUDA target implements :ref:`fast-math` behavior with two differences.
First, the ``fastmath`` argument to the :func:`@jit decorator <numba.cuda.jit>` is limited to the values ``True`` and ``False``. When ``True``, the following optimizations are enabled:

- Flushing of denormals to zero.
- Use of a fast approximation to the square root function.
- Use of a fast approximation to the division operation.
- Contraction of multiply and add operations into single fused multiply-add operations.
See the documentation for ``nvvmCompileProgram`` for more details of these optimizations.
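As a sketch of how these optimizations are requested, the following hypothetical kernel is decorated with ``fastmath=True``; under that flag, the division and square root below may be lowered to fast approximate instructions, and the multiply-add may be contracted into a single fused multiply-add (the kernel itself is an illustrative example, not from the library documentation):

.. code-block:: python

   import math
   from numba import cuda

   # Hypothetical example kernel. With fastmath=True, the compiler may:
   # - flush denormal intermediates to zero,
   # - use a fast approximate square root and division,
   # - contract x[i] * x[i] + 1.0 into a fused multiply-add.
   @cuda.jit(fastmath=True)
   def normalize(x, out):
       i = cuda.grid(1)
       if i < x.size:
           out[i] = x[i] / math.sqrt(x[i] * x[i] + 1.0)

Note that these transformations trade accuracy for speed, so results may differ slightly from the same kernel compiled with ``fastmath=False``.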
Secondly, calls to a subset of :mod:`math` module functions on ``float32`` operands will be implemented using fast approximate implementations from the libdevice library:

- :func:`math.cos`: Implemented using ``__nv_fast_cosf``.
- :func:`math.sin`: Implemented using ``__nv_fast_sinf``.
- :func:`math.tan`: Implemented using ``__nv_fast_tanf``.
- :func:`math.exp`: Implemented using ``__nv_fast_expf``.
- :func:`math.log2`: Implemented using ``__nv_fast_log2f``.
- :func:`math.log10`: Implemented using ``__nv_fast_log10f``.
- :func:`math.log`: Implemented using ``__nv_fast_logf``.
- :func:`math.pow`: Implemented using ``__nv_fast_powf``.
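For example, in the hypothetical kernel below the operands are ``float32``, so with ``fastmath=True`` the calls to :func:`math.exp` and :func:`math.sin` would use the fast libdevice approximations listed above (``__nv_fast_expf``, ``__nv_fast_sinf``); with ``float64`` operands the full-precision implementations are used instead:

.. code-block:: python

   import math
   import numpy as np
   from numba import cuda

   # Hypothetical example kernel. When t and out hold float32 values,
   # math.exp and math.sin lower to __nv_fast_expf and __nv_fast_sinf.
   @cuda.jit(fastmath=True)
   def damped_wave(t, out):
       i = cuda.grid(1)
       if i < t.size:
           out[i] = math.exp(-t[i]) * math.sin(t[i])

   # Passing float32 arrays selects the fast single-precision paths:
   # t = np.linspace(0, 10, 1024, dtype=np.float32)
   # out = np.zeros_like(t)
   # damped_wave.forall(t.size)(t, out)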