Merge branch 'master' into setup_py_optional_numpy
Conflicts:
	setup.py
pitrou committed Aug 22, 2016
2 parents 7937cb5 + 9b805cb commit 7d7a7dd
Showing 51 changed files with 2,215 additions and 1,636 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -28,7 +28,7 @@ script:
- buildscripts/incremental/test.sh

after_success:
- if [ "$RUN_COVERAGE" == "yes" ]; then coverage combine; codecov; fi
- buildscripts/incremental/after_success.sh

notifications:
email: false
11 changes: 11 additions & 0 deletions buildscripts/incremental/after_success.sh
@@ -0,0 +1,11 @@
#!/bin/bash

source activate $CONDA_ENV

# Make sure any error below is reported as such
set -v -e

if [ "$RUN_COVERAGE" == "yes" ]; then
coverage combine
codecov
fi
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -2,7 +2,7 @@
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXOPTS = -j4
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
1 change: 0 additions & 1 deletion docs/source/conf.py
@@ -41,7 +41,6 @@
#'sphinx.ext.mathjax',
'sphinx.ext.autodoc',
#'sphinx.ext.graphviz',
'sphinxjp.themecore',
]

todo_include_todos = True
45 changes: 44 additions & 1 deletion docs/source/cuda/memory.rst
@@ -124,7 +124,7 @@ and from the device automatically. It can be used as a drop-in replacement for
.. autoclass:: numba.SmartArray
:members: __init__, get, mark_changed


Thus, `SmartArray` objects may be passed as function arguments to jit-compiled
functions. Whenever a cuda.jit-compiled function is being executed, it will
@@ -135,3 +135,46 @@ references to that.
Thus, if the next operation is another invocation of a cuda.jit-compiled function,
the data does not need to be transferred again, making the compound operation more
efficient (and making the use of the GPU advantageous even for smaller data sizes).

Deallocation Behavior
=====================

Deallocation of all CUDA resources is tracked on a per-context basis.
When the last reference to a piece of device memory is dropped, the underlying
memory is scheduled for deallocation. The deallocation does not occur
immediately; instead, it is added to a queue of pending deallocations. This
design has two benefits:

1. Resource deallocation APIs may cause the device to synchronize, breaking
   any asynchronous execution. Deferring the deallocation avoids this latency
   in performance-critical code sections.
2. Some deallocation errors may cause all remaining deallocations to fail.
   Continued deallocation errors can cause critical errors at the CUDA driver
   level. In some cases, this could mean a segmentation fault in the CUDA
   driver. In the worst case, this could cause the system GUI to freeze,
   recoverable only by a system reset. When an error occurs during a
   deallocation, the remaining pending deallocations are cancelled, and the
   deallocation error is reported. When the process terminates, the CUDA
   driver is able to release all resources allocated by the terminated
   process.
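The queueing idea can be sketched in plain Python. This is an illustrative model only, not Numba's actual implementation; the class and method names (``PendingDeallocations``, ``defer_free``) are invented for the sketch:

```python
class PendingDeallocations:
    """Illustrative model of a deferred-deallocation queue:
    frees are queued and then released in one batch."""

    def __init__(self, max_count=10):
        self.max_count = max_count
        self.pending = []   # handles waiting to be freed
        self.freed = []     # handles actually released

    def defer_free(self, handle):
        # Dropping the last reference enqueues the free instead of
        # synchronizing the device immediately.
        self.pending.append(handle)
        if len(self.pending) >= self.max_count:
            self.flush()

    def flush(self):
        # Release everything in one batch; a real implementation would
        # cancel the remainder if a deallocation error occurred.
        while self.pending:
            self.freed.append(self.pending.pop(0))


queue = PendingDeallocations(max_count=3)
queue.defer_free("a")
queue.defer_free("b")   # still pending: no synchronization yet
queue.defer_free("c")   # reaches max_count, so the queue flushes
```

The batching is what keeps deallocation errors and synchronization out of the hot path.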

The deallocation queue is flushed automatically as soon as any of the
following events occurs:

- An allocation fails with an out-of-memory error. The allocation is retried
  after flushing all pending deallocations.
- The deallocation queue reaches its maximum size, which defaults to 10.
  Users can override this by setting the environment variable
  ``NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT``. For example,
  ``NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=20`` increases the limit to 20.
- The maximum accumulated byte size of resources pending deallocation is
  reached. This defaults to 20% of the device memory capacity.
  Users can override this by setting the environment variable
  ``NUMBA_CUDA_MAX_PENDING_DEALLOCS_RATIO``. For example,
  ``NUMBA_CUDA_MAX_PENDING_DEALLOCS_RATIO=0.5`` sets the limit to 50% of the
  capacity.
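Both thresholds are plain environment variables, so they can also be set from Python. A minimal sketch (the specific values are arbitrary examples; the variables must be set before Numba is first imported, since the limits are read from the environment at startup):

```python
import os

# Raise the pending-deallocation limits for this process. Set these
# BEFORE the first `import numba`, because the limits are read from
# the environment when Numba starts up.
os.environ["NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT"] = "20"   # max queue length
os.environ["NUMBA_CUDA_MAX_PENDING_DEALLOCS_RATIO"] = "0.5"  # 50% of capacity
```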

Sometimes it is desirable to defer resource deallocation until a code section
ends, most often to avoid the implicit synchronization that deallocation can
cause. This can be done with the following context manager:

.. autofunction:: numba.cuda.defer_cleanup
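A hedged usage sketch follows; the helper name and array sizes are illustrative, and the code is guarded so it degrades gracefully where Numba or a CUDA device is unavailable:

```python
def burst_of_transfers():
    """Copy several arrays to the GPU without intermediate deallocations."""
    try:
        from numba import cuda
    except ImportError:
        return "skipped: numba not installed"
    if not cuda.is_available():
        return "skipped: no CUDA device"
    import numpy as np
    with cuda.defer_cleanup():
        for _ in range(5):
            cuda.to_device(np.arange(1024))
        # deallocations of the device copies are deferred until
        # this block exits, so no implicit synchronization occurs here
    return "done"


status = burst_of_transfers()
```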
2 changes: 2 additions & 0 deletions docs/source/reference/numpysupported.rst
@@ -139,6 +139,7 @@ Other methods

The following methods of Numpy arrays are supported:

* :meth:`~numpy.ndarray.argsort` (without arguments)
* :meth:`~numpy.ndarray.astype` (only the 1-argument form)
* :meth:`~numpy.ndarray.copy` (without arguments)
* :meth:`~numpy.ndarray.flatten` (no order argument; 'C' order only)
@@ -209,6 +210,7 @@ Other functions
The following top-level functions are supported:

* :func:`numpy.arange`
* :func:`numpy.argsort` (no optional arguments)
* :func:`numpy.array` (only the first 2 arguments)
* :func:`numpy.asfortranarray` (only the first argument)
* :func:`numpy.bincount` (only the first 2 arguments)
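The newly listed ``argsort`` support can be exercised in nopython mode. A sketch, falling back to plain Python where Numba is unavailable (the compiled call matches NumPy's semantics, so the fallback gives the same result):

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Numba not installed: run the function as plain Python instead.
    def njit(func):
        return func


@njit
def ascending_order(values):
    # argsort without arguments is now supported in nopython mode
    return np.argsort(values)


order = ascending_order(np.array([30.0, 10.0, 20.0]))
# order[0] indexes the smallest element of the input
```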
4 changes: 2 additions & 2 deletions docs/source/reference/pysupported.rst
@@ -176,8 +176,8 @@ The following built-in functions are supported:
* :class:`int`: only the one-argument form
* :func:`iter`: only the one-argument form
* :func:`len`
* :func:`min`: only the multiple-argument form
* :func:`max`: only the multiple-argument form
* :func:`min`
* :func:`max`
* :func:`next`: only the one-argument form
* :func:`print`: only numbers and strings; no ``file`` or ``sep`` argument
* :class:`range`: semantics are similar to those of Python 3 even in Python 2:
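The lifted restriction above means the one-argument (iterable) form of :func:`min` and :func:`max` is documented as supported alongside the multiple-argument form. The semantics match plain Python, illustrated here without requiring a Numba install:

```python
values = (4, 1, 9, 7)

# multiple-argument form: was already supported before this change
smallest_pairwise = min(4, 1)

# one-argument iterable form: the restriction lifted in this change
smallest = min(values)
largest = max(values)
```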
