Merge branch 'master' into setup_py_optional_numpy
Conflicts:
	setup.py
pitrou committed Aug 22, 2016
2 parents 7937cb5 + 9b805cb commit 7d7a7dd
Showing 51 changed files with 2,215 additions and 1,636 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -28,7 +28,7 @@ script:
- buildscripts/incremental/test.sh

after_success:
- if [ "$RUN_COVERAGE" == "yes" ]; then coverage combine; codecov; fi
- buildscripts/incremental/after_success.sh

notifications:
email: false
11 changes: 11 additions & 0 deletions buildscripts/incremental/after_success.sh
@@ -0,0 +1,11 @@
#!/bin/bash

source activate $CONDA_ENV

# Make sure any error below is reported as such
set -v -e

if [ "$RUN_COVERAGE" == "yes" ]; then
coverage combine
codecov
fi
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -2,7 +2,7 @@
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXOPTS = -j4
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
1 change: 0 additions & 1 deletion docs/source/conf.py
@@ -41,7 +41,6 @@
#'sphinx.ext.mathjax',
'sphinx.ext.autodoc',
#'sphinx.ext.graphviz',
'sphinxjp.themecore',
]

todo_include_todos = True
45 changes: 44 additions & 1 deletion docs/source/cuda/memory.rst
@@ -124,7 +124,7 @@ and from the device automatically. It can be used as a drop-in replacement for
.. autoclass:: numba.SmartArray
:members: __init__, get, mark_changed


Thus, `SmartArray` objects may be passed as function arguments to jit-compiled
functions. Whenever a cuda.jit-compiled function is being executed, it will
@@ -135,3 +135,46 @@ references to that.
Thus, if the next operation is another invocation of a cuda.jit-compiled function,
the data does not need to be transferred again, making the compound operation more
efficient (and making the use of the GPU advantageous even for smaller data sizes).

Deallocation Behavior
=====================

Deallocation of all CUDA resources is tracked on a per-context basis.
When the last reference to a piece of device memory is dropped, the underlying
memory is scheduled for deallocation. The deallocation does not occur
immediately; instead, it is added to a queue of pending deallocations. This
design has two benefits:

1. Resource deallocation APIs may cause the device to synchronize, breaking
   any asynchronous execution. Deferring the deallocation avoids this latency
   in performance-critical code sections.
2. Some deallocation errors may cause all remaining deallocations to fail.
   Continued deallocation errors can cause critical errors at the CUDA driver
   level. In some cases, this could mean a segmentation fault in the CUDA
   driver. In the worst case, this could cause the system GUI to freeze,
   recoverable only by a system reset. When an error occurs during a
   deallocation, the remaining pending deallocations are cancelled, and the
   deallocation error is reported. When the process terminates, the CUDA
   driver is able to release all resources allocated by the terminated
   process.
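The queueing idea can be sketched in plain Python. This is an illustrative model only, not Numba's actual implementation; the class and method names (``PendingDeallocations``, ``defer_free``) are invented for the sketch:

```python
class PendingDeallocations:
    """Illustrative model of a deferred-deallocation queue:
    frees are queued and then released in one batch."""

    def __init__(self, max_count=10):
        self.max_count = max_count
        self.pending = []   # handles waiting to be freed
        self.freed = []     # handles actually released

    def defer_free(self, handle):
        # Dropping the last reference enqueues the free instead of
        # synchronizing the device immediately.
        self.pending.append(handle)
        if len(self.pending) >= self.max_count:
            self.flush()

    def flush(self):
        # Release everything in one batch; a real implementation would
        # cancel the remainder if a deallocation error occurred.
        while self.pending:
            self.freed.append(self.pending.pop(0))


queue = PendingDeallocations(max_count=3)
queue.defer_free("a")
queue.defer_free("b")   # still pending: no synchronization yet
queue.defer_free("c")   # reaches max_count, so the queue flushes
```

The batching is what keeps deallocation errors and synchronization out of the hot path.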

The deallocation queue is flushed automatically as soon as any of the
following events occurs:

- An allocation fails with an out-of-memory error. The allocation is retried
  after flushing all pending deallocations.
- The deallocation queue reaches its maximum size, which defaults to 10.
  Users can override this by setting the environment variable
  ``NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT``. For example,
  ``NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=20`` increases the limit to 20.
- The maximum accumulated byte size of resources pending deallocation is
  reached. This defaults to 20% of the device memory capacity.
  Users can override this by setting the environment variable
  ``NUMBA_CUDA_MAX_PENDING_DEALLOCS_RATIO``. For example,
  ``NUMBA_CUDA_MAX_PENDING_DEALLOCS_RATIO=0.5`` sets the limit to 50% of the
  capacity.
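Both thresholds are plain environment variables, so they can also be set from Python. A minimal sketch (the specific values are arbitrary examples; the variables must be set before Numba is first imported, since the limits are read from the environment at startup):

```python
import os

# Raise the pending-deallocation limits for this process. Set these
# BEFORE the first `import numba`, because the limits are read from
# the environment when Numba starts up.
os.environ["NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT"] = "20"   # max queue length
os.environ["NUMBA_CUDA_MAX_PENDING_DEALLOCS_RATIO"] = "0.5"  # 50% of capacity
```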

Sometimes it is desirable to defer resource deallocation until a code section
ends, most often to avoid the implicit synchronization that deallocation can
cause. This can be done with the following context manager:

.. autofunction:: numba.cuda.defer_cleanup
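A hedged usage sketch follows; the helper name and array sizes are illustrative, and the code is guarded so it degrades gracefully where Numba or a CUDA device is unavailable:

```python
def burst_of_transfers():
    """Copy several arrays to the GPU without intermediate deallocations."""
    try:
        from numba import cuda
    except ImportError:
        return "skipped: numba not installed"
    if not cuda.is_available():
        return "skipped: no CUDA device"
    import numpy as np
    with cuda.defer_cleanup():
        for _ in range(5):
            cuda.to_device(np.arange(1024))
        # deallocations of the device copies are deferred until
        # this block exits, so no implicit synchronization occurs here
    return "done"


status = burst_of_transfers()
```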
2 changes: 2 additions & 0 deletions docs/source/reference/numpysupported.rst
@@ -139,6 +139,7 @@ Other methods

The following methods of Numpy arrays are supported:

* :meth:`~numpy.ndarray.argsort` (without arguments)
* :meth:`~numpy.ndarray.astype` (only the 1-argument form)
* :meth:`~numpy.ndarray.copy` (without arguments)
* :meth:`~numpy.ndarray.flatten` (no order argument; 'C' order only)
@@ -209,6 +210,7 @@ Other functions
The following top-level functions are supported:

* :func:`numpy.arange`
* :func:`numpy.argsort` (no optional arguments)
* :func:`numpy.array` (only the first 2 arguments)
* :func:`numpy.asfortranarray` (only the first argument)
* :func:`numpy.bincount` (only the first 2 arguments)
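The newly listed ``argsort`` support can be exercised in nopython mode. A sketch, falling back to plain Python where Numba is unavailable (the compiled call matches NumPy's semantics, so the fallback gives the same result):

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Numba not installed: run the function as plain Python instead.
    def njit(func):
        return func


@njit
def ascending_order(values):
    # argsort without arguments is now supported in nopython mode
    return np.argsort(values)


order = ascending_order(np.array([30.0, 10.0, 20.0]))
# order[0] indexes the smallest element of the input
```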
4 changes: 2 additions & 2 deletions docs/source/reference/pysupported.rst
@@ -176,8 +176,8 @@ The following built-in functions are supported:
* :class:`int`: only the one-argument form
* :func:`iter`: only the one-argument form
* :func:`len`
* :func:`min`: only the multiple-argument form
* :func:`max`: only the multiple-argument form
* :func:`min`
* :func:`max`
* :func:`next`: only the one-argument form
* :func:`print`: only numbers and strings; no ``file`` or ``sep`` argument
* :class:`range`: semantics are similar to those of Python 3 even in Python 2:
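The lifted restriction above means the one-argument (iterable) form of :func:`min` and :func:`max` is documented as supported alongside the multiple-argument form. The semantics match plain Python, illustrated here without requiring a Numba install:

```python
values = (4, 1, 9, 7)

# multiple-argument form: was already supported before this change
smallest_pairwise = min(4, 1)

# one-argument iterable form: the restriction lifted in this change
smallest = min(values)
largest = max(values)
```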
