## Generalized ufuncs

We've just seen how to make our own ufuncs using `vectorize`, but what if we need a function that operates on whole input arrays (or sub-arrays) rather than element by element?

Enter `guvectorize`.  

There are several important differences between `vectorize` and `guvectorize` that bear close examination.  Let's take a look at a few simple examples.

In [1]:
import numpy
from numba import guvectorize

In [2]:
@guvectorize('int64[:], int64, int64[:]', '(n),()->(n)')
def g(x, y, result):
    for i in range(x.shape[0]):
        result[i] = x[i] + y

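Two things differ from `vectorize`:

* The second argument declares the input/output layouts: `(n),()->(n)` means a length-`n` array and a scalar go in, and a length-`n` array comes out.
* There is no `return` statement: the result is written into the last argument.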

In [3]:
x = numpy.arange(10)

In the cell below we call the function `g` with a preallocated array for the result.

In [4]:
result = numpy.zeros_like(x)
result = g(x, 5, result)
print(result)

[ 5  6  7  8  9 10 11 12 13 14]


But wait!  We can still call `g` as if it were defined as `def g(x, y)`:

```python
res = g(x, 5)
print(res)
```

We don't recommend this: when the output array is omitted, it is allocated for you without being initialized, so any elements of `result` that `g` does not write will hold arbitrary values.  (The advantage is that you can preserve existing interfaces to previously written functions.)

In [5]:
@guvectorize('float64[:,:], float64[:,:], float64[:,:]', 
            '(m,n),(n,p)->(m,p)')
def matmul(A, B, C):
    m, n = A.shape
    n, p = B.shape
    for i in range(m):
        for j in range(p):
            C[i, j] = 0
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]

In [6]:
a = numpy.random.random((500, 500))

In [7]:
out = matmul(a, a, numpy.zeros_like(a))

In [8]:
%timeit matmul(a, a, numpy.zeros_like(a))

283 ms ± 96.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [9]:
%timeit a @ a

5.12 ms ± 677 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


`guvectorize` also supports the `target` keyword argument:

In [10]:
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + numpy.exp(y)
        
g_serial = guvectorize('float64[:], float64, float64[:]', 
                       '(n),()->(n)')(g)
g_par = guvectorize('float64[:], float64, float64[:]', 
                    '(n),()->(n)', target='parallel')(g)

In [11]:
%timeit res = g_serial(numpy.arange(1000000).reshape(1000, 1000), 3)
%timeit res = g_par(numpy.arange(1000000).reshape(1000, 1000), 3)

10 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.02 ms ± 455 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## [Exercise: Writing signatures](./exercises/08.GUVectorize.Exercises.ipynb#Exercise:-2D-Heat-Transfer-signature)

What's up with these boundary conditions?

```python
for i in range(I):
    Tn[i, 0] = T[i, 0]
    Tn[i, J - 1] = Tn[i, J - 2]

for j in range(J):
    Tn[0, j] = T[0, j]
    Tn[I - 1, j] = Tn[I - 2, j]
```

We don't pass in `Tn` explicitly, which means Numba allocates it for us (thanks!).  But it's allocated using `numpy.empty_like`, so if the function doesn't write every value in `Tn`, the uninitialized values will stick around and cause trouble.

Solutions?  The one above, or pass `Tn` in explicitly after doing something like `Tn = Ti.copy()`.

## [Exercise: Remove the vanilla loops](./exercises/08.GUVectorize.Exercises.ipynb#Exercise:-2D-Heat-Transfer-Time-loop)

The example above loops in time outside of the `guvectorize`d function.  That means it's looping in vanilla Python, which is not the fastest thing in the world.  

Move the time loop inside the function.

## Demo: Why not `jit` the `run_ftcs` function?

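Because, at the moment, it won't help (bummer).  Numba can't compile a call to a `guvectorize`d function from inside a `jit` function in nopython mode, so the whole function falls back to object mode, as the `pyobject` annotations in the `inspect_types` output below show.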

In [12]:
@guvectorize('float64[:,:], float64[:,:]', '(n,n)->(n,n)')
def gucopy(a, b):
    I, J = a.shape
    for i in range(I):
        for j in range(J):
            b[i, j] = a[i, j]

In [13]:
from numba import jit

In [14]:
@jit
def make_a_copy():
    a = numpy.random.random((25,25))
    b = gucopy(a)
    
    return a, b

In [15]:
a, b = make_a_copy()
assert numpy.allclose(a, b)

In [16]:
make_a_copy.inspect_types()

make_a_copy ()
--------------------------------------------------------------------------------
# File: <ipython-input-14-66bff6a09e40>
# --- LINE 1 --- 
# label 0
#   del $0.1
#   del $0.2
#   del $const0.4
#   del $0.3
#   del $0.5
#   del $0.6
#   del $0.8
#   del b
#   del a
#   del $0.11

@jit

# --- LINE 2 --- 

def make_a_copy():

    # --- LINE 3 --- 
    #   $0.1 = global(numpy: <module 'numpy' from '/home/martinci/.local/share/anaconda3/lib/python3.6/site-packages/numpy/__init__.py'>)  :: pyobject
    #   $0.2 = getattr(value=$0.1, attr=random)  :: pyobject
    #   $0.3 = getattr(value=$0.2, attr=random)  :: pyobject
    #   $const0.4 = const(tuple, (25, 25))  :: pyobject
    #   $0.5 = call $0.3($const0.4, func=$0.3, args=[Var($const0.4, <ipython-input-14-66bff6a09e40> (3))], kws=(), vararg=None)  :: pyobject
    #   a = $0.5  :: pyobject

    a = numpy.random.random((25,25))

    # --- LINE 4 --- 
    #   $0.6 = global(gucopy: <ufunc 'gucopy'>)  :: pyobject
    #   $0.8 = c