## Generalized ufuncs

We've just seen how to make our own ufuncs using `vectorize`, but what if we need something that operates on whole arrays (or subarrays) rather than element by element?

Enter `guvectorize`.  

There are several important differences between `vectorize` and `guvectorize` that bear close examination.  Let's take a look at a few simple examples.

In [36]:
import numpy
from numba import guvectorize

In [37]:
@guvectorize('int64[:], int64, int64[:]', '(n),()->(n)')
def g(x, y, result):
    for i in range(x.shape[0]):
        result[i] = x[i] + y

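Two things stand out compared with `vectorize`:

* The decorator takes a second argument, the layout string, which declares the shapes of the inputs and outputs.
* There is no `return` statement; the function writes its output into the last argument (`result`).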

In [38]:
x = numpy.arange(10)

In the cell below we call the function `g` with a preallocated array for the result.

In [39]:
result = numpy.zeros_like(x)
result = g(x, 5, result)
print(result)

[ 5  6  7  8  9 10 11 12 13 14]


But wait!  We can still call `g` as if it were defined as `def g(x, y)`:

In [42]:
res = g(x, 5)
print(res)

[ 5  6  7  8  9 10 11 12 13 14]


We don't recommend this: when the output array is omitted it is allocated for you, and any elements of the `result` array that `g` does not write to are left holding arbitrary values.  (The advantage is that you can preserve existing interfaces to previously written functions.)

In [5]:
@guvectorize('float64[:,:], float64[:,:], float64[:,:]', 
            '(m,n),(n,p)->(m,p)')
def matmul(A, B, C):
    m, n = A.shape
    n, p = B.shape
    for i in range(m):
        for j in range(p):
            C[i, j] = 0
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]

In [22]:
n=500
a = numpy.random.random((n, n))

In [23]:
assert numpy.isclose(matmul(a, a, numpy.zeros_like(a)), numpy.dot(a,a)).all()

In [24]:
%timeit matmul(a, a, numpy.zeros_like(a))

134 ms ± 946 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [25]:
%timeit numpy.dot(a,a)

2.62 ms ± 263 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [26]:
%timeit a @ a

2.52 ms ± 315 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


`guvectorize` also supports the `target` keyword argument.

In [27]:
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + numpy.exp(y)
        
g_serial = guvectorize('float64[:], float64, float64[:]', 
                       '(n),()->(n)')(g)
g_par = guvectorize('float64[:], float64, float64[:]', 
                    '(n),()->(n)', target='parallel')(g)

In [28]:
%timeit res = g_serial(numpy.arange(1000000).reshape(1000, 1000), 3)
%timeit res = g_par(numpy.arange(1000000).reshape(1000, 1000), 3)

5.07 ms ± 64.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.49 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## [Exercise: Writing signatures](./exercises/08.GUVectorize.Exercises.ipynb#Exercise:-2D-Heat-Transfer-signature)

What's up with these boundary conditions?

```python
for i in range(I):
    Tn[i, 0] = T[i, 0]
    Tn[i, J - 1] = Tn[i, J - 2]

for j in range(J):
    Tn[0, j] = T[0, j]
    Tn[I - 1, j] = Tn[I - 2, j]
```

We don't pass in `Tn` explicitly, which means Numba allocates it for us (thanks!).  But it's allocated using `numpy.empty_like`, so if we don't touch every value in `Tn` inside the function, those uninitialized values will stick around and cause trouble.

Solutions?  Either set the boundary values explicitly, as above, or pass `Tn` in explicitly after doing something like `Tn = Ti.copy()`.

## [Exercise: Remove the vanilla loops](./exercises/08.GUVectorize.Exercises.ipynb#Exercise:-2D-Heat-Transfer-Time-loop)

The example above loops in time outside of the `guvectorize`d function.  That means it's looping in vanilla Python, which is not the fastest thing in the world.

Move the time loop inside the function.

## Demo: Why not `jit` the `run_ftcs` function?

Because, at the moment, it won't work.  (bummer).

In [None]:
@guvectorize('float64[:,:], float64[:,:]', '(n,n)->(n,n)')
def gucopy(a, b):
    I, J = a.shape
    for i in range(I):
        for j in range(J):
            b[i, j] = a[i, j]

In [None]:
from numba import jit

In [None]:
@jit
def make_a_copy():
    a = numpy.random.random((25,25))
    b = gucopy(a)
    
    return a, b

In [None]:
a, b = make_a_copy()
assert numpy.allclose(a, b)

In [None]:
make_a_copy.inspect_types()