# Why GT4Py?

This notebook compares NumPy, CuPy, and GT4Py implementations of the point-wise stencil:
```
d[i, j, k] = a[i, j, k] + b[i, j, k] - c[i, j, k]
```
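
To make "point-wise" concrete, the stencil can be written as explicit loops: each output value depends only on the inputs at the same index `(i, j, k)`. The following is a minimal pure-Python sketch for illustration only. The names `stencil_reference` and `filled` and the tiny shape are assumptions and do not appear in the benchmarks in this notebook.

```python
def stencil_reference(a, b, c):
    """Apply d[i, j, k] = a[i, j, k] + b[i, j, k] - c[i, j, k] with explicit loops."""
    ni, nj, nk = len(a), len(a[0]), len(a[0][0])
    d = [[[0.0] * nk for _ in range(nj)] for _ in range(ni)]
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                # Point-wise: only index (i, j, k) of each input is read.
                d[i][j][k] = a[i][j][k] + b[i][j][k] - c[i][j][k]
    return d


def filled(value, shape):
    """Hypothetical helper: build a nested list of the given 3D shape."""
    ni, nj, nk = shape
    return [[[value] * nk for _ in range(nj)] for _ in range(ni)]


small_shape = (2, 3, 4)  # tiny on purpose; the benchmarks use (512, 512, 128)
a_ref = filled(1.0, small_shape)
b_ref = filled(2.0, small_shape)
c_ref = filled(0.5, small_shape)
d_ref = stencil_reference(a_ref, b_ref, c_ref)
print(d_ref[0][0][0])  # 1.0 + 2.0 - 0.5 = 2.5
```

Because no point reads a neighbour, every `(i, j, k)` can be computed independently, which is what lets the array and GPU versions below process all points in parallel.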

## NumPy

In [2]:
import numpy as np

shape = (512, 512, 128)

def f_numpy(a, b, c, d):
    # Evaluate the stencil on whole arrays and write the result into d in place.
    d[...] = a + b - c

a = np.random.rand(*shape)
b = np.random.rand(*shape)
c = np.random.rand(*shape)
d = np.empty_like(a)

%timeit f_numpy(a, b, c, d)

187 ms ± 399 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
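
NumPy typically evaluates `a + b - c` as two separate array operations, with the intermediate `a + b` held in a temporary array before the result is copied into `d`. As a rough sketch of what the separate passes look like, the variant below writes each step directly into `d` through the `out=` argument of `np.add` and `np.subtract`. The name `f_numpy_out` and the small test shape are assumptions for illustration; this variant was not timed in the notebook.

```python
import numpy as np


def f_numpy_out(a, b, c, d):
    # First pass over memory: d = a + b, written directly into d.
    np.add(a, b, out=d)
    # Second pass over memory: d = d - c, updated in place.
    np.subtract(d, c, out=d)


small_shape = (8, 8, 4)  # tiny shape so the check runs quickly
rng = np.random.default_rng(0)
a_s, b_s, c_s = (rng.random(small_shape) for _ in range(3))
d_s = np.empty_like(a_s)

f_numpy_out(a_s, b_s, c_s, d_s)
matches = np.allclose(d_s, a_s + b_s - c_s)
print(matches)
```

Even without temporaries, this still makes two passes over the data, whereas a fused stencil can read `a`, `b`, and `c` and write `d` in one pass.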


## CuPy

In [9]:
import cupy as cp

shape = (512, 512, 128)

def f_cupy(a, b, c, d):
    # Same expression as the NumPy version, evaluated on GPU arrays.
    d[...] = a + b - c

a = cp.random.rand(*shape)
b = cp.random.rand(*shape)
c = cp.random.rand(*shape)
d = cp.empty_like(a)

%timeit f_cupy(a, b, c, d)

15.7 ms ± 62.5 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
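
CuPy launches GPU kernels asynchronously, so a wall-clock timer can stop before the device has finished unless the host waits for it. Whether this affects the figure above depends on how the calls are queued. One way to make the wait explicit is a small timing helper that takes an optional synchronization callback. The sketch below is an assumption-laden illustration, not the method used in this notebook: `time_call` is a hypothetical name, and it is exercised here on a CPU function so it runs without a GPU.

```python
import time


def time_call(fn, *args, sync=None, repeats=7):
    """Return the best wall-clock time in seconds over `repeats` calls."""

    def run():
        fn(*args)
        if sync is not None:
            sync()  # wait for any queued device work before stopping the clock

    run()  # warm-up call, not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


# CPU example: no synchronization is needed.
elapsed = time_call(sum, range(1000))
print(f"{elapsed * 1e6:.1f} µs")

# For CuPy arrays one would pass a synchronization callback, for example
# sync=cp.cuda.Stream.null.synchronize (assumes `import cupy as cp`).
```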


## GT4Py

In [8]:
import gt4py as gt
from gt4py import gtscript
import numpy as np

backend = "gtcuda"
shape = (512, 512, 128)

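@gtscript.stencil(backend=backend, verbose=True)
def f_gt4py(
    a: gtscript.Field[float],
    b: gtscript.Field[float],
    c: gtscript.Field[float],
    d: gtscript.Field[float]
):
    # PARALLEL: no ordering is required between vertical levels.
    # interval(...): apply the statement to the whole vertical axis.
    with computation(PARALLEL), interval(...):
        d = a + b - c
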
a_np = np.random.rand(*shape)
b_np = np.random.rand(*shape)
c_np = np.random.rand(*shape)

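# Copy the NumPy inputs into storages for the chosen backend;
# (0, 0, 0) is the default origin of each storage.
a = gt.storage.from_array(a_np, backend, (0, 0, 0))
b = gt.storage.from_array(b_np, backend, (0, 0, 0))
c = gt.storage.from_array(c_np, backend, (0, 0, 0))
d = gt.storage.empty(backend, (0, 0, 0), shape, float)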

# Call once before timing so that any first-call overhead is not measured.
f_gt4py(a=a, b=b, c=c, d=d, origin=(0, 0, 0), domain=shape)
%timeit f_gt4py(a=a, b=b, c=c, d=d, origin=(0, 0, 0), domain=shape)

8.39 ms ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
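
The three timings can be put on a common scale by converting them to speedups and to an effective memory bandwidth. The sketch below assumes 8-byte floats and counts only the minimum traffic for the stencil (three arrays read, one written). Any temporaries an implementation creates are ignored, so the bandwidth figures are lower bounds on the memory actually moved. The timings are copied from the `%timeit` outputs above; all variable names are illustrative.

```python
grid_shape = (512, 512, 128)
bytes_per_value = 8  # float64, assumed for all three implementations
n_arrays = 4         # a, b, c are read; d is written

n_points = grid_shape[0] * grid_shape[1] * grid_shape[2]
bytes_moved = n_points * bytes_per_value * n_arrays  # minimum traffic per call

# Mean times per call in milliseconds, taken from the outputs above.
timings_ms = {"NumPy": 187.0, "CuPy": 15.7, "GT4Py": 8.39}

bandwidth_gbs = {}
speedup_vs_numpy = {}
for name, ms in timings_ms.items():
    bandwidth_gbs[name] = bytes_moved / (ms * 1e-3) / 1e9
    speedup_vs_numpy[name] = timings_ms["NumPy"] / ms
    print(f"{name}: {bandwidth_gbs[name]:.1f} GB/s, "
          f"{speedup_vs_numpy[name]:.1f}x vs NumPy")
```

With these numbers, each call moves at least 1 GiB, and the GT4Py stencil comes out roughly 22 times faster than the NumPy version and a little under twice as fast as the CuPy version.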
