# Using the Python-Numpy Frontend in DaCe

In this tutorial, we show how to write code using the Python-NumPy frontend. The frontend supports a subset of the Python language, along with array/matrix operations modeled on the NumPy module.

Let's start with a first example that showcases the basic elements of a DaCe program. First, we import the `dace` and `numpy` modules:

In [1]:
import dace
import numpy as np

Then, we declare the program parameters, which can be either symbols or constants:

In [2]:
M, N, K = 24, 24, 24

We proceed by writing the DaCe program as a regular Python function decorated with `dace.program`. The parameters of the function must have type annotations. For example, below we define a `gemm` function that implements the generalized matrix-matrix multiplication operation. The first three parameters are the 32-bit floating-point matrices `A`, `B` and `C`. The last two parameters are the 32-bit floating-point scalars `alpha` and `beta`. The body of the function is written the same way as in Python, using NumPy syntax.

In [3]:
@dace.program
def gemm(A: dace.float32[M, K], B: dace.float32[K, N], C: dace.float32[M, N],
         alpha: dace.float32, beta: dace.float32):
    C[:] = alpha * A @ B + beta * C

The `[:]` slice expression represents the whole range of the array/matrix. Note that DaCe does not allow you to redefine data within the same DaCe program. Therefore, once an array `C` is defined, you may not rebind its name to a different value. For example, if we changed the body of `gemm` to `C = alpha * A @ B + beta * C`, we would get an error.
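
To illustrate the difference between the two assignment forms, here is a minimal sketch in plain NumPy (it does not use DaCe, and `gemm_reference` and `gemm_rebinding` are illustrative names, not part of any API). Plain Python accepts the rebinding form silently, but the caller never sees the result, which is one way to understand why the frontend insists on `C[:]`. The slice version can also serve as a reference result when checking `gemm`.

```python
import numpy as np

def gemm_reference(A, B, C, alpha, beta):
    # Slice assignment writes into the array object the caller passed in.
    C[:] = alpha * A @ B + beta * C

def gemm_rebinding(A, B, C, alpha, beta):
    # Plain assignment only rebinds the local name C;
    # the caller's array is left untouched.
    C = alpha * A @ B + beta * C
    return C

# Small deterministic inputs (sizes chosen for illustration only).
M, N, K = 4, 3, 2
A = np.arange(M * K, dtype=np.float32).reshape(M, K) / (M * K)
B = np.arange(K * N, dtype=np.float32).reshape(K, N) / (K * N)
C_in = np.ones((M, N), dtype=np.float32)
alpha, beta = np.float32(2.0), np.float32(0.5)

# `*` and `@` share precedence and associate left to right,
# so `alpha * A @ B` means `(alpha * A) @ B`.
expected = (alpha * A) @ B + beta * C_in

C1 = C_in.copy()
gemm_reference(A, B, C1, alpha, beta)   # C1 now holds the result

C2 = C_in.copy()
gemm_rebinding(A, B, C2, alpha, beta)   # C2 is unchanged
```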

The DaCe program can be parsed to an SDFG and/or compiled using the same methods as in the SDFG API:

In [4]:
sdfg = gemm.to_sdfg()

Applied 5 StateFusion, 1 RedundantArray.


In [5]:
sdfg

## Supported Python/Numpy operators

The frontend supports the unary operators `{+, -, not, ~}`. Note that the `not` operator works the same way as the `numpy.logical_not` method.

In [6]:
@dace.program
def uadd(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B[:] = +A

@dace.program
def usub(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B[:] = -A

@dace.program
def logicalnot(A: dace.bool[5, 5], B: dace.bool[5, 5]):
    B[:] = not A

@dace.program
def invert(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B[:] = ~A
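
For comparison, the sketch below evaluates the same four unary operations in plain NumPy (no DaCe involved; the variable names are illustrative). It also shows that outside a DaCe program `not` applied to a multi-element array raises an error, which is why the NumPy equivalent is spelled `numpy.logical_not`.

```python
import numpy as np

A_int = np.array([[0, 1], [-2, 3]], dtype=np.int64)
A_bool = np.array([[True, False], [False, True]])

plus = +A_int                       # unary plus: values unchanged
minus = -A_int                      # element-wise negation
inverted = ~A_int                   # bitwise NOT on integers: ~x == -x - 1
negated = np.logical_not(A_bool)    # element-wise logical negation

# In plain Python, `not` asks for a single truth value, which is
# ambiguous for an array with more than one element.
try:
    not A_bool
    plain_python_not_works = True
except ValueError:
    plain_python_not_works = False
```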

The frontend supports the binary operators `{+, -, *, /, //, %, **, @, <<, >>, |, ^, &, and, or, ==, !=, <, <=, >, >=}`. Note that the boolean operators `{and, or}` work the same way as the functions `numpy.logical_and` and `numpy.logical_or`. Apart from the matrix-multiplication operator `@`, all operators are point-wise. The return type of an operator is the one given by [`numpy.result_type`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html). The exceptions are the boolean operators, which return boolean values.

In [7]:
@dace.program
def add(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A + B

@dace.program
def sub(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A - B

@dace.program
def mult(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A * B

@dace.program
def div(A: dace.float64[5, 5], B: dace.float64[5, 5], C: dace.float64[5, 5]):
    C[:] = A / B

@dace.program
def floordiv(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A // B

@dace.program
def mod(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A % B

@dace.program
def pow(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A ** B

@dace.program
def matmult(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A @ B

@dace.program
def lshift(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A << B

@dace.program
def rshift(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A >> B

@dace.program
def bitor(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A | B

@dace.program
def bitxor(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A ^ B

@dace.program
def bitand(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A & B

@dace.program
def logicaland(A: dace.bool[5, 5], B: dace.bool[5, 5], C: dace.bool[5, 5]):
    C[:] = A and B

@dace.program
def logicalor(A: dace.bool[5, 5], B: dace.bool[5, 5], C: dace.bool[5, 5]):
    C[:] = A or B

@dace.program
def eq(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.bool[5, 5]):
    C[:] = A == B

@dace.program
def noteq(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.bool[5, 5]):
    C[:] = A != B

@dace.program
def lt(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.bool[5, 5]):
    C[:] = A < B

@dace.program
def lte(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.bool[5, 5]):
    C[:] = A <= B

@dace.program
def gt(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.bool[5, 5]):
    C[:] = A > B

@dace.program
def gte(A: dace.int64[5, 5], B: dace.int64[5, 5], C: dace.bool[5, 5]):
    C[:] = A >= B
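
The type rule stated above for binary operators can be checked directly in NumPy. The following sketch (plain NumPy with illustrative variable names; it does not run through DaCe) queries `numpy.result_type` for two operand combinations and shows the boolean results of comparisons and of the logical functions that `and`/`or` correspond to.

```python
import numpy as np

a_int = np.array([1, 2, 3], dtype=np.int64)
b_int = np.array([2, 2, 2], dtype=np.int64)
b_f32 = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# Result type of an operation between the given operands.
same = np.result_type(a_int, b_int)     # int64 with int64
mixed = np.result_type(a_int, b_f32)    # int64 with float32 promotes to float64

# Comparisons and logical operators give boolean arrays.
cmp = a_int < b_int
both = np.logical_and(a_int > 1, b_int > 1)
either = np.logical_or(a_int > 2, b_int > 2)
```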

Python augmented assignments are supported for all operators. Some examples follow:

In [8]:
@dace.program
def augadd(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B += A

@dace.program
def augsub(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B -= A

@dace.program
def augmult(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B *= A

@dace.program
def augdiv(A: dace.float64[5, 5], B: dace.float64[5, 5]):
    B /= A

@dace.program
def augfloordiv(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B //= A

@dace.program
def augmod(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B %= A

@dace.program
def augpow(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B **= A

@dace.program
def auglshift(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B <<= A

@dace.program
def augrshift(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B >>= A

@dace.program
def augbitor(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B |= A

@dace.program
def augbitxor(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B ^= A

@dace.program
def augbitand(A: dace.int64[5, 5], B: dace.int64[5, 5]):
    B &= A
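
As a point of reference, augmented assignments on NumPy arrays update the existing array in place rather than creating a new one, which matches the rule that data is not redefined inside a DaCe program. The sketch below is plain NumPy with illustrative values; the running results in the comments apply to every element.

```python
import numpy as np

A = np.full((5, 5), 3, dtype=np.int64)
B = np.full((5, 5), 10, dtype=np.int64)
alias = B        # second name bound to the same array object

B += A           # 10 + 3  -> 13
B //= A          # 13 // 3 -> 4
B <<= A          # 4 << 3  -> 32
B %= A           # 32 % 3  -> 2
B **= A          # 2 ** 3  -> 8

# `alias` still refers to B, so it observes every update.
```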

Operations between arrays/matrices and scalars or arrays of size 1 behave the same way as in NumPy:

In [9]:
@dace.program
def addscalar(A: dace.int64[5, 5], B: dace.int64, C: dace.int64[5, 5]):
    C[:] = A + B

@dace.program
def subscalar(A: dace.int64[5, 5], B: dace.int64, C: dace.int64[5, 5]):
    C[:] = A - B

In [10]:
@dace.program
def multarrayscalar(A: dace.int64[5, 5], B: dace.int64[1], C: dace.int64[5, 5]):
    C[:] = A * B

@dace.program
def divparrayscalar(A: dace.float64[5, 5], B: dace.float64[1], C: dace.float64[5, 5]):
    C[:] = A / B

In [11]:
@dace.program
def floordivnumber(A: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A // 5

@dace.program
def modlnumber(A: dace.int64[5, 5], C: dace.int64[5, 5]):
    C[:] = A % 5
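
The NumPy behavior these programs mirror can be sketched as follows (plain NumPy, illustrative names): a scalar, a one-element array, or a Python literal is applied to every element of the larger operand.

```python
import numpy as np

A = np.arange(25, dtype=np.int64).reshape(5, 5)
scalar = np.int64(2)
one_elem = np.array([3], dtype=np.int64)

added = A + scalar        # scalar applied to every element
scaled = A * one_elem     # a size-1 array broadcasts the same way
floored = A // 5          # Python literal as the second operand
remainder = A % 5

# Floor division and modulo are consistent with each other:
# floored * 5 + remainder reproduces A.
```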

## Defining Data, Maps and Sequential-Loops

Transient arrays can be defined with `dace.define_local` or just `numpy.ndarray`. Furthermore, transient scalars can be defined with `dace.define_local_scalar`.

In [12]:
@dace.program
def transient(A: dace.float32[M, N, K]):
    s = np.ndarray(shape=(M, N, K), dtype=np.int32)
    t = dace.define_local(A.shape, A.dtype)
    s[:] = A
    t[:] = A
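
For readers more familiar with NumPy, the following sketch shows what the two allocations and copies look like outside DaCe. `np.empty` is used here only as a NumPy stand-in for `dace.define_local`; that correspondence is an assumption made for illustration. Note that the conversion shown for `s` is NumPy's behavior, and the sketch makes no claim about how DaCe converts values.

```python
import numpy as np

A = np.array([[0.5, 1.7], [2.9, 3.0]], dtype=np.float32)

# Both arrays are allocated without being initialized.
s = np.ndarray(shape=A.shape, dtype=np.int32)
t = np.empty(A.shape, dtype=A.dtype)   # stand-in for dace.define_local

s[:] = A   # element-wise copy; NumPy drops the fractional part
t[:] = A   # element-wise copy with the same dtype
```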

Python for-loops are automatically converted to control-flow:

In [19]:
N, BS = (dace.symbol(name) for name in ['N', 'BS'])

@dace.program
def forloop(HD: dace.complex128[N, BS, BS], HE: dace.complex128[N, BS, BS],
            HF: dace.complex128[N, BS, BS],
            sigmaRSD: dace.complex128[N, BS, BS],
            sigmaRSE: dace.complex128[N, BS, BS],
            sigmaRSF: dace.complex128[N, BS, BS]):

    for n in range(N):
        if n < N - 1:
            HE[n] -= sigmaRSE[n]
        else:
            HE[n] = -sigmaRSE[n]
        if n > 0:
            HF[n] -= sigmaRSF[n]
        else:
            HF[n] = -sigmaRSF[n]
        HD[n] = HD[n] - sigmaRSD[n]
        
forloop.to_sdfg()

Applied 16 StateFusion.
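
To check what the loop is expected to compute, the same body can be run as ordinary Python on NumPy arrays. This is a reference sketch only: `forloop_reference` and the concrete sizes are illustrative, and the symbolic `N` is replaced by the first dimension of the input.

```python
import numpy as np

def forloop_reference(HD, HE, HF, sigmaRSD, sigmaRSE, sigmaRSF):
    N = HD.shape[0]
    for n in range(N):
        if n < N - 1:
            HE[n] -= sigmaRSE[n]
        else:
            HE[n] = -sigmaRSE[n]    # last block is overwritten
        if n > 0:
            HF[n] -= sigmaRSF[n]
        else:
            HF[n] = -sigmaRSF[n]    # first block is overwritten
        HD[n] = HD[n] - sigmaRSD[n]

shape = (3, 2, 2)                   # (N, BS, BS)
HD = np.full(shape, 5 + 1j, dtype=np.complex128)
HE = np.full(shape, 5 + 1j, dtype=np.complex128)
HF = np.full(shape, 5 + 1j, dtype=np.complex128)
sigma = np.full(shape, 2 + 0.5j, dtype=np.complex128)

forloop_reference(HD, HE, HF, sigma, sigma, sigma)
```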


Maps (parallel for-loops) can be created with `dace.map`:

In [14]:
Nkz, NE, Nqz, Nw, N3D, NA, NB, Norb = (
    dace.symbol(name)
    for name in ['Nkz', 'NE', 'Nqz', 'Nw',
                 'N3D', 'NA', 'NB', 'Norb'])

@dace.program
def maptest(neigh_idx: dace.int32[NA, NB],
            dH: dace.complex128[NA, NB, N3D, Norb, Norb],
            G: dace.complex128[Nkz, NE, NA, Norb, Norb],
            D: dace.complex128[Nqz, Nw, NA, NB, N3D, N3D],
            Sigma: dace.complex128[Nkz, NE, NA, Norb, Norb]):

    for k, E, q, w, i, j, a, b in dace.map[0:Nkz, 0:NE,
                                           0:Nqz, 0:Nw,
                                           0:N3D, 0:N3D,
                                           0:NA, 0:NB]:
        dHG = G[k-q, E-w, neigh_idx[a, b]] @ dH[a, b, i]
        dHD = dH[a, b, j] * D[q, w, a, b, i, j]
        Sigma[k, E, a] += dHG @ dHD
        
maptest.to_sdfg()

Applied 5 StateFusion, 1 InlineSDFG.


## Combining explicit dataflow with numpy

The Python-NumPy syntax can be used in combination with the explicit dataflow syntax:

In [15]:
N = dace.symbol('N')

@dace.program
def slicetest(A: dace.float64[N, N - 1], B: dace.float64[N - 1, N],
              C: dace.float64[N - 1, N - 1]):
    tmp = A[1:N] * B[:, 0:N - 1]
    for i, j in dace.map[0:4, 0:4]:
        with dace.tasklet:
            t << tmp[i, j]
            c >> C[i, j]
            c = t

In [16]:
@dace.program
def saoptest(A: dace.float64[5, 5], alpha: dace.float64,
             B: dace.float64[5, 5]):
    tmp = alpha * A * 5
    for i, j in dace.map[0:5, 0:5]:
        with dace.tasklet:
            t << tmp[i, j]
            c >> B[i, j]
            c = t

## Other operations and methods

Reductions can be defined with `dace.reduce`:

In [17]:
W = dace.symbol('W')
H = dace.symbol('H')

@dace.program(dace.float32[H, W], dace.float32[H, W], dace.float32[1])
def mapreduce_test(A, B, sum):
    tmp = dace.define_local([H, W], dace.float32)

    @dace.map(_[0:H, 0:W])
    def compute_tile(i, j):
        a << A[i, j]
        b >> B[i, j]
        t >> tmp[i, j]

        b = a * 5
        t = a * 5

    sum[:] = dace.reduce(lambda a, b: a + b, tmp)
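
A plain-Python reference for this program might look like the sketch below. It assumes that `dace.reduce` without an explicit axis reduces over all dimensions, which matches the one-element output array in the signature but is stated here as an assumption. The names `total` and the input values are illustrative.

```python
import functools
import numpy as np

A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.empty_like(A)
tmp = np.empty_like(A)
total = np.zeros(1, dtype=np.float32)

# Map: every (i, j) iteration is independent of the others.
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        B[i, j] = A[i, j] * 5
        tmp[i, j] = A[i, j] * 5

# Reduction: fold all elements of tmp with the same binary lambda.
total[:] = functools.reduce(lambda a, b: a + b, tmp.flat)
```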

The frontend supports the basic math functions `{exp, sin, cos, sqrt, log, conj, real, imag}`:

In [18]:
M, N = 24, 24

@dace.program
def exponent(A: dace.complex64[M, N], B: dace.complex64[M, N]):
    B[:] = exp(A)

@dace.program
def sine(A: dace.complex64[M, N], B: dace.complex64[M, N]):
    B[:] = sin(A)

@dace.program
def cosine(A: dace.complex64[M, N], B: dace.complex64[M, N]):
    B[:] = cos(A)

@dace.program
def square_root(A: dace.complex64[M, N], B: dace.complex64[M, N]):
    B[:] = sqrt(A)

@dace.program
def logarithm(A: dace.complex64[M, N], B: dace.complex64[M, N]):
    B[:] = log(A)

@dace.program
def conjugate(A: dace.complex64[M, N], B: dace.complex64[M, N]):
    B[:] = conj(A)

@dace.program
def real_part(A: dace.complex64[M, N], B: dace.float32[M, N]):
    B[:] = real(A)

@dace.program
def imag_part(A: dace.complex64[M, N], B: dace.float32[M, N]):
    B[:] = imag(A)
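
When validating these programs, the NumPy functions of the same names are a natural reference. The sketch below (plain NumPy, illustrative input) also shows why `real_part` and `imag_part` declare a `float32` output: the real and imaginary parts of a `complex64` array are 32-bit floats.

```python
import numpy as np

A = np.array([[0 + 0j, 1 + 2j], [3 - 4j, -1 + 0j]], dtype=np.complex64)

exp_A = np.exp(A)
sqrt_A = np.sqrt(A)     # principal square root of each element
conj_A = np.conj(A)
real_A = np.real(A)     # float32 for a complex64 input
imag_A = np.imag(A)     # float32 for a complex64 input
```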