# graphblas.matrix_multiply_reduce_to_scalar

This example will go over how to compile MLIR code (using the GraphBLAS dialect) to a function callable from Python.

The example MLIR code we’ll use here will demonstrate how the `graphblas.matrix_multiply_reduce_to_scalar` op from the GraphBLAS dialect works. 

Let’s first import some necessary modules and generate an instance of our JIT engine.

In [1]:
import mlir_graphblas
import mlir_graphblas.sparse_utils
import numpy as np

engine = mlir_graphblas.MlirJitEngine()

Here are the passes we'll use. The pass `--graphblas-lower` is necessary to lower the GraphBLAS dialect.

In [2]:
passes = [
    "--graphblas-lower",
    "--sparsification",
    "--sparse-tensor-conversion",
    "--linalg-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--tensor-constant-bufferize",
    "--finalizing-bufferize",
    "--convert-linalg-to-loops",
    "--convert-scf-to-std",
    "--convert-std-to-llvm",
]

As in our other examples using the GraphBLAS dialect, we'll need some helper functions that convert sparse tensors to dense tensors so that we can inspect them.

We'll also need a helper that converts a sparse matrix from CSR to CSC format.

In [3]:
mlir_text = """
#trait_densify_csr = {
  indexing_maps = [
    affine_map<(i,j) -> (i,j)>,
    affine_map<(i,j) -> (i,j)>
  ],
  iterator_types = ["parallel", "parallel"]
}

#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @csr_densify4x4(%argA: tensor<4x4xf64, #CSR64>) -> tensor<4x4xf64> {
  %output_storage = constant dense<0.0> : tensor<4x4xf64>
  %0 = linalg.generic #trait_densify_csr
    ins(%argA: tensor<4x4xf64, #CSR64>)
    outs(%output_storage: tensor<4x4xf64>) {
      ^bb(%A: f64, %x: f64):
        linalg.yield %A : f64
    } -> tensor<4x4xf64>
  return %0 : tensor<4x4xf64>
}

#trait_densify_csc = {
  indexing_maps = [
    affine_map<(i,j) -> (j,i)>,
    affine_map<(i,j) -> (i,j)>
  ],
  iterator_types = ["parallel", "parallel"]
}

#CSC64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @csc_densify4x4(%argA: tensor<4x4xf64, #CSC64>) -> tensor<4x4xf64> {
  %output_storage = constant dense<0.0> : tensor<4x4xf64>
  %0 = linalg.generic #trait_densify_csc
    ins(%argA: tensor<4x4xf64, #CSC64>)
    outs(%output_storage: tensor<4x4xf64>) {
      ^bb(%A: f64, %x: f64):
        linalg.yield %A : f64
    } -> tensor<4x4xf64>
  return %0 : tensor<4x4xf64>
}

func @convert_csr_to_csc(%sparse_tensor: tensor<?x?xf64, #CSR64>) -> tensor<?x?xf64, #CSC64> {
    %answer = graphblas.convert_layout %sparse_tensor : tensor<?x?xf64, #CSR64> to tensor<?x?xf64, #CSC64>
    return %answer : tensor<?x?xf64, #CSC64>
}
"""

Let's compile our MLIR code. 

In [4]:
engine.add(mlir_text, passes)

['csr_densify4x4', 'csc_densify4x4', 'convert_csr_to_csc']
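
The CSR and CSC encodings above differ only in which dimension is stored first. As a rough illustration of what that means for the stored arrays, the sketch below builds a pointers/indices/values triple from coordinate data in plain NumPy. The function name `coo_to_compressed` is hypothetical and this is not how `mlir_graphblas` stores tensors internally; it only shows the general idea behind compressed row and column layouts.

```python
import numpy as np

def coo_to_compressed(indices, values, shape, by_column=False):
    """Hypothetical helper: build (pointers, minor_indices, values) from COO data.

    by_column=False gives a CSR-style layout, by_column=True a CSC-style layout.
    """
    major, minor = (1, 0) if by_column else (0, 1)
    # Sort entries by the major dimension first, then by the minor dimension.
    order = np.lexsort((indices[:, minor], indices[:, major]))
    sorted_indices = indices[order]
    # Count the stored entries in each major slice and turn counts into offsets.
    counts = np.bincount(sorted_indices[:, major], minlength=shape[major])
    pointers = np.concatenate(([0], np.cumsum(counts)))
    return pointers, sorted_indices[:, minor], values[order]

# Small 4x4 example with five stored entries (signed ints for np.bincount).
coo_indices = np.array([[0, 3], [1, 3], [2, 0], [3, 0], [3, 1]], dtype=np.int64)
coo_values = np.array([1, 2, 3, 4, 5], dtype=np.float64)

csr_pointers, csr_cols, csr_values = coo_to_compressed(coo_indices, coo_values, (4, 4))
csc_pointers, csc_rows, csc_values = coo_to_compressed(
    coo_indices, coo_values, (4, 4), by_column=True
)
```

The same five entries appear in both layouts; only their order and the meaning of the pointer array (row offsets versus column offsets) change.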

## Overview of graphblas.matrix_multiply_reduce_to_scalar

Here, we'll show how to use the `graphblas.matrix_multiply_reduce_to_scalar` op. 

`graphblas.matrix_multiply_reduce_to_scalar` is behaviorally equivalent to sequential calls to `graphblas.matrix_multiply` and `graphblas.matrix_reduce_to_scalar`. The purpose of `graphblas.matrix_multiply_reduce_to_scalar` is to let the lowering apply performance optimizations that aren't available when `graphblas.matrix_multiply` and `graphblas.matrix_reduce_to_scalar` are used independently.
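
To make the equivalence concrete, here is a minimal sketch in plain NumPy, using small dense arrays as stand-ins for sparse tensors and assuming the `plus_times` semiring with a `sum` aggregator. The function names are hypothetical, and the fused loop is only one way a combined op could avoid building the intermediate matrix; it is not a description of what the actual lowering does.

```python
import numpy as np

def multiply_then_reduce(a, b):
    # Sequential version: materialize the full product, then sum its entries.
    product = a @ b
    return float(product.sum())

def multiply_reduce_fused(a, b):
    # Fused version: accumulate every a[i, k] * b[k, j] term directly into
    # a scalar, so no intermediate product matrix is ever stored.
    total = 0.0
    rows, inner = a.shape
    cols = b.shape[1]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                total += a[i, k] * b[k, j]
    return total

a_small = np.array([[1.0, 0.0], [2.0, 3.0]])
b_small = np.array([[0.0, 4.0], [5.0, 0.0]])

sequential_total = multiply_then_reduce(a_small, b_small)
fused_total = multiply_reduce_fused(a_small, b_small)
```

Both functions return the same scalar; the difference is only in how much intermediate storage is needed.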

Here's an example use of the `graphblas.matrix_multiply_reduce_to_scalar` op:
```
%answer = graphblas.matrix_multiply_reduce_to_scalar %a, %b { semiring = "plus_times", aggregator = "sum" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to f64
```

The options for the `semiring` and `aggregator` attributes are the same as those for `graphblas.matrix_multiply` and `graphblas.matrix_reduce_to_scalar`, respectively.

Let's create some example sparse input matrices.

In [5]:
indices = np.array(
    [
        [0, 3],
        [1, 3],
        [2, 0],
        [3, 0],
        [3, 1],
    ],
    dtype=np.uint64,
)
values = np.array([1, 2, 3, 4, 5], dtype=np.float64)
sizes = np.array([4, 4], dtype=np.uint64)
sparsity = np.array([False, True], dtype=np.bool_)

A = mlir_graphblas.sparse_utils.MLIRSparseTensor(indices, values, sizes, sparsity)
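
The `indices`, `values`, and `sizes` arrays describe the matrix in coordinate form: each row of `indices` is a `(row, column)` position and the matching entry of `values` is the number stored there. As a quick sanity check that does not need the JIT engine, the sketch below scatters the same data into a dense NumPy array. The variable names are chosen for illustration only.

```python
import numpy as np

coo_indices = np.array([[0, 3], [1, 3], [2, 0], [3, 0], [3, 1]], dtype=np.uint64)
coo_values = np.array([1, 2, 3, 4, 5], dtype=np.float64)
coo_sizes = np.array([4, 4], dtype=np.uint64)

# Start from an all-zero matrix of the requested shape.
dense_from_coo = np.zeros(tuple(int(n) for n in coo_sizes), dtype=coo_values.dtype)

# Cast to a signed index type before using the coordinates for indexing.
row_positions = coo_indices[:, 0].astype(np.intp)
col_positions = coo_indices[:, 1].astype(np.intp)
dense_from_coo[row_positions, col_positions] = coo_values
```

The resulting array should match the densified form of `A`.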

In [6]:
indices = np.array(
    [
        [0, 1],
        [0, 3],
        [1, 1],
        [1, 3],
        [2, 0],
        [2, 2],
        [3, 0],
        [3, 2],
    ],
    dtype=np.uint64,
)
values = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float64)
sizes = np.array([4, 4], dtype=np.uint64)
sparsity = np.array([False, True], dtype=np.bool_)

B_csr = mlir_graphblas.sparse_utils.MLIRSparseTensor(indices, values, sizes, sparsity)
B = engine.convert_csr_to_csc(B_csr)

In [7]:
indices = np.array(
    [
        [0, 1],
        [0, 2],
        [1, 1],
        [1, 2],
        [2, 1],
        [2, 2],
        [3, 1],
        [3, 2],
    ],
    dtype=np.uint64,
)
values = np.array([-0.1, 0.2, -0.3, 0.4, -0.5, 0.6, -0.7, 0.8], dtype=np.float64)
sizes = np.array([4, 4], dtype=np.uint64)
sparsity = np.array([False, True], dtype=np.bool_)

mask = mlir_graphblas.sparse_utils.MLIRSparseTensor(indices, values, sizes, sparsity)

In [8]:
A_dense = engine.csr_densify4x4(A)

In [9]:
A_dense

array([[0., 0., 0., 1.],
       [0., 0., 0., 2.],
       [3., 0., 0., 0.],
       [4., 5., 0., 0.]])

In [10]:
B_dense = engine.csc_densify4x4(B)

In [11]:
B_dense

array([[0., 1., 0., 2.],
       [0., 3., 0., 4.],
       [5., 0., 6., 0.],
       [7., 0., 8., 0.]])

In [12]:
mask_dense = engine.csr_densify4x4(mask)

In [13]:
mask_dense

array([[ 0. , -0.1,  0.2,  0. ],
       [ 0. , -0.3,  0.4,  0. ],
       [ 0. , -0.5,  0.6,  0. ],
       [ 0. , -0.7,  0.8,  0. ]])

## graphblas.matrix_multiply_reduce_to_scalar (No Mask)

We'll show how to use `graphblas.matrix_multiply_reduce_to_scalar` without a mask here. 

We'll compile a function that uses `graphblas.matrix_multiply_reduce_to_scalar`, along with functions that use `graphblas.matrix_multiply` and `graphblas.matrix_reduce_to_scalar`, so that we can compare the combined op against the sequential ops.

In [14]:
mlir_text = """
#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

module {
    
    func @matrix_multiply_plus_times(%a: tensor<?x?xf64, #CSR64>, %b: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
        %answer = graphblas.matrix_multiply %a, %b { semiring = "plus_times" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
        return %answer : tensor<?x?xf64, #CSR64>
    }
    
    func @matrix_sum(%sparse_tensor: tensor<?x?xf64, #CSR64>) -> f64 {
        %answer = graphblas.matrix_reduce_to_scalar %sparse_tensor { aggregator = "sum" } : tensor<?x?xf64, #CSR64> to f64
        return %answer : f64
    }
    
    func @matrix_multiply_plus_times_sum(%a: tensor<?x?xf64, #CSR64>, %b: tensor<?x?xf64, #CSC64>) -> f64 {
        %answer = graphblas.matrix_multiply_reduce_to_scalar %a, %b { semiring = "plus_times", aggregator = "sum" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to f64
        return %answer : f64
    }

}
"""

In [15]:
engine.add(mlir_text, passes)

['matrix_multiply_plus_times', 'matrix_sum', 'matrix_multiply_plus_times_sum']

Let's first get some results from `graphblas.matrix_multiply` and `graphblas.matrix_reduce_to_scalar`.

In [16]:
matmul_result = engine.matrix_multiply_plus_times(A, B)

In [17]:
engine.csr_densify4x4(matmul_result)

array([[ 7.,  0.,  8.,  0.],
       [14.,  0., 16.,  0.],
       [ 0.,  3.,  0.,  6.],
       [ 0., 19.,  0., 28.]])

In [18]:
reduction_from_sequential = engine.matrix_sum(matmul_result)

In [19]:
reduction_from_sequential

101.0

Let's verify that our use of `graphblas.matrix_multiply_reduce_to_scalar` gets the same result. 

In [20]:
reduction_from_combined = engine.matrix_multiply_plus_times_sum(A, B)

In [21]:
reduction_from_combined

101.0

In [22]:
reduction_from_combined == reduction_from_sequential

True
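
As an independent cross-check, the same numbers can be reproduced with dense NumPy arrays. This sketch copies the dense forms of `A` and `B` shown earlier and assumes that, for `plus_times` with a `sum` aggregator, treating missing entries as zeros gives the same result as the sparse computation.

```python
import numpy as np

a_dense_check = np.array(
    [[0, 0, 0, 1], [0, 0, 0, 2], [3, 0, 0, 0], [4, 5, 0, 0]], dtype=np.float64
)
b_dense_check = np.array(
    [[0, 1, 0, 2], [0, 3, 0, 4], [5, 0, 6, 0], [7, 0, 8, 0]], dtype=np.float64
)

# Ordinary matrix product followed by a sum over all entries.
expected_product = a_dense_check @ b_dense_check
expected_total = float(expected_product.sum())
```

Here `expected_product` matches the densified `matmul_result` and `expected_total` matches both reductions.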

## graphblas.matrix_multiply_reduce_to_scalar (With Mask)

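`graphblas.matrix_multiply_reduce_to_scalar` also takes an optional mask. As the outputs below show, only the positions of the product where the mask has a stored entry contribute to the result.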

In [23]:
mlir_text = """
#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

module {
    
    func @matrix_multiply_plus_times_mask(%a: tensor<?x?xf64, #CSR64>, %b: tensor<?x?xf64, #CSC64>, %m: tensor<?x?xf64, #CSR64>) -> tensor<?x?xf64, #CSR64> {
        %answer = graphblas.matrix_multiply %a, %b, %m { semiring = "plus_times" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>, tensor<?x?xf64, #CSR64>) to tensor<?x?xf64, #CSR64>
        return %answer : tensor<?x?xf64, #CSR64>
    }
    
    func @matrix_multiply_plus_times_sum_mask(%a: tensor<?x?xf64, #CSR64>, %b: tensor<?x?xf64, #CSC64>, %m: tensor<?x?xf64, #CSR64>) -> f64 {
        %answer = graphblas.matrix_multiply_reduce_to_scalar %a, %b, %m { semiring = "plus_times", aggregator = "sum" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>, tensor<?x?xf64, #CSR64>) to f64
        return %answer : f64
    }

}
"""

In [24]:
engine.add(mlir_text, passes)

['matrix_multiply_plus_times_mask', 'matrix_multiply_plus_times_sum_mask']

Let's verify that the combined op gives the same result as the sequential ops when a mask is applied.

In [25]:
matmul_result = engine.matrix_multiply_plus_times_mask(A, B, mask)

In [26]:
engine.csr_densify4x4(matmul_result)

array([[ 0.,  0.,  8.,  0.],
       [ 0.,  0., 16.,  0.],
       [ 0.,  3.,  0.,  0.],
       [ 0., 19.,  0.,  0.]])

In [27]:
reduction_from_sequential = engine.matrix_sum(matmul_result)

In [28]:
reduction_from_sequential

46.0

In [29]:
reduction_from_combined = engine.matrix_multiply_plus_times_sum_mask(A, B, mask)

In [30]:
reduction_from_combined

46.0

In [31]:
reduction_from_combined == reduction_from_sequential

True
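
The masked result can also be cross-checked with dense NumPy arrays. This sketch assumes the mask acts structurally, meaning only the positions of its stored entries matter and not their values, which is consistent with the outputs above. It approximates "stored" as "nonzero in the dense form", so it would not model a mask with explicitly stored zeros.

```python
import numpy as np

a_dense_check = np.array(
    [[0, 0, 0, 1], [0, 0, 0, 2], [3, 0, 0, 0], [4, 5, 0, 0]], dtype=np.float64
)
b_dense_check = np.array(
    [[0, 1, 0, 2], [0, 3, 0, 4], [5, 0, 6, 0], [7, 0, 8, 0]], dtype=np.float64
)
mask_dense_check = np.array(
    [[0, -0.1, 0.2, 0], [0, -0.3, 0.4, 0], [0, -0.5, 0.6, 0], [0, -0.7, 0.8, 0]],
    dtype=np.float64,
)

# Keep product entries only where the mask has an entry; zero out the rest.
masked_product = np.where(mask_dense_check != 0, a_dense_check @ b_dense_check, 0.0)
masked_total = float(masked_product.sum())
```

Here `masked_product` matches the densified masked `matmul_result` and `masked_total` matches both masked reductions.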