# Python Utilities for MLIR's Sparse Tensors

Before going into actual examples, we'll first go over some useful utilities for working with MLIR's sparse tensors in Python.

Let’s first import them.

In [60]:
import mlir_graphblas
from mlir_graphblas.sparse_utils import MLIRSparseTensor
from mlir_graphblas.cli import GRAPHBLAS_OPT_EXE
from mlir_graphblas.tools import tersify_mlir
from mlir_graphblas.tools.utils import sparsify_array, densify_csr, densify_csc, densify_vector

import tempfile
import numpy as np

The first useful thing to note is that `GRAPHBLAS_OPT_EXE` from `mlir_graphblas.cli` holds the path to the `graphblas-opt` executable that is used locally.

## Overview of `tersify_mlir`

When MLIR code is passed through `graphblas-opt` or `mlir-opt`, it can often become more verbose or difficult to read. This is especially true when using sparse tensors due to [sparse tensor encodings](https://mlir.llvm.org/docs/Dialects/SparseTensorOps/#sparsetensorencodingattr).

For example, this code is fairly easy to read. 

In [61]:
mlir_text = """
#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @mat_mul(%argA: tensor<?x?xf64, #CSR64>, %argB: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
    %answer = graphblas.matrix_multiply %argA, %argB { semiring = "plus_times" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
    return %answer : tensor<?x?xf64, #CSR64>
}
"""

However, when it is passed through `graphblas-opt` or `mlir-opt` with no passes (which produces behaviorally identical code), the [aliases](https://mlir.llvm.org/docs/LangRef/#attribute-value-aliases) for the [sparse tensor encodings](https://mlir.llvm.org/docs/Dialects/SparseTensorOps/#sparsetensorencodingattr) get expanded, which results in very verbose code.

In [62]:
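# Write the MLIR text to a temporary file so it can be piped to graphblas-opt.
with tempfile.NamedTemporaryFile(mode='w') as temp:
    temp_file_name = temp.name
    temp.write(mlir_text)
    temp.flush()  # make sure the text is on disk before the shell reads it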

    verbose_mlir = ! cat $temp_file_name | $GRAPHBLAS_OPT_EXE
    verbose_mlir = "\n".join(verbose_mlir)

print(verbose_mlir)

builtin.module  {
  builtin.func @mat_mul(%arg0: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>, %arg1: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 64, indexBitWidth = 64 }>>) -> tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>> {
    %0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 6

We can make this resulting code less verbose and more readable using `tersify_mlir` from `mlir_graphblas.tools`.

In [63]:
print(tersify_mlir(verbose_mlir))

#CSR64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d0, d1)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d1, d0)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

builtin.module  {
  builtin.func @mat_mul(%arg0: tensor<?x?xf64, #CSR64>, %arg1: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
    %0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
    return %0 : tensor<?x?xf64, #CSR64>
  }
}




`tersify_mlir` mostly moves [sparse tensor encodings](https://mlir.llvm.org/docs/Dialects/SparseTensorOps/#sparsetensorencodingattr) commonly used in the GraphBLAS dialect (i.e. the CSR, CSC, and compressed vector encodings) to [aliases](https://mlir.llvm.org/docs/LangRef/#attribute-value-aliases).
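To make the idea of moving encodings into aliases concrete, here is a minimal sketch based on plain string replacement. It is not the actual implementation of `tersify_mlir`; the `ALIASES` table and the `alias_encodings` helper are hypothetical names introduced only for illustration, and the sketch assumes the expanded encodings are printed exactly as in the verbose output above.

```python
# Expanded encodings, spelled the way they appear in the verbose output above.
CSR64 = (
    '#sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], '
    "dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, "
    "pointerBitWidth = 64, indexBitWidth = 64 }>"
)
CSC64 = (
    '#sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], '
    "dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, "
    "pointerBitWidth = 64, indexBitWidth = 64 }>"
)

# Hypothetical alias table: alias name -> expanded encoding text.
ALIASES = {"#CSR64": CSR64, "#CSC64": CSC64}


def alias_encodings(mlir_text: str) -> str:
    """Replace known expanded encodings with aliases and prepend their definitions."""
    definitions = []
    for alias, expanded in ALIASES.items():
        if expanded in mlir_text:
            mlir_text = mlir_text.replace(expanded, alias)
            definitions.append(f"{alias} = {expanded}")
    if not definitions:
        return mlir_text
    return "\n".join(definitions + ["", mlir_text])


# A small verbose snippet that only uses the CSR encoding.
verbose = (
    f"func @identity(%arg0: tensor<?x?xf64, {CSR64}>) -> tensor<?x?xf64, {CSR64}> {{\n"
    f"  return %arg0 : tensor<?x?xf64, {CSR64}>\n"
    "}\n"
)
terse = alias_encodings(verbose)
print(terse)
```

Only the aliases that actually occur in the input are defined at the top of the result, so the CSC alias is left out in this example.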

`tersify_mlir` is also available as a command-line tool.

In [64]:
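# Write the verbose MLIR to a temporary file so it can be piped to tersify_mlir.
with tempfile.NamedTemporaryFile(mode='w') as temp:
    temp_file_name = temp.name
    temp.write(verbose_mlir)
    temp.flush()  # make sure the text is on disk before the shell reads it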

    terse_mlir_via_command_line = ! cat $temp_file_name | tersify_mlir 2> /dev/null
    terse_mlir_via_command_line = "\n".join(terse_mlir_via_command_line)

print(terse_mlir_via_command_line)

#CSR64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d0, d1)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d1, d0)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

builtin.module  {
  builtin.func @mat_mul(%arg0: tensor<?x?xf64, #CSR64>, %arg1: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
    %0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
    return %0 : tensor<?x?xf64, #CSR64>
  }
}
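The cells above that shell out rely on IPython's `!` syntax, which is not available in a plain Python script. A minimal sketch of the same piping with the standard library's `subprocess` module is given here. The `pipe_through` helper is a hypothetical name, and the command used in the demonstration is a stand-in that echoes its input, so the sketch runs without `graphblas-opt` installed.

```python
import subprocess
import sys


def pipe_through(command, text):
    """Send `text` to the command's stdin and return what it writes to stdout."""
    completed = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Stand-in command that copies stdin to stdout.
echo_command = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
echoed = pipe_through(echo_command, "func @noop() { return }\n")
print(echoed)
```

With the names used in this notebook, the equivalent calls would presumably be `pipe_through([GRAPHBLAS_OPT_EXE], mlir_text)` and `pipe_through(["tersify_mlir"], verbose_mlir)`, assuming both tools read from standard input as they do in the cells above.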




## Overview of `sparsify_array`

Very often when debugging or testing, it is useful to convert a dense tensor represented as an array in [NumPy](https://numpy.org/) into an `MLIRSparseTensor`.

`sparsify_array` from `mlir_graphblas.tools.utils` lets us do that.

Let's say we wanted to convert this vector into an `MLIRSparseTensor`.

In [65]:
dense_vector = np.array([0, 0, 12, 0, 0, 34, 0, 0], dtype=np.int32)
dense_vector

array([ 0,  0, 12,  0,  0, 34,  0,  0], dtype=int32)

We would normally have to explicitly pass in the indices, values, shape, etc. into the constructor for `MLIRSparseTensor` as shown below. 

In [66]:
indices = np.array([2, 5], dtype=np.uint64)
values = np.array([12, 34], dtype=np.int32)
sizes = np.array([8], dtype=np.uint64)
sparsity = np.array([True], dtype=np.bool_)  # np.bool8 was a deprecated alias of np.bool_
explicitly_generated_sparse_vector = MLIRSparseTensor(indices, values, sizes, sparsity)

In [67]:
explicitly_generated_sparse_vector

<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7fa0c05fff40>

In [68]:
explicitly_generated_sparse_vector.shape

(8,)

In [69]:
explicitly_generated_sparse_vector.pointers

(array([0, 2], dtype=uint64),)

In [70]:
explicitly_generated_sparse_vector.indices

(array([2, 5], dtype=uint64),)

In [71]:
explicitly_generated_sparse_vector.values

array([12, 34], dtype=int32)

We can avoid writing such verbose code using `sparsify_array`. We only need to pass in the desired sparsity for each dimension.

In [72]:
sparse_vector = sparsify_array(dense_vector, [True])

In [73]:
sparse_vector

<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7fa0c0606a90>

In [74]:
sparse_vector.shape

(8,)

In [75]:
sparse_vector.pointers

(array([0, 2], dtype=uint64),)

In [76]:
sparse_vector.indices

(array([2, 5], dtype=uint64),)

In [77]:
sparse_vector.values

array([12, 34], dtype=int32)

We'll show examples of how to use `sparsify_array` with matrices below. Note that `sparsify_array` works with tensors of any rank (not just vectors and matrices) as long as a sparsity value is provided for each dimension.
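For intuition, the pointers, indices, and values shown above for the compressed vector can be derived from the dense array with plain NumPy. This is only a sketch of the relationship between the dense and sparse representations, not a description of how `sparsify_array` is implemented.

```python
import numpy as np

dense_vector = np.array([0, 0, 12, 0, 0, 34, 0, 0], dtype=np.int32)

# Positions of the nonzero entries, in increasing order.
nonzero_positions = np.flatnonzero(dense_vector)

indices = nonzero_positions.astype(np.uint64)
values = dense_vector[nonzero_positions]
# A single compressed dimension has one segment covering all stored values.
pointers = np.array([0, len(values)], dtype=np.uint64)
sizes = np.array(dense_vector.shape, dtype=np.uint64)

print(pointers, indices, values, sizes)
```

These arrays match the `pointers`, `indices`, `values`, and `shape` outputs shown above for both sparse vectors.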

## Overview of `densify_*` Utilities

Very often when debugging or testing, it is useful to be able to convert an `MLIRSparseTensor` into a dense tensor represented as an array in [NumPy](https://numpy.org/).

`densify_vector`, `densify_csr`, and `densify_csc` from `mlir_graphblas.tools.utils` allow us to do this. These functions treat missing values as zeros. This isn't necessarily the correct behavior for every application, so it's always worth checking what value the missing entries are assumed to have.

Let's first convert the sparse vectors we created above into dense NumPy vectors.

In [78]:
densify_vector(explicitly_generated_sparse_vector)

array([ 0,  0, 12,  0,  0, 34,  0,  0], dtype=int32)

In [79]:
densify_vector(sparse_vector)

array([ 0,  0, 12,  0,  0, 34,  0,  0], dtype=int32)
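When zero is not the right value for missing entries, the dense array can be built by hand from the tensor's indices and values. The sketch below works on plain NumPy arrays; `densify_compressed_vector` and its `fill_value` parameter are hypothetical and not part of `mlir_graphblas`.

```python
import numpy as np


def densify_compressed_vector(indices, values, size, fill_value=0):
    """Build a dense vector, using `fill_value` wherever no value is stored."""
    dense = np.full(size, fill_value, dtype=values.dtype)
    dense[indices.astype(np.intp)] = values
    return dense


indices = np.array([2, 5], dtype=np.uint64)
values = np.array([12.0, 34.0], dtype=np.float64)

# Missing entries as zeros, matching the behavior of the densify_* utilities.
zero_filled = densify_compressed_vector(indices, values, 8)
# Missing entries as infinity, e.g. when they stand for "no edge" in a shortest-path problem.
inf_filled = densify_compressed_vector(indices, values, 8, fill_value=np.inf)

print(zero_filled)
print(inf_filled)
```

The fill value has to be representable in the dtype of `values`, which is why this example uses floating point values for the infinity case.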

We can also convert [CSR](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format)) and CSC matrices into [NumPy](https://numpy.org/) matrices.

Let's first create a [CSR](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format)) matrix via `sparsify_array`.

In [80]:
dense_matrix = np.array(
    [
        [1, 0, 0, 0, 0],
        [0, 2, 3, 0, 0],
        [0, 0, 4, 0, 0],
        [0, 0, 5, 6, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.float64,
)
csr_matrix = sparsify_array(dense_matrix, [False, True])

In [81]:
csr_matrix

<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7fa0c0614360>

In [82]:
csr_matrix.shape

(5, 5)

In [83]:
csr_matrix.pointers

(array([], dtype=uint64), array([0, 1, 3, 4, 6, 6], dtype=uint64))

In [84]:
csr_matrix.indices

(array([], dtype=uint64), array([0, 1, 2, 2, 2, 3], dtype=uint64))

In [85]:
csr_matrix.values

array([1., 2., 3., 4., 5., 6.])
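The pointers, indices, and values above follow the usual CSR construction, which can be reproduced with plain NumPy. This sketch only illustrates where the numbers come from; it makes no claim about how `sparsify_array` computes them internally.

```python
import numpy as np

dense_matrix = np.array(
    [
        [1, 0, 0, 0, 0],
        [0, 2, 3, 0, 0],
        [0, 0, 4, 0, 0],
        [0, 0, 5, 6, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.float64,
)

# np.nonzero returns coordinates in row-major order, which is the order CSR needs.
rows, cols = np.nonzero(dense_matrix)
values = dense_matrix[rows, cols]
indices = cols.astype(np.uint64)

# pointers[i]:pointers[i + 1] is the slice of indices/values belonging to row i.
counts = np.bincount(rows, minlength=dense_matrix.shape[0])
pointers = np.concatenate(([0], np.cumsum(counts))).astype(np.uint64)

print(pointers, indices, values)
```

The last row is empty, which is why the final two entries of `pointers` are equal.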

Let's now create a dense matrix from this [CSR](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format)) matrix.

In [86]:
round_trip_dense_matrix = densify_csr(csr_matrix)
round_trip_dense_matrix

array([[1., 0., 0., 0., 0.],
       [0., 2., 3., 0., 0.],
       [0., 0., 4., 0., 0.],
       [0., 0., 5., 6., 0.],
       [0., 0., 0., 0., 0.]])

In [87]:
round_trip_dense_matrix.dtype

dtype('float64')

In [88]:
np.all(dense_matrix == round_trip_dense_matrix)

True

As mentioned in the [ops reference](../../ops_reference.ipynb), the only difference between CSR and CSC is the indexing. MLIR's sparse tensor data structures do not store the indexing, so `MLIRSparseTensor` does not either, and its constructor assumes row-oriented indexing. As a result, it's not possible to tell from the tensor alone whether a matrix with `[False, True]` sparsity uses a CSR or CSC layout, so we must explicitly choose `densify_csr` or `densify_csc` when converting such an `MLIRSparseTensor` instance into a dense [NumPy](https://numpy.org/) matrix.
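The ambiguity can be illustrated with plain NumPy: the same pointers, indices, and values yield different dense matrices depending on whether the compressed dimension is read as columns within rows (CSR) or rows within columns (CSC). The `densify` helper below is a hypothetical sketch and is not the implementation of `densify_csr` or `densify_csc`.

```python
import numpy as np

pointers = np.array([0, 1, 3, 4, 6, 6])
indices = np.array([0, 1, 2, 2, 2, 3])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
shape = (5, 5)


def densify(pointers, indices, values, shape, layout):
    """Densify compressed data, reading it as 'csr' or 'csc'."""
    dense = np.zeros(shape, dtype=values.dtype)
    # Expand the pointers into one outer coordinate per stored value.
    outer = np.repeat(np.arange(len(pointers) - 1), np.diff(pointers))
    if layout == "csr":
        dense[outer, indices] = values  # outer = row, indices = column
    elif layout == "csc":
        dense[indices, outer] = values  # outer = column, indices = row
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    return dense


as_csr = densify(pointers, indices, values, shape, "csr")
as_csc = densify(pointers, indices, values, shape, "csc")

print(as_csr)
print(as_csc)
```

For this square example, the CSC reading is the transpose of the CSR reading, which is why the layout has to be supplied by the caller.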

Since we can't explicitly create a CSC matrix via `MLIRSparseTensor`'s constructor alone, we'll defer demonstrating `densify_csc` to later tutorials, which use ops from the GraphBLAS dialect that manipulate a sparse tensor's layout, e.g. `graphblas.convert_layout`.