# Review: Notebook 10, Exercise 8

Suppose you have a sparse matrix in coordinate (COO) format with the following row indices:

In [None]:
I = [0, 0, 1, 1, 2, 2] # row indices
assert I == sorted(I)

**Exercise:** Calculate the row pointers of a CSR representation. _(Recall that, because the entries are sorted by row, the column indices and nonzero values are the same as in coordinate format.)_ For this example, if `R` is the desired output, then

```python
R = [0, 2, 4, 6]
```
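
To make the target concrete, here is a minimal sketch of how row pointers are read: row `i` owns the nonzeros stored at positions `R[i]` through `R[i+1] - 1`. The helper name `row_slices` is hypothetical and introduced only for illustration.

```python
def row_slices(R):
    """Return the (start, end) position range that each row owns."""
    return [(R[i], R[i + 1]) for i in range(len(R) - 1)]

I = [0, 0, 1, 1, 2, 2]  # row indices from the example above
R = [0, 2, 4, 6]        # expected row pointers

for i, (start, end) in enumerate(row_slices(R)):
    # Every entry stored in positions start..end-1 must belong to row i
    assert all(row == i for row in I[start:end])
    print(f"row {i}: positions {start}..{end - 1}")
```

An empty row shows up as a pair with `start == end`, which is why `R` needs `n+1` entries rather than `n`.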

**Approach:** Loop over the indices and look for changes, which indicate where a new row begins. This pattern is similar to one we saw earlier in the semester, when we looked at differences between adjacent elements of a list or array, so let's adapt that approach here.

> In this case, let's produce an output that indicates whether an element and its right-neighbor differ, as well as the value of the right-neighbor.

In [None]:
diffs = [(i != j, j) for i, j in zip(I[:-1], I[1:])]
diffs
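
For the row indices above, each `True` flag marks a position where the row index changes. As a small sketch (the name `starts` is an assumption, used here only for illustration), the change points can be listed together with their offsets:

```python
I = [0, 0, 1, 1, 2, 2]  # row indices from the example above
diffs = [(i != j, j) for i, j in zip(I[:-1], I[1:])]

# Keep only the positions where the row index changes. Entry k of `diffs`
# compares I[k] with I[k+1], so the new row begins at offset k+1.
starts = [(j, k + 1) for k, (d, j) in enumerate(diffs) if d]
print(starts)  # [(1, 2), (2, 4)]: row 1 starts at offset 2, row 2 at offset 4
```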

Given this information, we know that wherever `diffs[k][0]` is `True`, the starting offset of row `diffs[k][1]` in the row pointer array must be `k+1`. The following code builds the row pointers accordingly.

In [None]:
# number of rows, which is most likely given but can also be
# calculated from `I` assuming no trailing empty rows:
n = max(I) + 1

# Initial row pointers (`n+1` of them, by CSR's convention)
R = [0] * (n+1)

# Look for differences and update the row pointers
for k, (d, j) in enumerate(diffs):
    if d:
        R[j] = k+1
R[-1] = len(I) # number of nonzeros (by CSR's convention)

print(R)

This solution looks right. But we're not done yet! The above code does not correctly handle the case of an _empty_ row:

In [None]:
# Row indices. Note that row 2 is empty!
I = [0, 0, 1, 1, 3, 3]

# Repeat solution code from before:
diffs = [(i != j, j) for i, j in zip(I[:-1], I[1:])]
n = max(I) + 1
R = [0] * (n+1)
for k, (d, j) in enumerate(diffs):
    if d:
        R[j] = k+1
R[-1] = len(I) # number of nonzeros (by CSR's convention)

print(R) # Rats!

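The answer _should have been_ `R = [0, 2, 4, 4, 6]`, but we can see that `R[2]` is incorrectly left at 0. Row 2 is empty, so the slice `R[2]:R[3]` must be empty, which means row 2 has to start where row 3 starts. How should we fix it?

We can try to patch up the above algorithm. However, an alternative is to run a postprocessing step that fixes up the zeros. Here are two ways. Both rely on the fact that every row pointer after position `I[0]` must be positive, so a zero there can only mean the entry was never set.

_Method 1:_ Loop over `R` from right to left, look for zeros, and replace them with the value that follows.

In [None]:
R_1 = R.copy() # Make a copy, so we can try other methods
print('Before:', R_1)
# Entries up to position `I[0]` are legitimately zero, so stop before them
for k in range(len(R_1)-2, I[0], -1):
    if R_1[k] == 0:
        R_1[k] = R_1[k+1]
print('After:', R_1)

_Method 2:_ Use a right-to-left scan that carries the most recent nonzero value:

In [None]:
from itertools import accumulate
R_2 = R.copy()
print('Before:', R_2)
head, tail = R_2[:I[0]+1], R_2[I[0]+1:] # `head` holds the legitimate leading zeros
# Need `list(...)` because `accumulate` returns an iterator object
tail = list(accumulate(reversed(tail), lambda nxt, r: r or nxt))[::-1]
R_2 = head + tail
print('After:', R_2)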
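
As a cross-check on any row-pointer routine, one can also count the nonzeros in each row and take a running sum of the counts. This is a different technique from the difference-based approach above, not a variant of it. The sketch below assumes the number of rows `n` is supplied explicitly; the function name `row_pointers_by_counting` and the sample input are hypothetical.

```python
from itertools import accumulate

def row_pointers_by_counting(I, n):
    """Build CSR row pointers from per-row nonzero counts."""
    counts = [0] * n
    for i in I:
        counts[i] += 1
    # A leading 0 followed by the running sums of the counts
    return [0] + list(accumulate(counts))

# Hypothetical input: rows 0 and 3 are empty, and n is given explicitly
print(row_pointers_by_counting([1, 1, 2], n=4))  # [0, 0, 2, 3, 3]
```

Because it never inspects neighbors, this version handles leading, interior, and trailing empty rows the same way, which makes it a convenient reference when testing the fix-ups.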