<a href="https://colab.research.google.com/github/stephenbeckr/randomized-algorithm-class/blob/master/Demos/demo04_FrobeniusNorm_sparse.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Demo 4: calculating the Frobenius norm, looping over rows vs columns, **sparse** matrices

Demonstrates the effect of stride length, and of row- vs. column-based storage

This is similar to Demo 3, but with sparse matrices instead of dense ones
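As background for the stride discussion, here is a small sketch of how SciPy's `csc` format lays a matrix out in memory. The tiny matrix `S` is made up for illustration. The stored values of column `j` sit contiguously in `S.data[S.indptr[j]:S.indptr[j+1]]`, with their row numbers in `S.indices`. Reading a column is therefore one contiguous slice, while reading a row means checking every column.

```python
import numpy as np
import scipy.sparse

# A tiny 3x4 example matrix, stored in Compressed Sparse Column (csc) format
D = np.array([[1., 0., 0., 2.],
              [0., 3., 0., 0.],
              [4., 0., 5., 0.]])
S = scipy.sparse.csc_matrix(D)

print(S.data)     # [1. 4. 3. 5. 2.]  stored values, column by column
print(S.indices)  # [0 2 1 2 0]       row index of each stored value
print(S.indptr)   # [0 2 3 4 5]       column j occupies data[indptr[j]:indptr[j+1]]

# Reading column j: one contiguous slice of S.data
j = 0
col_vals = S.data[S.indptr[j]:S.indptr[j+1]]

# Reading row i: scan every column for stored entries whose row index is i
i = 2
row_vals = [S.data[k] for col in range(S.shape[1])
            for k in range(S.indptr[col], S.indptr[col+1]) if S.indices[k] == i]
print(col_vals, row_vals)
```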

Stephen Becker, Aug 2021, APPM 5650 Randomized Algorithms, University of Colorado Boulder

In [1]:
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
rng = np.random.default_rng(12345)

In [None]:
def FrobeniusNormByRow(A, use_blas = True):
  """ Outer loop over rows (inner loop over columns) """
  if scipy.sparse.issparse(A) and use_blas:
    norm = scipy.sparse.linalg.norm
  else:
    norm = np.linalg.norm
  m,n = A.shape
  nrm = 0.
  if use_blas:
    for row in range(m):
      nrm += norm( A[row,:] )**2  # this is Euclidean norm, not Frobenius
  elif scipy.sparse.issparse(A):
    for row in range(m):
      _,_,v = scipy.sparse.find(A[row,:])
      for vi in v:
        nrm += vi**2
  else:
    for row in range(m):
      for col in range(n):
        nrm += A[row,col]**2
  return np.sqrt(nrm)

def FrobeniusNormByColumn(A, use_blas = True):
  """ Outer loop over columns (inner loop over rows) """
  if scipy.sparse.issparse(A) and use_blas:
    norm = scipy.sparse.linalg.norm
  else:
    norm = np.linalg.norm
  m,n = A.shape
  nrm = 0.
  if use_blas:
    for col in range(n):
      nrm += norm( A[:,col] )**2  # this is Euclidean norm, not Frobenius
  elif scipy.sparse.issparse(A):
    for col in range(n):
      _,_,v = scipy.sparse.find(A[:,col])
      for vi in v:
        nrm += vi**2
  else:
    for col in range(n):
      for row in range(m):
        nrm += A[row,col]**2
  return np.sqrt(nrm)
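For comparison, the Frobenius norm only depends on the stored values. For a sparse matrix, it can therefore be computed with no Python-level loop at all: it is the Euclidean norm of the `data` array, once any duplicate entries have been merged. Here is a minimal sketch; `FrobeniusNormVectorized` is a hypothetical name and not part of the original demo.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg

def FrobeniusNormVectorized(A):
  """ Hypothetical helper: Frobenius norm without any Python-level loop """
  if scipy.sparse.issparse(A):
    A = A.tocsc(copy=True)  # work on a copy so the caller's matrix is untouched
    A.sum_duplicates()      # merge repeated (i,j) entries so each value is counted once
    return np.linalg.norm(A.data)  # Euclidean norm of the stored values
  return np.linalg.norm(A)  # dense: np.linalg.norm of a 2D array defaults to Frobenius

A = scipy.sparse.random(200, 300, density=0.05, format='csc')
# the discrepancy should be at rounding-error level
print(FrobeniusNormVectorized(A) - scipy.sparse.linalg.norm(A))
```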

#### Run some experiments

In [None]:
n   = int(1e4)
m   = n
density   = 0.01

A   = scipy.sparse.random( m, n, density, format='csc') # Compressed Sparse Column

In [None]:
# %time nrm = np.linalg.norm(A) # doesn't work if A is sparse
%time nrm = scipy.sparse.linalg.norm(A) # use this instead
print(f'The true norm is {nrm:.6e}')

CPU times: user 5.96 ms, sys: 52 µs, total: 6.02 ms
Wall time: 6.59 ms
The true norm is 5.769165e+02


In [None]:
%time nrmRow = FrobeniusNormByRow(A, use_blas = True)
print(f'Looping over rows, the discrepancy in the norm is {nrmRow-nrm:.8e}')

CPU times: user 34.8 s, sys: 16.8 ms, total: 34.8 s
Wall time: 34.7 s
Looping over rows, the discrepancy in the norm is 5.68434189e-13


In [None]:
%time nrmCol = FrobeniusNormByColumn(A, use_blas = True)
print(f'Looping over columns, the discrepancy in the norm is {nrmCol-nrm:.8e}')

CPU times: user 2.99 s, sys: 9.95 ms, total: 3 s
Wall time: 3 s
Looping over columns, the discrepancy in the norm is -1.13686838e-13


### Repeat the experiment without using BLAS
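Let's make the matrix smaller so we don't have to wait as long.

Here the `use_blas = False` versions run about as fast as the `use_blas = True` versions. Most of the time likely goes to per-iteration overhead rather than to the arithmetic. That overhead includes the `for` loop itself (since Python isn't compiled) and the slicing `A[row,:]` or `A[:,col]`, which builds a new sparse matrix on every iteration.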
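One way to check how much of the cost is the slicing itself is sketched below. It still loops over columns in Python, but instead of building the slice `A[:,col]`, it reads each column's stored values straight out of the `csc` arrays `A.data` and `A.indptr`. `FrobeniusNormByColumnRaw` is a hypothetical helper. The timing uses `time.perf_counter` rather than `%time`, so it also runs outside Jupyter. If it turns out much faster than `FrobeniusNormByColumn(A, use_blas = False)` on the same matrix, then the slicing was the bottleneck.

```python
import time
import numpy as np
import scipy.sparse
import scipy.sparse.linalg

def FrobeniusNormByColumnRaw(A):
  """ Hypothetical helper: outer loop over columns, but no slicing of A """
  A = A.tocsc(copy=True)  # csc layout, on a copy so the caller's matrix is untouched
  A.sum_duplicates()      # make sure each (i,j) is stored at most once
  nrm = 0.
  for col in range(A.shape[1]):
    vals = A.data[A.indptr[col]:A.indptr[col+1]]  # stored values of this column (a view)
    nrm += np.dot(vals, vals)
  return np.sqrt(nrm)

A = scipy.sparse.random(2000, 2000, density=0.02, format='csc')
t0 = time.perf_counter()
nrmRaw = FrobeniusNormByColumnRaw(A)
elapsed = time.perf_counter() - t0
print(f'{elapsed:.3f} s, discrepancy {nrmRaw - scipy.sparse.linalg.norm(A):.2e}')
```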

In [None]:
n   = int(4e3)
m   = n
density   = 0.02

A   = scipy.sparse.random( m, n, density, format='csc') # Compressed Sparse Column

# %time nrm = np.linalg.norm(A) # doesn't work if A is sparse
%time nrm = scipy.sparse.linalg.norm(A) # use this instead
print(f'The true norm is {nrm:.6e}')

CPU times: user 2.58 ms, sys: 24 µs, total: 2.61 ms
Wall time: 2.62 ms
The true norm is 3.266735e+02


In [None]:
%time nrmRow = FrobeniusNormByRow(A, use_blas = True)
print(f'Looping over rows, the discrepancy in the norm is {nrmRow-nrm:.8e}')

%time nrmRow = FrobeniusNormByRow(A, use_blas = False)
print(f'Looping over rows (no BLAS), the discrepancy in the norm is {nrmRow-nrm:.8e}')

CPU times: user 5.48 s, sys: 2.17 ms, total: 5.49 s
Wall time: 5.48 s
Looping over rows, the discrepancy in the norm is 2.27373675e-13
CPU times: user 5.43 s, sys: 3.25 ms, total: 5.43 s
Wall time: 5.43 s
Looping over rows (no BLAS), the discrepancy in the norm is 1.19371180e-12


In [None]:
%time nrmCol = FrobeniusNormByColumn(A, use_blas = True)
print(f'Looping over columns, the discrepancy in the norm is {nrmCol-nrm:.8e}')

%time nrmCol = FrobeniusNormByColumn(A, use_blas = False)
print(f'Looping over columns (no BLAS), the discrepancy in the norm is {nrmCol-nrm:.8e}')

CPU times: user 1.42 s, sys: 49.9 ms, total: 1.47 s
Wall time: 1.4 s
Looping over columns, the discrepancy in the norm is 1.70530257e-13
CPU times: user 1.31 s, sys: 30.7 ms, total: 1.34 s
Wall time: 1.31 s
Looping over columns (no BLAS), the discrepancy in the norm is 6.25277607e-13


## Column vs row access, and tricks

First, let's discuss copies:

In [31]:
n   = int(1e1)
m   = n
density   = 0.01

A   = scipy.sparse.random( m, n, density, format='csc') # Compressed Sparse Column
#print( A.toarray() ) # see it in dense format
# B   = A.T
B   = A.T.copy()  # this *does* make a copy (btw, np.copy(A.T) doesn't work here)

# First lesson: be aware that B = A.T does *not* copy A
# so if you change B, then A will change too.

I,J,vals = scipy.sparse.find(B)
#print( I,J,vals )
if len(I) < 1:
  raise ValueError('Too sparse!! Try again')
elif n <= 1e2:
  i,j = I[0], J[0]
  print(f'\n\nOriginal:')
  print( "Value of A.T(i,j) is", B[i,j], 'and A(j,i) is', A[j,i] )
  B[i,j] = 99
  print(f'\nNow, after update of B:')
  print( "Value of A.T(i,j) is", B[i,j], 'and A(j,i) is', A[j,i] )



Original:
Value of A.T(i,j) is 0.8236178299837936 and A(j,i) is 0.8236178299837936

Now, after update of B:
Value of A.T(i,j) is 99.0 and A(j,i) is 0.8236178299837936
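The same lesson can be checked by inspecting memory directly. In this small sketch, a fresh random matrix is used rather than the `A` above. For a `csc` matrix, `A.T` comes back as a `csr` matrix built on the *same* underlying arrays, and `np.shares_memory` reports the overlap. By contrast, `A.T.copy()` allocates new arrays.

```python
import numpy as np
import scipy.sparse

A = scipy.sparse.random(10, 10, density=0.3, format='csc')
B = A.T          # no copy: reinterprets A's arrays in csr format
C = A.T.copy()   # independent copy

print(B.format)                          # csr
print(np.shares_memory(A.data, B.data))  # True
print(np.shares_memory(A.data, C.data))  # False

B.data[0] = 99.  # writing into B's stored values...
print(A.data[0]) # ...also changes A: 99.0
```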


OK, now let's look at a row-based operation, such as counting the nonzeros in each row.

Since `A` is stored in `csc` format, we'd expect this to be quite slow: extracting a single row means searching the stored entries of every column.

In [None]:
n   = int(1e4)
m   = n
density   = 0.01

A   = scipy.sparse.random( m, n, density, format='csc') 

In [44]:
# Let's do something row-based, like find the sparsity of each row
rowNNZs = np.zeros(m)
%time for row in range(m): _,_,vals = scipy.sparse.find(A[row,:]); rowNNZs[row] = len(vals)

CPU times: user 38.4 s, sys: 21.9 ms, total: 38.4 s
Wall time: 38.3 s


For comparison, if we wanted to find the sparsity of each **column**, that'd be faster:

In [45]:
# Same as above, but for columns
colNNZs = np.zeros(n)
%time for col in range(n): _,_,vals = scipy.sparse.find(A[:,col]); colNNZs[col] = len(vals)

CPU times: user 2.55 s, sys: 46.7 ms, total: 2.6 s
Wall time: 2.56 s
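Both counts can also be read off the `csc` arrays with no loop at all:

* `np.diff(A.indptr)` gives the number of entries per column.
* `A.indices` holds the row index of every stored entry, so `np.bincount` on it gives the number of entries per row.

These functions count *stored* entries, whereas `scipy.sparse.find` skips explicit zeros. This sketch therefore assumes `A` has no duplicate or explicitly stored zero entries. That holds for `scipy.sparse.random` output, unless a sampled value happens to be exactly 0.

```python
import numpy as np
import scipy.sparse

m, n = 1000, 800
A = scipy.sparse.random(m, n, density=0.01, format='csc')

# entries per column: length of each column's segment of A.data
colNNZs = np.diff(A.indptr)
# entries per row: how many times each row index appears in A.indices
rowNNZs = np.bincount(A.indices, minlength=m)

print(colNNZs.sum(), rowNNZs.sum(), A.nnz)  # all three agree
```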


So, for the row-based operation, let's do a column-based operation on the transpose of the matrix.  To be fair, we'll include the time it takes to make the transpose.  If we can afford the memory, this can be a very nice trick.  This is especially useful if we can re-use this later (and amortize the cost of the transpose).

In [46]:
%%time
# Try this column-based on the transpose
rowNNZs_ver2 = np.zeros(m)

B   = A.T  # included in the timing; for a csc A, this is a csr matrix sharing A's arrays (no copy)
for row in range(m): 
  _,_,vals = scipy.sparse.find(B[:,row])
  rowNNZs_ver2[row] = len(vals)

CPU times: user 38.6 s, sys: 31.3 ms, total: 38.6 s
Wall time: 38.5 s


In [47]:
# And check that we got the same answers
np.linalg.norm( rowNNZs - rowNNZs_ver2)

0.0

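So in this example, making the transpose and then accessing its columns took about as long as accessing the rows of `A` directly. That is not because the transpose is expensive. For a `csc` matrix, `A.T` just reinterprets the same arrays as a `csr` matrix, which is also why `B = A.T` does not copy `A`. As a result, slicing a column of `B` is just as slow as slicing a row of `A`. To actually get fast row access, we need a real change of storage, such as `A.T.tocsc()` or `A.tocsr()`. That conversion costs time and memory up front. If we can re-use it later on, though, each pass over the rows should take a few seconds, like the column loop above, rather than more than 30 seconds.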
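Here is a sketch of that conversion. It uses the same setup as above but a smaller matrix so it runs quickly. `count_row_nnzs` is a hypothetical helper that repeats the demo's `scipy.sparse.find` loop. The loop is run three times:

* on the plain `A.T`;
* on `A.T.tocsc()`, where `A`'s rows become contiguous columns;
* on `A.tocsr()`, where `A`'s rows become contiguous rows.

Timings are printed but will depend on the machine.

```python
import time
import numpy as np
import scipy.sparse

m = n = 1000
A = scipy.sparse.random(m, n, density=0.01, format='csc')

def count_row_nnzs(get_row, m):
  """ Hypothetical helper: count nonzeros of each row via scipy.sparse.find """
  counts = np.zeros(m)
  for row in range(m):
    _, _, vals = scipy.sparse.find(get_row(row))
    counts[row] = len(vals)
  return counts

t0 = time.perf_counter()
B_view = A.T          # csr built on A's arrays: its column slices are still slow
counts_view = count_row_nnzs(lambda r: B_view[:, r], m)
t1 = time.perf_counter()
B_csc = A.T.tocsc()   # real transpose: A's rows are now contiguous columns of B_csc
counts_csc = count_row_nnzs(lambda r: B_csc[:, r], m)
t2 = time.perf_counter()
A_csr = A.tocsr()     # alternative: same matrix, stored row by row
counts_csr = count_row_nnzs(lambda r: A_csr[r, :], m)
t3 = time.perf_counter()

print(B_view.format, B_csc.format, A_csr.format)  # csr csc csr
print(f'view: {t1-t0:.2f} s, tocsc: {t2-t1:.2f} s, tocsr: {t3-t2:.2f} s')
```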