# SciPy

### Differences between NumPy and SciPy

NumPy and SciPy are two of the most widely used Python libraries for numerical work. Their functionality overlaps in places, but they serve different purposes.

NumPy is short for Numerical Python, and SciPy is short for Scientific Python. Both are Python packages for working with numerical data.

Description

* NumPy: It extends Python with an efficient array type and the basic tools to manipulate numerical data. It allows fast operations on homogeneous data stored in specially designed arrays called NumPy arrays (ndarray).

* SciPy: It is organized into sub-packages that collect scientific routines, including clustering, image processing, integration, differentiation, optimization, etc. SciPy builds on NumPy arrays, and many of its routines wrap compiled C, C++, and Fortran code, which keeps them fast.
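
As a small illustration of the two descriptions above, the sketch below builds a homogeneous NumPy array and then calls one SciPy routine on a NumPy function. The choice of `scipy.integrate.quad` and of the integrand is only an example, not something prescribed by these notes.

```python
import numpy as np
from scipy import integrate

# NumPy: a homogeneous array, every element shares one dtype.
v = np.array([1, 2, 3.5])      # the integers are upcast to float64
print(v.dtype)

# SciPy: a numerical routine that works directly with NumPy functions.
# quad returns the integral estimate and an estimate of the absolute error.
value, abs_err = integrate.quad(np.sin, 0.0, np.pi)
print(value)                   # close to 2, the exact integral of sin on [0, pi]
```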

The differences between these libraries are as follows:

**Difference between NumPy and SciPy**
_________________________________________________
| Point of Difference      | NumPy | SciPy  |
| ----------- | ----------- | ----------- |
| Type of operations | Performs basic array operations such as sorting, indexing, reshaping, etc. It is the common foundation for data science and statistical work. | Used for more advanced operations such as numerical integration, optimization, interpolation, and other numerical algorithms. |
| Functions | Contains a variety of functions, but some areas (for example linear algebra) cover only the essentials. | Contains more complete versions of these areas; for example, scipy.linalg and scipy.sparse.linalg offer more routines than numpy.linalg. |
| Arrays | NumPy arrays are multi-dimensional arrays whose elements all have the same type, i.e. they are homogeneous. | SciPy has no general-purpose array type of its own. It operates on NumPy arrays and adds specialized containers such as sparse matrices (scipy.sparse). |
| Implementation language and speed | The core of NumPy is written in C, so array operations are fast. | SciPy is written in a mix of Python, C, C++, Cython, and Fortran. Pure-Python parts can be slower, but routines that call the BLAS/LAPACK libraries (written in Fortran) are fast. |
| Array memory | An ndarray is dense: every entry is stored, so memory grows with the full size of the array. | A sparse matrix stores only the nonzero entries, so it needs much less memory when most entries are zero. |
| Convenience | An ndarray is easy to define and easy to inspect when printed. | Sparse matrices are harder to inspect when printed. |
_________________________________________________


Conclusion

NumPy and SciPy are both important Python libraries, valued for their convenience and their wide range of functions, modules, and packages. They handle mathematical computations and are used in data science, machine learning, deep learning, etc.

Although they are conceptually different, their functionality overlaps, and in practice they are used together: SciPy routines take and return NumPy arrays.

In this class, we will explore SciPy a little further: we will work with sparse matrices and look at their applications in numerical differential equations, numerical integration, interpolation, and image processing.

### Difference between NumPy ndarray and SciPy sparse matrices

How do I create a numpy ndarray? 

* [Array creation](https://numpy.org/doc/stable/user/basics.creation.html)
* [NumPy for MATLAB users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html)

How do I create a scipy sparse matrix?

* [Sparse matrices](https://docs.scipy.org/doc/scipy/reference/sparse.html)

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import scipy.sparse as sp
import numpy as np
n = 2000
row = np.concatenate((np.arange(n), np.arange(n-1), np.arange(n-1)+1))
col = np.concatenate((np.arange(n), np.arange(n-1)+1, np.arange(n-1)))
data = np.concatenate((2*np.ones(n), -1*np.ones(n-1), -1*np.ones(n-1))) 
X1 = sp.csr_matrix((data, (row, col)), shape=(n, n))
X2 = np.array(X1)
X3 = X1.toarray()
# X1 is sparse and X3 is a dense ndarray.
# X2 is NOT a dense copy: np.array(X1) only wraps the sparse matrix in an object array.
#print(type(X1), type(X2))
#print(X1)
#print(X2)
#print(X3)
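
To make the difference between the three objects concrete, here is a minimal sketch on a small matrix. It assumes the same tridiagonal pattern as above but builds it with `scipy.sparse.diags` for brevity; the variable names are illustrative only.

```python
import numpy as np
import scipy.sparse as sp

n = 5
# Tridiagonal matrix with 2 on the diagonal and -1 on the two off-diagonals.
A_sparse = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

wrapped = np.array(A_sparse)   # not a dense copy: an object array holding A_sparse
A_dense = A_sparse.toarray()   # a genuine dense 2-D ndarray

print(type(A_sparse).__name__, A_sparse.shape, "stored entries:", A_sparse.nnz)
print("np.array(A_sparse):", wrapped.dtype, wrapped.shape)
print("A_sparse.toarray():", A_dense.dtype, A_dense.shape)
```

The practical rule is to call `.toarray()` whenever a dense copy is really wanted.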

How do I check memory and efficiency?

* Check the memory usage of python objects: [sys.getsizeof](https://stackoverflow.com/questions/33978/find-out-how-much-memory-is-being-used-by-an-object-in-python)
* Measure computation time down to microseconds: [datetime](https://stackoverflow.com/questions/766335/python-speed-testing-time-difference-milliseconds)


In [79]:
import sys
sys.getsizeof(X3) # the size of ndarray increases with n, it scales in O(n^2)

sys.getsizeof(X1) 
# this doesn't give you the correct size of the sparse matrix:
# it only measures the Python wrapper object, not the arrays stored inside it
sys.getsizeof(col)+sys.getsizeof(row)+sys.getsizeof(data) 
# this is the size of the sparse matrix, it scales in O(n)

480280
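
An alternative way to compare storage, sketched below, is to sum the `nbytes` of the three arrays a CSR matrix holds (`data`, `indices`, `indptr`) and compare with the `nbytes` of the dense copy. The helper `csr_nbytes` is a hypothetical name introduced here, and the sizes of `n` are arbitrary.

```python
import numpy as np
import scipy.sparse as sp

def csr_nbytes(m):
    """Bytes used by the three arrays that make up a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

for n in (500, 1000, 2000):
    A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
    dense_bytes = A.toarray().nbytes   # n * n * 8 bytes for float64
    sparse_bytes = csr_nbytes(A)       # grows linearly with n
    print(n, sparse_bytes, dense_bytes)
```

Doubling `n` roughly doubles the sparse size but quadruples the dense size, which matches the O(n) and O(n^2) scaling noted in the comments above.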

An example of finding eigenvalues. 
$$A{\bf v}=\lambda{\bf v}$$
* NumPy functions such as *eig, eigvals, eigh, eigvalsh* are not for sparse matrices, and they always compute all the eigenvalues of the input matrix. 
* SciPy functions such as *eigs, eigsh* (in scipy.sparse.linalg) are for sparse matrices, and they compute only a few eigenvalues, not the full list. 

In [9]:
import numpy.linalg as la
import datetime
a = datetime.datetime.now()
w, v = la.eig(X3)
b = datetime.datetime.now()
c = b-a
c.total_seconds()

5.413639

In [6]:
import datetime
import scipy.sparse.linalg as sla
a = datetime.datetime.now()
w, v = sla.eigs(X1)
b = datetime.datetime.now()
c = b-a
c.total_seconds()

2.173534

In [7]:
a = datetime.datetime.now()
w, v = sla.eigs(X3)
b = datetime.datetime.now()
c = b-a
c.total_seconds()

86.453094
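
Since the test matrix is symmetric, `eigsh` is the natural sparse solver for it. The sketch below is an addition to the timings above, not part of them: it asks for the four largest eigenvalues of a smaller matrix and compares them with the closed-form eigenvalues of this particular tridiagonal matrix, $\lambda_j = 2 - 2\cos\left(\frac{j\pi}{n+1}\right)$ for $j = 1, \dots, n$. The values of `n` and `k` are arbitrary choices.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as sla

n = 100
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

# eigsh assumes a real symmetric (or complex Hermitian) matrix.
# k=4 with which="LA" requests the four largest (algebraic) eigenvalues.
w = sla.eigsh(A, k=4, which="LA", return_eigenvectors=False)
w = np.sort(w)

# Closed form for this matrix: lambda_j = 2 - 2 cos(j pi / (n + 1)).
j = np.arange(n - 3, n + 1)                       # the four largest indices
exact = 2.0 - 2.0 * np.cos(j * np.pi / (n + 1))
max_err = np.max(np.abs(w - exact))
print("largest eigenvalues:", w)
print("max deviation from closed form:", max_err)
```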

### Basic linear algebra operations

What are the important numerical linear algebra operations and routines and how can we call them?

In [26]:
# For NumPy arrays, @ is the matrix product: it handles both
# matrix-matrix and matrix-vector multiplication.
a = np.array([[1, 2,3],[3,4,2],[1,0,1]])
b = np.array([[1,1,1],[-1,-1,-1],[0,0,0.5]])
#print("a = ",a)
#print("b = ",b)
#print("a*b = ",a@b) # this is for matrix multiplication
#print("a*b = ",np.dot(a,b))
c = np.array([1,2,3])
#print("a*c = ", a@c) # @ also computes the matrix-vector product

# For SciPy sparse matrices (csr_matrix), both @ and * perform matrix multiplication
import scipy.sparse as sp
n = 5
row = np.concatenate((np.arange(n), np.arange(n-1), np.arange(n-1)+1))
col = np.concatenate((np.arange(n), np.arange(n-1)+1, np.arange(n-1)))
data = np.concatenate((2*np.ones(n), -1*np.ones(n-1), -1*np.ones(n-1))) 
X1 = sp.csr_matrix((data, (row, col)), shape=(n, n))
x = np.array([1,2,3,2,1])
#print("X1*x", X1@x) # @ multiplies a sparse matrix by a vector
#print("X1*x", X1*x) # for csr_matrix, * is also the matrix-vector product
#print("X1*x", np.dot(X1,x)) # np.dot does not handle sparse matrices; use @ or X1.dot(x)
X2 = X1@X1 # @ multiplies two sparse matrices
#print("X1*X1 = ", X2.toarray())
X2 = X1*X1 # for csr_matrix, * is also matrix multiplication, not elementwise
#print("X1*X1 = ", X2.toarray())
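
The following sketch separates the matrix product from the elementwise product for a sparse matrix. One caveat worth stating: `*` means matrix multiplication for `csr_matrix`, but it is elementwise for the newer sparse array classes such as `csr_array`, so `@` and `.multiply()` are the unambiguous spellings. The matrix and vector mirror the small example above; the variable names are illustrative.

```python
import numpy as np
import scipy.sparse as sp

n = 5
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
x = np.array([1.0, 2.0, 3.0, 2.0, 1.0])

y = A @ x                  # matrix-vector product, returns a 1-D ndarray
AA = A @ A                 # matrix-matrix product, result stays sparse
A_elem = A.multiply(A)     # elementwise (Hadamard) product, result stays sparse

print("A @ x =", y)
print("A @ A =\n", AA.toarray())
print("A.multiply(A) =\n", A_elem.toarray())
```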

An interesting function *numpy.linalg.multi_dot*

*multi_dot* chains *numpy.dot* and uses optimal parenthesization of the matrices. Depending on the shapes of the matrices, this can speed up the multiplication a lot.

In [36]:
print("a*b*a = ",np.linalg.multi_dot([a,b,a])) 
# use a list to include all the matrices
print("a*b*a = ",np.linalg.multi_dot((a,b,a))) # may also use a tuple
#print("X1*X1*X1 = ",np.linalg.multi_dot([X1,X1,X1])) 
# numpy.linalg.multi_dot does not work for sparse matrix. 


a*b*a =  [[-3.5 -6.  -4.5]
 [-4.  -6.  -5. ]
 [ 5.5  6.   6.5]]
a*b*a =  [[-3.5 -6.  -4.5]
 [-4.  -6.  -5. ]
 [ 5.5  6.   6.5]]
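
The 3-by-3 example above is too small for the ordering to matter. As a rough sketch of when it does, the shapes below (chosen arbitrarily for illustration) make the left-to-right order form a 1000-by-1000 intermediate, while the right-to-left order only forms a 10-by-1 intermediate. All three expressions give the same result up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 10))
B = rng.standard_normal((10, 1000))
v = rng.standard_normal((1000, 1))

left_to_right = (A @ B) @ v                 # builds a 1000 x 1000 intermediate
right_to_left = A @ (B @ v)                 # builds only a 10 x 1 intermediate
chained = np.linalg.multi_dot([A, B, v])    # picks the multiplication order itself

print(np.allclose(chained, left_to_right), np.allclose(chained, right_to_left))
```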


In [50]:
# Let's do a time cost comparison between dense matrix and sparse matrix
import scipy.sparse as sp
import numpy as np
n = 2000
row = np.concatenate((np.arange(n), np.arange(n-1), np.arange(n-1)+1))
col = np.concatenate((np.arange(n), np.arange(n-1)+1, np.arange(n-1)))
data = np.concatenate((2*np.ones(n), -1*np.ones(n-1), -1*np.ones(n-1))) 
X1 = sp.csr_matrix((data, (row, col)), shape=(n, n))
X2 = X1.toarray()
a = np.ones(n)

In [52]:
import datetime
Ta1 = datetime.datetime.now()
for i in range(100):
    a = X1@a
Tb1 = datetime.datetime.now()
Ta2 = datetime.datetime.now()
for i in range(100):
    a = X2@a
Tb2 = datetime.datetime.now()
print("sparse matrix multiply a vector time cost ",(Tb1-Ta1).total_seconds())
print("dense matrix multiply a vector time cost ",(Tb2-Ta2).total_seconds())

sparse matrix multiply a vector time cost  0.001995
dense matrix multiply a vector time cost  0.266077
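
A variant of this comparison, sketched below, uses the standard-library `timeit` module and multiplies by the same fixed vector each time, so both loops do identical work. This is an alternative to the `datetime` approach above rather than a replacement, and the measured times depend on the machine.

```python
import timeit
import numpy as np
import scipy.sparse as sp

n = 2000
A_sparse = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
A_dense = A_sparse.toarray()
x = np.ones(n)

# Total time for 100 matrix-vector products with each storage format.
t_sparse = timeit.timeit(lambda: A_sparse @ x, number=100)
t_dense = timeit.timeit(lambda: A_dense @ x, number=100)

print("sparse matrix times vector, 100 runs:", t_sparse)
print("dense matrix times vector, 100 runs: ", t_dense)
```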


**PyTorch is a package that can run such computations in parallel on a GPU, which can make them much faster for large problems.**

### Matrix factorizations and iterative methods

* Matrix factorizations: QR, LU, Cholesky, SVD; 
* Iterative methods for sparse matrices: gmres, minres, cg, bicgstab; 
* BLAS, LAPACK libraries in SciPy

* [Numpy.linalg](https://numpy.org/doc/stable/reference/generated/numpy.linalg.qr.html)
* [Scipy.sparse.linalg](https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html)
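
As a minimal sketch of one factorization and one iterative method from the list above, the example below solves the same linear system twice: once with a dense Cholesky factorization and once with the sparse conjugate gradient solver `cg`. It assumes the tridiagonal test matrix used earlier, which is symmetric positive definite, and leaves `cg` at its default tolerance; the right-hand side and the size `n` are arbitrary.

```python
import numpy as np
import scipy.linalg as sla_dense
import scipy.sparse as sp
import scipy.sparse.linalg as sla

n = 100
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Dense route: Cholesky factorization A = L L^T, then two triangular solves.
L = np.linalg.cholesky(A.toarray())
z = sla_dense.solve_triangular(L, b, lower=True)        # solve L z = b
x_chol = sla_dense.solve_triangular(L.T, z, lower=False)  # solve L^T x = z

# Sparse route: conjugate gradient, which only needs products A @ vector.
x_cg, info = sla.cg(A, b)   # info == 0 means the solver reports convergence

rel_res_chol = np.linalg.norm(A @ x_chol - b) / np.linalg.norm(b)
rel_res_cg = np.linalg.norm(A @ x_cg - b) / np.linalg.norm(b)
print("Cholesky relative residual:", rel_res_chol)
print("CG relative residual:", rel_res_cg, "info:", info)
```

The Cholesky residual is near machine precision, while the CG residual only reaches the solver's stopping tolerance; for large sparse systems that trade-off is usually worth it because the dense factor is never formed.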