# Parallelisation of Matrix Operations

In [None]:
import os
user = os.getenv('USER')
os.chdir(f'/scratch/cd82/{user}/notebooks')

### The reason why matrix operations are important

In linear regression, the ordinary least-squares estimate of $\boldsymbol{\beta}$ is computed as:
<div style="border: 0px solid black; padding: 2px; margin: 0;">
$$
\boldsymbol{\beta} = (\mathbf{X}^T \mathbf{X})^{-1}\mathbf{X}^T \mathbf{y}
$$
</div>

For *large* datasets $\mathbf{X}$, or *many* smaller ones, these operations can take a long time.
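
As a minimal sketch of the formula above, the following cell computes $\boldsymbol{\beta}$ in three ways on a small synthetic dataset. The data, the seed and the names `true_beta`, `beta_inverse`, `beta_solve` and `beta_lstsq` are assumptions made for illustration only. The explicit inverse mirrors the formula; `np.linalg.solve` and `np.linalg.lstsq` are usually preferred in practice because they avoid forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data: 200 observations, 3 predictors (illustrative values only)
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
true_beta = np.array([1.5, -2.0, 0.5])  # hypothetical coefficients
y = X @ true_beta + 0.01 * rng.normal(size=n_samples)

# Literal translation of the formula: explicit inverse of X^T X
beta_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

# Solve the linear system (X^T X) beta = X^T y without forming the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver that works on X directly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_inverse)
print(beta_solve)
print(beta_lstsq)
```

All three approaches spend most of their time in matrix products and factorisations, which is why the speed of the underlying linear algebra library matters.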

## BLAS and LAPACK libraries
NumPy (and many other data analysis packages) relies on libraries that implement the BLAS and LAPACK interfaces. These interfaces define a generic API for linear algebra routines, which allows hardware vendors to supply optimised implementations that perform well on their hardware.

BLAS provides the fundamental vector-vector, matrix-vector and matrix-matrix operations. LAPACK provides matrix decomposition, factorisation and solver routines for various matrix types, and is itself implemented in terms of BLAS calls.
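
To make the split between the two interfaces concrete, the sketch below groups a few NumPy calls by the kind of routine they correspond to. The array sizes and variable names are arbitrary choices for illustration, and whether a given call is actually dispatched to an optimised BLAS or LAPACK routine depends on how NumPy was built.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((300, 300))
B = rng.random((300, 300))
x = rng.random(300)

# BLAS-style operations, grouped by level
dot_xx = x @ x       # level 1: vector-vector
mat_vec = A @ x      # level 2: matrix-vector
mat_mat = A @ B      # level 3: matrix-matrix

# LAPACK-style operations on a symmetric positive definite matrix
S = A @ A.T + 300 * np.eye(300)
L = np.linalg.cholesky(S)     # factorisation: S = L L^T
sol = np.linalg.solve(S, x)   # solver: S sol = x

print(mat_mat.shape, L.shape, sol.shape)
```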

In [None]:
import numpy as np
np.show_config()

### Controlling the number of threads
The number of threads used by the BLAS libraries is controlled by environment variables that are set in the shell used to run Python.

e.g.  
- **OMP_NUM_THREADS**  The 'generic' variable used to set the number of threads.
- **OPENBLAS_NUM_THREADS**  Used by the OpenBLAS library.
- **MKL_NUM_THREADS**  Used by Intel's MKL library.
  
**OMP_NUM_THREADS** is usually respected. It is the standard variable of OpenMP, the threading runtime most commonly used for parallelisation in these libraries.
  
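Note that the variables controlling the number of threads must be set in the environment before the code is run. The libraries generally read them when they are loaded, so changing them after NumPy has been imported usually has no effect.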
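
One way to satisfy this from inside a notebook is to start a separate Python process and hand it a modified copy of the environment. The sketch below is one possible approach, not the only one. The helper `thread_env` and the tuple `THREAD_VARS` are hypothetical names introduced here; the child process does nothing but report the value it sees.

```python
import os
import subprocess
import sys

# Variables listed above; which ones matter depends on the BLAS in use
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")

def thread_env(nthreads):
    """Return a copy of the current environment with the thread variables set."""
    env = os.environ.copy()
    for name in THREAD_VARS:
        env[name] = str(nthreads)
    return env

# Child process that only reports the value it inherited
child_code = "import os; print(os.environ.get('OMP_NUM_THREADS'))"
completed = subprocess.run(
    [sys.executable, "-c", child_code],
    env=thread_env(2),
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = completed.stdout.strip()
print(f"OMP_NUM_THREADS in the child process: {seen_by_child}")
```

Passing `env=` leaves the notebook's own environment untouched, so successive runs with different thread counts do not interfere with each other.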

In [None]:
import os
user = os.getenv('USER')
os.chdir(f'/scratch/cd82/{user}')

In [None]:
# The following script is saved to the filesystem and run in a separate process,
# allowing us to change the environment, and with it the number of threads used.
script = """
import numpy as np
import time
import os

ob_nthreads = os.getenv('OPENBLAS_NUM_THREADS') 
mkl_nthreads = os.getenv('MKL_NUM_THREADS')
omp_nthreads = os.getenv('OMP_NUM_THREADS') 

print(f"Threads: OpenBLAS: {ob_nthreads} MKL: {mkl_nthreads}  OMP: {omp_nthreads} ")

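# Collect the variables that are set, as integers so max() compares numerically
thread_settings = [ob_nthreads, mkl_nthreads, omp_nthreads]
nthreads = [int(num) for num in thread_settings if num]

# Report 'default' when no variable is set (the library then chooses)
nthreads = max(nthreads) if nthreads else 'default'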

# np.show_config()
N=4000

# Create two numpy arrays
array1 = np.random.rand(N, N)
array2 = np.random.rand(N, N)

# Start the timer (perf_counter is a monotonic, high-resolution clock)
start_time = time.perf_counter()

# Multiply the arrays
result = np.dot(array1, array2)

# Stop the timer
end_time = time.perf_counter()

# Calculate the elapsed time
elapsed_time = end_time - start_time

print(f"Time taken to multiply the arrays using {nthreads} threads: {elapsed_time:.3f} seconds")
"""
with open("matrixmult.py", "w") as file:
    # Write the string to the file
    file.write(script)

### Timing of larger problems
Larger computations are sent to the supercomputer queue.
The environment variable **NCPUS** is available on NCI machines running the PBS batch system and holds the number of cores allocated to the job.

In [None]:
!echo "Number of cores allocated: $NCPUS"
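
The same value can be read from Python and used to decide which thread counts are worth timing. The following is a sketch under the assumption that **NCPUS** may be missing outside a PBS job, in which case it falls back to `os.cpu_count()`. The function names `allocated_cores` and `thread_counts` are hypothetical.

```python
import os

def allocated_cores():
    """Return NCPUS as an int, or fall back to os.cpu_count() outside PBS."""
    value = os.environ.get("NCPUS")
    if value is not None and value.isdigit() and int(value) > 0:
        return int(value)
    return os.cpu_count() or 1

def thread_counts(max_threads):
    """Powers of two below max_threads, followed by max_threads itself."""
    counts = []
    n = 1
    while n < max_threads:
        counts.append(n)
        n *= 2
    counts.append(max_threads)
    return counts

ncpus = allocated_cores()
print(f"Cores available: {ncpus}")
print(f"Thread counts to time: {thread_counts(ncpus)}")
```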

In [None]:
import subprocess
import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'  # Set the number of threads to 1
subprocess.run(['python', 'matrixmult.py'])

In [None]:
os.environ['OPENBLAS_NUM_THREADS'] = '2'  # Set the number of threads to 2
subprocess.run(['python', 'matrixmult.py'])

In [None]:
os.environ['OPENBLAS_NUM_THREADS'] = '4'  # Set the number of threads to 4
subprocess.run(['python', 'matrixmult.py'])
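
The three runs above can also be collected in a loop so that speedup and parallel efficiency are computed rather than read off by eye. This is a sketch only: `CHILD` is a small inline stand-in for `matrixmult.py`, `time_matmul` is a hypothetical helper, and all three thread variables are set because the BLAS library in use is not known here. With a matrix this small the speedup may be modest or absent.

```python
import os
import subprocess
import sys

# Inline stand-in for matrixmult.py; N is kept small so the sketch runs quickly
CHILD = """
import time
import numpy as np
N = 500
a = np.random.rand(N, N)
b = np.random.rand(N, N)
start = time.perf_counter()
a @ b
print(time.perf_counter() - start)
"""

def time_matmul(nthreads):
    """Run the child script with the given thread count and return its timing."""
    env = os.environ.copy()
    for name in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        env[name] = str(nthreads)
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    # The elapsed time is the last line the child prints
    return float(out.stdout.strip().splitlines()[-1])

timings = {n: time_matmul(n) for n in (1, 2, 4)}
baseline = timings[1]
for n, t in timings.items():
    speedup = baseline / t       # time on 1 thread / time on n threads
    efficiency = speedup / n     # 1.0 would be perfect scaling
    print(f"{n} threads: {t:.4f} s, speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

A single measurement per thread count is noisy; repeating each run and taking the minimum or median gives more stable numbers.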