## Example of parallelizing matrix-vector multiplication

In [1]:
import concurrent.futures
import numpy as np

def multiply_matrix_vector(matrix, vector):
    # Get the number of rows in the matrix
    rows = matrix.shape[0]
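    # Create a thread pool; with no max_workers argument, ThreadPoolExecutor
    # picks a default worker count derived from the CPU count (not exactly one per processor)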
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Create a list of futures for each row of the matrix
        futures = [executor.submit(np.dot, matrix[i,:], vector) for i in range(rows)]
        # Wait for all the futures to complete and get the results
        results = [future.result() for future in futures]
    # Return the results as a numpy array
    return np.array(results)

In [2]:
matrix = np.random.rand(1000, 1000)
vector = np.random.rand(1000)
result = multiply_matrix_vector(matrix, vector)
print(result)

[249.8526153  259.35487218 254.5245441  253.69709141 265.01940892
 253.23445376 258.74167325 249.03551504 252.99711162 261.67590519
 256.51700162 258.07346123 258.02746484 255.38435289 260.66988962
 249.90839551 264.77672491 253.93436994 268.66567771 258.60440363
 249.80757614 257.85182118 255.00254254 255.56849154 250.99422155
 262.39030345 261.27061571 257.10607784 253.97548307 267.27195879
 255.8620838  247.06866571 260.2220469  249.20174919 262.28535395
 260.84217901 259.57077476 260.28707055 252.49844459 252.5930867
 259.80870176 255.31371612 256.82357797 256.46848335 258.71911592
 256.06608234 265.24955066 263.51389502 258.35783552 257.26278209
 261.01984237 255.44841158 250.8018411  254.7821147  252.47053843
 250.84934666 256.41474876 246.21644565 251.94248579 255.69799438
 253.61843085 264.13402882 253.56444854 253.02959728 262.77617031
 260.93004583 248.02311491 259.86677916 253.65057997 252.76062709
 254.96969134 255.75147897 262.9327668  262.1754937  260.23085852
 254.448522

### Algorithm

The algorithm used here is simple matrix-vector multiplication, where each element of the resulting vector is the dot product of a row of the matrix and the input vector. The algorithm is parallelized by breaking the matrix into rows, and computing the dot product of each row and the vector concurrently using a thread pool.
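As a quick sanity check of the row-wise definition, the following minimal sketch compares the thread-pool result with NumPy's built-in product `matrix @ vector`. The small matrix size and the fixed seed are assumptions chosen only to keep the example fast and reproducible.

```python
import concurrent.futures
import numpy as np

def multiply_matrix_vector(matrix, vector):
    # Same one-task-per-row scheme as in the cell above
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(np.dot, row, vector) for row in matrix]
        return np.array([f.result() for f in futures])

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
matrix = rng.random((50, 40))   # small illustrative sizes, not from the original
vector = rng.random(40)

parallel = multiply_matrix_vector(matrix, vector)
reference = matrix @ vector     # NumPy's own matrix-vector product

# Compare with a tolerance, since summation order may differ between the two
print(np.allclose(parallel, reference))
```

The comparison uses `np.allclose` rather than exact equality because the two code paths may accumulate floating-point sums in a different order.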

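This approach is suited to matrices with many more rows than there are workers, so that every worker stays busy. Each submitted task also carries a fixed scheduling cost, however. When the dot product of a single row and the vector is cheap, that cost can outweigh the gain from running rows concurrently, and grouping several rows into one task usually divides the work more effectively. In CPython, threads can speed up this computation only to the extent that NumPy releases the global interpreter lock while computing the dot product.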
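One way to reduce per-task overhead is to hand each worker a block of rows instead of a single row. The sketch below is an illustration under stated assumptions, not the original author's method: the function name `multiply_matrix_vector_chunked` and the parameter `n_chunks` are hypothetical, and the default of 8 blocks is arbitrary.

```python
import concurrent.futures
import numpy as np

def multiply_matrix_vector_chunked(matrix, vector, n_chunks=8):
    # n_chunks is a hypothetical tuning parameter, not part of the original code.
    # Split the matrix into n_chunks nearly equal, contiguous blocks of rows.
    blocks = np.array_split(matrix, n_chunks, axis=0)
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Each task multiplies a whole block of rows by the vector
        partials = list(executor.map(lambda block: block @ vector, blocks))
    # executor.map yields results in input order, so concatenating restores row order
    return np.concatenate(partials)

matrix = np.random.rand(1000, 1000)
vector = np.random.rand(1000)
result = multiply_matrix_vector_chunked(matrix, vector)
print(result.shape)
```

A reasonable starting point is to set `n_chunks` close to the number of workers, so that each worker receives roughly one block; the best value depends on the machine and the matrix size.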

### Complexity and Efficiency

The sequential time complexity of this algorithm is O(mn) for an m × n matrix, which is O(n^2) for the square n × n matrix used above. With p workers the ideal running time is O(mn/p), but the speedup observed in practice depends on the number of processors available and on how the cost of one dot product compares with the fixed cost of submitting a task. In general, efficiency is higher when the number of rows is large relative to the number of workers and each task does enough work to amortize that overhead.
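Because the achieved speedup depends on the machine, it is worth measuring rather than assuming. The following sketch times the per-row thread-pool version against a single `np.dot` call on the same data. The helper names `per_row` and `timed` are hypothetical, and a single run with `time.perf_counter` gives only a rough indication; no particular outcome is claimed.

```python
import concurrent.futures
import time
import numpy as np

def per_row(matrix, vector):
    # One task per row, mirroring the original cell
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(np.dot, row, vector) for row in matrix]
        return np.array([f.result() for f in futures])

def timed(fn, *args):
    # Hypothetical helper: return (result, elapsed seconds) for a single call
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start

matrix = np.random.rand(1000, 1000)
vector = np.random.rand(1000)

threaded, t_threaded = timed(per_row, matrix, vector)
direct, t_direct = timed(np.dot, matrix, vector)

print(f"per-row thread pool: {t_threaded:.4f} s")
print(f"single np.dot call:  {t_direct:.4f} s")
```

For more stable numbers, each call can be repeated several times and the minimum or median taken.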