### Q1)

How many multiplications and additions do you need to multiply an (n, k) matrix by a (k, m) matrix? Explain.

### Answer)

#### Multiplications
- Given matrices A of shape (n, k) and B of shape (k, m), each entry of C = AB is the dot product of one row of A with one column of B.
- One row-column dot product takes `k` multiplications.
- Each row of A is paired with all `m columns` of B, so one row of C takes `k * m` multiplications.
- All the `n rows of A` go through the above, so the total is `n * k * m` multiplications.
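Written entry-wise, each of the `n * m` entries of C = AB is a sum of `k` products, which gives the same count:

$$
C_{ij} = \sum_{l=1}^{k} A_{il}\,B_{lj}, \qquad 1 \le i \le n,\ 1 \le j \le m
$$

$$
\text{multiplications} = \underbrace{n \cdot m}_{\text{entries of } C} \times \underbrace{k}_{\text{products per entry}} = nkm
$$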


#### Additions
- Given A (n, k) and B (k, m) matrices.
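- One row-column dot product sums `k` products, which takes `k - 1` additions.
- Each row of A is paired with all `m columns` of B, so one row of C takes `m * (k - 1)` additions.
- All the `n rows of A` go through the above, so the total is `n * m * (k - 1)` additions.
- An implementation that starts each entry at 0 and accumulates (like the list-of-lists code in Q2) performs `n * m * k` additions, because the initial `0 + x` is counted too.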
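Both counts can be checked empirically. The sketch below uses a hypothetical helper, `matmul_with_counts` (not part of the original notebook), that instruments a list-of-lists multiplication with counters and starts each dot product from its first product, so each entry needs exactly `k - 1` additions. For the n = 5, k = 10, m = 6 shapes used in Q2 it gives 300 multiplications and 270 additions.

```python
def matmul_with_counts(A, B):
    """Multiply list-of-lists A (n, k) by B (k, m), counting scalar operations."""
    n, k, m = len(A), len(B), len(B[0])
    mults = adds = 0
    C = [[0] * m for _ in range(n)]
    for r in range(n):
        for c in range(m):
            total = A[r][0] * B[0][c]   # first product: no addition needed
            mults += 1
            for i in range(1, k):
                total += A[r][i] * B[i][c]
                mults += 1
                adds += 1
            C[r][c] = total
    return C, mults, adds


n, k, m = 5, 10, 6
ones_a = [[1] * k for _ in range(n)]   # (n, k) matrix of ones
ones_b = [[1] * m for _ in range(k)]   # (k, m) matrix of ones
C, mults, adds = matmul_with_counts(ones_a, ones_b)
print(mults, adds)  # 300 270
```

The counters match `n * k * m` and `n * m * (k - 1)` for any shapes, since the loop structure mirrors the argument above.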


### Q2)

Write Python code to multiply the above two matrices. Solve it first using lists of lists and then using numpy. Compare the timing of both solutions. Which one is faster? Why?

### Answer)

```python
# Define the matrices with n = 5, k = 10, m = 6: A is (5, 10), B is (10, 6)
from random import randint

A = [[randint(1000, 2000) for _ in range(10)] for _ in range(5)]
B = [[randint(1000, 2000) for _ in range(6)] for _ in range(10)]
```

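```python
from pprint import pprint


def matrix_multiplication(A, B):
    """Multiply an (n, k) list of lists A by a (k, m) list of lists B."""
    n = len(A)
    k = len(B)      # shared dimension: len(A[0]) must equal len(B)
    m = len(B[0])

    # C is the (n, m) result, initialised with zeros
    C = [[0 for _ in range(m)] for _ in range(n)]

    # row of A
    for r in range(n):
        # column of B
        for c in range(m):
            # dot product of row r of A with column c of B
            for i in range(k):
                C[r][c] += A[r][i] * B[i][c]

    return C


def matrix_multiplication_numpy(A, B):
    import numpy as np

    # np.matmul converts the lists to arrays on every call,
    # so timing this function includes that conversion
    return np.matmul(A, B)


pprint(matrix_multiplication(A, B))
pprint(list(matrix_multiplication_numpy(A, B)))
```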

```
[[22575890, 22800707, 23984723, 23744898, 22688090, 19904627],
 [21936713, 21959146, 23264714, 23062893, 21605418, 19063967],
 [22710404, 22992604, 23886690, 23793250, 22738238, 19313036],
 [19801988, 20222719, 21113283, 21226116, 20149875, 16990289],
 [19439942, 20284745, 20981956, 21474734, 19362069, 17123405]]
[array([22575890, 22800707, 23984723, 23744898, 22688090, 19904627]),
 array([21936713, 21959146, 23264714, 23062893, 21605418, 19063967]),
 array([22710404, 22992604, 23886690, 23793250, 22738238, 19313036]),
 array([19801988, 20222719, 21113283, 21226116, 20149875, 16990289]),
 array([19439942, 20284745, 20981956, 21474734, 19362069, 17123405])]
```


```python
%timeit matrix_multiplication(A, B)
```

```
51 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```


```python
%timeit matrix_multiplication_numpy(A, B)
```

```
9.84 µs ± 98 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```


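Comparing the timings, the numpy solution is faster, by about 5× here (9.84 µs vs 51 µs). `np.matmul` runs the multiply-add loops in compiled C over fixed-width integers stored contiguously in memory, while the pure-Python version sends each of the `n * k * m` multiply-adds through the interpreter, with dynamic type checks, boxed `int` objects, and repeated list indexing. For matrices this small, part of the numpy time is fixed overhead (the function call and converting the lists to arrays), so the gap should widen as the matrices grow.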
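The `%timeit` comparison above calls `np.matmul` on Python lists, so it also measures the list-to-array conversion. A minimal sketch of a fairer comparison, assuming numpy is installed: convert once outside the timed region, and use a `zip`-based pure-Python version (`matmul_zip` is a hypothetical name, not from the original) that avoids repeated indexing. Absolute timings vary by machine, so no numbers are shown.

```python
import timeit
from random import randint

import numpy as np

n, k, m = 5, 10, 6
A_list = [[randint(1000, 2000) for _ in range(k)] for _ in range(n)]
B_list = [[randint(1000, 2000) for _ in range(m)] for _ in range(k)]


def matmul_zip(A, B):
    """Pure-Python product: pair each row of A with each column of B via zip."""
    B_cols = list(zip(*B))  # transpose B once so columns are tuples
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


# Convert once, outside the timed region, so only the product itself is timed
A_np = np.array(A_list)
B_np = np.array(B_list)

# Sanity check: both versions agree
assert matmul_zip(A_list, B_list) == (A_np @ B_np).tolist()

runs = 10_000
t_python = timeit.timeit(lambda: matmul_zip(A_list, B_list), number=runs) / runs
t_numpy = timeit.timeit(lambda: A_np @ B_np, number=runs) / runs
print(f"pure Python: {t_python * 1e6:.2f} µs, numpy (pre-converted): {t_numpy * 1e6:.2f} µs")
```

Separating conversion from computation shows how much of the numpy cost is per-call overhead versus the arithmetic itself.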

### Q3)

Finding the highest element in a list requires one pass of the array. Finding the second highest element requires 2 passes of the array.

1. Using this method, what is the time complexity of finding the median of the array?
1. Can you suggest a better method?
1. Can you implement both these methods in Python and compare them against the `numpy.median` routine in terms of time?


### Answer)

a) Using this method, finding the median means finding the (n/2)-th highest element, which takes about n/2 passes of O(n) each, so the time complexity is O(n^2).
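Part 3 also asks for an implementation of the pass-based method. A minimal sketch under the question's own model (the names `kth_highest_by_passes` and `median_by_passes` are hypothetical): each pass finds the maximum of the remaining elements and removes one occurrence of it, so the j-th highest element costs j passes. It is only practical on small inputs; with 100,000 elements it would need about 50,000 passes.

```python
from random import randint
from typing import List, Sequence

import numpy as np


def kth_highest_by_passes(l: Sequence[int], j: int) -> int:
    """Return the j-th highest element (1-based) using j passes over the data.

    Each pass finds the maximum of the remaining elements and removes one
    occurrence of it, so the total cost is about j * n operations.
    """
    remaining: List[int] = list(l)
    for _ in range(j - 1):
        remaining.remove(max(remaining))  # drop the current maximum
    return max(remaining)


def median_by_passes(l: Sequence[int]) -> float:
    n = len(l)
    if n % 2 == 1:
        return kth_highest_by_passes(l, n // 2 + 1)
    # even length: average the two middle elements
    return 0.5 * (kth_highest_by_passes(l, n // 2) + kth_highest_by_passes(l, n // 2 + 1))


# Check against numpy on a small input
small = [randint(1000, 2000) for _ in range(2001)]
assert median_by_passes(small) == np.median(small)
```

Timing it with `%timeit` next to `median` and `np.median` on inputs of growing size should make the quadratic cost visible.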

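b) A better method is the `quickselect algorithm`, a selection algorithm that finds the k-th smallest element of an unordered list. Like quicksort, it partitions the list around a pivot, but it then recurses into only the side that contains the k-th element, which gives O(n) expected time (O(n^2) in the worst case, when the pivots are consistently bad). Sorting the list and taking the middle element also works, in O(n log n).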
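Why quickselect is linear on average: if each pivot splits the remaining elements roughly in half, the partitioning work is n, then n/2, then n/4, and so on, because only one side is kept. This is the idealised balanced case; a pivot that removes only one element per step gives the quadratic worst case.

$$
\underbrace{n + \tfrac{n}{2} + \tfrac{n}{4} + \cdots}_{\text{balanced pivots}} \le 2n = O(n),
\qquad
\underbrace{n + (n-1) + \cdots + 1}_{\text{worst-case pivots}} = \frac{n(n+1)}{2} = O(n^2)
$$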


c) The quickselect method, checked against `np.median`:

```python
from pprint import pprint
from random import randint
from typing import Sequence

import numpy as np


def quickselect_median(l: Sequence[int]) -> float:
    """Median via quickselect; averages the two middle elements for even lengths."""
    if len(l) % 2 == 1:
        return quickselect(l, len(l) // 2)
    else:
        return 0.5 * (quickselect(l, len(l) // 2 - 1) + quickselect(l, len(l) // 2))


def quickselect(l: Sequence[int], k: int) -> int:
    """Return the k-th smallest element (0-based) of l."""
    if len(l) == 1:
        return l[0]

    # Partition around the first element as pivot.
    # A fixed first-element pivot degrades to O(n^2) on already-sorted input.
    pivot = l[0]
    left = [x for x in l if x < pivot]
    right = [x for x in l if x > pivot]
    middle = [x for x in l if x == pivot]

    if k < len(left):
        return quickselect(left, k)
    elif k < len(left) + len(middle):
        return middle[0]
    else:
        return quickselect(right, k - len(left) - len(middle))


def median(l: Sequence[int]) -> float:
    return quickselect_median(l)


arr = [randint(1000, 2000) for _ in range(100000)]

pprint(median(arr))
pprint(np.median(arr))
```

```
1502.0
1502.0
```


```python
%timeit median(arr)
```

```
68.9 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


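```python
%timeit np.median(arr)
```

```
5.08 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Both methods return the same median, but `np.median` is roughly 13× faster here (5.08 ms vs 68.9 ms). It converts the list to an array once and then finds the middle element(s) with `np.partition`, a partial sort implemented in C, whereas the Python quickselect builds three new lists at every level of recursion.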
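The quickselect implementation above always uses `l[0]` as the pivot and copies the data into three new lists at every level. On already-sorted input with many distinct values, a first-element pivot hits the O(n^2) worst case, and for 100,000 distinct values the recursion would exceed Python's default recursion limit. A minimal sketch of an alternative, assuming a random pivot and in-place three-way (Dutch national flag) partitioning; the names `quickselect_inplace` and `median_inplace` are hypothetical:

```python
import random
from typing import List, Sequence

import numpy as np


def quickselect_inplace(a: List[int], k: int) -> int:
    """Return the k-th smallest element (0-based) of a, reordering a in place.

    Uses a random pivot and three-way partitioning, and loops instead of
    recursing, so the call depth stays constant.
    """
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        pivot = a[random.randint(lo, hi)]
        # Partition a[lo..hi] into: < pivot | == pivot | > pivot
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        # Now a[lo:lt] < pivot, a[lt:gt+1] == pivot, a[gt+1:hi+1] > pivot
        if k < lt:
            hi = lt - 1
        elif k <= gt:
            return pivot
        else:
            lo = gt + 1


def median_inplace(l: Sequence[int]) -> float:
    a = list(l)  # copy once so the caller's list is not reordered
    n = len(a)
    if n % 2 == 1:
        return quickselect_inplace(a, n // 2)
    return 0.5 * (quickselect_inplace(a, n // 2 - 1) + quickselect_inplace(a, n // 2))


data = [random.randint(1000, 2000) for _ in range(100_000)]
assert median_inplace(data) == np.median(data)
assert median_inplace(list(range(100_000))) == 49999.5  # sorted input is fine
```

The random pivot makes the quadratic case unlikely for any fixed input, and partitioning in place avoids allocating new lists; it is still pure Python, so `np.median` should remain faster.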
