In [1]:
version = "REPLACE_PACKAGE_VERSION"

# Assignment 1 Part 1: NumPy Exercises (30 pts)

The [NumPy](https://numpy.org/) library and its n-dimensional arrays in particular are indispensable to applied machine learning in Python, as many popular machine learning libraries, such as [Scikit-Learn](https://scikit-learn.org/stable/), are built on top of NumPy. Having a strong command of NumPy serves as a great stepping stone into the exciting world of applied machine learning. The purpose of this assignment is to help you review, if not learn afresh, some of the common operations and coding patterns associated with `np.ndarray` objects. 

Now, ready for your appetisers?

In [52]:
# Since this is a NumPy exercise, please use NumPy as the only external library in your work
import numpy as np

### Question 1. Array Creation (5 pts)

Write a function to create an `n` by `d` `np.ndarray` of an integer type with numbers from `0` to `k` (exclusive) filled in. The numbers should be arranged in ascending order along the rows. For example, with `k=100`, `n=20` and `d=5`, your function should return:

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       ...
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])
```

**This function should return an integer `np.ndarray` of shape `(n, d)`.**
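
As a small illustration of the fill order described above, the sketch below lays out six numbers in a 2 by 3 array. The sizes and the name `demo` are illustrative assumptions, not part of the assignment.

```python
import numpy as np

# Illustrative sizes (assumption): 6 consecutive integers as 2 rows by 3 columns.
demo = np.arange(6).reshape(2, 3)

# reshape uses row-major (C) order by default, so the numbers run along
# each row before moving on to the next one.
print(demo)
print(demo.dtype)  # an integer dtype; the exact width depends on the platform
```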

In [53]:
def create_array(k, n, d):
    """
    This function returns an n by d matrix with numbers from 0 to k (exclusive) filled in. 
    """
    assert n * d == k, "Q1: The values of n and d are not compatible with the value of k. "
    
    array = np.arange(k).reshape(n, d)
    
    return array

In [54]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = create_array(k, n, d)

assert isinstance(stu_ans, np.ndarray), "Q1: Your function should return a np.ndarray. "
assert stu_ans.shape == (n, d), f"Q1: The shape of your np.ndarray {stu_ans.shape} is not correct. "
assert np.issubdtype(stu_ans.dtype, np.integer), "Q1: Your np.ndarray should be of an integer type. "

# Some hidden tests below


del k, n, d, stu_ans

### Question 2. Row Sum (5 pts)

Complete the function below that returns the sum of all **rows** in a 2-D matrix as a **row** vector. For example, given matrix

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       ...
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])
```

it should return
```
array([ 950,  970,  990, 1010, 1030])
```

**This function should return a `np.ndarray` of shape `(arr.shape[-1], )`, given the input 2D matrix `arr`.**
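
The `axis` argument of a reduction names the axis that gets collapsed, which is easy to get backwards. The sketch below contrasts the two choices on a tiny hand-made matrix; `demo` and the variable names are illustrative assumptions.

```python
import numpy as np

demo = np.array([[1, 2, 3],
                 [4, 5, 6]])

# axis=0 collapses the row axis: the rows are added together element-wise,
# leaving one entry per column.
sum_of_rows = demo.sum(axis=0)

# axis=1 collapses the column axis: each row is reduced to a single number.
sum_within_rows = demo.sum(axis=1)

print(sum_of_rows.shape, sum_within_rows.shape)
```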

In [55]:
def calc_row_sum(arr):
    """
    This function calculates the sum of all rows in a 2D matrix "arr". 
    """
    # axis=0 adds the rows together, leaving one entry per column
    row_sum = np.sum(arr, axis=0)
    
    return row_sum

In [56]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = calc_row_sum(create_array(k, n, d))

assert isinstance(stu_ans, np.ndarray), "Q2: Your function should return a np.ndarray. "
assert stu_ans.shape == (d, ), f"Q2: The shape of your np.ndarray {stu_ans.shape} is not correct. "

# Some hidden tests below using create_array(k, n, d) as input for various k, n and d


del k, n, d, stu_ans

### Question 3. Column Sum (5 pts)

Complete the function below that returns the sum of all **columns** in a 2-D matrix as a **column** vector. For example, given matrix

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       ...
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])
```

it should return
```
array([[ 10],
       [ 35],
       ...
       [460],
       [485]])
```

**This function should return a `np.ndarray` of shape `(arr.shape[0], 1)`, given the input 2D matrix `arr`.**
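
A reduction along `axis=1` returns a 1-D array by default, so getting a column vector takes one extra step. The sketch below shows two equivalent ways to do it without hard-coding the number of rows; `demo` is an illustrative assumption.

```python
import numpy as np

demo = np.arange(6).reshape(2, 3)

flat = demo.sum(axis=1)                    # 1-D result, shape (2,)

# Option 1: keepdims=True keeps the reduced axis as a length-1 axis.
col_a = demo.sum(axis=1, keepdims=True)    # shape (2, 1)

# Option 2: reshape(-1, 1) lets NumPy infer the row count from the data.
col_b = flat.reshape(-1, 1)                # shape (2, 1)

print(flat.shape, col_a.shape, col_b.shape)
```

Either option works for an input with any number of rows, because neither depends on a variable defined outside the function.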

In [57]:
def calc_col_sum(arr):
    """
    This function calculates the sum of all columns in a 2D matrix "arr". 
    """
    # reshape(-1, 1) infers the row count from arr instead of relying on a global n
    col_sum = np.sum(arr, axis=1).reshape(-1, 1)
    
    return col_sum

In [58]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = calc_col_sum(create_array(k, n, d))

assert isinstance(stu_ans, np.ndarray), "Q3: Your function should return a np.ndarray. "
assert stu_ans.shape == (n, 1), f"Q3: The shape of your np.ndarray {stu_ans.shape} is not correct. "

# Some hidden tests below using create_array(k, n, d) as input for various k, n and d


del k, n, d, stu_ans

### Question 4. Sum of Entries from Even Rows and Columns  (5 pts)

Write a function to return the sum of the entries whose **row index and column index are both even** (counting from `0`) in a 2-D matrix as a **single number**. For example, given matrix

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])
```

it should return `6`, because row `0` is the only even-indexed row and its entries at columns `0`, `2` and `4` give `6 = 0 + 2 + 4`. 

**This function should return a single number given the input 2D matrix `arr`.**
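
Slicing with a step is one way to pick out every other index along each axis. The sketch below applies it to the 2 by 5 example above; the names `demo`, `even_block` and `total` are illustrative assumptions.

```python
import numpy as np

demo = np.arange(10).reshape(2, 5)

# start:stop:step with step 2 keeps indices 0, 2, 4, ... along each axis.
even_block = demo[::2, ::2]

# Summing a NumPy array yields a NumPy scalar rather than a built-in int.
total = even_block.sum()

print(even_block, total)
```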

In [59]:
def calc_even_row_col_sum(arr):
    """
    This function calculates the sum of entries with an even row and column index
    """
    return arr[::2,::2].sum()

In [60]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = calc_even_row_col_sum(create_array(k, n, d))

assert isinstance(stu_ans, np.number), "Q4: Your function should return a single number. "

# Some hidden tests below using create_array(k, n, d) as input for various k, n and d


del k, n, d, stu_ans

### Question 5. Row Selection  (5 pts)

Write a function to return the `top_n` **rows** of a 2-D matrix in descending order of their sum of entries. For example, given matrix

```
array([[ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [ 0,  1,  2,  3,  4,  5],
       [24, 25, 26, 27, 28, 29]])
```

and `top_n=3`, it should return
```
array([[24, 25, 26, 27, 28, 29],
       [18, 19, 20, 21, 22, 23],
       [12, 13, 14, 15, 16, 17]])
```

**This function should return a `np.ndarray` of shape `(top_n, arr.shape[-1])`, given the input 2D matrix `arr`.**
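
One possible route is to sort the row indices by row sum and then index the matrix with them. The sketch below rebuilds the 5 by 6 example above and reverses an ascending `np.argsort`; the variable names are illustrative assumptions, and other approaches are equally valid.

```python
import numpy as np

# Rebuild the example matrix: rows of 0..29, reordered as shown above.
demo = np.arange(30).reshape(5, 6)[[1, 2, 3, 0, 4]]

row_sums = demo.sum(axis=1)

# argsort returns indices in ascending order of the sums; reversing the
# result gives descending order, and the slice keeps the first three.
order = np.argsort(row_sums)[::-1][:3]

# Indexing with an integer array selects whole rows in the given order.
top_rows = demo[order]
print(top_rows)
```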

In [61]:
def select_rows(arr, top_n):
    """
    This function selects the top_n rows that have the largest sum of entries
    """
    # Sort row indices by descending row sum, then keep the first top_n of them
    sel_rows = arr[np.argsort(-arr.sum(axis=1))[:top_n]]
    
    return sel_rows

In [62]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
top_n = 94
stu_ans = select_rows(np.random.permutation(create_array(k, n, d)), top_n)

assert isinstance(stu_ans, np.ndarray), "Q5: Your function should return a np.ndarray. "
assert stu_ans.shape == (top_n, d), f"Q5: The shape of your np.ndarray {stu_ans.shape} is not correct. "

# Some hidden tests below using a randomly permuted create_array(k, n, d) as input for various k, n and d


del k, n, d, top_n, stu_ans

### Question 6. Pairwise Cosine Similarity  (5 pts)

Write a function to compute all pairwise cosine similarities of the **row vectors** in a 2-D matrix. For example, given matrix

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
```

it should return
```
array([[1.        , 0.91465912, 0.87845859],
       [0.91465912, 1.        , 0.99663684],
       [0.87845859, 0.99663684, 1.        ]])
```

where the `(i, j)` entry of the result is the cosine similarity between the row vector `arr[i]` and the row vector `arr[j]`:  

```
cos_sim[i, j] == CosSim(arr[i], arr[j]).
```



As usual, the cosine similarity between two vectors $x, y$ is defined as:

\begin{equation*}
\mathrm{CosSim}(x, y) = \frac{x^{T}y}{\left\lVert x \right\rVert_{2} \left\lVert y \right\rVert_{2}} = \left(\frac{x}{\left\lVert x \right\rVert_{2}}\right)^{T}\left(\frac{y}{\left\lVert y \right\rVert_{2}}\right)
\end{equation*}

**This function should return a `np.ndarray` of shape `(arr.shape[0], arr.shape[0])`, given the input 2D matrix `arr`.**
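
The second form of the definition suggests normalising each row once and then taking a single matrix product. The sketch below checks that idea against a direct double loop on the 3 by 5 example above. The variable names are illustrative assumptions, and the sketch assumes no row is all zeros, since such a row would cause a division by zero.

```python
import numpy as np

demo = np.arange(15).reshape(3, 5).astype(float)

# Divide every row by its L2 norm; keepdims=True keeps the norms as a
# column so that broadcasting scales row by row.
unit_rows = demo / np.linalg.norm(demo, axis=1, keepdims=True)

# Entry (i, j) of this product is the dot product of unit rows i and j.
vectorised = unit_rows @ unit_rows.T

# Reference computation straight from the first form of the definition.
naive = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        naive[i, j] = demo[i] @ demo[j] / (
            np.linalg.norm(demo[i]) * np.linalg.norm(demo[j])
        )

print(np.allclose(vectorised, naive))
```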

In [63]:
def calc_pairwise_cos_sim(arr):
    """
    This function computes all pairwise cosine similarity
    """
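    # Scale each row to unit L2 norm (assumes no row is all zeros)
    rows = arr / np.linalg.norm(arr, 2, axis=1).reshape(-1, 1)
    # Dot products between unit-norm rows are exactly the cosine similarities
    cos_sim = np.dot(rows, rows.T)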
    
    return cos_sim

In [64]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = calc_pairwise_cos_sim(create_array(k, n, d))

assert isinstance(stu_ans, np.ndarray), "Q6: Your function should return a np.ndarray. "
assert stu_ans.shape == (n, n), f"Q6: The shape of your np.ndarray {stu_ans.shape} is not correct. "
assert np.isclose(np.diag(stu_ans), 1).all(), "Q6: The diagonal entries of your np.ndarray should all be ones. "

# Some hidden tests below using create_array(k, n, d) as input for various k, n and d


del k, n, d, stu_ans

While this "appetiser" exercise cannot possibly cover all aspects of NumPy, we do hope it has helped you identify your own areas of improvement and stimulated your "appetite" for practising NumPy more!