In [1]:
version = "v2.0.220705.1"

In [2]:
# Either of the following is no longer
# necessary for matplotlib in notebooks.
# The import statement has you covered!

# %matplotlib notebook
# %matplotlib inline

In [3]:
# Suppress all warnings only when absolutely necessary
# Warnings are in place for a reason!
import warnings

# warnings.filterwarnings('ignore')
# warnings.simplefilter('ignore')

# Assignment 1 Part 1: NumPy Exercises (30 pts)

The [NumPy](https://numpy.org/) library and its n-dimensional arrays in particular are indispensable to applied machine learning in Python, as many popular machine learning libraries, such as [Scikit-Learn](https://scikit-learn.org/stable/), are built on top of NumPy. Having a strong command of NumPy serves as a great stepping stone into the exciting world of applied machine learning. The purpose of this assignment is to help you review, if not learn afresh, some of the common operations and coding patterns associated with `np.ndarray` objects. 

Now, ready for your appetizers?

In [4]:
# this is a NumPy exercise, please use NumPy
# as the only external library in your work!

import numpy as np

In [5]:
np.set_printoptions(precision=3)

## Additional imports can be included here

### Question 1. Array Creation (5 pts)

Write a function to create an `n` by `d` `np.ndarray` of an integer type with numbers from `0` to `k` (exclusive) filled in. The numbers should be arranged in ascending order along the rows, so that each row is filled before the next one starts. For example, with `k=100`, `n=20` and `d=5`, your function should return:

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       ...
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])
```

**This function should return an integer `np.ndarray` of shape `(n, d)`.**
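
To make "along the rows" concrete, here is a small sketch contrasting NumPy's default row-major (C) order with column-major (Fortran) order when reshaping. The values `k=6`, `n=2`, `d=3` and the variable names are chosen for illustration only and are not part of the assignment.

```python
import numpy as np

# Small illustrative values (not the ones used by the autograder).
k, n, d = 6, 2, 3

# Default C order: consecutive numbers run along each row.
along_rows = np.arange(k).reshape(n, d)

# Fortran order, shown only for contrast: consecutive numbers run down each column.
along_cols = np.arange(k).reshape(n, d, order="F")

print(along_rows)
print(along_cols)
```

Only the first layout matches what this question asks for.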

In [6]:
def create_array(k, n, d):
    """
    This function returns an n by d matrix with numbers from 0 to k (exclusive) filled in.
    """
    assert (
        n * d == k
    ), "Q1: The values of n and d are not compatible with the value of k. "

    # np.arange(k) yields the integers 0..k-1; reshape fills them in row by row.
    array = np.arange(k).reshape(n, d)

    # YOUR CODE HERE
    # raise NotImplementedError()

    return array

In [7]:
# # use this cell to explore your solution
# # remember to comment out the function call before submitting the notebook

# create_array(9, 3, 3)

In [8]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = create_array(k, n, d)

assert isinstance(stu_ans, np.ndarray), "Q1: Your function should return a np.ndarray. "
assert stu_ans.shape == (
    n,
    d,
), f"Q1: The shape of your np.ndarray {stu_ans.shape} is not correct. "
assert np.issubdtype(
    stu_ans.dtype, np.integer
), "Q1: Your np.ndarray should be of an integer type. "

# Some hidden tests below


del k, n, d, stu_ans

### Question 2. Row Sum (5 pts)

Complete the function below that returns the sum of all **rows** in a 2-D matrix as a **row** vector. For example, given matrix

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       ...
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])
```

it should return
```
array([ 950,  970,  990, 1010, 1030])
```

**This function should return a `np.ndarray` of shape `(arr.shape[-1], )`, given the input 2D matrix `arr`.**
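
The `axis` argument of `sum` is easy to mix up, so the following sketch contrasts the two choices on a tiny array. The array and variable names are illustrative assumptions, not part of the assignment.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# axis=0 adds the rows to each other: one total per column, shape (3,).
sum_of_rows = arr.sum(axis=0)

# axis=1 adds within each row instead: one total per row, shape (2,).
sum_within_rows = arr.sum(axis=1)

print(sum_of_rows.shape, sum_of_rows)
print(sum_within_rows.shape, sum_within_rows)
```

The axis passed to `sum` is the one that disappears from the result's shape.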

In [9]:
def calc_row_sum(arr):
    """
    This function calculates the sum of all rows in a 2D matrix "arr".
    """
    row_sum = None
    # axis=0 adds the rows together, leaving one total per column.
    row_sum = arr.sum(axis=0)

    # YOUR CODE HERE
    #raise NotImplementedError()

    return row_sum

In [10]:
# # use this cell to explore your solution
# # remember to comment out the function call before submitting the notebook

calc_row_sum(create_array(9, 3, 3))

array([ 9, 12, 15])

In [11]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = calc_row_sum(create_array(k, n, d))

assert isinstance(stu_ans, np.ndarray), "Q2: Your function should return a np.ndarray. "
assert stu_ans.shape == (
    d,
), f"Q2: The shape of your np.ndarray {stu_ans.shape} is not correct. "

# Some hidden tests below using create_array(k, n, d) as input for various k, n and d


del k, n, d, stu_ans

### Question 3. Column Sum (5 pts)

Complete the function below that returns the sum of all **columns** in a 2-D matrix as a **column** vector. For example, given matrix

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       ...
       [90, 91, 92, 93, 94],
       [95, 96, 97, 98, 99]])
```

it should return
```
array([[ 10],
       [ 35],
       ...
       [460],
       [485]])
```

**This function should return a `np.ndarray` of shape `(arr.shape[0], 1)`, given the input 2D matrix `arr`.**
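
A reduction along `axis=1` normally returns a flat vector of shape `(n,)`. As a sketch of one way to obtain the `(n, 1)` column shape directly, `sum` accepts `keepdims=True`; the 20 by 5 array below mirrors the example above, and the variable names are illustrative.

```python
import numpy as np

arr = np.arange(100).reshape(20, 5)

# Without keepdims the summed axis is dropped: shape (20,).
flat = arr.sum(axis=1)

# keepdims=True retains the summed axis with length 1: shape (20, 1).
column = arr.sum(axis=1, keepdims=True)

print(flat.shape, column.shape)
```

Reshaping the flat result with `reshape(-1, 1)` gives the same column vector.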

In [12]:
def calc_col_sum(arr):
    """
    This function calculates the sum of all columns in a 2D matrix "arr".
    """
    col_sum = None
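    # axis=1 adds the columns together, leaving one total per row.
    col_sum = arr.sum(axis=1)
    # Turn the flat (n,) result into an (n, 1) column vector.
    col_sum = col_sum.reshape(-1, 1)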

    # YOUR CODE HERE
    #raise NotImplementedError()

    return col_sum

In [13]:
# # use this cell to explore your solution
# # remember to comment out the function call before submitting the notebook

# calc_col_sum(create_array(100, 20,5))

In [14]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = calc_col_sum(create_array(k, n, d))

assert isinstance(stu_ans, np.ndarray), "Q3: Your function should return a np.ndarray. "
assert stu_ans.shape == (
    n,
    1,
), f"Q3: The shape of your np.ndarray {stu_ans.shape} is not correct. "

# Some hidden tests below using create_array(k, n, d) as input for various k, n and d


del k, n, d, stu_ans

### Question 4. Sum of Entries from Even Rows and Columns  (5 pts)

Write a function to return the sum of entries with **an even row and column index** in a 2-D matrix as a **single number**. For example, given matrix

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])
```

it should return `6`, because `6 = 0 + 2 + 4`. 

**This function should return a single number given the input 2D matrix `arr`.**
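
Entries at even row and column indices can also be picked out without explicit loops. The sketch below uses strided slicing with a step of 2 on the example matrix above; the variable names are illustrative assumptions.

```python
import numpy as np

arr = np.arange(10).reshape(2, 5)

# arr[::2, ::2] keeps rows 0, 2, 4, ... and columns 0, 2, 4, ...
even_block = arr[::2, ::2]

# Summing the selected block yields a NumPy scalar.
total = even_block.sum()

print(even_block)
print(total)
```

Here only row 0 has an even index, so the selected block is the single row `[0, 2, 4]`.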

In [15]:
def calc_even_row_col_sum(arr):
    """
    This function calculates the sum of entries with an even row and column index
    """
    even_row_col_sum = 0

    # YOUR CODE HERE
    # Visit every entry and keep those whose row and column indices are both even.
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            if i % 2 == 0 and j % 2 == 0:
                even_row_col_sum += arr[i, j]
    #raise NotImplementedError()

    return even_row_col_sum

In [16]:
# # use this cell to explore your solution
# # remember to comment out the function call before submitting the notebook

# calc_even_row_col_sum(create_array(10, 2, 5))

In [17]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
stu_ans = calc_even_row_col_sum(create_array(k, n, d))

assert isinstance(
    stu_ans, np.number
), "Q4: Your function should return a single number. "

# Some hidden tests below using create_array(k, n, d) as input for various k, n and d


del k, n, d, stu_ans

### Question 5. Row Selection  (5 pts)

Write a function to return the `top_n` **rows** of a 2-D matrix in descending order of their sum of entries. For example, given matrix

```
array([[ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [ 0,  1,  2,  3,  4,  5],
       [24, 25, 26, 27, 28, 29]])
```

and `top_n=3`, it should return
```
array([[24, 25, 26, 27, 28, 29],
       [18, 19, 20, 21, 22, 23],
       [12, 13, 14, 15, 16, 17]])
```

**This function should return a `np.ndarray` of shape `(top_n, arr.shape[-1])`, given the input 2D matrix `arr`.**
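
One possible way to rank rows by their sums is `np.argsort`, which returns the indices that would sort an array in ascending order. The sketch below reuses the example matrix above and negates the sums to get a descending ranking; the variable names are illustrative, and the order of rows with equal sums is left unspecified here, as it is in the question.

```python
import numpy as np

arr = np.array([
    [ 6,  7,  8,  9, 10, 11],
    [12, 13, 14, 15, 16, 17],
    [18, 19, 20, 21, 22, 23],
    [ 0,  1,  2,  3,  4,  5],
    [24, 25, 26, 27, 28, 29],
])
top_n = 3

row_totals = arr.sum(axis=1)

# argsort is ascending, so negating the totals ranks the largest sums first.
order = np.argsort(-row_totals)[:top_n]

# Fancy indexing with the ranked indices pulls out the rows in that order.
top_rows = arr[order]
print(top_rows)
```

Slicing with `[:top_n]` after ranking also behaves sensibly for `top_n=0`, where it selects no rows; a slice such as `[-top_n:]` would select every row in that case.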

In [18]:
def select_rows(arr, top_n):
    """
    This function selects the top_n rows that have the largest sum of entries
    """
    sel_rows = None
    # Rank row indices by row sum, largest first, then keep the first top_n.
    order = np.argsort(arr.sum(axis=1))[::-1][:top_n]
    sel_rows = arr[order]
    # YOUR CODE HERE
    #raise NotImplementedError()

    return sel_rows

In [19]:
# # use this cell to explore your solution
# # remember to comment out the function call before submitting the notebook

# select_rows(create_array(9, 3, 3), 3)

In [34]:
n, d = 542, 42
k = n * d
top_n = 94
stu_ans = create_array(k, n, d)
stu_ans

array([[    0,     1,     2, ...,    39,    40,    41],
       [   42,    43,    44, ...,    81,    82,    83],
       [   84,    85,    86, ...,   123,   124,   125],
       ...,
       [22638, 22639, 22640, ..., 22677, 22678, 22679],
       [22680, 22681, 22682, ..., 22719, 22720, 22721],
       [22722, 22723, 22724, ..., 22761, 22762, 22763]])

In [42]:
np.argsort(stu_ans.sum(axis=0))[::-1][:3]

array([41, 40, 39])

In [20]:
# Autograder tests - sanity checks
n, d = 542, 42
k = n * d
top_n = 94
stu_ans = select_rows(np.random.permutation(create_array(k, n, d)), top_n)

assert isinstance(stu_ans, np.ndarray), "Q5: Your function should return a np.ndarray. "
assert stu_ans.shape == (
    top_n,
    d,
), f"Q5: The shape of your np.ndarray {stu_ans.shape} is not correct. "

# Some hidden tests below using a randomly permuted create_array(k, n, d) as input for various k, n and d


del k, n, d, top_n, stu_ans

### Question 6. Pairwise Cosine Similarity  (5 pts)

Write a function to compute the pairwise cosine similarities of all **row vectors** in a 2-D matrix. For example, given matrix

```
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
```

it should return
```
array([[1.        , 0.91465912, 0.87845859],
       [0.91465912, 1.        , 0.99663684],
       [0.87845859, 0.99663684, 1.        ]])
```

where the `(i, j)` entry of the result is the cosine similarity between the row vector `arr[i]` and the row vector `arr[j]`:  

```
cos_sim[i, j] == CosSim(arr[i], arr[j]).
```



As usual, the cosine similarity between two vectors $x, y$ is defined as:

\begin{equation*}
\mathrm{CosSim}(x, y) = \frac{x^{T}y}{\left\lVert x \right\rVert_{2} \left\lVert y \right\rVert_{2}} = \left(\frac{x}{\left\lVert x \right\rVert_{2}}\right)^{T}\left(\frac{y}{\left\lVert y \right\rVert_{2}}\right)
\end{equation*}

**This function should return a `np.ndarray` of shape `(arr.shape[0], arr.shape[0])`, given the input 2D matrix `arr`.**
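
The second form of the definition suggests a loop-free computation: scale every row to unit length, then take all dot products at once with a matrix product. The sketch below applies this to the example matrix above. It assumes that no row is the zero vector (a zero norm would cause a division by zero), and the variable names are illustrative.

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)

# L2 norm of each row, kept as an (n, 1) column so it broadcasts across the row.
row_norms = np.linalg.norm(arr, axis=1, keepdims=True)

# Divide each row by its own norm; every row now has unit length.
unit_rows = arr / row_norms

# Entry (i, j) of the product is the dot product of unit rows i and j.
cos_sim = unit_rows @ unit_rows.T

print(cos_sim)
```

The result is symmetric with ones on the diagonal, since each row is perfectly aligned with itself.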

In [21]:
def calc_pairwise_cos_sim(arr):
    """
    This function computes all pairwise cosine similarity
    """
    n_rows = arr.shape[0]
    cos_sim = np.zeros(shape=(n_rows, n_rows))
    # Row norms are computed once up front rather than inside the double loop.
    norms = np.linalg.norm(arr, axis=1)
    for i in range(n_rows):
        for j in range(n_rows):
            cos_sim[i, j] = np.dot(arr[i], arr[j]) / (norms[i] * norms[j])

    # YOUR CODE HERE
    #raise NotImplementedError()

    return cos_sim

In [22]:
# use this cell to explore your solution
# remember to comment out the function call before submitting the notebook

calc_pairwise_cos_sim(create_array(9, 3, 3))

array([[1.   , 0.885, 0.843],
       [0.885, 1.   , 0.996],
       [0.843, 0.996, 1.   ]])

In [23]:
# Autograder tests
n, d = 542, 42
k = n * d
stu_ans = calc_pairwise_cos_sim(create_array(k, n, d))

assert isinstance(stu_ans, np.ndarray), "Q6: Your function should return a np.ndarray. "
assert stu_ans.shape == (
    n,
    n,
), f"Q6: The shape of your np.ndarray {stu_ans.shape} is not correct. "
assert np.isclose(
    np.diag(stu_ans), 1
).all(), "Q6: The diagonal entries of your np.ndarray should all be ones. "

# Some hidden tests below using create_array(k, n, d) as input for various k, n and d


del k, n, d, stu_ans

While this "appetizer" exercise cannot possibly cover all aspects of NumPy, we do hope that it has helped you identify your own areas for improvement and stimulated your "appetite" for more NumPy practice and learning!