Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (click the restart button in the tool bar or select Kernel$\rightarrow$Restart in the menu bar) and then **run all cells** (in the menu bar, select Cell$\rightarrow$Run All).

**Make sure you delete `raise NotImplementedError()`** (if existing) and fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE", as well as your name below:

In [63]:
NAME = "Stephen Shell"

**Don't modify the formal TEST cells!**

---

# NumPy continued...

In [64]:
# always import this first
import numpy as np

### 1. Create an array with all 0 inside and 1 on the border (2 points)
Write a function to create a 2D array. This array should have all zero values, except for the elements around the border (i.e., the first and last rows, and the first and last columns), which should have a value of one.

In [65]:
def border(n, m):
    """Creates an array with shape (n, m) that is all zeros
    except for the border (i.e., the first and last rows and
    columns), which should be filled with ones.

    Hint: you should be able to do this in three lines
    (including the return statement)

    Parameters
    ----------
    n, m: int
        Number of rows and number of columns

    Returns
    -------
    numpy array with shape (n, m)

    """
    arr = np.ones(shape=(n, m))
    arr[1:-1,1:-1] = 0
    return arr     
    # raise NotImplementedError()

In [66]:
from numpy.testing import assert_array_equal
from nose.tools import assert_equal

# check a few small examples explicitly
assert_array_equal(border(1, 1), [[1]])
assert_array_equal(border(2, 2), [[1, 1], [1, 1]])
assert_array_equal(border(3, 3), [[1, 1, 1], [1, 0, 1], [1, 1, 1]])
assert_array_equal(border(3, 4), [[1, 1, 1, 1], [1, 0, 0, 1], [1, 1, 1, 1]])

# check a few large and random examples
for i in range(10):
    n, m = np.random.randint(2, 1000, 2)
    result = border(n, m)

    # check dtype and array shape
    assert_equal(result.dtype, float)
    assert_equal(result.shape, (n, m))

    # check the borders
    assert (result[0] == 1).all()
    assert (result[-1] == 1).all()
    assert (result[:, 0] == 1).all()
    assert (result[:, -1] == 1).all()

    # check that everything else is zero
    assert np.sum(result) == (2*n + 2*m - 4)

print("Success!")

Success!


### 2. Subtract the row mean from rows of a matrix (2 points)

Write a function `array_minus_row_mean` that takes in a matrix $F$ and subtracts the mean of each row from that row. Do this without using a loop (that is, using array operations). For example, if the input matrix is

    array([[  0.,   1.,   2.,   3.],
           [  4.,   5.,   6.,   7.],
           [  8.,   9.,  10.,  11.]])

The output should be:

    array([[-1.5, -0.5,  0.5,  1.5],
           [-1.5, -0.5,  0.5,  1.5],
           [-1.5, -0.5,  0.5,  1.5]])
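
As a rough illustration of why no loop is needed, the sketch below reproduces the example above with broadcasting. The variable names (`F`, `row_means`, `centered`) are chosen here for illustration only; the key assumption is that `keepdims=True` is used so the row means keep a column shape.

```python
import numpy as np

# The example matrix from the problem statement.
F = np.arange(12, dtype=float).reshape(3, 4)

# keepdims=True keeps the reduced axis, so the means have shape (3, 1)
# and broadcast across the 4 columns of each row.
row_means = F.mean(axis=1, keepdims=True)
centered = F - row_means

print(row_means.shape)  # (3, 1)
print(centered)
```

Without `keepdims=True` the means would have shape `(3,)`, which does not line up with the last axis of a `(3, 4)` array.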

In [67]:
def array_minus_row_mean(F):
    """Returns input array with the mean of each row subtracted from the respective row.
    
    Does not use a loop but instead uses array operations.
    
    Parameters
    ----------
    F : numpy array
        
    Returns
    -------
    F with mean of each row subtracted from the respective row
    """
    # YOUR CODE HERE
    F = F - F.mean(axis = 1, keepdims = True)
    return F
    # raise NotImplementedError()

In [68]:
"""(1 point) Test code for the previous function. This cell should NOT give any errors when it is run."""

G = np.ones((2, 3))
assert np.array_equal(array_minus_row_mean(G), np.zeros((2, 3)))

print("This test worked! Try the next cell too.")

This test worked! Try the next cell too.


In [69]:
"""(1 point) Test code for the previous function. This cell should NOT give any errors when it is run."""

H = np.array([[1., 2.], [3., 4.]])
Hminus = np.array([[-0.5, 0.5], [-0.5, 0.5]])
assert np.array_equal(array_minus_row_mean(H), Hminus)

print("Success!")

Success!


### 3. Modify the array based on index (5 points)

Write a function, `threshold`, which takes an array and returns a new array with values thresholded by the mean of the original array. The new array will have 1 where values in the original array are greater than the mean, 0 where they are equal to the mean, and -1 where they are less than the mean.
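
To make the idea of boolean indexing concrete, here is a minimal sketch on a made-up four-element array. The names `values` and `signs` are illustrative, and building the result from `np.zeros` (rather than from a copy) is just one possible choice.

```python
import numpy as np

values = np.array([1.0, 0.5, 0.0, 0.5])
mean = values.mean()  # 0.5

# Start from zeros so elements equal to the mean need no extra step.
signs = np.zeros(values.shape)

# Each comparison yields a boolean mask; assignment touches only the True positions.
signs[values > mean] = 1
signs[values < mean] = -1

print(signs)  # [ 1.  0. -1.  0.]
```

Because the masks are computed from `values` and written into `signs`, the original array is left unmodified.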


In [70]:
def threshold(arr):
    """Computes the mean of the given array, and returns a new array which
    is 1 where values in the original array are greater than the mean, 0 where
    they are equal to the mean, and -1 where they are less than the mean.

    Remember that if you want to create a copy of an array, you need to use
    `arr.copy()`.
    
    Hint: your solution should use boolean indexing.
    
    Parameters
    ----------
    arr : numpy.ndarray

    Returns
    -------
    new_arr : thresholded version of `arr`
    
    """
    # YOUR CODE HERE
    arr2 = arr.copy()
    
    mean = arr2.mean()
    
    arr2[arr == mean] = 0
    
    arr2[arr < mean] = -1
    
    arr2[arr > mean] = 1
    
    return arr2
    # raise NotImplementedError()

In [71]:
"""Try a few obvious threshold cases."""
from numpy.testing import assert_array_equal
assert_array_equal(threshold(np.array([1, 2, 1, 1])), np.array([-1, 1, -1, -1]))
assert_array_equal(threshold(np.array([1, 0, 1, 0])), np.array([1, -1, 1, -1]))
assert_array_equal(threshold(np.array([1, 0.5, 0, 0.5])), np.array([1, 0, -1, 0]))
assert_array_equal(
    threshold(np.array([[0.5, 0.2, -0.3, 0.1], [1.7, -3.8, 0.5, 0.6]])), 
    np.array([[1, 1, -1, 1], [1, -1, 1, 1]]))
print("These worked! There are a few more in the next cell...")

These worked! There are a few more in the next cell...


In [72]:
"""Make sure a copy of the array is being returned, and that the original array is unmodified."""
x = np.array([[0.5, 0.2, -0.3, 0.1], [1.7, -3.8, 0.5, 0.6]])
y = threshold(x)
assert_array_equal(x, np.array([[0.5, 0.2, -0.3, 0.1], [1.7, -3.8, 0.5, 0.6]]))
assert_array_equal(y, np.array([[1, 1, -1, 1], [1, -1, 1, 1]]))
print("Success!")

Success!


### 4. Multiply arrays (3 points)

Write a function `multiply_arrays` that takes in two arrays, $O$ and $P$, of shape (5, 5, 3) and (5, 5), respectively, and returns two results: $O$ times $P$, and $P$ times the transpose of $O$. Use the operator `*` for multiplication.
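
The shapes here do not broadcast directly, which the following sketch checks using shapes alone. It assumes NumPy 1.20 or newer for `np.broadcast_shapes`; the variable names are illustrative.

```python
import numpy as np

O_shape, P_shape = (5, 5, 3), (5, 5)

# Shapes are compared from the last axis backwards: 3 vs 5 does not match.
try:
    np.broadcast_shapes(O_shape, P_shape)
    direct_ok = True
except ValueError:
    direct_ok = False

# Two ways to line the axes up (shapes only, no data needed):
# a trailing length-1 axis on P, as in O * P[:, :, np.newaxis] ...
with_new_axis = np.broadcast_shapes(O_shape, P_shape + (1,))
# ... or reversing the axes of O, as in P * O.T
with_transpose = np.broadcast_shapes(P_shape, O_shape[::-1])

print(direct_ok, with_new_axis, with_transpose)
# False (5, 5, 3) (3, 5, 5)
```

The two resulting shapes match what the test cells for this problem expect.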

In [73]:
def multiply_arrays(O, P):
    """Returns O times P and P times O transpose.
    
    Parameters
    ----------
    O: numpy array
        shape 5 x 5 x 3
    P: numpy array
        Shape 5 x 5
    
    Returns
    -------
    (O * P), (P * transpose of O) 
    """
    # YOUR CODE HERE
    # Add a trailing axis to P so it broadcasts across the 3 slices of O.
    O_P = O * P[:, :, np.newaxis]
    # O.T has shape (3, 5, 5); P broadcasts against its last two axes.
    P_TO = P * O.T
    return O_P, P_TO
    # raise NotImplementedError()

In [74]:
"""(1 point) Test code for the previous function. This cell should NOT give any errors when it is run."""

Q = np.random.rand(5, 5, 3)
R = np.random.rand(5, 5)
S, T = multiply_arrays(Q, R)

assert S.shape == (5, 5, 3)


In [75]:
"""(1 point) Test code for the previous function. This cell should NOT give any errors when it is run."""

assert T.shape == (3, 5, 5)

print("Success!")

Success!


### 5. Fitting a line (2 points)

Write a function `xatfive`. Use `numpy` polynomial functions to fit the input values `x` and `y` to a line (i.e., a first-order polynomial), and return the expected value based on that fit at `x=5.0`.
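
As a small sanity check of the fitting approach, the sketch below fits made-up points that lie exactly on $y = 2x + 1$, so the fitted value at $x = 5$ should be close to 11. The data and variable names are invented for illustration.

```python
import numpy as np

# Made-up points lying exactly on y = 2x + 1.
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 2.0 * x_demo + 1.0

# polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(x_demo, y_demo, 1)

# polyval evaluates the fitted polynomial at a given x.
value_at_five = np.polyval([slope, intercept], 5.0)

print(slope, intercept, value_at_five)
```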


In [76]:
def xatfive(x, y):
    
    """Returns the expected fitted value at x = 5.
    
    Parameters
    ----------
    x: 1-D numpy array
    y: 1-D numpy array
    
    Returns
    -------
    Expected fit at x=5 of linear fit of random data x and y
    
    """
    # YOUR CODE HERE
    f = np.polyfit(x, y, 1)
    return np.polyval(f, 5)
    # raise NotImplementedError()

In [77]:
x = np.array([ 0.29646582,  5.9083115 ,  2.97347063,  0.77284422,  9.39502588,
               2.15227687,  6.1158336 ,  4.56733438,  9.9835841 ,  9.72066327])
y = np.array([  1.18875319,  17.82393043,   9.1241461 ,   2.5398729 ,
               28.48371414,   6.5234553 ,  18.49199616,  13.91623748,
               29.9688738 ,  29.41331221])

assert np.allclose(xatfive(x, y), 15.182532735557906)
print("Success!")

Success!


# Graduate student problems below

### 6. Array manipulation (3 points)

Some coding warm-up before you complete the problem below. We have an `experiment_data.npy` file that contains the trial results for all participants at different time slots. The data is 2-D: each row holds one participant's results across all time slots, and each column holds all participants' results for one time slot. We also have an `experiment_participants.npy` file that contains the ids of all participants. This data is 1-D, and its number of elements equals the number of rows in the 2-D data.

In [78]:
data = np.load("experiment_data.npy")

print(data)

print(data.shape)

[[1668.07869346  774.38921876 3161.14983152 ... 2359.05394666
   784.36404676  448.33416341]
 [2419.38185232  809.18389145 2766.62648929 ... 1159.47379735
  1330.44887992 1842.3268586 ]
 [2221.02887591 1496.00517071  354.95889145 ... 1355.74575912
  1205.29137942 1385.71283365]
 ...
 [1654.50469248  518.3271927  5127.58599224 ... 2544.1042064
   624.07607332 1029.57386246]
 [ 480.68016502 4690.12200498 1520.27397139 ... 1000.40541618
   988.73647145  378.43452948]
 [1823.42891807 3680.12951133 3522.94413167 ...  591.4133153
   383.26367525 1768.50528483]]
(50, 300)


In [79]:
participants = np.load("experiment_participants.npy")

print(participants)

print(participants.shape)

['p_045' 'p_039' 'p_027' 'p_023' 'p_041' 'p_008' 'p_025' 'p_019' 'p_036'
 'p_049' 'p_050' 'p_029' 'p_032' 'p_006' 'p_028' 'p_034' 'p_044' 'p_016'
 'p_010' 'p_017' 'p_022' 'p_033' 'p_042' 'p_009' 'p_047' 'p_035' 'p_002'
 'p_014' 'p_020' 'p_043' 'p_003' 'p_012' 'p_030' 'p_015' 'p_011' 'p_018'
 'p_004' 'p_040' 'p_001' 'p_031' 'p_005' 'p_013' 'p_046' 'p_038' 'p_021'
 'p_026' 'p_024' 'p_048' 'p_007' 'p_037']
(50,)


In other words, the first row of `data` corresponds to the first element of `participants` (so participant 45), the second row of `data` was given by participant 39, and so on.

If we wanted to pull out just the responses for participant 2, a natural approach would be to use boolean indexing:

In [80]:
participants == 'p_002'

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False,  True,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False])

In [81]:
data[participants == 'p_002'].shape

(1, 300)

Another way that we could do this would be to determine the index of participant 2, and then use that to index into `data`. To do this, we can use a function called `np.where`, which returns the indices of elements that are true:

In [82]:
np.where(participants == 'p_002')

(array([26], dtype=int64),)

In [83]:
data[np.where(participants == 'p_002')].shape

(1, 300)

#### Now comes the question
Write a function called `participant_mean` that takes as arguments a participant name/id, the data, and the list of participant names/ids, and computes the average trial results for the given participant.

Note that a clear statement should be printed if more than one participant has the given name/id. Even if only one participant currently has the given name, you still need to write this check to future-proof the function. The statement can be something like "more than one participant with id: xxx" (where xxx is the participant name).
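
One possible shape for such a function is sketched below on made-up data. The helper name `participant_mean_sketch` and the demo arrays are hypothetical, and averaging over all matching rows when an id is duplicated is an assumption, not something the problem statement specifies. The case of an id with no match is not handled here.

```python
import numpy as np

def participant_mean_sketch(participant, data, participants):
    """Hypothetical helper: mean over all rows matching `participant`."""
    # np.where returns a tuple with one index array per dimension.
    rows = np.where(participants == participant)[0]
    if rows.size > 1:
        print("more than one participant with id: {}".format(participant))
    # Assumption: duplicate rows are averaged together.
    return data[rows].mean()

demo_data = np.arange(32).reshape((4, 8))
demo_ids = np.array(['a', 'b', 'c', 'a'])  # 'a' is duplicated on purpose

mean_b = participant_mean_sketch('b', demo_data, demo_ids)  # row 1 only
mean_a = participant_mean_sketch('a', demo_data, demo_ids)  # rows 0 and 3, prints the warning
print(mean_b, mean_a)
```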


In [84]:
def participant_mean(participant, data, participants):
    """Computes the mean response for the given participant. 
    
    A clear statement describing the problem should be printed if more than one participant has the given name.
    
    Hint: your solution should use `np.where`.
    
    Parameters
    ----------
    participant: string
        The name/id of the participant
    data: numpy.ndarray with shape (n, m)
        Rows correspond to participants, columns to trials
    participants: numpy.ndarray with shape(n,)
        A string array containing participant names/ids, corresponding to
        the rows of the `data` array.
        
    Returns
    -------
    float: the mean response of the participant over all trials
    
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [85]:
"""Check for correct answers with the example experiment data."""
from numpy.testing import assert_allclose
data = np.load("experiment_data.npy")
participants = np.load("experiment_participants.npy")
assert_allclose(participant_mean('p_002', data, participants), 1857.7013113499095)
assert_allclose(participant_mean('p_047', data, participants), 1906.0651466520821)
assert_allclose(participant_mean('p_013', data, participants), 1718.4379910225193)
print("These work! More in next cell...")

NotImplementedError: 

In [None]:
"""Check for correct answers for some different data."""
data = np.arange(32).reshape((4, 8))
participants = np.array(['a', 'b', 'c', 'd'])
assert_allclose(participant_mean('a', data, participants), 3.5)
assert_allclose(participant_mean('b', data, participants), 11.5)
assert_allclose(participant_mean('c', data, participants), 19.5)
assert_allclose(participant_mean('d', data, participants), 27.5)
print("Success!")