Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your collaborators below:

In [137]:
COLLABORATORS = ""

---

In [138]:
import numpy as np

One of the simplest neural network learning algorithms is Hebbian
learning, in which the weight between two nodes is increased when
those nodes take on the same value, and decreased when they take on
different values. If ${\bf x} = (x_1, \ldots, x_n)^T$ and
${\bf y} = (y_1, \ldots, y_m)^T$ are $n \times 1$ and $m \times 1$
vectors representing the inputs and outputs to a neural network
respectively, the weights of the neural network can be expressed in an
$m \times n$ matrix ${\bf W}$. The network's predicted output
$\mathbf{\hat{y}}$ is then:

\begin{equation}      
\mathbf{\hat{y}}  = \mathbf{Wx}
\end{equation}

We train the neural network by providing it with a set of input-output
pairs, $({\bf x},{\bf y})$. Hebbian learning adjusts the weights using
the following equation for each input-output pair:

\begin{equation}         
\Delta\mathbf{W} = \eta \mathbf{y}\mathbf{x}^{T}
\end{equation} 
        
In other words, the change in the weight matrix $\mathbf{W}$ is determined by
the outer product of the output and input vectors, multiplied by the
learning rate $\eta$. Then, the updated weight matrix equals the old
weight matrix plus $\Delta\mathbf{W}$.

\begin{equation}         
\mathbf{\hat{W}} = \mathbf{W} + \Delta \mathbf{W}
\end{equation} 

Another way to think about this is that each weight $w_{ij}$, which connects input neuron $j$ to output neuron $i$, is updated based on the correlation between their values:

\begin{equation}
\Delta w_{ij} = \eta y_i x_j
\end{equation}
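
As a quick sanity check, the sketch below compares the outer-product form of the update with an explicit loop over the indices $i$ and $j$. The vectors `x` and `y` and the value of `eta` are made up for illustration and are not part of the assignment.

```python
import numpy as np

eta = 0.25                     # illustrative learning rate
x = np.array([1., -1., 1.])    # hypothetical input vector, n = 3
y = np.array([1., -1.])        # hypothetical output vector, m = 2

# Matrix form: Delta W = eta * y x^T, an m x n matrix
deltaW = eta * np.outer(y, x)

# Element-wise form: the entry in row i, column j is eta * y_i * x_j
deltaW_loop = np.zeros((len(y), len(x)))
for i in range(len(y)):
    for j in range(len(x)):
        deltaW_loop[i, j] = eta * y[i] * x[j]

same = np.allclose(deltaW, deltaW_loop)
print(deltaW.shape, same)
```

Both forms give the same $m \times n$ matrix; the outer product is simply the vectorized way of writing the double loop.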

The Hebb rule was inspired by people thinking about how neuronal interactions may enable learning, and was confirmed by experiments many years later. It is, therefore, a good example of how considering a problem at the algorithmic level can guide the investigation of the implementational, and, vice versa, how discoveries at the implementational level can confirm thoughts about the suitability of algorithmic descriptions of processes.

Read more about the motivation and formulation behind the Hebb rule here: https://en.wikipedia.org/wiki/Hebbian_theory

Independent of its neuronal interpretations and basis, the Hebb rule forms the basis of many systems for *unsupervised* learning, as it organizes networks based solely on the statistics of the input, rather than on any signal from a teacher.


---

## Part A (2 points)

Assume we want to learn what noises animals make based on their
appearance. We might list four input features: chasing sticks, liking water, having whiskers, and being furry, and then represent dogs with
${\bf x}_{\rm dog} = ( 1, 1, 1, 1)^T$ and cats with
${\bf x}_{\rm cat} = (-1,-1,1,1)^T$. Likewise, we could have four
output features: hissing, barking, neighing, and growling, with
${\bf y}_{\rm dog} = (-1,1,-1,1)^T$ and
${\bf y}_{\rm cat} = (1, -1, -1, 1)^T$. Then
$(\mathbf{x}_{\rm dog},\mathbf{y}_{\rm dog})$ would be the
input-output pair for dogs. Note that 1 indicates a logical <code>TRUE</code> and -1 indicates a logical <code>FALSE</code>.

In [139]:
inputFeatures = ['chases sticks', 'likes water','whiskers','furry']
outputFeatures = ['hisses','barks','neighs','growls']

xDog = np.array([ 1., 1., 1., 1. ])
xCat = np.array([ -1., -1., 1., 1. ])

yDog =  np.array([ -1., 1., -1., 1. ])
yCat = np.array([ 1., -1., -1., 1. ])

trainingInputs = np.column_stack((xDog, xCat))
trainingOutputs = np.column_stack((yDog, yCat))

# W is m x n: one row per output feature, one column per input feature
W = np.zeros((len(yDog), len(xDog)))

<div class="alert alert-success">Complete the function definition <code>updateWeights</code> so that it takes the current weight matrix $\mathbf{W}$, a matrix of training data inputs, and a matrix of training data outputs as parameters and returns a matrix with the updated weights using Hebbian learning on the given
training data.</div>

The matrix of training data inputs should have training instances as
its columns. For example, if there were two training instances,
$\mathbf{x}_{\rm dog}$ and $\mathbf{x}_{\rm cat}$, the matrix would
have two columns:
$\left[\mathbf{x}_{\rm dog},\mathbf{x}_{\rm cat}\right]$. The matrix
of training data outputs should have two columns corresponding to
output instances:
$\left[\mathbf{y}_{\rm dog},\mathbf{y}_{\rm cat}\right]$. The learning rate $\eta$ is set at $.25$ inside <code>updateWeights</code> as a default.
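
With instances stored as columns, applying the Hebbian update once per training pair gives the same total change as a single matrix product of the output matrix with the transposed input matrix. The sketch below checks this on the dog and cat data; the names `X`, `Y`, `W_loop`, and `W_batch` are illustrative only.

```python
import numpy as np

eta = 0.25
xDog = np.array([1., 1., 1., 1.])
xCat = np.array([-1., -1., 1., 1.])
yDog = np.array([-1., 1., -1., 1.])
yCat = np.array([1., -1., -1., 1.])

X = np.column_stack((xDog, xCat))   # 4 x 2, one training instance per column
Y = np.column_stack((yDog, yCat))   # 4 x 2, matching outputs per column

# One outer-product update per training pair, accumulated from zero weights
W_loop = np.zeros((Y.shape[0], X.shape[0]))
for k in range(X.shape[1]):
    W_loop += eta * np.outer(Y[:, k], X[:, k])

# The same total update written as one matrix product
W_batch = eta * (Y @ X.T)

print(np.allclose(W_loop, W_batch))
print(W_batch)
```

This is why the update can be written in a single line: the product sums the outer products over all columns.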

In [140]:
def updateWeights(W, trainingInputs, trainingOutputs, eta = 0.25):
    """
    Updates the current weight matrix W using Hebbian learning.
    
    Hint: your solution can be done in a single line of code, 
    including the return statement.
    
    Parameters
    ----------
    W: the current weight matrix (before this update)
    trainingInputs:  a matrix where each column represents a set of 
        input features
    trainingOutputs: a matrix where each column represents a set of 
        output features
    eta: the learning rate, set at 0.25 by default    

    Returns
    -------
    the updated weight matrix after the network has seen 
        the given training data.

    """
    return W + (eta * np.dot(trainingOutputs, trainingInputs.T))

In [141]:
updatedW = updateWeights(W,trainingInputs,trainingOutputs)
print(W)
print(updatedW)

[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
[[-0.5 -0.5  0.   0. ]
 [ 0.5  0.5  0.   0. ]
 [ 0.   0.  -0.5 -0.5]
 [ 0.   0.   0.5  0.5]]


In [142]:
# add your own test cases here!


In [143]:
"""Check that `updateWeights` produces expected output."""
from nose.tools import assert_equal
from numpy.testing import assert_allclose

testW0 = np.zeros((4,4))
testInputs1 = np.array([[1, -1, 1],[1, -1, 1],[1, 1, 1],[1, 1, 1]])
testOutputs1 = np.array([[-1, 1, 1],[1, -1, 1],[-1, -1, 1],[1, 1, 1]])

testW1 = updateWeights(testW0, testInputs1[:,[0,1]],testOutputs1[:,[0,1]])

"""Confirm that the output is an array""" 
assert(isinstance(testW1, np.ndarray))

"""Check if dimensions are the same as the original weights""" 
assert_equal(testW1.shape, (4, 4))

"""Check if weight matrix sums to 0, if trained on first two examples""" 
assert_equal(testW1.sum(),0)

testW2 = updateWeights(testW0, testInputs1, testOutputs1)

"""Check if weight matrix sums to 4, if trained on all three examples""" 
assert_equal(testW2.sum(),4)

testW3 = updateWeights(np.zeros((3,3)), testInputs1[0:3,], testOutputs1[0:3,])

"""Check if dimensions are the same as the original weights""" 
assert_equal(testW3.shape, (3, 3))

"""Check if the weight matrix is correct: Input 1""" 
testW4 = updateWeights(testW0, testInputs1,testOutputs1)
assert_allclose(testW4,np.array(
       [[-0.25, -0.25,  0.25,  0.25],
       [ 0.75,  0.75,  0.25,  0.25],
       [ 0.25,  0.25, -0.25, -0.25],
       [ 0.25,  0.25,  0.75,  0.75]]))

"""Check if the weight matrix is correct: Input 2""" 
test_xDog = np.array([ 1., 1., 1., 1. ])
test_xCat = np.array([ -1., -1., 1., 1. ])
test_yDog =  np.array([ -1., 1., -1., 1. ])
test_yCat = np.array([ 1., -1., -1., 1. ])

testInputs2 = np.column_stack((test_xDog, test_xCat))
testOutputs2 = np.column_stack((test_yDog, test_yCat))

testW5 = updateWeights(testW0, testInputs2,testOutputs2)
assert_allclose(testW5,np.array(
       [[-0.5, -0.5,  0. ,  0. ],
       [ 0.5,  0.5,  0. ,  0. ],
       [ 0. ,  0. , -0.5, -0.5],
       [ 0. ,  0. ,  0.5,  0.5]]))

print("Success!")

Success!


---

## Part B (1 point)

<div class="alert alert-success">
Calculate the network's predicted output
$\mathbf{\hat{y}}$ for inputs ${\bf x}_{\rm dog}$ and
${\bf x}_{\rm cat}$. Complete the function definition in <code>getPredictions</code>, which takes ${\bf x}_{\rm dog}$, ${\bf x}_{\rm cat}$, and <code>updatedW</code> as parameters and returns a dictionary with the predicted binary feature arrays for cat (for the key 
<code>yhatCat</code>) and dog (<code>yhatDog</code>).

**Hint:** To double-check that your code is functioning correctly,
these predicted outputs should be equal to their respective training
data outputs, because the training input vectors are orthogonal and
$\eta \, \|\mathbf{x}\|^2 = 0.25 \times 4 = 1$ for each of them.
</div>
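
The reasoning behind the hint can be checked numerically. In the sketch below (variable names such as `crossTerm` and `selfTerm` are illustrative), the cross term between the two inputs vanishes, and the learning rate times the squared length of an input is 1, so each stored output is recalled exactly.

```python
import numpy as np

eta = 0.25
xDog = np.array([1., 1., 1., 1.])
xCat = np.array([-1., -1., 1., 1.])
yDog = np.array([-1., 1., -1., 1.])
yCat = np.array([1., -1., -1., 1.])

# Weights after one Hebbian pass over both pairs, starting from zeros
W = eta * (np.outer(yDog, xDog) + np.outer(yCat, xCat))

crossTerm = xDog @ xCat          # inner product of the two inputs
selfTerm = eta * (xDog @ xDog)   # eta times the squared length of xDog

# W xDog = eta * (xDog . xDog) * yDog + eta * (xCat . xDog) * yCat
recallDog = W @ xDog
recallCat = W @ xCat

print(crossTerm, selfTerm)
print(recallDog, recallCat)
```

If the inputs were not orthogonal, the cross term would mix some of the other animal's output into each prediction.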

In [144]:
def getPredictions(xDog, xCat, updatedW):   
    """
    Gets predicted binary feature vectors for cats and dogs
    
    Hint: your solution can be done in one or two lines of code, 
    including the return statement. 
    
    Parameters
    ----------
    xDog: the feature vector representing the input features for dog
    xCat: the feature vector representing the input features for cat
    updatedW: the weight matrix after training
 
    Returns
    -------
    a dictionary with two key-value pairs
        'yhatCat' ; the predicted binary features for cat 
        'yhatDog' ; the predicted binary features for dog 

    """    
    
    # YOUR CODE HERE
    return {"yhatDog": np.dot(updatedW, xDog), "yhatCat" : np.dot(updatedW, xCat)}

In [145]:
predictions = getPredictions(xDog, xCat, updatedW)
print(predictions)


{'yhatDog': array([-1.,  1., -1.,  1.]), 'yhatCat': array([ 1., -1., -1.,  1.])}


In [146]:
# add your own test cases here!

In [147]:
"""Confirm that the output is a dictionary""" 
testPredictions = getPredictions(test_xDog, test_xCat, testW5)
assert_equal(type(testPredictions), dict)

"""Check if the keys are defined""" 
assert_equal('yhatCat' in testPredictions and 'yhatDog' \
    in testPredictions, True)

"""Check that the vectors are binary, with 1 and -1"""
assert(testPredictions['yhatCat'].prod() == 1 or  \
       testPredictions['yhatCat'].prod() == -1)
assert(testPredictions['yhatDog'].prod() == 1 or \
       testPredictions['yhatDog'].prod() == -1)

"""Check that the vectors are of the right length"""
assert_equal(len(testPredictions['yhatCat']), 4)
assert_equal(len(testPredictions['yhatDog']), 4)

"""Check that the predicted output is the same as the true output"""
assert(all(testPredictions['yhatCat'] == test_yCat))
assert(all(testPredictions['yhatDog'] == test_yDog))


"""Check the predicted output for arbitrary input"""
testInput1 = np.array([-1, -1, -1, -1])
testInput2 = np.array([1, 1, -1, -1])
testPredictions2 = getPredictions(testInput1, testInput2, updatedW)
assert_allclose(testPredictions2['yhatCat'], \
                np.array([-1.,  1.,  1., -1.]))
assert_allclose(testPredictions2['yhatDog'], \
                np.array([ 1., -1.,  1., -1.]))

print("Success!")

Success!


---
## Part C (1 point)

Now, imagine you saw an animal that was definitely furry and had
whiskers, but you weren't sure about whether it liked water or chased
sticks. We could represent the features of this animal as the (convex) combination:
\begin{equation}
{\bf x}_{\rm unknown} = \alpha {\bf x}_{\rm dog} + (1-\alpha) {\bf x}_{\rm cat}
\end{equation}
for some value of $\alpha$ between $0$ and $1$. In other words, because we aren't certain of certain features in the ${\bf x}_{\rm unknown}$ vector, we can use various values of $\alpha$ to quantify our uncertainty.
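
To see what this mixture looks like, the sketch below builds ${\bf x}_{\rm unknown}$ for a few values of $\alpha$. The helper `mixInputs` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

xDog = np.array([1., 1., 1., 1.])
xCat = np.array([-1., -1., 1., 1.])

def mixInputs(alpha, xA, xB):
    """Convex combination alpha * xA + (1 - alpha) * xB (illustrative helper)."""
    return alpha * xA + (1 - alpha) * xB

# alpha = 0 recovers the cat, alpha = 1 recovers the dog
for alpha in (0.0, 0.5, 1.0):
    print(alpha, mixInputs(alpha, xDog, xCat))
```

The last two entries (whiskers, furry) stay at 1 for every $\alpha$, while the first two (chases sticks, likes water) move between -1 and 1, which matches the description of an animal whose fur and whiskers are certain but whose other traits are not.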

<div class="alert alert-success">
Using the value of ${\bf W}$ computed in Part A, what is the
predicted network output $\mathbf{\hat{y}}$ for different inputs
${\bf x}_{\rm unknown}$ with $\alpha = 0.2$, $0.5$,
and $0.8$? Complete the function <code>getWeightedPredictions</code>, and return a dictionary that stores these predictions in the keys
<code>yhatAnimalX2</code>, <code>yhatAnimalX5</code>, and
<code>yhatAnimalX8</code>.
</div>

In [148]:
def getWeightedPredictions(xDog, xCat, updatedW):   
    """
    Get predicted binary feature vectors given a series of part-cat, 
    part-dog inputs, corresponding to alpha values of .2, .5 and .8 
    
    Hint: your solution can be done in about three lines of code, 
    including the return statement. 
    
    Parameters
    ----------
    
    xDog: the feature vector representing the input features for dog
    xCat: the feature vector representing the input features for cat
    updatedW: the weight matrix after training
 
    Returns
    -------
    a dictionary with three key-value pairs
        'yhatAnimalX2' ; the predicted binary features when alpha = .2 
        'yhatAnimalX5' ; the predicted binary features when alpha = .5 
        'yhatAnimalX8' ; the predicted binary features when alpha = .8 

    """    
    # Mixed inputs: alpha * xDog + (1 - alpha) * xCat
    xAnimalX2 = (0.2 * xDog) + (1 - 0.2) * xCat
    xAnimalX5 = (0.5 * xDog) + (1 - 0.5) * xCat
    xAnimalX8 = (0.8 * xDog) + (1 - 0.8) * xCat
    # Predicted outputs: yhat = W x for each mixed input
    return {"yhatAnimalX2": np.dot(updatedW, xAnimalX2), 
            "yhatAnimalX5": np.dot(updatedW, xAnimalX5), 
            "yhatAnimalX8": np.dot(updatedW, xAnimalX8)}

In [149]:
weightedPredictions = getWeightedPredictions(xDog, xCat, updatedW)
print(weightedPredictions)

{'yhatAnimalX2': array([ 0.6, -0.6, -1. ,  1. ]), 'yhatAnimalX5': array([ 0.,  0., -1.,  1.]), 'yhatAnimalX8': array([-0.6,  0.6, -1. ,  1. ])}


In [150]:
# add your own test cases here!

In [151]:
testWeightedPredictions = getWeightedPredictions(test_xDog, test_xCat,\
    testW5)

"""Confirm that the output is a dictionary""" 
assert_equal(type(testWeightedPredictions), dict)

"""Check if the keys are defined""" 
assert_equal('yhatAnimalX2' in testWeightedPredictions and \
             'yhatAnimalX5' in testWeightedPredictions and \
             'yhatAnimalX8' in testWeightedPredictions , True)

"""Check that the vectors are of the right length"""
assert_equal(len(testWeightedPredictions['yhatAnimalX2']), 4)
assert_equal(len(testWeightedPredictions['yhatAnimalX5']), 4)
assert_equal(len(testWeightedPredictions['yhatAnimalX8']), 4)

"""Check the output values for .2, .5, and .8"""
assert_allclose(testWeightedPredictions['yhatAnimalX2'], \
    np.array([ 0.6, -0.6, -1. ,  1. ]))
assert_allclose(testWeightedPredictions['yhatAnimalX5'], \
    np.array([ 0.,  0., -1.,  1.]))
assert_allclose(testWeightedPredictions['yhatAnimalX8'], \
    np.array([-0.6,  0.6, -1. ,  1. ]))

print("Success!")

Success!


## Part D (2 points)

<div class="alert alert-success">
Interpret these predictions in terms of the noises the animal might
make. This subquestion is worth **1.5 points**, so give a detailed answer, stating your predictions and interpreting them fully.
</div>

For the animal with $\alpha = 0.2$, the prediction is [0.6, -0.6, -1, 1]. This animal therefore probably hisses, probably does *not* bark, does not neigh, and growls. The prediction makes sense to me: with $\alpha = 0.2$ the input is 20% dog and 80% cat, so I would expect the animal to lean toward hissing and away from barking, while never neighing and always growling.

For the animal with $\alpha = 0.5$, the prediction is [0, 0, -1, 1]. The outputs for hissing and barking sit exactly halfway between TRUE and FALSE, so the animal has roughly a 50% chance of hissing and a 50% chance of barking; it does not neigh, and it growls. With $\alpha = 0.5$ the input is an even mix of dog and cat, so I think the prediction is reasonable: the network is undecided about the two features on which dogs and cats disagree, and certain about not neighing and about growling because both animals share those features.

For the animal with $\alpha = 0.8$, the prediction is [-0.6, 0.6, -1, 1]. This animal therefore probably does *not* hiss, probably barks, does not neigh, and growls. With $\alpha = 0.8$ the input is 80% dog and 20% cat, so the network's predictions are sensible: the hissing and barking outputs lean toward the dog's values by about as much as the input leans toward the dog, while the shared features (not neighing, growling) are unchanged.
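
When reading these vectors, it can help to pair each value with its feature name. A minimal sketch, using the $\alpha = 0.2$ prediction copied from the Part C output (the name `labelled` is illustrative):

```python
import numpy as np

outputFeatures = ['hisses', 'barks', 'neighs', 'growls']
yhat = np.array([0.6, -0.6, -1.0, 1.0])   # prediction for alpha = 0.2, from Part C

# Map each output feature to its predicted value (1 = TRUE, -1 = FALSE)
labelled = dict(zip(outputFeatures, yhat.tolist()))
print(labelled)
```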

<div class="alert alert-success">
How does the kind of solution the neural network produces differ
from the kind of thing we might expect from an account based on rules
and symbols? (**0.5 points**)
</div>

The solution from the neural network is continuous, whereas an account based on rules and symbols would produce a discrete answer. The network can therefore express graded confidence given the available data, which makes it better suited to ambiguous inputs (here, animals with some mix of dog-like and cat-like features). A rules-and-symbols approach would have to classify the animal as one category or the other, and so would be less able to capture the gradation between the features.

---

Before turning this problem in, remember to do the following steps:

1. **Restart the kernel** (Kernel$\rightarrow$Restart)
2. **Run all cells** (Cell$\rightarrow$Run All)
3. **Save** (File$\rightarrow$Save and Checkpoint)

<div class="alert alert-danger">After you have completed these three steps, ensure that the following cell has printed "No errors". If it has <b>not</b> printed "No errors", then your code has a bug in it and has thrown an error! Make sure you fix this error before turning in your problem set.</div>

In [152]:
print("No errors!")

No errors!
