# CS231n Convolutional Neural Networks for Visual Recognition
http://cs231n.github.io/


In [5]:
import numpy as np

## Matrix Multiplication

1. Make sure that the number of columns in the first matrix equals the number of rows in the second.
2. Multiply the elements of row i of the first matrix by the corresponding elements of column j of the second matrix.
3. Add the products; the sum is entry (i, j) of the result.
4. The output has as many rows as the first matrix and as many columns as the second.
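The four steps above can be sketched in plain Python, without NumPy. `matmul` is a hypothetical helper name, and the matrices are assumed to be lists of equal-length row lists:

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    # Step 1: the number of columns of a must equal the number of rows of b
    if any(len(row) != m for row in a):
        raise ValueError("columns of a must equal rows of b")
    # Steps 2-3: entry (i, j) is the sum of products of row i of a and column j of b
    # Step 4: the result has n rows (from a) and p columns (from b)
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


x = [[2, 3, 4], [5, 6, 7]]        # 2 x 3
y = [[2, 3], [3, 5], [6, 7]]      # 3 x 2
print(matmul(x, y))               # [[37, 49], [70, 94]]
```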

In [6]:
x = np.matrix( ((2,3,4), (5,6,7)) )        # 2x3
y = np.matrix( ((2, 3), (3,5), (6,7)) )    # 3x2

x * y


matrix([[37, 49],
        [70, 94]])
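NumPy's documentation no longer recommends `np.matrix`, even for linear algebra. A minimal sketch of the same product with regular `ndarray`s and the `@` operator (Python 3.5+); note that on plain arrays `*` is element-wise, not a matrix product:

```python
import numpy as np

x = np.array([[2, 3, 4], [5, 6, 7]])      # 2 x 3
y = np.array([[2, 3], [3, 5], [6, 7]])    # 3 x 2

product = x @ y                            # matrix product, same as np.dot(x, y)
print(product.shape)                       # (2, 2)
print(product)
# [[37 49]
#  [70 94]]
```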

## Linear classifier
http://cs231n.github.io/linear-classify/

One interpretation of the weights W is that each row of W corresponds to a template (sometimes also called a prototype) for one of the classes.

In [30]:
W = np.matrix( [[0.2, -0.5, 0.1, 2.0], #cat
                [1.5, 1.3, 2.1, 0.0],  #dog
               [0, 0.25, 0.2, -0.3]] )  #ship
x = np.matrix( [[56], 
                [231], 
                [24], 
                [2]] ) 

b = np.matrix ([[1.1], 
               [3.2],
               [-1.2]])

W*x + b


matrix([[ -96.8 ],
        [ 437.9 ],
        [  60.75]])
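To make the template interpretation concrete: each class score is the dot product of that class's row of W with x, plus the class's bias. The sketch below assumes 1-D arrays for x and b (rather than `np.matrix` columns) for readability; it scores each class row by row and checks the result against the single matrix expression:

```python
import numpy as np

W = np.array([[0.2, -0.5, 0.1, 2.0],    # cat template
              [1.5, 1.3, 2.1, 0.0],     # dog template
              [0.0, 0.25, 0.2, -0.3]])  # ship template
x = np.array([56, 231, 24, 2])          # flattened image (4 pixels)
b = np.array([1.1, 3.2, -1.2])
classes = ["cat", "dog", "ship"]

# Score each class separately: how well does x match that class's template?
per_class = np.array([W[k].dot(x) + b[k] for k in range(len(classes))])

# The matrix form W x + b computes all three dot products at once
scores = W.dot(x) + b
assert np.allclose(per_class, scores)

print(classes[int(np.argmax(scores))])  # dog
```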

It is a little cumbersome to keep track of two sets of parameters (the biases b and the weights W) separately. A commonly used trick is to combine them into a single matrix by extending the vector x_i with one additional dimension that always holds the constant 1 (a default bias dimension) and appending b to W as an extra column. With the extra dimension, the score function simplifies to a single matrix multiply: W x_i instead of W x_i + b.

In [28]:
W = np.matrix( [[0.2, -0.5, 0.1, 2.0, 1.1], #cat + bias
                [1.5, 1.3, 2.1, 0.0, 3.2],  #dog + bias
               [0, 0.25, 0.2, -0.3, -1.2]] ) #ship + bias
                 
x = np.matrix( [[56], 
                [231], 
                [24], 
                [2],
               [1]] ) #default bias 

W*x


matrix([[ -96.8 ],
        [ 437.9 ],
        [  60.75]])
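Instead of typing the augmented matrices by hand, they can be assembled from W and b: append b as an extra last column of W and append a 1 to x. A minimal sketch using `np.hstack`/`np.vstack` on plain arrays (an assumption; the cells above use `np.matrix`):

```python
import numpy as np

W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([[1.1], [3.2], [-1.2]])        # 3 x 1 biases
x = np.array([[56], [231], [24], [2]])      # 4 x 1 input

W_aug = np.hstack([W, b])                   # 3 x 5: bias becomes the last column
x_aug = np.vstack([x, [[1]]])               # 5 x 1: constant 1 in the last row

# Both forms give the same scores
assert np.allclose(W_aug @ x_aug, W @ x + b)
print(W_aug.shape, x_aug.shape)             # (3, 5) (5, 1)
```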

## Linear Classification Demo
http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/

## Multi-class Support Vector Machine Loss
The SVM loss is set up so that the SVM “wants” the correct class for each image to have a score higher than the incorrect classes by some fixed margin Δ.
http://cs231n.github.io/linear-classify/#loss
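Written out, the per-example loss that the functions below compute is, with s = W x_i the vector of class scores, y_i the index of the correct class and Δ the margin:

$$
L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right)
$$

A wrong class j adds nothing once its score is at least Δ below the correct-class score; otherwise it adds the amount by which it violates the margin.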



 

In [16]:
def L_i(x, y, W):
  """
  Unvectorized version. Compute the multiclass SVM loss for a single example (x, y)
  - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
    with an appended bias dimension in the 3073rd position (i.e. bias trick)
  - y is an integer giving the index of the correct class (e.g. between 0 and 9 in CIFAR-10)
  - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
  """
  delta = 1.0 # the margin Δ (see the cs231n notes linked above)
  scores = W.dot(x) # scores becomes of size 10 x 1, the scores for each class
  correct_class_score = scores[y]
  D = W.shape[0] # number of classes, e.g. 10
  loss_i = 0.0
  for j in range(D): # iterate over all classes
    if j == y:
      # skip the true class to only loop over incorrect classes
      continue
    # accumulate loss for the i-th example
    loss_i += max(0, scores[j] - correct_class_score + delta)
  return loss_i

In [17]:
W = np.matrix( [[0.2, -0.5, 0.1, 2.0, 1.1], #cat + bias
                [1.5, 1.3, 2.1, 0.0, 3.2],  #dog + bias
               [0, 0.25, 0.2, -0.3, -1.2]] ) #ship + bias
                 
x = np.matrix( [[56], 
                [231], 
                [24], 
                [2],
               [1]] ) #default bias 
y = 0
print(L_i(x, y, W))

y = 1
print(L_i(x, y, W))

y = 2
print(L_i(x, y, W))


[[ 694.25]]
0.0
[[ 378.15]]
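As a check on the numbers above: the earlier cell gives the scores s = (−96.8, 437.9, 60.75) for (cat, dog, ship), and Δ = 1. Summing over the two wrong classes for each choice of correct class y:

$$
\begin{aligned}
y=0:&\quad \max(0,\ 437.9+96.8+1) + \max(0,\ 60.75+96.8+1) = 535.7 + 158.55 = 694.25\\
y=1:&\quad \max(0,\ -96.8-437.9+1) + \max(0,\ 60.75-437.9+1) = 0 + 0 = 0\\
y=2:&\quad \max(0,\ -96.8-60.75+1) + \max(0,\ 437.9-60.75+1) = 0 + 378.15 = 378.15
\end{aligned}
$$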


In [19]:
def L_i_vectorized(x, y, W):
  """
  A faster half-vectorized implementation. half-vectorized
  refers to the fact that for a single example the implementation contains
  no for loops, but there is still one loop over the examples (outside this function)
  """
  delta = 1.0
  scores = W.dot(x)
  # compute the margins for all classes in one vector operation
  margins = np.maximum(0, scores - scores[y] + delta)
  # at the y-th position scores[y] - scores[y] cancels, leaving delta. We want
  # to ignore the y-th position and only sum the margins of the wrong classes
  margins[y] = 0
  loss_i = np.sum(margins)
  return loss_i
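To see what the one-line vector operation produces, here is the intermediate margins array for the cat/dog/ship scores with y = 0. The sketch assumes a plain 1-D array of scores rather than the `np.matrix` column used in the cells here:

```python
import numpy as np

scores = np.array([-96.8, 437.9, 60.75])   # cat, dog, ship
y, delta = 0, 1.0                          # correct class is cat

raw = np.maximum(0, scores - scores[y] + delta)  # one margin per class
print(raw)                                 # raw[y] == delta, since scores[y] cancels
margins = raw.copy()
margins[y] = 0                             # ignore the correct class
loss = margins.sum()
print(round(loss, 2))                      # 694.25
```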

In [69]:
W1 = np.matrix( [[0.2, -0.5, 0.1, 2.0, 1.1], #cat + bias
                [1.5, 1.3, 2.1, 0.0, 3.2],  #dog + bias
               [0, 0.25, 0.2, -0.3, -1.2]] ) #ship + bias

W2 = np.matrix( [[0.1, -0.4, 0.6, 1.0, 2.1], #cat + bias
                [1.1, 1.7, 1.1, 0.1, 2.2],  #dog + bias
               [1, 0.55, 0.4, -0.5, -1.7]] ) #ship + bias

W = np.vstack([W1, W2])  # 6 x 5: rows 0-2 are W1, rows 3-5 are W2

x1 = np.matrix( [[56], 
                [231], 
                [24], 
                [2],
               [1]] ) #default bias 
x2 = np.matrix( [[231], 
                [76], 
                [12], 
                [1],
               [1]] ) #default bias 
X = np.column_stack([x1, x2])

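y = [0, 1]

# demo only: example i is scored with its own 3 x 5 block of W (W1 for x1, W2 for x2)
for i in range(2):
    print(L_i_vectorized(
        X[:, i:i+1],          # i-th example as a 5 x 1 column
        y[i],                 # its correct class
        W[3*i:3*(i+1), :]     # rows 3i .. 3i+2 of W
        ))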


694.25
0.0



A fully-vectorized implementation works on all examples at once:
- X holds all the training examples as columns (e.g. 3073 x 50,000 in CIFAR-10)
- y is an array of integers giving the correct class of each example (e.g. a 50,000-D array)
- W holds the weights (e.g. 10 x 3073)

1. Calculate the matrix product W.dot(X), giving a (classes x examples) score matrix.
2. Build a row vector with the correct-class score of each example: for column n, take scores[y[n], n].
3. Subtract each example's correct score from every score in that example's column (broadcasting). CHECK: the correct-class elements are now 0.
4. Apply max(0, difference + delta) to each element, with delta = 1.
5. The correct-class elements are now equal to delta; set them back to 0.
6. Sum all elements of the matrix and divide by the number of examples to get the average loss.
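The steps can be traced on a small score matrix: 3 classes x 2 examples, columns are examples as in the layout above. The values are the scores the cat/dog/ship weights W1 give for x1 and x2 (computed by hand, so treat them as illustrative input). The key trick for step 2 is integer-array indexing `scores[y, np.arange(N)]`, which picks one entry per column:

```python
import numpy as np

# Step 1 result: scores = W.dot(X), one column per example (classes x examples)
scores = np.array([[-96.8, 12.5],     # cat
                   [437.9, 473.7],    # dog
                   [60.75, 19.9]])    # ship
y = np.array([0, 1])                  # correct class of each example
delta = 1.0
N = scores.shape[1]
cols = np.arange(N)

correct = scores[y, cols]             # Step 2: [scores[0, 0], scores[1, 1]]
diff = scores - correct               # Step 3: broadcasts over rows; correct entries are 0
margins = np.maximum(0, diff + delta) # Step 4
margins[y, cols] = 0                  # Step 5: correct entries were delta
loss = margins.sum() / N              # Step 6: average over examples
print(round(loss, 3))                 # 347.125
```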


In [99]:
def L(X, y, W):
    """
    Fully-vectorized multiclass SVM loss, evaluated over all examples without any for loops
    - X is D x N, one example per column (e.g. 3073 x 50000)
    - y holds the N correct class indices
    - W is C x D (e.g. 10 x 3073)
    Returns the loss averaged over the N examples.
    """
    delta = 1.0
    X = np.asarray(X)   # also accept np.matrix input
    W = np.asarray(W)
    scores = W.dot(X)   # C x N, e.g. 10 x 50000
    N = X.shape[1]      # number of examples, e.g. 50000
    cols = np.arange(N)
    correct_class_scores = scores[y, cols]   # shape (N,): scores[y[n], n]
    margins = np.maximum(0, scores - correct_class_scores + delta)  # broadcasts down each column
    margins[y, cols] = 0   # the correct class contributed delta; remove it
    return np.sum(margins) / N

In [100]:
# A linear classifier uses one W for every example. Stacking W1 and W2 into a
# 3-D array made W.dot(X) fail with "ValueError: shape too large to be a matrix",
# so a single 3 x 5 W (the cat/dog/ship weights W1) is used here.
W = np.array( [[0.2, -0.5, 0.1, 2.0, 1.1],   #cat + bias
               [1.5, 1.3, 2.1, 0.0, 3.2],    #dog + bias
               [0, 0.25, 0.2, -0.3, -1.2]] ) #ship + bias

x1 = np.matrix( [[56], 
                [231], 
                [24], 
                [2],
                [1]] ) #default bias 
x2 = np.matrix( [[231], 
                [76], 
                [12], 
                [1],
                [1]] ) #default bias 
X = np.column_stack([x1, x2])

y = [0, 1]

print(L(X, y, W))   # (694.25 + 0.0) / 2, i.e. about 347.125
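A common sanity check for a fully-vectorized loss is to compare it with a per-example loop on random data. The sketch below re-implements both versions on plain arrays; the names `svm_loss_loop` and `svm_loss_vectorized` are hypothetical, and the sizes and random weights are arbitrary choices for the check:

```python
import numpy as np


def svm_loss_loop(X, y, W, delta=1.0):
    """Average multiclass SVM loss, one example (column of X) at a time."""
    N = X.shape[1]
    total = 0.0
    for n in range(N):
        scores = W.dot(X[:, n])                       # C scores for example n
        margins = np.maximum(0, scores - scores[y[n]] + delta)
        margins[y[n]] = 0
        total += margins.sum()
    return total / N


def svm_loss_vectorized(X, y, W, delta=1.0):
    """Same loss with no Python loops: X is D x N, y has length N, W is C x D."""
    N = X.shape[1]
    cols = np.arange(N)
    scores = W.dot(X)                                 # C x N
    margins = np.maximum(0, scores - scores[y, cols] + delta)
    margins[y, cols] = 0
    return margins.sum() / N


rng = np.random.default_rng(0)
C, D, N = 10, 3073, 50                                # CIFAR-10-like sizes, few examples
W = rng.standard_normal((C, D)) * 0.001
X = rng.standard_normal((D, N))
y = rng.integers(0, C, size=N)

loop_loss = svm_loss_loop(X, y, W)
vec_loss = svm_loss_vectorized(X, y, W)
print(np.isclose(loop_loss, vec_loss))                # True
```

If the two disagree, the usual suspects are the indexing in step 2 and forgetting to zero the correct-class entries.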