# Motivation
I created this notebook to point out common mistakes made by students working on the __"Build Your First Neural Network"__ project.

__Errors in this notebook are intentional. Do not fix them.__

### Linear Algebra refresher
Before we dive into deep neural networks, let's review some linear algebra and some numpy gotchas.

__This tutorial focuses on numpy programming, not math.__  
__For a mathematical refresher, see [this great tutorial](https://www.youtube.com/watch?v=sYlOjyPyX3g&t=419s)__

In [1]:
from numpy import array, dot

In [18]:
# First
# dot(A, B) != A * B
A = array([[1,2,3]])
B = array([1,2,3])
dot_C = dot(A, B)
element_wise_c = A * B
print(dot_C)
print(element_wise_c)

# dot is the dot product
#  *  is the element-wise product

[14]
[[1 4 9]]
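
One way to keep the two operations apart is to look at the shapes they return. The sketch below is an addition to the original cell and reuses its `A` and `B`: `dot` sums over the shared axis of length 3, while `*` broadcasts `B` across the single row of `A`.

```python
from numpy import array, dot

A = array([[1, 2, 3]])   # shape (1, 3): a 2-D row matrix
B = array([1, 2, 3])     # shape (3,):   a 1-D vector

dot_C = dot(A, B)        # sums over the shared axis of length 3
element_wise_C = A * B   # broadcasts B over the single row of A

print(A.shape, B.shape)
print(dot_C.shape)           # (1,)
print(element_wise_C.shape)  # (1, 3)
```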


In [5]:
# But there is a gotcha
A = array([[1,2,3]])
B = array([[1,2,3]])

# This line will fail, because it makes no sense to take the dot product of two matrices with shapes (1, 3) and (1, 3)
dot_C = dot(A, B)

ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)

In [7]:
# However
A = array([[1,2,3]])
B = array([[1,2,3]])

# This gives the same value as B = [1,2,3], but the result has shape (1, 1) instead of (1,).
# Mathematically, this is the correct way.
dot_C = dot(A, B.T)
print(dot_C)

# dot([[1,2,3]], [1,2,3]) works because numpy treats a 1-D second argument specially:
# it sums over the last axis of A and B. Mathematically speaking, it makes less sense.

[[14]]
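
The two forms return the same number but not the same shape, which matters once the result is fed into the next operation. A small sketch, with the hypothetical names `B_row` and `B_vec` standing in for the two ways of writing `B`:

```python
from numpy import array, dot

A = array([[1, 2, 3]])        # shape (1, 3)
B_row = array([[1, 2, 3]])    # shape (1, 3), needs .T to align with A
B_vec = array([1, 2, 3])      # shape (3,), 1-D

as_matrix = dot(A, B_row.T)   # (1, 3) dot (3, 1) -> (1, 1)
as_vector = dot(A, B_vec)     # (1, 3) dot (3,)   -> (1,)

print(as_matrix, as_matrix.shape)
print(as_vector, as_vector.shape)
```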


In [8]:
# Now, let's consider bigger matrices
A = array([
        [1,2,3],
        [4,5,6]
    ])
B = array([
        [7,8,9,10],
        [0,1,2,3]
    ])


# First, let's see what the transpose (.T) of a matrix looks like
print(A.T)
print(B.T)

[[1 4]
 [2 5]
 [3 6]]
[[ 7  0]
 [ 8  1]
 [ 9  2]
 [10  3]]


In [9]:
X = dot(A.T, B)
Y = dot(B.T, A)

print('X:')
print(X)
print()
print('Y:')
print(Y)

X:
[[ 7 12 17 22]
 [14 21 28 35]
 [21 30 39 48]]

Y:
[[ 7 14 21]
 [12 21 30]
 [17 28 39]
 [22 35 48]]


In [25]:
# You see, X and Y are just transposes of each other
print(X.T == Y)
print(X == Y.T)
print(Y == X.T)
print(Y.T == X)

[[ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]]
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]]
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]
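
This is the general transpose rule for products: the transpose of `dot(P, Q)` equals `dot(Q.T, P.T)`. A compact check, added here as a sketch, uses `array_equal` to get a single boolean instead of a grid of `True` values:

```python
from numpy import array, array_equal, dot

A = array([[1, 2, 3], [4, 5, 6]])          # (2, 3)
B = array([[7, 8, 9, 10], [0, 1, 2, 3]])   # (2, 4)

X = dot(A.T, B)   # (3, 2) dot (2, 4) -> (3, 4)
Y = dot(B.T, A)   # (4, 2) dot (2, 3) -> (4, 3)

print(X.shape, Y.shape)
print(array_equal(X.T, Y))   # True
```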


In [10]:
# Also consider
A = array([
        [1, 4],
        [2, 5],
        [3, 6],
    ])
B = array([
        [7,8,9,10],
        [0,1,2,3]
    ])

X = dot(A, B)
print(X)

[[ 7 12 17 22]
 [14 21 28 35]
 [21 30 39 48]]


In [11]:
# But this will fail, because the shapes don't match
Y = dot(B, A)
print(Y)

ValueError: shapes (2,4) and (3,2) not aligned: 4 (dim 1) != 3 (dim 0)

In [12]:
# Therefore, dot(A, B) != dot(B, A)
# also consider
A = array([
        [1, 4],
        [2, 5],
        [3, 6],
    ])
B = array([
        [7, 8, 9],
        [1, 2, 3]
    ])
X = dot(A, B)
Y = dot(B, A)
print(X)
print(Y)

# As you can see,
# just because two operations with the same arguments in a different order both run without error,
# it doesn't mean they compute the same thing.

# But for *, A * B == B * A whenever the shapes are compatible, because it's just element-wise!

[[11 16 21]
 [19 26 33]
 [27 36 45]]
[[ 50 122]
 [ 14  32]]
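
Note that `*` only commutes when it can run at all. With the `A` and `B` from this cell, the shapes (3, 2) and (2, 3) cannot be broadcast together, so `A * B` raises a `ValueError`. The sketch below shows both cases; `C` is a hypothetical stand-in with the same shape as `A`.

```python
from numpy import array, array_equal

A = array([[1, 4], [2, 5], [3, 6]])   # (3, 2)
B = array([[7, 8, 9], [1, 2, 3]])     # (2, 3)

try:
    A * B                 # shapes (3, 2) and (2, 3) cannot broadcast
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)       # False

C = B.T                   # (3, 2), same shape as A
print(array_equal(A * C, C * A))   # True
```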


In [42]:
# Now, let's consider a scalar or a 1-unit vector/matrix

# scalar
A = 10
B = array([
        [7, 8, 9],
        [1, 2, 3]
    ])
print(A * B)
print(B * A)
print(dot(A, B))
print(dot(B, A))

[[70 80 90]
 [10 20 30]]
[[70 80 90]
 [10 20 30]]
[[70 80 90]
 [10 20 30]]
[[70 80 90]
 [10 20 30]]


In [46]:
# 1 unit vector
A = array([10])
B = array([
        [7, 8, 9],
        [1, 2, 3]
    ])
print(A * B)
print(B * A)
print(dot(A, B))
print(dot(B, A))

[[70 80 90]
 [10 20 30]]
[[70 80 90]
 [10 20 30]]


ValueError: shapes (1,) and (2,3) not aligned: 1 (dim 0) != 2 (dim 0)

In [47]:
# 1 unit matrix
A = array([[10]])
B = array([
        [7, 8, 9],
        [1, 2, 3]
    ])
print(A * B)
print(B * A)
print(dot(A, B))
print(dot(B, A))

[[70 80 90]
 [10 20 30]]
[[70 80 90]
 [10 20 30]]


ValueError: shapes (1,1) and (2,3) not aligned: 1 (dim 1) != 2 (dim 0)

In [56]:
# But what if B has only one row?
# scalar
A = array(10)
B = array([[7, 8, 9]])
print(A * B)
print(B * A)
print(dot(A, B))
print(dot(B, A), '\n')

# 1-unit vector
print('1 unit vector')
A = array([10])
B = array([[7, 8, 9]])
print(A * B)
print(B * A)
print(dot(A, B))
print(dot(B, A))

[[70 80 90]]
[[70 80 90]]
[[70 80 90]]
[[70 80 90]] 

1 unit vector
[[70 80 90]]
[[70 80 90]]
[70 80 90]


ValueError: shapes (1,3) and (1,) not aligned: 3 (dim 1) != 1 (dim 0)

In [55]:
# 1-unit matrix
print('1 unit matrix')
A = array([[10]])
B = array([[7, 8, 9]])
print(A * B)
print(B * A)
print(dot(A, B))
print(dot(B, A))

1 unit matrix
[[70 80 90]]
[[70 80 90]]
[[70 80 90]]


ValueError: shapes (1,3) and (1,1) not aligned: 3 (dim 1) != 1 (dim 0)

In [None]:
# Have you noticed that dot([10], [[7,8,9]]) != dot([[10]], [[7,8,9]])?
# The first returns shape (3,), the second returns shape (1, 3).

In [59]:
# Now, what if B is also a vector?
A = array(10)
B = array([7, 8, 9])
print(A * B)
print(B * A)
print(dot(A, B))
print(dot(B, A), '\n')

A = array([10])
B = array([7, 8, 9])
print(A * B)
print(B * A)
print(dot(A, B))
print(dot(B, A))

[70 80 90]
[70 80 90]
[70 80 90]
[70 80 90] 

[70 80 90]
[70 80 90]


ValueError: shapes (1,) and (3,) not aligned: 1 (dim 0) != 3 (dim 0)

In [19]:
# The above is another gotcha!!!
# The matrix dot product only makes sense when both operands are matrices,
# that is, when each has shape (x, y), even if x or y is 1!

print("And I will stop here. That's enough numpy matrix multiplication")

And I will stop here. That's enough numpy matrix multiplication
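
One possible way to avoid these surprises, sketched here as a suggestion rather than as part of the original project, is to promote every operand to 2-D before calling `dot`. `atleast_2d` turns a 1-D array of length n into a (1, n) row matrix.

```python
from numpy import array, atleast_2d, dot

A = atleast_2d(array([10]))        # (1,)  -> (1, 1)
B = atleast_2d(array([7, 8, 9]))   # (3,)  -> (1, 3)

print(A.shape, B.shape)
print(dot(A, B))        # (1, 1) dot (1, 3) -> (1, 3)
print(dot(B.T, A))      # (3, 1) dot (1, 1) -> (3, 1)
```

With both operands 2-D, every `dot` call either follows the usual (m, k) by (k, n) rule or fails loudly.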


### Now, let's dive into deep neural networks
__Just simple fully connected neural nets. No convolution, no recurrence.__  
For example, suppose layer A and layer B are adjacent and B receives its input from A.

A has 10 nodes and B has 15 nodes.

Then there are 150 connections between A and B. Each connection is an edge if you think of it as a computational graph.

Therefore, B receives 150 values (scalars) from A. Each node in B receives 10 values from A, and each node in A outputs 15 values to B.

Therefore, whether it is a forward pass or a backward pass, 150 values are computed between A and B in each pass.

The forward pass computes the activations. The backward pass computes the gradients and the weight updates.

These 150 edges carry 150 weights, one weight each.
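
The shapes described above can be sketched as follows. Everything here is illustrative: the names `a`, `W`, `delta`, `grad_W` and `back_to_a` are hypothetical, the values are placeholders, and the activation function is left out so that only the shapes are in focus.

```python
from numpy import arange, dot, ones

a = arange(1, 11).reshape(1, 10)   # output of layer A: one sample, 10 nodes
W = ones((10, 15))                 # one weight per edge: 10 * 15 = 150

z = dot(a, W)                      # forward: (1, 10) dot (10, 15) -> (1, 15)

delta = ones((1, 15))              # hypothetical error signal arriving at B
grad_W = dot(a.T, delta)           # weight gradient: (10, 1) dot (1, 15) -> (10, 15)
back_to_a = dot(delta, W.T)        # error sent back to A: (1, 15) dot (15, 10) -> (1, 10)

print(W.size, z.shape, grad_W.shape, back_to_a.shape)
```

The weight gradient has the same shape as `W`, one value per edge, which is a useful sanity check in both directions.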

In [65]:
# Therefore, assuming
A = array([[1,2,3,4,5,6,7,8,9,10]])
B = array([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]])
# it only makes semantic sense to write
whatever_been_computed = dot(A.T, B)
print(whatever_been_computed)
# or
whatever_been_computed = dot(B.T, A)
print(whatever_been_computed)

# whether you are computing the weights, gradients, errors, or whatever intermediate values between 2 layers.

[[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15]
 [  2   4   6   8  10  12  14  16  18  20  22  24  26  28  30]
 [  3   6   9  12  15  18  21  24  27  30  33  36  39  42  45]
 [  4   8  12  16  20  24  28  32  36  40  44  48  52  56  60]
 [  5  10  15  20  25  30  35  40  45  50  55  60  65  70  75]
 [  6  12  18  24  30  36  42  48  54  60  66  72  78  84  90]
 [  7  14  21  28  35  42  49  56  63  70  77  84  91  98 105]
 [  8  16  24  32  40  48  56  64  72  80  88  96 104 112 120]
 [  9  18  27  36  45  54  63  72  81  90  99 108 117 126 135]
 [ 10  20  30  40  50  60  70  80  90 100 110 120 130 140 150]]
[[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
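 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]
 [ 11  22  33  44  55  66  77  88  99 110]
 [ 12  24  36  48  60  72  84  96 108 120]
 [ 13  26  39  52  65  78  91 104 117 130]
 [ 14  28  42  56  70  84  98 112 126 140]
 [ 15  30  45  60  75  90 105 120 135 150]]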

In [67]:
# You can't store A and B in plain vectors
A = array([1,2,3,4,5,6,7,8,9,10])
B = array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])

# because A.T has no effect on a 1-D array
print(A.T)
# and this will fail, and it hurts
dot(A.T, B)

[ 1  2  3  4  5  6  7  8  9 10]


ValueError: shapes (10,) and (15,) not aligned: 10 (dim 0) != 15 (dim 0)
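
If the data arrives as plain 1-D vectors, one option is to reshape each into a row matrix first. The sketch below is an addition, with the hypothetical names `A_row` and `B_row`; it also compares the result against numpy's `outer`, which computes the same 10 by 15 table directly from 1-D inputs.

```python
from numpy import array, array_equal, dot, outer

A = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])                       # (10,)
B = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])   # (15,)

A_row = A.reshape(1, -1)   # (1, 10)
B_row = B.reshape(1, -1)   # (1, 15)

result = dot(A_row.T, B_row)   # (10, 1) dot (1, 15) -> (10, 15)
print(result.shape)
print(array_equal(result, outer(A, B)))   # True
```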

In [75]:
# But, if one of your layers has only 1 node
A = array([[1]])
B = array([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]])

print(A * B)
print(B * A)
print(dot(A, B))
print(dot(A.T, B))
print(dot(B.T, A)) # oh, and the transpose of this one is just
print(dot(B.T, A).T) # which equals all of the first 4 results

[[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]]
[[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]]
[[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]]
[[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]]
[[ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]
 [11]
 [12]
 [13]
 [14]
 [15]]
[[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]]


__And the above example is the reason some students start trying crazy combinations of `*`, `dot` and `.T`__

Eventually, they will probably get the shapes to match. But does that mean they understand how it actually works?

Also, even if the shapes match, the value could be wrong.

In [94]:
# For example
A = array([
        [1,2,3],
        [4,5,6]
    ])
B = array([
        [7,8,9],
        [0,2,1]
    ])

print(dot(A.T, B))
print(dot((A * B).T, B))

[[ 7 16 13]
 [14 26 23]
 [21 36 33]]
[[ 49  56  63]
 [112 148 154]
 [189 228 249]]
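
A shape check alone cannot tell these two results apart. One way to catch the difference, sketched here as a suggestion, is to compare against a slow but explicit reference. The loop below spells out what `dot(A.T, B)` means element by element; `expected`, `right` and `wrong` are hypothetical names.

```python
from numpy import array, array_equal, dot, zeros

A = array([[1, 2, 3], [4, 5, 6]])
B = array([[7, 8, 9], [0, 2, 1]])

# Reference: expected[i, j] = sum over k of A[k, i] * B[k, j]
expected = zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        for k in range(2):
            expected[i, j] += A[k, i] * B[k, j]

right = dot(A.T, B)
wrong = dot((A * B).T, B)

print(right.shape == wrong.shape)      # True
print(array_equal(right, expected))    # True
print(array_equal(wrong, expected))    # False
```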


### End
I hope this helps you clear your mind and understand exactly what happens when you implement neural networks with numpy as your matrix manipulation tool.