<a href="https://colab.research.google.com/github/rahiakela/grokking-deep-learning/blob/3-forward-propagation/forward_propagation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A Simple Neural Network Making a Prediction

## Step 1: Predict


You learned about the paradigm predict, compare, learn. In this guide, we’ll dive deep into the first step: **predict**.
<img src="https://github.com/rahiakela/img-repo/blob/master/predict1.JPG?raw=1" width="800"/>

You’ll learn more about what these three different parts of a neural network
prediction look like under the hood. Let’s start with the first one: the **data**.
<img src="https://github.com/rahiakela/img-repo/blob/master/data-predict-1.JPG?raw=1" width="800"/>

Later, you’ll find that the number of datapoints you process at a time has a significant
impact on what a network looks like. You might be wondering, “How do I choose how
many datapoints to propagate at a time?” The answer is based on whether you think the
neural network can be accurate with the data you give it.

For example, if I’m trying to predict whether there’s a cat in a photo, I definitely need to
show my network all the pixels of an image at once. Why? Well, if I sent you only one
pixel of an image, could you classify whether the image contained a cat? Me neither!

That’s a general rule of thumb, by the way: always present enough information to the
network, where “enough information” is defined loosely as how much a human might
need to make the same prediction.

As it turns out, you can create a network only after
you understand the shape of the input and output datasets (for now, shape means “number
of columns” or “number of datapoints you’re processing at once”).

<img src="https://github.com/rahiakela/img-repo/blob/master/win-probability.JPG?raw=1" width="800"/>

Now that you know you want to take one input datapoint and output one prediction, you
can create a neural network. Because you have only one input datapoint and one output
datapoint, you’re going to build a network with a single knob mapping from the input point
to the output. (Abstractly, these “knobs” are actually called weights.)

So, without further ado, here’s your first neural network, with a
single weight mapping from the input “# toes” to the output “win?”:

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-network.JPG?raw=1" width="800"/>

### An empty network

In [0]:
weight = 0.1

def neural_network(input, weight):
  # Multiplying input by weight
  prediction = input * weight
  return prediction

### Inserting one input datapoint and depositing the prediction

<img src="https://github.com/rahiakela/img-repo/blob/master/input-datapoints.JPG?raw=1" width="800"/>

<img src="https://github.com/rahiakela/img-repo/blob/master/input-weights.JPG?raw=1" width="800"/>

In [0]:
number_of_toes = [8.5, 9.5, 10, 9]

input = number_of_toes[0]
pred = neural_network(input, weight)
print(pred)

0.8500000000000001


You just made your first neural network and used it to predict! Congratulations! The prediction is 0.85 (Python prints 0.8500000000000001 because of floating-point rounding). So what is a neural network? For now, it’s one or
more weights that you can multiply by the input data to make a prediction.
<img src="https://github.com/rahiakela/img-repo/blob/master/prediction.JPG?raw=1" width="800"/>

### How does the network learn?

Trial and error! First, it tries to make a prediction. Then, it sees whether the prediction was too
high or too low. 

Finally, it changes the weight (up or down) to predict more accurately the
next time it sees the same input.

### What does this neural network do?

**It multiplies the input by a weight. It “scales” the input by a certain amount.**

In the previous section, you made your first prediction with a neural network. A neural network,
in its simplest form, uses the power of multiplication. It takes an input datapoint (in this case,
8.5) and multiplies it by the weight. If the weight is 2, then the neural network will double the
input. If the weight is 0.01, then the network will divide the input by 100. As you can see, some
weight values make the input bigger, and other values make it smaller.
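As a small sketch of this scaling behavior, the snippet below reuses the single-weight network with the two weight values mentioned above. The parameter name x is only an illustrative stand-in for the input datapoint.

```python
def neural_network(x, weight):
    # A single-weight network: the prediction is the input scaled by the weight
    return x * weight

number_of_toes = 8.5

doubled = neural_network(number_of_toes, 2)     # a weight of 2 doubles the input
shrunk = neural_network(number_of_toes, 0.01)   # a weight of 0.01 divides it by 100

print(doubled)
print(shrunk)
```

The first value is larger than the input and the second is much smaller, which is all that “scaling” means here.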


#### Conclusion

The interface for a neural network is simple. It accepts an input variable as information and a
weight variable as knowledge and outputs a prediction. 

Every neural network you’ll ever see
works this way. It uses the knowledge in the weights to interpret the information in the input
data. 

Later neural networks will accept larger, more complicated input and weight values, but
this same underlying premise will always ring true.

Notice several things. 

* First, the neural network does not have access to any information
except one instance. 
* If, after this prediction, you were to feed in number_of_toes[1], the
network wouldn’t remember the prediction it made in the last timestep. 
* A neural network
knows only what you feed it as input. It forgets everything else. 

Later, you’ll learn how to
give a neural network a “short-term memory” by feeding in multiple inputs at once.

Another way to think about a neural network’s weight value is as a measure of sensitivity
between the input of the network and its prediction. 
* If the weight is very high, then even the
tiniest input can create a really large prediction! 
* If the weight is very small, then even large
inputs will make small predictions. 

This sensitivity is akin to volume. “Turning up the weight”
amplifies the prediction relative to the input: weight is a volume knob!

Note that neural networks don’t predict just positive numbers—they can also predict negative
numbers and even take negative numbers as input. Perhaps you want to predict the probability
that people will wear coats today. If the temperature is –10 degrees Celsius, then a negative
weight will predict a high probability that people will wear their coats.
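The coat example can be sketched in a few lines. The weight value of -0.08 is an assumption chosen purely for illustration; it does not come from any trained network.

```python
def neural_network(x, weight):
    # Single-weight network: scale the input by the weight
    return x * weight

# Hypothetical weight (an assumption): negative, so colder temperatures
# push the prediction up and warmer temperatures push it down.
coat_weight = -0.08

cold_day = neural_network(-10, coat_weight)   # temperature in degrees Celsius
warm_day = neural_network(25, coat_weight)

print(cold_day)
print(warm_day)
```

The cold day gets a positive score and the warm day a negative one. Note that this simple network does nothing to keep its output between 0 and 1, so the result is a raw score rather than a true probability.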

# Making a prediction with multiple inputs

**Neural networks can combine intelligence from
multiple datapoints.**

## Prediction with multiple inputs

What if you could give the network
more information (at one time) than just the average number of toes per player? In that case,
the network should, in theory, be able to make more-accurate predictions. Well, as it turns out, a
network can accept multiple input datapoints at a time.

<img src="https://github.com/rahiakela/img-repo/blob/master/multi-input-datapoints-1.JPG?raw=1" width="800"/>



In [0]:
weights = [0.1, 0.2, 0.0]

def neural_network(inputs, weights):
  pred = weight_sum(inputs, weights)
  return pred

<img src="https://github.com/rahiakela/img-repo/blob/master/multi-input-datapoints-2.JPG?raw=1" width="800"/>

In [0]:
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

# making input
inputs = [toes[0], wlrec[0], nfans[0]]

# the prediction is made further down, once weight_sum has been defined

<img src="https://github.com/rahiakela/img-repo/blob/master/weighted-sum-inputs.JPG?raw=1" width="800"/>

In [0]:
def weight_sum(X, W):
  # input features and weight length must be same
  assert(len(X) == len(W))

  result = 0
  for i in range(len(X)):
    result += (X[i] * W[i])

  return result  

<img src="https://github.com/rahiakela/img-repo/blob/master/depositing-prediction.JPG?raw=1" width="800"/>

In [0]:
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

# making input
inputs = [toes[0], wlrec[0], nfans[0]]

# making prediction
prediction = neural_network(inputs, weights)
print(prediction)

0.9800000000000001


## Multiple inputs: What does this neural network do?

**It multiplies three inputs by three knob weights and sums them.
This is a weighted sum.**

### Weighted sum

You learned that in order to make
accurate predictions, you need to build neural networks that can combine multiple inputs at
the same time. Fortunately, neural networks are perfectly capable of doing so.

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-networkwith-multiple-inputs.JPG?raw=1" width="800"/>

This new neural network can accept multiple inputs at a time per prediction. This allows the
network to combine various forms of information to make better-informed decisions. But
the fundamental mechanism for using weights hasn’t changed. You still take each input and
run it through its own volume knob. In other words, you multiply each input by its own
weight.

The new property here is that, because you have multiple inputs, you have to sum their
respective predictions. Thus, you multiply each input by its respective weight and then sum
all the local predictions together. This is called a weighted sum of the input, or a weighted sum
for short. Some also refer to the weighted sum as a dot product, as you’ll see.
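The weighted sum described above can also be written compactly with zip and sum. This is only a sketch of the same idea as the loop-based weight_sum; the function name weighted_sum and the variable local_predictions are illustrative names, not part of the original code.

```python
def weighted_sum(inputs, weights):
    # Each input needs exactly one weight
    assert len(inputs) == len(weights)
    # Multiply each input by its own weight, then add up the local predictions
    return sum(x * w for x, w in zip(inputs, weights))

inputs = [8.5, 0.65, 1.2]     # toes, win/loss record, fans for the first game
weights = [0.1, 0.2, 0.0]

# The individual "local predictions" before they are summed
local_predictions = [x * w for x, w in zip(inputs, weights)]
prediction = weighted_sum(inputs, weights)

print(local_predictions)
print(prediction)
```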

<img src="https://github.com/rahiakela/img-repo/blob/master/one-input-datapoint.JPG?raw=1" width="800"/>


### Vector Operation

This new need to process multiple inputs at a time justifies the use of a new tool. It’s called a vector,
and if you’ve been following along in your Jupyter notebook, you’ve already been using it. A vector
is nothing other than a list of numbers. In the example, inputs is a vector and weights is a vector.

As it turns out, vectors are incredibly useful whenever you want to perform operations
involving groups of numbers. In this case, you’re performing a weighted sum between two
vectors (a dot product). You’re taking two vectors of equal length (inputs and weights),
multiplying each number based on its position (the first position in inputs is multiplied by
the first position in weights, and so on), and then summing the resulting output.

Anytime you perform a mathematical operation between two vectors of equal length where
you pair up values according to their position in the vector (again: position 0 with 0, 1 with 1,
and so on), it’s called an elementwise operation. Thus elementwise addition sums two vectors,
and elementwise multiplication multiplies two vectors.
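To make the elementwise idea concrete, here is a minimal sketch using plain Python lists. The helper names below are illustrative choices; the dot product is built from elementwise multiplication followed by a sum.

```python
def elementwise_multiplication(vec_a, vec_b):
    # Pair up values by position and multiply each pair
    assert len(vec_a) == len(vec_b)
    return [a * b for a, b in zip(vec_a, vec_b)]

def elementwise_addition(vec_a, vec_b):
    # Pair up values by position and add each pair
    assert len(vec_a) == len(vec_b)
    return [a + b for a, b in zip(vec_a, vec_b)]

def dot(vec_a, vec_b):
    # A dot product is an elementwise multiplication followed by a sum
    return sum(elementwise_multiplication(vec_a, vec_b))

a = [0, 1, 0, 1]
b = [1, 0, 1, 0]

print(elementwise_addition(a, b))        # [1, 1, 1, 1]
print(elementwise_multiplication(a, b))  # [0, 0, 0, 0]
print(dot(a, b))                         # 0
```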

<img src="https://github.com/rahiakela/img-repo/blob/master/weighted-sum-inputs.JPG?raw=1" width="800"/>

The intuition behind how and why a dot product (weighted sum) works is easily one of the most
important parts of truly understanding how neural networks make predictions. Loosely stated, a
dot product gives you a notion of similarity between two vectors.

In [0]:
a = [0, 1, 0, 1]
b = [1, 0, 1, 0]
c = [0, 1, 1, 0]
d = [.5, 0, .5, 0]
e = [0, 1, -1, 0]

print(weight_sum(a, b))
print(weight_sum(b, c))
print(weight_sum(b, d))
print(weight_sum(c, c))
print(weight_sum(d, d))
print(weight_sum(c, e))

0
1
1.0
2
0.5
0


The highest weighted sum (weight_sum(c, c)) is between vectors that are exactly identical. In
contrast, because a and b have no overlapping weight, their dot product is zero. Perhaps
the most interesting weighted sum is between c and e, because e has a negative weight.
This negative weight canceled out the positive similarity between them.

But a dot product
between e and itself would yield the number 2, despite the negative weight (double
negative turns positive).

In [0]:
print(weight_sum(e, e))

2


### Intuition of dot product

Let’s become familiar with the various properties of the dot
product operation.

Sometimes you can equate the properties of the dot product to a logical AND. Consider a and b:

In [0]:
a = [ 0, 1, 0, 1]
b = [ 1, 0, 1, 0]

If you ask whether both a[0] AND b[0] have value, the answer is no. If you ask whether both
a[1] AND b[1] have value, the answer is again no. Because this is always true for all four
values, the final score equals 0. Each value fails the logical AND.

In [0]:
b = [ 1, 0, 1, 0]
c = [ 0, 1, 1, 0]

b and c, however, have one column that shares value. It passes the logical AND because b[2]
and c[2] have weight. This column (and only this column) causes the score to rise to 1.

In [0]:
c = [ 0, 1, 1, 0]
d = [.5, 0,.5, 0]

Fortunately, neural networks are also able to model partial ANDing. In this case, c and d share
the same column as b and c, but because d has only 0.5 weight there, the final score is only 0.5.

In [0]:
print(weight_sum(c, d))

0.5


We exploit this property when modeling probabilities in neural networks.

In [0]:
d = [.5, 0,.5, 0]
e = [-1, 1, 0, 0]
print(weight_sum(d, e))

-0.5


Given these intuitions, what does this mean when a neural network makes a prediction?
Roughly speaking, it means the network gives a high score to the inputs based on how
similar they are to the weights. Notice in the following example that nfans is completely
ignored in the prediction because the weight associated with it is 0. The most sensitive
predictor is wlrec because its weight is 0.2. But the dominant force in the high score is
the number of toes (toes), not because the weight is the highest, but because the input
combined with the weight is by far the highest.
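One way to check this reasoning is to compute each input’s contribution (input times weight) separately. The sketch below uses the first game’s values; the dictionary and variable names are illustrative.

```python
feature_names = ["toes", "wlrec", "nfans"]
inputs = [8.5, 0.65, 1.2]      # first game of the season
weights = [0.1, 0.2, 0.0]

# Contribution of each feature to the final score: input * weight
contributions = {
    name: x * w for name, x, w in zip(feature_names, inputs, weights)
}

# The feature with the largest weight is the most sensitive predictor
most_sensitive = feature_names[weights.index(max(weights))]
# The feature with the largest contribution dominates the score
dominant = max(contributions, key=contributions.get)

print(contributions)
print("most sensitive:", most_sensitive)
print("dominant:", dominant)
```

The most sensitive predictor (wlrec) and the dominant contributor (toes) are different features, because the size of the input matters as much as the size of the weight.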

<img src="https://github.com/rahiakela/img-repo/blob/master/deposit-prediction.JPG?raw=1" width="800"/>

Here are a few more points to note for further reference. 

* You can’t shuffle weights: they have
specific positions they need to be in. 
* Furthermore, both the value of the weight and the value of
the input determine the overall impact on the final score. 
* Finally, a negative weight will cause
some inputs to reduce the final prediction (and vice versa).
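A quick sketch can illustrate the first and last of these points. The shuffled and negative weight lists below are made-up variations on the original weights, used only for comparison.

```python
def weighted_sum(inputs, weights):
    assert len(inputs) == len(weights)
    return sum(x * w for x, w in zip(inputs, weights))

inputs = [8.5, 0.65, 1.2]               # toes, wlrec, nfans for the first game

original_weights = [0.1, 0.2, 0.0]
shuffled_weights = [0.0, 0.1, 0.2]      # same values, different positions
negative_weights = [0.1, 0.2, -0.3]     # the fans weight is now negative

original = weighted_sum(inputs, original_weights)
shuffled = weighted_sum(inputs, shuffled_weights)
with_negative = weighted_sum(inputs, negative_weights)

print(original)       # toes paired with 0.1, wlrec with 0.2, nfans with 0.0
print(shuffled)       # a different prediction: each weight now meets the wrong input
print(with_negative)  # lower than the original: the negative weight subtracts
```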

## Multiple inputs: Complete runnable code

### Pure Pythonic Neural Network

I’ve written everything out using basic properties of
Python (lists and numbers). But a better way exists that we’ll begin using in the future.

In [0]:
def weight_sum(X, W):
  assert(len(X) == len(W))

  result = 0
  for i in range(len(X)):
    result += (X[i] * W[i])
  return result

# define the weight
weights = [0.1, 0.2, 0.0]

def neural_network(features, weights):
  prediction = weight_sum(features, weights)
  return prediction

# define the features for every game of the season
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]  

# the input holds every feature for the first game of the season
inputs = [toes[0], wlrec[0], nfans[0]]

# make prediction
pred = neural_network(inputs, weights)
print(pred)

0.9800000000000001


### Neural Network using NumPy

There’s a Python library called NumPy, which stands for “numerical Python.” It has
very efficient code for creating vectors and performing common functions (such as dot
products).

In [0]:
import numpy as np

In [0]:
# define the weight
weights = np.array([0.1, 0.2, 0.0])

def neural_network(features, weights):
  prediction = features.dot(weights)
  return prediction

# define the features for every game of the season
toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

# the input holds every feature for the first game of the season
inputs = np.array([toes[0], wlrec[0], nfans[0]])

# make prediction
pred = neural_network(inputs, weights)
print(pred)

0.9800000000000001


In [0]:
# Input feature corresponds to every entry for the second game of the season.
inputs = np.array([toes[1], wlrec[1], nfans[1]])

# make prediction
pred = neural_network(inputs, weights)
print(pred)

1.11


In [0]:
# Input feature corresponds to every entry for the third game of the season.
inputs = np.array([toes[2], wlrec[2], nfans[2]])

# make prediction
pred = neural_network(inputs, weights)
print(pred)

1.1500000000000001


For the first game, both networks print 0.98 (shown as 0.9800000000000001 because of floating-point rounding). Notice that in the NumPy code, you don’t have to
create a weight_sum function. 

Instead, NumPy has a dot function (short for “dot product”) you
can call.
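For reference, NumPy offers a few equivalent spellings of the dot product. The sketch below compares them with a manual elementwise multiply-and-sum; the variable names are illustrative.

```python
import numpy as np

weights = np.array([0.1, 0.2, 0.0])
inputs = np.array([8.5, 0.65, 1.2])    # first game of the season

method_call = inputs.dot(weights)       # the form used in this guide
function_call = np.dot(inputs, weights) # the same operation as a function
operator_call = inputs @ weights        # the matrix-multiplication operator
manual = np.sum(inputs * weights)       # elementwise multiply, then sum

print(method_call, function_call, operator_call, manual)
```

All four produce the same weighted sum, so the choice between them is mostly a matter of readability.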

# Making a prediction with multiple outputs

**Neural networks can also make multiple predictions using only a
single input.**

Perhaps a simpler augmentation than multiple inputs is multiple outputs. Prediction occurs
the same as if there were three disconnected single-weight neural networks.

### An empty network with multiple outputs

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-outputs-1.JPG?raw=1" width="800"/>

The most important thing to notice in this setting is that the three predictions are
completely separate. Unlike neural networks with multiple inputs and a single output, where
the prediction is undeniably connected, this network truly behaves as three independent
components, each receiving the same input data.
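This independence can be checked directly: a multi-output network gives the same numbers as three separate single-weight networks fed the same input. The function names in this sketch are illustrative.

```python
def single_weight_network(x, weight):
    # One input, one weight, one prediction
    return x * weight

def multi_output_network(x, weights):
    # One input, several weights, one prediction per weight
    return [x * w for w in weights]

weights = [0.3, 0.2, 0.9]
wlrec_first_game = 0.65

# Three disconnected networks, each seeing the same input
separate = [single_weight_network(wlrec_first_game, w) for w in weights]
# One network with three outputs
combined = multi_output_network(wlrec_first_game, weights)

print(separate)
print(combined)
print(separate == combined)   # True
```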

### Inserting one input datapoint

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-outputs-2.JPG?raw=1" width="800"/>

### Performing elementwise multiplication

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-outputs-3.JPG?raw=1" width="800"/>

### Depositing predictions

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-outputs-4.JPG?raw=1" width="800"/>

## Multiple outputs: Complete runnable code

### Pure Pythonic Neural Network

In [0]:
def weight_mul(Xi, W):
  # one output per weight
  output = [0] * len(W)

  for i in range(len(W)):
    output[i] = (Xi * W[i])
  return output

# define the weight
weights = [0.3, 0.2, 0.9]

def neural_network(feature, weights):
  prediction = weight_mul(feature, weights)
  return prediction

# define a single feature for every game of the season
wlrec = [0.65, 0.8, 0.8, 0.9]  

# the input is that single feature's entry for the first game of the season
input_one = wlrec[0]

# make prediction
pred = neural_network(input_one, weights)
print(pred)

[0.195, 0.13, 0.5850000000000001]


### Neural Network using NumPy

In [0]:
import numpy as np

In [0]:
# define the weight
weights = np.array([0.3, 0.2, 0.9])

def neural_network(feature, weights):
  prediction = np.multiply(weights, feature)
  return prediction

# define a single feature for every game of the season
wlrec = np.array([0.65, 0.8, 0.8, 0.9])

# the input is that single feature's entry for the first game of the season
input_one = wlrec[0]

# make prediction
pred = neural_network(input_one, weights)
print(pred)

[0.195 0.13  0.585]


# Predicting with multiple inputs and outputs

**Neural networks can predict multiple outputs given
multiple inputs.**

Finally, the way you build a network with multiple inputs or outputs can be combined to build
a network that has both multiple inputs and multiple outputs. As before, a weight connects each
input node to each output node, and prediction occurs in the usual way.

### An empty network with multiple inputs and outputs

<img src="https://github.com/rahiakela/img-repo/blob/master/network-with-multiple-inputs-and-outputs-1.JPG?raw=1" width="800"/>

### Inserting one input datapoint

<img src="https://github.com/rahiakela/img-repo/blob/master/network-with-multiple-inputs-and-outputs-2.JPG?raw=1" width="800"/>

### For each output, performing a weighted sum of inputs

<img src="https://github.com/rahiakela/img-repo/blob/master/network-with-multiple-inputs-and-outputs-4.JPG?raw=1" width="800"/>

### Depositing predictions

<img src="https://github.com/rahiakela/img-repo/blob/master/network-with-multiple-inputs-and-outputs-5.JPG?raw=1" width="800"/>




## Multiple inputs and outputs: Complete runnable code

### Pure Pythonic Neural Network

In [0]:
# weighted sum of features and weights
def weight_sum(X, W):
  assert(len(X) == len(W))

  result = 0
  for i in range(len(X)):
    result += (X[i] * W[i])
  return result

# multiplication of a features vector by a weights matrix
def vect_mat_mul(features_vec, weights_matrix):
  # one output per row of the weights matrix
  output = [0] * len(weights_matrix)
  for i in range(len(weights_matrix)):
    # weight_sum checks that the row is as long as the features vector
    output[i] = weight_sum(features_vec, weights_matrix[i])
  return output

# define the weights
# columns: # toes, % win, # fans
weights = [
  [0.1, 0.1, -0.3],  # hurt?
  [0.1, 0.2, 0.0],   # win?
  [0.0, 1.3, 0.1]    # sad?
]

def neural_network(features_vec, weights_matrix):
  prediction = vect_mat_mul(features_vec, weights_matrix)
  return prediction

# define the features for every game of the season
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]  

# the input holds every feature for the first game of the season
inputs = [toes[0], wlrec[0], nfans[0]]

# make prediction
pred = neural_network(inputs, weights)
print(pred)

[0.555, 0.9800000000000001, 0.9650000000000001]


### Neural Network using NumPy

In [0]:
# define the weights
# columns: # toes, % win, # fans
weights = np.array([
   [0.1, 0.1, -0.3],  # hurt?
   [0.1, 0.2, 0.0],   # win?
   [0.0, 1.3, 0.1]    # sad?                 
])

def neural_network(features, weights):
  # one dot product per row of the weights matrix (one row per output)
  prediction = [features.dot(row) for row in weights]
  return prediction

# define the features for every game of the season
toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

# the input holds every feature for the first game of the season
inputs = np.array([toes[0], wlrec[0], nfans[0]])

# make prediction
pred = neural_network(inputs, weights)
print(pred)

[0.555, 0.9800000000000001, 0.9650000000000001]


## Multiple inputs and outputs: How does it work?

**It performs three independent weighted sums of the input
to make three predictions.**

You can take two perspectives on this architecture: think of it as either three weights coming
out of each input node, or three weights going into each output node. For now, I find the
latter to be much more beneficial. Think about this neural network as three independent dot
products: three independent weighted sums of the input. Each output node takes its own
weighted sum of the input and makes a prediction.

<img src='https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-input-outputs-1.JPG?raw=1' width='800'/>

<img src='https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-input-outputs-2.JPG?raw=1' width='800'/>

<img src='https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-input-outputs-3.JPG?raw=1' width='800'/>

I want to use this list-of-vectors and series-of-weighted-sums logic to introduce two new concepts. Look at the weights variable in the code above: it’s a list of vectors. A list of vectors is called a
matrix. It’s as simple as it sounds. Commonly used functions use matrices. One of these is
called vector-matrix multiplication. The series of weighted sums is exactly that: you take a
vector and perform a dot product with every row in a matrix.
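As a sketch of that definition, the snippet below computes the dot product of the input with every row of the weight matrix, first row by row and then with a single NumPy call. The variable names row_by_row and all_at_once are illustrative.

```python
import numpy as np

# One row of weights per output; columns are # toes, % win, # fans
weights = np.array([
    [0.1, 0.1, -0.3],  # hurt?
    [0.1, 0.2, 0.0],   # win?
    [0.0, 1.3, 0.1],   # sad?
])
inputs = np.array([8.5, 0.65, 1.2])   # first game of the season

# Vector-matrix multiplication as a series of weighted sums
row_by_row = np.array([row.dot(inputs) for row in weights])

# The same result in one call: a (3, 3) matrix dotted with a length-3 vector
all_at_once = weights.dot(inputs)

print(row_by_row)
print(all_at_once)
print(all_at_once.shape)   # (3,)
```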





# Predicting on predictions

**Neural networks can be stacked!**

As the following figures make clear, you can also take the output of one network and feed it
as input to another network. This results in two consecutive vector-matrix multiplications.
It may not yet be clear why you’d predict this way; but some datasets (such as image
classification) contain patterns that are too complex for a single-weight matrix. Later, we’ll
discuss the nature of these patterns. For now, it’s sufficient to know this is possible.
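Before moving to NumPy, here is a minimal pure-Python sketch of two consecutive vector-matrix multiplications. The weight values are the ones used in the NumPy listing in this section, written with one row per output node; the helper and variable names are illustrative.

```python
def weighted_sum(vector, weights):
    assert len(vector) == len(weights)
    return sum(v * w for v, w in zip(vector, weights))

def vect_mat_mul(vector, matrix):
    # One weighted sum per row of the matrix
    return [weighted_sum(vector, row) for row in matrix]

# Columns: # toes, % win, # fans
input_to_hidden = [
    [0.1, 0.2, -0.1],   # hid[0]
    [-0.1, 0.1, 0.9],   # hid[1]
    [0.1, 0.4, 0.1],    # hid[2]
]
# Columns: hid[0], hid[1], hid[2]
hidden_to_output = [
    [0.3, 1.1, -0.3],   # hurt?
    [0.1, 0.2, 0.0],    # win?
    [0.0, 1.3, 0.1],    # sad?
]

def neural_network(inputs, weights):
    hidden = vect_mat_mul(inputs, weights[0])   # first prediction
    return vect_mat_mul(hidden, weights[1])     # prediction on the prediction

inputs = [8.5, 0.65, 1.2]   # first game of the season
pred = neural_network(inputs, [input_to_hidden, hidden_to_output])
print(pred)
```

The output of the first multiplication (the hidden values) is simply used as the input to the second one.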

## NumPy way

The following listing shows how you can do the same operations coded in the previous
section using a convenient Python library called NumPy. Using libraries like NumPy makes
your code faster and easier to read and write.

### An empty network with multiple inputs and outputs

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-input-outputs-4.JPG?raw=1" width="800"/>

In [7]:
import numpy as np

# input layer weights
# columns: # toes, % win, # fans
input_layer_weights = np.array([
    [0.1, 0.2, -0.1], # hid[0]
    [-0.1,0.1, 0.9],  # hid[1]
    [0.1, 0.4, 0.1]   # hid[2]               
]).T
print(input_layer_weights)
print()

# hidden layer weights
# hid[0] hid[1] hid[2]
hidden_layer_weights = np.array([
   [0.3, 1.1, -0.3], # hurt?
   [0.1, 0.2, 0.0],  # win?
   [0.0, 1.3, 0.1]   # sad?                              
]).T
print(hidden_layer_weights)
print()

# combine both layers weights
weights = [input_layer_weights, hidden_layer_weights]
print(weights)

def neural_network(features, weights):
  # calculating the dot product of input layers weights with input features values
  input_layer_x_w_dot_products = features.dot(weights[0])
  # calculating the dot product of hidden layers weights with input layer output values
  prediction = input_layer_x_w_dot_products.dot(weights[1])
  return prediction

[[ 0.1 -0.1  0.1]
 [ 0.2  0.1  0.4]
 [-0.1  0.9  0.1]]

[[ 0.3  0.1  0. ]
 [ 1.1  0.2  1.3]
 [-0.3  0.   0.1]]

[array([[ 0.1, -0.1,  0.1],
       [ 0.2,  0.1,  0.4],
       [-0.1,  0.9,  0.1]]), array([[ 0.3,  0.1,  0. ],
       [ 1.1,  0.2,  1.3],
       [-0.3,  0. ,  0.1]])]


### Predicting the hidden layer

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-input-outputs-5.JPG?raw=1" width="800"/>

In [8]:
toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65,0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])

input_features = np.array([toes[0], wlrec[0], nfans[0]])

pred = neural_network(input_features, weights)
print(pred)

[0.2135 0.145  0.5065]


### Predicting the output layer

#### First game prediction

<img src="https://github.com/rahiakela/img-repo/blob/master/empty-network-with-multiple-input-outputs-6.JPG?raw=1" width="800"/>

#### Second game prediction

In [10]:
input_features = np.array([toes[1], wlrec[1], nfans[1]])

pred = neural_network(input_features, weights)
print(pred)

[0.204 0.158 0.53 ]


#### Third game prediction

In [11]:
input_features = np.array([toes[2], wlrec[2], nfans[2]])

pred = neural_network(input_features, weights)
print(pred)

[-0.584  0.018 -0.462]


#### Fourth game prediction

In [12]:
input_features = np.array([toes[3], wlrec[3], nfans[3]])

pred = neural_network(input_features, weights)
print(pred)

[-0.015  0.116  0.253]


# A quick primer on NumPy

Note that the processes for creating
a vector and a matrix are identical.
If you create a matrix with only one
row, you’re creating a vector. And, as
in mathematics in general, you create
a matrix by listing (rows,columns).
I say that only so you can remember
the order: rows come first, columns
come second. 

Let’s see some operations
you can perform on these vectors and
matrices:

In [13]:
a = np.array([0,1,2,3])
b = np.array([4,5,6,7])
c = np.array([[0,1,2,3], [4,5,6,7]])

d = np.zeros((2,4))
e = np.random.rand(2,5)

print(a)
print(b)
print(c)
print(d)
print(e)

[0 1 2 3]
[4 5 6 7]
[[0 1 2 3]
 [4 5 6 7]]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[0.93184753 0.21462755 0.60319288 0.02026327 0.15013166]
 [0.09517482 0.59987969 0.34441146 0.09522957 0.80873655]]


In [14]:
print(a * 0.1)

[0.  0.1 0.2 0.3]


In [15]:
print(c * 0.2)

[[0.  0.2 0.4 0.6]
 [0.8 1.  1.2 1.4]]


In [16]:
print(a * b)

[ 0  5 12 21]


In each of these examples, * multiplies elementwise: a scalar scales every element, and two vectors of equal length are multiplied position by position. The most confusing part is that all of these operations look the same if you don’t
know which variables are scalars, vectors, or matrices. When you “read NumPy,” you’re really
doing two things: reading the operations and keeping track of the shape (number of rows and
columns) of each operation. It will take some practice, but eventually it becomes second nature.
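One habit that may help while practicing is printing the shape of every operand and result. The sketch below does this for a scalar, a vector, and a matrix. The last line multiplies a matrix by a vector, which NumPy applies row by row because the vector has as many elements as the matrix has columns.

```python
import numpy as np

scalar = 0.1
vector = np.array([0, 1, 2, 3])
matrix = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])

print(np.shape(scalar))          # ()     a scalar has no dimensions
print(vector.shape)              # (4,)   a vector of length 4
print(matrix.shape)              # (2, 4) 2 rows, 4 columns

print((vector * scalar).shape)   # (4,)   every element is scaled
print((matrix * scalar).shape)   # (2, 4) every element is scaled
print((vector * vector).shape)   # (4,)   elementwise between equal-length vectors
print((matrix * vector).shape)   # (2, 4) the vector is applied to each row
```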

Let’s look at a few examples of matrix multiplication in NumPy, noting the input and output shapes
of each matrix.

In [19]:
# Row vector: a matrix with 1 row and 4 columns
a = np.zeros((1, 4))
a

array([[0., 0., 0., 0.]])

In [24]:
# Matrix with 4 rows and 3 columns
b = np.zeros((4, 3))
b

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [25]:
c = a.dot(b)
c

array([[0., 0., 0.]])

In [26]:
print(c.shape)

(1, 3)


There’s one golden rule when using the dot function: if you put the (rows,cols) description
of the two variables you’re “dotting” next to each other, neighboring numbers should always be
the same.

In terms of variable shape, you can think of it as follows, regardless of whether you’re dotting vectors or matrices: their shape (number of rows and columns) must line up. The columns of the
left matrix must equal the rows on the right, such that (a,b).dot(b,c) = (a,c).
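The golden rule can be written as a tiny helper. The function dot_output_shape below is a hypothetical illustration, not a NumPy function; it predicts the result shape for two 2-D shapes and is compared against what NumPy actually produces.

```python
import numpy as np

def dot_output_shape(left_shape, right_shape):
    """Return the (rows, cols) of left.dot(right), or None if the shapes don't line up."""
    left_rows, left_cols = left_shape
    right_rows, right_cols = right_shape
    # Neighboring numbers must match: (a, b).dot(b, c) = (a, c)
    if left_cols != right_rows:
        return None
    return (left_rows, right_cols)

shape_pairs = [((1, 4), (4, 3)), ((2, 4), (4, 3)), ((2, 1), (1, 3)), ((5, 4), (5, 6))]

for left, right in shape_pairs:
    predicted = dot_output_shape(left, right)
    if predicted is None:
        print(left, right, "-> shapes do not line up")
    else:
        actual = np.zeros(left).dot(np.zeros(right)).shape
        print(left, right, "->", predicted, "NumPy gives", actual)
```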

In [27]:
# Matrix with 2 rows and 4 columns
a = np.zeros((2, 4))
a

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [28]:
# Matrix with 4 rows and 3 columns
b = np.zeros((4, 3))
b

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [29]:
c = a.dot(b)
c

array([[0., 0., 0.],
       [0., 0., 0.]])

In [30]:
print(c.shape)

(2, 3)


In [32]:
# Matrix with 2 rows and 1 column
e = np.zeros((2, 1))
e

array([[0.],
       [0.]])

In [33]:
# Matrix with 1 row and 3 columns
f = np.zeros((1, 3))
f

array([[0., 0., 0.]])

In [34]:
g = e.dot(f)
g

array([[0., 0., 0.],
       [0., 0., 0.]])

In [35]:
print(g.shape)

(2, 3)


In [42]:
# .T flips the rows and columns of a matrix, so this is a matrix with 4 rows and 5 columns
h = np.zeros((5, 4)).T   
h

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [45]:
# Matrix with 5 rows and 6 columns
i = np.zeros((5, 6))
i

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [46]:
j = h.dot(i)
j

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [47]:
print(j.shape)

(4, 6)


In [50]:
h = np.zeros((5,4))    # Matrix with 5 rows and 4 columns
i = np.zeros((5,6))    # Matrix with 5 rows and 6 columns
j = h.dot(i)           # Throws an error: 4 columns on the left, 5 rows on the right
print(j.shape)         # Never reached

ValueError: ignored

Everything we’ve done in this guide is a form of what’s called forward propagation, wherein
a neural network takes input data and makes a prediction. It’s called this because you’re
propagating activations forward through the network. In these examples, activations are all the
numbers that are not weights and are unique for every prediction.