#### Why batch? 

The samples in a batch can be calculated in parallel. This is typically done on GPUs rather than CPUs.

Batching also lets the model generalize across input features rather than being tuned piecewise to single sets of parameters in series. In other words, batch learning gives a more stable model over time.

That said, showing all samples at once instead of in batches will probably result in overfitting. Typical batch sizes are 32, 64 or maybe 128.
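A minimal sketch of what splitting a dataset into batches can look like. The `samples` array and the `batch_size` of 4 are made up for illustration (the sizes mentioned above would be used on a real dataset), and plain slicing is only one of several ways to do this.

```python
import numpy as np

# Hypothetical dataset: 10 samples with 4 features each (values are arbitrary).
samples = np.arange(40, dtype=float).reshape(10, 4)
batch_size = 4  # illustrative only; real runs would use something like 32 or 64

# Slice the dataset into consecutive batches; the last one may be smaller.
batches = [samples[start:start + batch_size]
           for start in range(0, len(samples), batch_size)]

for i, batch in enumerate(batches):
    print(f"batch {i}: shape {batch.shape}")
```

Each batch keeps the feature dimension (4 columns) and only the number of rows changes.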

#### Coding a batch of inputs taken from four neurons over time to three destination neurons. 

Implements the matrix product: each element of the output matrix is the dot product of a row of the first matrix and a column of the second. 

In other words, the first element of the output matrix is the dot product of the first row of matrix A and the first column of matrix B. The second element is the dot product of the first row of matrix A and the second column of matrix B. When all columns of matrix B are exhausted, the output matrix begins its second row and records the dot products of the second row of matrix A with all columns of matrix B, and so on.
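The row-by-column rule can be checked by hand on a small case. This is a sketch with two made-up 2x2 matrices `A` and `B` (not the weights and inputs used in these notes), comparing an explicit loop against `np.dot`.

```python
import numpy as np

# Small hypothetical matrices chosen only for illustration.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Build the product element by element: entry (i, j) is the dot product
# of row i of A with column j of B.
manual = np.zeros((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        manual[i, j] = np.dot(A[i, :], B[:, j])

print(manual)
print(np.allclose(manual, np.dot(A, B)))  # the loop agrees with np.dot
```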

In [5]:
import numpy as np

# Recall: the size at index [1] of the first operand's shape must match the size at index [0] of the second operand's shape.
# To make that true here, we have to switch the rows and columns of the weights matrix.
# Switching rows and columns is called a transpose. It is available on np arrays as .T

inputs = [[1,2,3,2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]

weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]

biases= [2,3,0.5]


output = np.dot(inputs, np.array(weights).T) + biases # in this case, inputs needs to come first.

print(output)

[[ 4.8    1.21   2.385]
 [ 8.9   -1.81   0.2  ]
 [ 1.41   1.051  0.026]]
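To see why the transpose is needed and how the biases are added, it can help to print the shapes involved. The sketch below reuses the same inputs, weights and biases as the cell above; the hand-computed `row0` is only there as a cross-check.

```python
import numpy as np

inputs = np.array([[1, 2, 3, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])
weights = np.array([[0.2, 0.8, -0.5, 1.0],
                    [0.5, -0.91, 0.26, -0.5],
                    [-0.26, -0.27, 0.17, 0.87]])
biases = np.array([2, 3, 0.5])

print(inputs.shape)     # (3, 4): 3 samples, 4 features
print(weights.shape)    # (3, 4): 3 neurons, 4 weights each
print(weights.T.shape)  # (4, 3): inner dimensions now match

product = np.dot(inputs, weights.T)  # shape (3, 3): samples x neurons
output = product + biases            # biases of shape (3,) are broadcast onto every row

# First sample computed neuron by neuron, for comparison.
row0 = np.array([np.dot(inputs[0], w) + b for w, b in zip(weights, biases)])
print(np.allclose(output[0], row0))
```

Without the transpose, `np.dot(inputs, weights)` would try to multiply a (3, 4) array by a (3, 4) array and raise a shape error.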


#### Adding another layer. 

A second layer needs its own set of weights and biases.

This looks like 4 neurons -> 3 neurons -> 3 neurons.

In [10]:
inputs = [[1,2,3,2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]

weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]

biases = [2,3,0.5]

# second layer

weights2 = [[0.1, -0.14, 0.5],
           [-0.5, 0.12, -0.33],
           [0.44, 0.73, -0.13]]

biases2 = [-1,2,-0.5]

layer1_outputs = np.dot(inputs, np.array(weights).T) + biases

layer2_outputs = np.dot(layer1_outputs, np.array(weights2).T) + biases2

print(layer2_outputs)

[[ 0.5031  -1.04185  2.18525]
 [ 0.2434  -2.7332   2.0687 ]
 [-0.99314  1.41254  0.88425]]


#### It's better to convert the concept of a layer into an object. 

Denote input features with X.

We need to initialize weights as random values, commonly between -1 and +1, but tighter ranges are generally better. In this case, we'll aim for roughly -0.1 to +0.1.

Initialize weights using np.random.randn, which samples from a Gaussian distribution centered on 0. Multiplying by 0.10 gives a standard deviation of 0.1, so most values land near the target range but some fall outside it.

Biases are typically initialized as 0, but that can result in dead networks.

To this end, it helps to normalize and scale the input dataset. 
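The notes do not say which scaling method is meant. As one common option, here is a sketch of per-feature standardization (subtract the mean, divide by the standard deviation), applied to the same three samples used throughout. The choice of standardization is an assumption, not something taken from the source.

```python
import numpy as np

X = np.array([[1, 2, 3, 2.5],
              [2.0, 5.0, -1.0, 2.0],
              [-1.5, 2.7, 3.3, -0.8]])

# Standardize each feature (column): subtract its mean, divide by its std.
# Assumes no column is constant, otherwise std would be 0.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_scaled.std(axis=0))   # approximately 1 for each feature
```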




In [16]:
# Quick run of np.random.randn to understand how I initialize weights:

print(0.10*np.random.randn(4,3))

[[ 0.17640523  0.04001572  0.0978738 ]
 [ 0.22408932  0.1867558  -0.09772779]
 [ 0.09500884 -0.01513572 -0.01032189]
 [ 0.04105985  0.01440436  0.14542735]]


In [22]:
np.random.seed(0)

X = [[1,2,3,2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8]]

# define hidden layers
# will need to know number of initial inputs and number of neurons in destination

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        
        # np.random.randn params are the desired output shape
        # first param is size of input coming in
        # second param is number of neurons in destination
        
        self.weights = 0.10 * np.random.randn(n_inputs, n_neurons) # order reversed from prev examples to avoid transposition
        
        # Will need one bias for each neuron in destination, stored as one row of shape (1, n_neurons)
        # np.zeros first param IS the shape of desired output, so pass it as a tuple
        
        self.biases = np.zeros((1, n_neurons))
        
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

# input size = number of features in each sample. In this case, 4
# number of neurons = any number you want
# only req is that input size for layer2 is the output size of layer1

layer1 = Layer_Dense(4,5)
layer2 = Layer_Dense(5,2)

layer1.forward(X)
print("Layer 1 output: ")
print()
print(layer1.output)

print()

print("Layer 2 output: ")
print()
layer2.forward(layer1.output)
print(layer2.output)

Layer 1 output: 

[[ 0.10758131  1.03983522  0.24462411  0.31821498  0.18851053]
 [-0.08349796  0.70846411  0.00293357  0.44701525  0.36360538]
 [-0.50763245  0.55688422  0.07987797 -0.34889573  0.04553042]]

Layer 2 output: 

[[ 0.148296   -0.08397602]
 [ 0.14100315 -0.01340469]
 [ 0.20124979 -0.07290616]]


#### The rows above represent output activations for each sample in the input batch. X had 3 samples, so outputs have 3 rows.
#### The columns represent output activations for each neuron in the destination layer. Layer one had 5 destination neurons and Layer two had 2 destination neurons.
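The row and column reading can be confirmed by checking shapes. This sketch uses plain arrays instead of the class, with made-up random inputs and a `default_rng` generator (both assumptions for illustration), but the same 4 -> 5 -> 2 layer sizes.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: any seed works, only shapes matter here

n_samples, n_features = 3, 4
X = rng.standard_normal((n_samples, n_features))

# Weights are stored as (n_inputs, n_neurons), so no transpose is needed.
w1 = 0.10 * rng.standard_normal((4, 5))
b1 = np.zeros((1, 5))
w2 = 0.10 * rng.standard_normal((5, 2))
b2 = np.zeros((1, 2))

out1 = np.dot(X, w1) + b1
out2 = np.dot(out1, w2) + b2

print(out1.shape)  # (3, 5): one row per sample, one column per layer-1 neuron
print(out2.shape)  # (3, 2): one row per sample, one column per layer-2 neuron
```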
#### The next thing: *Activation functions!* 