# What Are Neural Networks


# Neuron

The output value of a single neuron is computed with the equation:


$\Large o = \sum_{j=0}^{n}(i_j w_j) + b$

* $i_j$ is the $j$-th input
* $w_j$ is the weight of that input
* $b$ is the bias of the neuron
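
The formula above can be sketched as a small Python function. The name `neuron_output` is an assumption made for this illustration and is not used elsewhere in the notebook; the sample values are the ones from the first neuron example.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (hypothetical helper)."""
    total = bias
    # Pair every input i_j with its weight w_j and accumulate the products
    for i_j, w_j in zip(inputs, weights):
        total += i_j * w_j
    return total


example = neuron_output([1, 2, 3], [0.2, 0.8, -0.5], 2)
print(round(example, 6))
```

The loop works for any number of inputs, as long as `inputs` and `weights` have the same length.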

<hr>

![Neuron](images/01.PNG)

The output value follows from the equation above:

$o = \sum_{j=0}^{n}(i_j w_j) + b = i_0 \cdot w_0 + i_1 \cdot w_1 + i_2 \cdot w_2 + b = 1 \cdot 0.2 + 2 \cdot 0.8 + 3 \cdot (-0.5) + 2 = 2.3$

In [None]:
inputs = [1,2,3]
weights = [0.2, 0.8, -0.5]
bias = 2

output = inputs[0]*weights[0] + inputs[1]*weights[1] + inputs[2]*weights[2] + bias
print(output)

![Neuron - 4 inputs](images/02.PNG)

$o = \sum_{j=0}^{n}(i_j w_j) + b = i_0 \cdot w_0 + i_1 \cdot w_1 + i_2 \cdot w_2 + i_3 \cdot w_3 + b = 1 \cdot 0.2 + 2 \cdot 0.8 + 3 \cdot (-0.5) + 2.5 \cdot 1.0 + 2 = 4.8$

In [None]:
inputs = [1,2,3,2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2

output = inputs[0]*weights[0] + inputs[1]*weights[1] + inputs[2]*weights[2] + inputs[3]*weights[3] + bias
print(output)

<hr>

**Dot product**

$\Large \vec{a}^{\,}\cdot \vec{b}^{\,} = [1,2,3]\cdot [2,3,4] = 1\cdot 2 + 2\cdot 3 + 3\cdot 4 = 20$
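
As a minimal sketch, the dot product above can be computed by hand and compared with `np.dot`. The helper name `dot` is hypothetical and exists only for this illustration.

```python
import numpy as np


def dot(a, b):
    """Dot product of two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


a = [1, 2, 3]
b = [2, 3, 4]

manual = dot(a, b)         # 1*2 + 2*3 + 3*4
with_numpy = np.dot(a, b)  # the same computation done by NumPy
print(manual, with_numpy)  # prints: 20 20
```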

In [None]:
# Example of a tensor computation for a single neuron
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0


output = np.dot(weights, inputs) + bias

print(output)

---

# Layer of Neurons


![Dense layer](images/03.PNG)

In [None]:
inputs = [1, 2, 3, 2.5]                
weights1 = [ 0.20,  0.80, -0.50,  1.00]  # weights of neuron 1
weights2 = [ 0.50, -0.91,  0.26, -0.50]  # weights of neuron 2
weights3 = [-0.26, -0.27,  0.17,  0.87]  # weights of neuron 3

bias1 = 2.0  # bias of neuron 1
bias2 = 3.0  # bias of neuron 2
bias3 = 0.5  # bias of neuron 3

layer_outputs = [
    # Output Neuron 1:
    inputs[0]*weights1[0] +
    inputs[1]*weights1[1] +
    inputs[2]*weights1[2] +
    inputs[3]*weights1[3] + bias1,

    # Output Neuron 2:
    inputs[0]*weights2[0] +
    inputs[1]*weights2[1] +
    inputs[2]*weights2[2] +
    inputs[3]*weights2[3] + bias2,

    # Output Neuron 3:
    inputs[0]*weights3[0] +
    inputs[1]*weights3[1] +
    inputs[2]*weights3[2] +
    inputs[3]*weights3[3] + bias3
]

print(layer_outputs)

<hr>

The same code, written with **numpy.dot()**.

In [None]:
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [[ 0.20,  0.80, -0.50,  1.00],  # weights of neuron 1
           [ 0.50, -0.91,  0.26, -0.50],  # weights of neuron 2
           [-0.26, -0.27,  0.17,  0.87]]  # weights of neuron 3
biases = [2.0, 3.0, 0.5]

layer_outputs = np.dot(weights, inputs) + biases

print(layer_outputs)
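
To see why a single `np.dot` call can replace the three hand-written sums, it may help to look at the shapes involved. The sketch below reuses the layer values from this section and compares the result with a per-neuron loop; the variable name `manual` is introduced only for this illustration.

```python
import numpy as np

inputs = np.array([1.0, 2.0, 3.0, 2.5])
weights = np.array([[ 0.20,  0.80, -0.50,  1.00],
                    [ 0.50, -0.91,  0.26, -0.50],
                    [-0.26, -0.27,  0.17,  0.87]])
biases = np.array([2.0, 3.0, 0.5])

# A (3, 4) matrix times a (4,) vector gives one value per neuron: shape (3,)
layer_outputs = np.dot(weights, inputs) + biases
print(weights.shape, inputs.shape, layer_outputs.shape)

# The same result computed one neuron at a time
manual = np.array([np.dot(w, inputs) + b for w, b in zip(weights, biases)])
print(np.allclose(layer_outputs, manual))
```

Each row of `weights` belongs to one neuron, so the matrix-vector product performs all three dot products in one call.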

<hr>

![2 Dense layers](images/04.PNG)

In [None]:
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]

# Dense layer 1
weights = [[ 0.20,  0.80, -0.50,  1.00],  # weights of neuron 1
           [ 0.50, -0.91,  0.26, -0.50],  # weights of neuron 2
           [-0.26, -0.27,  0.17,  0.87]]  # weights of neuron 3
biases = [2.0, 3.0, 0.5]

# Dense layer 2
weights2 = [[ 0.10, -0.14,  0.50],  # weights of neuron 1
            [-0.50,  0.12, -0.33],  # weights of neuron 2
            [-0.44,  0.73, -0.13]]  # weights of neuron 3
biases2 = [-1, 2, -0.5]


layer1_outputs = np.dot(weights, inputs) + biases
layer2_outputs = np.dot(weights2, layer1_outputs) + biases2

print("Layer 1: ", layer1_outputs)
print("Layer 2: ", layer2_outputs)

<hr>

In [None]:
import numpy as np
np.random.seed(2020)


class Layer_Dense:
    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_neurons, n_inputs)
        print("Weights: ")
        print(self.weights)
        self.biases = np.zeros(n_neurons)
        print("Bias: ")
        print(self.biases)

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(self.weights, inputs) + self.biases

inputs = [1.0, 2.0, 3.0, 2.5]

print("Creating DENSE 1")
dense1 = Layer_Dense(4, 3)
print(30*"=")

print("Creating DENSE 2")
dense2 = Layer_Dense(3, 3)
print(30*"=")

dense1.forward(inputs)
dense2.forward(dense1.output)

print("Dense 1 output: ")
print(dense1.output)
print()

print("Dense 2 output: ")
print(dense2.output)

<hr>

As an example, take a batch that contains 3 samples:
```python
inputs = [[1.0, 2.0, 3.0, 2.5],  # sample 1
          [2.0, 5.0, -1.0, 2.0], # sample 2
          [-1.5, 2.7, 3.3, -0.8]]# sample 3
```

In [None]:
import numpy as np

inputs = [[ 1.0, 2.0,  3.0,  2.5],  # first  inputs to layer1
          [ 2.0, 5.0, -1.0,  2.0],  # second inputs to layer1
          [-1.5, 2.7,  3.3, -0.8]]  # third  inputs to layer1

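weights = [[ 0.20,  0.80, -0.50,  1.00],  # weights of neuron 1
           [ 0.50, -0.91,  0.26, -0.50],  # weights of neuron 2
           [-0.26, -0.27,  0.17,  0.87]]  # weights of neuron 3
biases = [2.0, 3.0, 0.5]

# inputs is (3, 4) and weights is (3, 4); transpose the weights to (4, 3)
# so that the inner dimensions of the product match
layer_outputs = np.dot(np.array(inputs), np.array(weights).T) + biases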
print(layer_outputs)

![Matrix calculation](images/05.PNG)
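
The matrix product pictured above only works when the inner dimensions agree. The following sketch, which reuses the batch and the per-neuron weight rows from this section, shows that multiplying a (3, 4) batch by a (3, 4) weight matrix fails, while the transposed weights give one row of outputs per sample. The flag `shapes_matched` is a hypothetical name used only here, and the biases are left out to keep the focus on the shapes.

```python
import numpy as np

inputs = np.array([[ 1.0, 2.0,  3.0,  2.5],
                   [ 2.0, 5.0, -1.0,  2.0],
                   [-1.5, 2.7,  3.3, -0.8]])        # (3 samples, 4 features)
weights = np.array([[ 0.20,  0.80, -0.50,  1.00],
                    [ 0.50, -0.91,  0.26, -0.50],
                    [-0.26, -0.27,  0.17,  0.87]])  # (3 neurons, 4 features)

try:
    np.dot(inputs, weights)  # (3, 4) x (3, 4): inner sizes 4 and 3 differ
    shapes_matched = True
except ValueError:
    shapes_matched = False
print("Untransposed product works:", shapes_matched)

# Transposing the weights gives (3, 4) x (4, 3) -> (3 samples, 3 neurons)
outputs = np.dot(inputs, weights.T)
print(outputs.shape)
```

Storing the weights directly in the (inputs, neurons) layout avoids the transpose on every forward pass.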

In [None]:
import numpy as np


inputs = [[ 1.0, 2.0,  3.0,  2.5],  # first  inputs to layer1
          [ 2.0, 5.0, -1.0,  2.0],  # second inputs to layer1
          [-1.5, 2.7,  3.3, -0.8]]  # third  inputs to layer1

#          # N1    # N2    # N3
weights = [[ 0.2 ,  0.5 , -0.26],
           [ 0.8 , -0.91, -0.27],
           [-0.5 ,  0.26,  0.17],
           [ 1.  , -0.5 ,  0.87]]

#        # N1   # N2  # N3
biases = [2.0,   3.0,  0.5]

layer_outputs = np.dot(np.array(inputs), np.array(weights)) + biases

print(layer_outputs)

<hr>

In [None]:
import numpy as np
np.random.seed(2020)

class Layer_Dense:
    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        
        # <=== HERE ===>
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        # <=== HERE ===>
        
        print("Weights: ")
        print(self.weights)
        
        # <=== HERE ===>
        self.biases = np.zeros((1, n_neurons))
        # <=== HERE ===>
        
        print("Bias: ")
        print(self.biases)

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        
        # <=== HERE ===>
        self.output = np.dot(inputs, self.weights) + self.biases
        # <=== HERE ===>
        
inputs = np.array([[ 1.0, 2.0,  3.0,  2.5],   # first  inputs to layer1
                   [ 2.0, 5.0, -1.0,  2.0],   # second inputs to layer1
                   [-1.5, 2.7,  3.3, -0.8]])  # third  inputs to layer1
print("Inputs:")
print(inputs)

dense1 = Layer_Dense(4,3)
dense1.forward(inputs)

print("Dense OUTPUT")
print(dense1.output)
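
The biases in the batched layer have shape `(1, n_neurons)`, so NumPy broadcasting adds the same bias row to every sample. A minimal sketch with made-up numbers (the arrays `products` and `biases` below are illustrative and not taken from the layer above):

```python
import numpy as np

products = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])   # shape (2 samples, 3 neurons)
biases = np.array([[10.0, 20.0, 30.0]])  # shape (1, 3): one bias per neuron

# Broadcasting stretches the single bias row across both sample rows
outputs = products + biases
print(outputs)
```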

---

# Activation functions

![Activation function](images/06.PNG)

## Step Activation Function

$\Large 
y = 
\left\{
	\begin{array}{ll}
		1  & x > 0 \\
        0  & x \leq 0
	\end{array}
\right.
$

In [None]:
import matplotlib.pyplot as plt
import numpy as np

def step_function(x):
    return 1 if x>0 else 0

X = np.arange(-10, 10, 0.01)
y = [step_function(x) for x in X]

plt.plot(X, y)
plt.grid()
plt.show()

## Sigmoid Activation Function

$\Large y = \frac{1}{1+e^{-x}}$

In [None]:
import matplotlib.pyplot as plt
import numpy as np

def sigmoid(x):
    return 1 / ( 1 + np.exp(-x))

X = np.arange(-10, 10, 0.01)
y = [sigmoid(x) for x in X]

plt.plot(X, y)
plt.grid()
plt.show()

In [None]:
# Vanishing gradient problem
print(f"{sigmoid(20):.50f}")
print(f"{sigmoid(21):.50f}")
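
The saturation printed above is what lies behind the vanishing gradient problem: for large $|x|$ the sigmoid is almost flat. The sketch below uses the standard identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ to show how quickly the slope shrinks. The helper name `sigmoid_derivative` is an assumption made for this illustration.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    """Slope of the sigmoid, using s'(x) = s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1 - s)


for x in [0, 5, 20]:
    print(x, sigmoid_derivative(x))
```

The slope is largest at $x = 0$ and is already close to zero at $x = 20$, so very little gradient would flow back through a saturated neuron.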

## Rectified Linear Units - ReLU


$\Large 
y = 
\left\{
	\begin{array}{ll}
		x  & x > 0 \\
        0  & x \leq 0
	\end{array}
\right.
$

In [None]:
import matplotlib.pyplot as plt
import numpy as np

def relu(x):
    return x if x>0 else 0

X = np.arange(-10, 10, 0.01)
y = [relu(x) for x in X]

plt.plot(X, y)
plt.grid()
plt.show()

## Using an Activation Function in Our Neural Network

In [None]:
import numpy as np
np.random.seed(2020)

class Activation_ReLU:
    
    # Forward pass
    def forward(self, inputs):
        # Calculate output values from input
        self.output = np.maximum(0, inputs)

n_samples = 4
neurons = 3
X = np.random.normal(size=(n_samples,neurons))
print("Inputs:")
print(X)

activation = Activation_ReLU()
activation.forward(X)
print("Output")
print(activation.output)

<hr>

In [None]:
import numpy as np
np.random.seed(2020)

class Layer_Dense:
    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        print("Weights: ")
        print(self.weights)
        self.biases = np.zeros((1, n_neurons))
        print("Bias: ")
        print(self.biases)

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases

# <=== HERE ===>
class Activation_ReLU:
    # Forward pass
    def forward(self, inputs):
        # Calculate output values from input
        self.output = np.maximum(0, inputs)
# <=== HERE ===>
        
        
inputs = [[1.0, 2.0, 3.0, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]

print("Creating DENSE 1")
dense1 = Layer_Dense(4, 3)
# <=== HERE ===>
activation1 = Activation_ReLU()
# <=== HERE ===>
print("Creating DENSE 2")
dense2 = Layer_Dense(3, 3)

dense1.forward(inputs)
# <=== HERE ===>
activation1.forward(dense1.output)
# <=== HERE ===>
dense2.forward(activation1.output)

print(30*"=")
print()

print("Dense 1 output: ")
print(dense1.output)
# <=== HERE ===>
print("ReLU 1 output:")
print(activation1.output)
# <=== HERE ===>
print("Dense 2 output: ")
print(dense2.output)

---

## Classification dataset

In [None]:
import matplotlib.pyplot as plt
import numpy as np

def vertical_data(samples, classes):
    X = np.zeros((samples*classes, 2))
    y = np.zeros(samples*classes, dtype='uint8')
    for class_number in range(classes):
        ix = range(samples*class_number, samples*(class_number+1))
        X[ix] = np.c_[np.random.randn(samples)*.1 + (class_number)/3, np.random.randn(samples)*.1 + 0.5]
        y[ix] = class_number
        
    return X, y

X, y = vertical_data(samples=100, classes=3)

fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="brg", marker="o", s=500, alpha=0.6)
plt.show()

## Softmax Activation Function

$\Large S_j = \frac{e^{o_j}}{\sum_{l=0}^{L}e^{o_l}}$

* $S_j$ is the confidence score of class $j$
* $o_j$ is the output value of neuron $j$
* $\sum_{l=0}^{L}e^{o_l}$ is the sum of $e^{o}$ over the output values of all neurons

In [None]:
import numpy as np

def softmax(inputs):
    exp_values = np.exp(inputs)
    return exp_values / np.sum(exp_values)

layer_outputs = [1.0, 2.0, 3.0, 2.5]

softmax_output = softmax(layer_outputs)
print(softmax_output)
print(sum(softmax_output))

In [None]:
# Softmax activation
class Activation_Softmax:
    # Forward pass
    def forward(self, inputs):
        # Get unnormalized probabilities
        exp_values = np.exp(inputs)
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities
        
layer_outputs = [[1.0, 2.0, 3.0, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]
softmax = Activation_Softmax()
softmax.forward(layer_outputs)
print(softmax.output)

In [None]:
import numpy as np

def softmax(inputs):
    exp_values = np.exp(inputs)
    return exp_values / np.sum(exp_values)

layer_outputs = np.array([1_000, 2_000, 3_000, 2_500])
print("Layer outputs:")
print(layer_outputs)

softmax_output = softmax(layer_outputs)
print(softmax_output)

In [None]:
import numpy as np

def softmax(inputs):
    exp_values = np.exp(inputs)
    return exp_values / np.sum(exp_values)

# ============================================
layer_outputs = np.array([1, 2, 3, 2.5])
print("Layer outputs:")
print(layer_outputs)

softmax_output = softmax(layer_outputs)
print(softmax_output)

print(50*"*")
# ============================================

layer_outputs = np.array([1, 2, 3, 2.5])

layer_outputs = layer_outputs - layer_outputs.max()
print("Layer outputs:")
print(layer_outputs)

softmax_output = softmax(layer_outputs)
print(softmax_output)
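
Subtracting the maximum does not change the softmax result, but it keeps every exponent at or below zero. As a hedged sketch, the same trick can be applied to the large values `[1_000, 2_000, 3_000, 2_500]` that overflow `np.exp` when used directly. The function name `stable_softmax` is hypothetical.

```python
import numpy as np


def stable_softmax(inputs):
    """Softmax with the maximum subtracted first (hypothetical helper)."""
    shifted = inputs - np.max(inputs)  # the largest entry becomes 0
    exp_values = np.exp(shifted)       # every exponent is <= 0, so no overflow
    return exp_values / np.sum(exp_values)


layer_outputs = np.array([1_000, 2_000, 3_000, 2_500])
probabilities = stable_softmax(layer_outputs)
print(probabilities)
print(np.sum(probabilities))
```

Almost all of the probability mass ends up on the largest input, and no `nan` values appear.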

In [None]:
# Softmax activation
class Activation_Softmax:
    # Forward pass
    def forward(self, inputs):
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

        
layer_outputs = [[1.0, 2.0, 3.0, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]
softmax = Activation_Softmax()
softmax.forward(layer_outputs)
print(softmax.output)

<hr>

In [None]:
import numpy as np
np.random.seed(2020)

# Dense layer
class Layer_Dense:
    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases

# ReLU activation
class Activation_ReLU:
    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)

# <=== HERE ===>
# Softmax activation
class Activation_Softmax:
    # Forward pass
    def forward(self, inputs):
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities
# <=== HERE ===>      
        
# <=== HERE ===>
X, y = vertical_data(samples=100, classes=3)
# <=== HERE ===>

dense1 = Layer_Dense(2, 3)
activation1 = Activation_ReLU()
dense2 = Layer_Dense(3, 3)
# <=== HERE ===>
activation2 = Activation_Softmax()
# <=== HERE ===>

# <=== HERE ===>
dense1.forward(X[:5])
# <=== HERE ===>
activation1.forward(dense1.output)
dense2.forward(activation1.output)
# <=== HERE ===>
activation2.forward(dense2.output)
# <=== HERE ===>

print("Dense 1 output: ")
print(dense1.output)
print("ReLU 1 output:")
print(activation1.output)
print("Dense 2 output: ")
print(dense2.output)
# <=== HERE ===>
print(50*"*")
print("Softmax output - PREDICTION")
print(activation2.output)
# <=== HERE ===>

# Loss Function and Accuracy

## Categorical Cross-Entropy Loss

$\Large L = - \sum_{j}y_j \log(\hat{y_j})$

* $L$ is the **loss** value
* $y_j$ is the true value
* $\hat{y_j}$ is the predicted value
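
The sum in the formula can be sketched for a single sample with plain Python. The function name `categorical_cross_entropy` is an assumption for this illustration; the values are a one-hot target and a prediction with three classes.

```python
import math


def categorical_cross_entropy(y_true, y_pred):
    """L = -sum_j y_j * log(y_hat_j) for one sample (hypothetical helper)."""
    return -sum(y_j * math.log(p_j) for y_j, p_j in zip(y_true, y_pred))


loss = categorical_cross_entropy([1, 0, 0], [0.7, 0.1, 0.2])
print(loss)
```

Because the target is one-hot, only the term of the correct class contributes to the sum.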

In [None]:
import numpy as np

inputs = [0.7, 0.1, 0.2]
real_values =  [1, 0, 0]

loss = -(real_values[0]*np.log(inputs[0]) +
         real_values[1]*np.log(inputs[1]) +
         real_values[2]*np.log(inputs[2]))

print(loss)

<hr>

$\Large L = - \sum_{j}y_j \log(\hat{y_j}) = \\ 
\Large -( y_0 \log(\hat{y_0}) + y_1 \log(\hat{y_1}) + y_2 \log(\hat{y_2})) = \\ 
\Large -(1 \cdot \log(\hat{y_0}) + 0 \cdot \log(\hat{y_1}) + 0 \cdot \log(\hat{y_2})) = \\
\Large - \log(\hat{y_0}) = - \log(\hat{y_k})$

* $L$ is the **loss** value
* $y_j$ is the true value
* $\hat{y_j}$ is the predicted value
* $k$ is the index of the correct class

In short:

$\Large L = - \log(\hat{y_k})$

In [None]:
inputs = [0.7, 0.1, 0.2]
real_values =  [1, 0, 0]

correct_class_index = real_values.index(1)
print("Correct class index: ", correct_class_index)

loss = -np.log(inputs[correct_class_index])
print(loss)

In [None]:
inputs = [0.7, 0.0, 0.3]
real_values =  [0, 1, 0]

correct_class_index = real_values.index(1)
print("Correct class index: ", correct_class_index)
loss = -np.log(inputs[correct_class_index])
print(loss)

In [None]:
print(-np.log(0))        # uncorrected result
print(-np.log(0 + 1e-7)) # corrected result, increased by a very small value

In [None]:
print(-np.log(0.1))        # uncorrected result
print(-np.log(0.1 + 1e-7)) # corrected result, increased by a very small value

---

In [None]:
print(-np.log(1))

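If we simply added a small value to every prediction, a confidence of 1 would end up slightly above 1 and the loss would become negative. To avoid this problem, we instead reduce the largest value by that small amount.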

In [None]:
print(-np.log(1)) # correct, desired value
print(-np.log(1-1e-7)) # corrected value

In [None]:
print(-np.log(0.9)) # correct, desired value
print(-np.log(0.9 - 1e-7)) # corrected value

---

In [None]:
inputs = [1.7, 0.0, 0.3]
print("Inputs:")
print(inputs)

cor_inputs = np.clip(inputs, 1e-7, 1-1e-7)
print("Corrected inputs:")
print(cor_inputs)

real_values =  [0, 1, 0]

correct_class_index = real_values.index(1)
print("Correct class index: ", correct_class_index)
loss = -np.log(cor_inputs[correct_class_index])
print(loss)
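
A small sketch of what the clipping does to the loss at the two extremes. The helper `clipped_loss` and its `eps` parameter are hypothetical names chosen for this illustration; the bound `1e-7` is the one used with `np.clip` above.

```python
import numpy as np


def clipped_loss(confidence, eps=1e-7):
    """-log of a confidence clipped to [eps, 1 - eps] (hypothetical helper)."""
    return -np.log(np.clip(confidence, eps, 1 - eps))


for confidence in [0.0, 0.5, 1.0]:
    print(confidence, clipped_loss(confidence))
```

A confidence of 0 now gives a large but finite loss, a confidence of 1 gives a tiny positive loss, and values in between are left unchanged.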

<hr>

In [None]:
# Common loss class
class Loss:

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Return loss
        return data_loss


In [None]:
# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent log(0), which would give an infinite loss
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]

        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(
                y_pred_clipped * y_true,
                axis=1
            )
        print("Correct confidences:")
        print(correct_confidences)
        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods


softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
# class_targets = np.array([0, 1, 1])  # the same targets as class indices
class_targets = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 1, 0]])

loss_function = Loss_CategoricalCrossentropy()
loss = loss_function.calculate(softmax_outputs, class_targets)
print(loss)
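
The two branches of `forward` handle two label formats. As a sketch under the assumption that both formats describe the same targets, the example below derives class indices from the one-hot rows and checks that both branches select the same confidences. The names `sparse_targets`, `from_sparse`, and `from_one_hot` are introduced only for this illustration.

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])

one_hot_targets = np.array([[1, 0, 0],
                            [0, 1, 0],
                            [0, 1, 0]])
# Class indices that describe the same targets as the one-hot rows
sparse_targets = np.argmax(one_hot_targets, axis=1)

# Sparse labels: pick one confidence per row by index
from_sparse = softmax_outputs[np.arange(len(softmax_outputs)), sparse_targets]
# One-hot labels: zero out the wrong classes, then sum each row
from_one_hot = np.sum(softmax_outputs * one_hot_targets, axis=1)

print(sparse_targets)
print(from_sparse, from_one_hot)
```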

<hr>

In [None]:
import numpy as np

# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases


# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)



# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs):

        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1,
                                            keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1,
                                            keepdims=True)

        self.output = probabilities

# <=== HERE ===>
# Common loss class
class Loss:

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Return loss
        return data_loss


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent log(0), which would give an infinite loss
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)


        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]

        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(
                y_pred_clipped * y_true,
                axis=1
            )

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods
# <=== HERE ===>

        

X, y = vertical_data(samples=100, classes=3)
# Create Dense layer with 2 input features and 3 output values
dense1 = Layer_Dense(2, 3)

# Create ReLU activation (to be used with Dense layer):
activation1 = Activation_ReLU()

# Create second Dense layer with 3 input features (as we take output
# of previous layer here) and 3 output values
dense2 = Layer_Dense(3, 3)

# Create Softmax activation (to be used with Dense layer):
activation2 = Activation_Softmax()

# <=== HERE ===>
# Create loss function
loss_function = Loss_CategoricalCrossentropy()
# <=== HERE ===>

# <=== HERE ===>
# Perform a forward pass of our training data through this layer
dense1.forward(X)
# <=== HERE ===>


# Perform a forward pass through activation function
# it takes the output of first dense layer here
activation1.forward(dense1.output)


# Perform a forward pass through second Dense layer
# it takes outputs of activation function of first layer as inputs
dense2.forward(activation1.output)

# Perform a forward pass through activation function
# it takes the output of second dense layer here
activation2.forward(dense2.output)

# <=== HERE ===>
# Let's see output of the first few samples:
print("Predictions")
print(activation2.output[:5])

# Perform a forward pass through loss function
# it takes the output of the softmax activation here and returns loss
loss = loss_function.calculate(activation2.output, y)

# Print loss value
print('loss:', loss)
# <=== HERE ===>

<hr>

In [None]:
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(2020)
# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases


# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)



# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs):

        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1,
                                            keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1,
                                            keepdims=True)

        self.output = probabilities


# Common loss class
class Loss:

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Return loss
        return data_loss


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent log(0), which would give an infinite loss
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)


        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]

        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(
                y_pred_clipped * y_true,
                axis=1
            )

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods


        

X, y = vertical_data(samples=100, classes=3)
# Create Dense layer with 2 input features and 3 output values
dense1 = Layer_Dense(2, 3)

# Create ReLU activation (to be used with Dense layer):
activation1 = Activation_ReLU()

# Create second Dense layer with 3 input features (as we take output
# of previous layer here) and 3 output values
dense2 = Layer_Dense(3, 3)

# Create Softmax activation (to be used with Dense layer):
activation2 = Activation_Softmax()

# Create loss function
loss_function = Loss_CategoricalCrossentropy()

# Perform a forward pass of our training data through this layer
dense1.forward(X)

# Perform a forward pass through activation function
# it takes the output of first dense layer here
activation1.forward(dense1.output)


# Perform a forward pass through second Dense layer
# it takes outputs of activation function of first layer as inputs
dense2.forward(activation1.output)

# Perform a forward pass through activation function
# it takes the output of second dense layer here
activation2.forward(dense2.output)

# Let's see output of the first few samples:
print(activation2.output[:5])

# Perform a forward pass through loss function
# it takes the output of the softmax activation here and returns loss
loss = loss_function.calculate(activation2.output, y)

# Print loss value
print('loss:', loss)


# <=== HERE ===>

predictions = np.argmax(activation2.output, axis=1)
accuracy = np.mean(predictions==y)
print("Accuracy: ", accuracy)

fig, ax = plt.subplots(figsize=(12, 12))
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="brg", marker="o", s=500, alpha=0.6)
ax2 = ax.twinx()
ax2.scatter(X[:, 0], X[:, 1], c=predictions, cmap="brg", marker="o", s=100, edgecolors="black")
plt.show()
# <=== HERE ===>
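
The accuracy computed above assumes that `y` holds class indices. A minimal sketch of a helper that would also accept one-hot targets is shown below; the function name `accuracy` and the sample probabilities are assumptions made for this illustration and are not outputs of the network above.

```python
import numpy as np


def accuracy(probabilities, targets):
    """Share of samples whose most confident class matches the target.

    Hypothetical helper: accepts class indices (1-D) or one-hot rows (2-D).
    """
    targets = np.asarray(targets)
    if targets.ndim == 2:
        targets = np.argmax(targets, axis=1)  # one-hot -> class indices
    predictions = np.argmax(probabilities, axis=1)
    return np.mean(predictions == targets)


probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.5, 0.1, 0.4],
                          [0.02, 0.9, 0.08]])
print(accuracy(probabilities, [0, 1, 1]))
```

Here two of the three samples are classified correctly, so the helper reports roughly 0.67.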