#### Problem 4.7

Terms:<br>
$\langle \overrightarrow{x},\overrightarrow{t} \rangle =$ training example, where $\overrightarrow{x}$ is the vector of network input and $\overrightarrow{t}$ is the vector of target network output values.<br>
$\eta =$ learning rate (e.g. 0.5, or some other small value from 0 to 1) <br>
$x_{ji} =$ the input from $unit_i$ to $unit_j$ (For $unit_j$, the input $x$ comes from $unit_i$, hence $x_{ji}$)<br>
$w_{ji} =$ the weight from $unit_i$ to $unit_j$<br>
$n_{in} =$ the number of network inputs<br>
$n_{out} =$ the number of network outputs<br>
$n_{hidden} =$ the number of units in the hidden layer<br>

Algorithm:<br>
Create a feed-forward network with $n_{in}$ inputs, $n_{hidden}$ hidden units, and $n_{out}$ output units<br>
Initialize all network weights to small random numbers (e.g. between -0.5 and 0.5)<br>
Until the termination condition is met:<br>
&nbsp;&nbsp;&nbsp;&nbsp;For each $\langle \overrightarrow{x},\overrightarrow{t} \rangle$  in $training\_examples$, do:<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Propagate the input forward through the network:<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1) Input instance $\overrightarrow{x}$ to the network and compute the output $o_u$ of every unit $u$ in the network<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Propagate the errors backward through the network:<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2) For each network output unit $k$, calculate its error term $\delta_k$:<br>
$$\delta_k \leftarrow o_k(1-o_k)(t_k-o_k)$$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3) For each hidden unit $h$, calculate its error term $\delta_h$:
$$\delta_h \leftarrow o_h(1-o_h) \sum\limits_{k\in outputs} w_{kh}\delta_k$$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4a) Update each network weight $w_{ji}$
$$w_{ji} \leftarrow w_{ji}+\Delta w_{ji}$$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;where
$$\Delta w_{ji} = \eta \delta_j x_{ji}$$<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4b) Adding momentum makes the update on iteration $n$ depend on the update from iteration $n-1$:
$$\Delta w_{ji}(n) = \eta \delta_j x_{ji} + \alpha \Delta w_{ji}(n-1)$$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Here $0 \le \alpha < 1$ is the momentum constant. Each update then depends partially on the previous one, which is what gives the weight trajectory its momentum.

_Citation: Thomas M. Mitchell. 1997. Machine Learning (1 ed.), Page 98, McGraw-Hill, Inc., New York, NY, USA._
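The momentum rule in step 4b can be illustrated on a single weight. The sketch below is only an illustration: the helper name `momentum_step` and the constant error term `delta_j = 0.1` are assumptions made for the example, not part of the algorithm as cited.

```python
def momentum_step(weight, delta_j, x_ji, prev_update, eta=0.3, alpha=0.9):
    """Apply one momentum update to a single weight (hypothetical helper).

    Returns the new weight and the update that was applied, so the caller
    can pass the update back in as prev_update on the next iteration.
    """
    update = eta * delta_j * x_ji + alpha * prev_update
    return weight + update, update


w, dw = 0.1, 0.0  # initial weight, no previous update yet
for n in range(1, 4):
    # Assumed constant error term and input, just to show the accumulation.
    w, dw = momentum_step(w, delta_j=0.1, x_ji=1.0, prev_update=dw)
    print(n, round(dw, 4), round(w, 4))
```

When the error term keeps the same sign, each update is larger than the last, because a fraction $\alpha$ of the previous update is carried forward.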
<br><br>
Assigning Variables:<br>
Weights reference: <br>
0: $w_{ca}$<br>
1: $w_{cb}$<br>
2: $w_{c0}$<br>
3: $w_{dc}$<br>
4: $w_{d0}$<br>

In [87]:
import math
weights = [.1,.1,.1,.1,.1]
learning_rate = 0.3
alpha = 0.9
training_examples = {'a':[1,0],'b':[0,1],'d':[1,0]}
for i in range(2):
    print('Observation',i+1)
    for k,v in training_examples.items():
        print(k,v[i])

Observation 1
a 1
b 0
d 1
Observation 2
a 0
b 1
d 0


In [125]:
class ANN:
    def __init__(self, a, b, d):
        self.input_a    = a
        self.input_b    = b
        self.weights_ca = 0.1
        self.weights_cb = 0.1
        self.weights_c0 = 0.1
        self.weights_dc = 0.1
        self.weights_d0 = 0.1
        self.weights_ca_delta = 0
        self.weights_cb_delta = 0
        self.weights_c0_delta = 0
        self.weights_dc_delta = 0
        self.weights_d0_delta = 0
        self.d          = d
        self.output     = []
        self.hidden_layer = []


    def feed_forward(self,itera):
        """Run training example number `itera` through the network and record both unit outputs."""
        def output_calc(x):
            # Sigmoid activation used by every unit
            return 1/(1+math.e**-x)
        # Net input to hidden unit c: weighted inputs plus the threshold weight w_c0
        temp_hidden_val = self.weights_ca * self.input_a[itera]\
                           + self.weights_cb * self.input_b[itera]\
                           + self.weights_c0
        hidden_val = output_calc(temp_hidden_val)
        self.hidden_layer.append(hidden_val)
        # Use hidden_val directly: indexing hidden_layer[itera] only works if examples are fed in order
        temp_out_val = self.weights_dc * hidden_val + self.weights_d0
        out_val = output_calc(temp_out_val)
        self.output.append(out_val)
        print('output',self.output)
        print('hidden',self.hidden_layer)
        print('a',self.input_a)
        print('b',self.input_b)
        print('d',self.d)

    def backprop(self):
        """Backpropagate the error of the most recent example and update all five weights."""
        # Assumes examples are fed forward in order, so list position == example index
        i = len(self.output) - 1
        o_k = self.output[i]
        o_h = self.hidden_layer[i]
        # 𝛿𝑘←𝑜𝑘(1−𝑜𝑘)(𝑡𝑘−𝑜𝑘)
        delta_k = o_k*(1-o_k)*(self.d[i]-o_k)
        # 𝛿ℎ←𝑜ℎ(1−𝑜ℎ)∑𝑘∈𝑜𝑢𝑡𝑝𝑢𝑡𝑠 𝑤𝑘ℎ 𝛿𝑘 (only one output unit here, so the sum has one term)
        delta_h = o_h*(1-o_h) * (self.weights_dc * delta_k)
        print('delta h',delta_h)
        print('delta k',delta_k)

        # Momentum update: delta(n) = learning_rate * error term * input + alpha * delta(n-1)
        # Threshold weights (c0, d0) see a constant input of 1
        self.weights_ca_delta = learning_rate * delta_h * self.input_a[i] + alpha * self.weights_ca_delta
        self.weights_cb_delta = learning_rate * delta_h * self.input_b[i] + alpha * self.weights_cb_delta
        self.weights_c0_delta = learning_rate * delta_h + alpha * self.weights_c0_delta
        self.weights_dc_delta = learning_rate * delta_k * o_h + alpha * self.weights_dc_delta
        self.weights_d0_delta = learning_rate * delta_k + alpha * self.weights_d0_delta

        # Apply the updates only after every error term has been computed from the old weights
        self.weights_ca += self.weights_ca_delta
        self.weights_cb += self.weights_cb_delta
        self.weights_c0 += self.weights_c0_delta
        self.weights_dc += self.weights_dc_delta
        self.weights_d0 += self.weights_d0_delta

In [126]:
nn = ANN(training_examples['a'],training_examples['b'],training_examples['d'])
nn.feed_forward(0)
nn.backprop()

output [0.5386684799635422]
hidden [0.549833997312478]
a [1, 0]
b [0, 1]
d [1, 0]
delta h 0.0028376060621625463
delta k 0.11464307343435433
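
The error terms printed above can be carried through to the weight updates. Below is a minimal, self-contained sketch of the first two training iterations, assuming incremental (per-example) updates with the momentum rule from step 4b and the same settings as this notebook (all weights 0.1, learning rate 0.3, alpha 0.9). The names `sigmoid`, `train`, and `history` are introduced only for this illustration and are not part of the cells above.

```python
import math


def sigmoid(x):
    """Sigmoid activation used by both units."""
    return 1.0 / (1.0 + math.exp(-x))


def train(examples, eta=0.3, alpha=0.9, init=0.1):
    """Run one incremental backprop-with-momentum step per (a, b, t) example.

    Returns a list with a snapshot of the five weights after each step.
    """
    w = {name: init for name in ("ca", "cb", "c0", "dc", "d0")}
    prev = {name: 0.0 for name in w}  # previous update for each weight
    history = []
    for a, b, t in examples:
        # Forward pass: hidden unit c, then output unit d
        o_c = sigmoid(w["ca"] * a + w["cb"] * b + w["c0"])
        o_d = sigmoid(w["dc"] * o_c + w["d0"])
        # Backward pass: error terms computed from the old weights
        delta_d = o_d * (1 - o_d) * (t - o_d)
        delta_c = o_c * (1 - o_c) * w["dc"] * delta_d
        # delta_j * x_ji for each weight; threshold weights see input 1
        grads = {"ca": delta_c * a, "cb": delta_c * b, "c0": delta_c,
                 "dc": delta_d * o_c, "d0": delta_d}
        for name in w:
            prev[name] = eta * grads[name] + alpha * prev[name]
            w[name] += prev[name]
        history.append(dict(w))
    return history


# Training examples as (a, b, d) rows: Observation 1 and Observation 2
history = train([(1, 0, 1), (0, 1, 0)])
for n, weights in enumerate(history, start=1):
    print("iteration", n, {k: round(v, 6) for k, v in weights.items()})
```

With these settings the first iteration should move $w_{dc}$ to about 0.1189 and $w_{d0}$ to about 0.1344, while $w_{cb}$ stays at 0.1 because $b = 0$ in the first example. In the second iteration $w_{ca}$ still changes even though $a = 0$, since the momentum term carries forward 0.9 of its previous update.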


#### Problem 4.8