In [1]:
import numpy as np

### 1. NumPy implementation of Gradient Descent

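- Let's say our cost function is the sum of squared errors divided by 2 (for convenience: the $\frac{1}{2}$ cancels the factor of 2 produced when differentiating the square): $$E = \frac{1}{2}\sum_\mu(y^{\mu}-\hat{y}^{\mu})^2$$
- Since $\hat{y}^{\mu} = f\left(\sum_i w_i x_i^{\mu}\right)$: $$E = \frac{1}{2}\sum_\mu(y^{\mu}-\hat{y}^{\mu})^2 = \frac{1}{2}\sum_\mu\left(y^{\mu}-f\left(\sum_i w_i x_i^{\mu}\right)\right)^2$$
- $\mu$ indexes the records and $i$ indexes the input variables (features)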
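- Let's first look at the gradient for a single record, writing $h = \sum_i w_i x_i$ for the input to the activation function, so that $\hat{y} = f(h)$
- Taking the derivative of the cost function with respect to $w_i$ (by the chain rule):
$$\frac{\partial E}{\partial w_i}=-(y-\hat{y})f'(h)x_i$$
- $f'(h)$ is the derivative of the activation function. Gradient descent steps against the gradient, $\Delta w_i = -\eta\frac{\partial E}{\partial w_i}$, where $\eta$ is the learning rate:
$$\Delta w_i = \eta(y-\hat{y})f'(h)x_i$$
- We define the error term: $$\delta = (y-\hat{y})f'(h)$$
- Now our weight update is: $$w_i = w_i + \eta\delta x_i$$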
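To make the derivation concrete, here is a small worked check for a single record. The input `x`, target `y`, starting weights `w` and learning rate `eta` are made-up illustrative values, not taken from the admissions data. The sketch computes $h$, $\hat{y}$, $\delta$ and $\Delta w_i$ exactly as in the formulas above, then compares the analytic gradient $-\delta x_i$ against a central finite-difference estimate of $\partial E/\partial w_i$.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def sigmoid_prime(h):
    return sigmoid(h) * (1 - sigmoid(h))

# Hypothetical single record and starting weights (illustrative values only)
x = np.array([0.1, 0.3, -0.2])
y = 1.0
w = np.array([0.5, -0.5, 0.3])
eta = 0.5

h = np.dot(x, w)                          # h = sum_i w_i x_i
y_hat = sigmoid(h)                        # network output
delta = (y - y_hat) * sigmoid_prime(h)    # error term: (y - y_hat) f'(h)
delta_w = eta * delta * x                 # weight step: eta * delta * x_i
w_new = w + delta_w

def cost(weights):
    # E = 1/2 (y - y_hat)^2 for this one record
    return 0.5 * (y - sigmoid(np.dot(x, weights))) ** 2

# Central finite-difference estimate of dE/dw_i, one coordinate at a time
eps = 1e-6
basis = np.eye(len(w))
numeric_grad = np.array([
    (cost(w + eps * basis[i]) - cost(w - eps * basis[i])) / (2 * eps)
    for i in range(len(w))
])
analytic_grad = -delta * x                # dE/dw_i = -(y - y_hat) f'(h) x_i

print("analytic:", analytic_grad)
print("numeric: ", numeric_grad)
print("cost before:", cost(w), "after:", cost(w_new))
```

If the two gradient vectors agree and the cost after the update is lower than before, the formulas for $\partial E/\partial w_i$ and $\Delta w_i$ are consistent with each other.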


#### Now let's implement it 

In [3]:
# Defining the sigmoid function for activations
def sigmoid(x):
    return 1/(1+np.exp(-x))     # the logistic function, the same one used in logistic regression
# Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))
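The training loop below computes `output * (1 - output)` instead of calling `sigmoid_prime`. This quick sketch (using an arbitrary grid of `h` values chosen only for illustration) checks that $\sigma'(h) = \sigma(h)\,(1-\sigma(h))$ can be computed from the output alone, and that it matches a finite-difference derivative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

h = np.linspace(-5, 5, 11)          # arbitrary test points
output = sigmoid(h)

# Derivative expressed purely in terms of the output
from_output = output * (1 - output)

# Central finite-difference estimate for comparison
eps = 1e-6
numeric = (sigmoid(h + eps) - sigmoid(h - eps)) / (2 * eps)

print(np.allclose(sigmoid_prime(h), from_output))  # True
print(np.allclose(sigmoid_prime(h), numeric))      # True
print(sigmoid_prime(0.0))                          # 0.25, the maximum of sigmoid'
```

Reusing `output` avoids a second call to `np.exp` per record, which is why the loop uses that form.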


##### Classification of school admissions

In [10]:
import numpy as np
from data_prep import features, targets, features_test, targets_test

print('features size:',features.shape)
print('target size:', targets.shape)

features size: (360, 6)
target size: (360,)


##### In practice we would vectorize this, but for now let's just iterate through the records

In [16]:
# Use the same seed to make debugging easier
np.random.seed(42)
n_records, n_features = features.shape                                ## number of records (observations) and number of features
last_loss = None                                                      ## loss from the previous report; None until the first one
## a common choice is small random weights drawn with standard deviation 1 / sqrt(n_features)
weights = np.random.normal(scale=1 / n_features**.5, size=n_features) # Initialize weights
# Neural Network hyperparameters
epochs = 1000           ## number of passes (epochs) over the training data
learnrate = 0.5         ## learning rate

for e in range(epochs):                        # run 1000 epochs
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets): # Loop through all records, x is the input, y is the target
        output = sigmoid(np.dot(x,weights))          # Calculate the output
        error = y-output                             # calculate the error
        error_term = error * output * (1 - output)   # error term delta; output * (1 - output) is sigmoid'(h)
        del_w += error_term*x                        # and add it to the total weight change
        ## del_w is a vector with one entry per feature (6 here)
        ## end of inner loop
        
    weights += learnrate * del_w / n_records         # average the accumulated change over all records and apply it
    
    # Printing out the mean square error on the training set
    if e % (epochs / 5) == 0:   ## print out 5 results 
        print(e)
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

0
Train loss:  0.26276093849966364
200
Train loss:  0.20084292908073417
400
Train loss:  0.19779851396686018
600
Train loss:  0.19723507746241067
800
Train loss:  0.19706766341315074
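As the heading above notes, the per-record loop can be vectorized. The sketch below uses synthetic random data (`X` and `y` are generated here as an assumed stand-in, since `data_prep` is not shown) and reuses the same hyperparameters. It runs one epoch both ways and checks that the updates agree, then trains with the vectorized step only. The key identity is that the accumulated sum $\sum_\mu \delta^{\mu} x^{\mu}$ equals the matrix product $X^\top \delta$.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Synthetic stand-in data (NOT the admissions data from data_prep)
rng = np.random.default_rng(0)
n_records, n_features = 360, 6
X = rng.normal(size=(n_records, n_features))
true_w = rng.normal(size=n_features)
y = (sigmoid(X @ true_w) > 0.5).astype(float)

weights = rng.normal(scale=1 / n_features**.5, size=n_features)
learnrate = 0.5

def loop_step(weights):
    # One epoch as in the loop above: accumulate per-record updates
    del_w = np.zeros(weights.shape)
    for x_i, y_i in zip(X, y):
        output = sigmoid(np.dot(x_i, weights))
        error_term = (y_i - output) * output * (1 - output)
        del_w += error_term * x_i
    return weights + learnrate * del_w / n_records

def vectorized_step(weights):
    # Same epoch with matrix operations: all outputs at once, then X^T delta
    output = sigmoid(X @ weights)                      # shape (n_records,)
    error_term = (y - output) * output * (1 - output)  # delta for every record
    del_w = X.T @ error_term                           # sum over records of delta * x
    return weights + learnrate * del_w / n_records

print(np.allclose(loop_step(weights), vectorized_step(weights)))  # True

initial_loss = np.mean((sigmoid(X @ weights) - y) ** 2)
for e in range(1000):
    weights = vectorized_step(weights)
loss = np.mean((sigmoid(X @ weights) - y) ** 2)
print("Train loss before:", initial_loss, "after:", loss)
```

The loss values here depend on the synthetic data and are not comparable to the admissions results above; the point is only that both versions compute the same update.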


In [17]:
# Calculate accuracy on test data
tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

Prediction accuracy: 0.725
