## Problem 5

In this problem we extend our knowledge of linear regression (LR) to non-linear regression (NLR). The initial part of the problem remains the same: the user extracts the data from the given file and separates it into two variables. Plotting these points (just as in Problem 1) shows that a straight line is not a good fit for the given dataset. Therefore we resort to non-linear regression.

### Basics of NLR

Instead of creating new input features $x_i'$ from the old ones $x_i$ by hand (which we did in LR by adding a 1 to every input data point), it is common to use a feature function $\phi$ that maps $x_i$ to $x_i'$. For example, the feature vector for NLR of degree 2 looks like this:

$$
\phi(x_i) = \left(\begin{array}{c}
1 \\
x_i \\
x_i^2
\end{array}\right)
$$

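The objective of gradient descent remains the same, namely to minimize the squared loss
$$
J(w) = \frac{1}{2m}\sum_{i=1}^m \big( h_w(x^{(i)}) - y^{(i)}\big)^2
$$
where $m$ is the number of data points (the factor $\frac{1}{2}$ only simplifies the gradient) and the hypothesis function $h_w(x)$ is given by the model, which is linear in the weights $w$,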
$$
h_w(x) = w^T\phi = w_0 + w_1 x + w_2 x^2
$$
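
As a quick numerical illustration of the feature map and hypothesis, the sketch below evaluates $\phi$ and $h_w$ for a few inputs. It assumes the features are ordered by increasing power, $(1, x, x^2)$, so that the weights line up with $w_0 + w_1 x + w_2 x^2$. The names `poly_features` and `predict` and the numbers used are illustrative only and are not part of this notebook's code.

```python
import numpy as np

def poly_features(x, degree=2):
    """Map inputs x to rows (1, x, x^2, ..., x^degree)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # increasing=True puts the constant term first, so column j holds x**j
    return np.vander(x, degree + 1, increasing=True)

def predict(x, w):
    """Evaluate h_w(x) = w^T phi(x) for every input in x."""
    return poly_features(x, degree=len(w) - 1) @ w

w_example = np.array([1.0, 2.0, 3.0])   # w_0, w_1, w_2 (made-up values)
x_example = np.array([0.0, 1.0, 2.0])

print(poly_features(x_example))          # one feature row per input
print(predict(x_example, w_example))     # 1 + 2x + 3x^2 at each input
```

Each row of the feature matrix is one $\phi(x_i)$, so a single matrix-vector product evaluates the hypothesis for all inputs at once.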

In [None]:
##########################################
##       Created by: SAHIL ARORA        ## 
##  HOCHSCHULE- RAVENSBURG WEINGARTEN   ##
##          Date: 15.09.2020            ##
##########################################

# Dependencies imported

# For vector computations and notations
import numpy as np 

# For Plotting
import matplotlib.pyplot as plt

# Solver parameters

# Learning rate (range 0.00001 to 0.0001)
alpha = 1 * 10 ** -4
# Convergence threshold for the weight update
acc = 10 ** -5

# Degree of the polynomial model
degree = 2

Refer to Problem 1 for the code that reads the data from the given .txt file.
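
Since Problem 1 is not reproduced here, the following is only a sketch of what the loading step might look like. It assumes a plain-text file with two whitespace-separated columns (input first, then target). The inline sample stands in for the real file and its numbers are made up. The names `x_data` and `y_data` match those used later in this problem, and `m` is the number of data points.

```python
import io
import numpy as np

# Stand-in for the real data file: two columns, x then y (made-up values).
sample_file = io.StringIO(
    "0.0 1.1\n"
    "1.0 2.9\n"
    "2.0 9.2\n"
    "3.0 18.8\n"
)

# np.loadtxt also accepts a filename instead of a file-like object.
data = np.loadtxt(sample_file)

x_data = data[:, 0]   # first column: inputs
y_data = data[:, 1]   # second column: targets
m = len(x_data)       # number of data points

print(x_data, y_data, m)
```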

### Implementing gradient descent

Now that we have the hypothesis and the squared loss function, we can implement the gradient descent routine. We need a function that returns $\nabla J(w)$, the gradient of the squared loss function. The gradient is simply the vector of all partial derivatives
$$
\nabla J(w) = \bigg[\frac{\partial J(w)}{\partial w_0} , \dotsc, \frac{\partial J(w)}{\partial w_d} \bigg]^T
$$
where $d$ is the degree of the polynomial and
$$
\frac{\partial J(w)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^m \big( h_w(x^{(i)}) - y^{(i)}\big) \phi_j(x^{(i)}), \qquad \phi_j(x) = x^j
$$

Further, we define the following functions:
1. A feature function that produces $\phi$ from a given input point.
2. A hypothesis function that returns the dot product of $\phi$ and the weight vector.
3. A function that computes the gradient contribution of a single data point (as in the equation above).
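
One way to sanity-check a gradient implementation is to compare it against central finite differences. The sketch below does this for a vectorized version of the gradient formula, assuming the loss is scaled as $\frac{1}{2m}\sum_i \big(h_w(x^{(i)}) - y^{(i)}\big)^2$ so that its gradient takes the $\frac{1}{m}$ form, with the feature $x^j$ multiplying each residual. The helper names and the random toy data are assumptions made for illustration.

```python
import numpy as np

def design_matrix(x, degree):
    # Row i is phi(x_i) = (1, x_i, ..., x_i**degree)
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def loss(w, x, y):
    # Squared loss with the 1/(2m) scaling assumed in the text above
    r = design_matrix(x, len(w) - 1) @ w - y
    return r @ r / (2 * len(y))

def grad(w, x, y):
    # (1/m) * sum_i (h_w(x_i) - y_i) * phi(x_i), written as a matrix product
    Phi = design_matrix(x, len(w) - 1)
    return Phi.T @ (Phi @ w - y) / len(y)

def numeric_grad(w, x, y, eps=1e-6):
    # Central differences, perturbing one weight at a time
    g = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        g[j] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x_toy = rng.uniform(-1, 1, size=20)
y_toy = rng.normal(size=20)
w_toy = rng.normal(size=3)

max_diff = np.max(np.abs(grad(w_toy, x_toy, y_toy) - numeric_grad(w_toy, x_toy, y_toy)))
print(max_diff)  # should be tiny if the analytic gradient is right
```

If the analytic and numeric gradients disagree by more than rounding error, the usual suspects are a missing normalization factor or a mismatch between the order of the features and the order of the weights.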

In [None]:

############################
##    Feature Function    ##
############################

def feature(x):
    # Produces the feature vector (1, x, x^2, ..., x^degree) for a scalar input x.
    # `degree` is the global parameter defined above; reading it needs no `global`.
    phi_temp = np.array([x ** p for p in range(degree + 1)], dtype=float)

    return phi_temp



In [None]:
############################
##    Hypothesis Func     ##
############################

def hypo(ix,w):
    # function for dot product
    h = np.dot(ix,w)

    return h

In [None]:
############################
##   Gradient Function    ##
##  (Squared Loss Func)   ##
############################

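def sq_gradient(x,w,y):
    
    # Gradient contribution of a single data point (x, y).
    # Valid for the squared loss only (see the analytical expression above).
    
    phi = feature(x)  # computed once and reused
    sq_grad = phi*(hypo(phi, w) - y) 
    
    return sq_grad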


Once all the functions are defined, the rest is straightforward, as seen in Problem 1. The gradient descent function updates the weights after each iteration. It is implemented in the function `gradient_decent(x,y,w,acc)`; the learning rate `alpha` is read from the global parameter defined above.
Recall the update rule of gradient descent, which is
$$
w^{(k+1)} = w^{(k)} - \alpha \nabla J(w^{(k)})
$$
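
The update rule can also be sketched compactly in vectorized form. The example below is not this notebook's `gradient_decent` routine. It is an illustrative variant run on synthetic data generated from made-up quadratic coefficients, with a step size and a stopping rule (largest weight change below a tolerance) chosen for this toy problem only. The result is compared with the least-squares solution from `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)
x_toy = np.linspace(-1.0, 1.0, 50)
true_w = np.array([0.5, -1.0, 2.0])                  # made-up coefficients
Phi = np.vander(x_toy, 3, increasing=True)           # rows (1, x, x^2)
y_toy = Phi @ true_w + 0.05 * rng.normal(size=x_toy.size)

def gd_sketch(Phi, y, alpha=0.5, tol=1e-10, max_iter=100_000):
    """Batch gradient descent: w <- w - alpha * grad J(w)."""
    w = np.zeros(Phi.shape[1])
    for k in range(1, max_iter + 1):
        grad = Phi.T @ (Phi @ w - y) / len(y)        # (1/m) * sum of residual * phi
        step = alpha * grad
        w = w - step
        if np.max(np.abs(step)) < tol:               # stop once every component is small
            break
    return w, k

w_gd, n_iter = gd_sketch(Phi, y_toy)
w_ls, *_ = np.linalg.lstsq(Phi, y_toy, rcond=None)   # reference solution

print(w_gd, n_iter)
print(w_ls)
```

The two weight vectors should agree closely, which gives a simple correctness check for any gradient descent implementation on a problem small enough to solve directly.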


In [None]:
############################
##    Gradient Descent    ##
############################

def gradient_decent(x,y,w,acc):
    m = len(x)  # number of data points

    delta_w = np.ones(degree + 1) # initialized to ones so that the loop is entered

    itr = 0
    # Iterate while every component of the last update is larger than acc;
    # the loop stops as soon as any component drops to acc or below
    while all(acc < abs(a) for a in delta_w):
        
        sq_g = 0

        # Compute the cumulative gradient over all data points
        for a in range(m):

            # Gradient is normalized by the number of data points available
            sq_g = sq_g + sq_gradient(x[a],w,y[a])/m
       
        # alpha is the learning rate
        delta_w = alpha * sq_g 
        
        w = w - (delta_w)
        
        itr = itr+1
    
    return w, itr

Once convergence is reached, we print the result and plot the graph to visualize it. Note that we sort the input data points to get a continuous polynomial curve.
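
When both the data points and the fitted curve are drawn, the pairing between inputs and targets has to survive the sort. A small sketch with made-up numbers: `np.argsort` returns the ordering of the inputs, which can then be applied to both arrays.

```python
import numpy as np

x_toy = np.array([3.0, 1.0, 2.0])
y_toy = np.array([9.0, 1.0, 4.0])     # y = x^2, paired element-wise with x_toy

order = np.argsort(x_toy)             # indices that sort the inputs
x_sorted = x_toy[order]
y_sorted = y_toy[order]               # same reordering keeps the pairs intact

print(x_sorted)
print(y_sorted)
```

Sorting only the inputs in place would leave the targets in their old order, so each point would be matched with the wrong target afterwards.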

In [None]:
weight = np.zeros(degree+1)

sq_weight, sq_itr = gradient_decent(x_data, y_data, weight, acc)
print('The optimized weight vector is {}.'.format(sq_weight))
print('Solving criteria with Sq Loss Func: Convergence threshold = {} and Learning Rate = {}'.format(acc,alpha))
print('Total iterations done = {}'.format(sq_itr))


############################
##  Plot Regression Line  ##
############################

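# Plot the training data first so that the first legend entry matches it
plt.plot(x_data, y_data, 'o')

# Sort a copy of the inputs to get a continuous polynomial curve;
# x_data itself stays paired with y_data
x_sorted = np.sort(x_data)

new_y = np.array([hypo(feature(every), sq_weight) for every in x_sorted])

plt.plot(x_sorted, new_y, '-')
plt.legend(['Training data', 'Non-linear regression'])
plt.show()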