# Linear Regression with `numpy`
- [Why numpy over Python lists?](https://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists)

## Loading the dataset

Before we can start working on our actual algorithms and models, we need our data in the correct format. Here the data is a small synthetic dataset: 100 points along a noisy curve, stored as column vectors.

In [7]:
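# Loading the data
import numpy as np

# 100 evenly spaced inputs between 1 and 10, as a column vector of shape (100, 1)
data_x = np.linspace(1.0, 10.0, 100)[:, np.newaxis]
# Targets: a sine wave plus a quadratic trend plus Gaussian noise
data_y = np.sin(data_x) + 0.1*np.power(data_x,2) + 0.5*np.random.randn(100,1)
# Scale the inputs so the largest value is 1
data_x /= np.max(data_x)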

## Sanity Checks

Now that we have our data, we can quickly check that it has in fact been **loaded correctly.** 
A *simple* way to do that is to print the shape of each array. 

Simple checks like these, which catch small mistakes before they turn into bigger errors later, are called *sanity checks*.

In [12]:
print (data_x.shape)
print (data_y.shape)

# Adding bias to x
data_x = np.hstack((np.ones_like(data_x), data_x))

(100, 1)
(100, 1)
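
The shape checks above can also be written as assertions, so that a mismatch stops the notebook instead of relying on someone reading the printed output. This is a minimal sketch, assuming the same synthetic data as in the loading cell (before the bias column is added); the helper name `check_dataset` is hypothetical and not part of the original notebook.

```python
import numpy as np

def check_dataset(x, y):
    """Hypothetical helper: basic sanity checks on a feature/target pair."""
    # Both arrays should be 2-D (rows are samples)
    assert x.ndim == 2 and y.ndim == 2, "expected 2-D arrays"
    # There should be exactly one target row per sample
    assert x.shape[0] == y.shape[0], "x and y must have the same number of rows"
    # No NaN or infinite values should be present
    assert np.all(np.isfinite(x)) and np.all(np.isfinite(y)), "non-finite values found"
    return x.shape, y.shape

# Same construction as the loading cell
data_x = np.linspace(1.0, 10.0, 100)[:, np.newaxis]
data_y = np.sin(data_x) + 0.1 * np.power(data_x, 2) + 0.5 * np.random.randn(100, 1)
data_x /= np.max(data_x)

x_shape, y_shape = check_dataset(data_x, data_y)
print(x_shape, y_shape)
```

Because the bias column is added with `np.hstack` in the same cell as the prints, re-running that cell stacks another column each time; an assertion on the expected number of columns is one way to catch that early.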


## Training Data and Testing Data

When training an ML model, it is best practice to split your available data into two sets: one for training and one for testing and evaluation. 

**Why?**

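If a model trains on a set of data and learns its patterns well enough, it will obviously perform well on that same set of data. That tells us nothing about how it handles data it has never seen.

So we keep the two sets mutually exclusive and judge the model on the test set, which it never saw during training.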

In [3]:
# Shuffling data: a random ordering of the row indices
order = np.random.permutation(len(data_x))
# Number of rows held out for testing (20 of the 100)
portion = 20

# Splitting data into train and test 
test_x = data_x[order[:portion]]
test_y = data_y[order[:portion]]
train_x = data_x[order[portion:]]
train_y = data_y[order[portion:]]
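
The split above can be wrapped in a small function, which makes it easy to confirm that the two sets really are mutually exclusive. This is a sketch under stated assumptions: the name `split_train_test` and the `seed` parameter are hypothetical additions, and the toy arrays stand in for `data_x` and `data_y`.

```python
import numpy as np

def split_train_test(x, y, n_test, seed=None):
    """Hypothetical helper: shuffle the rows and hold out n_test of them."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx], train_idx, test_idx

# Toy stand-in data: 100 rows, one feature
x = np.arange(100, dtype=float)[:, np.newaxis]
y = 2.0 * x

train_x, train_y, test_x, test_y, train_idx, test_idx = split_train_test(
    x, y, n_test=20, seed=0
)

# No row index appears in both sets
overlap = np.intersect1d(train_idx, test_idx)
print(train_x.shape, test_x.shape, len(overlap))  # prints (80, 1) (20, 1) 0
```

Passing a fixed `seed` makes the split repeatable between runs, which helps when comparing results.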

In [11]:
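def get_gradient(w, x, y):
    # Predictions of the linear model for the current weights
    y_estimate = x.dot(w).flatten()
    # Error = expected_value - predicted_value
    error = (y.flatten() - y_estimate)
    # Gradient of half the mean squared error with respect to w
    gradient = -(1.0/len(x)) * error.dot(x)
    # Also return the per-sample squared errors for reporting
    return gradient, error**2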
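
One way to gain confidence in `get_gradient` is to compare it against a numerical estimate. The expression it returns matches the gradient of half the mean squared error, so the sketch below assumes that objective and checks it with central finite differences. The names `loss` and `numerical_gradient`, and the random test data, are hypothetical and only serve the illustration.

```python
import numpy as np

def get_gradient(w, x, y):
    # Same computation as the cell above
    y_estimate = x.dot(w).flatten()
    error = y.flatten() - y_estimate
    gradient = -(1.0 / len(x)) * error.dot(x)
    return gradient, error ** 2

def loss(w, x, y):
    # Assumed objective: half the mean of the squared errors
    _, squared_error = get_gradient(w, x, y)
    return 0.5 * np.mean(squared_error)

def numerical_gradient(w, x, y, eps=1e-6):
    # Central finite differences, perturbing one weight at a time
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.hstack((np.ones((50, 1)), rng.random((50, 1))))  # bias column + one feature
y = rng.standard_normal((50, 1))
w = rng.standard_normal(2)

analytic, _ = get_gradient(w, x, y)
numeric = numerical_gradient(w, x, y)
gradients_match = np.allclose(analytic, numeric, atol=1e-6)
print(gradients_match)
```

If the two gradients disagree, the usual suspects are a sign error or a missing `1/len(x)` factor.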

## Training and Convergence

All Machine Learning or Deep Learning models require training. Training is the phase in which your model/algorithm attempts to *learn* and *understand* the data that you've provided.

The **learning rate** controls how large a step the algorithm takes each time it updates the weights. We'll get into detail about this later, but for now we can just say that the learning rate should be neither too high nor too low for good training: too high and the updates can overshoot, too low and training becomes very slow.
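
To illustrate why the learning rate matters, here is a toy sketch that is deliberately not the regression problem: it minimises `f(w) = w**2`, whose gradient is `2*w`, using the same update rule of subtracting the learning rate times the gradient. The function name `run_gradient_descent` and the three rates are assumptions chosen for the illustration.

```python
def run_gradient_descent(alpha, steps=20, w0=1.0):
    # Minimise f(w) = w**2, whose gradient is 2*w; the minimum is at w = 0
    w = w0
    for _ in range(steps):
        w = w - alpha * 2.0 * w
    return w

# Distance from the minimum after 20 steps for three learning rates
too_small = abs(run_gradient_descent(0.001))
reasonable = abs(run_gradient_descent(0.1))
too_large = abs(run_gradient_descent(1.1))

print("alpha=0.001:", too_small)
print("alpha=0.1:  ", reasonable)
print("alpha=1.1:  ", too_large)
```

With the smallest rate the weight has barely moved from its starting value of 1, with the middle rate it is close to 0, and with the largest rate each step overshoots the minimum and the weight grows instead of shrinking.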

**Convergence** occurs when you've *finished* training your model. Technically there's no exact way to define when training is done, but in practice we say convergence has occurred when the weights change by less than some threshold value between two successive training iterations.

In [5]:
# Initialising a random vector of weights
w = np.random.randn(2)

# Learning rate
alpha = 0.5

# Threshold to terminate learning
tolerance = 1e-5

# Perform Gradient Descent
iterations = 1
while True:
    gradient, error = get_gradient(w, train_x, train_y)
    new_w = w - alpha * gradient
    
    # Stopping Condition
    if np.sum(abs(new_w - w)) < tolerance:
        print ("Converged.")
        break
    
    # Print error every 100 iterations
    if iterations % 100 == 0:
        print ("Iteration: "+str(iterations)+" - Error: "+ str(np.sum(error)))
    
    iterations += 1
    w = new_w

Iteration: 100 - Error: 113.35754457226828
Iteration: 200 - Error: 111.52822837248205
Iteration: 300 - Error: 111.51988941729705
Converged.
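
The test split set aside earlier has not been used yet. As a sketch of how it could be, the block below repeats the steps of this notebook in one self-contained cell and then reports the mean squared error on both splits. The fixed seed, the `max_iterations` cap, and the choice of mean squared error as the metric are assumptions added for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable (an assumption)

# Same synthetic data as in the loading cell, with the bias column added
data_x = np.linspace(1.0, 10.0, 100)[:, np.newaxis]
data_y = np.sin(data_x) + 0.1 * np.power(data_x, 2) + 0.5 * rng.standard_normal((100, 1))
data_x /= np.max(data_x)
data_x = np.hstack((np.ones_like(data_x), data_x))

# 20 rows for testing, 80 for training
order = rng.permutation(len(data_x))
test_x, test_y = data_x[order[:20]], data_y[order[:20]]
train_x, train_y = data_x[order[20:]], data_y[order[20:]]

def get_gradient(w, x, y):
    error = y.flatten() - x.dot(w).flatten()
    return -(1.0 / len(x)) * error.dot(x), error ** 2

# Gradient descent with the same settings as above, plus an iteration cap
w = rng.standard_normal(2)
alpha, tolerance, max_iterations = 0.5, 1e-5, 10000
for iteration in range(max_iterations):
    gradient, _ = get_gradient(w, train_x, train_y)
    new_w = w - alpha * gradient
    converged = np.sum(np.abs(new_w - w)) < tolerance
    w = new_w
    if converged:
        break

# Mean squared error on each split; the test rows were never used for training
_, train_squared_error = get_gradient(w, train_x, train_y)
_, test_squared_error = get_gradient(w, test_x, test_y)
train_mse = np.mean(train_squared_error)
test_mse = np.mean(test_squared_error)
final_gradient, _ = get_gradient(w, train_x, train_y)

print("Train MSE:", train_mse)
print("Test MSE: ", test_mse)
```

A test error far above the training error would suggest the model has memorised the training rows rather than learned the overall trend.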
