# **Here, I will implement the linear regression algorithm all by myself**

## First, we import everything

In [79]:
import numpy as np
import matplotlib.pyplot as plt

## Then, we define our 'line' that we want to fit to our data

In [80]:
def model(X,w,b):
  return np.dot(X,w) + b
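
To see what `model` returns, here is a small sketch with made-up numbers. `X_demo`, the weight `2.0` and the bias `0.5` are assumptions chosen only for illustration; they are not part of the training run further down.

```python
import numpy as np

def model(X, w, b):
    # Same definition as the cell above: one prediction per entry of X
    return np.dot(X, w) + b

X_demo = np.array([1, 2, 3])          # hypothetical inputs
preds_demo = model(X_demo, 2.0, 0.5)  # hypothetical w = 2.0 and b = 0.5
print(preds_demo)                     # [2.5 4.5 6.5]
```

With a 1-D `X` and a scalar `w`, `np.dot` simply scales every entry, so we get one prediction per data point.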

## Now, I define our mean squared error cost function (halved, which keeps the gradient tidy)

In [81]:
def cost_function(X,y,w,b):
  m=X.shape[0]
  predictions=model(X,w,b)
  cost=(1/(2*m))*np.sum((predictions-y)**2)
  return cost
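
A quick sanity check of the cost function, sketched on a tiny made-up dataset (`X_demo` and `y_demo` are assumptions for illustration). A perfect fit should give a cost of 0, and a bad guess should give something clearly larger.

```python
import numpy as np

def model(X, w, b):
    return np.dot(X, w) + b

def cost_function(X, y, w, b):
    # Half of the mean squared error, as in the cell above
    m = X.shape[0]
    predictions = model(X, w, b)
    return (1 / (2 * m)) * np.sum((predictions - y) ** 2)

X_demo = np.array([1, 2, 3])   # hypothetical inputs
y_demo = np.array([2, 4, 6])   # targets on the line y = 2*x

cost_perfect = cost_function(X_demo, y_demo, 2.0, 0.0)  # w = 2, b = 0 fits exactly
cost_zero = cost_function(X_demo, y_demo, 0.0, 0.0)     # predicts 0 everywhere

print(cost_perfect)  # 0.0
print(cost_zero)     # (4 + 16 + 36) / 6, roughly 9.33
```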

## Now, I define the gradient descent step that we will use to actually get to the minimum of the error curve

In [82]:
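def gradient_descent(X,y,w,b,learning_rate):
  m=X.shape[0]  # number of training examples
  predictions=model(X,w,b)

  # Gradients of the cost with respect to w and b (X is 1-D here, so w_gradient is a scalar)
  w_gradient = (1/m)*np.dot(X,(predictions-y))
  b_gradient = (1/m)*np.sum(predictions-y)
  # Take one step against the gradient
  w-=(learning_rate)*(w_gradient)
  b-=(learning_rate)*(b_gradient)

  return w,b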
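
One way to convince yourself that the gradient formulas are right is to compare them with a numerical (finite-difference) estimate. This is only a sketch: `analytic_gradients`, the demo data, the starting point `w0`, `b0` and the step `eps` are all assumptions I introduce here, not part of the notebook above.

```python
import numpy as np

def model(X, w, b):
    return np.dot(X, w) + b

def cost_function(X, y, w, b):
    m = X.shape[0]
    return (1 / (2 * m)) * np.sum((model(X, w, b) - y) ** 2)

def analytic_gradients(X, y, w, b):
    # Hypothetical helper: the same formulas used inside gradient_descent
    m = X.shape[0]
    error = model(X, w, b) - y
    return (1 / m) * np.dot(X, error), (1 / m) * np.sum(error)

X_demo = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical inputs
y_demo = 2 * X_demo                      # targets on the line y = 2*x
w0, b0, eps = 0.5, 0.1, 1e-6             # arbitrary point and step size

w_grad, b_grad = analytic_gradients(X_demo, y_demo, w0, b0)

# Central differences: nudge one parameter at a time and watch the cost change
w_num = (cost_function(X_demo, y_demo, w0 + eps, b0)
         - cost_function(X_demo, y_demo, w0 - eps, b0)) / (2 * eps)
b_num = (cost_function(X_demo, y_demo, w0, b0 + eps)
         - cost_function(X_demo, y_demo, w0, b0 - eps)) / (2 * eps)

print(w_grad, w_num)  # both close to -11.0
print(b_grad, b_num)  # both close to -3.65
```

If the two columns agree, the update in `gradient_descent` is moving in the right direction.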

## After this comes the function that will train our model on the data we wish to train it on

In [83]:
def train(X,y,w,b,learning_rate,num_iterations):
  costs=[]
  for i in range(num_iterations):
    w,b=gradient_descent(X,y,w,b,learning_rate)
    cost=cost_function(X,y,w,b)
    costs.append(cost)

    if(i%100 == 0):
      print(f"At iteration number {i}, the cost is {cost}, 'w' is {w} and 'b' is {b}")

  return w,b,costs

## After this, we enter the data that we will train our model on. Here, I've chosen a simple line, `y = 2*x`, for understanding

In [85]:
X=np.array([1,2,3,4,5,6,7,8,9])
w=np.random.rand()
y=np.array([2,4,6,8,10,12,14,16,18])
b=0
alpha=0.01 #Learning rate
iterations=5000
costs=[]

In [86]:
w,b,costs=train(X,y,w,b,alpha,iterations)

At iteration number 0, the cost is 7.8787577688124495, 'w' is 1.2863561648604787 and 'b' is 0.052217841595574736
At iteration number 100, the cost is 0.0018214240787959616, 'w' is 1.9790961602187234 and 'b' is 0.13153236927906195
At iteration number 200, the cost is 0.0012073775660815587, 'w' is 1.982980684769435 and 'b' is 0.10708993559109235
At iteration number 300, the cost is 0.0008003411199223012, 'w' is 1.9861433548119334 and 'b' is 0.0871895972661529
At iteration number 400, the cost is 0.0005305265943588205, 'w' is 1.9887183113264661 and 'b' is 0.07098730454429629
At iteration number 500, the cost is 0.00035167313076366465, 'w' is 1.9908147681059083 and 'b' is 0.05779585597903532
At iteration number 600, the cost is 0.0002331155350479338, 'w' is 1.992521643932068 and 'b' is 0.04705575158534152
At iteration number 700, the cost is 0.00015452659849980353, 'w' is 1.9939113339626464 and 'b' is 0.038311462296960655
At iteration number 800, the cost is 0.00010243191059321176, 'w' is 

In [89]:
X_test=[200,500,1303]
preds=model(X_test,w,b)
preds

array([ 399.99982886,  999.99956381, 2605.99885437])
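
As a cross-check on these predictions, the same line can be fitted in closed form. The sketch below uses `np.polyfit`, which is a least-squares fit and not the gradient descent implemented above; the names `slope`, `intercept` and `closed_form_preds` are my own.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18])

# Degree-1 polynomial fit returns [slope, intercept]
slope, intercept = np.polyfit(X, y, 1)

X_test = np.array([200, 500, 1303])
closed_form_preds = slope * X_test + intercept

print(slope, intercept)    # very close to 2 and 0
print(closed_form_preds)   # very close to 400, 1000 and 2606
```

The gradient descent result should land near these values, which matches the predictions shown above.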

# Important learnings

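* Since we only have 1 feature and `X` is a 1-D array, keep `w` as a scalar and not as a vector
* If you want to predict for a plain Python list such as `X_test`, keep `model` as `np.dot(X,w)` and not `X*w`: `np.dot` converts the list to an array first, while `X*w` is only safe when `X` is already a NumPy array (a list times a Python float raises a `TypeError`)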
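
The last point can be tried out in a few lines. This is a sketch with an assumed weight `w_demo = 2.0` (a plain Python float), not the trained `w` from above.

```python
import numpy as np

X_test = [200, 500, 1303]   # a plain Python list, as in the prediction cell
w_demo = 2.0                # hypothetical weight, a Python float

# np.dot converts the list to an array before multiplying
dot_preds = np.dot(X_test, w_demo)

# A list times a Python float is not defined, so this raises TypeError
try:
    X_test * w_demo
    list_multiply_failed = False
except TypeError:
    list_multiply_failed = True

# Converting to an array first makes elementwise multiplication work
array_preds = np.asarray(X_test) * w_demo

print(dot_preds)             # [ 400. 1000. 2606.]
print(list_multiply_failed)  # True
```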