# Here, we will implement multiple linear regression, which is just linear regression but with multiple features

## First, we begin with the imports required to make all this work

In [1]:
import numpy as np

## For simplicity, I will use the curve
# $y=2x^2+5x$
## with $x^2$ and $x$ as the two features, so that `w` will be a vector and not merely a scalar

## First, we begin with a simple definition of our model:
## $y=w \cdot x+b$

In [2]:
def model(X,w,b):
  return b+np.dot(w,X.T)
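
## As a quick sanity check of the model, here is a minimal sketch that builds the feature matrix from raw inputs and evaluates it with the known coefficients of the curve.
## The sample inputs `x` and the names `w_true`, `b_true`, `predictions` and `targets` are assumptions made for illustration only.

```python
import numpy as np

def model(X, w, b):
    # Same definition as in the cell above
    return b + np.dot(w, X.T)

# Hypothetical sample inputs (not part of the training data cell)
x = np.array([1.0, 2.0, -1.0, -2.0])

# Each row holds the two features for one sample: [x^2, x]
X = np.column_stack([x**2, x])

# Coefficients of y = 2x^2 + 5x, with no intercept
w_true = np.array([2.0, 5.0])
b_true = 0.0

predictions = model(X, w_true, b_true)
targets = 2 * x**2 + 5 * x

print(predictions)
print(np.allclose(predictions, targets))
```

## With the true coefficients, the predictions should match the curve, which is what training is expected to recover.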

## Then, we define our cost function

In [3]:
def cost_function(X,y,w,b):
  m=X.shape[0]
  y_cap=model(X,w,b)
  # Halved mean squared error; the factor 1/2 cancels the 2 that appears when differentiating
  cost=(np.sum((y_cap-y)**2))/(2*m)
  return cost
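
## Written out, the cost computed above is the halved mean squared error.
## The superscript notation $x^{(i)}$ for the $i$-th training example and the symbol $J$ are assumptions, since the code itself does not name them.

```latex
\hat{y}^{(i)} = w \cdot x^{(i)} + b,
\qquad
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^{2}
```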

## Now, we define the gradient descent function

In [4]:
def gradient_descent(X,y,w,b,learning_rate):
  m=X.shape[0]
  y_cap=model(X,w,b)
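  # Gradients of the cost with respect to w and b
  w_gradient = np.dot((y_cap-y),X)/m
  b_gradient = np.sum((y_cap-y))/m
  # Step against the gradient; build new values so the caller's `w` array is not modified in place
  w = w - learning_rate*w_gradient
  b = b - learning_rate*b_gradient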

  return w,b
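
## One way to gain confidence in the gradient expressions is to compare them against finite differences of the cost.
## The sketch below does that on a small slice of data; the helper names `cost` and `analytic_grads`, the starting values and the step `eps` are all assumptions for illustration.

```python
import numpy as np

def cost(X, y, w, b):
    # Halved mean squared error, as in cost_function above
    err = X @ w + b - y
    return np.sum(err ** 2) / (2 * X.shape[0])

def analytic_grads(X, y, w, b):
    # Same expressions as in gradient_descent, without the learning rate
    err = X @ w + b - y
    m = X.shape[0]
    return X.T @ err / m, np.sum(err) / m

X = np.array([[1, 1], [4, 2], [1, -1], [4, -2]], dtype=float)
y = np.array([7, 18, -3, -2], dtype=float)
w = np.array([0.3, -0.7])   # arbitrary starting point
b = 0.1
eps = 1e-6                  # finite-difference step

dw, db = analytic_grads(X, y, w, b)

# Central differences, one weight at a time
num_dw = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = eps
    num_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
num_db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)

print(dw, num_dw)
print(db, num_db)
```

## If the two pairs of numbers agree to several decimal places, the analytic gradients are consistent with the cost function.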

## Next, we move onto the actual train function

In [14]:
def train(X,y,w,b,learning_rate,num_iterations):
  costs=[]

  for i in range(num_iterations):
    w,b=gradient_descent(X,y,w,b,learning_rate)
    cost=cost_function(X,y,w,b)
    costs.append(cost)

    if i%100==0:
      print(f"The cost at iteration number {i} is {cost} and 'w' is {w} and 'b' is {b}")

  return w,b,costs

## Now, we're done with the functions. Let's move on to our training data

In [9]:
X=np.array([[1,1],[4,2],[1,-1],[4,-2],[9,3],[16,4],[16,-4],[9,-3]])
y=np.array([7,18,-3,-2,33,52,12,3])
w=np.random.rand(1,2)
b=0
alpha=0.01
iterations=10000

In [15]:
w,b,costs=train(X,y,w,b,alpha,iterations)

The cost at iteration number 0 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.892886427247472e-16
The cost at iteration number 100 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.892886427247472e-16
The cost at iteration number 200 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.892886427247472e-16
The cost at iteration number 300 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.892886427247472e-16
The cost at iteration number 400 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.892886427247472e-16
The cost at iteration number 500 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.892886427247472e-16
The cost at iteration number 600 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.892886427247472e-16
The cost at iteration number 700 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.892886427247472e-16
The cost at iteration number 800 is 6.202418867300205e-29 and 'w' is [[2. 5.]] and 'b' is -8.89288

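## Above we see that the code works quite well, i.e. our model learns the function: the cost is essentially zero (about $6\times10^{-29}$), the `w` vector takes on the coefficients of our function, i.e. `[2,5]`, and `b` approaches `0`
## Note that the log already shows the converged values at iteration 0, presumably because `w` and `b` carried over from an earlier run of the training cell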
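
## To watch the cost fall from a fresh start, here is a minimal self-contained sketch of the same training loop.
## The fixed seed and the inlined update step are assumptions made so that the run is repeatable; they are not part of the cells above.

```python
import numpy as np

# Training data from the cell above: columns are x^2 and x
X = np.array([[1, 1], [4, 2], [1, -1], [4, -2],
              [9, 3], [16, 4], [16, -4], [9, -3]], dtype=float)
y = np.array([7, 18, -3, -2, 33, 52, 12, 3], dtype=float)

rng = np.random.default_rng(0)   # fixed seed (assumption) for a repeatable start
w = rng.random(2)
b = 0.0
alpha = 0.01
iterations = 10000
m = X.shape[0]

costs = []
for i in range(iterations):
    err = X @ w + b - y
    # New values are built on each step, so the initial arrays are left untouched
    w = w - alpha * (X.T @ err) / m
    b = b - alpha * np.sum(err) / m
    costs.append(np.sum((X @ w + b - y) ** 2) / (2 * m))

print(f"first cost: {costs[0]}, last cost: {costs[-1]}")
print(f"w: {w}, b: {b}")
```

## Starting from random weights, the first recorded cost should be far above the last one, and `w` should end up close to `[2, 5]`.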

## But something that isn't visible is that I had to change the training data, since the model wasn't learning on the initial data at all.

## The original arrays were:
## `X = [[1,1],[4,2],[1,-1],[25,5],[2401,49],[49,7],[64,8],[2500,50]]`
## `y = [7,18,-3,75,5047,133,168,5250]`

## If you replace `X` and `y` with these arrays, you will see that `cost`, `w` and `b` all become `nan`, i.e. "not a number"
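
## A likely explanation for the `nan` values is that the learning rate is too large for the wide-range data, so every step overshoots until the numbers overflow.
## For a quadratic cost like this one, gradient descent is stable only when the learning rate is below $2/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of the Hessian of the cost. The sketch below computes that bound for both data sets; the helper name `max_stable_learning_rate` is an assumption.

```python
import numpy as np

def max_stable_learning_rate(X):
    # Hypothetical helper: upper bound on the learning rate for this cost
    m = X.shape[0]
    A = np.hstack([X, np.ones((m, 1))])   # extra column of ones accounts for b
    H = A.T @ A / m                       # Hessian of the halved mean squared error
    return 2.0 / np.linalg.eigvalsh(H).max()

X_small = np.array([[1, 1], [4, 2], [1, -1], [4, -2],
                    [9, 3], [16, 4], [16, -4], [9, -3]], dtype=float)
X_large = np.array([[1, 1], [4, 2], [1, -1], [25, 5],
                    [2401, 49], [49, 7], [64, 8], [2500, 50]], dtype=float)

bound_small = max_stable_learning_rate(X_small)
bound_large = max_stable_learning_rate(X_large)

print(f"small-range data: learning rate must stay below {bound_small}")
print(f"wide-range data:  learning rate must stay below {bound_large}")
```

## The learning rate of `0.01` used here sits below the bound for the small-range data but several orders of magnitude above the bound for the original arrays, which is consistent with the divergence.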

## This means that we need to make our model robust enough to handle data over a large range, i.e. data that spans multiple orders of magnitude
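
## One common remedy (an assumption here, not something the cells above implement) is to standardize each feature to zero mean and unit variance before training.
## The sketch below applies this to the original arrays. The learning rate `0.5` is also an assumption: the two standardized features are strongly correlated, so a small rate like `0.01` would converge very slowly.

```python
import numpy as np

# Original wide-range data: columns are x^2 and x
X_raw = np.array([[1, 1], [4, 2], [1, -1], [25, 5],
                  [2401, 49], [49, 7], [64, 8], [2500, 50]], dtype=float)
y_raw = np.array([7, 18, -3, 75, 5047, 133, 168, 5250], dtype=float)

# Standardize every feature column: zero mean, unit variance
mu = X_raw.mean(axis=0)
sigma = X_raw.std(axis=0)
X_scaled = (X_raw - mu) / sigma

w_s = np.zeros(2)
b_s = 0.0
alpha_s = 0.5          # assumed learning rate for the standardized features
m = X_scaled.shape[0]

for _ in range(10000):
    err = X_scaled @ w_s + b_s - y_raw
    w_s = w_s - alpha_s * (X_scaled.T @ err) / m
    b_s = b_s - alpha_s * np.sum(err) / m

# Convert the learned parameters back to the original feature scale
w_orig = w_s / sigma
b_orig = b_s - np.sum(w_s * mu / sigma)

print(f"w on the original scale: {w_orig}")
print(f"b on the original scale: {b_orig}")
```

## After mapping the parameters back, `w_orig` should be close to `[2, 5]` and `b_orig` close to `0`, the same coefficients recovered from the small-range data.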