# LINEAR REGRESSION
# Loss
When we think about how we can assign a slope and intercept to fit a set of points, we have to define what the best fit is.

For each data point, we calculate loss, a number that measures how bad the model’s (in this case, the line’s) prediction was. You may have seen this being referred to as error.

We can think about loss as the squared vertical distance from the point to the line. We use the squared distance (instead of just the distance) so that points above and below the line both contribute to total loss in the same way:


In this example:

* For point A, the squared distance is 9 (3²)
* For point B, the squared distance is 1 (1²)
* So the total loss, with this model, is 10. If we found a line that had less loss than 10, that line would be a better model for this data.
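
Written as a formula, and assuming the line is y = mx + b as in the rest of this lesson, the total loss over N points is the sum of the squared differences between each observed y-value and the line's prediction:

$
\text{total loss} = \sum_{i=1}^{N}(y_i-(mx_i+b))^2
$

For the two-point example above, that is 3² + 1² = 9 + 1 = 10.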

In [1]:
x = [1, 2, 3]
y = [5, 1, 3]

#y = x
m1 = 1
b1 = 0

#y = 0.5x + 1
m2 = 0.5
b2 = 1

#y = 5x + 2
m3 = 5
b3 = 2

y_predicted1 = [m1*x_val + b1 for x_val in x]
y_predicted2 = [m2*x_val + b2 for x_val in x]
y_predicted3 = [m3*x_val + b3 for x_val in x]
total_loss1 = 0
total_loss2 = 0
total_loss3 = 0

for i in range(len(y)):
  total_loss1 += (y[i] - y_predicted1[i])**2
  total_loss2 += (y[i] - y_predicted2[i])**2
  total_loss3 += (y[i] - y_predicted3[i])**2
  
print(total_loss1, total_loss2, total_loss3)
# model 2 (y = 0.5x + 1) has the lowest total loss
better_fit = 2

17 13.5 321


# Gradient Descent for Intercept
***
To find the gradient of loss as intercept changes, the formula comes out to be:

$
\frac{2}{N}\sum_{i=1}^{N}-(y_i-(mx_i+b)) 
$

* N is the number of points we have in our dataset
* m is the current slope guess
* b is the current intercept guess

Basically:

* we find the sum of y_value - (m*x_value + b) for all the y_values and x_values we have
* and then we multiply the sum by a factor of -2/N.
***
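
As a sketch of where this formula comes from, assume the loss being minimized is the average squared distance over the N points (the 1/N factor is an assumption here, chosen so that the result matches the formula above):

$
L(m, b) = \frac{1}{N}\sum_{i=1}^{N}(y_i-(mx_i+b))^2
$

Differentiating each squared term with respect to b with the chain rule gives 2(y_i - (mx_i + b)) multiplied by -1, so:

$
\frac{\partial L}{\partial b} = \frac{2}{N}\sum_{i=1}^{N}-(y_i-(mx_i+b))
$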

In [2]:
def get_gradient_at_b(x, y, m, b):
  # Sum the differences y - (m*x + b) over every point
  diff = 0
  N = len(x)
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += y_val - ((m * x_val) + b)
  # Scale the sum by -2/N to get the gradient with respect to b
  b_gradient = -2/N * diff
  return b_gradient

# Gradient Descent for Slope
***
We have a function to find the gradient of b at every point. To find the m gradient, or the way the loss changes as the slope of our line changes, we can use this formula:

$ \frac{2}{N}\sum_{i=1}^{N}-x_i(y_i-(mx_i+b)) $

Once more:

* N is the number of points you have in your dataset
* m is the current slope guess
* b is the current intercept guess

To find the m gradient:

* we find the sum of x_value * (y_value - (m*x_value + b)) for all the y_values and x_values we have
* and then we multiply the sum by a factor of -2/N.

Once we have a way to calculate both the m gradient and the b gradient, we’ll be able to follow both of those gradients downwards to the point of lowest loss for both the m value and the b value. Then, we’ll have the best m and the best b to fit our data!

***

In [3]:
def get_gradient_at_m(x, y, m, b):
  diff = 0
  N = len(x)
  for i in range(N):
    y_val = y[i]
    x_val = x[i]
    diff += x_val * (y_val - ((m * x_val) + b))
  m_gradient = -2/N * diff
  return m_gradient
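
One way to gain confidence in the two gradient formulas is to compare them with a numerical estimate. The sketch below assumes the loss is the average squared distance, and the helper names (`mean_squared_loss`, `analytic_gradients`, `numeric_gradients`) and the step size `h` are made up for this illustration. It nudges each parameter slightly and measures how the loss changes.

```python
def mean_squared_loss(x, y, m, b):
    # Average squared vertical distance between the points and the line y = m*x + b
    N = len(x)
    return sum((y[i] - (m * x[i] + b)) ** 2 for i in range(N)) / N

def analytic_gradients(x, y, m, b):
    # The two gradient formulas from this lesson, returned as (b_gradient, m_gradient)
    N = len(x)
    b_gradient = -2 / N * sum(y[i] - (m * x[i] + b) for i in range(N))
    m_gradient = -2 / N * sum(x[i] * (y[i] - (m * x[i] + b)) for i in range(N))
    return b_gradient, m_gradient

def numeric_gradients(x, y, m, b, h=1e-6):
    # Central differences: nudge one parameter at a time and measure the change in loss
    b_gradient = (mean_squared_loss(x, y, m, b + h) - mean_squared_loss(x, y, m, b - h)) / (2 * h)
    m_gradient = (mean_squared_loss(x, y, m + h, b) - mean_squared_loss(x, y, m - h, b)) / (2 * h)
    return b_gradient, m_gradient

x = [1, 2, 3]
y = [5, 1, 3]

# Both calls evaluate the gradients at the guess m = 0.5, b = 1
formula_result = analytic_gradients(x, y, 0.5, 1)
estimate_result = numeric_gradients(x, y, 0.5, 1)
print(formula_result)
print(estimate_result)
```

If the formulas are right, the two printed pairs should agree to several decimal places.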

# Put it Together
Now that we know how to calculate the gradient, we want to take a “step” in that direction. However, it’s important to think about whether that step is too big or too small. We don’t want to overshoot the minimum error!

We can scale the size of the step by multiplying the gradient by a learning rate.

To find a new b value, we would say:

    new_b = current_b - (learning_rate * b_gradient)

where current_b is our guess for what the b value is, b_gradient is the gradient of the loss curve at our current guess, and learning_rate is proportional to the size of the step we want to take.

In a few exercises, we’ll talk about the implications of a large or small learning rate, but for now, let’s use a fairly small value.

In [4]:
def get_gradient_at_b(x, y, b, m):
  N = len(x)
  diff = 0
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += (y_val - ((m * x_val) + b))
  b_gradient = -(2/N) * diff  
  return b_gradient

def get_gradient_at_m(x, y, b, m):
  N = len(x)
  diff = 0
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += x_val * (y_val - ((m * x_val) + b))
  m_gradient = -(2/N) * diff
  return m_gradient

# Take one gradient descent step, with the learning rate fixed at 0.01 for now
def step_gradient(x, y, b_current, m_current):
  b_gradient = get_gradient_at_b(x, y, b_current, m_current)
  m_gradient = get_gradient_at_m(x, y, b_current, m_current)
  b = b_current - (0.01 * b_gradient)
  m = m_current - (0.01 * m_gradient)
  return [b, m]

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

# current intercept guess:
b = 0
# current slope guess:
m = 0

b, m = step_gradient(months, revenue, b, m)
print(b, m)

2.355 17.78333333333333
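
The printed values can be checked by hand. With b = 0 and m = 0, each term y_i - (mx_i + b) is just y_i. The revenue values sum to 1413 and the products x_i y_i sum to 10670, so with N = 12 and a learning rate of 0.01:

$
\text{b gradient} = -\frac{2}{12}\cdot 1413 = -235.5, \qquad \text{new b} = 0 - 0.01\cdot(-235.5) = 2.355
$

$
\text{m gradient} = -\frac{2}{12}\cdot 10670 \approx -1778.33, \qquad \text{new m} = 0 - 0.01\cdot(-1778.33) \approx 17.783
$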


# Convergence
How do we know when we should stop changing the parameters m and b? How will we know when our program has learned enough?

To answer this, we have to define convergence. Convergence is when the loss stops changing (or changes very slowly) when parameters are changed.

Hopefully, the algorithm will converge at the best values for the parameters m and b.
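
A minimal sketch of one possible convergence check follows. It assumes the loss is the average squared distance and stops once the loss changes by less than a small threshold between steps. The function name `descend_until_converged` and the parameters `tolerance` and `max_iterations` are hypothetical and not part of the lesson's functions.

```python
def mean_squared_loss(x, y, b, m):
    # Average squared vertical distance between the points and the line y = m*x + b
    N = len(x)
    return sum((y[i] - (m * x[i] + b)) ** 2 for i in range(N)) / N

def descend_until_converged(x, y, learning_rate, tolerance=1e-9, max_iterations=100000):
    # tolerance and max_iterations are hypothetical knobs chosen for this illustration
    N = len(x)
    b, m = 0, 0
    previous_loss = mean_squared_loss(x, y, b, m)
    for iteration in range(1, max_iterations + 1):
        b_gradient = -2 / N * sum(y[i] - (m * x[i] + b) for i in range(N))
        m_gradient = -2 / N * sum(x[i] * (y[i] - (m * x[i] + b)) for i in range(N))
        b = b - learning_rate * b_gradient
        m = m - learning_rate * m_gradient
        current_loss = mean_squared_loss(x, y, b, m)
        # Treat the run as converged once the loss barely changes between steps
        if abs(previous_loss - current_loss) < tolerance:
            return b, m, iteration
        previous_loss = current_loss
    # Give up after max_iterations steps even if the loss is still changing
    return b, m, max_iterations

# Points that lie exactly on the line y = 2x + 1
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
b, m, steps = descend_until_converged(x, y, learning_rate=0.05)
print(b, m, steps)
```

On this toy data the returned b and m should land close to 1 and 2. A looser tolerance stops earlier but leaves the parameters further from the best fit.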

# Learning Rate
We want our program to be able to iteratively learn what the best m and b values are. So for each m and b pair that we guess, we want to move them in the direction of the gradients we’ve calculated. But how far do we move in that direction?

We have to choose a learning rate, which will determine how far down the loss curve we go.

A small learning rate will take a long time to converge — you might run out of time or cycles before getting an answer. A large learning rate might skip over the best value. It might never converge! Oh no!


Finding the absolute best learning rate is not necessary for training a model. You just have to find a learning rate large enough that gradient descent converges with the efficiency you need, and not so large that convergence never happens.
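
To make the trade-off concrete, the sketch below runs the same number of steps with three learning rates on the months and revenue data from this lesson and compares the resulting loss. The three rates and the 50-step cap are arbitrary choices for illustration, `run_gradient_descent` is a hypothetical helper, and which rates count as "too large" depends on the data.

```python
def mean_squared_loss(x, y, b, m):
    # Average squared vertical distance between the points and the line y = m*x + b
    N = len(x)
    return sum((y[i] - (m * x[i] + b)) ** 2 for i in range(N)) / N

def run_gradient_descent(x, y, learning_rate, num_iterations):
    # Start from b = 0, m = 0 and take num_iterations gradient steps
    N = len(x)
    b, m = 0, 0
    for _ in range(num_iterations):
        b_gradient = -2 / N * sum(y[i] - (m * x[i] + b) for i in range(N))
        m_gradient = -2 / N * sum(x[i] * (y[i] - (m * x[i] + b)) for i in range(N))
        b, m = b - learning_rate * b_gradient, m - learning_rate * m_gradient
    return b, m

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

initial_loss = mean_squared_loss(months, revenue, 0, 0)
print("start", initial_loss)

losses = {}
for learning_rate in [0.0001, 0.01, 0.05]:
    b, m = run_gradient_descent(months, revenue, learning_rate, 50)
    losses[learning_rate] = mean_squared_loss(months, revenue, b, m)
    print(learning_rate, losses[learning_rate])
```

On this data, the smallest rate lowers the loss only a little in 50 steps, the middle rate lowers it much more, and the largest rate makes the loss grow instead of shrink. The step count is kept small on purpose: with a diverging rate, many more steps can push the numbers past what a float can hold.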

# Put it Together II
At each step, we know how to calculate the gradient and move in that direction with a step size proportional to our learning rate. Now, we want to make these steps until we reach convergence.

1. We have all of the functions we have defined throughout the lesson.

   Now, let’s create a function called gradient_descent() that takes in x, y, learning_rate, and num_iterations.

   For now, return [-1,-1].

2. In the function gradient_descent(), create variables b and m and set them both to zero for our initial guess.

   Return b and m from the function.

3. Update your step_gradient() function to take in the parameter learning_rate (as the last parameter) and replace the 0.01s in the calculations of b and m with learning_rate.

4. Let’s go back and finish the gradient_descent() function.

   Create a loop that runs num_iterations times. At each step, it should:

   * Call step_gradient() with b, m, x, y, and learning_rate
   * Update the values of b and m with the values step_gradient() returns.

5. Outside of the function, uncomment the line that calls gradient_descent on months and revenue, with a learning rate of 0.01 and 1000 iterations.

   It stores the results in variables called b and m.

6. Uncomment the lines that will plot the result.

In [None]:

import matplotlib.pyplot as plt

def get_gradient_at_b(x, y, b, m):
  N = len(x)
  diff = 0
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += (y_val - ((m * x_val) + b))
  b_gradient = -(2/N) * diff  
  return b_gradient

def get_gradient_at_m(x, y, b, m):
  N = len(x)
  diff = 0
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += x_val * (y_val - ((m * x_val) + b))
  m_gradient = -(2/N) * diff
  return m_gradient

# Take one gradient descent step, scaled by learning_rate
def step_gradient(b_current, m_current, x, y, learning_rate):
  b_gradient = get_gradient_at_b(x, y, b_current, m_current)
  m_gradient = get_gradient_at_m(x, y, b_current, m_current)
  b = b_current - (learning_rate * b_gradient)
  m = m_current - (learning_rate * m_gradient)
  return [b, m]

# Start from b = 0, m = 0 and repeat the step num_iterations times
def gradient_descent(x, y, learning_rate, num_iterations):
  b = 0
  m = 0
  for i in range(num_iterations):
    b, m = step_gradient(b, m, x, y, learning_rate)
  return [b, m]

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

#Uncomment the line below to run your gradient_descent function
b, m = gradient_descent(months, revenue, 0.01, 1000)

#Uncomment the lines below to see the line you've settled upon!
y = [m*x + b for x in months]

plt.plot(months, revenue, "o")
plt.plot(months, y)

plt.show()



# Use Your Functions on Real Data
We have constructed a way to find the “best” b and m values using gradient descent! Let’s try this on the set of baseball players’ heights and weights that we saw at the beginning of the lesson.

1. Run the code in script.py.

   This is a scatterplot of weight vs height.

2. We have imported your gradient_descent() function. Call it with parameters:

   * X
   * y
   * num_iterations of 1000
   * learning_rate of 0.0001

   Store the result in variables called b and m.

3. Create a list called y_predictions. Set it to be every element of X multiplied by m and added to b.

   The easiest way to do this would be a list comprehension:

       new_y = [element*slope + intercept for element in X]

4. Plot X vs y_predictions on the same plot as the scatterplot.



In [None]:

#from gradient_descent_funcs import gradient_descent
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("height_weight.csv")

X = df["height"]
y = df["weight"]
b, m = gradient_descent(X, y, num_iterations=1000, learning_rate=0.0001)
y_predictions = [m*x + b for x in X]
plt.plot(X, y, 'o')
#plot your line here:
plt.plot(X, y_predictions)

plt.show()

# Scikit-Learn
Congratulations! You’ve now built a linear regression algorithm from scratch.

Luckily, we don’t have to do this every time we want to use linear regression. We can use Python’s scikit-learn library. Scikit-learn, or sklearn, is used specifically for Machine Learning. Inside the linear_model module, there is a LinearRegression class we can use:

    from sklearn.linear_model import LinearRegression

You can first create a LinearRegression model, and then fit it to your x and y data:

    line_fitter = LinearRegression()
    line_fitter.fit(X, y)

The .fit() method gives the model two attributes that are useful to us:

* line_fitter.coef_, which contains the slope
* line_fitter.intercept_, which contains the intercept

We can also use the .predict() method to pass in x-values and receive the y-values that this line would predict:

    y_predicted = line_fitter.predict(X)

Note: unlike your own implementation, scikit-learn’s LinearRegression does not use gradient descent. It solves the least-squares problem directly, so there is no num_iterations or learning_rate to set. It does expect X to be two-dimensional, which is why the example below reshapes temperature with reshape(-1, 1).

In [None]:

from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import numpy as np

temperature = np.array(range(60, 100, 2))
temperature = temperature.reshape(-1, 1)
sales = [65, 58, 46, 45, 44, 42, 40, 40, 36, 38, 38, 28, 30, 22, 27, 25, 25, 20, 15, 5]
line_fitter = LinearRegression()
line_fitter.fit(temperature, sales)
sales_predict  = line_fitter.predict(temperature)
print(sales_predict)
plt.plot(temperature, sales, 'o')
plt.plot(temperature, sales_predict)
plt.show()
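
For a single input variable, the least-squares line also has a closed-form solution, so the slope and intercept can be computed without iterating. The sketch below is plain Python, and `least_squares_line` is a hypothetical helper for this illustration, not part of scikit-learn. On the same temperature and sales data, its results should be close to line_fitter.intercept_ and line_fitter.coef_.

```python
def least_squares_line(x, y):
    # Closed-form simple linear regression: returns (intercept, slope)
    N = len(x)
    x_mean = sum(x) / N
    y_mean = sum(y) / N
    # Slope = co-variation of x and y divided by the variation of x
    sxy = sum((x[i] - x_mean) * (y[i] - y_mean) for i in range(N))
    sxx = sum((x[i] - x_mean) ** 2 for i in range(N))
    m = sxy / sxx
    # The best-fit line always passes through the point (x_mean, y_mean)
    b = y_mean - m * x_mean
    return b, m

temperature = list(range(60, 100, 2))
sales = [65, 58, 46, 45, 44, 42, 40, 40, 36, 38, 38, 28, 30, 22, 27, 25, 25, 20, 15, 5]

b, m = least_squares_line(temperature, sales)
print(b, m)
```

The slope comes out negative here, matching the downward trend in the plot: sales fall as temperature rises.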