# Machine Learning with Python - Linear Regression
The basics of prediction with linear models

### closed form, gradient descent, and sklearn
Working through three ways to fit an ordinary least squares (OLS) linear regression: the closed-form solution, gradient descent, and **scikit-learn**, the most popular framework for fitting small-data models.

In [None]:
!pip install yfinance

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime
import yfinance as yf

Let's use some finance data collected from the Yahoo! Finance API using [yfinance](https://github.com/ranaroussi/yfinance) for the linear regression example. We will compare Microsoft (MSFT) monthly returns with the market's, using SPY (an S&P 500 ETF) as the market proxy.

In [None]:
start_date = datetime(2010, 1, 1)
end_date = datetime(2021, 12, 31)

stock_data = yf.download('SPY MSFT',
                         interval="1mo",
                         start=start_date,
                         end=end_date,
                         auto_adjust=False)  # keep the 'Adj Close' column used below

stock_data.head()

In [None]:
stock_data.columns

In [None]:
# Select adjusted close prices by ticker name (no reliance on column order),
# then convert prices to monthly returns
stock_data_adj_close = stock_data['Adj Close'][['MSFT', 'SPY']].pct_change().dropna()
stock_data_adj_close.head()
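
As a quick sanity check on what `pct_change()` computes, the sketch below applies it to a tiny made-up price series (the prices are illustrative, not market data). Each return is the current price divided by the previous price, minus one. The first row has no predecessor, so it is NaN and `dropna()` removes it.

```python
import pandas as pd

# Hypothetical month-end prices, for illustration only
prices = pd.Series([100.0, 110.0, 99.0], name="price")

# pct_change() computes (p_t / p_{t-1}) - 1; the first entry is NaN
returns = prices.pct_change().dropna()
print(returns.round(4).tolist())
```

Three prices yield two returns, which is why the returns table has one row fewer than the downloaded price table.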


In [None]:
from seaborn import scatterplot

In [None]:
# Market (SPY) on the x-axis and MSFT on the y-axis, matching the regression that follows
scatterplot(data=stock_data_adj_close, x='SPY', y='MSFT')

Let's build our linear regression model using the closed-form [OLS solution](https://towardsdatascience.com/manually-computing-coefficients-for-an-ols-regression-using-python-50d8e413de).

In [None]:
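# Predictor: SPY returns, with a leading column of ones for the intercept
x = stock_data_adj_close.SPY.values
ones = np.ones(len(x))
x = np.vstack((ones, x)).T  # shape (n, 2)

# Response: MSFT returns
y = stock_data_adj_close.MSFT.values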

In [None]:
#closed form calculation
betas = np.linalg.inv(x.T @ x) @ x.T @ y
print(betas)
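
The closed-form estimate solves the normal equations, `(X.T @ X) @ beta = X.T @ y`. As a minimal sketch on synthetic data (the intercept 0.5 and slope 1.2 are made-up values, and the `_demo` names are hypothetical so they do not overwrite the notebook's `x` and `y`), the block below checks that `np.linalg.solve` and `np.linalg.lstsq` give the same coefficients as the explicit inverse. Avoiding the explicit inverse is generally the numerically safer habit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 0.5 + 1.2 * x + noise (values chosen for illustration)
x_demo = rng.normal(size=200)
y_demo = 0.5 + 1.2 * x_demo + rng.normal(scale=0.1, size=200)

# Design matrix with a leading column of ones for the intercept
X_demo = np.column_stack((np.ones_like(x_demo), x_demo))

# Explicit inverse of X^T X
betas_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Solve the normal equations directly, without forming the inverse
betas_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Least-squares solver, which does not form X^T X at all
betas_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(betas_inv, betas_solve, betas_lstsq)
```

All three results are ordered as `[intercept, slope]`, the same order as the columns of the design matrix.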

Now let's do the same thing, but with [**gradient descent**](https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931).

In [None]:
x = x[:,1] #remove the ones column for this part

In [None]:
# initialize the parameters at a small value near 0 (not always a good idea!)
b_1 = 0.01
b_0 = 0.01

alpha = 0.1  # the learning rate
epochs = 50000  # the number of descent iterations

n = float(len(x)) # Number of elements in x

# Gradient Descent 
for i in range(epochs): 
    y_pred = b_1*x + b_0  # The current predicted value of y
    D_b_1 = (-2/n) * np.sum(x * (y - y_pred))  # Derivative of the MSE wrt b_1
    D_b_0 = (-2/n) * np.sum(y - y_pred)  # Derivative of the MSE wrt b_0
    b_1 = b_1 - alpha * D_b_1  # Update b_1
    b_0 = b_0 - alpha * D_b_0  # Update b_0
    
print (b_0, b_1)
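
The two derivatives in the loop come from differentiating the mean squared error, `mean((y - (b_1*x + b_0))**2)`, with respect to `b_1` and `b_0`. The sketch below repeats the same updates on synthetic data (made-up intercept 0.5 and slope 1.2, hypothetical `_demo` names) and compares the result with `np.polyfit`, which is used here only as an independent reference fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with made-up true parameters (intercept 0.5, slope 1.2)
x_demo = rng.normal(size=200)
y_demo = 0.5 + 1.2 * x_demo + rng.normal(scale=0.1, size=200)

b_0_demo, b_1_demo = 0.0, 0.0
alpha_demo = 0.1    # learning rate
epochs_demo = 2000  # number of descent iterations

for _ in range(epochs_demo):
    residual = y_demo - (b_1_demo * x_demo + b_0_demo)
    # Gradients of the mean squared error
    grad_b_1 = -2.0 * np.mean(x_demo * residual)
    grad_b_0 = -2.0 * np.mean(residual)
    b_1_demo -= alpha_demo * grad_b_1
    b_0_demo -= alpha_demo * grad_b_0

# np.polyfit returns coefficients from highest degree down: [slope, intercept]
slope_ref, intercept_ref = np.polyfit(x_demo, y_demo, 1)
print(b_0_demo, b_1_demo)
print(intercept_ref, slope_ref)
```

The synthetic predictor has a variance near 1, so far fewer iterations are needed here than for monthly returns, whose small variance makes the slope converge slowly.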

Now let's see how this is done using scikit-learn (`sklearn`).

In [None]:
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot as plt

In [None]:
# Normally we would hold out a test set. Because this is time-series data,
# the only fair test is to fit on the past and predict forward.
regr = LinearRegression()
x = x.reshape(-1, 1) #shape needed for sklearn
regr.fit(x, y)
print(regr.score(x, y)) #gives us r^2
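
A fitted `LinearRegression` exposes its parameters as `intercept_` and `coef_`, which makes it easy to compare against the closed-form betas. The following is a minimal sketch on synthetic data (made-up intercept 0.5 and slope 1.2, hypothetical `_demo` names), not the stock data itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Synthetic data with made-up true parameters (intercept 0.5, slope 1.2)
x_demo = rng.normal(size=200)
y_demo = 0.5 + 1.2 * x_demo + rng.normal(scale=0.1, size=200)

# sklearn expects a 2-D feature array of shape (n_samples, n_features)
model_demo = LinearRegression()
model_demo.fit(x_demo.reshape(-1, 1), y_demo)

# Closed-form solution of the normal equations, for comparison
X_demo = np.column_stack((np.ones_like(x_demo), x_demo))
betas_demo = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

print(model_demo.intercept_, model_demo.coef_[0])
print(betas_demo)
```

Because `fit_intercept=True` is the default, there is no need to add the column of ones by hand when using scikit-learn.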

In [None]:
plt.scatter(x, y,color='b')
plt.plot(x, regr.predict(x),color='k')

plt.show()

In [None]:
#predict going forward
start_date = datetime(2022, 1, 1)
end_date = datetime(2022, 12, 31)

stock_data = yf.download('MSFT SPY',
                         interval="1mo",
                         start=start_date,
                         end=end_date,
                         auto_adjust=False)  # keep the 'Adj Close' column used below

# Select adjusted close prices by ticker name, then convert to monthly returns
stock_data_adj_close = stock_data['Adj Close'][['MSFT', 'SPY']].pct_change().dropna()

x = stock_data_adj_close.SPY.values
x = x.reshape(-1,1)

y = stock_data_adj_close.MSFT.values

In [None]:
from sklearn.metrics import r2_score

r2_score(y, regr.predict(x))
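
Predicting one later year is a single forward test. One way to repeat the idea several times is scikit-learn's `TimeSeriesSplit`, which always trains on earlier rows and tests on later ones. The sketch below assumes synthetic data in time order (made-up intercept 0.5 and slope 1.2, hypothetical `_demo` names), so the scores say nothing about the stock model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(3)

# Synthetic "monthly" observations, assumed to be sorted by time
x_demo = rng.normal(size=120).reshape(-1, 1)
y_demo = 0.5 + 1.2 * x_demo[:, 0] + rng.normal(scale=0.1, size=120)

tscv = TimeSeriesSplit(n_splits=4)
scores_demo = []
for train_idx, test_idx in tscv.split(x_demo):
    # Fit on the earlier block, score (R^2) on the later block
    model_demo = LinearRegression().fit(x_demo[train_idx], y_demo[train_idx])
    scores_demo.append(model_demo.score(x_demo[test_idx], y_demo[test_idx]))

print(scores_demo)
```

Each fold's training indices all come before its test indices, so no future information leaks into the fit.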

In [None]:
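# 2022 returns against the line fitted on the 2010-2021 data
plt.scatter(x, y, color='b')
plt.plot(x, regr.predict(x), color='k')

plt.show()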