# Why did I make this notebook? 

To build a linear regression model that predicts the closing price of Tesla stock from the other features in the dataset.

In [41]:
# Import the Image display helper and list the files in the 'lgpicture' input folder
from IPython.display import Image
import os
!ls ../input/lgpicture

In [42]:
# List the files in the 'lossfunction' input folder (Image was already imported in the previous cell)
!ls ../input/lossfunction

# What is Linear Regression?

Linear regression is a learning algorithm that builds a model as a linear combination of the features (inputs) of the training examples.

The model can be written as a function using the following formula, where 'w' and 'b' are the parameters whose best values are found by optimization[1]:

In [43]:
Image("../input/lgpicture/LG-Function.png")


Figure 1. Function defining the Linear Regression algorithm.[1]
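The image itself is not reproduced in this text. Assuming Figure 1 shows the standard form used in [1], the model for a feature vector x with D features (here D = 5: 'Open', 'High', 'Low', 'Adj Close', 'Volume') would read:

$$
f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w}\mathbf{x} + b = \sum_{j=1}^{D} w^{(j)} x^{(j)} + b
$$

Here $\mathbf{w}$ is a D-dimensional vector of weights, $b$ is a real number, and $\mathbf{w}\mathbf{x}$ denotes the dot product. The superscript notation $w^{(j)}$ for the j-th component is an assumption borrowed from [1], not something shown in the notebook.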

We use the squared error loss as the objective function: training searches for the parameter values that make the penalty computed by the loss function as small as possible, which gives the model whose predictions are closest to the observed values.

This loss function is part of a cost function, also known as the average loss or empirical risk: the squared-error penalties obtained by applying the model to every training example are averaged, as shown in Figure 2:

In [44]:
Image("../input/lossfunction/lossf.png")


Figure 2. The Cost Function/ Empirical Risk
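As a small illustration of the averaging described above, the sketch below computes the empirical risk, i.e. the mean of the squared differences between predictions and targets, on three made-up points. The helper names `predict` and `empirical_risk` and the toy data are assumptions for illustration only; they are not part of the notebook.

```python
import numpy as np

def predict(x, w, b):
    # Linear model: dot product of each row of x with w, plus the bias b
    return x @ w + b

def empirical_risk(x, y, w, b):
    # Average of the squared-error penalties over all examples
    residuals = predict(x, w, b) - y
    return np.mean(residuals ** 2)

# Toy data generated by y = 2x + 1 (made up for this example)
x = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 5.0, 7.0])

perfect_risk = empirical_risk(x, y, w=np.array([2.0]), b=1.0)  # true parameters
worse_risk = empirical_risk(x, y, w=np.array([1.0]), b=1.0)    # wrong slope

print(perfect_risk, worse_risk)
```

The parameters that generated the data give a risk of zero, while the wrong slope gives a larger value. This is the sense in which a lower cost means a better fit.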

In [45]:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score  # regression metric; accuracy_score applies to classifiers only

The modules used in this project, including numpy and pandas, are imported above.

# 1. Data Analysis:

The Tesla stock data is read with the pandas module (imported as 'pd') and stored in the variable 'data'.

In [46]:
data= pd.read_csv('../input/tesla-stock-data-from-2010-to-2020/TSLA.csv')


We will see what this data consists of using the code below:

In [47]:
data.describe()

Table 1. Tesla Data Statistics  

Table 1 shows the summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column of the dataset.

Let's normalize the data. As Table 1 shows, the columns have very different ranges, highs, and lows, but we want everything to be in the same range. We will use min-max scaling, implemented as a function that takes an input 'a':

In [48]:
# Min-Max Scaling for Normalization

def minmax_scaling(a):
    # Rescale 'a' so that its minimum maps to 0 and its maximum maps to 1
    a_min = a.min(axis=0)
    a_max = a.max(axis=0)
    return (a - a_min) / (a_max - a_min)
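To make the formula concrete, here is a small worked example on a made-up pandas Series. The values in `prices` are invented for illustration, and the function is repeated so the block runs on its own.

```python
import pandas as pd

def minmax_scaling(a):
    # (value - min) / (max - min) maps the smallest value to 0 and the largest to 1
    return (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

prices = pd.Series([10.0, 15.0, 20.0, 30.0])  # made-up values, not Tesla data
scaled = minmax_scaling(prices)

print(scaled.tolist())  # [0.0, 0.25, 0.5, 1.0]
```

The minimum (10) maps to 0, the maximum (30) maps to 1, and every other value lands proportionally in between.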

First, we will make a copy of the original dataset which we will call the normalized dataset. Then we will feed every column in the copied dataset into the minmax scaling function as input. 

The normalized dataset is then summarized using the .describe() method.


In [49]:
# making copy of data using .copy() function

data_copy= data.copy()

# min-max normalization of every numeric column (the features and the target 'Close')

data_copy['Open']= minmax_scaling(data_copy['Open'])
data_copy['High']= minmax_scaling(data_copy['High'])
data_copy['Low']= minmax_scaling(data_copy['Low'])
data_copy['Adj Close']= minmax_scaling(data_copy['Adj Close'])
data_copy['Volume']= minmax_scaling(data_copy['Volume'])
data_copy['Close']= minmax_scaling(data_copy['Close'])

# Describing the normalized data

data_copy.describe()

Table 2. Normalized Dataset. 

Table 2 shows the normalized features where the values range from a minimum of 0 to a maximum of 1. 


Let's visualize the dataset using pair-plots from the Seaborn module. 

# 2. Splitting the Data: 

We will now split the data into a training set and a test set using the train_test_split function, holding out 20% of the rows for testing.

In [50]:
x= data_copy[['Open','High', 'Low', 'Adj Close', 'Volume']].values
y= data_copy['Close'].values 

x_train, x_test, y_train, y_test= train_test_split(x,y,test_size= 0.2, random_state=0)
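To show what the split produces, the sketch below applies the same call to a tiny made-up array with 10 rows and 5 feature columns. The names `x_demo`, `y_demo` and their contents are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 made-up rows with 5 features each; row i starts with the value 5 * i
x_demo = np.arange(50).reshape(10, 5)
y_demo = np.arange(10)  # target i belongs to row i

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

print(x_tr.shape, x_te.shape)  # (8, 5) (2, 5)
```

With `test_size=0.2`, two of the ten rows go to the test set and eight to the training set. The rows are shuffled, but each feature row stays paired with its own target, and fixing `random_state` makes the shuffle reproducible.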




# 3. Model Fitting 

We will then fit a linear regression model to the training data[2]:

In [51]:
regressor = LinearRegression()
regressor.fit(x_train, y_train)



**Predictions are made using the test set:**

In [52]:
# Y_test predictions:
y_pred = regressor.predict(x_test)


We will now evaluate the regression model on the test set using the '.score' method. For a regressor this method does not return a classification accuracy: it predicts on 'x_test', compares the predictions with the actual test values ('y_test'), and returns the coefficient of determination R², where 1 means a perfect fit[3]:

In [53]:
print('R^2 on Test Set(%):', regressor.score(x_test, y_test)*100)



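The score achieved is 100 percent, i.e. R² ≈ 1 on the test set, which means the fitted model explains essentially all of the variance in the test-set closing prices. This near-perfect fit is likely helped by the feature set itself: 'Open', 'High', 'Low', and especially 'Adj Close' are recorded on the same day as 'Close' and track it very closely, so the score shows that the parameters fit this data well but says little about how the model would forecast future prices.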
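The sketch below illustrates two points on made-up data: that `.score` on a `LinearRegression` gives the same number as `r2_score` applied to the predictions, and that a feature which is almost a copy of the target pushes R² very close to 1. The arrays `close`, `near_copy`, and `noise_feature` are random stand-ins, not the Tesla columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
close = rng.random(200)                        # made-up target values
noise_feature = rng.random(200)                # feature unrelated to the target
near_copy = close + rng.normal(0, 1e-4, 200)   # stand-in for a column like 'Adj Close'

x = np.column_stack([noise_feature, near_copy])
x_train, x_test, y_train, y_test = train_test_split(
    x, close, test_size=0.2, random_state=0
)

regressor = LinearRegression().fit(x_train, y_train)

score = regressor.score(x_test, y_test)                # R^2 computed by the model
manual = r2_score(y_test, regressor.predict(x_test))   # same R^2 computed explicitly

print(score, manual)
```

Both numbers agree, and both are very close to 1 because one input column is nearly identical to the target.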

# References 

1. https://themlbook.com/

2. https://www.kaggle.com/code/divan0/multiple-linear-regression/notebook

3. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.score