## Multivariable Linear Regression - From Scratch

<br><br>


This project is about **coding from scratch, a machine learning algorithm, that can perform multivariable linear regression** (MLR).

There is a heavy focus on explaining what's going on, step by step, and the code/comments are written as simply as I could make them, to help demonstrate my understanding of machine learning, and MLR. 

There is also a PDF in the same folder as this file,  that explains the underlying mathematics ("MLR Explanation - Includes Math.pdf")

Since the code/comments in this project are fairly basic, the overall project is more suited towards anyone wanting to gauge how well I understand the basics of machine learning, or for anyone who may not be too familiar with how machine learning handles multivariable linear regression, and what the underlying code/mathematics is like.

Now, lets begin:

<br>

*Note: All raw data involved is randomly generated, and is quantitative in nature (so, really simple data). The reason for this, is to ensure that focus is kept on understanding how machine learning works here, as opposed to how to properly clean data, and perform feature engineering and scaling.*

<br><br>

In [1]:
# This is a from scratch project, but we do briefly use the random and pandas modules, just to generate some raw data and 
# display it neatly. 
import random as rm
import pandas as pd


# Additionally, SymPy IS used, but ONLY to help demonstrate that the from scratch code works correctly.
import sympy as sp


<br><br><br>
Now, first, we'll determine how much data we have, specifically the number of independent variables (also known as the 
features/characteristics of something) and the number of instances of data, (how many of that something we have).

(For example, if our data is about houses, and we have 12 independent variables, and 100 instances of data,  then that means we have 100 houses in our data, and the same 12 things have been measured/found for each one of these houses, like: "land size", "value", front footage", "number of bedrooms", etc)

For this demonstration, I'll be going with 12 independent variables, and 100 instances of data, but this can be changed
to your liking (more data means a longer runtime)

In [2]:
n_vars = 12
n_instncs = 100


<br><br><br>
Now, generating values for each of the independent variables. 

The values for all variables will be stored in a list of lists. 
 
Also creating "scales", which will have a bunch of different scales (ranging from millionths, to millions), to help ensure 
our variables are on different scales relative to each other, as commonly found in real life data (for example,
the variable "house_value", is generally on the scale of 100,000 (1e5), to 1,000,000 (1e6), whereas
"house_age" is generally on the scale of 1 (1e0), to 100 (1e2).

In [3]:
data = []
scales = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6]


# Iterate through each independent variable:
for i in range(0, n_vars):
    
    # Create a list within data, for that variable
    data.append([])

    # Randomly generate values for that variable, ensuring it has a random scale and range.
    
    # Pick the starting scale (can't be millions, otherwise that variable will be full of identical values)
    n = rm.randint(0, len(scales)-2)
    # Pick the final scale (must be larger than starting scale)
    m = rm.randint(n+1, len(scales)-1)
    
    min = scales[n]
    max = scales[m]
    
    for j in range(0, n_instncs):
        rndm_num = rm.uniform(min, max)
        data[i].append(rndm_num)

        
        
# Note: "scales" could have been created using a list comprehension (geometric sequence), but as this project focuses on 
# simplicity and explanation, a simpler, less scalable approach was used.



<br><br><br>
Now, we'll take a look at our data, and see if the variables are actually on different scales relative to each other.

*Note: We'll put our data in a pandas dataframe SOLELY for presentation/aesthetic purposes, then delete the dataframe, as this is a "from scratch" project.

In [4]:
# Creating a dataframe from our data. We created variable by variable, NOT instance by instance, so the pandas dataframe
# will have each variable be a row, not a column, which is against convention, so we'll have to swap rows and columns 
# (transpose).

df = pd.DataFrame(data)
df = df.transpose()

# Giving the columns (variables), the correct names, for aesthetic/clarity:
df.columns=["x{}".format(i) for i in range(0, 12)]

# Displaying values in scientific notation, to 2 decimal places
pd.set_option('display.float_format', '{:.2E}'.format)
df

Unnamed: 0,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11
0,4.06E+04,3.23E+01,7.31E+05,8.86E+03,5.84E+04,6.26E+02,6.97E-02,2.91E+04,2.12E+05,3.64E+00,4.68E-04,8.75E+05
1,7.77E+04,5.77E+01,8.55E+05,7.41E+03,5.37E+04,6.52E+02,2.50E-02,2.15E+04,7.12E+05,5.43E+00,7.57E-04,4.94E+05
2,1.51E+04,4.12E+01,7.70E+05,3.57E+03,6.12E+04,6.80E+02,7.46E-02,2.73E+04,4.79E+05,3.00E+00,1.58E-04,5.39E+05
3,7.55E+04,7.83E+01,5.01E+05,6.86E+03,6.72E+04,1.36E+02,8.47E-02,6.97E+04,1.96E+05,5.24E+00,1.32E-04,4.18E+04
4,6.18E+04,9.82E+01,3.24E+05,8.35E+03,6.58E+04,6.70E+02,5.67E-02,5.55E+04,5.03E+05,2.25E+00,4.38E-04,3.27E+05
...,...,...,...,...,...,...,...,...,...,...,...,...
95,6.30E+03,2.80E+01,6.85E+05,3.04E+03,7.28E+04,1.08E+02,8.71E-02,1.74E+04,1.49E+05,3.43E+00,9.37E-04,8.81E+05
96,2.06E+04,8.41E+01,2.39E+05,4.81E+03,3.85E+04,3.21E+02,4.58E-02,5.94E+04,8.50E+05,1.93E+00,4.15E-04,9.25E+05
97,5.02E+03,9.13E+01,9.73E+05,2.02E+03,1.03E+04,3.95E+02,5.98E-02,4.76E+04,3.17E+05,1.43E+00,6.92E-04,3.51E+05
98,8.31E+04,8.44E+01,1.08E+05,4.60E+03,6.48E+04,9.40E+02,8.58E-02,9.24E+04,4.92E+05,8.15E+00,3.00E-04,3.66E+05


<br><br><br>
As we can see from our data above, the variables are on different scales relative to each other.
(Some are on the higher scales, some are on the lower scales, which ones specifically is determined randomly)

In [5]:
# We no longer need the dataframe (we only used it for aesthetic presentation), so we'll now delete it.
del df

<br><br><br>
Now that we have our randomly generated raw data, we'll create a completely MADE UP mathematical relationship, between our 
independent variables (x0, x1, x2, etc), and our dependent variable, which we'll call y. 

This relationship will be LINEAR, so it'll look like this: y = a*x0 + c*x1 + d*x2 + e*x3 + ... + b, 

All the coefficients (a, c, d, e, etc) will be randomly generated, from a range of 1 to 100, as well as the constant "k" at the end. 


In [6]:
coeffs = []

# Randomly generate coefficients
for i in range(0, n_vars):
    coeffs.append(rm.randint(1, 100))

# Randomly generate constant
k = rm.randint(1, 100)


<br><br><br>
So, our randomly generated, MADE UP mathematical relationship, is shown below:

In [7]:
rel_str = "y ="
for i in range(0, len(coeffs)):
    rel_str += " {}*x{} +".format(coeffs[i], i)

rel_str += " {}".format(k)
print("The made up mathematical relationship is:\n")
print(rel_str)


The made up mathematical relationship is:

y = 93*x0 + 9*x1 + 7*x2 + 82*x3 + 3*x4 + 25*x5 + 3*x6 + 93*x7 + 1*x8 + 85*x9 + 41*x10 + 27*x11 + 25


<br><br><br>
Now that we've generated all the coefficients and the constant, we can calculate the values for the made up dependent variable, y:

In [8]:
dep_data = []

# Each instance is a row in our dataframe:
for i in range(0, n_instncs):
    
    dep_value = 0
    
    # Iterate through a specific column/instance
    for j in range(0, n_vars):
        
        term = coeffs[j]*data[j][i]
        dep_value += term
    
    dep_value += k
    dep_data.append(dep_value)
    
    
# Displaying the first 5 values of y:
print("The first 5 values of y are: \n")
print(["{:.2E}".format(n) for n in dep_data[:5]])


The first 5 values of y are: 

['3.64E+07', '3.01E+07', '2.48E+07', '1.91E+07', '2.34E+07']


<br><br><br>
Now, we bring our machine learning algorithm into play. It's supposed to create a model, that can predict what the dependent variable y is. First, we'll have it randomly guess what the coefficients are (conventionally referred to as weights now, so w0, w1, w2, w3, etc), as well as the constant "k" (conventionally referred to as bias now)

In [9]:
guess_weights = []

for i in range(0, n_vars):
    guess_weights.append(rm.randint(1, 100))
    
guess_bias = rm.randint(1, 100)

# Now, comparing prediction and actual model:
model_str = "y ="
for i in range(0, len(guess_weights)):
    model_str += " {}*x{} +".format(guess_weights[i], i)

model_str += " {}".format(guess_bias) 

print("\nThe prediction model:")
print(model_str)

print("\nThe actual mathematical relationship is: ")
print(rel_str)



The prediction model:
y = 23*x0 + 40*x1 + 72*x2 + 43*x3 + 54*x4 + 56*x5 + 10*x6 + 2*x7 + 1*x8 + 1*x9 + 95*x10 + 95*x11 + 35

The actual mathematical relationship is: 
y = 93*x0 + 9*x1 + 7*x2 + 82*x3 + 3*x4 + 25*x5 + 3*x6 + 93*x7 + 1*x8 + 85*x9 + 41*x10 + 27*x11 + 25


<br><br><br>
Now that we've gotten our machine learning algorithm (it hasn't done any actual "learning" yet) to create a prediction model 
for us, we'll use the model to make some predictions (we can already see from above that it should be quite inaccurate)

*Note: We'll be making quite a few predictions, so we'll create a function, just to cut down on repetitive code.*

In [10]:
def predict(weights, bias):
    
    predictions = []

    # Iterating across all columns
    for i in range(0, n_instncs):

        predict_val = 0

        # Iterating through a specific column/instance
        for j in range(0, n_vars):

            term = weights[j]*data[j][i]
            predict_val += term

        predict_val += bias
        predictions.append(predict_val)
    
    return predictions




# Displaying the first 5 PREDICTED values of y:

predictions = predict(guess_weights, guess_bias) 
print("\nThe first 5 PREDICTED values of y: ")
print(["{:.2E}".format(n) for n in predictions[:5]])
print("\n")



# Displaying the first 5 differences (between predicted and actual): 

diff_first5 = [predictions[i] - dep_data[i] for i in range(0, 5)]
print("Difference between the first 5 predicted values of y, and the first 5 ACTUAL values of y: ")
print(["{:.2E}".format(n) for n in diff_first5])
print("\n")



# Displaying the differences as proportions of corresponding actual values:

proport_diff = [diff_first5[i] / dep_data[i] for i in range(0, 5)]
print("Proportional Difference: ")
print([ "{:g}%".format(n*100) for n in proport_diff])



The first 5 PREDICTED values of y: 
['1.41E+08', '1.14E+08', '1.11E+08', '4.60E+07', '6.04E+07']


Difference between the first 5 predicted values of y, and the first 5 ACTUAL values of y: 
['1.04E+08', '8.42E+07', '8.62E+07', '2.69E+07', '3.70E+07']


Proportional Difference: 
['286.545%', '280.261%', '346.75%', '140.913%', '157.887%']


<br><br><br>
As we can see, the predictions are pretty far off, even when we've randomly generated the weights/bias, in the exact same way, as the coefficients/constant in the made up mathematical relationship (randomly picked integers between 1 and 100).

So, it seems our model needs to be improved tremendously based off the first 5 predictions, but we'll calculate the proper cost of the model, just to be sure.


*Note: Also creating a function for cost as well.*

In [11]:
def cost(weights, bias):
    
    cost = 0
    predictions = predict(weights, bias)
    for i in range(0, n_instncs):
        diff = (predictions[i] - dep_data[i])**2
        cost += diff
    cost = cost / (2*n_instncs)
    return cost


the_cost = cost(guess_weights, guess_bias)
print("\nThe cost of our model, given its guessed weights/bias, is: {:.2E}".format(the_cost))


The cost of our model, given its guessed weights/bias, is: 2.42E+15


<br><br><br>
As we can see above, the cost was huge, so we need to change the weights and bias, such that the cost will be lower. But how do we figure out what to change the weights and bias by, so that the cost decreases?

Well, in order to do that, we need to know how the cost changes, as our weights and bias change. The GRADIENT of the cost literally tells us exactly how cost changes, as the weights and bias change. So we need to work out the gradient of cost.

Now, we could figure out the gradient of cost, by just using SymPy (a popular Python module), and 
expressing the cost symbolically with SymPy symbols, then using SymPy's differentiation function. This will give us the 
gradient of cost, without having to do any calculus, or other mathematical work. 

HOWEVER, this project is supposed to be "from scratch", so I've gone ahead and done the calculus/mathematical work required,
to figure out the gradient of cost from scratch. I then coded in that mathematical work, and ended up with the gradient of 
the cost function "from scratch". The mathematical work that I've gone through (that includes step by step explanation), 
can be viewed in the PDF, that's in the same folder as this code file.

I've also coded in the gradient of cost just using SymPy, just for comparisons sake (and it also allows you to see
what a particular partial derivative looks like, symbolically, if that interests you)

<br><br>
### Gradient of cost, from scratch:

In [12]:
def grad_cost_sc(weights, bias):
    
    grad_cost_sc = []

    # Iterate through each weight:
    for k in range(0, n_vars):

        # Calculate the partial derivative for that particular weight:
        dC_dwk = 0
        for i in range(0, n_instncs):

            expr = 0
            summation = 0
            for j in range(0, n_vars):
                 summation += (weights[j] * data[j][i])

            expr = 2*data[k][i]*(summation + bias - dep_data[i])
            dC_dwk += expr

        dC_dwk = dC_dwk / (2*n_instncs)
        grad_cost_sc.append(dC_dwk)



    # Calculate the partial derivative for the bias:
    dC_db = 0
    for i in range(0, n_instncs):

        expr = 0
        summation = 0
        for j in range(0, n_vars):
             summation += (weights[j] * data[j][i])

        expr = 2*(summation + bias - dep_data[i])
        dC_db += expr

    dC_db = dC_db / (2*n_instncs)
    grad_cost_sc.append(dC_db)
    return grad_cost_sc

grad_cost = grad_cost_sc(guess_weights, guess_bias)


In [13]:
grad_cost

[3204490448589.09,
 3026392078.5395184,
 40182769865576.53,
 332610135414.0803,
 3377922289789.0244,
 35930064120.38476,
 3448613.166440441,
 2959381518317.2114,
 34630837649322.734,
 328083338.35640514,
 37924.886622242935,
 37770079187486.086,
 63842178.972653754]

<br><br>
As we can see from above, we now have the gradient of cost, but we'll want to verify it using SymPy, so let's do that.

<br><br>
### Gradient of cost, expressed symbolically using SymPy:

In [14]:
# First, expressing the weights and bias symbolically:

w = sp.symbols("w0:{}".format(n_vars))
b = sp.symbols('b')


In [15]:
# Now, expressing the cost symbolically:

cost_sym = 0

# Each lap of this loop, is an entire difference term
for i in range(0, n_instncs):
    
    diff = 0
    
    # Each lap of this loop, is an individual term within a difference term
    for j in range(0, n_vars):
        diff += w[j]*data[j][i]
        
    diff += b - dep_data[i]
    cost_sym += diff**2

# Finally, we just need to divide by 2*n_instncs
cost_sym = cost_sym / (2*n_instncs)
    
    

In [16]:
# Now that we've expressed the cost symbolically, we can find its gradient, by simply using SymPy's differentiation function

grad_cost_sympy = []

for i in range(0, n_vars):
    dC_dwi = sp.diff(cost_sym, w[i])
    grad_cost_sympy.append(dC_dwi)

dC_db = sp.diff(cost_sym, b)
grad_cost_sympy.append(dC_db)



<br><br><br>
So, we've worked out the gradient of cost symbolically, and real quick just for curiosities sake, we'll take a peek at it (purely optional, just for fun).  

Unfortunately, we can't look at the entire gradient, as the expression would be gigantic (unless we use summation notation), so we'll just take a look at the first partial derivative (derivative of cost with respect to w0):

*Warning: If n_vars is set to a large number, then even a single partial derivative can look quite big, as the number of terms present, will be equal to: n_vars + 2*

In [17]:
# Derivative of cost with respect to w0, symbolically:
grad_cost_sympy[0]


52692.2187315113*b + 3512432063.06779*w0 + 2578662.64410617*w1 + 30.7502747847605*w10 + 24145538239.7407*w11 + 29194850640.784*w2 + 284644276.810988*w3 + 2841227051.47236*w4 + 29208521.746297*w5 + 2711.82121128947*w6 + 2461159883.06276*w7 + 29155720301.8291*w8 + 274436.926162413*w9 - 1473635844951.25

<br><br><br>
Awesome! That looks right, now, all we have to do, is substitute the symbolic weights/bias in these SymPy expressions, with the actual values they're equal to. 

To do so, we have to use the SymPy "subs" method, which means the weights/bias, and the values they're equal to, will need to be paired into tuples, and those tuples need to go into a list:

In [18]:
weights_bias_vals = []

# Iterating through each weight, putting both its symbolic form, and numeric form, into a tuple. 
for i in range(0, n_vars):
    weights_bias_vals.append((w[i], guess_weights[i]))

# Putting the bias in symbolic form, and numeric form, into a tuple
weights_bias_vals.append((b, guess_bias))

    

In [19]:
# Now that we have a tuple for every weight/bias, we can use the subs method: 

# Iterating through each partial derivative, and substituting the weights/bias for the values they're equal to:
grad_cost_sympy_num = []

for i in range(0, len(grad_cost_sympy)):

    expr = grad_cost_sympy[i]
    deriv_num = expr.subs(weights_bias_vals)
    grad_cost_sympy_num.append(deriv_num)

    

<br><br><br>
Now that we have the gradient of cost in numeric form using SymPy, we can compare it to our gradient of cost that we got from
scratch, to see if there's any differences. We'll just look at the first five partial derivatives, as the same mathematics is used for every single derivative (so if one is off, then they'll all be off)

In [20]:
for i in range(0, 5):
    diff = grad_cost[i] - grad_cost_sympy_num[i]
    print("Difference: {},           Proportional Difference: {}".format(diff, diff/grad_cost_sympy_num[i]))

    

Difference: -0.000488281250000000,           Proportional Difference: -1.52374069398455E-16
Difference: -0.00000238418579101563,           Proportional Difference: -7.87798054297772E-16
Difference: 0.0156250000000000,           Proportional Difference: 3.88848256411152E-16
Difference: 0.000122070312500000,           Proportional Difference: 3.67007194017192E-16
Difference: 0.00146484375000000,           Proportional Difference: 4.33652294023463E-16


<br><br><br>
Interestingly, there are differences between our "from scratch" partial derivatives, and the ones from SymPy, although they are incredibly small, generally off by only a factor of E-15 (a quadrillionth). This is most likely due to rounding off with floats (can be determined by looking at SymPys: "diff" function source code), or imprecision when storing floats. 

Thus, with our "from scratch" gradient of cost having been verified as correct, we'll use it from now on, to improve
the weights and bias.


<br><br><br>
To improve a particular weight/bias, all we have to do, is add onto it, alpha multiplied by the associated negative partial derivative. 

*Note: The concept of alpha, and how to find a good value for it, is discussed in the PDF.*

So for example: to change weight zero (w0), it would simply be: <br> <br>
w0 = w0 + alpha * -(dC/dW0)

Or for the bias (b), it would be: <br> <br>
b = b + alpha * -(dC/db)

In [21]:
# Improving the weights and bias
alpha = 1/(10**12)

# Iterating through each weight, changing it
for i in range(0, len(guess_weights)):
    guess_weights[i] = guess_weights[i] + alpha*-(grad_cost[i])

# Changing the bias (partial derivative of bias, is the final element of the gradient):
guess_bias = guess_bias + alpha*-(grad_cost[-1])



In [28]:
# Do 10,000 laps of training, you can DECREASE the number of laps to improve runtime, but accuracy will decrease
for i in range(0, 10000):
    
    # Changing the weights
    for j in range(0, len(guess_weights)):
        guess_weights[j] = guess_weights[j] + alpha*-(grad_cost[j])
        
    # Changing the bias
    guess_bias = guess_bias + alpha*-(grad_cost[-1])
    
    
    # Updating the gradient of cost
    grad_cost = grad_cost_sc(guess_weights, guess_bias)
    
    
    

<br><br><br>
Training complete! Let's see how much our model has improved by. First, we'll calculate the new cost:

In [23]:
# Calculate new cost
new_cost = cost(guess_weights, guess_bias)

# Display new cost
print("The cost of our now trained model is: {:.2E}\n".format(new_cost))

# Display old cost, and improvement factor
print("The old cost was {:.2E}, so our cost has gotten {:.2E} times smaller".format(the_cost, (the_cost/new_cost)))

The cost of our now trained model is: 4.94E+09

The old cost was 2.42E+15, so our cost has gotten 4.90E+05 times smaller


<br><br><br>
So, our cost has improved tremendously (the last time I ran the code, the cost got over a billion times smalller), it's definitely looking good so far.
Next, we'll look at the first 5 predictions, and see how far off they were from the actual values.

In [24]:
for i in range(0, 5):
    diff = predict(guess_weights, guess_bias)[i] - dep_data[i]
    proport_diff = diff/dep_data[i]*100
    print("Difference between prediction_{} and actual_{}: {}".format(i, i, diff))
    print("Proportional difference, as a percentage: {:.3f}% \n".format(proport_diff))
    

Difference between prediction_0 and actual_0: -131005.4381391406
Proportional difference, as a percentage: -0.360% 

Difference between prediction_1 and actual_1: -61166.25034687668
Proportional difference, as a percentage: -0.203% 

Difference between prediction_2 and actual_2: 29358.63032885641
Proportional difference, as a percentage: 0.118% 

Difference between prediction_3 and actual_3: -41201.49051652476
Proportional difference, as a percentage: -0.216% 

Difference between prediction_4 and actual_4: -92129.89540355653
Proportional difference, as a percentage: -0.393% 



<br><br><br>
The model our machine learning algorithm came up with, seems far more accurate now, off by only 0.1% or less, but just
to get a concrete idea, we'll work out the average proportional difference (amongst all predictions):


In [25]:
# Calculate average proportional difference
sum_proport = 0
for i in range(0, n_instncs):
    diff = predict(guess_weights, guess_bias)[i] - dep_data[i]
    proport_diff = diff/dep_data[i]
    sum_proport += proport_diff

avg_proport = (sum_proport/n_instncs)*100
print("On average, a prediction will be off by about {:.3f}%".format(avg_proport))
                                                                 

On average, a prediction will be off by about -0.059%


<br><br><br>
**There we have it! A prediction model, with an accuracy of around 99.9% on the data (off by only <0.1% or so on average, assuming 10,000 training laps were performed).**

Now, it is very important to note, that **we tested the accuracy of our model, on the same exact same dataset, as the one used to train it**. This is never done in the real world, instead there'll be a training dataset, and a separate test dataset. I simply chose to have only a single dataset, for simplicity. So in reality, our model would be noticeably less accurate (more on this soon).

<br>





Now, do the weights/bias in our prediction model, actually match the coefficients and constant in the actual relationship?


In [26]:
# Display actual relationship
print("Actual Relationship:")
print(rel_str, "\n")

# Display final model
f_model_str = "y ="

for i in range(0, len(guess_weights)):
    f_model_str += " {:.0f}*x{} +".format(guess_weights[i], i)

f_model_str += " {:.0f}".format(guess_bias) 

print("Prediction Model: (rounded off values for weights/bias)")
print(f_model_str)



Actual Relationship:
y = 93*x0 + 9*x1 + 7*x2 + 82*x3 + 3*x4 + 25*x5 + 3*x6 + 93*x7 + 1*x8 + 85*x9 + 41*x10 + 27*x11 + 25 

Prediction Model: (rounded off values for weights/bias)
y = 94*x0 + 40*x1 + 7*x2 + 49*x3 + 4*x4 + 56*x5 + 10*x6 + 94*x7 + 1*x8 + 1*x9 + 95*x10 + 27*x11 + 35


<br><br><br>
Unfortunately, just from looking at the two equations, we can see that there are quite a few weights, that deviate quite significantly from the coefficients they're supposed to represent (be equal to). 

This means our prediction model, would be increasingly less accurate, as the data we're applying it to, becomes increasingly different to the data it was trained off of. (By different, I mean the scales of the variables in the data it's being applied, become increasingly different to the scales of the variables in the original training data). So if we had evaluated the model using a test dataset, that was fairly different to the training dataset, in terms of scales, then the accuracy would definitely be lower than 99.9%

To help with this particular issue, we can use feature scaling, which is discussed below. 

Apart from that, we've finished! (:



<br><br><br>
## Feedback / Improvements


### 1. Feature Scaling

Firstly, the independent variables created at the start, should be scaled appropriately (there's quite a few different ways to do so). If they aren't, then the variables that are on the largest scales, such as millions or above, are what the 
MLA trains/adapts to the most (so the weights for these larger scale variables, will become highly accurate), whereas the variables on the smallest scales, such as singles, or tenths, etc, are basically ignored/given the least priority, because they have a much smaller impact on the cost (so the weights for these smaller scale variables, will end up pretty inaccurate).

- (This is actually why the final weights the MLA came up with, are different to some of the variables coefficients, because most likely, those variables are the smallest scale ones, that don't matter anywhere near as much)

Additionally, if the independent variables were scaled, then the runtime of the training process, would be 
significantly improved/optimized (assuming the independent variables are on wildly different scales to begin with). 
This is because, in simple/intuitive terms, if the variables are on wildly different scales relative to each other, 
then the graph of the cost function essentially becomes warped (along certain axes), meaning the direction of steepest 
descent at most points, isn't actually pointing directly at the local minimum, but is instead off by a certain angle. 
The bigger the difference in scales between the independent the variables, the more "off" the direction of steepest descent
will be (capping out at almost 90 degrees). Further explanation on this is in the PDF.



### 2. Vectorization

Secondly, vectorization should be utilised as much as possible, instead of what was used, whiiteration and the use of loops. This 
would significantly speed up runtime, as calculations become array based, and currently the training part of this project
has a runtime of in the scale of tens of seconds. (Vectorization wasn't done for this project,
as it requires an understanding of linear algebra, which would make the explanations even longer)