## Multiple Variable Linear Regression

### Goals

Extend our regression model routines to support multiple features

Extend data structures to support multiple features

Rewrite prediction, cost and gradient routines to support multiple features

Utilize NumPy np.dot to vectorize their implementations for speed and simplicity

In [2]:
import copy, math
import numpy as np
import matplotlib.pyplot as plt
# plt.style.use('./deeplearning.mplstyle')
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays

In [4]:
import pandas as pd

In [25]:
import time

### Problem Statement
We will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors, and age). Prices are given in thousands of dollars.

Let's build a linear regression model using these values so we can then predict the price of other houses, for example a house with 1200 sqft, 3 bedrooms, and 1 floor that is 40 years old.

In [3]:
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

In [5]:
# Creating a DataFrame to display the data
columns = ['Size (sqft)', 'Bedrooms', 'Floors', 'Age (years)']
df = pd.DataFrame(X_train, columns=columns)
df['Price (in $1000s)'] = y_train

print("Training Dataset:")
print(df)

Training Dataset:
   Size (sqft)  Bedrooms  Floors  Age (years)  Price (in $1000s)
0         2104         5       1           45                460
1         1416         3       2           40                232
2          852         2       1           35                178


In [6]:
# data is stored in numpy array/matrix
print(f"X Shape: {X_train.shape}, X Type: {type(X_train)}")
print(X_train)
print(f"y Shape: {y_train.shape}, y Type: {type(y_train)}")
print(y_train)

X Shape: (3, 4), X Type: <class 'numpy.ndarray'>
[[2104    5    1   45]
 [1416    3    2   40]
 [ 852    2    1   35]]
y Shape: (3,), y Type: <class 'numpy.ndarray'>
[460 232 178]


Each row (1D array) of the 2D array above is one sample record with multiple features, for example [2104 5 1 45], where x0 is the size (2104 sqft), x1 is the number of bedrooms (5), x2 is the number of floors (1), and x3 is the age of the home (45 years). The indices start at 0 to match NumPy array indexing.

The model needs one weight per feature (w0, w1, w2, w3 for x0, x1, x2, x3). Each weight scales the contribution of its feature to the predicted price.
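The row and column layout described above can be checked directly with NumPy indexing. The following is a minimal sketch that redefines the training matrix so it runs on its own; `feature_names`, `first_house`, and `all_sizes` are names introduced here for illustration only.

```python
import numpy as np

# Same training matrix as above: one row per house, one column per feature
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
feature_names = ["size", "bedrooms", "floors", "age"]  # illustrative labels

first_house = X_train[0]     # one sample record, shape (4,)
all_sizes = X_train[:, 0]    # one feature across all records, shape (3,)

# Pair each feature name with its value for the first house
for name, value in zip(feature_names, first_house):
    print(f"{name}: {value}")
print(f"sizes of all houses: {all_sizes}")
```

Indexing a row gives one example with n features, while slicing a column gives one feature for all m examples.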

### Parameter vector w, b

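w is a vector with n elements.

Each element contains the parameter associated with one feature.

In our dataset n is 4, so w has four elements. The values used below are:

Size 0.39133535, Bedrooms 18.75376741, Floors -53.36032453, Age -26.42131618

Bedrooms has the largest positive weight, and Floors has the largest weight in absolute value. These magnitudes do not rank feature importance on their own, because the features are on very different scales. For the first house, the Size term (0.39 × 2104 ≈ 823) is much larger than the Bedrooms term (18.75 × 5 ≈ 94) even though the Size weight is the smallest.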

b is a scalar parameter.

In [7]:
b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")

w_init shape: (4,), b_init type: <class 'float'>


### Model Prediction With Multiple Variables

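The model's prediction with multiple variables is given by the linear model:

$$f_{\mathbf{w},b}(\mathbf{x}) = w_0 x_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + b$$

or, in vector notation:

$$f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$$

where $\cdot$ is a vector dot product.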
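To make the formula concrete, the sketch below evaluates it for the first training example in two ways: by summing the per-feature products and by calling `np.dot`. The values are the ones used in this notebook; the variable names `terms`, `f_sum`, and `f_dot` are introduced here for illustration.

```python
import numpy as np

x = np.array([2104, 5, 1, 45])  # first training example
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

terms = w * x                   # element-wise products w_j * x_j
f_sum = terms.sum() + b         # sum of the products plus the bias
f_dot = np.dot(w, x) + b        # same quantity via the dot product

print(f"per-feature terms: {terms}")
print(f"sum of terms + b:  {f_sum}")
print(f"np.dot(w, x) + b:  {f_dot}")
```

Both results agree up to floating-point rounding and land very close to the first house's training price of 460.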

In [18]:
print(f"No. of sample records for Housing data: {X_train.shape[0]}")
print(f"No. of Features in our dataset: {X_train.shape[1]}")
print(f"No. of Initial weights : {w_init.shape[0]}")
print(f"Initial bias: {b_init}")
print(f"No. of target prices (one per sample record): {y_train.shape[0]}")

No. of sample records for Housing data: 3
No. of Features in our dataset: 4
No. of Initial weights : 4
Initial bias: 785.1811367994083
No. of target prices (one per sample record): 3


### Single Prediction element by element

The function predict_single_loop is designed to make a prediction for a single example (one house) based on its features. It is not intended to handle multiple records (multiple houses) at once. 

In [41]:
def predict_single_loop(x, w, b): 
    """
    Single prediction using linear regression, computed element by element

    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters
      b (scalar):  model parameter

    Returns:
      p (scalar):         prediction
      time_taken (float): elapsed time in milliseconds
    """
    n = x.shape[0]    # Get the number of features (n) in the input example x
    p = 0             # Initialize the prediction variable p to 0

    tic = time.time()  # capture start time
    
    for i in range(n):
        p_i = x[i] * w[i]  # Multiply each feature by its corresponding weight
        p = p + p_i         # Accumulate the result into p
    p = p + b               # Add the bias term to the accumulated sum
    
    toc = time.time()  # capture end time
    time_taken = 1000*(toc-tic)  # calculate time taken in milliseconds

    return p, time_taken         # Return the prediction p and the time taken

In [42]:
# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

x_vec shape (4,), x_vec value: [2104    5    1   45]


In [43]:
# make a prediction
f_wb, duration = predict_single_loop(x_vec, w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")
print(f"Total Time Duration: {duration} ms")

f_wb shape (), prediction: 459.9999976194083
Total Time Duration: 0.9996891021728516 ms


### Single Prediction with np.dot instead of looping

Let's modify the predict_single_loop function to speed up performance by using np.dot in place of the loop.

In [59]:
def predict(x, w, b): 
    """
    Single prediction using linear regression, computed with np.dot

    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters
      b (scalar):  model parameter

    Returns:
      p (scalar):         prediction
      time_taken (float): elapsed time in milliseconds
    """
    tic = time.time()  # capture start time
    p = np.dot(x, w) + b # Compute the dot product of x and w, and add b to the result  
    toc = time.time()  # capture end time
    time_taken = 1000*(toc-tic)  # calculate time taken in milliseconds 
    return p, time_taken # Return the prediction p and the time taken

In [61]:
# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

x_vec shape (4,), x_vec value: [2104    5    1   45]


In [62]:
# make a prediction
f_wb, duration = predict(x_vec,w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")
print(f"Total Time Duration: {duration} ms")

f_wb shape (), prediction: 459.9999976194083
Total Time Duration: 0.0 ms
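A single call on a four-element vector is too short to time reliably: on some platforms the resolution of `time.time()` is coarser than the call itself, which is a likely reason the duration above prints as 0.0 ms. The sketch below is one way to get a more meaningful comparison, assuming a much larger random vector and `time.perf_counter()`. The helper `dot_loop` and the size `n` are illustrative choices, not part of the notebook.

```python
import time
import numpy as np

def dot_loop(x, w):
    """Dot product computed one element at a time."""
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * w[i]
    return total

rng = np.random.default_rng(0)
n = 200_000                    # illustrative vector length
x_big = rng.random(n)
w_big = rng.random(n)

tic = time.perf_counter()      # high-resolution timer
loop_result = dot_loop(x_big, w_big)
loop_ms = 1000 * (time.perf_counter() - tic)

tic = time.perf_counter()
dot_result = np.dot(x_big, w_big)
dot_ms = 1000 * (time.perf_counter() - tic)

print(f"loop:   {loop_ms:.3f} ms")
print(f"np.dot: {dot_ms:.3f} ms")
```

The exact timings depend on the machine, so no expected numbers are shown here. Both approaches compute the same value up to floating-point rounding.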


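### Prediction for Multiple Records

The function predict_multiple_records applies the single-example prediction to every row of X and collects the results in an array.

In [63]: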
def predict_multiple_records(X, w, b):
    """
    Predicts for multiple records using linear regression
    
    Args:
      X (ndarray): Shape (m, n), m examples with n features each
      w (ndarray): Shape (n,) model parameters    
      b (scalar):  model parameter
      
    Returns:
      p (ndarray): Shape (m,), predictions for each example
    """
    m = X.shape[0]   # Number of records
    p = np.zeros(m)  # Initialize the prediction array
    
    for i in range(m):
        p[i], _ = predict_single_loop(X[i], w, b)  # Predict for each record; the timing is discarded
    
    return p


In [65]:
predictions = predict_multiple_records(X_train, w_init, b_init)
print(predictions)

[460. 232. 178.]
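The per-record loop can also be replaced by a single matrix-vector product, in the same spirit as using np.dot for one example. The sketch below assumes the same training data and parameters as above. `predict_batch` and `x_new` are hypothetical names introduced here, and `x_new` holds the example house from the problem statement (1200 sqft, 3 bedrooms, 1 floor, 40 years old).

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_init = 785.1811367994083

def predict_batch(X, w, b):
    """Predictions for all m rows of X at once; returns an array of shape (m,)."""
    return X @ w + b  # (m, n) matrix times (n,) vector, bias broadcast to each row

batch_predictions = predict_batch(X_train, w_init, b_init)
print(batch_predictions)

# Apply the same parameters to the house from the problem statement
x_new = np.array([1200, 3, 1, 40])
p_new = np.dot(x_new, w_init) + b_init
print(f"predicted price for the new house: {p_new:.2f} (in $1000s)")
```

With these parameters the new house comes out at about 200.83, that is, roughly $200,800.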
