# __Beyond Market Trends: Predicting Future House Prices__

## __Objective__
Your objective is to build a house price prediction model in Python using linear algebra principles.

## __Steps to Perform__
1. Data Preparation and Split
2. Create a function **linear_regression()** to calculate the coefficients of the linear equation using the formula: **coefficients = (X^T.X)^(-1).X^T.y**
3. Create the function **estimate_house_price(coefficients, new_features)** that predicts the price of the house
   

### Step 1: Data Preparation and Split

In [1]:
import numpy as np

# Data Preparation

# The input dataset contains the columns: House Area (sq ft), No. of Bedrooms, Bathrooms, Garage Size (sq ft), Price ($)
data = np.array([[1500,3,2,400,200000],
                 [1800,4,3,500,250000],
                 [1200,2,1,300,180000],
                 [2100,5,3,550,280000],
                 [1400,3,2,350,190000],
                 [1600,4,2,450,220000],
                 [1300,3,1,320,185000],
                 [2000,4,3,480,270000],
                 [1750,3,2,420,240000],
                 [2400,5,4,600,320000],
                 [1100,2,1,280,160000],
                 [1850,4,3,530,290000],
                 [2500,5,4,620,340000],
                 [1950,4,2,490,280000],
                 [1450,3,2,380,210000],
                 [1350,2,1,310,175000],
                 [2200,4,3,530,310000],
                 [2050,4,3,520,290000],
                 [1650,3,2,410,225000],
                 [2300,5,4,590,330000]])

# Create an array X to store the features/independent variables -> All columns except the last one
X = data[:, :-1]
# Create an array y to store the target variable: Price -> Only the last column
y = data[:, -1]

# Print the values to verify your results
print(f"X:\n {X}")
print(f"y:\n {y}")

X:
 [[1500    3    2  400]
 [1800    4    3  500]
 [1200    2    1  300]
 [2100    5    3  550]
 [1400    3    2  350]
 [1600    4    2  450]
 [1300    3    1  320]
 [2000    4    3  480]
 [1750    3    2  420]
 [2400    5    4  600]
 [1100    2    1  280]
 [1850    4    3  530]
 [2500    5    4  620]
 [1950    4    2  490]
 [1450    3    2  380]
 [1350    2    1  310]
 [2200    4    3  530]
 [2050    4    3  520]
 [1650    3    2  410]
 [2300    5    4  590]]
y:
 [200000 250000 180000 280000 190000 220000 185000 270000 240000 320000
 160000 290000 340000 280000 210000 175000 310000 290000 225000 330000]


### Step 2: Create a function **linear_regression()** to calculate the coefficients of the linear equation using the formula: **coefficients = (X^T.X)^(-1).X^T.y**

The formula (X^T.X)^(-1).X^T.y gives the least-squares regression coefficients (β). Least squares is a common technique in statistics and machine learning for finding the best-fitting line (or plane, or hyperplane) through a set of data.

Here's a breakdown:

- X: A matrix containing the predictor (independent) variables. 
- X^T: The transpose of matrix X. 
- y: A vector containing the response (dependent) variable. 
- (X^T.X)^(-1): The inverse of the matrix product of X^T and X. 
- . (dot): Matrix multiplication.
- Solution β: The result of this calculation, (X^T.X)^(-1).X^T.y, is the vector of coefficients (β) that minimizes the sum of squared errors between the predicted values and the actual values of y.
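
As a quick sanity check of the formula, here is a minimal sketch on a tiny made-up dataset. The array values and the names `X_small`, `y_small`, `true_beta`, and `beta` are illustrative assumptions, not part of the exercise data. Because `y_small` is constructed exactly as 2·x1 + 3·x2 with no noise, the formula should recover coefficients close to [2, 3].

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features (not the house dataset)
X_small = np.array([[1.0, 2.0],
                    [2.0, 1.0],
                    [3.0, 5.0],
                    [4.0, 3.0]])
true_beta = np.array([2.0, 3.0])
y_small = X_small @ true_beta  # exact linear relationship, no noise

# coefficients = (X^T.X)^(-1).X^T.y
beta = np.linalg.inv(X_small.T @ X_small) @ X_small.T @ y_small
print(beta)
```

With real, noisy data the recovered coefficients will not match any "true" values exactly; they are simply the values that minimize the squared error.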

In [2]:
X_transpose = np.transpose(X)
print(X_transpose)

[[1500 1800 1200 2100 1400 1600 1300 2000 1750 2400 1100 1850 2500 1950
  1450 1350 2200 2050 1650 2300]
 [   3    4    2    5    3    4    3    4    3    5    2    4    5    4
     3    2    4    4    3    5]
 [   2    3    1    3    2    2    1    3    2    4    1    3    4    2
     2    1    3    3    2    4]
 [ 400  500  300  550  350  450  320  480  420  600  280  530  620  490
   380  310  530  520  410  590]]


### Multiplying a matrix by its transpose
https://www.youtube.com/watch?v=1GJHdf5JXoE

In [3]:

print(X_transpose.shape)
print(X.shape)

(4, 20)
(20, 4)


In [4]:
XTX = np.dot(X_transpose, X)
print(XTX)

[[66037500   134750    92350 16805000]
 [  134750      278      190    34420]
 [   92350      190      134    23580]
 [16805000    34420    23580  4288500]]
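
Each entry of X^T.X is the dot product of two feature columns, which is why the result is a symmetric 4×4 matrix. A minimal sketch using only the Bedrooms and Bathrooms columns copied from the dataset above (the names `bedrooms`, `bathrooms`, `sub`, and `gram` are illustrative) reproduces the matching 2×2 block of the output, with the values 278, 190, and 134:

```python
import numpy as np

# Bedrooms and Bathrooms columns, copied from the dataset above
bedrooms = np.array([3, 4, 2, 5, 3, 4, 3, 4, 3, 5, 2, 4, 5, 4, 3, 2, 4, 4, 3, 5])
bathrooms = np.array([2, 3, 1, 3, 2, 2, 1, 3, 2, 4, 1, 3, 4, 2, 2, 1, 3, 3, 2, 4])

sub = np.column_stack([bedrooms, bathrooms])  # shape (20, 2)
gram = sub.T @ sub                            # shape (2, 2)
print(gram)

# The off-diagonal entry is the dot product of the two columns
print(np.dot(bedrooms, bathrooms))

# A matrix multiplied by its own transpose in this order is always symmetric
print(np.array_equal(gram, gram.T))
```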


### Finding the inverse of the dot product of a matrix and its transpose

In [5]:
inv_XTX = np.linalg.inv(XTX)
print(inv_XTX)

[[ 5.69627683e-06  4.24115410e-04 -5.71081897e-06 -2.56941410e-05]
 [ 4.24115410e-04  6.51278809e-01 -1.06510987e-01 -6.30354366e-03]
 [-5.71081897e-06 -1.06510987e-01  2.48204055e-01 -4.87483536e-04]
 [-2.56941410e-05 -6.30354366e-03 -4.87483536e-04  1.54192112e-04]]
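
One way to sanity-check an inverse is to multiply it back by the original matrix and compare the result with the identity matrix. The sketch below does this for a small 2×2 matrix (the Bedrooms/Bathrooms block of X^T.X from the output above; the variable names are illustrative). It also prints the condition number via `np.linalg.cond`, since a very large value suggests the computed inverse may be numerically unreliable.

```python
import numpy as np

# 2x2 block of X^T.X for the Bedrooms and Bathrooms columns
A = np.array([[278.0, 190.0],
              [190.0, 134.0]])

A_inv = np.linalg.inv(A)

# A . A^(-1) should be (numerically) the identity matrix
identity_check = A @ A_inv
print(np.allclose(identity_check, np.eye(2)))

# Condition number: the larger it is, the more rounding errors are amplified
print(np.linalg.cond(A))
```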


### Multiplying the inverse (inv_XTX) by the transpose of the matrix (X_transpose)

In [8]:
print(inv_XTX.shape)
print(X_transpose.shape)

inv_XTX_XT = np.dot(inv_XTX, X_transpose)
print(inv_XTX_XT)
print(inv_XTX_XT.shape)

(4, 4)
(4, 20)
[[-4.72316540e-04 -9.14442994e-04 -3.01900854e-05 -6.61515821e-05
   2.42762825e-04 -7.63280494e-04  4.49670189e-04  7.38695191e-04
   4.37869849e-04  3.52313601e-04 -8.59349497e-05 -1.40045338e-03
   4.08058465e-04  2.02650759e-04 -2.43247562e-04  5.67310030e-04
   5.93243510e-04 -4.25660512e-06  1.25183576e-04  3.96273274e-05]
 [-1.44429897e-01 -1.02781818e-01 -1.86077976e-01  3.60554430e-01
   1.28335745e-01  2.34083270e-01  3.81541501e-01  1.08112137e-01
  -1.64471918e-01  6.61008834e-02 -1.02418644e-01 -2.70682358e-01
  -1.75584488e-02  1.30381917e-01 -3.95647943e-02 -1.85496101e-01
  -1.22241964e-01 -1.22823839e-01 -1.43848022e-01  8.67247790e-02]
 [-2.66844928e-02  6.45469764e-02 -1.17915962e-01 -6.80514330e-02
  -1.73923408e-03 -1.58140738e-01 -2.34747701e-01  7.31544833e-02
  -3.78618682e-02  1.54065200e-01 -1.07595209e-01  4.96369294e-02
   1.43744447e-01 -1.79638866e-01 -1.66492811e-02 -1.23647420e-01
   4.76381427e-02  5.33696009e-02 -3.24159510e-02  1.595111

### Find the dot product of inv_XTX_XT and y

In [None]:
print("inv_XTX_XT.shape",inv_XTX_XT.shape)
print("y.shape", y.shape)

coefficients = np.dot(inv_XTX_XT, y)

print(coefficients)

In [None]:

def linear_regression(X, y):
    X_transpose = np.transpose(X)  # Transpose of X
    XTX = np.dot(X_transpose, X)  # Matrix product X^T.X
    # Apply the formula: coefficients = (X^T.X)^(-1).X^T.y
    coefficients = np.dot(np.dot(np.linalg.inv(XTX), X_transpose), y)
    return coefficients

coefficients = linear_regression(X, y)

# Print the values to verify your results
print(coefficients)
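
Explicitly inverting X^T.X follows the formula literally, but it can lose precision when features are strongly correlated. As a hedged alternative (not the method this exercise asks for), the sketch below solves the same normal equations with `np.linalg.solve` and compares the result against `np.linalg.lstsq` on synthetic data. The function name `linear_regression_solve`, the random data, and the seed are assumptions for illustration only.

```python
import numpy as np

def linear_regression_solve(X, y):
    """Solve (X^T.X) beta = X^T.y without forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data (hypothetical): 50 samples, 3 features, small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_beta = np.array([1.5, -2.0, 0.5])
y_demo = X_demo @ true_beta + rng.normal(scale=0.1, size=50)

beta_solve = linear_regression_solve(X_demo, y_demo)

# NumPy's built-in least-squares solver, used here as a cross-check
beta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(beta_solve)
print(beta_lstsq)
```

Both approaches should agree closely on well-behaved data such as this.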

### Step 3: Create the function **estimate_house_price(coefficients, new_features)** that predicts the price of the house

In [None]:
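# Manually estimate the price of a house using hard-coded coefficient values.
# The features below match the training row [1850, 4, 3, 530], whose actual price is $290,000:
    # Area = 1850 sq ft
    # Bedrooms = 4
    # Bathrooms = 3
    # Garage = 530 sq ft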
    
    
home_price = 79.35099648 * 1850 + (-12162.41836142) * 4 + (-1854.29828123) * 3 + 343.14126097 * 530
print(round(home_price,0))

In [None]:
# Calculate the estimated price and return the output
def estimate_house_price(coefficients, new_features):
    estimated_price = np.dot(coefficients, new_features)
    return estimated_price

# Estimate the price of a house with the following features:
    # Area = 1800 sq ft
    # Bedrooms = 3
    # Bathrooms = 2
    # Garage = 400 sq ft

# Create a new array that stores the features listed above
new_features = np.array([1800, 3, 2, 400])

estimated_price = estimate_house_price(coefficients, new_features)

# Print the result
print(f"Estimated Price: ${round(estimated_price)}")
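
Because the prediction is a dot product, the same idea extends to several houses at once by stacking their features as rows. A minimal sketch, assuming the coefficient values hard-coded in the manual calculation earlier in this notebook and a hypothetical helper named `estimate_house_prices`:

```python
import numpy as np

def estimate_house_prices(coefficients, feature_rows):
    """Return one estimated price per row of feature_rows."""
    return np.dot(feature_rows, coefficients)

# Coefficient values copied from the manual calculation earlier in this notebook
coefficients = np.array([79.35099648, -12162.41836142, -1854.29828123, 343.14126097])

# Each row: Area (sq ft), Bedrooms, Bathrooms, Garage (sq ft)
houses = np.array([[1800, 3, 2, 400],
                   [1850, 4, 3, 530]])

prices = estimate_house_prices(coefficients, houses)
for features, price in zip(houses, prices):
    print(f"{features.tolist()} -> ${round(float(price))}")
```

Placing the feature matrix first in `np.dot` gives a result with one entry per house, which keeps the helper usable for a single row or for many rows.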