# Real Estate estimator

In the following challenge, we want to estimate the **price** of a flat based on data from other flats.
The [Numpy documentation](https://docs.scipy.org/doc/numpy/reference/) will be your friend throughout this exercise: the Numpy library is your only authorized import, and Pandas is forbidden ⚠️. You can also find help in this [Numpy cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf).

In [75]:
# Load the Numpy library
# YOUR CODE HERE

## The ideal estimator

Given these 4 flats, we want to find the relation between the `Price` and 3 criteria: `Size`, `Bedrooms` and `Floors`. These criteria are the **features** of the estimator.

|Flats |Size (ft²)|Bedrooms|Floors|Price (1000 $)|
|------|-------------|--------|------|------------|
|flat1 |2104|5|1|460|
|flat2 |1416|3|2|232|
|flat3 |1534|3|2|315|
|flat4 |852|2|1|178|

A first approach is to find a linear relation between the `Price` and the features by solving this system of equations:

$$\begin{cases}
    460 = \theta_0 + 2104\theta_1 + 5\theta_2 + 1\theta_3 \\
    232 = \theta_0 + 1416\theta_1 + 3\theta_2 + 2\theta_3 \\
    315 = \theta_0 + 1534\theta_1 + 3\theta_2 + 2\theta_3 \\
    178 = \theta_0 + 852\theta_1 + 2\theta_2 + 1\theta_3
\end{cases}$$

This system can be written as a matrix equation:

$$Y = X\theta$$

where $Y$ is the vector of `Price`, $X$ is the matrix of features and $\theta$ (theta) is the vector of coefficients to be found.
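To see why the matrix form is equivalent to the system, here is a minimal sketch that imports Numpy as `np`, writes $X$ out by hand (including the leading column of ones for $\theta_0$, explained further down) and multiplies it by an arbitrary $\theta$. The values in `theta_demo` are made up for illustration and are not the solution.

```python
import numpy as np

# Matrix X written out by hand: a column of ones (for theta_0), then Size, Bedrooms, Floors
X = np.array([[1, 2104, 5, 1],
              [1, 1416, 3, 2],
              [1, 1534, 3, 2],
              [1,  852, 2, 1]])

# Hypothetical coefficients, chosen only to illustrate the product (not the solution)
theta_demo = np.array([[10], [0.5], [-3], [2]])

# Matrix product: one right-hand side per equation, stacked in a 4 by 1 vector
rhs = X @ theta_demo

# Row 0 of the product is theta_0 + 2104*theta_1 + 5*theta_2 + 1*theta_3
first_equation = 10 + 2104 * 0.5 + 5 * (-3) + 1 * 2
print(rhs[0, 0], first_equation)  # 1049.0 1049.0
```

Each row of `X @ theta_demo` is exactly the right-hand side of the corresponding equation in the system.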

### Define the matrix `x` of features

_Hint: `x` should be a 4 by 3 `numpy.ndarray`_

In [76]:
x = None # YOUR CODE HERE
x

array([[2104,    5,    1],
       [1416,    3,    2],
       [1534,    3,    2],
       [ 852,    2,    1]])

In [77]:
x.shape

(4, 3)

In [78]:
type(x)

numpy.ndarray
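A nested Python list passed to `np.array` becomes a 2D array, one inner list per row. The sketch below uses a small made-up array rather than the flats data, so it illustrates the idea without filling in the cell above:

```python
import numpy as np

# Hypothetical 2 by 3 array: each inner list is one row
a = np.array([[10, 20, 30],
              [40, 50, 60]])

print(a.shape)       # (2, 3): 2 rows, 3 columns
print(a.ndim)        # 2
print(a.dtype.kind)  # i (signed integer), because every input is an int
print(type(a))       # <class 'numpy.ndarray'>
```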

### Define the vector `Y` of `Price`s

In [79]:
Y = None # Define Y here
Y

array([[460, 232, 315, 178]])

In [80]:
# Make Y a 4 by 1 vector with the right Numpy method
Y = None # YOUR CODE HERE
Y

array([[460],
       [232],
       [315],
       [178]])
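Turning a row into a column is a reshaping operation. A minimal sketch on a made-up 1 by 3 row vector:

```python
import numpy as np

# Hypothetical 1 by 3 row vector (same layout as the first Y above, but shorter)
row = np.array([[7, 8, 9]])

# reshape(-1, 1): one column, and Numpy infers the number of rows from the -1
col = row.reshape(-1, 1)

# For a 2D row vector, transposing gives the same column
col_t = row.T

print(row.shape, col.shape)        # (1, 3) (3, 1)
print(np.array_equal(col, col_t))  # True
```

`reshape(-1, 1)` also works when the starting array is a flat 1D array, whereas `.T` leaves a 1D array unchanged.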

### Create the matrix `X` representing the linear system of equations

As you probably noticed, the linear system of equations includes a $\theta_0$ coefficient which appears in all 4 equations. This intercept term makes the relation between the `Price` and the features [affine](https://math.stackexchange.com/questions/275310/what-is-the-difference-between-linear-and-affine-function) rather than strictly linear. As a result, we need to add one more _feature_ $x_0$, always equal to $1$, to the matrix $x$, so that $\theta_0$ is multiplied by $1$ in every equation.

In [81]:
# Define x0 as a 4 by 1 vector filled with ones, using the right Numpy method
x0 = None # YOUR CODE HERE
x0

array([[1.],
       [1.],
       [1.],
       [1.]])
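`np.ones` takes the shape as a tuple and returns `float64` values by default, which is why `x0` prints as `1.`. A minimal sketch with a made-up size of 3:

```python
import numpy as np

# A 3 by 1 column of ones; the shape is passed as a tuple
ones = np.ones((3, 1))
print(ones.dtype)           # float64, the default dtype of np.ones

# Integers can be requested explicitly if needed
ones_int = np.ones((3, 1), dtype=int)
print(ones_int.dtype.kind)  # i
```

Combining a float column with the integer matrix `x` upcasts the result to floats, which explains the scientific notation in the printed `X`.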

The complete matrix $X$ should look like:

$$\begin{bmatrix}
    1 & 2104 & 5 & 1 \\
    1 & 1416 & 3 & 2 \\
    1 & 1534 & 3 & 2 \\
    1 & 852 & 2 & 1
\end{bmatrix}$$

In [82]:
# Use x0 and x to define the matrix X with the right Numpy method
X = None # YOUR CODE HERE
X

array([[1.000e+00, 2.104e+03, 5.000e+00, 1.000e+00],
       [1.000e+00, 1.416e+03, 3.000e+00, 2.000e+00],
       [1.000e+00, 1.534e+03, 3.000e+00, 2.000e+00],
       [1.000e+00, 8.520e+02, 2.000e+00, 1.000e+00]])
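Putting the column of ones first is what lets the matrix product include the intercept $\theta_0$. A minimal sketch with a made-up 2 by 2 feature array; `np.hstack` and `np.concatenate(..., axis=1)` are equivalent here, and `theta_demo` is hypothetical:

```python
import numpy as np

# Hypothetical features: 2 samples, 2 features each
features = np.array([[3, 4],
                     [5, 6]])
bias = np.ones((2, 1))  # one column of ones, one row per sample

# Glue the arrays side by side; the ones column comes first
design = np.hstack((bias, features))
print(design)
# [[1. 3. 4.]
#  [1. 5. 6.]]

# np.concatenate along axis 1 gives the same result
print(np.array_equal(design, np.concatenate((bias, features), axis=1)))  # True

# The ones column multiplies theta_0, so theta_0 is added to every row
theta_demo = np.array([[100.0], [1.0], [1.0]])
print(design @ theta_demo)  # column vector holding 107 and 111
```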

### Find the solution of the system

Now it is time to find the vector of coefficients $\theta$!

Since $X$ is a square 4 by 4 matrix (4 equations, 4 unknowns), if it is invertible the solution of the equation is:

$$Y = X\theta \iff X\theta = Y \iff X^{-1}X\theta = X^{-1}Y \iff \theta = X^{-1}Y$$

where $X^{-1}$ is the inverse of $X$.
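The sketch below applies this formula to a made-up 2 by 2 system. It also shows `np.linalg.solve`, a different route (not the one described above) that solves $X\theta = Y$ directly without forming $X^{-1}$ and is generally preferred numerically, and what happens when the matrix is not invertible:

```python
import numpy as np

# Hypothetical system: 2a + b = 3 and a + 3b = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([[3.0],
              [5.0]])

# Route 1: invert A, then multiply (the approach described above)
sol_inv = np.linalg.inv(A) @ b

# Route 2: solve A @ sol = b directly, without computing the inverse
sol_solve = np.linalg.solve(A, b)

print(sol_inv.ravel())                  # approximately [0.8 1.4]
print(np.allclose(sol_inv, sol_solve))  # True

# A singular (non-invertible) matrix raises an error instead
try:
    np.linalg.inv(np.array([[1.0, 2.0], [2.0, 4.0]]))
except np.linalg.LinAlgError as err:
    print("not invertible:", err)
```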

In [83]:
# Compute the inverse of the matrix X with the right Numpy method
Xinv = None # YOUR CODE HERE
Xinv

You can check that the inversion worked by testing:

$$X^{-1}X = I_4$$
where $I_4$ is the 4 by 4 identity matrix.

In [84]:
# Define I4 using the right Numpy method
I4 = None # YOUR CODE HERE
I4

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

Now compute $X^{-1}X$:

In [85]:
XinvX = None # YOUR CODE HERE
XinvX

array([[ 1.00000000e+00, -7.06990022e-13, -1.77635684e-15,
        -8.88178420e-16],
       [-7.09407683e-19,  1.00000000e+00, -3.15353884e-18,
        -7.09407683e-19],
       [ 4.44089210e-16,  1.21858079e-12,  1.00000000e+00,
         0.00000000e+00],
       [ 0.00000000e+00,  1.26121336e-13,  2.22044605e-16,
         1.00000000e+00]])

Does it look like $I_4$?

If not, you probably used the `*` operator, which multiplies $X^{-1}$ and $X$ element by element. Here we want the matrix product, so find the right Numpy method to compute it.

If so, you may have noticed that the product does not contain exact $0$s and $1$s: floating-point rounding leaves tiny residues such as `1e-12` or `1e-16`. To check your result up to a tolerance, use the `numpy.allclose` function:

In [86]:
# YOUR CODE HERE
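A made-up 2 by 2 matrix is enough to see both pitfalls: `*` versus the matrix product, and exact versus tolerant comparison. `np.eye(n)` builds the $n$ by $n$ identity matrix used as the reference:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)

# `*` multiplies element by element: this is NOT the matrix product
elementwise = A * A_inv
# `@` (like np.dot or np.matmul for 2D arrays) computes the matrix product
product = A_inv @ A

I2 = np.eye(2)  # 2 by 2 identity matrix
print(np.allclose(elementwise, I2))  # False
print(np.allclose(product, I2))      # True

# Exact comparison of floats is fragile; isclose/allclose use a tolerance
print(0.1 + 0.2 == 0.3)              # False
print(np.isclose(0.1 + 0.2, 0.3))    # True
```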

You are finally able to find $\theta = X^{-1}Y$:

In [89]:
theta = None # YOUR CODE HERE
theta

array([[ 120.97175141],
       [   0.70338983],
       [-199.5480226 ],
       [-143.16384181]])
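A quick sanity check: if $\theta$ solves the system, $X\theta$ must give back the four prices. The sketch below copies the rounded coefficients printed above, so it compares with a small tolerance rather than exact equality:

```python
import numpy as np

X = np.array([[1, 2104, 5, 1],
              [1, 1416, 3, 2],
              [1, 1534, 3, 2],
              [1,  852, 2, 1]], dtype=float)
Y = np.array([[460], [232], [315], [178]], dtype=float)

# Coefficients copied from the output above (rounded to 8 decimals)
theta = np.array([[120.97175141],
                  [0.70338983],
                  [-199.5480226],
                  [-143.16384181]])

# Plugging theta back into the system should give back the prices
print(np.allclose(X @ theta, Y, atol=1e-3))  # True
```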

What do you make of these coefficients? How does the `Price` change as the `Size` increases? And as the number of `Bedrooms` or `Floors` increases?

You can plot the `Price` against one of the features to visualize this relation.
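Because the model is affine, each coefficient is the change in estimated `Price` (in thousands of $) for a one-unit increase of its feature, the other features being held fixed. A minimal sketch reusing the rounded coefficients printed above; the helper `estimate` and the base flat are hypothetical:

```python
import numpy as np

theta = np.array([120.97175141, 0.70338983, -199.5480226, -143.16384181])

def estimate(size, bedrooms, floors):
    """Estimated price (thousands of $) for one flat, with x0 = 1."""
    return np.array([1.0, size, bedrooms, floors]) @ theta

base = estimate(1500, 3, 1)  # hypothetical flat
print(estimate(1600, 3, 1) - base)  # +100 ft^2  -> about +70.34
print(estimate(1500, 4, 1) - base)  # +1 bedroom -> about -199.55
print(estimate(1500, 3, 2) - base)  # +1 floor   -> about -143.16
```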

### Estimation

Now that you have solved the system and found $\theta$, you can estimate the `Price` (in thousands of $) of a 5th flat with these characteristics:

- `Size`: 3000 $ft^2$
- `Bedrooms`: 5 
- `Floors`: 1

with the following formula:

$$Y_{flat5} = X_{flat5}\theta$$

In [88]:
X5 = None # Define X5
Y5 = None # Compute Y5
Y5 # You should find a Price of about 1,090,000 $

array([[1090.23728814]])
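The same formula works for several new flats at once: stack them as rows, prepend the column of ones, and multiply by $\theta$. A minimal sketch reusing the rounded coefficients printed above; the helper name `predict` and the second flat are hypothetical:

```python
import numpy as np

theta = np.array([[120.97175141],
                  [0.70338983],
                  [-199.5480226],
                  [-143.16384181]])

def predict(features, theta):
    """Estimate prices (thousands of $) for an (n, 3) array of Size, Bedrooms, Floors."""
    features = np.asarray(features, dtype=float)
    ones = np.ones((features.shape[0], 1))      # x0 = 1 for every flat
    return np.hstack((ones, features)) @ theta  # (n, 4) @ (4, 1) -> (n, 1)

new_flats = [[3000, 5, 1],   # flat5 from the text
             [1200, 2, 1]]   # hypothetical extra flat
print(predict(new_flats, theta))  # first row is about 1090.24
```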