# UGL - Multiple Variable Model Representation



In [2]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [4]:
# Take two data points - TODO: come up with problem statement/explanation
X_orig = np.array([[10,5], [20, 2]])
y_orig = np.array([1,2])

In [5]:
# print the length of X_orig
print(len(X_orig))

# print the length of y_orig
print(len(y_orig))

2
2


In [6]:
# print the shape of X_orig
print(X_orig.shape)

# print the shape of y_orig
print(y_orig.shape)

(2, 2)
(2,)


## Hypothesis

#### Model prediction
The model's prediction is also called the "hypothesis", $h_{w}(x)$.  
- The prediction is given by the linear model:

$$ h_{w}(x) =  w_0 + w_1x_1 \tag{2}$$

This is the equation for a line, with an intercept $w_0$ and a slope $w_1$.

#### Vector notation

For convenience of notation, you'll define $\vec{x}$ as a vector containing two values:

$$ \vec{x} = \begin{pmatrix}
        x_0 & x_1 
      \end{pmatrix}
$$

- You'll set $x_0 = 1$. 
- $x_1$ will be the city population from your dataset `X_orig`. 


Similarly, you are defining $\vec{w}$ as a vector containing two values:

$$ \vec{w} = \begin{pmatrix}
        w_0 \\ 
        w_1 
      \end{pmatrix}
$$


The hypothesis $h_{\vec{w}}(\vec{x})$ can now be written as

$$ h_{\vec{w}}(\vec{x}) = \vec{x} \times \vec{w}  \tag{3}
$$ 

$$
h_{\vec{w}}(\vec{x}) = 
\begin{pmatrix} x_0 & x_1 \end{pmatrix} \times 
\begin{pmatrix} w_0 \\ w_1 \end{pmatrix} 
$$
$$
h_{\vec{w}}(\vec{x}) = x_0 \times w_0 + x_1 \times w_1 
$$
Here is a small example: 



In [7]:
# Here is a small concrete example of x and w as vectors

tmp_x = np.array([1,2])
print(f"The input x is:")
print(tmp_x)
print()

tmp_w = np.array([[3],[4]])
print(f"The parameter w is")
print(tmp_w)
print()

tmp_h = np.dot(tmp_x,tmp_w)
print(f"The model's prediction is {tmp_h}")

The input x is:
[1 2]

The parameter w is
[[3]
 [4]]

The model's prediction is [11]
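
As a sanity check, you can compare the dot product against the expanded sum $x_0 \times w_0 + x_1 \times w_1$. This is a minimal sketch that reuses the values from the cell above; the `check_` names are hypothetical and only used here.

```python
import numpy as np

check_x = np.array([1, 2])        # x_0 = 1, x_1 = 2
check_w = np.array([[3], [4]])    # w_0 = 3, w_1 = 4, as a column vector

# Expanded form of the hypothesis: x_0 * w_0 + x_1 * w_1
check_h_expanded = check_x[0] * check_w[0, 0] + check_x[1] * check_w[1, 0]

# Vectorized form: dot product of a (2,) vector with a (2, 1) column
check_h_dot = np.dot(check_x, check_w)

print(check_h_expanded)    # 11
print(check_h_dot.shape)   # (1,)
```

Because `check_w` is a 2-D column, `np.dot` returns an array with one element rather than a plain scalar, which is why the prediction above prints as `[11]`.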


#### Matrix X

To allow you to process multiple examples (multiple cities) at a time, you can stack multiple examples (cities) as rows of a matrix $\mathbf{X}$.

For example, let's say New York City is $\vec{x^{(0)}}$ and San Francisco is $\vec{x^{(1)}}$.  Then stack New York City in row 1 and San Francisco in row 2 of matrix $\mathbf{X}$:

$$\mathbf{X} = \begin{pmatrix}
 \vec{x^{(0)}} \\ 
 \vec{x^{(1)}}
\end{pmatrix}
$$

Recall that each vector $\vec{x^{(i)}}$ consists of $x_0$ and $x_1$, so $\mathbf{X}$ looks like this:
$$
\mathbf{X} = \begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 \\ 
 x^{(1)}_0 & x^{(1)}_1
\end{pmatrix}
$$

Recall that you're fixing $x_0^{(i)}$ for all cities to be `1`, so you can also write $\mathbf{X}$ as:
$$\mathbf{X} =
\begin{pmatrix}
 1 & x^{(0)}_1 \\ 
 1 & x^{(1)}_1
\end{pmatrix}
$$

In [8]:
# Here is a concrete example

tmp_NYC_population = 9
tmp_SF_population = 2
tmp_x0 = 1 # x0 for all cities

tmp_X = np.array([[tmp_x0, tmp_NYC_population],
                  [tmp_x0, tmp_SF_population]
                 ])

print(f"New York City has population {tmp_NYC_population}")
print(f"San Francisco has population {tmp_SF_population}")
print(f"An example of matrix X with city populations for two cities is:\n")
print(f"{tmp_X}")

New York City has population 9
San Francisco has population 2
An example of matrix X with city populations for two cities is:

[[1 9]
 [1 2]]


#### Matrix X in general
In general, when you have $m$ training examples (in this dataset $m$ is the number of cities), and there are $n$ features (here, just 1 feature, which is city population):
- $\mathbf{X}$ is a matrix with dimensions ($m$, $n+1$) (m rows, n+1 columns)
  - Each row is a city and its input features.

$$\mathbf{X} = \begin{pmatrix}
 \vec{x^{(0)}} \\ 
 \vec{x^{(1)}} \\
 \cdots \\
 \vec{x^{(m-1)}}
\end{pmatrix}
= \begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n} \\ 
 x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n} \\
 \cdots \\
 x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n} 
\end{pmatrix}
$$

- In this dataset, $n=1$ (city population) and $m=97$ (97 cities in the dataset)

$$\mathbf{X} = \begin{pmatrix}
 \vec{x^{(0)}} \\ 
 \vec{x^{(1)}} \\
 \cdots \\
 \vec{x^{(m-1)}}
\end{pmatrix}
= \begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1  \\ 
 x^{(1)}_0 & x^{(1)}_1 \\
 \cdots \\
 x^{(97-1)}_0 & x^{(97-1)}_1 
\end{pmatrix}
$$

- $\vec{w}$ is a vector with dimensions ($n+1$, $1$) (n+1 rows, 1 column)
  - Each row holds the weight for one feature.

$$\vec{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\cdots\\
w_{n}
\end{pmatrix}
$$
- In this dataset, there is just the intercept and the city population feature:
$$\vec{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\end{pmatrix}
$$
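
With these shapes, one matrix product can give a prediction for every example at once. The sketch below assumes the predictions are computed as $\mathbf{X} \times \vec{w}$, extending equation (3) row by row. The `sketch_` names and the weight values are hypothetical.

```python
import numpy as np

# Hypothetical values: two cities, one feature plus the x_0 = 1 column
sketch_X = np.array([[1, 9],
                     [1, 2]])      # shape (m, n+1) = (2, 2)
sketch_w = np.array([[3],
                     [4]])         # shape (n+1, 1) = (2, 1)

# (m, n+1) times (n+1, 1) gives (m, 1): one prediction per row of X
sketch_h = sketch_X @ sketch_w

print(sketch_h.shape)   # (2, 1)
print(sketch_h)
```

Each row of `sketch_h` is the hypothesis for the matching row of `sketch_X`, for example $1 \times 3 + 2 \times 4 = 11$ for the second city.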

#### Processing data: Add the column for the intercept

To calculate the cost and implement gradient descent, you will first want to add another column to your data (as $x_0$) to accommodate the $w_0$ intercept term. 
- This allows you to treat the intercept $w_0$ as simply the weight of another 'feature': feature 0, whose value $x_0$ is always 1.
- The city population is then $x_1$, or feature 1.

So if your original $\mathbf{X_{orig}}$ looks like this:

$$ 
\mathbf{X_{orig}} = 
\begin{pmatrix}
 x^{(0)}_1 \\ 
 x^{(1)}_1 \\
 \cdots \\
 x^{(97-1)}_1 
\end{pmatrix}
$$

You will want to combine it with a vector of ones:
$$
\vec{1} = 
\begin{pmatrix}
 x^{(0)}_0 \\ 
 x^{(1)}_0  \\
 \cdots \\
 x^{(m-1)}_0
\end{pmatrix}
= 
\begin{pmatrix}
 1 \\ 
 1 \\
 \cdots \\
 1
\end{pmatrix}
$$

So it will look like this:
$$
\mathbf{X} = \begin{pmatrix} \vec{1} & \mathbf{X_{orig}}\end{pmatrix}
=
\begin{pmatrix}
 1 & x^{(0)}_1 \\ 
 1 & x^{(1)}_1 \\
 \cdots \\
 1 & x^{(97-1)}_1 
\end{pmatrix}
$$

Here is a small example of what you'll want to do.

In [11]:
tmp_NYC_population = 9
tmp_SF_population = 2
tmp_x0 = 1 # x0 for all cities
tmp_num_of_cities = 2

tmp_X_orig = np.array([[tmp_NYC_population],
                       [tmp_SF_population]
                      ])

print("Matrix of city populations")
print(tmp_X_orig)
print()

# Use np.ones to create a column vector of ones
tmp_ones = np.ones((tmp_num_of_cities,1))
print(f"Column vector of ones ({tmp_num_of_cities} rows and 1 column)")
print(tmp_ones)
print()

tmp_X = np.concatenate([tmp_ones, tmp_X_orig], axis=1)
print("Vector of ones stacked to the left of tmp_X_orig")
print(tmp_X)

print(f"tmp_X has shape: {tmp_X.shape}")

Matrix of city populations
[[9]
 [2]]

Column vector of ones (2 rows and 1 column)
[[1.]
 [1.]]

Vector of ones stacked to the left of tmp_X_orig
[[1. 9.]
 [1. 2.]]
tmp_X has shape: (2, 2)


In this small example, the $\mathbf{X}$ is now:
$$\mathbf{X} = 
\begin{pmatrix}
1 & 9 \\
1 & 2
\end{pmatrix}
$$

Notice that when calling `np.concatenate`, you're setting `axis=1`.  
- This puts the vector of ones on the left and `tmp_X_orig` on the right.
- If you set `axis=0`, then `np.concatenate` would place the vector of ones ON TOP of `tmp_X_orig`.

In [13]:
print("Calling numpy.concatenate, setting axis=0")
tmp_X_version_2 = np.concatenate([tmp_ones, tmp_X_orig], axis=0)
print("Vector of ones stacked ON TOP of tmp_X_orig")
print(tmp_X_version_2)

Calling numpy.concatenate, setting axis=0
Vector of ones stacked ON TOP of tmp_X_orig
[[1.]
 [1.]
 [9.]
 [2.]]


So if you set axis=0, $\mathbf{X}$ looks like this:
$$\mathbf{X} = 
\begin{pmatrix}
1 \\ 1 \\
9 \\ 2
\end{pmatrix}
$$
This is **NOT** what you want.

You'll want to set axis=1 so that you get a column vector of ones on the left and a column vector of the city populations on the right:

$$\mathbf{X} = 
\begin{pmatrix}
1 & x^{(0)}_1 \\
1 & x^{(1)}_1
\end{pmatrix}
$$
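
A quick way to catch the wrong axis is to compare the resulting shapes. This is a minimal sketch using the same two city populations as above; the `demo_` names are hypothetical.

```python
import numpy as np

demo_X_orig = np.array([[9], [2]])                 # shape (2, 1)
demo_ones = np.ones((demo_X_orig.shape[0], 1))     # shape (2, 1)

# axis=1 joins along columns: ones on the left, populations on the right
side_by_side = np.concatenate([demo_ones, demo_X_orig], axis=1)

# axis=0 joins along rows: ones on top, populations underneath
stacked = np.concatenate([demo_ones, demo_X_orig], axis=0)

print(side_by_side.shape)   # (2, 2)
print(stacked.shape)        # (4, 1)
```

Only the `axis=1` result keeps one row per city, which is the ($m$, $n+1$) shape described earlier.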

In [None]:
# Add a column of ones to X_orig to account for the w_0 term
m = len(X_orig)
col_vec_ones = np.ones((m, 1))
X_train = np.concatenate([col_vec_ones, X_orig], axis=1)
# Keep y_orig the same
y_train = y_orig

print('The shape of X_train is: ' + str(X_train.shape))
print('The shape of y_train is: ' + str(y_train.shape))
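
The same step carries over to more than one feature. The sketch below uses the two-feature values defined at the top of the notebook, so the matrix gains a third column and $\vec{w}$ needs three entries. The `demo_` names and the weight values are hypothetical, chosen only to show the shapes.

```python
import numpy as np

# Same values as X_orig at the top of the notebook: 2 examples, 2 features
demo_X = np.array([[10, 5], [20, 2]])
demo_m = demo_X.shape[0]

# Prepend the x_0 = 1 column: shape goes from (2, 2) to (2, 3)
demo_X_train = np.concatenate([np.ones((demo_m, 1)), demo_X], axis=1)

# Hypothetical weights w_0, w_1, w_2, for illustration only
demo_w = np.array([[0.5], [0.1], [-0.2]])

# One prediction per example: (2, 3) times (3, 1) gives (2, 1)
demo_h = demo_X_train @ demo_w

print(demo_X_train.shape)   # (2, 3)
print(demo_h.shape)         # (2, 1)
```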