## Multiple Variable Linear Regression

### 1. Problem Understanding

-   What is the problem you are trying to solve?

    Extending the linear model and its training code so they support multiple input variables (features) instead of just one.

-   What kind of data are you working with?

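    Numerical data, assumed to be roughly Gaussian-distributed with a linear relationship between the features and the target. The inputs form a matrix (one row per training example, one column per feature) instead of a 1-D array.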

-   What are the goals and objectives of the project?

    To implement gradient descent for a linear model with multiple variables.
    
-   What is the expected output of the machine learning algorithm?

    The optimized parameters: a weight vector $\mathbf{w}$ (one weight per feature) and a scalar bias $b$.

-   What are the constraints and limitations of the problem?

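    None identified so far. One thing to watch: the features in the sample data span very different scales (for example 2104 versus 1), which typically forces a small learning rate unless the features are rescaled.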

### 2. Equation


The model's prediction with multiple variables is given by the linear model:

$$ f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 + \dots + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
where $\cdot$ denotes the vector dot product.
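
As a quick check that equations (1) and (2) describe the same computation, here is a minimal sketch that evaluates both forms for a single example. The variable names `x_vec`, `f_sum` and `f_dot` are assumptions made for this illustration; the feature values and parameters are copied from the code implementation later in this section.

```python
import numpy as np

# One training example with four features (first row of the sample data).
x_vec = np.array([2104, 5, 1, 45])
# Parameter values copied from the code implementation in this section.
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

# Equation (1): explicit term-by-term sum over the n features.
f_sum = sum(w[j] * x_vec[j] for j in range(x_vec.shape[0])) + b

# Equation (2): the same prediction as a single dot product.
f_dot = np.dot(w, x_vec) + b

print(f_sum, f_dot)  # both are approximately 460, the target for this example
```

The two results agree up to floating-point rounding; the dot-product form is shorter and usually faster for large `n`.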



The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 
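
To make equation (3) concrete, the sketch below computes the cost twice on the sample data from this section: once with an explicit loop over the examples and once with array operations. The names `X`, `cost_loop` and `cost_vec` are assumptions introduced for this illustration.

```python
import numpy as np

# Sample data and parameters copied from the code implementation in this section.
X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b = 785.1811367994083

m = X.shape[0]  # number of training examples

# Loop form of equation (3): sum the squared error example by example.
cost_loop = 0.0
for i in range(m):
    f_wb_i = np.dot(w, X[i]) + b          # equation (4) for example i
    cost_loop += (f_wb_i - y[i]) ** 2
cost_loop /= 2 * m

# Vectorized form: all m predictions at once, then square, sum and scale.
cost_vec = np.sum((X @ w + b - y) ** 2) / (2 * m)

print(cost_loop, cost_vec)
```

Because these parameter values already fit the three examples almost exactly, both costs come out very close to zero.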

Gradient descent for multiple variables:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously, and

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$
* $m$ is the number of training examples in the data set
* $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value
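
Update rule (5) can be sketched as a short loop around equations (6) and (7). This is a minimal illustration under stated assumptions, not a definitive implementation: the helper `gradient_descent`, the learning rate `alpha=5.0e-7`, the zero initial parameters and the fixed `num_iters=1000` are all choices made here, and a fixed iteration count stands in for "repeat until convergence".

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Equation (3): squared-error cost."""
    m = X.shape[0]
    return np.sum((X @ w + b - y) ** 2) / (2 * m)

def compute_gradient(X, y, w, b):
    """Equations (6) and (7), written with matrix products."""
    m = X.shape[0]
    err = X @ w + b - y        # prediction errors, shape (m,)
    dj_dw = X.T @ err / m      # one partial derivative per feature, shape (n,)
    dj_db = np.sum(err) / m
    return dj_db, dj_dw

def gradient_descent(X, y, w_init, b_init, alpha, num_iters):
    """Hypothetical helper: applies update (5) a fixed number of times."""
    w = w_init.astype(float).copy()
    b = float(b_init)
    cost_history = [compute_cost(X, y, w, b)]
    for _ in range(num_iters):
        # Both gradients are computed before either parameter changes,
        # which is what makes the update simultaneous.
        dj_db, dj_dw = compute_gradient(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        cost_history.append(compute_cost(X, y, w, b))
    return w, b, cost_history

# Sample data from the code implementation in this section.
X = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])

w_final, b_final, cost_history = gradient_descent(
    X, y, np.zeros(X.shape[1]), 0.0, alpha=5.0e-7, num_iters=1000)
print(f"initial cost: {cost_history[0]}, final cost: {cost_history[-1]}")
```

The learning rate is deliberately tiny because the first feature is in the thousands; with unscaled features like these, a much larger `alpha` makes the updates diverge.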

### 3. Code Implementation

In [134]:
import numpy as np

x = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])


b = 785.1811367994083
w = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])


def compute_cost(x,y,w,b):
    m = x.shape[0]
    cost = 0.0
    
    f_wb = np.dot(x, w) + b
    cost = (f_wb - y) ** 2
    total_cost = (1/(2*m)) * np.sum(cost)

    return total_cost


def compute_gradient(x,y,w,b):
    #f_wb = np.dot(x,w) + b

    m,n = x.shape
    print("m=", m)
    print("n", n)

    dj_dw = np.zeros(n)
    dj_db = 0.


    for i in range(m):
        error = (np.dot(x[i],w) + b) - y[i]
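        # BUG: plain assignment overwrites dj_dw on every pass, so only the last
        # example's contribution survives. It should accumulate, as the
        # commented-out loop below (and the next cell) does.
        dj_dw = error * x[i]
        print("x[i]=",error * x[i])  # the label says x[i], but this prints error * x[i]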
        # for j in range(n):
        #     dj_dw[j] = dj_dw[j] + error * x[i,j]
        dj_db = dj_db + error
    
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    # Vectorized alternative (needs f_wb = np.dot(x,w) + b from the top of the function):
    # dj_dw = (1/m) * np.dot(x.T, f_wb - y)
    # dj_db = (1/m) * np.sum(f_wb - y)

    return dj_db, dj_dw
    

In [135]:
#Compute and display gradient 
tmp_dj_db, tmp_dj_dw = compute_gradient(x, y, w, b)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

m= 3
n 4
x[i]= [-5.00876505e-03 -1.19029588e-05 -2.38059175e-06 -1.07126629e-04]
x[i]= [-2.30891812e-03 -4.89177569e-06 -3.26118379e-06 -6.52236758e-05]
x[i]= [-8.61024264e-04 -2.02118372e-06 -1.01059186e-06 -3.53707151e-05]
dj_db at initial w,b: -1.673925169143331e-06
dj_dw at initial w,b: 
 [-2.87008088e-04 -6.73727906e-07 -3.36863953e-07 -1.17902384e-05]


In [136]:
import numpy as np

x = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])


b = 785.1811367994083
w = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])


def compute_cost(x,y,w,b):
    m = x.shape[0]
    cost = 0.0
    
    f_wb = np.dot(x, w) + b
    cost = (f_wb - y) ** 2
    total_cost = (1/(2*m)) * np.sum(cost)

    return total_cost


def compute_gradient(x,y,w,b):
    #f_wb = np.dot(x,w) + b

    m,n = x.shape
    print("m=", m)
    print("n", n)

    dj_dw = np.zeros(n)
    dj_db = 0.


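    for i in range(m):
        # Prediction error for example i: f_wb(x_i) - y_i
        error = (np.dot(x[i],w) + b) - y[i]
        print("error=", error)
        print("x[i]=", x[i])
        for j in range(n):
            # Accumulate example i's contribution to dJ/dw_j (equation 6)
            dj_dw[j] = dj_dw[j] + error * x[i,j]
            print(dj_dw[j])  # running sum, not yet divided by m
        # Accumulate example i's contribution to dJ/db (equation 7)
        dj_db = dj_db + error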
    
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    # Vectorized alternative (needs f_wb = np.dot(x,w) + b from the top of the function):
    # dj_dw = (1/m) * np.dot(x.T, f_wb - y)
    # dj_db = (1/m) * np.sum(f_wb - y)

    return dj_db, dj_dw
    

In [137]:
#Compute and display gradient 
tmp_dj_db, tmp_dj_dw = compute_gradient(x, y, w, b)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

m= 3
n 4
error= -2.3805917521713127e-06
x[i]= [2104    5    1   45]
-0.005008765046568442
-1.1902958760856563e-05
-2.3805917521713127e-06
-0.00010712662884770907
error= -1.6305918961734278e-06
x[i]= [1416    3    2   40]
-0.007317683171550016
-1.6794734449376847e-05
-5.641775544518168e-06
-0.00017235030469464618
error= -1.0105918590852525e-06
x[i]= [852   2   1  35]
-0.00817870743549065
-1.8815918167547352e-05
-6.652367403603421e-06
-0.00020772101976263002
dj_db at initial w,b: -1.673925169143331e-06
dj_dw at initial w,b: 
 [-2.72623581e-03 -6.27197272e-06 -2.21745580e-06 -6.92403399e-05]
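
The explicit loops over `i` and `j` can also be collapsed into matrix products. The sketch below is one possible vectorized form of equations (6) and (7); the function name `compute_gradient_vectorized` is an assumption introduced here for illustration, and the data and parameters are copied from the cells above.

```python
import numpy as np

x = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y = np.array([460, 232, 178])
b = 785.1811367994083
w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])

def compute_gradient_vectorized(x, y, w, b):
    """Hypothetical vectorized counterpart of compute_gradient."""
    m = x.shape[0]
    err = np.dot(x, w) + b - y      # prediction errors, shape (m,)
    dj_dw = np.dot(x.T, err) / m    # equation (6) for every j at once, shape (n,)
    dj_db = np.sum(err) / m         # equation (7)
    return dj_db, dj_dw

dj_db_vec, dj_dw_vec = compute_gradient_vectorized(x, y, w, b)
print(f'dj_db (vectorized): {dj_db_vec}')
print(f'dj_dw (vectorized): \n {dj_dw_vec}')
```

The result should agree with the loop output above up to floating-point rounding. Because the errors here are tiny differences between large numbers, the last few printed digits may differ between the two versions.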

