<a name="toc_15456_5"></a>
# 5 Gradient Descent With Multiple Variables
Gradient descent for multiple variables:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

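where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously (every partial derivative is evaluated at the current $\mathbf{w}, b$ before any parameter changes), and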

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6}  \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$
* $m$ is the number of training examples in the data set
* $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$ is the model's prediction for example $i$, and $y^{(i)}$ is its target value
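Equations (6) and (7) can also be written in matrix form. Stack the $m$ examples as the rows of an $m \times n$ matrix $\mathbf{X}$ (the layout used for `X_train` below), write $e^{(i)} = f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}$, and let $\mathbf{1}$ be the length-$m$ vector of ones. The $j$-th entry of $\mathbf{X}^{\top}\mathbf{e}$ is $\sum_i x_j^{(i)} e^{(i)}$, which is exactly the sum in (6):

$$
\begin{align*}
\mathbf{e} &= \mathbf{X}\mathbf{w} + b\,\mathbf{1} - \mathbf{y} \\
\frac{\partial J(\mathbf{w},b)}{\partial \mathbf{w}} &= \frac{1}{m}\,\mathbf{X}^{\top}\mathbf{e} \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m}\,\mathbf{1}^{\top}\mathbf{e} = \frac{1}{m}\sum\limits_{i = 0}^{m-1} e^{(i)}
\end{align*}
$$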

#### Problem Dataset

| Size (sqft) | Number of Bedrooms | Number of Floors | Age of Home | Price (1000s of dollars) |
| ----------- | ------------------ | ---------------- | ----------- | ------------------------ |
| 2104        | 5                  | 1                | 45          | 460                      |
| 1416        | 3                  | 2                | 40          | 232                      |
| 852         | 2                  | 1                | 35          | 178                      |

```python
import numpy as np
import copy
import math
```

```python
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
```

```python
def derivatives(X, y, w, b):
    """Calculates the partial derivatives of the cost with respect to w and b.

    Arguments:
        X (ndarray (m,n)): data, m examples with n features
        y (ndarray (m,)):  target values
        w (ndarray (n,)):  model parameters
        b (scalar):        model parameter
    Returns:
        d_w (ndarray (n,)): gradient of the cost w.r.t. the parameters w
        d_b (scalar):       gradient of the cost w.r.t. the parameter b
    """
    m, n = X.shape                          # (number of examples, number of features)
    d_w = np.zeros(n, dtype=float)          # one partial derivative per feature
    d_b = 0.0
    for i in range(m):
        err = np.dot(X[i], w) + b - y[i]    # err = f_wb(x^(i)) - y^(i), where f_wb = X[i].w + b
        for j in range(n):
            d_w[j] += err * X[i, j]         # accumulate the sum in eq. (6)
        d_b += err                          # accumulate the sum in eq. (7)
    d_w /= m
    d_b /= m
    return d_w, d_b
```
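The double loop in `derivatives` computes equations (6) and (7) one term at a time. The sketch below is an equivalent NumPy version that replaces both loops with a matrix-vector product. The name `derivatives_vectorized` is hypothetical. Its signature matches `derivatives`, so it could be passed as `derivative_func` to `gradient_descent`.

```python
import numpy as np

def derivatives_vectorized(X, y, w, b):
    """Hypothetical vectorized equivalent of `derivatives` (eqs. 6 and 7)."""
    m = X.shape[0]
    err = X @ w + b - y        # shape (m,): f_wb(x^(i)) - y^(i) for every example
    d_w = X.T @ err / m        # shape (n,): entry j is sum_i err_i * x_j^(i) / m
    d_b = np.sum(err) / m      # scalar: sum_i err_i / m
    return d_w, d_b

X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# Gradient at the starting point w = 0, b = 0 used later in this section
d_w, d_b = derivatives_vectorized(X_train, y_train, np.zeros(X_train.shape[1]), 0.0)
print(d_w, d_b)
```

At $\mathbf{w} = \mathbf{0}$ and $b = 0$, every error is $-y^{(i)}$, so `d_b` is $-(460 + 232 + 178)/3 = -290$.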

```python
def gradient_descent(X, y, w_in, b_in, derivative_func, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b by taking
    num_iters gradient steps with learning rate alpha.

    Args:
        X (ndarray (m,n)):   data, m examples with n features
        y (ndarray (m,)):    target values
        w_in (ndarray (n,)): initial model parameters
        b_in (scalar):       initial model parameter
        derivative_func:     function that computes the gradients (d_w, d_b)
        alpha (float):       learning rate
        num_iters (int):     number of gradient steps to take

    Returns:
        w (ndarray (n,)): updated values of the parameters
        b (scalar):       updated value of the parameter
    """
    w = copy.deepcopy(w_in)   # copy so the caller's w_in is not modified
    b = b_in
    for _ in range(num_iters):
        d_w, d_b = derivative_func(X, y, w, b)   # gradients at the current w, b
        # Simultaneous update: both gradients were computed before either parameter changes
        w = w - alpha * d_w
        b = b - alpha * d_b
    return w, b
```
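`gradient_descent` always runs exactly `num_iters` steps, but update rule (5) says "repeat until convergence". The sketch below shows one way to make that concrete. It assumes the cost $J$ is the usual squared-error cost $\frac{1}{2m}\sum_i (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2$, whose partial derivatives are exactly (6) and (7). The names `compute_cost`, `gradient_descent_with_stop`, `tol`, and `max_iters` are hypothetical. The stopping rule (stop once one step lowers the cost by less than `tol`) is only one common choice.

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Assumed squared-error cost: J(w,b) = (1/(2m)) * sum((X.w + b - y)**2)."""
    m = X.shape[0]
    err = X @ w + b - y
    return np.sum(err ** 2) / (2 * m)

def gradient_descent_with_stop(X, y, w_in, b_in, alpha, tol=1e-6, max_iters=10_000):
    """Hypothetical variant of gradient_descent: stops when the cost improves by
    less than `tol` in one step, or after `max_iters` steps."""
    w = np.array(w_in, dtype=float)    # fresh copy; the caller's array is untouched
    b = float(b_in)
    m = X.shape[0]
    cost_history = [compute_cost(X, y, w, b)]
    for _ in range(max_iters):
        err = X @ w + b - y
        d_w = X.T @ err / m            # eq. (6) for every j at once
        d_b = np.sum(err) / m          # eq. (7)
        w = w - alpha * d_w            # both updates use gradients from the old w, b
        b = b - alpha * d_b
        cost_history.append(compute_cost(X, y, w, b))
        # Also true when the cost goes up, so a diverging run stops early too
        if cost_history[-2] - cost_history[-1] < tol:
            break
    return w, b, cost_history

X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
w, b, hist = gradient_descent_with_stop(X_train, y_train, np.zeros(4), 0.0,
                                        alpha=5.0e-7, max_iters=1000)
print(f"iterations: {len(hist) - 1}, cost: {hist[0]:.2f} -> {hist[-1]:.2f}")
```

The cost history is also a cheap diagnostic. If it ever rises from one iteration to the next, `alpha` is likely too large for these unscaled features.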

```python
w_in = np.zeros(X_train.shape[1], dtype=float)   # one weight per feature (n = X_train.shape[1])
b_in = 0.0
alpha = 5.0e-7
num_iters = 1000
w, b = gradient_descent(X_train, y_train, w_in, b_in, derivatives, alpha, num_iters)
print(f"Parameters: b:{b},w:{w}")
f_wb = np.zeros_like(y_train, dtype=float)
for i in range(X_train.shape[0]):
    f_wb[i] = np.dot(w, X_train[i]) + b
    print(f"Prediction: {f_wb[i]:0.2f}, Target: {y_train[i]}")
```

```
Parameters: b:-0.002235407530932535,w:[ 0.20396569  0.00374919 -0.0112487  -0.0658614 ]
Prediction: 426.19, Target: 460
Prediction: 286.17, Target: 232
Prediction: 171.47, Target: 178
```
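The prediction loop in the last cell can be replaced by a single matrix-vector product. In this sketch, `w` and `b` are copied from the printed output above, so `w` carries only the digits NumPy displayed. With these values, `X_train @ w + b` reproduces the three printed predictions to two decimal places.

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# Parameters copied from the printed output (w as displayed, i.e. rounded)
w = np.array([0.20396569, 0.00374919, -0.0112487, -0.0658614])
b = -0.002235407530932535

f_wb = X_train @ w + b          # all m predictions in one matrix-vector product
residuals = f_wb - y_train      # prediction minus target, per example
print(np.round(f_wb, 2))
print(np.round(residuals, 2))
```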
