# **Gradient Descent for Linear Regression**

<br>

## **Goals**
In this lab, you will:
- automate the process of optimizing $w$ and $b$ using gradient descent.

<br>

## **Tools**
In this lab, we will make use of: 
- NumPy, a popular library for scientific computing
- Matplotlib, a popular library for plotting data
- plotting routines in the `lab_utils_uni.py` file in the local `utils` directory

In [None]:
import sys, os
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

def running_in_colab() -> bool:
    try:
        import google.colab  # only importable inside Colab
        return True
    except Exception:
        return False

def import_data_from_github():
    import os, urllib.request, pathlib, shutil

    # Start from a clean /content/utils directory
    UTILS_DIR = "/content/utils"
    REPO_DIR = "Machine-Learning-Lab"

    shutil.rmtree(UTILS_DIR, ignore_errors=True)

    os.makedirs("/content/utils", exist_ok=True)
    pathlib.Path("/content/utils/__init__.py").touch()  # make utils an importable package, just in case

    BASE = f"https://raw.githubusercontent.com/mz038197/{REPO_DIR}/main"
    urllib.request.urlretrieve(f"{BASE}/utils/lab_utils_common.py", "/content/utils/lab_utils_common.py")
    urllib.request.urlretrieve(f"{BASE}/utils/lab_utils_uni.py", "/content/utils/lab_utils_uni.py")
    urllib.request.urlretrieve(f"{BASE}/utils/deeplearning.mplstyle", "/content/utils/deeplearning.mplstyle")

    # Let Python find the utils package under /content
    if "/content" not in sys.path:
        sys.path.insert(0, "/content")

if running_in_colab(): import_data_from_github()


def find_repo_root(marker="utils"):
    cur = Path.cwd()
    while cur != cur.parent:  # stop at the filesystem root to avoid an infinite loop
        if (cur / marker).exists():
            return cur
        cur = cur.parent
    raise FileNotFoundError(f"Could not find a project root containing {marker}")

repo_root = find_repo_root()
os.chdir(repo_root)
if str(repo_root) not in sys.path:
    sys.path.append(str(repo_root))
print(f"Working directory changed to {Path.cwd()} and added to sys.path")

from utils.lab_utils_uni import plt_house_x, plt_contour_wgrad, plt_divergence, plt_gradients
plt.style.use('utils/deeplearning.mplstyle')
print("Modules imported and plot style set!")


<br>

## **Problem Statement**

Let's use the same two data points as before: a house with 1000 square feet sold for \$300,000 and a house with 2000 square feet sold for \$500,000.

| Size (1000 sqft)     | Price (1000s of dollars) |
| ----------------| ------------------------ |
| 1               | 300                      |
| 2               | 500                      |

In [None]:
x_train = np.array([1.0, 2.0])        # size in 1000 square feet
y_train = np.array([300.0, 500.0])    # price in 1000s of dollars

<br>

## **Computing Cost**
The cost function was developed in the previous lab. We'll need it again here.

In [None]:
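def compute_cost(x, y, w, b):
    """Return the squared-error cost J(w, b) over the m training examples in x, y."""
    m = x.shape[0]  # number of training examples
    cost = 0

    for i in range(m):
        f_wb = w * x[i] + b          # model prediction for example i
        cost += (f_wb - y[i]) ** 2   # accumulate the squared error

    total_cost = (1 / (2 * m)) * cost  # divide the summed squared error by 2m

    return total_cost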

<br>

## **Gradient descent summary**
So far in this course, you have developed a linear model that predicts $f_{w,b}(x^{(i)})$:
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{1}$$
In linear regression, you use the training data to fit the parameters $w$, $b$ by minimizing a measure of the error between the predictions $f_{w,b}(x^{(i)})$ and the actual data $y^{(i)}$. This measure is called the *cost*, $J(w,b)$. In training, you measure the cost over all of the training samples $(x^{(i)}, y^{(i)})$:
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2\tag{2}$$ 
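
As a quick check of equation (2) on the two training houses, the cost can also be written with NumPy array operations instead of a loop. This is a minimal sketch, not part of the original lab; the name `compute_cost_vectorized` is hypothetical and chosen only for illustration.

```python
import numpy as np

def compute_cost_vectorized(x, y, w, b):
    """Equation (2) written with array operations (illustrative helper, not from the lab)."""
    m = x.shape[0]                      # number of training examples
    err = (w * x + b) - y               # f_wb(x_i) - y_i for every example at once
    return np.sum(err ** 2) / (2 * m)   # sum of squared errors divided by 2m

x_train = np.array([1.0, 2.0])          # size in 1000 square feet
y_train = np.array([300.0, 500.0])      # price in 1000s of dollars

cost_zero = compute_cost_vectorized(x_train, y_train, 0.0, 0.0)     # w = 0, b = 0
cost_fit = compute_cost_vectorized(x_train, y_train, 200.0, 100.0)  # line through both points
print(cost_zero, cost_fit)
```

With $w = 0$, $b = 0$ the errors are $-300$ and $-500$, so the cost is $(300^2 + 500^2)/4 = 85000$. With $w = 200$, $b = 100$ the line passes through both points and the cost is zero.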

In lecture, *gradient descent* was described as:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\;  w &= w -  \alpha \frac{\partial J(w,b)}{\partial w} \tag{3}  \; \newline 
 b &= b -  \alpha \frac{\partial J(w,b)}{\partial b}  \newline \rbrace
\end{align*}$$
where the parameters $w$ and $b$ are updated simultaneously.  
The gradient is defined as:
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
  \frac{\partial J(w,b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}\\
\end{align}
$$

Here *simultaneously* means that you calculate the partial derivatives for all the parameters before updating any of the parameters.
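
Equations (3) to (5) and the simultaneous update can be sketched as below. This is one possible arrangement, assuming the two-house data set from the problem statement; the helper name `gradient_step` and the learning rate `alpha = 0.01` are illustrative assumptions rather than values fixed by the text above.

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Partial derivatives of J(w, b) from equations (4) and (5)."""
    m = x.shape[0]                  # number of training examples
    err = (w * x + b) - y           # f_wb(x_i) - y_i for every example
    dj_dw = np.sum(err * x) / m     # equation (4)
    dj_db = np.sum(err) / m         # equation (5)
    return dj_dw, dj_db

def gradient_step(x, y, w, b, alpha):
    """One update of equation (3) (hypothetical helper for illustration)."""
    # Both derivatives are evaluated at the old (w, b) before either is changed.
    dj_dw, dj_db = compute_gradient(x, y, w, b)
    w_new = w - alpha * dj_dw
    b_new = b - alpha * dj_db
    return w_new, b_new

x_train = np.array([1.0, 2.0])          # size in 1000 square feet
y_train = np.array([300.0, 500.0])      # price in 1000s of dollars

# One step from w = 0, b = 0 with an assumed learning rate of 0.01
w, b = gradient_step(x_train, y_train, 0.0, 0.0, alpha=0.01)
print(w, b)
```

Starting from $w = 0$, $b = 0$, the gradients are $\frac{\partial J}{\partial w} = -650$ and $\frac{\partial J}{\partial b} = -400$, so a single step with $\alpha = 0.01$ moves the parameters to $w = 6.5$, $b = 4.0$. Updating $w$ first and then reusing the new $w$ to compute $\frac{\partial J}{\partial b}$ would not be a simultaneous update.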

<br>