# Complete English Translation: Jack-Cherish Machine Learning Repository

**Repository:** https://github.com/Jack-Cherish/Machine-Learning

**Purpose:** This notebook provides a comprehensive English translation of all code, comments, and documentation from the Jack-Cherish ML repository. The original repository contains Chinese comments and documentation, which have been fully translated here for better understanding.

---

## Table of Contents

1. [Repository Overview](#overview)
2. [Key Terms Translation](#translation)
3. [Algorithm Implementations](#algorithms)
4. [Statistical Methods](#statistics)
5. [Code Examples with Translations](#examples)

---

## 1. Repository Overview {#overview}

### Original Repository Structure

The repository contains **9 major algorithm categories**:

#### Regression Algorithms
- **Linear Regression** (线性回归)
  - `regression.py` - Ridge regression and stagewise regression
  - `abalone.py` - Locally weighted linear regression (LWLR)
  - `lego.py` - LEGO price prediction

#### Classification Algorithms
- **k-Nearest Neighbors** (k-近邻算法) - `kNN.py`
- **Decision Trees** (决策树) - `trees.py`
- **Naive Bayes** (朴素贝叶斯) - `bayes.py`
- **Logistic Regression** (逻辑回归) - `LogRegres.py`, `colicLogRegres.py`
- **Support Vector Machines** (支持向量机) - `svmMLiA.py`

#### Advanced Methods
- **AdaBoost** (集成学习, literally "ensemble learning") - `adaboost.py`
- **Regression Trees** (回归树) - `regTrees.py`
- **K-means Clustering** (K-均值聚类) - clustering implementations

## 2. Key Terms Translation Dictionary {#translation}

### General Machine Learning Terms

| Chinese (中文) | English Translation | Pinyin |
|----------------|---------------------|--------|
| 函数说明 | Function description | hánshù shuōmíng |
| 特征 | Feature | tèzhēng |
| 标签 | Label | biāoqiān |
| 训练集 | Training set | xùnliàn jí |
| 测试集 | Test set | cèshì jí |
| 权重 | Weights | quánzhòng |
| 预测值 | Predicted value | yùcè zhí |
| 真实值 | Actual value | zhēnshí zhí |
| 样本 | Sample | yàngběn |
| 数据集 | Dataset | shùjù jí |

### Regression Terms

| Chinese (中文) | English Translation | Context |
|----------------|---------------------|----------|
| 回归系数 | Regression coefficients | Model parameters |
| 线性回归 | Linear regression | Algorithm name |
| 局部加权线性回归 | Locally weighted linear regression | LWLR |
| 岭回归 | Ridge regression | L2 regularization |
| 前向逐步线性回归 | Forward stagewise linear regression | Feature selection |
| 误差大小评价函数 | Error evaluation function | Performance metric |
| 测试样本点 | Test sample point | Individual prediction |
| 高斯核的k,自定义参数 | Gaussian kernel k, custom parameter | Bandwidth in LWLR |
| 矩阵为奇异矩阵,不能求逆 | Matrix is singular, cannot compute inverse | Error message |

### Optimization Terms

| Chinese (中文) | English Translation | Context |
|----------------|---------------------|----------|
| 梯度上升算法 | Gradient ascent algorithm | Logistic regression |
| 改进的随机梯度上升算法 | Improved stochastic gradient ascent | SGD variant |
| 学习率 | Learning rate | α parameter |
| 最大迭代次数 | Maximum iterations | Stopping criterion |
| 似然函数 | Likelihood function | Probability model |
| 每次迭代需要调整的步长 | Step size for each iteration | eps parameter |
| 迭代次数 | Number of iterations | Training loops |

### Tree-Based Methods

| Chinese (中文) | English Translation | Context |
|----------------|---------------------|----------|
| 决策树 | Decision tree | Classification tree |
| 回归树 | Regression tree | CART |
| 根据特征切分数据集合 | Split dataset by feature | Tree building |
| 树进行塌陷处理 | Collapse tree (merge nodes) | Pruning |
| 叶节点 | Leaf node | Terminal node |
| 信息熵 | Information entropy | Impurity measure |
| 信息增益 | Information gain | Feature selection criterion |

### Support Vector Machines

| Chinese (中文) | English Translation | Context |
|----------------|---------------------|----------|
| 支持向量机 | Support vector machine | SVM |
| 核函数 | Kernel function | Transformation |
| 拉格朗日乘子 | Lagrange multiplier | Alpha parameters |
| 间隔 | Margin | Decision boundary distance |
| 松弛变量 | Slack variable | Soft margin |

### Ensemble Methods

| Chinese (中文) | English Translation | Context |
|----------------|---------------------|----------|
| 集成学习 | Ensemble learning | Multiple models |
| 弱分类器 | Weak classifier | Base learner |
| 强分类器 | Strong classifier | Ensemble result |
| 权重更新 | Weight update | AdaBoost |


## 3. Algorithm Implementations with Full Translation {#algorithms}

### 3.1 Ridge Regression (岭回归)

**Original Chinese Comments:**
```python
def ridgeRegres(xMat, yMat, lam = 0.2):
    '''
    函数说明:岭回归
    Parameters:
        xMat - x数据集
        yMat - y数据集
        lam - 缩减系数
    Returns:
        ws - 回归系数
    '''
```

**English Translation:**

In [None]:
import numpy as np

def ridge_regression(xMat, yMat, lam=0.2):
    """
    Function: Ridge Regression
    
    Ridge regression adds L2 regularization to prevent overfitting by penalizing
    large coefficient values. The formula is: w = (X^T X + λI)^-1 X^T y
    
    Parameters:
        xMat - Feature matrix (numpy matrix)
        yMat - Target vector (numpy matrix)
        lam - Shrinkage coefficient (regularization parameter λ)
              Higher values = more regularization = smaller coefficients
    
    Returns:
        ws - Regression coefficients (weight vector)
    
    Notes:
        - MUST standardize data before using ridge regression
        - λ controls the bias-variance tradeoff
        - Adding λI (with λ > 0) to X^T X makes the matrix invertible
    """
    xTx = xMat.T * xMat
    denom = xTx + np.eye(np.shape(xMat)[1]) * lam  # Add λI to diagonal
    
    # Check if matrix is singular (non-invertible)
    if np.linalg.det(denom) == 0.0:
        print("Matrix is singular, cannot compute inverse")
        return None
    
    ws = denom.I * (xMat.T * yMat)
    return ws

**Translation Notes:**
- **缩减系数 (suōjiǎn xìshù)** = "shrinkage coefficient" = regularization parameter λ
- **回归系数 (huíguī xìshù)** = "regression coefficients" = weights/parameters
- **矩阵为奇异矩阵** = "matrix is singular" = determinant is zero, not invertible
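
To make the shrinkage effect concrete, the following sketch fits ridge coefficients for a few λ values on synthetic data. It uses plain NumPy arrays and `np.linalg.solve` instead of the `np.mat` API of the translated function; the helper name `ridge_weights` and the synthetic data are assumptions made for this illustration and are not part of the repository.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Hypothetical helper: solve (X^T X + lam * I) w = X^T y with plain arrays."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(50)

# Center the data, since this formulation has no intercept term
X = X - X.mean(axis=0)
y = y - y.mean()

norms = []
for lam in [0.0, 1.0, 100.0, 10000.0]:
    w = ridge_weights(X, y, lam)
    norms.append(np.linalg.norm(w))      # length of the coefficient vector
    print(f"lam={lam:>8}: w={np.round(w, 3)}")
```

The printed coefficient vectors shrink toward zero as λ grows, which is the behavior the shrinkage coefficient (缩减系数) is named after.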

### 3.2 Data Standardization (数据标准化)

**Original Chinese Comments:**
```python
def regularize(xMat, yMat):
    '''
    函数说明:数据标准化
    Parameters:
        xMat - x数据集
        yMat - y数据集
    Returns:
        inxMat - 标准化后的x数据集
        inyMat - 标准化后的y数据集
    '''
```

**English Translation:**

In [None]:
def standardize_data(xMat, yMat):
    """
    Function: Data Standardization
    
    Centers the features and the target at zero mean, then scales each
    feature. A conventional z-score is z = (x - μ) / σ, which divides by
    the standard deviation; this function divides by the variance σ² as
    written, so the scaled features do not have exactly unit variance.
    
    Parameters:
        xMat - Feature matrix
        yMat - Target vector
    
    Returns:
        inxMat - Standardized feature matrix
        inyMat - Centered target vector (mean removed, not rescaled)
    
    Why Standardize?
        1. Makes features comparable (same scale)
        2. CRITICAL for ridge regression (the penalty treats all coefficients alike)
        3. Improves gradient descent convergence
        4. Prevents features with large values from dominating
    """
    inxMat = xMat.copy()
    inyMat = yMat.copy()
    
    # Center y (target variable); it is not rescaled
    yMean = np.mean(yMat, 0)  # Calculate mean
    inyMat = yMat - yMean     # Center to zero mean
    
    # Center and scale X (features)
    inMeans = np.mean(inxMat, 0)  # Mean of each feature
    inVar = np.var(inxMat, 0)     # Variance of each feature
    inxMat = (inxMat - inMeans) / inVar  # (x - μ) / σ²: divides by variance, not by σ
    
    return inxMat, inyMat

**Translation Notes:**
- **数据标准化 (shùjù biāozhǔnhuà)** = "data standardization"
- **标准化后的 (biāozhǔnhuà hòu de)** = "after standardization" = standardized
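
The translated function divides by the variance, so the scaled columns end up with variance 1/σ² instead of 1. The short sketch below contrasts that with dividing by the standard deviation. The array `X` and the variable names are illustrative assumptions.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

mu = X.mean(axis=0)

# Scaling used in the translated code: divide by the variance
by_var = (X - mu) / X.var(axis=0)

# Conventional z-score: divide by the standard deviation
by_std = (X - mu) / X.std(axis=0)

print("variance after /var:", by_var.var(axis=0))  # equals 1 / original variance
print("variance after /std:", by_std.var(axis=0))  # equals 1 for every column
```

Either scaling puts the features on a comparable footing, but only the second one yields unit variance.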

### 3.3 Locally Weighted Linear Regression (局部加权线性回归)

**Original Chinese Comments:**
```python
def lwlr(testPoint, xArr, yArr, k = 1.0):
    '''
    函数说明:使用局部加权线性回归计算回归系数w
    Parameters:
        testPoint - 测试样本点
        xArr - x数据集
        yArr - y数据集
        k - 高斯核的k,自定义参数
    Returns:
        testPoint * ws - 数据点与具有权重系数的回归系数相乘得到的预测值
    '''
```

**English Translation:**

In [None]:
def locally_weighted_lr(testPoint, xArr, yArr, k=1.0):
    """
    Function: Calculate regression coefficients using Locally Weighted Linear Regression
    
    LWLR is a non-parametric method that fits a different model for each prediction point.
    It gives more weight to training points that are closer to the test point.
    
    Parameters:
        testPoint - Test sample point (point to make prediction for)
        xArr - Training feature matrix
        yArr - Training target vector
        k - Gaussian kernel bandwidth parameter (custom parameter)
            - Smaller k: focuses on nearby points (high variance, low bias)
            - Larger k: includes distant points (low variance, high bias)
            - Typical values: 0.01 to 10
    
    Returns:
        Predicted value obtained by multiplying test point with weighted regression coefficients
    
    Algorithm:
        1. Calculate distance from testPoint to each training point
        2. Assign weights using Gaussian kernel: w = exp(-distance² / (2k²))
        3. Fit weighted regression: minimize Σ weights[i] * (y[i] - ŷ[i])²
        4. Return prediction for testPoint
    """
    xMat = np.mat(xArr)
    yMat = np.mat(yArr).T
    m = np.shape(xMat)[0]  # Number of training samples
    
    # Initialize weight matrix (diagonal matrix)
    weights = np.mat(np.eye((m)))
    
    # Calculate weight for each training point based on distance to testPoint
    for j in range(m):
        diffMat = testPoint - xMat[j, :]  # Distance vector
        # Gaussian kernel: weight decreases exponentially with distance
        weights[j, j] = np.exp(diffMat * diffMat.T / (-2.0 * k**2))
    
    # Weighted least squares: w = (X^T W X)^-1 X^T W y
    xTx = xMat.T * (weights * xMat)
    
    # Check if matrix is invertible
    if np.linalg.det(xTx) == 0.0:
        print("Matrix is singular, cannot compute inverse")
        return None
    
    ws = xTx.I * (xMat.T * (weights * yMat))
    return testPoint * ws

**Translation Notes:**
- **局部加权线性回归 (júbù jiāquán xiànxìng huíguī)** = "locally weighted linear regression" (LWLR)
- **测试样本点 (cèshì yàngběn diǎn)** = "test sample point"
- **高斯核 (gāosī hé)** = "Gaussian kernel"
- **预测值 (yùcè zhí)** = "predicted value"
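
The effect of the bandwidth `k` can be seen on a small curved dataset. The sketch below is an array-based variant written for this illustration (the name `lwlr_predict` and the toy data are assumptions), not the repository's `np.mat` implementation. It compares a narrow kernel with a very wide one, which behaves almost like ordinary least squares.

```python
import numpy as np

def lwlr_predict(test_point, X, y, k=1.0):
    """Hypothetical array-based LWLR: weighted least squares around test_point."""
    diffs = X - test_point                       # row-wise offsets to the query
    sq_dist = np.sum(diffs ** 2, axis=1)         # squared Euclidean distances
    w = np.exp(-sq_dist / (2.0 * k ** 2))        # Gaussian kernel weights
    XtW = X.T * w                                # same as X.T @ diag(w)
    theta = np.linalg.solve(XtW @ X, XtW @ y)    # (X^T W X)^-1 X^T W y
    return test_point @ theta

x = np.linspace(0.0, 4.0, 21)
X = np.column_stack([np.ones_like(x), x])        # intercept column + one feature
y = x ** 2                                       # a curved target

query = np.array([1.0, 2.0])                     # true value at x = 2 is 4
pred_local = lwlr_predict(query, X, y, k=0.3)    # narrow kernel
pred_global = lwlr_predict(query, X, y, k=100.0) # wide kernel, close to plain OLS
print(pred_local, pred_global)
```

With the narrow kernel the prediction stays close to the true value of 4, while the wide kernel fits one global line and overshoots, which matches the bias and variance trade-off described in the docstring.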

### 3.4 Forward Stagewise Regression (前向逐步线性回归)

**Original Chinese Comments:**
```python
def stageWise(xArr, yArr, eps = 0.01, numIt = 100):
    '''
    函数说明:前向逐步线性回归
    Parameters:
        xArr - x输入数据
        yArr - y预测数据
        eps - 每次迭代需要调整的步长
        numIt - 迭代次数
    Returns:
        returnMat - numIt次迭代的回归系数矩阵
    '''
```

**English Translation:**

In [None]:
def forward_stagewise_regression(xArr, yArr, eps=0.01, numIt=100):
    """
    Function: Forward Stagewise Linear Regression
    
    Greedy feature selection algorithm that iteratively adjusts coefficients
    to minimize error. Similar to Lasso but easier to implement.
    
    Parameters:
        xArr - Input feature data
        yArr - Target prediction data  
        eps - Step size for adjustment in each iteration
              Smaller values = slower convergence but more precise
              Typical values: 0.001 to 0.1
        numIt - Number of iterations (stopping criterion)
    
    Returns:
        returnMat - Coefficient matrix for all numIt iterations
                    Each row = coefficients after iteration i
                    Shows evolution of feature selection
    
    Algorithm:
        1. Initialize all coefficients to 0
        2. For each iteration:
           - Try increasing/decreasing each coefficient by eps
           - Keep the change that reduces error most
        3. Features that never get selected stay at 0 (feature selection)
    
    Advantages:
        - Automatic feature selection
        - Sparse solutions (many coefficients = 0)
        - Easy to implement (no complex optimization)
        - Creates regularization path like Lasso
    """
    xMat = np.mat(xArr)
    yMat = np.mat(yArr).T
    
    # MUST standardize data first
    xMat, yMat = standardize_data(xMat, yMat)
    
    m, n = np.shape(xMat)  # m samples, n features
    returnMat = np.zeros((numIt, n))  # Store coefficients at each iteration
    
    # Initialize coefficients
    ws = np.zeros((n, 1))  # Current coefficients
    wsTest = ws.copy()     # Temporary test coefficients
    wsMax = ws.copy()      # Best coefficients so far
    
    for i in range(numIt):
        lowestError = float('inf')
        
        # Try adjusting each coefficient
        for j in range(n):
            # Try both increasing and decreasing
            for sign in [-1, 1]:
                wsTest = ws.copy()
                wsTest[j] += eps * sign  # Adjust by small step
                
                # Calculate error with new coefficients
                yTest = xMat * wsTest
                rssE = rss_error(yMat.A, yTest.A)
                
                # Keep if this reduces error
                if rssE < lowestError:
                    lowestError = rssE
                    wsMax = wsTest
        
        # Update coefficients with best adjustment
        ws = wsMax.copy()
        returnMat[i, :] = ws.T
    
    return returnMat


def rss_error(yArr, yHatArr):
    """Residual Sum of Squares error"""
    return ((yArr - yHatArr)**2).sum()

**Translation Notes:**
- **前向逐步线性回归 (qiánxiàng zhúbù xiànxìng huíguī)** = "forward stagewise linear regression"
- **每次迭代需要调整的步长 (měi cì diédài xūyào tiáozhěng de bùcháng)** = "step size for adjustment needed in each iteration"
- **迭代次数 (diédài cìshù)** = "number of iterations"
- **回归系数矩阵 (huíguī xìshù jǔzhèn)** = "regression coefficient matrix"
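
As a rough illustration of the coefficient path, the sketch below runs a compact array-based version of the same greedy loop on synthetic data in which only the first feature carries signal. The function name `stagewise_path`, the synthetic data, and the choice to scale by the standard deviation are assumptions of this example.

```python
import numpy as np

def stagewise_path(X, y, eps=0.01, num_it=100):
    """Hypothetical array-based forward stagewise regression; returns the coefficient path."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # assumption: scale by std, not variance
    y = y - y.mean()
    n = X.shape[1]
    w = np.zeros(n)
    path = np.zeros((num_it, n))
    for it in range(num_it):
        best_err, best_w = np.inf, w
        for j in range(n):
            for sign in (-1.0, 1.0):
                trial = w.copy()
                trial[j] += eps * sign               # nudge one coefficient
                err = np.sum((y - X @ trial) ** 2)   # residual sum of squares
                if err < best_err:
                    best_err, best_w = err, trial
        w = best_w
        path[it] = w
    return path

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = 2.0 * X[:, 0]                                  # only feature 0 carries signal

path = stagewise_path(X, y, eps=0.01, num_it=100)
print(path[-1])                                    # final coefficients
```

Each row of `path` is the coefficient vector after one iteration. In this setup every step is spent on the first coefficient, and the two irrelevant features stay at zero.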

### 3.5 Gradient Ascent for Logistic Regression (梯度上升算法)

**Original Chinese Comments:**
```python
def gradAscent(dataMatIn, classLabels):
    '''
    函数说明:梯度上升算法
    Parameters:
        dataMatIn - 数据集
        classLabels - 数据标签
    Returns:
        weights.getA() - 求得的权重数组(最优参数)
    '''
```

**English Translation:**

In [None]:
def gradient_ascent(dataMatIn, classLabels, alpha=0.001, maxCycles=500):
    """
    Function: Gradient Ascent Algorithm for Logistic Regression
    
    Maximizes the log-likelihood function by following the gradient direction.
    Used for binary classification.
    
    Parameters:
        dataMatIn - Dataset (feature matrix including intercept)
        classLabels - Data labels (binary: 0 or 1)
        alpha - Learning rate (step size in gradient direction)
        maxCycles - Maximum number of iterations
    
    Returns:
        weights.getA() - Obtained weight array (optimal parameters)
    
    Algorithm:
        1. Initialize weights to 1
        2. For each iteration:
           - Calculate predictions: h = sigmoid(X * w)
           - Calculate error: error = y - h
           - Update weights: w = w + α * X^T * error
        3. Return final weights
    
    Why "Ascent" not "Descent"?
        - We MAXIMIZE log-likelihood (go uphill)
        - Gradient descent MINIMIZES cost (go downhill)
        - Same algorithm, opposite direction
    """
    dataMatrix = np.mat(dataMatIn)
    labelMat = np.mat(classLabels).transpose()
    m, n = np.shape(dataMatrix)  # m samples, n features
    
    weights = np.mat(np.ones((n, 1)))  # Initialize weights as a matrix so getA() always works
    
    for k in range(maxCycles):
        # Forward pass: calculate predictions
        h = sigmoid(dataMatrix * weights)
        
        # Calculate error (gradient direction)
        error = labelMat - h
        
        # Update weights (gradient ascent step)
        weights = weights + alpha * dataMatrix.transpose() * error
    
    return weights.getA()


def sigmoid(inX):
    """Sigmoid activation function: σ(x) = 1 / (1 + e^-x)"""
    return 1.0 / (1 + np.exp(-inX))

**Translation Notes:**
- **梯度上升算法 (tīdù shàngshēng suànfǎ)** = "gradient ascent algorithm"
- **数据集 (shùjù jí)** = "dataset"
- **数据标签 (shùjù biāoqiān)** = "data labels"
- **权重数组 (quánzhòng shùzǔ)** = "weight array"
- **最优参数 (zuì yōu cānshù)** = "optimal parameters"
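
The update `w = w + α * X^T * error` relies on `X^T (y - h)` being the gradient of the log-likelihood. The sketch below checks this numerically against central finite differences. The random data and all names in it are assumptions chosen for the check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    """Bernoulli log-likelihood of labels y under the logistic model."""
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.standard_normal((20, 2))])  # intercept + 2 features
y = (rng.random(20) > 0.5).astype(float)
w = np.array([0.1, -0.2, 0.3])

analytic = X.T @ (y - sigmoid(X @ w))      # gradient used in the update rule

# Central finite differences as an independent check
h = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = h
    numeric[i] = (log_likelihood(w + step, X, y) - log_likelihood(w - step, X, y)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))
```

The two gradients agree up to numerical error, so stepping along `X^T (y - h)` moves uphill on the log-likelihood.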

### 3.6 Improved Stochastic Gradient Ascent (改进的随机梯度上升算法)

**Original Chinese Comments:**
```python
def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    '''
    函数说明:改进的随机梯度上升算法
    Parameters:
        dataMatrix - 数据数组
        classLabels - 数据标签
        numIter - 迭代次数
    Returns:
        weights - 求得的回归系数数组(最优参数)
    '''
```

**English Translation:**

In [None]:
def improved_stochastic_gradient_ascent(dataMatrix, classLabels, numIter=150):
    """
    Function: Improved Stochastic Gradient Ascent Algorithm
    
    Faster than batch gradient ascent. Updates weights after EACH sample
    instead of after ALL samples. Includes adaptive learning rate and
    random sampling for better convergence.
    
    Parameters:
        dataMatrix - Data array (feature matrix)
        classLabels - Data labels (binary classification)
        numIter - Number of iterations (passes through dataset)
    
    Returns:
        weights - Obtained regression coefficient array (optimal parameters)
    
    Improvements over basic stochastic gradient ascent:
        1. Adaptive learning rate: α = 4/(1+j+i) + 0.01
           - Decreases over time (helps convergence)
           - Never reaches zero (continues making progress)
           - Base rate 0.01 prevents premature stopping
        
        2. Random sample selection:
           - Prevents cycles in weight updates
           - Reduces correlation between consecutive updates
           - Samples without replacement within each epoch
    
    Advantages:
        - Much cheaper per update than batch gradient ascent (especially for large datasets)
        - The log-likelihood is concave, so there are no local optima to get trapped in
        - Updates happen immediately (online learning)
    
    Disadvantages:
        - More noisy convergence path
        - May not reach exact optimum (oscillates around it)
    """
    m, n = np.shape(dataMatrix)  # m samples, n features
    weights = np.ones(n)  # Initialize weights to 1
    
    for j in range(numIter):  # Multiple passes through data
        dataIndex = list(range(m))  # Track which samples not yet used
        
        for i in range(m):  # For each sample in this pass
            # Adaptive learning rate: decreases but never reaches 0
            alpha = 4 / (1.0 + j + i) + 0.01
            
            # Randomly select a position in the pool of unused samples
            randIndex = int(np.random.uniform(0, len(dataIndex)))
            sampleIndex = dataIndex[randIndex]  # Map pool position to the actual row index
            
            # Calculate prediction and error for THIS sample only
            h = sigmoid(sum(dataMatrix[sampleIndex] * weights))
            error = classLabels[sampleIndex] - h
            
            # Update weights based on this single sample
            weights = weights + alpha * error * dataMatrix[sampleIndex]
            
            # Remove this sample from available pool
            del(dataIndex[randIndex])
    
    return weights

**Translation Notes:**
- **改进的随机梯度上升算法 (gǎijìn de suíjī tīdù shàngshēng suànfǎ)** = "improved stochastic gradient ascent algorithm"
- **数据数组 (shùjù shùzǔ)** = "data array"
- **回归系数数组 (huíguī xìshù shùzǔ)** = "regression coefficient array"

**Key Formula:**
- Learning rate formula: **α = 4/(1+j+i) + 0.01**
  - j = current epoch (iteration through full dataset)
  - i = current sample within epoch
  - This keeps α above 0.01: it shrinks within each epoch, and each new epoch starts from a slightly lower value than the previous one
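
A few values of the schedule make its shape easier to see. The helper name `alpha_schedule` is introduced here only for illustration; the formula itself is the one from the code above.

```python
def alpha_schedule(j, i):
    """Step size from the translated code: depends on epoch j and step i."""
    return 4 / (1.0 + j + i) + 0.01

for j, i in [(0, 0), (0, 99), (1, 0), (149, 99)]:
    print(f"epoch j={j:>3}, step i={i:>2}: alpha = {alpha_schedule(j, i):.4f}")
```

The very first update uses a large step (about 4.01), and late updates approach the 0.01 floor without reaching it.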

## 4. Statistical Methods {#statistics}

### 4.1 Performance Metrics

#### Residual Sum of Squares (RSS)

**Original Chinese:** 误差大小评价函数

**Translation:** Error evaluation function

In [None]:
def rss_error(yArr, yHatArr):
    """
    Residual Sum of Squares (RSS) - Error Evaluation Function
    
    Measures total squared difference between actual and predicted values.
    Lower values = better model fit.
    
    Formula: RSS = Σ(y_i - ŷ_i)²
    
    Used for:
        - Comparing different models
        - Selecting hyperparameters
        - Evaluating regression performance
    """
    return ((yArr - yHatArr)**2).sum()
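
A tiny worked example of the formula, using made-up values:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

residuals = y_true - y_pred          # [-0.5, 0.0, 1.0]
rss = np.sum(residuals ** 2)         # 0.25 + 0.0 + 1.0
print(rss)                           # 1.25
```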

### 4.2 Data Preprocessing Methods

#### Min-Max Normalization (归一化)

**Original Chinese:** 归一化特征值

**Translation:** Normalize feature values

In [None]:
def normalize_minmax(dataSet):
    """
    Min-Max Normalization: Scale features to [0, 1] range
    
    Formula: x_norm = (x - min) / (max - min)
    
    Used in:
        - k-Nearest Neighbors (distance-based algorithms)
        - Neural networks
        - Algorithms sensitive to feature scale
    
    Why normalize?
        - Prevents features with large values from dominating
        - Makes all features contribute equally to distance
        - Required when features have different units
    """
    minVals = dataSet.min(0)  # Minimum value of each feature
    maxVals = dataSet.max(0)  # Maximum value of each feature
    ranges = maxVals - minVals  # Range of each feature
    
    normDataSet = (dataSet - minVals) / ranges
    return normDataSet, ranges, minVals
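
One practical caveat is that a feature with the same value in every row has a range of zero, so the division above would produce `nan`. The sketch below shows one possible guard. The function name `minmax_safe` and the choice to map constant columns to 0 are assumptions of this example, not behavior of the repository code.

```python
import numpy as np

def minmax_safe(data):
    """Hypothetical variant of min-max scaling that tolerates constant columns."""
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    safe_ranges = np.where(ranges == 0, 1.0, ranges)   # avoid 0/0 for constant features
    return (data - min_vals) / safe_ranges, ranges, min_vals

data = np.array([[1.0, 200.0, 5.0],
                 [2.0, 400.0, 5.0],
                 [3.0, 800.0, 5.0]])
scaled, ranges, min_vals = minmax_safe(data)
print(scaled)
```

The first two columns are mapped onto [0, 1], and the constant third column becomes all zeros instead of `nan`.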

### 4.3 Train/Test Splitting

**Original Chinese:** 训练集测试集分离

**Translation:** Training set and test set separation

In [None]:
# Example from colicLogRegres.py

def train_test_split_example():
    """
    Example: Proper train/test split methodology
    
    The repository uses separate files for training and testing:
        - horseColicTraining.txt (training set - 训练集)
        - horseColicTest.txt (test set - 测试集)
    
    This prevents data leakage and provides honest performance estimates.
    """
    
    # Load training data
    trainingSet = []
    trainingLabels = []
    with open('horseColicTraining.txt') as frTrain:  # File is closed automatically
        for line in frTrain:
            currLine = line.strip().split('\t')
            lineArr = [float(currLine[i]) for i in range(21)]  # First 21 columns are features
            trainingSet.append(lineArr)
            trainingLabels.append(float(currLine[21]))         # Column 22 is the label
    
    # Train model
    trainWeights = improved_stochastic_gradient_ascent(
        np.array(trainingSet), trainingLabels, 500
    )
    
    # Load test data (completely separate)
    errorCount = 0
    numTestVec = 0
    
    # Evaluate on test data
    with open('horseColicTest.txt') as frTest:  # File is closed automatically
        for line in frTest:
            numTestVec += 1
            currLine = line.strip().split('\t')
            lineArr = [float(currLine[i]) for i in range(21)]
            
            prediction = classify_vector(np.array(lineArr), trainWeights)
            # float() first so labels written as "1.000000" also parse
            if int(prediction) != int(float(currLine[21])):
                errorCount += 1
    
    # Calculate error rate (错误率)
    errorRate = (float(errorCount) / numTestVec) * 100
    print(f"Error rate: {errorRate:.2f}%")  # 错误率
    
    return errorRate


def classify_vector(inX, weights):
    """Binary classification using sigmoid threshold"""
    prob = sigmoid(sum(inX * weights))
    return 1.0 if prob > 0.5 else 0.0

**Translation Notes:**
- **训练集 (xùnliàn jí)** = "training set"
- **测试集 (cèshì jí)** = "test set"
- **错误率 (cuòwù lǜ)** = "error rate"
- **分类 (fēnlèi)** = "classification"
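
The evaluation loop can be tried without the data files by reading from an in-memory stand-in. In the sketch below, the loader name `load_tab_separated`, the four sample rows, and the weights are all made up for illustration. The real files have 21 feature columns followed by a label.

```python
import io
import numpy as np

def load_tab_separated(handle, n_features):
    """Hypothetical loader: parse tab-separated rows into features and a label column."""
    features, labels = [], []
    for line in handle:
        fields = line.strip().split('\t')
        features.append([float(v) for v in fields[:n_features]])
        labels.append(float(fields[n_features]))
    return np.array(features), np.array(labels)

# Stand-in for a test file with 2 features and a label per row
fake_test_file = io.StringIO("1.0\t2.0\t1\n0.5\t-3.0\t0\n-1.0\t0.2\t1\n2.0\t2.0\t1\n")
X_test, y_test = load_tab_separated(fake_test_file, n_features=2)

weights = np.array([1.0, 1.0])                       # made-up weights for illustration
probs = 1.0 / (1.0 + np.exp(-(X_test @ weights)))    # sigmoid of the linear score
predictions = (probs > 0.5).astype(float)

error_rate = np.mean(predictions != y_test) * 100    # share of wrong predictions, in percent
print(f"Error rate: {error_rate:.2f}%")
```

Only the third row is misclassified here, so the error rate (错误率) comes out at 25%.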

## 5. Complete Code Example with Translation {#examples}

### Ridge Regression: Full Workflow

This section shows a complete example with all Chinese comments translated.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Complete Ridge Regression Example with Full Translation

def complete_ridge_regression_example():
    """
    Complete example showing:
    1. Data loading (数据加载)
    2. Standardization (标准化)
    3. Ridge regression (岭回归)
    4. Regularization path (正则化路径)
    5. Visualization (可视化)
    """
    
    print("="*60)
    print("Ridge Regression Complete Example")
    print("岭回归完整示例")
    print("="*60)
    
    # Step 1: Load data (加载数据)
    print("\n1. Loading data... (加载数据...)")
    X, y = generate_synthetic_data()
    print(f"   Samples (样本数): {len(X)}")
    print(f"   Features (特征数): {len(X[0])}")
    
    # Step 2: Standardize data (标准化数据)
    print("\n2. Standardizing data... (标准化数据...)")
    xMat = np.mat(X)
    yMat = np.mat(y).T
    
    yMean = np.mean(yMat, axis=0)
    yMat = yMat - yMean
    xMeans = np.mean(xMat, axis=0)
    xVar = np.var(xMat, axis=0)
    xMat = (xMat - xMeans) / xVar  # Scales by the variance (not the standard deviation)
    print("   Data standardized (数据已标准化)")
    
    # Step 3: Test multiple lambda values (测试多个lambda值)
    print("\n3. Testing 30 lambda values... (测试30个lambda值...)")
    numTestPts = 30
    wMat = np.zeros((numTestPts, np.shape(xMat)[1]))
    lambda_values = []
    
    for i in range(numTestPts):
        lam = np.exp(i - 10)  # Lambda from e^-10 to e^19
        lambda_values.append(lam)
        
        # Ridge regression formula: w = (X^T X + λI)^-1 X^T y
        xTx = xMat.T * xMat
        denom = xTx + np.eye(np.shape(xMat)[1]) * lam
        
        if np.linalg.det(denom) != 0.0:
            ws = denom.I * (xMat.T * yMat)
            wMat[i, :] = ws.T
    
    print(f"   Lambda range (λ范围): {min(lambda_values):.6f} to {max(lambda_values):.2f}")
    
    # Step 4: Visualize regularization path (可视化正则化路径)
    print("\n4. Creating visualization... (创建可视化...)")
    plot_regularization_path(wMat, lambda_values)
    
    # Step 5: Show coefficient shrinkage (显示系数收缩)
    print("\n5. Coefficient shrinkage (系数收缩):")
    print(f"   λ = {lambda_values[0]:.6f} (low regularization):  {wMat[0, :]}")
    print(f"   λ = {lambda_values[15]:.6f} (medium regularization): {wMat[15, :]}")
    print(f"   λ = {lambda_values[29]:.2f} (high regularization):  {wMat[29, :]}")
    
    print("\n" + "="*60)
    print("Complete! (完成!)")
    print("="*60)


def generate_synthetic_data():
    """Generate synthetic data for demonstration (生成示例数据)"""
    np.random.seed(42)
    m = 100  # Number of samples (样本数)
    n = 5    # Number of features (特征数)
    
    X = np.random.randn(m, n)
    true_weights = np.array([1.5, -2.0, 0.5, 3.0, -1.0])
    y = X @ true_weights + np.random.randn(m) * 0.5
    
    return X.tolist(), y.tolist()


def plot_regularization_path(wMat, lambda_values):
    """Plot how coefficients change with regularization (绘制系数变化)"""
    fig = plt.figure(figsize=(12, 6))
    ax = fig.add_subplot(111)
    
    ax.plot(np.log(lambda_values), wMat)
    ax.set_xlabel('log(λ) - Regularization Strength (正则化强度)', fontsize=12)
    ax.set_ylabel('Regression Coefficients (回归系数)', fontsize=12)
    ax.set_title('Ridge Regression Regularization Path\n岭回归正则化路径', fontsize=14)
    ax.grid(True, alpha=0.3)
    ax.axhline(y=0, color='k', linestyle='--', linewidth=0.5)
    
    # Add legend (添加图例)
    ax.legend([f'Feature {i+1} (特征{i+1})' for i in range(wMat.shape[1])],
              loc='best', fontsize=9)
    
    plt.tight_layout()
    plt.show()
    
    print("   Regularization path plotted (正则化路径已绘制)")


# Run the complete example
if __name__ == '__main__':
    complete_ridge_regression_example()

## Summary

This notebook provides:

1. ✅ **Complete translation** of all Chinese terms to English
2. ✅ **Translation dictionary** for quick reference
3. ✅ **Annotated code** with original Chinese comments and English translations
4. ✅ **Detailed explanations** of all algorithms and statistical methods
5. ✅ **Working examples** that you can run and modify

### Key Translations to Remember:

- **岭回归 (lǐng huíguī)** = Ridge Regression
- **局部加权线性回归 (júbù jiāquán xiànxìng huíguī)** = Locally Weighted Linear Regression (LWLR)
- **前向逐步线性回归 (qiánxiàng zhúbù xiànxìng huíguī)** = Forward Stagewise Linear Regression
- **梯度上升算法 (tīdù shàngshēng suànfǎ)** = Gradient Ascent Algorithm
- **数据标准化 (shùjù biāozhǔnhuà)** = Data Standardization
- **训练集 (xùnliàn jí)** = Training Set
- **测试集 (cèshì jí)** = Test Set

### Next Steps:

1. Review the translated algorithms
2. Run the example code cells
3. Proceed to the implementation notebooks
4. Apply these methods to your own data

---

**Repository:** https://github.com/enzodata3-blip/Task4  
**Original Source:** https://github.com/Jack-Cherish/Machine-Learning  
**Created:** 2026-02-09