## 1. Definition

**Linear Regression** is a supervised learning method used to model the relationship between a **dependent variable** (target) and one or more **independent variables** (features). The model is typically expressed as:

$$
f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b,
$$

where:
- $\mathbf{w}$ is the vector of weights (or coefficients),
- $b$ is the bias (or intercept),
- $\mathbf{x}$ is the input (feature) vector.
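For a single input, the model equation is just a dot product plus the bias. The minimal sketch below assumes arbitrary example values for `w`, `b` and `x`. `predict_one` is a hypothetical helper name.

```python
import numpy as np

def predict_one(w, b, x):
    # f(x) = w^T x + b for a single input vector x
    return float(np.dot(w, x) + b)

w = np.array([2.0, -1.0])  # example weights (hypothetical)
b = 0.5                    # example bias (hypothetical)
x = np.array([3.0, 4.0])   # example input (hypothetical)

print(predict_one(w, b, x))  # 2*3 + (-1)*4 + 0.5 = 2.5
```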

## 2. Objective Function (Cost Function)
A commonly used cost function for Linear Regression is the **Mean Squared Error (MSE)**:
$$
\text{MSE}(\mathbf{w}, b) = \frac{1}{N} \sum_{i=1}^{N} \left(y^{(i)} - \hat{y}^{(i)}\right)^2,
$$



where:
- $N$ is the number of training examples,
- $y^{(i)}$ is the actual target value for the $i$-th sample,
- $\hat{y}^{(i)} = \mathbf{w}^T \mathbf{x}^{(i)} + b$ is the predicted value for the $i$-th sample.
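Gradient descent needs the partial derivatives of the MSE. They follow from the chain rule, using $\partial \hat{y}^{(i)} / \partial \mathbf{w} = \mathbf{x}^{(i)}$ and $\partial \hat{y}^{(i)} / \partial b = 1$:

$$
\begin{aligned}
\frac{\partial\, \text{MSE}}{\partial \mathbf{w}} &= -\frac{2}{N} \sum_{i=1}^{N} \left(y^{(i)} - \hat{y}^{(i)}\right) \mathbf{x}^{(i)} = -\frac{2}{N} X^T \left(\mathbf{y} - \hat{\mathbf{y}}\right), \\
\frac{\partial\, \text{MSE}}{\partial b} &= -\frac{2}{N} \sum_{i=1}^{N} \left(y^{(i)} - \hat{y}^{(i)}\right).
\end{aligned}
$$

Here $X$ is the $N \times d$ matrix whose $i$-th row is $\mathbf{x}^{(i)T}$. A step with learning rate $\eta$ then updates $\mathbf{w} \leftarrow \mathbf{w} - \eta\, \partial \text{MSE} / \partial \mathbf{w}$ and $b \leftarrow b - \eta\, \partial \text{MSE} / \partial b$. These match the `dw` and `db` expressions in the gradient-descent implementation further down, where $\eta$ is `learning_rate`.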


## 3. Parameter Estimation
There are two primary methods to estimate the parameters $\mathbf{w}$ and $b$:

1. **Analytical Solution (Normal Equation)**

   $$
   \mathbf{w} = (X^T X)^{-1} X^T \mathbf{y},
   $$
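   where the bias is absorbed into $\mathbf{w}$ by prepending a column of ones to the $N \times d$ feature matrix $X$. This gives $X$ shape $N \times (d+1)$ and makes the first entry of $\mathbf{w}$ equal to $b$. $\mathbf{y}$ is the vector of targets.
   - This provides a closed-form solution. However, forming and inverting the $(d+1) \times (d+1)$ matrix $X^T X$ costs roughly $O(N d^2 + d^3)$, which becomes expensive when the number of features $d$ is very large. It also requires $X^T X$ to be invertible, which fails under perfect multicollinearity.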

2. **Gradient Descent**  
   - **Batch Gradient Descent**: At each step, update the parameters using the gradient computed over the entire training set.  
   - **Stochastic/Mini-batch Gradient Descent**: Update the parameters using one sample (SGD) or a small batch of samples (mini-batch) at a time. Each update is much cheaper, so these variants often reach a good solution faster on large datasets, at the cost of noisier steps.


In both approaches, the quantities being optimized are the **intercept** $b$ and the **coefficients** $\mathbf{w}$ of the model.
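The two estimation approaches can be compared on the same data. This sketch assumes noiseless synthetic data generated as $y = 4 + 3x_1 - 2x_2$.

- The normal equation is solved with `np.linalg.solve`. This is usually safer numerically than forming $(X^T X)^{-1}$ explicitly.
- Mini-batch gradient descent reshuffles the samples every epoch.
- The helper names `normal_equation` and `minibatch_gd`, the learning rate, batch size and epoch count are illustrative assumptions, not tuned values.

```python
import numpy as np

def normal_equation(X, y):
    # Prepend a column of ones so the first parameter plays the role of the bias b
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    # Solve (X^T X) theta = X^T y rather than inverting X^T X explicitly
    theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return theta[1:], theta[0]  # (weights, bias)

def minibatch_gd(X, y, lr=0.05, epochs=300, batch_size=20, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(N)  # reshuffle the samples every epoch
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]
            err = y[idx] - (X[idx] @ w + b)  # residuals on this mini-batch
            m = len(idx)
            # MSE gradients estimated from the mini-batch only
            w -= lr * (-(2 / m) * X[idx].T @ err)
            b -= lr * (-(2 / m) * np.sum(err))
    return w, b

rng = np.random.default_rng(1)
X = rng.uniform(0, 2, size=(200, 2))
y = 4 + 3 * X[:, 0] - 2 * X[:, 1]  # noiseless targets (assumed for illustration)

w_ne, b_ne = normal_equation(X, y)
w_gd, b_gd = minibatch_gd(X, y)
print("normal equation:", w_ne, b_ne)  # close to [3, -2] and 4
print("mini-batch GD:  ", w_gd, b_gd)  # should be close to the same values
```

Setting `batch_size=1` turns the loop into plain SGD. Setting `batch_size=N` turns it into batch gradient descent.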

## 4. Assumptions of Linear Regression
1. **Linearity**: The relationship between features and the target is (approximately) linear.  
2. **Independence**: Observations are independent (no autocorrelation).  
3. **Homoscedasticity**: The variance of errors is constant across all levels of the independent variables.  
4. **Normality of Residuals**: Residuals (errors) are normally distributed.  
5. **Low Multicollinearity**: Features should not be excessively correlated with each other.
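Multicollinearity can be checked numerically. A common diagnostic is the variance inflation factor, $\text{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing feature $j$ on all the other features. The sketch below assumes plain NumPy and synthetic data; `vif` is a hypothetical helper name. The often-quoted warning level (VIF above roughly 5 to 10) is a rule of thumb, not a hard limit.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (shape (N, d))."""
    N, d = X.shape
    out = np.empty(d)
    for j in range(d):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the remaining features plus an intercept
        A = np.hstack([np.ones((N, 1)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.01 * rng.normal(size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)              # independent feature
v = vif(np.column_stack([x1, x2, x3]))
print(v)  # large for x1 and x2, close to 1 for x3
```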


## 5. Common Interview Questions
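1. **Explain the cost function**: Usually MSE, and why it is preferred. It is differentiable and convex in the parameters, and minimizing it corresponds to maximum-likelihood estimation when the noise is Gaussian.  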
2. **What is Gradient Descent?**: How it minimizes the cost function, role of the learning rate, and convergence criteria.  
3. **Assumptions of Linear Regression**: (listed above).  
4. **How to handle overfitting?**: Regularization (L1/Lasso, L2/Ridge), cross-validation, feature selection.  
5. **Evaluation metrics**: MSE, RMSE, MAE, $R^2$.
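For the overfitting question, L2 (Ridge) regularization adds $\lambda \lVert \mathbf{w} \rVert_2^2$ to the sum of squared errors. When the features and target are centered, so the intercept is not penalized, this has the closed form $\mathbf{w} = (X^T X + \lambda I)^{-1} X^T \mathbf{y}$. The sketch below makes these assumptions:

- Centering is used to handle the intercept.
- The data is synthetic.
- `ridge_fit` is a hypothetical helper name.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Center features and target so the intercept is not penalized
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    d = X.shape[1]
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w  # recover the intercept on the original scale
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

for lam in (0.0, 10.0, 100.0):
    w, b = ridge_fit(X, y, lam)
    print(lam, np.round(w, 3), round(float(b), 3))
```

With `lam=0.0` this reduces to ordinary least squares. Larger values shrink the weights toward zero, trading a little bias for lower variance.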

```python
import numpy as np

class LinearRegressionScratch:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        """
        Initialize the model hyperparameters.
        :param learning_rate: learning rate (gradient descent step size)
        :param n_iterations: number of iterations (epochs)
        """
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None  # Weight coefficients
        self.bias = None     # Bias (intercept)

    def fit(self, X, y):
        """
        Train the linear regression model with batch gradient descent.
        :param X: input matrix of shape (N, d), where
                  N is the number of observations (samples),
                  d is the number of features.
        :param y: target vector of shape (N,).
        """
        # Read the number of samples (N) and the number of features (d)
        N, d = X.shape

        # Initialize the weights and bias with zeros
        self.weights = np.zeros(d)
        self.bias = 0.0

        # Loop over the iterations
        for _ in range(self.n_iterations):
            # 1) Compute predictions: y_pred = X @ weights + bias
            #   np.dot(X, self.weights) is a matrix-vector product
            y_pred = np.dot(X, self.weights) + self.bias

            # 2) Gradient of the MSE with respect to the weights:
            #   dw = -(2/N) * X^T (y - y_pred)
            dw = -(2 / N) * np.dot(X.T, (y - y_pred))

            #   Gradient of the MSE with respect to the bias:
            #   db = -(2/N) * sum(y - y_pred)
            db = -(2 / N) * np.sum(y - y_pred)

            # 3) Update the weights and bias:
            #   weight(t+1) = weight(t) - learning_rate * dw
            #   bias(t+1)   = bias(t)   - learning_rate * db
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        """
        Predict the target variable with the trained model.
        :param X: input matrix of shape (N, d).
        :return: prediction vector y_pred of shape (N,).
        """
        return np.dot(X, self.weights) + self.bias
```

```python
import numpy as np

# Metrics (computed manually)
def mean_squared_error(y_true, y_pred):
    """
    Mean squared error (MSE)
    """
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    """
    Mean absolute error (MAE)
    """
    return np.mean(np.abs(y_true - y_pred))

def r2_score(y_true, y_pred):
    """
    Coefficient of determination (R^2).
    Formula: R^2 = 1 - (SS_res / SS_tot),
    where SS_res = sum of (y_true - y_pred)^2,
          SS_tot = sum of (y_true - mean(y_true))^2.
    :param y_true: vector of true values
    :param y_pred: vector of predictions
    :return: a single float, the R^2 value.
    """
    ss_res = np.sum((y_true - y_pred)**2)
    ss_tot = np.sum((y_true - np.mean(y_true))**2)
    return 1 - (ss_res / ss_tot)

def root_mean_squared_error(y_true, y_pred):
    """
    Root mean squared error (RMSE)
    """
    return np.sqrt(mean_squared_error(y_true, y_pred))
```
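As a sanity check of the metric formulas, the four metrics can be worked out by hand on a tiny example and compared with what NumPy computes. The numbers are arbitrary, chosen only for illustration.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true - y_pred                     # [0.5, -0.5, 0.0, -1.0]
mse = np.mean(residuals ** 2)                   # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
rmse = np.sqrt(mse)                             # about 0.612
mae = np.mean(np.abs(residuals))                # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
ss_res = np.sum(residuals ** 2)                 # 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 29.1875 (mean of y_true is 2.875)
r2 = 1 - ss_res / ss_tot                        # about 0.9486
print(mse, rmse, mae, r2)
```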

```python
if __name__ == "__main__":
    np.random.seed(42)

    # Generate synthetic data:
    # two features (d=2) and N=100 samples; only the first feature
    # affects y, so the second feature's learned weight should be small
    X = 2 * np.random.rand(100, 2)  # X.shape = (100, 2)
    y = 4 + 3 * X[:, 0] + np.random.randn(100)  # y.shape = (100,)

    # Create a LinearRegressionScratch model
    model = LinearRegressionScratch(learning_rate=0.1, n_iterations=1000)

    # Train the model
    model.fit(X, y)

    # Make predictions on the training set
    y_pred = model.predict(X)

    # Compute the metrics
    mse = mean_squared_error(y, y_pred)
    rmse = root_mean_squared_error(y, y_pred)
    mae = mean_absolute_error(y, y_pred)
    r2 = r2_score(y, y_pred)

    # Print the results
    print("Weights:", model.weights)
    print("Bias:", model.bias)
    print("MSE:", mse)
    print("RMSE:", rmse)
    print("MAE:", mae)
    print("R^2:", r2)
```

```text
Weights: [3.16933339 0.17747302]
Bias: 3.7722722633939476
MSE: 0.9813829922788829
RMSE: 0.9906477639801561
MAE: 0.7886765205572707
R^2: 0.7907056161217358
```
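This synthetic problem also has a closed-form least-squares solution, so the gradient-descent estimates can be cross-checked with `np.linalg.lstsq`. The sketch below regenerates the same data with the same seed and fits the intercept through an explicit column of ones. If 1000 iterations at a learning rate of 0.1 were enough for gradient descent to converge, the two sets of parameters should agree closely. Both will still differ from the true values (intercept 4, slopes 3 and 0) because of the added noise.

```python
import numpy as np

# Regenerate the same synthetic data as in the example above
np.random.seed(42)
X = 2 * np.random.rand(100, 2)
y = 4 + 3 * X[:, 0] + np.random.randn(100)

# Least-squares fit with an explicit column of ones for the intercept
X_aug = np.column_stack([np.ones(len(X)), X])
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print("Bias:", theta[0])
print("Weights:", theta[1:])
```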
