{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Geometric Interpretation of Linear Regression\n",
    "\n",
    "This section presents a geometric interpretation of linear regression, showing that the optimal prediction vector is obtained by orthogonally projecting the target vector $y$ onto the column space of the design matrix $X$. In this view, the normal equation follows directly from the requirement that the residual vector be orthogonal to that column space.\n",
    "\n",
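    "## Vector Space Setup\n",
    "\n",
    "Suppose the training set contains $m$ examples and $n$ features. Collect the examples as the rows of the design matrix $X \\in \\mathbb{R}^{m \\times n}$, the observed targets in the vector $y \\in \\mathbb{R}^m$, and the model parameters in $\\theta \\in \\mathbb{R}^n$. Each column of $X$ holds the values of one feature across all $m$ examples and is therefore itself a vector in $\\mathbb{R}^m$, the space in which the geometry below takes place.\n",
    "\n",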
    "## Column Space of the Design Matrix\n",
    "\n",
    "The column space of the design matrix $X$, denoted $\\text{Col}(X)$, is the subspace of $\\mathbb{R}^m$ spanned by the columns of $X$: it consists of every vector that can be written as a linear combination of those columns.\n",
    "\n",
    "For any parameter vector $\\theta$, the predicted response\n",
    "\n",
    "$$\\hat{y} = X\\theta$$\n",
    "\n",
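    "lies in the column space of $X$, because $X\\theta = \\sum_j \\theta_j x_j$ is a linear combination of the columns $x_j$ of $X$ with coefficients $\\theta_j$. Linear regression therefore restricts its predictions to vectors in $\\text{Col}(X)$.\n",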
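    "\n",
    "A minimal NumPy sketch of this fact, using small random data with shapes chosen purely for illustration ($m = 5$ examples, three features): the matrix-vector product $X\\theta$ coincides with the explicit linear combination of the columns of $X$ weighted by the entries of $\\theta$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical example data: m = 5 examples, 3 features (shapes chosen for illustration).\n",
    "rng = np.random.default_rng(0)\n",
    "X = rng.normal(size=(5, 3))\n",
    "theta = np.array([2.0, -1.0, 0.5])\n",
    "\n",
    "# Prediction as a matrix-vector product.\n",
    "y_hat = X @ theta\n",
    "\n",
    "# The same vector built explicitly as a linear combination of the columns of X.\n",
    "y_hat_columns = sum(theta[j] * X[:, j] for j in range(X.shape[1]))\n",
    "\n",
    "print(np.allclose(y_hat, y_hat_columns))  # True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [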
    "\n",
    "## Projection of the Target Vector\n",
    "\n",
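    "In general, the observed target vector $y$ does not lie in the column space of $X$, so no choice of $\\theta$ makes $X\\theta = y$ exactly. Linear regression therefore seeks the vector $\\hat{y} \\in \\text{Col}(X)$ that is closest to $y$ in Euclidean distance, that is, the $\\theta$ minimizing $\\|y - X\\theta\\|_2$. Because squaring preserves order on nonnegative numbers, this is the same as minimizing the sum of squared errors $\\|y - X\\theta\\|_2^2$, the least-squares objective.\n",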
    "\n",
    "This closest vector is obtained by orthogonally projecting $y$ onto the column space of $X$. The resulting vector $\\hat{y} = X\\theta$ represents the best linear approximation to $y$ within $\\text{Col}(X)$.\n",
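    "\n",
    "The sketch below illustrates this numerically with hypothetical random data, assuming NumPy. It uses `np.linalg.lstsq` to obtain the minimizing $\\theta$, compares $X\\theta$ with $Py$, where $P = XX^{+}$ is the projection matrix built from the Moore-Penrose pseudoinverse `np.linalg.pinv`, and checks that randomly chosen alternatives $X\\theta'$ are never closer to $y$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical data: a generic y almost surely lies outside Col(X).\n",
    "rng = np.random.default_rng(1)\n",
    "X = rng.normal(size=(6, 2))\n",
    "y = rng.normal(size=6)\n",
    "\n",
    "# Least-squares coefficients; X @ theta_ls is the orthogonal projection of y onto Col(X).\n",
    "theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)\n",
    "y_hat = X @ theta_ls\n",
    "\n",
    "# The same projection via the projection matrix P = X X^+ (X^+ is the pseudoinverse).\n",
    "P = X @ np.linalg.pinv(X)\n",
    "print(np.allclose(P @ y, y_hat))  # True\n",
    "\n",
    "# No randomly chosen X @ theta_other is closer to y than y_hat.\n",
    "best = np.linalg.norm(y - y_hat)\n",
    "others = [np.linalg.norm(y - X @ rng.normal(size=2)) for _ in range(1000)]\n",
    "print(best <= min(others))  # True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [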
    "\n",
    "## Orthogonality of the Residual\n",
    "\n",
    "Let the residual vector be defined as\n",
    "\n",
    "$$r = y - \\hat{y} = y - X\\theta.$$\n",
    "\n",
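    "A fundamental property of orthogonal projection is that the residual vector is orthogonal to the subspace onto which the projection is made. The residual $r$ is therefore orthogonal to every vector in $\\text{Col}(X)$, and in particular to each column $x_j$ of $X$, so $x_j^\\top r = 0$ for every $j$. Stacking these inner products into a single matrix-vector product gives\n",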
    "\n",
    "$$X^\\top r = 0.$$\n",
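    "\n",
    "A quick numerical check, again with hypothetical random data and NumPy assumed: after solving the least-squares problem, the entries of $X^\\top r$ are zero up to floating-point round-off, and $r$ is also orthogonal to an arbitrary vector $Xv$ in $\\text{Col}(X)$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical data for illustration.\n",
    "rng = np.random.default_rng(2)\n",
    "X = rng.normal(size=(8, 3))\n",
    "y = rng.normal(size=8)\n",
    "\n",
    "# Least-squares fit and its residual.\n",
    "theta, *_ = np.linalg.lstsq(X, y, rcond=None)\n",
    "r = y - X @ theta\n",
    "\n",
    "# X^T r stacks the inner products of r with each column of X; all are ~0.\n",
    "print(X.T @ r)\n",
    "print(np.allclose(X.T @ r, 0.0))  # True\n",
    "\n",
    "# Hence r is also orthogonal to any vector in Col(X), e.g. X @ v for an arbitrary v.\n",
    "v = rng.normal(size=3)\n",
    "print(np.isclose(r @ (X @ v), 0.0))  # True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [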
    "\n",
    "## Derivation of the Normal Equation\n",
    "\n",
    "From the orthogonality condition\n",
    "\n",
    "$$X^\\top r = 0,$$\n",
    "\n",
    "and using the definition of the residual $r = y - X\\theta$, we obtain\n",
    "\n",
    "$$X^\\top (y - X\\theta) = 0.$$\n",
    "\n",
    "Expanding the expression gives\n",
    "\n",
    "$$X^\\top y - X^\\top X\\theta = 0.$$\n",
    "\n",
    "Rearranging terms yields the normal equation\n",
    "\n",
    "$$X^\\top X\\theta = X^\\top y.$$"
   ]
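  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When $X$ has full column rank, $X^\\top X$ is invertible and the normal equation has the unique solution $\\theta = (X^\\top X)^{-1} X^\\top y$. The sketch below (hypothetical random data, NumPy assumed) solves the normal equation with `np.linalg.solve` instead of forming the inverse explicitly, and compares the result with `np.linalg.lstsq`. Forming $X^\\top X$ squares the condition number of the problem, so for ill-conditioned $X$ an SVD- or QR-based solver such as `lstsq` is usually the more robust choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical data with more examples than features, so X has full column rank almost surely.\n",
    "rng = np.random.default_rng(3)\n",
    "X = rng.normal(size=(10, 3))\n",
    "y = rng.normal(size=10)\n",
    "\n",
    "# Solve the normal equation X^T X theta = X^T y without forming an explicit inverse.\n",
    "theta_normal = np.linalg.solve(X.T @ X, X.T @ y)\n",
    "\n",
    "# Reference solution from NumPy's SVD-based least-squares solver.\n",
    "theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)\n",
    "\n",
    "print(np.allclose(theta_normal, theta_lstsq))  # True"
   ]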
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.x"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}