# **Equation of a line:**

> ![](https://cdn1.byjus.com/wp-content/uploads/2021/02/Line-Segment-4.png)

A line is the shortest path between two points, extended infinitely in both directions.

A line has a constant **`slope`** (rate of change) everywhere.

**Given two points:** $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$

The slope measures how much $y$ changes for each unit change in $x$. It is defined when $x_2 \neq x_1$, that is, when the line is not vertical.

> **slope** = **rise/run** = $\frac{y_2 - y_1}{x_2 - x_1}$

> ![](https://i.ytimg.com/vi/jlkE4VCnhdE/maxresdefault.jpg)

**Let's call this:** 
> $$m = \frac{y_2 - y_1}{x_2 - x_1}$$

**What this means:** If I move $1$ unit to the right $(\Delta x = 1)$, I move $m$ units up $(\Delta y = m)$. When $m$ is negative, the line goes down instead.

> ![](https://www.mathplanet.com/Oldsite/media/38362/slope01.png)
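
The slope formula above can be sketched in a few lines of Python. The function name `slope` and the sample points are illustrative choices, not part of the original notes; the vertical-line check is an added assumption about how to handle $x_1 = x_2$.

```python
def slope(p1, p2):
    """Return the slope (rise / run) of the line through points p1 and p2."""
    x1, y1 = p1
    x2, y2 = p2
    run = x2 - x1
    if run == 0:
        # A vertical line has no finite slope; the formula would divide by zero.
        raise ValueError("slope is undefined for a vertical line (x1 == x2)")
    return (y2 - y1) / run


# Illustrative points: rise = 7 - 3 = 4, run = 3 - 1 = 2.
m = slope((1, 3), (3, 7))
print(m)  # 2.0
```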

#### **Equation for ANY Point on the Line:**

For ANY point $(x, y)$ on the line, the slope from $P_1$ to that point must equal $m$.

Using point $P_1 = (x_1, y_1)$ and a general point $P = (x, y)$:

> $$\frac{y - y_1}{x - x_1} = m$$

This ratio is defined for every point $P \neq P_1$ on the line, since it would divide by zero at $x = x_1$.

Multiply both sides by $(x - x_1)$:

> $y - y_1 = m(x - x_1)$

The change in $y$ equals the slope times the change in $x$.

Expand the right side:

> $y - y_1 = m x - m x_1$  

Add $y_1$ to both sides:

> $y = m x - m x_1 + y_1$

**Rearrange:**

> $y = m x + (y_1 - m x_1)$ 

#### **Define the Intercept:** 

The term $(y_1 - m x_1)$ is a constant. Let's call it **$b$**:

> $b = y_1 - m x_1$  

**What is $b$ here?** It's the $y$-value where the line crosses the y-axis (when $x = 0$):
- When $x = 0$: $y = m(0) + b = b$

**Physical meaning:** $b$ is the `"starting height"` of the line when $x = 0$.
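
**Worked example:** as an illustration (the points below are chosen for this example and do not come from the figures), take $P_1 = (1, 3)$ and $P_2 = (3, 7)$:

> $$m = \frac{7 - 3}{3 - 1} = 2, \qquad b = y_1 - m x_1 = 3 - 2 \cdot 1 = 1$$

So the line is $y = 2x + 1$. Checking the second point: $2 \cdot 3 + 1 = 7$, as expected.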

> ![](https://media.geeksforgeeks.org/wp-content/uploads/20230529095322/X-and-Y-Intercepts-of-a-Line.png)

**The Famous Form:**

> $y = mx + b$ 

Where:
- **$m$** = slope (how steep the line is)
- **$b$** = y-intercept (where the line crosses the y-axis)
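
The whole derivation, from two points to $y = mx + b$, can be sketched as a small helper. The name `line_through` and the sample points are hypothetical; the sketch assumes a non-vertical line ($x_1 \neq x_2$).

```python
def line_through(p1, p2):
    """Return (m, b) such that y = m*x + b passes through p1 and p2."""
    x1, y1 = p1
    x2, y2 = p2
    m = (y2 - y1) / (x2 - x1)   # slope; assumes x1 != x2 (non-vertical line)
    b = y1 - m * x1             # intercept, from b = y1 - m*x1
    return m, b


m, b = line_through((1, 3), (3, 7))
for x, y in [(1, 3), (3, 7)]:
    # Both defining points must satisfy y = m*x + b.
    assert m * x + b == y
print(m, b)
```

Evaluating the line at $x = 0$ returns `b` itself, which matches the intercept interpretation above.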

**Alternative notation (commonly used in machine learning):**

> $y = wx + b$ 

Where **$w$** stands for `"weight"` instead of $m$ for `"slope"` (same concept, different name).

> ![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRvTMau2WDuwaxNOVKmoptXK7VMyGpbBkIuqA&s)

#### **From $1D$ Input to $nD$ Input:**

**Current equation:** $y = wx + b$ (one input $x$, one output $y$)

What if we have multiple inputs: $x_1, x_2, x_3, \dots, x_n$?

If each input contributes independently to the output, then:

> $$y = w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n + b$$
> $$y = \sum_{i=1}^{n} w_i x_i + b$$

Each input $x_i$ has its own weight $w_i$ that determines how much it contributes to $y$. Then we add the intercept $b$.
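
A minimal sketch of this weighted sum, written as an explicit loop so each term $w_i x_i$ is visible. The function name `linear_output` and the numeric values are illustrative assumptions.

```python
def linear_output(weights, inputs, bias):
    """Compute y = w_1*x_1 + ... + w_n*x_n + b with an explicit loop."""
    if len(weights) != len(inputs):
        raise ValueError("each input needs exactly one weight")
    total = 0.0
    for w_i, x_i in zip(weights, inputs):
        total += w_i * x_i   # contribution of input i
    return total + bias


# Illustrative values: three inputs, three weights, one bias.
# 2.0*1.0 + (-1.0)*3.0 + 0.5*4.0 + 1.0
y = linear_output([2.0, -1.0, 0.5], [1.0, 3.0, 4.0], bias=1.0)
print(y)  # 2.0
```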

-----
----
-----

## **Equation of Line in Vector Form:**

We already know the equation of a line:

> $y = wx + b$ 

Here:

* $x$ → input value
* $w$ → slope (how strongly $x$ affects $y$)
* $b$ → intercept (baseline shift)

This is a **`scalar equation`** (single input, single weight).

Real problems rarely depend on just one variable. Suppose the output depends on **`two inputs`**:

> $y = w_1 x_1 + w_2 x_2 + b$ 

Conceptually:   
   * Each input contributes independently
   * Each contribution is scaled by its own importance

The weighted sum $w_1 x_1 + w_2 x_2$ is already a **`linear combination`** of the inputs; the bias $b$ then shifts the result.

#### **Generalize to $n$ inputs (explicit summation):**

For $n$ inputs:

> $y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$ 

This can be written compactly using summation notation:

> $y = \sum_{i=1}^{n} w_i x_i + b$

At this point:

* $x_i$ = components of the input
* $w_i$ = corresponding weights

#### **Introduce vectors (grouping related quantities):**

Instead of treating each input separately, we **group them into vectors**.

Define the **input vector**:

> $\mathbf{x} =
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{bmatrix}$ 

Define the **weight vector**:

> $\mathbf{w} =
\begin{bmatrix}
w_1 \\
w_2 \\
\vdots \\
w_n
\end{bmatrix}$ 

Conceptually:     
   * A vector represents a **point or direction** in space
   * Grouping allows geometric interpretation

#### **Recognize the dot product:**

The dot product of two vectors is defined as:

> $\mathbf{w}^\top \mathbf{x} = \sum_{i=1}^{n} w_i x_i$ 

So the summation we already had **`is exactly a dot product`**.

This is not a trick; it is simply the definition.

#### **Replace the summation with dot product notation:**

Substitute into the equation:

> $y = \mathbf{w}^\top \mathbf{x} + b$ 

This is the **vector form** of:

> $y = wx + b$
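
To make the substitution concrete, here is a small sketch in plain Python. The helper names `dot` and `predict` are hypothetical, and the values are illustrative. It checks that the vector form reduces to the scalar line when $n = 1$ and matches the written-out sum when $n = 3$.

```python
def dot(w, x):
    """Dot product w^T x of two equal-length sequences."""
    if len(w) != len(x):
        raise ValueError("vectors must have the same length")
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def predict(w, x, b):
    """Vector form of the line: y = w^T x + b."""
    return dot(w, x) + b


# With n = 1 the vector form is the scalar line y = w*x + b.
assert predict([2.0], [3.0], 1.0) == 2.0 * 3.0 + 1.0

# With n = 3 it matches the written-out sum term by term.
w, x, b = [2.0, -1.0, 0.5], [1.0, 3.0, 4.0], 1.0
assert predict(w, x, b) == w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b
```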

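#### **Why we write $\mathbf{w}^\top \mathbf{x}$ and not $\mathbf{w}\mathbf{x}$:**

By convention, vectors are **`columns`**.   
   * $\mathbf{x}$ is $n \times 1$
   * $\mathbf{w}$ is $n \times 1$

The product of two $n \times 1$ columns is not defined, because the inner dimensions ($1$ and $n$) do not match.

**To multiply them:**   
   * $\mathbf{w}^\top$ is $1 \times n$
   * $\mathbf{w}^\top \mathbf{x}$ is $(1 \times n)(n \times 1) = 1 \times 1$, a scalar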

This preserves:   
   * Dimensional correctness
   * Mathematical consistency
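
The shape argument can be checked with a tiny matrix-product sketch. Matrices are stored here as lists of rows, and the helpers `shape`, `transpose` and `matmul` are hypothetical names written for this illustration rather than a library API.

```python
def shape(matrix):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return len(matrix), len(matrix[0])


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Matrix product; requires columns of a == rows of b."""
    if shape(a)[1] != shape(b)[0]:
        raise ValueError(f"cannot multiply {shape(a)} by {shape(b)}")
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col))
             for col in zip(*b)]
            for row in a]


# Column vectors with n = 3, stored as 3 x 1 matrices (illustrative values).
w = [[2.0], [-1.0], [0.5]]
x = [[1.0], [3.0], [4.0]]

print(shape(w), shape(x))            # both n x 1
print(shape(transpose(w)))           # 1 x n
print(matmul(transpose(w), x))       # 1 x 1 result holding the scalar

try:
    matmul(w, x)                     # (3 x 1)(3 x 1): inner dimensions differ
except ValueError as err:
    print("w x is not defined:", err)
```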

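#### **Geometric meaning of $\mathbf{w}^\top \mathbf{x}$:**

The dot product can also be written in terms of lengths and an angle:

> $\mathbf{w}^\top \mathbf{x} = \lVert \mathbf{w} \rVert \, \lVert \mathbf{x} \rVert \cos \theta$ 

where $\theta$ is the angle between $\mathbf{w}$ and $\mathbf{x}$.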

So it captures:   
   * Alignment between input and weight direction
   * Strength of match

In neural terms:   
> “How well does this input match what the neuron is looking for?”
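
A short numerical sketch of this "alignment" reading, using only the standard library. The helper names and the three test inputs are illustrative assumptions. The dot product is largest when the input points along the weight direction, zero when it is perpendicular, and negative when it points the opposite way.

```python
import math


def dot(w, x):
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def norm(v):
    """Euclidean length ||v||."""
    return math.sqrt(dot(v, v))


def cosine(w, x):
    """cos(theta) between w and x; assumes neither vector is all zeros."""
    return dot(w, x) / (norm(w) * norm(x))


w = [1.0, 0.0]                 # illustrative weight direction
aligned = [3.0, 0.0]           # same direction as w
orthogonal = [0.0, 2.0]        # perpendicular to w
opposite = [-1.0, 0.0]         # opposite direction

for x in (aligned, orthogonal, opposite):
    # The dot product equals ||w|| ||x|| cos(theta) up to rounding.
    assert math.isclose(dot(w, x), norm(w) * norm(x) * cosine(w, x), abs_tol=1e-12)
    print(x, dot(w, x), cosine(w, x))
```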

#### **Role of the bias $b$:**

**The bias:**   
   * Shifts the output up or down
   * Moves the decision boundary away from the origin

Geometrically:   
> $\mathbf{w}^\top \mathbf{x} + b = 0$ 

defines a **shifted hyperplane**.
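
As a sketch of how the bias moves the boundary, the snippet below reports which side of $\mathbf{w}^\top \mathbf{x} + b = 0$ a point falls on. The function name `side`, the weights and the test points are illustrative assumptions.

```python
def dot(w, x):
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def side(w, x, b):
    """Return +1, 0 or -1 depending on which side of w^T x + b = 0 x lies."""
    value = dot(w, x) + b
    return (value > 0) - (value < 0)


w = [1.0, 1.0]   # illustrative weights: boundary is x1 + x2 + b = 0

# With b = 0 the boundary passes through the origin.
assert side(w, [0.0, 0.0], b=0.0) == 0

# With b = -1 the boundary shifts to x1 + x2 = 1; the origin is now on the negative side.
assert side(w, [0.0, 0.0], b=-1.0) == -1
assert side(w, [0.5, 0.5], b=-1.0) == 0    # on the shifted boundary
assert side(w, [1.0, 1.0], b=-1.0) == 1    # positive side
```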

---------------
----
----

## **Augmentation:**

We begin with the standard linear model:

> $y = \mathbf{w}^\top \mathbf{x} + b$ 
 
where:
- $\mathbf{w} = [w_1, w_2, \dots, w_n]^\top$ is the weight column vector
- $\mathbf{x} = [x_1, x_2, \dots, x_n]^\top$  is the input column vector
- $b$ is the bias term (scalar)

Write out the dot product explicitly:

> $y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$ 

We can treat the bias $b$ as an additional weight $w_0$ if we add a corresponding input feature with a constant value of $1$. Let's define:

> $w_0 = b$ 

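We place the constant $1$ **at the beginning** of the input vector, so that it lines up with $w_0$. (Appending it at the end works equally well, as long as $b$ sits in the matching position of the weight vector.)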

**Augmented Weight Vector:**

> $\tilde{\mathbf{w}} = \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}$ 

where $w_0 = b$

**Augmented Input Vector:**

> $\tilde{\mathbf{x}} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ 

Now the equation becomes:

> $y = \tilde{\mathbf{w}}^\top \tilde{\mathbf{x}}$ 

**Let's verify this:**

> $\tilde{\mathbf{w}}^\top \tilde{\mathbf{x}} = [w_0, w_1, w_2, \dots, w_n] \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ 
> 
> $= w_0 \cdot 1 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$ 
> 
> $= b \cdot 1 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$ 
> 
> $= w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$ 

Which matches our original equation!
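
The same verification can be run numerically. This is a minimal sketch with hypothetical helper names (`augment_weights`, `augment_input`) and illustrative values.

```python
def dot(w, x):
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def augment_weights(w, b):
    """Prepend the bias so that w_0 = b."""
    return [b] + list(w)


def augment_input(x):
    """Prepend the constant feature 1."""
    return [1.0] + list(x)


# Illustrative values.
w, x, b = [2.0, -1.0, 0.5], [1.0, 3.0, 4.0], 1.0

original = dot(w, x) + b
augmented = dot(augment_weights(w, b), augment_input(x))
assert original == augmented
print(original, augmented)
```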

#### **Original Form:**

> $y = \mathbf{w}^\top \mathbf{x} + b$ 
> 
> $\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ 

**Augmented Form (with 1 at beginning):**

> $y = \tilde{\mathbf{w}}^\top \tilde{\mathbf{x}}$ 

> $\tilde{\mathbf{w}} = \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} b \\ w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \quad \tilde{\mathbf{x}} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ 

#### **Key Advantages:**
1. **`Unified representation`**: The bias is now just another weight

2. **`Simplified notation`**: No separate "+ b" term

3. **`Matrix operations`**: Easier to work with in linear algebra and optimization

4. **`Consistency`**: All parameters are in one vector

5. **`Gradient computation`**: Derivatives become cleaner in machine learning

This augmented form is common in machine learning implementations: a column of ones is prepended to the data matrix $X$, which absorbs the bias term into the weight vector.
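
For a whole data matrix, the idea can be sketched as follows. Plain Python lists stand in for a matrix here, and `add_ones_column` plus the data values are illustrative assumptions; a numerical library would normally do this step.

```python
def dot(w, x):
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def add_ones_column(X):
    """Return a copy of X with a leading 1 in every row."""
    return [[1.0] + list(row) for row in X]


# Illustrative data: 3 samples (rows), 2 features (columns).
X = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, -1.0]]
w, b = [2.0, 0.5], -1.0

w_aug = [b] + w                    # bias absorbed as w_0
X_aug = add_ones_column(X)         # shape goes from 3 x 2 to 3 x 3

with_bias = [dot(w, row) + b for row in X]
absorbed = [dot(w_aug, row) for row in X_aug]
assert with_bias == absorbed
print(absorbed)
```

Each row of `X_aug` is one augmented input $\tilde{\mathbf{x}}$, so every prediction is a single dot product with $\tilde{\mathbf{w}}$.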

> **Vector notation expresses a linear function as a dot product between input and weight vectors, allowing a single equation to scale from simple lines to high-dimensional decision boundaries.**