### Vector operations

#### Addition and subtraction
1. Let $\vec{u} = (u_1, u_2)$ and $\vec{v} = (v_1, v_2)$. Then $\vec{u} + \vec{v} = (u_1 + v_1,\, u_2 + v_2)$ and $\vec{u} - \vec{v} = (u_1 - v_1,\, u_2 - v_2)$.

<img src="./addition_subtraction.gif">
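
As a small illustration of the component-wise rule above, here is a minimal Python sketch. The helper names `add` and `sub` and the sample vectors are made up for this example; vectors are represented as plain tuples.

```python
def add(u, v):
    """Component-wise sum of two vectors of equal length."""
    return tuple(ui + vi for ui, vi in zip(u, v))


def sub(u, v):
    """Component-wise difference of two vectors of equal length."""
    return tuple(ui - vi for ui, vi in zip(u, v))


# Hypothetical sample vectors in R^2.
u = (3, 1)
v = (1, 2)

print(add(u, v))  # (4, 3)
print(sub(u, v))  # (2, -1)
```

The same two functions work unchanged for vectors of any dimension, as long as both arguments have the same length.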

#### Dot product (scalar product, inner product)
1. The dot product gives a number as an answer (a 'scalar', not a vector).
2. The dot product is written using a central dot:
$$
\vec{a} \bullet \vec{b}
$$
3. We can calculate the dot product of two vectors in this way:
$$
\vec{a} \bullet \vec{b} = |\vec{a}| |\vec{b}| \cos(\theta)
$$

<img src="./dot_product.gif">

$$
|\vec{a}| \text{ is the magnitude (length) of vector a} \\
|\vec{b}| \text{ is the magnitude (length) of vector b} \\
\theta \text{ is the angle between a and b}
$$
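
The formula above can be checked numerically. The sketch below assumes the standard component form of the dot product, $\vec{a} \bullet \vec{b} = \sum_i a_i b_i$ (not stated in the text above), and uses it to recover the angle $\theta$. The function names and sample vectors are hypothetical.

```python
import math


def dot(a, b):
    """Dot product in component form: sum of pairwise products."""
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    """Magnitude (length) of a vector."""
    return math.sqrt(dot(a, a))


def angle_between(a, b):
    """Angle in radians, from a . b = |a| |b| cos(theta)."""
    cos_theta = dot(a, b) / (norm(a) * norm(b))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


# Hypothetical sample vectors: b sits at 45 degrees to a.
a = (3.0, 0.0)
b = (2.0, 2.0)
theta = angle_between(a, b)

print(dot(a, b))                             # 6.0
print(math.degrees(theta))                   # close to 45
print(norm(a) * norm(b) * math.cos(theta))   # close to 6.0
```

Both sides of the formula agree up to floating-point rounding, and the result is a single number rather than a vector.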

#### Cross product
1. The cross product $\vec{a} \times \vec{b}$ is another vector that is at right angles to both $\vec{a}$ and $\vec{b}$:
<img src="cross_product_0.gif">
<center>And it all happens in 3 dimensions!</center>
    
2. We can calculate the cross product in this way:
$$
\vec{a} \times \vec{b} = |\vec{a}| |\vec{b}| \sin(\theta) \, \vec{n}
$$
<img src="cross_product_1.gif">
$$
|\vec{a}| \text{ is the magnitude (length) of vector a} \\
|\vec{b}| \text{ is the magnitude (length) of vector b} \\
\theta \text{ is the angle between vector a and vector b} \\
\vec{n} \text{ is the unit vector at right angles to both vector a and vector b}
$$

3. Or we can calculate the cross product in another way:
$$
\vec{a} = (a_x, a_y, a_z) \\
\vec{b} = (b_x, b_y, b_z)
$$
<img src="cross_product_2.jpeg" width="50%" height="auto">
$$
\vec{c} = \vec{a} \times \vec{b} = (a_yb_z - a_zb_y,\, a_zb_x - a_xb_z,\, a_xb_y - a_yb_x)
$$

    **Question: how do we extend this to the cross product of vectors with four or more dimensions, as in the right part of the figure above?**


4. Which direction?

    The cross product could point in the completely opposite direction and still be at right angles to the two other vectors, so we have the **"Right Hand Rule"**:
  
        With your right hand, point your index finger along vector a, and point your middle finger along vector b: the cross product goes in the direction of your thumb.
    <img src="right_hand_rule.jpg">
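
The component formula and the direction rule above can be tried out with a short Python sketch. The names `cross` and `dot` and the choice of the x and y axes as inputs are assumptions made for this illustration.

```python
def cross(a, b):
    """Cross product of two 3-dimensional vectors, component form."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)


def dot(a, b):
    """Dot product, used here only to check perpendicularity."""
    return sum(ai * bi for ai, bi in zip(a, b))


x_axis = (1, 0, 0)
y_axis = (0, 1, 0)

c = cross(x_axis, y_axis)
print(c)                               # (0, 0, 1)
print(dot(c, x_axis), dot(c, y_axis))  # 0 0

# Swapping the operands flips the direction.
print(cross(y_axis, x_axis))           # (0, 0, -1)
```

The result is perpendicular to both inputs (both dot products are zero), and swapping the operands gives the opposite vector, which is why a convention such as the right hand rule is needed to fix the direction.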

### Normal equation

#### Let's use the same sample dataset from my other post, [Multivariable linear regression(gradient descent)](https://lnshi.github.io/ml-exercises/ml_basics_in_html/rdm001_multivariable_linear_regression_gradient_descent/multivariable_linear_regression_gradient_descent.html#Lets-say-we-have-sample-data-set:), to state the normal equation first:

1. We are still trying to find the fitting equation below that minimises $\sum\limits_{i=1}^n\varepsilon_i^2$:
  
    $$
    y_\theta(x_1, x_2, \dots, x_m) = \
    \begin{pmatrix}1 & x_1 & x_2 & \dots & x_m\end{pmatrix} \theta, \,
    \theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_m \end{pmatrix}
    $$

2. Normal equation:
    
    $$
    \text{Let matrix }X = \
    \begin{pmatrix}
      1 & x_1^{(1)} & x_2^{(1)} & \dots & x_m^{(1)} \\
      1 & x_1^{(2)} & x_2^{(2)} & \dots & x_m^{(2)} \\
      \vdots \\
      1 & x_1^{(n)} & x_2^{(n)} & \dots & x_m^{(n)}
    \end{pmatrix}, \,
    \text{and matrix } y = \
    \begin{pmatrix}
      y^{(1)} \\
      y^{(2)} \\
      \vdots \\
      y^{(n)}
    \end{pmatrix}
    $$
    
    $$
    \text{then matrix } \theta = (X^TX)^{-1}X^Ty
    $$
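
To make the formula concrete, here is a minimal pure-Python sketch. The data set below is hypothetical (it is not the data from the linked post), and the helper names are made up. Instead of forming $(X^TX)^{-1}$ explicitly, the sketch solves the equivalent linear system $X^TX\theta = X^Ty$ by Gauss-Jordan elimination, which gives the same $\theta$ whenever $X^TX$ is invertible.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(m, rhs):
    """Solve m x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in m[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def normal_equation(X, y):
    """Return theta satisfying (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
# The first column of X is the constant 1 that multiplies theta_0.
X = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 2],
    [1, 3, 2],
]
y = [6, 8, 9, 13]

theta = normal_equation(X, y)
print(theta)  # expected to be close to [1.0, 2.0, 3.0]
```

Because the sample data contains no noise, the recovered $\theta$ should match the generating coefficients up to floating-point rounding; with noisy data it would instead be the least-squares fit.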
    
#### How do we get the normal equation?

1. Let's look at the simplest example, in $R^2$:
    
    Example 1: as in the figure below, there are two vectors in $R^2$; try to find a constant $\theta$ such that $\theta\vec{a} = \vec{b}$.
    <img src="normal_equation_0.jpg">
      
    Clearly there is no **perfect solution**, because $\vec{a}$ and $\vec{b}$ are not collinear.
      
      
2. Let's look at another example, in $R^3$:
  
    Example 2: as in the figure below, there are three vectors in $R^3$; try to find a combination of $\theta_1 \text{ and } \theta_2$ such that $\theta_1\vec{a_1} + \theta_2\vec{a_2} = \vec{b}$.
    <img src="normal_equation_1.jpg">
      
    Clearly this one has no **perfect solution** either, because $\vec{b}$ does not lie in the plane spanned by $\vec{a_1}$ and $\vec{a_2}$.
      
      
3. In practice, nearly all cases look like the two above: there is no **perfect solution**. So how do we find a **best solution** that minimises the error? This is exactly what we did in 'multivariable linear regression', where we looked for the fitting equation that minimises $\sum\limits_{i=1}^n\varepsilon_i^2$.
  
4. Projection
  
    In 'Example 1' above, the best solution is to ignore the component of $\vec{b}$ that is perpendicular to $\vec{a}$ and keep only the component along $\vec{a}$ (the orthogonal projection of $\vec{b}$ onto $\vec{a}$), that is, $\vec{p} = \theta^*\vec{a}$, as shown in the figure below:
    <img src="normal_equation_2.jpg">
      
    The original problem $\theta\vec{a} = \vec{b}$ then becomes: find a $\theta^*$ such that $\theta^*\vec{a} = \vec{p}$ ( **$\theta^*$ is the best estimator of $\theta$** ).
      
    Let $\vec{e} = \vec{b} - \vec{p}$ be the error vector. Since $\vec{e} \perp \vec{a}$:
      
    $$
    \begin{align*}
      &\vec{a} \bullet (\vec{b} - \vec{p}) = 0 \\
      &\Rightarrow \vec{a} \bullet (\vec{b} - \theta^*\vec{a}) = 0 \\
      &\Rightarrow \vec{a}^T (\vec{b} - \theta^*\vec{a}) = 0 \text{ (written with }\vec{a}^T \text{ so that it extends naturally to matrices in higher dimensions)} \\
      &\Rightarrow \vec{a}^T \vec{b} = \theta^*\vec{a}^T \vec{a} \\
      &\Rightarrow \theta^* = \frac{\vec{a}^T \vec{b}}{\vec{a}^T \vec{a}}
    \end{align*}
    $$
      
    Let's see how to extend the result we just obtained to 'Example 2' above:
      
    In 'Example 2' above, the best solution is to ignore the component of $\vec{b}$ that is perpendicular to plane P and keep only the component that lies in plane P (the orthogonal projection of $\vec{b}$ onto plane P), that is:
      
    $$
    \vec{p} = \begin{pmatrix}\vec{a_1} & \vec{a_2}\end{pmatrix}\theta^*, \, \
    \theta^* = \
    \begin{pmatrix}
      \theta_1^* \\
      \theta_2^*
    \end{pmatrix}
    \quad
    (\, \text{that is: }\vec{p} = \theta_1^*\vec{a_1} + \theta_2^*\vec{a_2} \,)
    $$
      
    as shown in the figure below:
    <img src="normal_equation_3.jpg">
      
    The original problem $\theta_1\vec{a_1} + \theta_2\vec{a_2} = \vec{b}$ then becomes: find a $\theta^* = \begin{pmatrix}\theta_1^* \\ \theta_2^*\end{pmatrix}$ such that $\vec{p} = \begin{pmatrix}\vec{a_1} & \vec{a_2}\end{pmatrix}\theta^*$ ( **$\theta^*$ is the best estimator of $\theta$** ).
      
    ***
    ***
    <center>Let's verify something very basic first</center>
      
    Say that in $R^3$ we have three basis vectors $\vec{a_1} = (a_{1x}, a_{1y}, a_{1z})$, $\vec{a_2} = (a_{2x}, a_{2y}, a_{2z})$ and $\vec{a_3} = (a_{3x}, a_{3y}, a_{3z})$, and constants $\theta_1$, $\theta_2$ and $\theta_3$ such that $\vec{p} = \theta_1\vec{a_1} + \theta_2\vec{a_2} + \theta_3\vec{a_3}$.
      
    Most straightforward calculation: 
    $$
    \begin{align*}
      \vec{p} &= \theta_1\vec{a_1} + \theta_2\vec{a_2} + \theta_3\vec{a_3} \\
      &= \theta_1(a_{1x}, a_{1y}, a_{1z}) + \theta_2(a_{2x}, a_{2y}, a_{2z}) + \theta_3(a_{3x}, a_{3y}, a_{3z}) \\
      &= (\theta_1a_{1x} + \theta_2a_{2x} + \theta_3a_{3x}, \theta_1a_{1y} + \theta_2a_{2y} + \theta_3a_{3y}, \theta_1a_{1z} + \theta_2a_{2z} + \theta_3a_{3z})
    \end{align*}
    $$
      
    Matrix way:
    $$
    \text{Let matrix }A = \
    \begin{pmatrix}
      | & | & | \\
      \vec{a_1} & \vec{a_2} & \vec{a_3} \\
      | & | & |
    \end{pmatrix}
    = \begin{pmatrix}
      a_{1x} & a_{2x} & a_{3x} \\
      a_{1y} & a_{2y} & a_{3y} \\
      a_{1z} & a_{2z} & a_{3z}
    \end{pmatrix}, \,
    \text{and matrix }\theta = \
    \begin{pmatrix}
      \theta_1 \\
      \theta_2 \\
      \theta_3
    \end{pmatrix}
    $$
      
    $$
    \begin{align*}
      \text{then }\vec{p} &= A\theta \\
      &= (\theta_1a_{1x} + \theta_2a_{2x} + \theta_3a_{3x}, \theta_1a_{1y} + \theta_2a_{2y} + \theta_3a_{3y}, \theta_1a_{1z} + \theta_2a_{2z} + \theta_3a_{3z})
    \end{align*}
    $$
      
    ***
    ***
      
    Let's continue extending the result from 'Example 1' ($R^2$) to 'Example 2' ($R^3$):
    
    $$
    \text{Let matrix }A = \begin{pmatrix}| & | \\ \vec{a_1} & \vec{a_2} \\ | & |\end{pmatrix}, \, \
    \text{and matrix }\theta^* = \begin{pmatrix}\theta_1^* \\ \theta_2^*\end{pmatrix} \
    \quad \
    (\text{you may have noticed that }|A\theta^* - \vec{b}|^2 \text{ is the }\sum\limits_{i=1}^n\varepsilon_i^2 \text{ we tried to minimise in the least squares method})
    $$
      
    The equation $\theta\vec{a} = \vec{b}$ from $R^2$ extends in $R^3$ to $A\vec{\theta} = \vec{b}$;

    and correspondingly, $\theta^*\vec{a} = \vec{p}$ from $R^2$ extends in $R^3$ to $A\vec{\theta^*} = \vec{p}$.
      
    Since $\vec{e} = \vec{b} - \vec{p}$ is perpendicular to plane P, it is perpendicular to both $\vec{a_1}$ and $\vec{a_2}$, so:
    
    $$
    \begin{cases}
      a_1^T(\vec{b} - A\vec{\theta^*}) = 0 \\
      a_2^T(\vec{b} - A\vec{\theta^*}) = 0
    \end{cases}
    $$
    
    $$
    \begin{align*}
      &\Rightarrow \begin{pmatrix}- a_1^T - \\ - a_2^T -\end{pmatrix}(\vec{b} - A\vec{\theta^*}) = 0 \\
      &\Rightarrow A^T(\vec{b} - A\vec{\theta^*}) = 0 \\
      &\text{ (recall that in }R^2 \text{ we had }\vec{a}^T(\vec{b} - \theta^*\vec{a}) = 0; \text{ that is just a special case of this one, where }\vec{a} \text{ is treated as a matrix with a single column)} \\
      &\Rightarrow A^T\vec{b} = A^TA\vec{\theta^*} \\
      &\Rightarrow \vec{\theta^*} = (A^TA)^{-1}A^T\vec{b}
    \end{align*}
    $$
    
    This is the same result as the normal equation stated earlier, with $A$ playing the role of $X$ and $\vec{b}$ the role of $y$.
    
    Because the matrix A is generally not square, it has no inverse in general, so we cannot simplify the result to $A^{-1}\vec{b}$.
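
The two projection results derived above, $\theta^* = \frac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}$ for a line and $A^TA\vec{\theta^*} = A^T\vec{b}$ for a plane, can be sketched in plain Python. The function names and the sample vectors are hypothetical and are not taken from the figures. For the plane case, the 2×2 system is solved with the closed-form inverse of $A^TA$.

```python
def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project_onto_line(a, b):
    """Return (theta*, p, e) for the projection of b onto the line along a."""
    theta = dot(a, b) / dot(a, a)
    p = [theta * ai for ai in a]
    e = [bi - pi for bi, pi in zip(b, p)]
    return theta, p, e


def project_onto_plane(a1, a2, b):
    """Solve A^T A theta = A^T b for A = [a1 a2]; return (theta*, p, e)."""
    # Entries of the symmetric 2x2 matrix A^T A and of the vector A^T b.
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, b), dot(a2, b)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("a1 and a2 are collinear: A^T A is not invertible")
    t1 = (g22 * r1 - g12 * r2) / det
    t2 = (g11 * r2 - g12 * r1) / det
    p = [t1 * x + t2 * y for x, y in zip(a1, a2)]
    e = [bi - pi for bi, pi in zip(b, p)]
    return (t1, t2), p, e


# 'Example 1' style: project b onto a single vector a in R^2.
a, b = [2.0, 0.0], [3.0, 4.0]
theta, p, e = project_onto_line(a, b)
print(theta, p, e)   # 1.5 [3.0, 0.0] [0.0, 4.0]
print(dot(a, e))     # 0.0, so e is perpendicular to a

# 'Example 2' style: project b onto the plane spanned by a1 and a2 in R^3.
a1, a2, b3 = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 3.0, 5.0]
(t1, t2), p3, e3 = project_onto_plane(a1, a2, b3)
print(t1, t2)                    # -1.0 3.0
print(p3, e3)                    # [2.0, 3.0, 0.0] [0.0, 0.0, 5.0]
print(dot(a1, e3), dot(a2, e3))  # 0.0 0.0
```

In both cases the error vector $\vec{e}$ has zero dot product with every column of $A$, which is exactly the condition $A^T(\vec{b} - A\vec{\theta^*}) = 0$ used in the derivation.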
      

### Credit to [掰开揉碎推导Normal Equation](https://zhuanlan.zhihu.com/p/22757336)