## Unscaling Regression Coefficients

The relationship between the coefficients and the dependent variable can be expressed in matrix form as:
$$
\mathbf{Y} = \mathbf{X}\mathbf{w}
$$
For simplicity, the analysis below uses a dataset with two predictors and two records to derive the relationship between the scaled and unscaled coefficients.
This means $X$ can be expressed as:
$$
X = \begin{bmatrix} x_{11} & x_{12}\\
x_{21} & x_{22}
\end{bmatrix}
$$

To absorb the intercept into the matrix product, we prepend a column of ones to $X$:
$$
X = \begin{bmatrix} 1 & x_{11} & x_{12}\\
1 & x_{21} & x_{22}
\end{bmatrix}
$$
With the unscaled coefficients introduced, we get:
$$
\begin{bmatrix} 
1 & x_{11} & x_{12}\\
1 & x_{21} & x_{22}
\end{bmatrix}\begin{bmatrix} 
w_{0}\\ 
w_{1}\\
w_{2}
\end{bmatrix} = 
\begin{bmatrix} 
y_{1}\\
y_{2}
\end{bmatrix}
$$ 
Here $w_{0}$ is the intercept, while $w_{1}$ and $w_{2}$ are the coefficients corresponding to the first and second predictors of $X$. Expanding the product gives the relationship between the unscaled coefficients, $w$, the dependent variable, $Y$, and the independent variables, $X$:

$$
y_{1} = w_{0} + x_{11}w_{1} + x_{12}w_{2}
$$ 
$$
y_{2} = w_{0} + x_{21}w_{1} + x_{22}w_{2}
$$
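
As a minimal sketch of the unscaled system, the NumPy snippet below builds the matrix with a leading column of ones and multiplies it by a coefficient vector. The data values and the coefficients are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical two-record, two-predictor dataset (illustrative values only).
X = np.array([[2.0, 10.0],
              [4.0, 30.0]])

# Hypothetical unscaled coefficients: w0 (intercept), w1, w2.
w = np.array([1.0, 0.5, 0.1])

# Prepend a column of ones so the intercept is absorbed into the product.
X_design = np.column_stack([np.ones(X.shape[0]), X])

# Each entry is w0 + x_i1 * w1 + x_i2 * w2 for record i.
y = X_design @ w
print(y)
```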

To scale the $\mathbf{X}$ matrix, we subtract each column's mean from its entries and divide by that column's standard deviation, which both centers and scales the predictors. Here $\mu_{j}$ and $\sigma_{j}$ denote the mean and standard deviation of column $j$.

The scaled matrix $X_{s}$ is:
$$
X_{s} = \begin{bmatrix} \frac{x_{11} - \mu_{1}}{\sigma_{1}} & \frac{x_{12} - \mu_{2}}{\sigma_{2}}\\
\frac{x_{21} - \mu_{1}}{\sigma_{1}} & \frac{x_{22} - \mu_{2}}{\sigma_{2}}
\end{bmatrix}
$$
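
The centering and scaling step can be sketched in NumPy as follows. The data are hypothetical, and the population standard deviation (`ddof=0`, NumPy's default) is an assumption; the derivation only requires that the same $\mu_{j}$ and $\sigma_{j}$ are used later when unscaling.

```python
import numpy as np

# Hypothetical two-record, two-predictor dataset (illustrative values only).
X = np.array([[2.0, 10.0],
              [4.0, 30.0]])

mu = X.mean(axis=0)     # column means, one per predictor
sigma = X.std(axis=0)   # column standard deviations (ddof=0 assumed)

# Broadcasting applies each column's mean and standard deviation row by row.
X_s = (X - mu) / sigma
print(X_s)
```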

Absorbing the intercept into the matrix as before gives the matrix below. We then look for new, scaled coefficients $v$ which, when multiplied by $X_{s}$, produce the same values of the dependent variable.
$$
X_{s} = \begin{bmatrix} 1 & \frac{x_{11} - \mu_{1}}{\sigma_{1}} & \frac{x_{12} - \mu_{2}}{\sigma_{2}}\\
1 & \frac{x_{21} - \mu_{1}}{\sigma_{1}} & \frac{x_{22} - \mu_{2}}{\sigma_{2}}
\end{bmatrix}
$$

$$
\begin{bmatrix} 1 & \frac{x_{11} - \mu_{1}}{\sigma_{1}} & \frac{x_{12} - \mu_{2}}{\sigma_{2}}\\
1 & \frac{x_{21} - \mu_{1}}{\sigma_{1}} & \frac{x_{22} - \mu_{2}}{\sigma_{2}}
\end{bmatrix}\begin{bmatrix} 
v_{0}\\ 
v_{1}\\
v_{2}
\end{bmatrix} = 
\begin{bmatrix} 
y_{1}\\
y_{2}
\end{bmatrix}
$$

This evaluates to:
$$
y_{1} = v_{0} + \frac{x_{11} - \mu_{1}}{\sigma_{1}}v_{1} + \frac{x_{12} - \mu_{2}}{\sigma_{2}}v_{2}
$$


$$
y_{2} = v_{0} + \frac{x_{21} - \mu_{1}}{\sigma_{1}}v_{1} + \frac{x_{22} - \mu_{2}}{\sigma_{2}}v_{2}
$$

Collecting the constant terms and the terms in each predictor, we obtain:
$$
y_{1} = (v_{0} - \frac{\mu_{1}}{\sigma_{1}}v_{1} - \frac{\mu_{2}}{\sigma_{2}}v_{2}) + \frac{v_{1}}{\sigma_{1}}x_{11} + \frac{v_{2}}{\sigma_{2}}x_{12}
$$

$$
y_{2} = (v_{0} - \frac{\mu_{1}}{\sigma_{1}}v_{1} - \frac{\mu_{2}}{\sigma_{2}}v_{2}) + \frac{v_{1}}{\sigma_{1}}x_{21} + \frac{v_{2}}{\sigma_{2}}x_{22}
$$



This can be expressed in matrix form as:

$$
\begin{bmatrix} 1 & x_{11} & x_{12}\\
1 & x_{21} & x_{22}
\end{bmatrix}\begin{bmatrix} 
v_{0} - \frac{\mu_{1}}{\sigma_{1}}v_{1} - \frac{\mu_{2}}{\sigma_{2}}v_{2}\\ 
\frac{v_{1}}{\sigma_{1}}\\
\frac{v_{2}}{\sigma_{2}}
\end{bmatrix} = 
\begin{bmatrix} 
y_{1}\\
y_{2}
\end{bmatrix}
$$

Comparing this with the unscaled system, in which the same matrix multiplies $w$, the unscaled coefficients can be expressed in terms of the scaled coefficients:

$$
w_{0} = v_{0} - (\frac{\mu_{1}}{\sigma_{1}}v_{1} + \frac{\mu_{2}}{\sigma_{2}}v_{2})
$$

$$
w_{1} = \frac{v_{1}}{\sigma_{1}}
$$

$$
w_{2} = \frac{v_{2}}{\sigma_{2}}
$$
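
A quick numerical check of the three formulas above, written as a sketch: the data and the scaled coefficients `v0`, `v1`, `v2` are hypothetical, and the check simply confirms that both parameterizations produce the same dependent variable values.

```python
import numpy as np

# Hypothetical two-record, two-predictor dataset (illustrative values only).
X = np.array([[2.0, 10.0],
              [4.0, 30.0]])
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_s = (X - mu) / sigma

# Hypothetical scaled coefficients.
v0, v1, v2 = 5.0, 2.0, -3.0

# Unscaled coefficients from the relationships derived above.
w0 = v0 - (mu[0] / sigma[0] * v1 + mu[1] / sigma[1] * v2)
w1 = v1 / sigma[0]
w2 = v2 / sigma[1]

# Both forms should give the same y for every record.
y_scaled = v0 + X_s[:, 0] * v1 + X_s[:, 1] * v2
y_unscaled = w0 + X[:, 0] * w1 + X[:, 1] * w2
print(np.allclose(y_scaled, y_unscaled))
```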


More generally, with $n$ predictors:

$$
w_{0} = v_{0} - \begin{bmatrix}\frac{\mu_{1}}{\sigma_{1}} & \frac{\mu_{2}}{\sigma_{2}}
& \frac{\mu_{3}}{\sigma_{3}} & \dots & \frac{\mu_{n}}{\sigma_{n}} \end{bmatrix}\begin{bmatrix}v_{1}\\
v_{2}\\
v_{3}\\
\vdots \\
v_{n}
\end{bmatrix} 
$$

$$
\begin{bmatrix}
w_{1}\\
w_{2}\\
w_{3}\\
\vdots \\
w_{n}
\end{bmatrix} = \begin{bmatrix}
\frac{v_{1}}{\sigma_{1}} \\
\frac{v_{2}}{\sigma_{2}} \\
\frac{v_{3}}{\sigma_{3}} \\
\vdots \\
\frac{v_{n}}{\sigma_{n}}
\end{bmatrix}
$$
Now that we have a relationship between the scaled and unscaled coefficients, we can recover the unscaled coefficients from the output of a later analysis performed on scaled data, such as PCA. This relationship lets us transform the scaled coefficients into their unscaled counterparts.
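
The $n$-predictor relationship can be sketched as a small helper. The function name `unscale_coefficients`, the synthetic data, and the use of ordinary least squares via `np.linalg.lstsq` are all assumptions made for illustration; the check compares the unscaled coefficients against a direct fit on the unscaled predictors.

```python
import numpy as np

def unscale_coefficients(v0, v, mu, sigma):
    """Hypothetical helper: map coefficients fitted on standardized
    predictors (v0, v) back to the original predictor scale (w0, w)."""
    v = np.asarray(v, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    w = v / sigma                 # w_j = v_j / sigma_j
    w0 = v0 - (mu / sigma) @ v    # w_0 = v_0 - sum_j (mu_j / sigma_j) v_j
    return w0, w

# Synthetic data with three predictors on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, -5.0, 100.0], scale=[2.0, 0.5, 30.0], size=(50, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.05]) + rng.normal(scale=0.1, size=50)

mu, sigma = X.mean(axis=0), X.std(axis=0)
X_s = (X - mu) / sigma

# Least-squares fit on the scaled predictors (intercept column prepended).
A_s = np.column_stack([np.ones(len(X_s)), X_s])
coef_scaled, *_ = np.linalg.lstsq(A_s, y, rcond=None)
w0, w = unscale_coefficients(coef_scaled[0], coef_scaled[1:], mu, sigma)

# Direct least-squares fit on the unscaled predictors, for comparison.
A = np.column_stack([np.ones(len(X)), X])
coef_direct, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(np.concatenate([[w0], w]), coef_direct))
```

The helper only needs the means and standard deviations that were used for scaling, so those should be stored alongside the scaled coefficients.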