```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

# Use the 'bmh' plotting style and print arrays compactly
style_name = 'bmh'
mpl.style.use(style_name)
np.set_printoptions(precision=4, linewidth=150)

# Extract the color cycle defined by the style
style = plt.style.library[style_name]
style_colors = [c['color'] for c in style['axes.prop_cycle']]

print(style_colors)
```

```
['#348ABD', '#A60628', '#7A68A6', '#467821', '#D55E00', '#CC79A7', '#56B4E9', '#009E73', '#F0E442', '#0072B2']
```


# Differentiation with Respect to Vectors and Matrices
<hr/>



## Scalar-Vector, Scalar-Matrix, and Vector-Vector Derivatives

### Derivative of a scalar with respect to a scalar

- This is the ordinary case from elementary calculus, so no separate explanation is needed.

### Derivative of a scalar with respect to a vector

- $x$: scalar, $\mathbf{x}: n \times 1$

- Numerator layout

$$
\frac{\partial \, x}{\partial \, \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial x}{\partial x_{1}} & \dfrac{\partial x}{\partial x_{2}} & \cdots & \dfrac{\partial x}{\partial x_{n}}
\end{bmatrix}
$$

- Denominator layout

$$
\frac{\partial \, x}{\partial \, \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial x}{\partial x_{1}} \\
\dfrac{\partial x}{\partial x_{2}} \\
\vdots \\
\dfrac{\partial x}{\partial x_{n}} \\
\end{bmatrix}
$$

- Choosing between numerator and denominator layout is a matter of convention; what matters is using one consistently.

### Derivative of a scalar with respect to a matrix

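- $\mathbf{X} : m \times n$ (the result below has the same $m \times n$ shape as $\mathbf{X}$, the convention used by the Matrix Cookbook; numerator layout would give its $n \times m$ transpose)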

$$
\frac{\partial \, x}{\partial \,  \mathbf{X}} =
\begin{bmatrix}
\dfrac{\partial x}{\partial X_{11}} & \dfrac{\partial x}{\partial X_{12}} & \cdots & \dfrac{\partial x}{\partial X_{1n}} \\
\dfrac{\partial x}{\partial X_{21}} & \dfrac{\partial x}{\partial X_{22}} & \cdots & \dfrac{\partial x}{\partial X_{2n}} \\
\vdots                              & \vdots                              & \ddots & \vdots                              \\
\dfrac{\partial x}{\partial X_{m1}} & \dfrac{\partial x}{\partial X_{m2}} & \cdots & \dfrac{\partial x}{\partial X_{mn}}
\end{bmatrix}
$$


### Derivative of a vector with respect to a scalar

- Numerator layout

$$
\frac{\partial \, \mathbf{x}}{\partial \, x} =
\begin{bmatrix}
\dfrac{\partial x_{1}}{\partial x} \\
\dfrac{\partial x_{2}}{\partial x} \\
\vdots \\
\dfrac{\partial x_{n}}{\partial x} 
\end{bmatrix}
$$

- Denominator layout

$$
\frac{\partial \, \mathbf{x}}{\partial \, x} =
\begin{bmatrix}
\dfrac{\partial x_{1}}{\partial x} &
\dfrac{\partial x_{2}}{\partial x} &
\cdots &
\dfrac{\partial x_{n}}{\partial x} 
\end{bmatrix}
$$

### Derivative of a vector with respect to a vector

- $\mathbf{f} : m \times 1$, $\mathbf{x} : n \times 1$

- Numerator layout (this is the Jacobian matrix)

$$
\frac{\partial \, \mathbf{f}}{\partial \, \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial f_{1}}{\partial x_{1}} & \dfrac{\partial f_{1}}{\partial x_{2}} & \cdots & \dfrac{\partial f_{1}}{\partial x_{n}} \\
\dfrac{\partial f_{2}}{\partial x_{1}} & \dfrac{\partial f_{2}}{\partial x_{2}} & \cdots & \dfrac{\partial f_{2}}{\partial x_{n}} \\
\vdots                                 & \vdots                                 & \ddots & \vdots                                 \\
\dfrac{\partial f_{m}}{\partial x_{1}} & \dfrac{\partial f_{m}}{\partial x_{2}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{n}}
\end{bmatrix}
$$

- Denominator layout

$$
\frac{\partial \, \mathbf{f}}{\partial \, \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial f_{1}}{\partial x_{1}} & \dfrac{\partial f_{2}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{1}} \\
\dfrac{\partial f_{1}}{\partial x_{2}} & \dfrac{\partial f_{2}}{\partial x_{2}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{2}} \\
\vdots                                 & \vdots                                 & \ddots & \vdots                                 \\
\dfrac{\partial f_{1}}{\partial x_{n}} & \dfrac{\partial f_{2}}{\partial x_{n}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{n}}
\end{bmatrix}
$$



### Derivative of a matrix with respect to a scalar

-  $\mathbf{X} : m \times n$

$$
\frac{\partial \, \mathbf{X}}{\partial \, x} =
\begin{bmatrix}
\dfrac{\partial X_{11}}{\partial x} & \dfrac{\partial X_{12}}{\partial x} & \cdots & \dfrac{\partial X_{1n}}{\partial x} \\
\dfrac{\partial X_{21}}{\partial x} & \dfrac{\partial X_{22}}{\partial x} & \cdots & \dfrac{\partial X_{2n}}{\partial x} \\
\vdots                              & \vdots                              & \ddots & \vdots                              \\
\dfrac{\partial X_{m1}}{\partial x} & \dfrac{\partial X_{m2}}{\partial x} & \cdots & \dfrac{\partial X_{mn}}{\partial x}
\end{bmatrix}
$$

## New operators

- We define the operators needed to differentiate a vector with respect to a matrix, or a matrix with respect to a matrix.

### Kronecker product<sup>[1]</sup>



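- Given $\mathbf{A}: p \times q$ and $\mathbf{B}: r \times s$, their Kronecker product is the following $pr \times qs$ matrix, in which each entry $a_{ij}$ of $\mathbf{A}$ is replaced by the block $a_{ij}\mathbf{B}$: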

$$
\mathbf{A} \otimes \mathbf{B} = \{ a_{ij}\mathbf{B} \}
$$

If $\mathbf{A}$ is $2 \times 2$, this becomes:

$$
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} \end{bmatrix}
$$

- A concrete example

$$
\mathbf{A} = \begin{bmatrix} \color{RoyalBlue}{3} & \color{OrangeRed}{5} \\ \color{YellowGreen}{9} & \color{Goldenrod}{7} \end{bmatrix} \qquad \text{and} \qquad \mathbf{b} = \begin{bmatrix} 4 \\ 5\\ 6 \end{bmatrix}
$$

$$
\mathbf{A} \otimes \mathbf{b} = \begin{bmatrix}
\color{RoyalBlue}{3 \cdot 4} & \color{OrangeRed}{5 \cdot 4} \\
\color{RoyalBlue}{3 \cdot 5} & \color{OrangeRed}{5 \cdot 5} \\
\color{RoyalBlue}{3 \cdot 6} & \color{OrangeRed}{5 \cdot 6} \\
\color{YellowGreen}{9 \cdot 4} & \color{Goldenrod}{7 \cdot 4} \\
\color{YellowGreen}{9 \cdot 5} & \color{Goldenrod}{7 \cdot 5} \\
\color{YellowGreen}{9 \cdot 6} & \color{Goldenrod}{7 \cdot 6} 
\end{bmatrix}=  \begin{bmatrix}
\color{RoyalBlue}{12} & \color{OrangeRed}{20} \\
\color{RoyalBlue}{15} & \color{OrangeRed}{25} \\
\color{RoyalBlue}{18} & \color{OrangeRed}{30} \\
\color{YellowGreen}{36} & \color{Goldenrod}{28} \\
\color{YellowGreen}{45} & \color{Goldenrod}{35} \\
\color{YellowGreen}{54} & \color{Goldenrod}{42} 
\end{bmatrix}
$$
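The worked example can be checked with NumPy's `np.kron`, which computes the Kronecker product. A minimal sketch, treating $\mathbf{b}$ as a $3 \times 1$ column array:

```python
import numpy as np

A = np.array([[3, 5],
              [9, 7]])
b = np.array([[4], [5], [6]])   # 3 x 1 column vector

# Kronecker product: each a_ij is replaced by the block a_ij * b
K = np.kron(A, b)               # shape (2*3) x (2*1) = 6 x 2
print(K)
```

The printed rows are `[12 20]`, `[15 25]`, `[18 30]`, `[36 28]`, `[45 35]`, `[54 42]`, matching the matrix above.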


### vec and the vec transpose<sup>[2]</sup>

- vec vectorizes a matrix by stacking its columns, one below the other, into a single column vector.

$$
\text{vec}\left( \begin{bmatrix} \color{RoyalBlue}{a_{11}} & \color{OrangeRed}{a_{12}} \\ \color{RoyalBlue}{a_{21}} & \color{OrangeRed}{a_{22}} \end{bmatrix} \right) = 
\begin{bmatrix} \color{RoyalBlue}{a_{11} \\ a_{21}} \\ \color{OrangeRed}{a_{12} \\ a_{22}} \end{bmatrix}
$$
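In NumPy, vec amounts to reading the matrix in column-major (Fortran) order. A minimal sketch; the helper name `vec` is our own, and the entries $a_{ij}$ are encoded as the integers `11, 12, 21, 22` purely for readability:

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one column vector (column-major order)."""
    return A.reshape(-1, 1, order='F')

A = np.array([[11, 12],
              [21, 22]])   # entry a_ij encoded as the integer "ij"
print(vec(A).ravel())      # column order: a_11, a_21, a_12, a_22
```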

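- vec transpose: a generalization of the transpose. $\mathbf{A}^{(r)}$ splits $\mathbf{A}$ into consecutive blocks of $r$ rows and turns the vec of each block into one column of the result: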

$$
\begin{bmatrix}
\color{RoyalBlue}{a_{11}}   & \color{Goldenrod}{a_{12}} \\
\color{RoyalBlue}{a_{21}}   & \color{Goldenrod}{a_{22}} \\
\color{OrangeRed}{a_{31}}   & \color{Violet}{a_{32}} \\
\color{OrangeRed}{a_{41}}   & \color{Violet}{a_{42}} \\
\color{YellowGreen}{a_{51}} & \color{Emerald}{a_{52}} \\
\color{YellowGreen}{a_{61}} & \color{Emerald}{a_{62}} \\
\end{bmatrix}^{(2)} =
\begin{bmatrix}
\color{RoyalBlue}{a_{11}} & \color{OrangeRed}{a_{31}} & \color{YellowGreen}{a_{51}} \\
\color{RoyalBlue}{a_{21}} & \color{OrangeRed}{a_{41}} & \color{YellowGreen}{a_{61}} \\
\color{Goldenrod}{a_{12}} & \color{Violet}{a_{32}} &  \color{Emerald}{a_{52}} \\
\color{Goldenrod}{a_{22}} & \color{Violet}{a_{42}} &  \color{Emerald}{a_{62}}
\end{bmatrix}
$$

$$
\begin{bmatrix}
\color{RoyalBlue}{a_{11}}   & \color{Goldenrod}{a_{12}} \\
\color{RoyalBlue}{a_{21}}   & \color{Goldenrod}{a_{22}} \\
\color{RoyalBlue}{a_{31}}   & \color{Goldenrod}{a_{32}} \\
\color{OrangeRed}{a_{41}}   & \color{Violet}{a_{42}} \\
\color{OrangeRed}{a_{51}} & \color{Violet}{a_{52}} \\
\color{OrangeRed}{a_{61}} & \color{Violet}{a_{62}} \\
\end{bmatrix}^{(3)} =
\begin{bmatrix}
\color{RoyalBlue}{a_{11}} & \color{OrangeRed}{a_{41}} \\
\color{RoyalBlue}{a_{21}} & \color{OrangeRed}{a_{51}} \\
\color{RoyalBlue}{a_{31}} & \color{OrangeRed}{a_{61}} \\
\color{Goldenrod}{a_{12}} & \color{Violet}{a_{42}} \\
\color{Goldenrod}{a_{22}} & \color{Violet}{a_{52}} \\
\color{Goldenrod}{a_{32}} & \color{Violet}{a_{62}}
\end{bmatrix}
$$

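- The (1)-transpose is the ordinary transpose: $\mathbf{A}^{(1)} = \mathbf{A}^{\text{T}}$

- The (number of rows)-transpose is vec: $\mathbf{A}^{(\text{rows}(\mathbf{A}))} = \text{vec}(\mathbf{A})$

- The order $r$ in $\mathbf{A}^{(r)}$ must be a positive integer that divides the number of rows of $\mathbf{A}$.

- Hence a row vector admits only the (1)-transpose.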
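The vec transpose is easy to sketch in NumPy by slicing $\mathbf{A}$ into $r$-row blocks and placing the vec of each block side by side. This is an illustrative implementation under that reading of the definition; `vec_transpose` is a hypothetical helper, not a NumPy function. Encoding $a_{ij}$ as the integer `10*i + j` reproduces the two $6 \times 2$ examples above:

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one column vector."""
    return A.reshape(-1, 1, order='F')

def vec_transpose(A, r):
    """Hypothetical helper for A^(r): split A into consecutive blocks of r rows
    and make vec(block) one column of the result, shape (r*n, m//r)."""
    m, n = A.shape
    if r <= 0 or m % r != 0:
        raise ValueError("r must be a positive divisor of the number of rows")
    blocks = [A[k * r:(k + 1) * r, :] for k in range(m // r)]
    return np.hstack([vec(B) for B in blocks])

# 6 x 2 matrix whose entry a_ij is encoded as the integer 10*i + j
A = np.array([[10 * i + j for j in (1, 2)] for i in range(1, 7)])

print(vec_transpose(A, 2))   # columns: (a11,a21,a12,a22), (a31,a41,a32,a42), ...
print(vec_transpose(A, 3))
```

With `r = 1` the function returns `A.T`, and with `r = 6` it returns `vec(A)`, matching the two special cases listed above.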

## Derivative of a vector with respect to a matrix

- $\mathbf{x} : p \times 1$
- $\mathbf{X} : m \times n$
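- Using the Kronecker product, the problem reduces to differentiating the vector with respect to each scalar entry $X_{ij}$; each resulting vector is laid out in numerator layout (as a column), giving an $mp \times n$ matrix.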

$$
\begin{align}
\frac{\partial \, \mathbf{x}}{\partial \, \mathbf{X}} = \frac{\partial}{\partial \, \mathbf{X}} \otimes \mathbf{x} 
&= \begin{bmatrix}
\dfrac{\partial}{\partial \, X_{11}} & \dfrac{\partial}{\partial \, X_{12}} & \cdots & \dfrac{\partial}{\partial \, X_{1n}} \\
\dfrac{\partial}{\partial \, X_{21}} & \dfrac{\partial}{\partial \, X_{22}} & \cdots & \dfrac{\partial}{\partial \, X_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial}{\partial \, X_{m1}} & \dfrac{\partial}{\partial \, X_{m2}} & \cdots & \dfrac{\partial}{\partial \, X_{mn}}
\end{bmatrix} \otimes \mathbf{x} \\[5pt]
&=\begin{bmatrix}
\dfrac{\partial \, \mathbf{x}}{\partial \, X_{11}} & \dfrac{\partial \, \mathbf{x}}{\partial \, X_{12}} & \cdots & \dfrac{\partial \, \mathbf{x}}{\partial \, X_{1n}} \\
\dfrac{\partial \, \mathbf{x}}{\partial \, X_{21}} & \dfrac{\partial \, \mathbf{x}}{\partial \, X_{22}} & \cdots & \dfrac{\partial \, \mathbf{x}}{\partial \, X_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial \, \mathbf{x}}{\partial \, X_{m1}} & \dfrac{\partial \, \mathbf{x}}{\partial \, X_{m2}} & \cdots & \dfrac{\partial \, \mathbf{x}}{\partial \, X_{mn}}
\end{bmatrix} \\[5pt]
&=\begin{bmatrix}
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{11}} \\ \dfrac{\partial x_2}{\partial \, X_{11}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{11}}  \end{pmatrix} & 
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{12}} \\ \dfrac{\partial x_2}{\partial \, X_{12}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{12}}  \end{pmatrix} &
\cdots &
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{1n}} \\ \dfrac{\partial x_2}{\partial \, X_{1n}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{1n}}  \end{pmatrix} \\
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{21}} \\ \dfrac{\partial x_2}{\partial \, X_{21}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{21}}  \end{pmatrix} & 
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{22}} \\ \dfrac{\partial x_2}{\partial \, X_{22}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{22}}  \end{pmatrix} &
\cdots &
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{2n}} \\ \dfrac{\partial x_2}{\partial \, X_{2n}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{2n}}  \end{pmatrix} \\
\vdots & \vdots & \ddots & \vdots \\
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{m1}} \\ \dfrac{\partial x_2}{\partial \, X_{m1}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{m1}}  \end{pmatrix} & 
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{m2}} \\ \dfrac{\partial x_2}{\partial \, X_{m2}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{m2}}  \end{pmatrix} &
\cdots &
\begin{pmatrix} \dfrac{\partial x_1}{\partial \, X_{mn}} \\ \dfrac{\partial x_2}{\partial \, X_{mn}} \\ \vdots \\ \dfrac{\partial x_p}{\partial \, X_{mn}}  \end{pmatrix} \\
\end{bmatrix} 
\end{align}
$$

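- Alternatively, applying vec to the denominator turns this into a vector-by-vector derivative (verified in the example below).

- Both approaches apply in the same way to the derivative of a matrix with respect to a matrix.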

### Example<sup>[3]</sup>
- $\mathbf{X} : m \times n $, $\mathbf{b} : n \times 1$, $\mathbf{Xb} : m \times 1$

- Differentiating with respect to the matrix $\mathbf{X}$ directly

$$
\begin{align}
\frac{\partial \, \mathbf{Xb}}{\partial \, \mathbf{X}} 
&= \frac{\partial }{\partial \mathbf{X}} \otimes \mathbf{Xb} \\[5pt]
&= \begin{bmatrix}
\dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{11}} & \dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{12}} & \cdots & \dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{1n}} \\
\dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{21}} & \dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{22}} & \cdots & \dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{m1}} & \dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{m2}} & \cdots & \dfrac{\partial \, \mathbf{Xb}}{\partial \, X_{mn}}
\end{bmatrix} \\[5pt]
&= \begin{bmatrix}
\begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{11}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{11}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{11}} \end{pmatrix} & \begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{12}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{12}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{12}} \end{pmatrix} & \cdots & \begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{1n}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{1n}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{1n}} \end{pmatrix} \\
\begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{21}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{21}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{21}} \end{pmatrix} & \begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{22}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{22}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{22}} \end{pmatrix} & \cdots & \begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{2n}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{2n}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{2n}} \end{pmatrix} \\
\vdots & \vdots & \ddots & \vdots \\
\begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{m1}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{m1}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{m1}} \end{pmatrix} & \begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{m2}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{m2}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{m2}} \end{pmatrix} & \cdots & \begin{pmatrix} \dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{mn}}\\\dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{mn}}\\\vdots\\\dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{mn}} \end{pmatrix}
\end{bmatrix} \\[5pt]
&= \begin{bmatrix}
\begin{pmatrix} b_1 \\ 0   \\ \vdots \\ 0 \end{pmatrix} & 
\begin{pmatrix} b_2 \\ 0   \\ \vdots \\ 0 \end{pmatrix} & \cdots & 
\begin{pmatrix} b_n \\ 0   \\ \vdots \\ 0 \end{pmatrix} \\
\begin{pmatrix} 0   \\ b_1 \\ \vdots \\ 0 \end{pmatrix} & 
\begin{pmatrix} 0   \\ b_2 \\ \vdots \\ 0 \end{pmatrix} & \cdots & 
\begin{pmatrix} 0   \\ b_n \\ \vdots \\ 0 \end{pmatrix} \\
\vdots & \vdots & \ddots & \vdots \\
\begin{pmatrix} 0   \\ 0 \\ \vdots \\ b_1 \end{pmatrix} & 
\begin{pmatrix} 0   \\ 0 \\ \vdots \\ b_2 \end{pmatrix} & \cdots & 
\begin{pmatrix} 0   \\ 0 \\ \vdots \\ b_n \end{pmatrix} 
\end{bmatrix}
\end{align}
$$
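A numerical sanity check of this block pattern, as a sketch assuming NumPy: each block $\partial \, \mathbf{Xb} / \partial X_{jk}$ is estimated by central differences and compared with $b_k \mathbf{e}_j$, where $\mathbf{e}_j$ (our notation) is the $j$-th unit vector. The helper `partial` is an illustrative name, not a library function:

```python
import numpy as np

def partial(f, X, j, k, h=1e-6):
    """Central-difference estimate of d f(X) / d X_jk (hypothetical helper)."""
    E = np.zeros_like(X)
    E[j, k] = h
    return (f(X + E) - f(X - E)) / (2 * h)

rng = np.random.default_rng(0)
m, n = 3, 4
X = rng.standard_normal((m, n))
b = rng.standard_normal(n)
f = lambda M: M @ b                     # Xb, an m-vector

# (d/dX) kron (Xb): an m x n grid of m x 1 blocks -> shape (m*m, n)
D = np.zeros((m * m, n))
for j in range(m):
    for k in range(n):
        D[j*m:(j+1)*m, k] = partial(f, X, j, k)

# Pattern from the derivation: block (j, k) equals b_k times e_j
expected = np.zeros((m * m, n))
for j in range(m):
    expected[j*m + j, :] = b

print(np.allclose(D, expected, atol=1e-6))
```

Because $\mathbf{Xb}$ is linear in $\mathbf{X}$, the central difference is exact up to floating-point rounding.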

- Differentiating with respect to $\text{vec}(\mathbf{X})$ instead (denominator layout)

$$
\begin{align}
\frac{\partial \, \mathbf{Xb}}{\partial \, \mathbf{X}} &= \frac{\partial \, \mathbf{Xb}}{\partial \, \left(\text{vec}(\mathbf{X}) \right)} \\[5pt]
&= \begin{bmatrix}
\dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{11}} & \dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{11}} & \cdots & \dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{11}} \\
\dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{21}} & \dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{21}} & \cdots & \dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{21}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial \, (\mathbf{Xb})_{1}}{\partial \, X_{mn}} & \dfrac{\partial \, (\mathbf{Xb})_{2}}{\partial \, X_{mn}} & \cdots & \dfrac{\partial \, (\mathbf{Xb})_{m}}{\partial \, X_{mn}}
\end{bmatrix} \\[5pt]
&= \begin{bmatrix}
b_1 & 0 & \cdots & 0 \\
0   & b_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & b_1 \\ 
b_2 & 0 & \cdots & 0 \\
0   & b_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & b_2 \\ - & - & - & - \\
\vdots & \vdots & \vdots & \vdots \\ - & - & - & - \\
b_n & 0 & \cdots & 0 \\
0   & b_n & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & b_n
\end{bmatrix}
\end{align}
$$

- Are the two results different? Not really: applying the $(m)$-transpose to the first result yields the second.

$$
\begin{bmatrix}
\begin{pmatrix} \color{RoyalBlue}{b_1 \\ 0   \\ \vdots \\ 0} \end{pmatrix} & 
\begin{pmatrix} \color{OrangeRed}{b_2 \\ 0   \\ \vdots \\ 0} \end{pmatrix} & \cdots & 
\begin{pmatrix} \color{YellowGreen}{b_n \\ 0   \\ \vdots \\ 0} \end{pmatrix} \\
\begin{pmatrix} \color{RoyalBlue}{0   \\ b_1 \\ \vdots \\ 0} \end{pmatrix} & 
\begin{pmatrix} \color{OrangeRed}{0   \\ b_2 \\ \vdots \\ 0} \end{pmatrix} & \cdots & 
\begin{pmatrix} \color{YellowGreen}{0   \\ b_n \\ \vdots \\ 0} \end{pmatrix} \\
\color{RoyalBlue}{\vdots} & \vdots & \ddots & \vdots \\
\begin{pmatrix} \color{RoyalBlue}{0   \\ 0 \\ \vdots \\ b_1} \end{pmatrix} & 
\begin{pmatrix} \color{OrangeRed}{0   \\ 0 \\ \vdots \\ b_2} \end{pmatrix} & \cdots & 
\begin{pmatrix} \color{YellowGreen}{0   \\ 0 \\ \vdots \\ b_n} \end{pmatrix} 
\end{bmatrix}^{(m)} = \begin{bmatrix}
\color{RoyalBlue}{b_1} & \color{RoyalBlue}{0} & \color{RoyalBlue}{\cdots} & \color{RoyalBlue}{0} \\
\color{RoyalBlue}{0}   & \color{RoyalBlue}{b_1} & \color{RoyalBlue}{\cdots} & \color{RoyalBlue}{0} \\
\color{RoyalBlue}{\vdots} & \color{RoyalBlue}{\vdots} & \color{RoyalBlue}{\ddots} & \color{RoyalBlue}{\vdots} \\
\color{RoyalBlue}{0} & \color{RoyalBlue}{0} & \color{RoyalBlue}{0} & \color{RoyalBlue}{b_1} \\ 
\color{OrangeRed}{b_2} & \color{OrangeRed}{0} & \color{OrangeRed}{\cdots} & \color{OrangeRed}{0} \\
\color{OrangeRed}{0}   & \color{OrangeRed}{b_2} & \color{OrangeRed}{\cdots} & \color{OrangeRed}{0} \\
\color{OrangeRed}{\vdots} & \color{OrangeRed}{\vdots} & \color{OrangeRed}{\ddots} & \color{OrangeRed}{\vdots} \\
\color{OrangeRed}{0} & \color{OrangeRed}{0} & \color{OrangeRed}{0} & \color{OrangeRed}{b_2} \\ - & - & - & - \\
\vdots & \vdots & \vdots & \vdots \\ - & - & - & - \\
\color{YellowGreen}{b_n} & \color{YellowGreen}{0} & \color{YellowGreen}{\cdots} & \color{YellowGreen}{0} \\
\color{YellowGreen}{0}   & \color{YellowGreen}{b_n} & \color{YellowGreen}{\cdots} & \color{YellowGreen}{0} \\
\color{YellowGreen}{\vdots} & \color{YellowGreen}{\vdots} & \color{YellowGreen}{\ddots} & \color{YellowGreen}{\vdots} \\
\color{YellowGreen}{0} & \color{YellowGreen}{0} & \color{YellowGreen}{0} & \color{YellowGreen}{b_n}
\end{bmatrix}
$$

- Therefore:

$$
\left( \frac{\partial }{\partial \mathbf{X}} \otimes \mathbf{Xb} \right)^{(m)} = \frac{\partial \, \mathbf{Xb}}{\partial \, \left(\text{vec}(\mathbf{X}) \right)} 
$$
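The identity can be checked numerically. In this sketch, assuming NumPy, both layouts are built by central differences from $f(\mathbf{X}) = \mathbf{Xb}$: the first as the block matrix $\frac{\partial}{\partial \mathbf{X}} \otimes \mathbf{Xb}$, the second as the denominator-layout Jacobian with respect to $\text{vec}(\mathbf{X})$. The helpers `vec_transpose` and `partial` are illustrative names of our own. The last line also checks that the stacked matrix of blocks $b_k \mathbf{I}_m$ shown above equals $\mathbf{b} \otimes \mathbf{I}_m$:

```python
import numpy as np

def vec_transpose(A, r):
    """A^(r): the vec of each consecutive r-row block of A becomes one column."""
    m, n = A.shape
    assert m % r == 0, "r must divide the number of rows"
    return np.hstack([A[k*r:(k+1)*r, :].reshape(-1, 1, order='F')
                      for k in range(m // r)])

def partial(f, X, j, k, h=1e-6):
    """Central-difference estimate of d f(X) / d X_jk (a 1-D vector)."""
    E = np.zeros_like(X)
    E[j, k] = h
    return (f(X + E) - f(X - E)) / (2 * h)

rng = np.random.default_rng(1)
m, n = 3, 4
X = rng.standard_normal((m, n))
b = rng.standard_normal(n)
f = lambda M: M @ b

# Layout 1: block (j, k) is the m x 1 vector d(Xb)/dX_jk -> (m*m) x n
first = np.zeros((m * m, n))
for j in range(m):
    for k in range(n):
        first[j*m:(j+1)*m, k] = partial(f, X, j, k)

# Layout 2: row = position of X_jk in vec(X) (column-major, index k*m + j),
# columns = components of Xb -> (m*n) x m
second = np.zeros((m * n, m))
for k in range(n):
    for j in range(m):
        second[k*m + j, :] = partial(f, X, j, k)

print(np.allclose(vec_transpose(first, m), second, atol=1e-6))
print(np.allclose(second, np.kron(b.reshape(-1, 1), np.eye(m)), atol=1e-6))
```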

### Product rule<sup>[1]</sup>

For $\mathbf{X} : m \times n$, $\mathbf{Y} : n \times r$, and $\mathbf{Z} : p \times q$, the derivative below is an $mp \times rq$ matrix:

$$
\frac{\partial \, (\mathbf{XY})}{\partial \, \mathbf{Z}} = \left( \frac{\partial \, \mathbf{X} }{\partial \, \mathbf{Z}} \right) \left( \mathbf{I}_{q} \otimes \mathbf{Y} \right) + \left( \mathbf{I}_{p} \otimes \mathbf{X} \right)\left( \frac{\partial \, \mathbf{Y}}{\partial \, \mathbf{Z}} \right)
$$

The product rule still holds when differentiating with respect to a matrix, but care is needed to make the dimensions conform.

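The reason the derivative is not simply

$$
\frac{\partial \, (\mathbf{XY})}{\partial \, \mathbf{Z}} = \left( \frac{\partial \, \mathbf{X} }{\partial \, \mathbf{Z}} \right)   \mathbf{Y}  +   \mathbf{X} \left( \frac{\partial \, \mathbf{Y}}{\partial \, \mathbf{Z}} \right)
$$

is that $\frac{\partial \, \mathbf{X} }{\partial \, \mathbf{Z}}$ is $mp \times nq$, so in general it cannot be multiplied directly by the $n \times r$ matrix $\mathbf{Y}$; likewise, the $m \times n$ matrix $\mathbf{X}$ cannot be multiplied directly by the $np \times rq$ matrix $\frac{\partial \, \mathbf{Y}}{\partial \, \mathbf{Z}}$.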

To see what form the right-multiplying $\mathbf{Y}$ must take for the product to work out, take $\mathbf{X} : 1 \times 2$, $\mathbf{Y} : 2 \times 1$, and $\mathbf{Z} : 2 \times 2$ as an example:

$$
\mathbf{X}\mathbf{Y} = 
\begin{bmatrix} \color{RoyalBlue}{X_1} & \color{OrangeRed}{X_2} \end{bmatrix} 
\begin{bmatrix} \color{RoyalBlue}{Y_1} \\ \color{OrangeRed}{Y_2} \end{bmatrix} = \color{RoyalBlue}{X_1} \color{RoyalBlue}{Y_1} + \color{OrangeRed}{X_2} \color{OrangeRed}{Y_2}
$$

That is, the product of $\mathbf{X}$ and $\mathbf{Y}$ pairs each $X_i$ with $Y_i$, giving terms of the form $X_i Y_i$.

For $\mathbf{Y}$ to multiply the derivative of $\mathbf{X}$ below so that each $\partial X_i$ is again paired with $Y_i$,

$$
\frac{\partial \, \mathbf{X} }{\partial \, \mathbf{Z}} =
\begin{bmatrix}
\dfrac{\partial \, \color{RoyalBlue}{X_1}}{\partial \, Z_{11}} & \dfrac{\partial \, \color{OrangeRed}{X_2}}{\partial \, Z_{11}} & \dfrac{\partial \, \color{RoyalBlue}{X_1}}{\partial \, Z_{12}} & \dfrac{\partial \, \color{OrangeRed}{X_2}}{\partial \, Z_{12}} \\
\dfrac{\partial \, \color{RoyalBlue}{X_1}}{\partial \, Z_{21}} & \dfrac{\partial \, \color{OrangeRed}{X_2}}{\partial \, Z_{21}} & \dfrac{\partial \, \color{RoyalBlue}{X_1}}{\partial \, Z_{22}} & \dfrac{\partial \, \color{OrangeRed}{X_2}}{\partial \, Z_{22}}
\end{bmatrix}
$$

$\mathbf{Y}$ must be expanded as follows:

$$
\begin{bmatrix}
\color{RoyalBlue}{Y_1} & 0 \\ \color{OrangeRed}{Y_2} & 0 \\ 0 & \color{RoyalBlue}{Y_1} \\ 0 & \color{OrangeRed}{Y_2}
\end{bmatrix} = \mathbf{I}_{2} \otimes \mathbf{Y}
$$
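The general rule can be checked numerically. This sketch, assuming NumPy, picks $\mathbf{X}(\mathbf{Z}) = \mathbf{A}\mathbf{Z}\mathbf{C}$ and $\mathbf{Y}(\mathbf{Z}) = \mathbf{P}\mathbf{Z}\mathbf{Q}$ (arbitrary choices of ours, so that both depend on $\mathbf{Z}$) and builds every derivative in the block layout $\frac{\partial}{\partial \mathbf{Z}} \otimes \mathbf{F}$ with central differences. `mat_deriv` is a hypothetical helper; since $\mathbf{XY}$ is quadratic in $\mathbf{Z}$, the central difference is exact up to rounding:

```python
import numpy as np

def mat_deriv(F, Z, h=1e-4):
    """Block layout dF/dZ = (d/dZ) kron F: block (i, j) is dF/dZ_ij.
    For F: p x q -> m x n the result is (m*p) x (n*q). Central differences."""
    p, q = Z.shape
    m, n = F(Z).shape
    D = np.zeros((m * p, n * q))
    for i in range(p):
        for j in range(q):
            E = np.zeros_like(Z)
            E[i, j] = h
            D[i*m:(i+1)*m, j*n:(j+1)*n] = (F(Z + E) - F(Z - E)) / (2 * h)
    return D

rng = np.random.default_rng(0)
m, n, r, p, q = 2, 3, 2, 2, 4
A, C = rng.standard_normal((m, p)), rng.standard_normal((q, n))
P, Q = rng.standard_normal((n, p)), rng.standard_normal((q, r))

X = lambda Z: A @ Z @ C      # m x n, depends on Z
Y = lambda Z: P @ Z @ Q      # n x r, depends on Z
Z0 = rng.standard_normal((p, q))

lhs = mat_deriv(lambda Z: X(Z) @ Y(Z), Z0)                      # (m*p) x (r*q)
rhs = (mat_deriv(X, Z0) @ np.kron(np.eye(q), Y(Z0))
       + np.kron(np.eye(p), X(Z0)) @ mat_deriv(Y, Z0))
print(lhs.shape, np.allclose(lhs, rhs))   # (4, 8) True
```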

### Some useful matrix derivative identities<sup>[4]</sup>

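- In machine learning, matrix derivatives come up in places such as the backpropagation algorithm and maximum likelihood estimation (MLE) for the normal distribution. Below are a few identities that are useful in those situations.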

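#### [1] $\dfrac{\partial \, \mathbf{X}^{-1}}{\partial \,  x} = -\mathbf{X}^{-1}\dfrac{\partial \, \mathbf{X}}{\partial \, x}\mathbf{X}^{-1}$  (matrix cookbook eq.59)

Differentiate both sides of $\mathbf{X}^{-1}\mathbf{X} = \mathbf{I}$ with respect to the scalar $x$ using the product rule, then multiply on the right by $\mathbf{X}^{-1}$:
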
$$
\begin{align}
&\mathbf{X}^{-1} \mathbf{X} = \mathbf{I} \\[5pt]
&\mathbf{X}^{-1} \frac{\partial \, \mathbf{X}}{\partial \, x} + \frac{\partial \, \mathbf{X}^{-1}}{\partial \, x} \mathbf{X} = \mathbf{0} \\[5pt]
& \frac{\partial \, \mathbf{X}^{-1}}{\partial \, x} \mathbf{X} = - \mathbf{X}^{-1} \frac{\partial \, \mathbf{X}}{\partial \, x} \\[5pt]
&\frac{\partial \, \mathbf{X}^{-1}}{\partial \, x} = - \mathbf{X}^{-1} \frac{\partial \, \mathbf{X}}{\partial \, x} \mathbf{X}^{-1}
\end{align} 
$$
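A quick numerical check, as a sketch assuming NumPy: take the hypothetical parameterization $\mathbf{X}(x) = \mathbf{X}_0 + x\mathbf{X}_1$, so that $\partial \mathbf{X} / \partial x = \mathbf{X}_1$, and compare a central difference of $\mathbf{X}^{-1}$ with the formula. $\mathbf{X}_0$ is shifted toward $5\mathbf{I}$ only to keep $\mathbf{X}(x)$ safely invertible:

```python
import numpy as np

rng = np.random.default_rng(2)
X0 = 5 * np.eye(3) + 0.5 * rng.standard_normal((3, 3))
X1 = rng.standard_normal((3, 3))      # dX/dx, since X(x) = X0 + x * X1

def X(x):
    return X0 + x * X1

x, h = 0.3, 1e-6
numeric = (np.linalg.inv(X(x + h)) - np.linalg.inv(X(x - h))) / (2 * h)
Xinv = np.linalg.inv(X(x))
formula = -Xinv @ X1 @ Xinv           # -X^{-1} (dX/dx) X^{-1}
print(np.allclose(numeric, formula, atol=1e-6))
```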

#### [2] $\dfrac{\partial \, }{\partial \, \mathbf{A}} \text{tr}(\mathbf{AB}) = \mathbf{B}^{\text{T}}$  (matrix cookbook eq.100)

$$
\text{tr}(\mathbf{AB}) =\sum_{i} \sum_{j} (\mathbf{A})_{ij}(\mathbf{B})_{ji}
$$

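Writing the trace in index form as above and differentiating with respect to a single entry $(\mathbf{A})_{ij}$ (the summation indices are renamed $k, l$ to avoid a clash with $i, j$), only the term with $k = i$, $l = j$ survives:

$$
\dfrac{\partial \, }{\partial \, (\mathbf{A})_{ij}} \sum_{k} \sum_{l} (\mathbf{A})_{kl}(\mathbf{B})_{lk} = (\mathbf{B})_{ji}
$$

Therefore, laying out the derivative with the same shape as $\mathbf{A}$,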

$$\dfrac{\partial \, }{\partial \, \mathbf{A}} \text{tr}(\mathbf{AB}) = \mathbf{B}^{\text{T}}$$

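In the same way,

$$
\dfrac{\partial \, }{\partial \, (\mathbf{B})_{ji}} \sum_{k} \sum_{l} (\mathbf{A})_{kl}(\mathbf{B})_{lk} = (\mathbf{A})_{ij}
$$

Since the $(j,i)$ entry is $(\mathbf{A})_{ij}$, the derivative laid out with the same shape as $\mathbf{B}$ (the convention used for $\mathbf{A}$ above) is $\mathbf{A}^{\text{T}}$. In numerator layout, whose shape is that of $\mathbf{B}^{\text{T}}$, the same derivative is

$$\dfrac{\partial \, }{\partial \, \mathbf{B}} \text{tr}(\mathbf{AB}) = \mathbf{A}$$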
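The index computations can be checked numerically. In this sketch, assuming NumPy, `grad_same_shape` is a hypothetical helper that returns the matrix of partial derivatives $\partial f / \partial M_{ij}$ laid out with the same shape as $M$; non-square $\mathbf{A}$ and $\mathbf{B}$ are used so that a transpose cannot go unnoticed:

```python
import numpy as np

def grad_same_shape(f, M, h=1e-6):
    """G[i, j] = d f / d M_ij (same shape as M), by central differences."""
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = h
            G[i, j] = (f(M + E) - f(M - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))

gA = grad_same_shape(lambda M: np.trace(M @ B), A)   # same shape as A
gB = grad_same_shape(lambda M: np.trace(A @ M), B)   # same shape as B

print(np.allclose(gA, B.T))   # d tr(AB)/dA = B^T
print(np.allclose(gB, A.T))   # same-shape layout for B gives A^T
print(np.allclose(gB.T, A))   # numerator layout (the transpose) gives A
```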

#### [3] $\dfrac{\partial \,  \lvert \mathbf{X} \rvert}{\partial \, \mathbf{X}} = \lvert \mathbf{X} \rvert \left(\mathbf{X}^{-1}\right)^{\text{T}}$ (matrix cookbook eq.49)

Proving this requires a few preliminaries. First, the inverse of a matrix can be computed as<sup>[5]</sup>

$$
\mathbf{X}^{-1} = \frac{1}{\lvert \mathbf{X} \rvert } \left[ C_{ij} \right]^{\text{T}}
$$

Here $C_{ij}$ is the cofactor defined below, where $M_{ij}$ is the determinant of the submatrix obtained by deleting row $i$ and column $j$ of $\mathbf{X}$.

$$
C_{ij} = (-1)^{i+j}M_{ij}
$$

The transpose of this cofactor matrix, $\left[ C_{ij} \right]^{\text{T}}$, is called the adjugate matrix<sup>[6]</sup> and is written

$$
\text{adj}(\mathbf{X}) = \left[ C_{ij} \right]^{\text{T}}
$$

Using it, the inverse can be rewritten as

$$
\mathbf{X}^{-1} = \frac{1}{\lvert \mathbf{X} \rvert } \text{adj}(\mathbf{X}) \tag{1}
$$
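Eq. (1) can be checked by building the adjugate directly from cofactors. A minimal sketch assuming NumPy; `adjugate` is our own illustrative helper (it is fine for small matrices but far too slow for large ones), and the $3 \times 3$ matrix is an arbitrary example with $\lvert \mathbf{X} \rvert = 9$:

```python
import numpy as np

def adjugate(X):
    """Transpose of the cofactor matrix, C_ij = (-1)^(i+j) * M_ij."""
    n = X.shape[0]
    C = np.zeros_like(X, dtype=float)
    for i in range(n):
        for j in range(n):
            # minor: delete row i and column j, then take the determinant
            minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

X = np.array([[4.0, 7.0, 2.0],
              [3.0, 6.0, 1.0],
              [2.0, 5.0, 3.0]])
print(np.allclose(np.linalg.inv(X), adjugate(X) / np.linalg.det(X)))
```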

For a canonical differential form, the equivalent derivative form can be read off directly. A few cases, in numerator layout, are:<sup>[7]</sup>

$$
\begin{align}
dy = a dx &\implies \frac{dy}{dx} = a \\[5pt]
dy = \mathbf{a} d\mathbf{x} &\implies \frac{dy}{d\mathbf{x}} = \mathbf{a} \\[5pt]
dy = \text{tr}(\mathbf{A} \text{d}\mathbf{X}) &\implies \frac{dy}{\text{d}\mathbf{X}} = \mathbf{A}
\end{align} \tag{2}
$$

Here $dy$, $dx$, and $\text{d}\mathbf{X}$ are differentials (infinitesimals) representing infinitesimally small changes in the variables, and ratios of these changes such as $\frac{dy}{dx}$ and $\frac{dy}{\text{d}\mathbf{X}}$ are called derivatives.<sup>[8]</sup>

The third rule follows immediately from the numerator-layout result shown above, $\frac{\partial \, }{\partial \, \mathbf{B}} \text{tr}(\mathbf{AB}) = \mathbf{A}$:

$$
\frac{d \, \text{tr}(\mathbf{A} \text{d}\mathbf{X}) }{\text{d}\mathbf{X}} =  \mathbf{A}
$$

We also need Jacobi's formula<sup>[9]</sup> for the differential of a determinant. Its proof is too long and tedious to reproduce here, so we accept the following result:

$$
\text{d} \lvert \mathbf{X} \rvert = \text{tr} \left( \text{adj}(\mathbf{X}) \text{d}\mathbf{X} \right) \tag{3}
$$

A detailed proof is available on Wikipedia.
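Jacobi's formula can at least be checked numerically to first order: for a small perturbation $\text{d}\mathbf{X}$, the actual change in $\lvert \mathbf{X} \rvert$ should match $\text{tr}(\text{adj}(\mathbf{X})\,\text{d}\mathbf{X})$. A sketch assuming NumPy, with $\text{adj}(\mathbf{X})$ obtained from Eq. (1) as $\lvert \mathbf{X} \rvert \mathbf{X}^{-1}$; the shift by $4\mathbf{I}$ is only there to keep $\mathbf{X}$ well conditioned:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4)) + 4 * np.eye(4)
dX = 1e-6 * rng.standard_normal((4, 4))          # a small perturbation

adjX = np.linalg.det(X) * np.linalg.inv(X)       # adj(X) = |X| X^{-1}, Eq. (1)

actual = np.linalg.det(X + dX) - np.linalg.det(X)   # actual change in |X|
predicted = np.trace(adjX @ dX)                     # Eq. (3): tr(adj(X) dX)
print(actual, predicted)                            # agree to first order in dX
```

The two numbers differ only by terms of second order in $\text{d}\mathbf{X}$.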

With these pieces, the desired derivative follows fairly easily. From Eq. (1),

$$
\begin{align}
\mathbf{X}^{-1} &= \frac{1}{\lvert \mathbf{X} \rvert} \text{adj}(\mathbf{X}) \\[5pt]
\lvert \mathbf{X} \rvert  \mathbf{X}^{-1} &= \text{adj}(\mathbf{X}) \\[5pt]
\lvert \mathbf{X} \rvert  \mathbf{X}^{-1} \text{d}\mathbf{X} &= \text{adj}(\mathbf{X})\text{d}\mathbf{X} \\[5pt]
\text{tr}\left(\lvert \mathbf{X} \rvert  \mathbf{X}^{-1} \text{d}\mathbf{X}\right) &= \text{tr}\left(\text{adj}(\mathbf{X})\text{d}\mathbf{X}\right)
\end{align}
$$

so by Eq. (3),

$$
\text{d} \lvert \mathbf{X} \rvert = \text{tr}\left(\lvert \mathbf{X} \rvert  \mathbf{X}^{-1} \text{d}\mathbf{X}\right)
$$

and by the third rule in Eq. (2),

$$
\text{d} \lvert \mathbf{X} \rvert = \text{tr}\left(\lvert \mathbf{X} \rvert  \mathbf{X}^{-1} \text{d}\mathbf{X}\right) \implies \frac{ \text{d} \, \lvert \mathbf{X} \rvert}{\text{d}\mathbf{X}} = \lvert \mathbf{X} \rvert  \mathbf{X}^{-1}
$$

This is the result in numerator layout.


Written out more explicitly, using $\frac{d \, \text{tr}(\mathbf{A} \text{d}\mathbf{X}) }{\text{d}\mathbf{X}} =  \mathbf{A}$,

$$
\frac{ \text{d} \, \lvert \mathbf{X} \rvert}{\text{d}\mathbf{X}} = \frac{\text{d} \, \text{tr}\left(\lvert \mathbf{X} \rvert  \mathbf{X}^{-1} \color{RoyalBlue}{\text{d}\mathbf{X}}\right) }{\color{RoyalBlue}{\text{d}\mathbf{X}}} = \lvert \mathbf{X} \rvert  \mathbf{X}^{-1}
$$

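To obtain the form in the heading, which uses the same-shape layout of the Matrix Cookbook, use $\text{tr}(\mathbf{AB}) = \text{tr}(\mathbf{BA})$ to move $\text{d}\mathbf{X}$ to the front and then apply $\frac{\partial \, }{\partial \, \mathbf{A}} \text{tr}(\mathbf{AB}) = \mathbf{B}^{\text{T}}$ with $\text{d}\mathbf{X}$ in the role of $\mathbf{A}$:

$$
\frac{ \text{d} \, \lvert \mathbf{X} \rvert}{\text{d}\mathbf{X}} = \frac{\text{d} \, \text{tr}\left(\lvert \mathbf{X} \rvert  \mathbf{X}^{-1} \color{RoyalBlue}{\text{d}\mathbf{X}}\right) }{\color{RoyalBlue}{\text{d}\mathbf{X}}} = \frac{\text{d} \, \text{tr}\left( \color{RoyalBlue}{\text{d}\mathbf{X}} \lvert \mathbf{X} \rvert  \mathbf{X}^{-1} \right) }{\color{RoyalBlue}{\text{d}\mathbf{X}}} = \left( \lvert \mathbf{X} \rvert  \mathbf{X}^{-1} \right)^{\text{T}} =  \lvert \mathbf{X} \rvert \left(  \mathbf{X}^{-1} \right)^{\text{T}}
$$

The two results are transposes of each other: they describe the same derivative in the numerator layout and in the same-shape layout, respectively.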
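A numerical check of both layouts, as a sketch assuming NumPy: the matrix `G` of partial derivatives $\partial \lvert \mathbf{X} \rvert / \partial X_{ij}$ (same shape as $\mathbf{X}$) is estimated by central differences; `grad_same_shape` is a hypothetical helper of ours, and $\mathbf{X}$ is deliberately non-symmetric so the transpose matters:

```python
import numpy as np

def grad_same_shape(f, M, h=1e-6):
    """G[i, j] = d f / d M_ij (same shape as M), by central differences."""
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = h
            G[i, j] = (f(M + E) - f(M - E)) / (2 * h)
    return G

rng = np.random.default_rng(5)
X = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # non-symmetric, invertible

G = grad_same_shape(np.linalg.det, X)             # G[i, j] = d|X| / dX_ij
detX, Xinv = np.linalg.det(X), np.linalg.inv(X)
print(np.allclose(G, detX * Xinv.T, atol=1e-6))   # same-shape layout (Matrix Cookbook)
print(np.allclose(G.T, detX * Xinv, atol=1e-6))   # numerator layout
```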


#### [4] $\dfrac{\partial \, }{\partial \, \mathbf{X}} \ln \lvert \mathbf{X} \rvert = \left( \mathbf{X}^{-1} \right)^{\text{T}}$  (matrix cookbook eq.57)

Using the result above together with the chain rule (assuming $\lvert \mathbf{X} \rvert > 0$ so that the logarithm is defined), in numerator layout:

$$
\frac{\partial \, }{\partial \, \mathbf{X}} \ln \lvert \mathbf{X} \rvert  = \frac{1}{ \lvert \mathbf{X} \rvert } \frac{\partial \, \lvert \mathbf{X} \rvert}{\partial \, \mathbf{X}} = \frac{1}{ \lvert \mathbf{X} \rvert }  \lvert \mathbf{X} \rvert \mathbf{X}^{-1}=\mathbf{X}^{-1}
$$

or, in the same-shape layout used by the Matrix Cookbook (and in the heading),

$$
\frac{\partial \, }{\partial \, \mathbf{X}} \ln \lvert \mathbf{X} \rvert  = \frac{1}{ \lvert \mathbf{X} \rvert } \frac{\partial \, \lvert \mathbf{X} \rvert}{\partial \, \mathbf{X}} = \frac{1}{ \lvert \mathbf{X} \rvert }  \lvert \mathbf{X} \rvert \left(\mathbf{X}^{-1}\right)^{\text{T}}= \left(\mathbf{X}^{-1}\right)^{\text{T}}
$$
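The same-shape result can be checked with `np.linalg.slogdet`, which returns the sign and $\ln \lvert \det \mathbf{M} \rvert$. A sketch assuming NumPy; because `slogdet` works with the absolute value of the determinant, the check does not depend on the sign of $\lvert \mathbf{X} \rvert$ (the gradient of $\ln \lvert \det \mathbf{X} \rvert$ is $\mathbf{X}^{-\text{T}}$ for any invertible $\mathbf{X}$):

```python
import numpy as np

def logabsdet(M):
    sign, val = np.linalg.slogdet(M)
    return val                          # ln|det M|

rng = np.random.default_rng(6)
X = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # non-symmetric

h = 1e-6
G = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X)
        E[i, j] = h
        G[i, j] = (logabsdet(X + E) - logabsdet(X - E)) / (2 * h)

print(np.allclose(G, np.linalg.inv(X).T, atol=1e-6))   # same-shape layout: X^{-T}
```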

#### [5] $\dfrac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X} \mathbf{b}}{\partial \, \mathbf{X}}=\mathbf{ab}^{\text{T}}$ (matrix cookbook eq.70)

Let $\mathbf{a}^{\text{T}} : 1 \times m$, $\mathbf{X} : m \times n$, and $\mathbf{b} : n \times 1$ be an arbitrary vector, matrix, and vector, respectively.

Since $\mathbf{a}^{\text{T}} \mathbf{X} \mathbf{b}$ is a scalar, wrapping it in a trace, $\text{tr}(\mathbf{a}^{\text{T}} \mathbf{X} \mathbf{b})$, does not change its value. Using the cyclic property of the trace and $\frac{\partial \, }{\partial \, \mathbf{A}} \text{tr}(\mathbf{AB}) = \mathbf{B}^{\text{T}}$, the result follows directly:

$$
\frac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X} \mathbf{b}}{\partial \, \mathbf{X}} =  \frac{\partial \, \text{tr}\left(\mathbf{a}^{\text{T}} \mathbf{X} \mathbf{b}\right)}{\partial \, \mathbf{X}} =  \frac{\partial \, \text{tr}\left(\mathbf{X} \mathbf{b} \mathbf{a}^{\text{T}} \right)}{\partial \, \mathbf{X}} = \left( \mathbf{b} \mathbf{a}^{\text{T}} \right)^{\text{T}} = \mathbf{a}\mathbf{b}^{\text{T}}
$$
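A finite-difference check of this identity in the same-shape layout, as a minimal sketch assuming NumPy (random $\mathbf{a}$, $\mathbf{b}$, $\mathbf{X}$ of our choosing); since $\mathbf{a}^{\text{T}} \mathbf{X} \mathbf{b}$ is linear in $\mathbf{X}$, the central difference is exact up to rounding:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

f = lambda M: a @ M @ b        # the scalar a^T M b

h = 1e-6
G = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = h
        G[i, j] = (f(X + E) - f(X - E)) / (2 * h)

print(np.allclose(G, np.outer(a, b), atol=1e-6))   # d(a^T X b)/dX = a b^T
```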

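Alternatively, though more tediously, the result can be derived by applying the Kronecker product and the product rule directly. Since $\mathbf{a}$ and $\mathbf{b}$ do not depend on $\mathbf{X}$, the terms containing $\frac{\partial \, \mathbf{a}^{\text{T}}}{\partial \, \mathbf{X}}$ and $\frac{\partial \, \mathbf{b}}{\partial \, \mathbf{X}}$ are zero, which gives the last line below:
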
$$
\begin{align}
\frac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X} \mathbf{b}}{\partial \, \mathbf{X}} 
&= \frac{\partial \, \left(\mathbf{a}^{\text{T}} \mathbf{X} \right) \mathbf{b}}{\partial \, \mathbf{X}} \\[5pt]
&= \underbrace{\frac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X}}{\partial \, \mathbf{X}}}_{m \times n^2} \underbrace{\left( \mathbf{I}_n \otimes \mathbf{b} \right)}_{n^2 \times n} + \underbrace{\left(\mathbf{I}_m \otimes \mathbf{a}^{\text{T}} \mathbf{X}\right)}_{m \times mn} \underbrace{\frac{\partial \, \mathbf{b}}{\partial \, \mathbf{X}}}_{mn \times n} \\[5pt]
&= \left\{ \underbrace{\frac{\partial \, \mathbf{a}^{\text{T}}}{\partial \, \mathbf{X}}}_{m \times mn} \underbrace{ \left(\mathbf{I}_n \otimes \mathbf{X} \right)}_{mn \times n^2} + \underbrace{\left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}}\right)}_{m \times m^2} \underbrace{ \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}}}_{m^2 \times n^2} \right\} \left( \mathbf{I}_n \otimes \mathbf{b} \right) \\[5pt]
&= \left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}}\right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} \left( \mathbf{I}_n \otimes \mathbf{b} \right)
\end{align}
$$

$\left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}}\right)$ is an $m \times m^2$ matrix made of $m$ blocks of size $m \times m$ placed side by side. The $k$-th block has $\mathbf{a}^{\text{T}}$ in its $k$-th row and zeros elsewhere.

$$
\left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}}\right) =
\begin{bmatrix}
\color{RoyalBlue}{\begin{matrix}a_1 & a_2 & \cdots & a_m\end{matrix}} & | & \mathbf{0}^{\text{T}} & | & \cdots & | & \mathbf{0}^{\text{T}} \\
\mathbf{0}^{\text{T}} & | & \color{RoyalBlue}{\begin{matrix}a_1 & a_2 & \cdots & a_m\end{matrix}} & | & \cdots &  | & \mathbf{0}^{\text{T}} \\
\vdots & | & \vdots & | & \ddots & | & \vdots \\
\mathbf{0}^{\text{T}} & | & \mathbf{0}^{\text{T}} & | & \cdots & | & \color{RoyalBlue}{\begin{matrix}a_1 & a_2 & \cdots & a_m\end{matrix}} 
\end{bmatrix}
$$

$\frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}}$ is an $m^2 \times n^2$ matrix made of $m \times n$ blocks, each of size $m \times n$, arranged in a grid. Block $(i, j)$ has a 1 at position $(i, j)$ and zeros elsewhere.

That is, as shown below, the first block has a 1 only at $(1,1)$, the block to its right has a 1 only at $(1,2)$, and so on.

$$
\frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} =
\left[
\begin{array}{c|c|c|c}
\begin{matrix}
1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 0 & \cdots & 0
\end{matrix} & 
\begin{matrix}
0 & 1 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 0 & \cdots & 0
\end{matrix} &
\begin{matrix}\cdots \\ \cdots \\ \cdots \\ \cdots \end{matrix} &
\begin{matrix}
0 & 0 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 0 & \cdots & 0
\end{matrix} \\
\begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} \\
\begin{matrix}
0 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 0 & \cdots & 0
\end{matrix} & 
\begin{matrix}
0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 0 & \cdots & 0
\end{matrix} &
\begin{matrix}\cdots \\ \cdots \\ \cdots \\ \cdots \end{matrix} &
\begin{matrix}
0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 0 & \cdots & 0
\end{matrix} \\
\begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} \\
\begin{matrix}\vdots&\vdots&\vdots&\vdots\end{matrix} & \begin{matrix}\vdots&\vdots&\vdots&\vdots\end{matrix} & \begin{matrix}\vdots&\vdots&\vdots&\vdots\end{matrix} & \begin{matrix}\vdots&\vdots&\vdots&\vdots\end{matrix} \\
\begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} & \begin{matrix}-&-&-&-\end{matrix} \\
\begin{matrix}
0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  1 & 0 & \cdots & 0
\end{matrix} & 
\begin{matrix}
0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 1 & \cdots & 0
\end{matrix} &
\begin{matrix}\cdots \\ \cdots \\ \cdots \\ \cdots \end{matrix} &
\begin{matrix}
0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 0 & \cdots & 1
\end{matrix}
\end{array}
\right]
$$


Multiplying these two matrices first gives an $m \times n^2$ matrix made of $n$ blocks of size $m \times n$ placed side by side; the $j$-th block has $\mathbf{a}$ as its $j$-th column and zeros elsewhere.

$$
\left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}}\right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} = 
\left[
\begin{array}{c|c|c|c}
\begin{matrix}
a_1 & 0 & \cdots & 0 \\ a_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  a_m & 0 & \cdots & 0
\end{matrix} &
\begin{matrix}
0 & a_1 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\  0 & a_m & \cdots & 0
\end{matrix} & 
\begin{matrix}\cdots \\ \cdots \\ \cdots \\ \cdots \end{matrix} &
\begin{matrix}
0 & 0 & \cdots & a_1 \\ 0 & 0 & \cdots & a_2 \\ \vdots & \vdots & \ddots & \vdots \\  0 & 0 & \cdots & a_m
\end{matrix}
\end{array}
\right]
$$

Meanwhile, $\left( \mathbf{I}_n \otimes \mathbf{b} \right)$ is an $n^2 \times n$ matrix made of $n$ submatrices of size $n \times n$ stacked vertically. In the $j$-th submatrix, the $j$-th column is $\mathbf{b}$ and every other entry is zero.

$$
\left( \mathbf{I}_n \otimes \mathbf{b} \right) = 
\begin{bmatrix}
\begin{matrix}
b_1 & 0 & \cdots & 0 \\ b_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ b_n & 0 & \cdots & 0
\end{matrix} \\
\begin{matrix} - & - & - & - \end{matrix} \\
\begin{matrix}
0 & b_1 & \cdots & 0 \\ 0 & b_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & b_n & \cdots & 0
\end{matrix} \\
\begin{matrix} - & - & - & - \end{matrix} \\
\begin{matrix} \vdots & \vdots & \vdots & \vdots \end{matrix}\\
\begin{matrix} - & - & - & - \end{matrix} \\
\begin{matrix}
0 & 0 & \cdots & b_1 \\ 0 & 0 & \cdots & b_2 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & b_n
\end{matrix} 
\end{bmatrix}
$$

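Finally, multiplying these two matrices gives the desired result. With $\mathbf{e}_j$ the $j$-th standard basis vector of $\mathbb{R}^n$, the $j$-th $m \times n$ block of the left factor is $\mathbf{a}\mathbf{e}_j^{\text{T}}$ and the $j$-th $n \times n$ block of $\left( \mathbf{I}_n \otimes \mathbf{b} \right)$ is $\mathbf{b}\mathbf{e}_j^{\text{T}}$, so the product is $\sum_{j} \mathbf{a}\mathbf{e}_j^{\text{T}}\mathbf{b}\mathbf{e}_j^{\text{T}} = \sum_{j} b_j \mathbf{a}\mathbf{e}_j^{\text{T}} = \mathbf{a}\mathbf{b}^{\text{T}}$.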

$$
\left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}}\right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} \left( \mathbf{I}_n \otimes \mathbf{b} \right) = 
\begin{bmatrix}
a_1 b_1 & a_1 b_2 & \cdots & a_1 b_n \\
a_2 b_1 & a_2 b_2 & \cdots & a_2 b_n \\
\vdots  & \vdots  & \ddots & \vdots \\
a_m b_1 & a_m b_2 & \cdots & a_m b_n
\end{bmatrix} = \mathbf{a} \mathbf{b}^{\text{T}}
$$
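The block-matrix argument above can be checked numerically. This is a minimal sketch, assuming NumPy and a hypothetical helper `dX_dX(m, n)` that builds $\partial \mathbf{X} / \partial \mathbf{X}$ in the layout used here: an $m \times n$ grid of $m \times n$ blocks whose $(i, j)$ block is the single-entry matrix $\mathbf{E}_{ij}$. The sizes, seed and random vectors are arbitrary choices.

```python
import numpy as np

def dX_dX(m, n):
    """Hypothetical helper: dX/dX for an m x n matrix X.

    Returns an (m*m) x (n*n) matrix made of an m x n grid of m x n blocks;
    block (i, j) is the single-entry matrix E_ij.
    """
    D = np.zeros((m * m, n * n))
    for i in range(m):
        for j in range(n):
            # entry (i, j) inside block (i, j)
            D[i * m + i, j * n + j] = 1.0
    return D

rng = np.random.default_rng(0)
m, n = 3, 4
a = rng.standard_normal((m, 1))   # a : m x 1
b = rng.standard_normal((n, 1))   # b : n x 1

left = np.kron(np.eye(m), a.T)    # I_m ⊗ a^T : m x m^2
right = np.kron(np.eye(n), b)     # I_n ⊗ b   : n^2 x n

middle = left @ dX_dX(m, n)       # m x n^2, i.e. n blocks of size m x n
# in the j-th m x n block only column j is nonzero, and it equals a
for j in range(n):
    block = middle[:, j * n:(j + 1) * n]
    assert np.allclose(block[:, j:j + 1], a)
    assert np.count_nonzero(np.delete(block, j, axis=1)) == 0

result = middle @ right           # m x n
print(np.allclose(result, a @ b.T))   # True
```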

#### [6] $\dfrac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X}^{-1} \mathbf{b}}{\partial \, \mathbf{X}}= -\mathbf{X}^{-\text{T}} \mathbf{ab}^{\text{T}} \mathbf{X}^{-\text{T}}$ (matrix cookbook eq.61)

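This derivative can be shown using the earlier derivative formula (matrix cookbook eq.70) together with two properties of the Kronecker product<sup>[10]</sup>:

$$
\left( \mathbf{A} \otimes \mathbf{B} \right)^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \quad (\mathbf{A}, \mathbf{B} \text{ invertible})
$$

$$
\left( \mathbf{A} \otimes \mathbf{B} \right)\left( \mathbf{C} \otimes \mathbf{D} \right) = \mathbf{A}\mathbf{C} \otimes  \mathbf{B}\mathbf{D} \quad (\mathbf{AC}, \mathbf{BD} \text{ defined})
$$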
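Both Kronecker identities are easy to spot-check with `np.kron`. This is a minimal sketch with random test matrices; the sizes and seed are arbitrary, and the square factors are shifted by a multiple of the identity only to keep them comfortably invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2)) + 3 * np.eye(2)
B = rng.standard_normal((3, 3)) + 3 * np.eye(3)
C = rng.standard_normal((2, 2))
D = rng.standard_normal((3, 3))

# (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}   (A, B square and invertible)
lhs_inv = np.linalg.inv(np.kron(A, B))
rhs_inv = np.kron(np.linalg.inv(A), np.linalg.inv(B))
print(np.allclose(lhs_inv, rhs_inv))   # True

# (A ⊗ B)(C ⊗ D) = AC ⊗ BD   (AC and BD must be defined)
lhs_mix = np.kron(A, B) @ np.kron(C, D)
rhs_mix = np.kron(A @ C, B @ D)
print(np.allclose(lhs_mix, rhs_mix))   # True
```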

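Assume an invertible $\mathbf{X} : m \times m$ and arbitrary vectors $\mathbf{a}^{\text{T}} : 1 \times m$ and $\mathbf{b} : m \times 1$ that do not depend on $\mathbf{X}$. Applying the product rule twice, and noting that $\partial \, \mathbf{a}^{\text{T}} / \partial \, \mathbf{X}$ and $\partial \, \mathbf{b} / \partial \, \mathbf{X}$ are zero, gives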

$$
\begin{align}
\frac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X}^{-1} \mathbf{b}}{\partial \, \mathbf{X}} 
&= \frac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X}^{-1}}{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{b} \right) + \left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}} \mathbf{X}^{-1} \right) \frac{\partial \, \mathbf{b}}{\partial \, \mathbf{X}} \\[5pt]
&= \left( \frac{\partial \, \mathbf{a}^{\text{T}}}{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \right) + \left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}} \right) \frac{\partial \, \mathbf{X}^{-1}}{\partial \, \mathbf{X}} \right) \left( \mathbf{I}_m \otimes \mathbf{b} \right) \\[5pt]
&= \left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}} \right)  \frac{\partial \, \mathbf{X}^{-1}}{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{b} \right)
\end{align} \tag{1}
$$

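Meanwhile, differentiating both sides of $\mathbf{X}^{-1} \mathbf{X} = \mathbf{I}$ with respect to $\mathbf{X}$ and then multiplying on the right by $\left( \mathbf{I}_m \otimes \mathbf{X} \right)^{-1}$ gives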

$$
\frac{\partial \,\mathbf{X}^{-1} \mathbf{X}}{\partial \, \mathbf{X}} = \frac{\partial \, \mathbf{I}}{\partial \, \mathbf{X}} \\[5pt]
\frac{\partial \,\mathbf{X}^{-1} }{\partial \, \mathbf{X}}\left( \mathbf{I}_m \otimes \mathbf{X} \right) + \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} = \mathbf{0} \\[5pt]
\frac{\partial \,\mathbf{X}^{-1} }{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{X} \right) \left( \mathbf{I}_m \otimes \mathbf{X} \right)^{-1} + \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{X} \right)^{-1}= \mathbf{0} \\[5pt]
\frac{\partial \,\mathbf{X}^{-1} }{\partial \, \mathbf{X}}  =  - \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{X} \right)^{-1}
$$

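Now, applying $\left( \mathbf{A} \otimes \mathbf{B} \right)^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}$, so that $\left( \mathbf{I}_m \otimes \mathbf{X} \right)^{-1} = \mathbf{I}_m \otimes \mathbf{X}^{-1}$, gives the following.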

$$
\frac{\partial \,\mathbf{X}^{-1} }{\partial \, \mathbf{X}}  =  - \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \right) \tag{2}
$$
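Equation (2) can be sanity-checked against central finite differences. This is a minimal sketch, assuming NumPy; `numeric_dXinv_dX` is a hypothetical helper that fills block $(i, j)$ with a finite-difference estimate of $\partial \mathbf{X}^{-1} / \partial X_{ij}$, and `dX_dX` is a hypothetical helper that rebuilds $\partial \mathbf{X} / \partial \mathbf{X}$ in this section's block layout. The step size `h` and the random, diagonally shifted test matrix are arbitrary choices.

```python
import numpy as np

def dX_dX(m, n):
    # (m*m) x (n*n) grid of m x n blocks; block (i, j) is E_ij
    D = np.zeros((m * m, n * n))
    for i in range(m):
        for j in range(n):
            D[i * m + i, j * n + j] = 1.0
    return D

def numeric_dXinv_dX(X, h=1e-6):
    """Central-difference estimate of dX^{-1}/dX in the same block layout."""
    m = X.shape[0]
    out = np.zeros((m * m, m * m))
    for i in range(m):
        for j in range(m):
            E = np.zeros_like(X)
            E[i, j] = h
            block = (np.linalg.inv(X + E) - np.linalg.inv(X - E)) / (2 * h)
            out[i * m:(i + 1) * m, j * m:(j + 1) * m] = block
    return out

rng = np.random.default_rng(2)
m = 3
X = rng.standard_normal((m, m)) + m * np.eye(m)   # shifted to keep X well conditioned
K = np.kron(np.eye(m), np.linalg.inv(X))           # I_m ⊗ X^{-1}

analytic = -K @ dX_dX(m, m) @ K                    # right-hand side of (2)
numeric = numeric_dXinv_dX(X)
print(np.allclose(analytic, numeric, atol=1e-6))   # True
```

Block $(i, j)$ of `analytic` works out to $-\mathbf{X}^{-1}\mathbf{E}_{ij}\mathbf{X}^{-1}$, which is what the finite differences approximate.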

Substituting (2) into (1) gives

$$
\begin{align}
\frac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X}^{-1} \mathbf{b}}{\partial \, \mathbf{X}} &=  - \left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}} \right) \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \right) \left( \mathbf{I}_m \otimes \mathbf{b} \right) \\[5pt]
&= - \left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}} \mathbf{X}^{-1} \right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} \left( \mathbf{I}_m \otimes \mathbf{X}^{-1} \mathbf{b} \right) \quad \because \left( \mathbf{A} \otimes \mathbf{B} \right)\left( \mathbf{C} \otimes \mathbf{D} \right) = \mathbf{A}\mathbf{C} \otimes  \mathbf{B}\mathbf{D}
\end{align}
$$

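Compare this with the earlier derivation of

$$
\begin{align}
\frac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X} \mathbf{b}}{\partial \, \mathbf{X}} 
= \left( \mathbf{I}_m \otimes \mathbf{a}^{\text{T}}\right) \frac{\partial \, \mathbf{X}}{\partial \, \mathbf{X}} \left( \mathbf{I}_n \otimes \mathbf{b} \right) = \mathbf{a} \mathbf{b}^{\text{T}}
\end{align}
$$

The expression above has exactly the same form, with $\mathbf{a}^{\text{T}}$ replaced by $\mathbf{a}^{\text{T}} \mathbf{X}^{-1} = \left( \mathbf{X}^{-\text{T}} \mathbf{a} \right)^{\text{T}}$, $\mathbf{b}$ replaced by $\mathbf{X}^{-1} \mathbf{b}$, $n = m$, and an overall minus sign. The same steps therefore give $-\left( \mathbf{X}^{-\text{T}} \mathbf{a} \right) \left( \mathbf{X}^{-1} \mathbf{b} \right)^{\text{T}}$, which is the final result: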


$$
\frac{\partial \, \mathbf{a}^{\text{T}} \mathbf{X}^{-1} \mathbf{b}}{\partial \, \mathbf{X}} = - \mathbf{X}^{-\text{T}} \mathbf{a} \mathbf{b}^{\text{T}} \mathbf{X}^{-\text{T}} 
$$
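The final formula can also be compared with a finite-difference gradient of the scalar $f(\mathbf{X}) = \mathbf{a}^{\text{T}} \mathbf{X}^{-1} \mathbf{b}$. This is a minimal sketch, assuming NumPy; the matrix size, the random data and the step `h` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 4
X = rng.standard_normal((m, m)) + m * np.eye(m)   # shifted to keep X well conditioned
a = rng.standard_normal((m, 1))
b = rng.standard_normal((m, 1))

def f(X):
    # scalar a^T X^{-1} b (solve instead of forming X^{-1} explicitly)
    return (a.T @ np.linalg.solve(X, b)).item()

# closed form: -X^{-T} a b^T X^{-T}
XinvT = np.linalg.inv(X).T
grad = -XinvT @ a @ b.T @ XinvT

# central differences, perturbing one entry X_ij at a time
h = 1e-6
num = np.zeros_like(X)
for i in range(m):
    for j in range(m):
        E = np.zeros_like(X)
        E[i, j] = h
        num[i, j] = (f(X + E) - f(X - E)) / (2 * h)

print(np.allclose(grad, num, atol=1e-6))   # True
```

Entry $(i, j)$ of `num` approximates $\partial f / \partial X_{ij}$, so agreement with `grad` checks the result in the same (denominator-style, $m \times m$) layout as the formula.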

## References

1. COURSE NOTES: STATISTICS 550 ADVANCED MATHEMATICAL STATISTICS SPRING 2008, Robert J. Boik, Department of Mathematical Sciences, Montana State University, 2012

2. Old and New Matrix Algebra Useful for Statistics, Thomas P. Minka (December 28, 2000), MIT Media Lab note (1997; revised 12/00). Retrieved 5 February 2016.

3. Linear Algebra & Matrix Calculus, 임성빈: https://www.slideshare.net/ssuser7e10e4/matrix-calculus

4. The Matrix Cookbook, Kaare Brandt Petersen & Michael Syskind Pedersen, 2012

5. Advanced Engineering Mathematics 7.7 & 7.8, Erwin Kreyszig, Wiley

6. Adjugate matrix: https://en.wikipedia.org/wiki/Adjugate_matrix

7. Matrix calculus: https://en.wikipedia.org/wiki/Matrix_calculus

8. Differential (infinitesimal): https://en.wikipedia.org/wiki/Differential_(infinitesimal)

9. Jacobi's formula: https://en.wikipedia.org/wiki/Jacobi%27s_formula

10. Kronecker product: https://en.wikipedia.org/wiki/Kronecker_product
