# Some notable covariance matrix derivatives

Reference: [Matrix Calculus Proofs](http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/proof002.html)

## Basic properties
- Transpose: $dX^\top = d(X^\top) = (dX)^\top$.
- Inverse transpose: $X^{-\top} = (X^{-1})^\top = (X^\top)^{-1}$.
- Product rule: $d(XY) = (dX)Y + X(dY)$.
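
The product rule can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix size, random seed, and perturbation scale are arbitrary illustrative choices rather than anything taken from the reference.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3))

# Small perturbations standing in for the differentials dX and dY.
dX = 1e-6 * rng.standard_normal((3, 3))
dY = 1e-6 * rng.standard_normal((3, 3))

# Exact change in the product versus the first-order prediction.
exact_change = (X + dX) @ (Y + dY) - X @ Y
first_order = dX @ Y + X @ dY

# The gap is the second-order term dX @ dY (plus rounding error).
residual = np.max(np.abs(exact_change - first_order))
print(residual)
```

The residual shrinks quadratically with the perturbation scale, which is what a first-order identity predicts.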

## Identities
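Let $\mathbf{e_i}$ be the $i$-th column of the identity matrix, let $\mathbf{a}$ and $\mathbf{b}$ be constant vectors, and let $L$ be the Cholesky factor of a symmetric positive-definite matrix $C$, _i.e._, $L$ is a lower triangular matrix and $C = LL^\top$. The entries $c_{ij}$ (and likewise $l_{ij}$) are treated as independent variables when differentiating. The following identities hold.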

1. $\frac{d}{dc_{ij}}C = \mathbf{e_i}\mathbf{e_j}^\top$ where $\mathbf{e_i}\mathbf{e_j}^\top$ is a matrix containing a 1 in position $i$, $j$ and zeros elsewhere.

2. $dC^{-1} = -C^{-1}dC C^{-1}$.

3. $\frac{d}{dc_{ij}}C^{-1} = -C^{-1} \mathbf{e_i} (C^{-\top} \mathbf{e_j})^\top$.

4. $\frac{d}{dC} \mathbf{a}^\top C^{-1} \mathbf{b} = -C^{-1} \mathbf{a} \mathbf{b}^\top C^{-1}$.

5. $\frac{d}{dL} \mathbf{a}^\top C^{-1} \mathbf{b} = -L^{-\top} L^{-1} (\mathbf{b} \mathbf{a}^\top + \mathbf{a} \mathbf{b}^\top) L^{-\top}$.
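
Identities 3 and 4 can be checked against central finite differences. The sketch below assumes NumPy; the test matrix, the vectors, the index pair `(i, j)`, the step `eps`, and the helper `f` are all illustrative choices, not part of the reference. Each entry of $C$ is perturbed independently, matching the convention of identity 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random symmetric positive-definite C and constant vectors a, b.
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)
a = rng.standard_normal(n)
b = rng.standard_normal(n)
C_inv = np.linalg.inv(C)


def f(C):
    """Scalar a^T C^{-1} b."""
    return a @ np.linalg.solve(C, b)


eps = 1e-6
i, j = 1, 2
E = np.zeros((n, n))
E[i, j] = 1.0  # e_i e_j^T, as in identity 1

# Identity 3: central difference of C^{-1} with respect to c_ij.
fd_inv = (np.linalg.inv(C + eps * E) - np.linalg.inv(C - eps * E)) / (2 * eps)
analytic_inv = -C_inv @ E @ C_inv
err_inv = np.max(np.abs(fd_inv - analytic_inv))

# Identity 4: gradient of a^T C^{-1} b with respect to every entry of C.
fd_grad = np.zeros((n, n))
for p in range(n):
    for q in range(n):
        E_pq = np.zeros((n, n))
        E_pq[p, q] = 1.0
        fd_grad[p, q] = (f(C + eps * E_pq) - f(C - eps * E_pq)) / (2 * eps)
analytic_grad = -C_inv @ np.outer(a, b) @ C_inv
err_grad = np.max(np.abs(fd_grad - analytic_grad))

print(err_inv, err_grad)
```

Both errors should be small relative to the entries of the analytic expressions. If $C$ were instead constrained to stay symmetric under perturbation, the off-diagonal entries of the gradient would combine the $(i,j)$ and $(j,i)$ contributions.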

## Proofs
Let $I(x=X)$ be the indicator function
\begin{align*}
    I(x = X) = \begin{cases} 1 & x = X \\ 0 & \text{otherwise} \end{cases}.
\end{align*}

1. $\frac{d}{dc_{ij}}C = \mathbf{e_i}\mathbf{e_j}^\top$.

    \begin{align*}
        \frac{d}{dc_{ij}}C &= 
        \begin{bmatrix}
            \frac{d}{dc_{ij}} c_{11} & \ldots & \frac{d}{dc_{ij}} c_{1n} \\
            \vdots & \ddots & \vdots \\
            \frac{d}{dc_{ij}} c_{n1} & \ldots & \frac{d}{dc_{ij}} c_{nn}
        \end{bmatrix} \\
        &= \begin{bmatrix}
            I((i,j)=(1,1)) & \ldots & I((i,j)=(1,n)) \\
            \vdots & \ddots & \vdots \\
            I((i,j)=(n,1)) & \ldots & I((i,j)=(n,n))
        \end{bmatrix} \\
        &= \begin{bmatrix}
            I(i = 1) \\
            \vdots \\
            I(i = n)
        \end{bmatrix}
        \begin{bmatrix}
            I(j = 1) & \ldots & I(j = n)
        \end{bmatrix} \\
        &= \mathbf{e_i} \mathbf{e_j}^\top.
    \end{align*}

2. $dC^{-1} = -C^{-1}dC C^{-1}$.

    \begin{align*}
        0 &= dI = d(C C^{-1}) = (dC)C^{-1} + C(dC^{-1}) \Rightarrow dC^{-1} = -C^{-1} (dC) C^{-1}.
    \end{align*}

3. $\frac{d}{dc_{ij}}C^{-1} = -C^{-1} \mathbf{e_i} (C^{-\top} \mathbf{e_j})^\top$.

    \begin{align*}
        \frac{d}{dc_{ij}}C^{-1} &= -C^{-1} \left(\frac{d}{dc_{ij}}C\right) C^{-1} = -C^{-1} \mathbf{e_i} \mathbf{e_j}^\top C^{-1} = -C^{-1} \mathbf{e_i} (C^{-\top} \mathbf{e_j})^\top.
    \end{align*}

4. $\frac{d}{dC} \mathbf{a}^\top C^{-1} \mathbf{b} = -C^{-1} \mathbf{a} \mathbf{b}^\top C^{-1}$.

    \begin{align*}
        \frac{d}{dc_{ij}} (\mathbf{a}^\top C^{-1} \mathbf{b}) = \mathbf{a}^\top \left(\frac{d}{dc_{ij}}C^{-1}\right) \mathbf{b} &= -\underbrace{\mathbf{a}^\top C^{-1} \mathbf{e_i}}_{\text{scalar}} \underbrace{\mathbf{e_j}^\top C^{-1} \mathbf{b}}_{\text{scalar}} = -\mathbf{e_i}^\top C^{-\top} \mathbf{a} \mathbf{b}^\top C^{-\top} \mathbf{e_j} = -\mathbf{e_i}^\top C^{-1} \mathbf{a} \mathbf{b}^\top C^{-1} \mathbf{e_j}. \\
        \therefore \frac{d}{dC} (\mathbf{a}^\top C^{-1} \mathbf{b}) &= -C^{-1} \mathbf{a} \mathbf{b}^\top C^{-1}.
    \end{align*}

5. $\frac{d}{dL} \mathbf{a}^\top C^{-1} \mathbf{b} = -L^{-\top} L^{-1} (\mathbf{b} \mathbf{a}^\top + \mathbf{a} \mathbf{b}^\top) L^{-\top}$.

    \begin{align*}
        \frac{d}{dl_{ij}} \mathbf{a}^\top C^{-1} \mathbf{b} &= \mathbf{a}^\top \frac{d}{dl_{ij}} (LL^\top)^{-1} \mathbf{b} = \mathbf{a}^\top \frac{d}{dl_{ij}} (L^{-\top} L^{-1}) \mathbf{b} \\
        &= \mathbf{a}^\top \frac{d}{dl_{ij}} (L^{-\top}) L^{-1} \mathbf{b} + \mathbf{a}^\top L^{-\top} \frac{d}{dl_{ij}} (L^{-1}) \mathbf{b} \\
        &= \mathbf{a}^\top (\frac{d}{dl_{ij}} L^{-1})^\top L^{-1} \mathbf{b} + \mathbf{a}^\top L^{-\top} \frac{d}{dl_{ij}} (L^{-1}) \mathbf{b} \\
        &= \mathbf{a}^\top (-L^{-1} \mathbf{e_i} \mathbf{e_j}^\top L^{-1})^\top L^{-1} \mathbf{b} + \mathbf{a}^\top L^{-\top} (-L^{-1} \mathbf{e_i} \mathbf{e_j}^\top L^{-1}) \mathbf{b} \\
        &= -\mathbf{a}^\top (L^{-\top} \mathbf{e_j} \mathbf{e_i}^\top L^{-\top}) L^{-1} \mathbf{b} - \mathbf{a}^\top L^{-\top} (L^{-1} \mathbf{e_i} \mathbf{e_j}^\top L^{-1}) \mathbf{b} \\
        &= -\underbrace{\mathbf{a}^\top L^{-\top} \mathbf{e_j}}_{\text{scalar}} \underbrace{\mathbf{e_i}^\top L^{-\top} L^{-1} \mathbf{b}}_{\text{scalar}} - \underbrace{\mathbf{a}^\top L^{-\top} L^{-1} \mathbf{e_i}}_{\text{scalar}} \underbrace{\mathbf{e_j}^\top L^{-1} \mathbf{b}}_{\text{scalar}} \\
        &= -\mathbf{e_i}^\top L^{-\top} L^{-1} \mathbf{b} \mathbf{a}^\top L^{-\top} \mathbf{e_j} - \mathbf{e_i}^\top L^{-\top} L^{-1} \mathbf{a} \mathbf{b}^\top L^{-\top} \mathbf{e_j} \\
        &= -\mathbf{e_i}^\top L^{-\top} L^{-1} (\mathbf{b} \mathbf{a}^\top + \mathbf{a} \mathbf{b}^\top) L^{-\top} \mathbf{e_j}. \\
        \therefore \frac{d}{dL} \mathbf{a}^\top C^{-1} \mathbf{b} &= -L^{-\top} L^{-1} (\mathbf{b} \mathbf{a}^\top + \mathbf{a} \mathbf{b}^\top) L^{-\top}.
    \end{align*}
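
Identity 5 can be checked the same way, by perturbing each entry of $L$ and comparing with the closed form. This is a minimal sketch assuming NumPy; the test matrix, the vectors, the step `eps`, and the helper `g` are illustrative. As in the proof, every entry $l_{ij}$ is treated as free, so the perturbed matrix need not remain lower triangular.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Random symmetric positive-definite C and its Cholesky factor.
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)
L = np.linalg.cholesky(C)  # lower triangular, C = L @ L.T
a = rng.standard_normal(n)
b = rng.standard_normal(n)


def g(L):
    """Scalar a^T (L L^T)^{-1} b, with L treated as an unconstrained matrix."""
    return a @ np.linalg.solve(L @ L.T, b)


# Closed form from identity 5.
L_inv = np.linalg.inv(L)
S = np.outer(b, a) + np.outer(a, b)
analytic = -L_inv.T @ L_inv @ S @ L_inv.T

# Central finite differences over every entry l_ij.
eps = 1e-6
fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        fd[i, j] = (g(L + eps * E) - g(L - eps * E)) / (2 * eps)

err_full = np.max(np.abs(fd - analytic))

# If only the lower triangle of L is free, keep the lower-triangular part.
grad_lower = np.tril(analytic)
print(err_full)
```

When $L$ is parameterized by its lower triangle only, as is typical when optimizing a Cholesky factor, the entries above the diagonal are fixed at zero and their partial derivatives are simply discarded, which is what `np.tril` does here.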