# Partials etc. involving matrices

## Preliminaries

In [None]:
#%matplotlib widget
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sympy

### A few ways to get test numpy arrays

In [None]:
np.arange(3), np.arange(4,8), np.arange(5,1,-2)

For experiments with multiplication, arrays of primes may be helpful:

In [None]:
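def arangep(n, starting_index=0):
    """Return n consecutive primes as an array, skipping the first starting_index primes."""
    # sympy.prime(k) is 1-indexed (sympy.prime(1) == 2); this avoids the private sieve._list
    return np.array([sympy.prime(k) for k in range(starting_index + 1, starting_index + n + 1)])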

In [None]:
arangep(5), arangep(4,2)

In [None]:
M = arangep(4).reshape(2,2)
x = arangep(2,4)
# x = np.arange(2)+1
M,x

## Einstein summation notation

NumPy provides [Einstein summation](https://mathworld.wolfram.com/EinsteinSummation.html) operations through [einsum](https://numpy.org/devdocs/reference/generated/numpy.einsum.html). The convention follows three rules:
1. Repeated indices are implicitly summed over.
1. Each index can appear at most twice in any term.
1. Each term must contain identical non-repeated indices.

In [None]:
es = np.einsum

 $$a_{ik}a_{ij} \equiv \sum_{i} a_{ik}a_{ij}$$

In [None]:
es('ij,j', M, x), es('ij,i', M, x)
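To make the index rules concrete, here is a small sketch that checks the two `einsum` calls above against ordinary matrix products, plus the $\sum_i a_{ik}a_{ij}$ example. It assumes `M` and `x` hold the same values the earlier cells produce (the first four primes and the fifth and sixth primes); the names `y`, `z`, and `g` are introduced here only for illustration.

```python
import numpy as np

M = np.array([[2, 3], [5, 7]])   # same values as arangep(4).reshape(2, 2)
x = np.array([11, 13])           # same values as arangep(2, 4)

# 'ij,j': j is repeated, so it is summed; i survives -> y_i = sum_j M_ij x_j
y = np.einsum('ij,j', M, x)
assert np.array_equal(y, M @ x)

# 'ij,i': i is repeated, so it is summed; j survives -> z_j = sum_i M_ij x_i
z = np.einsum('ij,i', M, x)
assert np.array_equal(z, M.T @ x)

# sum_i a_ik a_ij, with the output index order written explicitly after '->'
g = np.einsum('ik,ij->kj', M, M)
assert np.array_equal(g, M.T @ M)
```

Without the `->` part, `einsum` orders the surviving indices alphabetically, so writing the output explicitly is a way to avoid surprises when more than one index survives.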

___

# Partials

## Preliminaries

A matrix __M__ multiplies a (column) vector __x__ to its right to produce a (column) vector __y__:
$$ \normalsize \mathbf{M} \mathbf{x} = \mathbf{y} $$
where
$$ \normalsize
\mathbf{x} = \sum_{j=1}^{n} x_j \mathbf{\hat{x}}_j \\
\mathbf{y} = \sum_{i=1}^{m} y_i \mathbf{\hat{y}}_i
$$
and $\mathbf{M}$ can be written
$$ \normalsize
\mathbf{M} =
\begin{bmatrix}
    m_{1,1} & \dots & m_{1,n} \\
    \vdots & \ddots & \vdots \\
    m_{m,1} & \dots & m_{m,n}
\end{bmatrix}
$$

A `python` example:

In [None]:
y = M @ x
y

Using Einstein summation notation, $y_i = m_{ij}x_j$:

In [None]:
np.einsum('ij,j', M, x)

## Partial derivative of a matrix multiply of a vector

Wikipedia [defines](https://en.wikipedia.org/wiki/Partial_derivative#Formal_definition) the partial derivative thus: \
Let _U_ be an open subset of $\mathbb{R}^n$ and ${\displaystyle f:U\to \mathbb {R} }$ a function. The partial derivative of _f_ at the point ${\displaystyle \mathbf {a} =(a_{1},\ldots ,a_{n})\in U}$ with respect to the _i_-th variable $x_i$ is defined as

$$
\begin{align}
\frac{\partial }{\partial x_i }f(\mathbf{a}) & = \lim_{h \to 0} \frac{f(a_1, \ldots , a_{i-1}, a_i+h, a_{i+1}, \ldots ,a_n) -
f(a_1, \ldots, a_i, \ldots ,a_n)}{h} \\
& = \lim_{h \to 0} \frac{f(\mathbf{a}+he_i) -
f(\mathbf{a})}{h} \tag{2.1} \label{partial}
\end{align}
$$

Where $f$ is linear, $f(\mathbf{a}+he_i) = f(\mathbf{a}) + f(he_i) = f(\mathbf{a}) + h f(e_i)$, and we have
$$ \begin{align}
\frac{\partial }{\partial x_i }f(\mathbf{a}) &= \lim_{h \to 0} \frac{f(\mathbf{a}+he_i) - f(\mathbf{a})}{h} \\
 & = \lim_{h \to 0} \frac{f(\mathbf{a}) + h f(e_i) - f(\mathbf{a})}{h} \\
 & = \lim_{h \to 0} \frac{h f(e_i)}{h} \\
 & = \lim_{h \to 0} {f(e_i)} \\
 &= f(e_i) \tag{2.2}
\end{align}
$$
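As a quick check of (2.2), the sketch below compares a finite-difference estimate of each partial derivative with $f(e_i)$ for a linear function. The weight vector `w`, the point `a`, and the step `h` are arbitrary values chosen for illustration, not taken from the notebook.

```python
import numpy as np

w = np.array([2.0, -3.0, 5.0])   # hypothetical weights defining a linear f

def f(a):
    """A linear function R^3 -> R."""
    return w @ a

a = np.array([0.5, 1.5, -2.0])   # arbitrary evaluation point
h = 1e-3                         # finite-difference step
E = np.eye(3)                    # rows are the unit vectors e_1, e_2, e_3

# Difference quotient from the definition, one unit vector at a time
fd = np.array([(f(a + h * e) - f(a)) / h for e in E])

# For linear f the limit is simply f(e_i), independent of a and h
exact = np.array([f(e) for e in E])

assert np.allclose(fd, exact)
```

Because `f` is linear, the difference quotient matches $f(e_i)$ for any step size up to floating-point rounding; no limit is actually needed.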

### $\partial\mathbf{y} / \partial\mathbf{x}$

How does vector $\mathbf{y}$ vary with vector $\mathbf{x}$, with $\mathbf{M}$ held constant? I.e. what is $\partial\mathbf{y}/\partial\mathbf{x}$?

With
$$ %\normalsize
\mathbf{x} = \sum_{j=1}^{n} x_j \mathbf{\hat{x}}_j, \;\;
\mathbf{y} = \sum_{i=1}^{m} y_i \mathbf{\hat{y}}_i
$$

the matrix equation $\mathbf{y} = \mathbf{M} \mathbf{x}$ can be written as
$$ \normalsize
\begin{align}
\mathbf{y} &= \sum_i y_i \mathbf{\hat{y}}_i 
  = \mathbf{M}\mathbf{x}  \tag{2.3} \label{mmul}
\end{align}
$$
where
$$ \normalsize
\begin{align}
y_i &= f_i(x_1, x_2, \dots, x_n) \\[6pt]
  &= \sum_j m_{ij}x_j \tag{2.4}
\end{align}
$$

We have
$$ \normalsize
\begin{align}
 \frac{\partial\mathbf{y}}{\partial\mathbf{x}}
 &= \frac{\partial\sum_{i=1}^{m} y_i \mathbf{\hat{y}}_i}{\partial\mathbf{x}} \\[10pt]
 &= \frac{\partial\sum_{i=1}^{m} f_i(x_1, x_2, \dots, x_n) \mathbf{\hat{y}}_i}{\partial\mathbf{x}} \\[10pt]
 &= \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{\partial(m_{ij}x_j) \mathbf{\hat{y}}_i}{{\partial x_j} \mathbf{\hat{x}_j}} \\[10pt]
 &= \sum_{i=1}^{m}
     \sum_{j=1}^{n} 
      \frac{\partial(m_{ij}x_j)}
           {\partial x_j} 
        \frac{\mathbf{\hat{y}}_i}{\mathbf{\hat{x}_j}}  \\[10pt]
 &= \sum_{i=1}^{m}
     \sum_{j=1}^{n} m_{ij}
      \frac{\partial x_j}
           {\partial x_j} 
        \frac{\mathbf{\hat{y}}_i}{\mathbf{\hat{x}_j}}  \\[10pt]
 &= \sum_{i=1}^{m}
     \sum_{j=1}^{n} m_{ij}
      \frac{\mathbf{\hat{y}}_i}{\mathbf{\hat{x}_j}} \tag{2.5}
\end{align}
$$

The basis vectors for $\partial\mathbf{y} / \partial\mathbf{x}$ are $\mathbf{\hat{y}}_i / \mathbf{\hat{x}_j}$. We can array the components in a matrix to say \
\
$$ \normalsize
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
%\large
\begin{bmatrix}
m_{1,1}\frac{\mathbf{\hat{y}}_1}{\mathbf{\hat{x}_1}} & \cdots &
m_{1,n}\frac{\mathbf{\hat{y}}_1}{\mathbf{\hat{x}_n}} \\
\vdots & \ddots & \vdots \\
m_{m,1}\frac{\mathbf{\hat{y}}_m}{\mathbf{\hat{x}_1}} & \cdots &
m_{m,n}\frac{\mathbf{\hat{y}}_m}{\mathbf{\hat{x}_n}}
\end{bmatrix}
$$

Then
\
$$ \normalsize
\partial \mathbf{y} =
%\large
\begin{bmatrix}
m_{1,1}\frac{\mathbf{\hat{y}}_1}{\mathbf{\hat{x}_1}} & \cdots &
m_{1,n}\frac{\mathbf{\hat{y}}_1}{\mathbf{\hat{x}_n}} \\
\vdots & \ddots & \vdots \\
m_{m,1}\frac{\mathbf{\hat{y}}_m}{\mathbf{\hat{x}_1}} & \cdots &
m_{m,n}\frac{\mathbf{\hat{y}}_m}{\mathbf{\hat{x}_n}}
\end{bmatrix}
\partial \mathbf{x}
$$
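and, reading $\partial\mathbf{x}$ and $\partial\mathbf{y}$ as gradients of some downstream scalar with respect to $\mathbf{x}$ and $\mathbf{y}$ (this is not the inverse of the relation above, which would require $\mathbf{M}^{-1}$), the transpose carries them back: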
$$ \normalsize
\begin{align}
\partial \mathbf{x} &=
%\large
\begin{bmatrix}
m_{1,1}\frac{\mathbf{\hat{y}}_1}{\mathbf{\hat{x}_1}} & \cdots &
m_{1,n}\frac{\mathbf{\hat{y}}_1}{\mathbf{\hat{x}_n}} \\
\vdots & \ddots & \vdots \\
m_{m,1}\frac{\mathbf{\hat{y}}_m}{\mathbf{\hat{x}_1}} & \cdots &
m_{m,n}\frac{\mathbf{\hat{y}}_m}{\mathbf{\hat{x}_n}}
\end{bmatrix}^\mathsf{T}
\partial\mathbf{y} \\[10pt]
&=
%\large
\begin{bmatrix}
m_{1,1}\frac{\mathbf{\hat{x}}_1}{\mathbf{\hat{y}_1}} & \cdots &
m_{m,1}\frac{\mathbf{\hat{x}}_1}{\mathbf{\hat{y}_m}} \\
\vdots & \ddots & \vdots \\
m_{1,n}\frac{\mathbf{\hat{x}}_n}{\mathbf{\hat{y}_1}} & \cdots &
m_{m,n}\frac{\mathbf{\hat{x}}_n}{\mathbf{\hat{y}_m}}
\end{bmatrix}
\partial\mathbf{y}
\end{align}
$$
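One reading under which the transposed relation holds is the backpropagation one: if a scalar $L$ depends on $\mathbf{y}$, then the gradient of $L$ with respect to $\mathbf{x}$ is $\mathbf{M}^\mathsf{T}$ times the gradient with respect to $\mathbf{y}$. The sketch below checks this for a linear scalar $L = \mathbf{g}_y \cdot \mathbf{y}$. The names `g_y`, `loss`, `g_x`, and `g_x_fd`, and the choice of $L$, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 2))   # non-square on purpose: y in R^3, x in R^2
x = rng.standard_normal(2)
g_y = rng.standard_normal(3)      # hypothetical gradient dL/dy of a scalar L

def loss(v):
    """Scalar L = g_y . (M v), so that dL/dy is exactly g_y."""
    return g_y @ (M @ v)

h = 1e-3
# Finite-difference gradient of L with respect to each component of x
g_x_fd = np.array([(loss(x + h * e) - loss(x)) / h for e in np.eye(2)])

# Transposed matrix applied to the gradient with respect to y
g_x = M.T @ g_y

assert np.allclose(g_x_fd, g_x, atol=1e-6)
```

Note the shapes: `M` maps a length-2 perturbation of `x` forward to a length-3 change in `y`, while `M.T` maps a length-3 gradient back to a length-2 gradient.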

Approximating ([2.1](#mjx-eqn-partial)) numerically with our example:

In [None]:
h = 0.001  # finite-difference step; each quotient below should reproduce one column of M
M, (M@(x + np.array([h, 0])) - M@x) / h, (M@(x + np.array([0, h])) - M@x) / h
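The two difference quotients above are the columns of the Jacobian. A minimal sketch of a general helper that assembles all columns and compares the result with $\mathbf{M}$ follows; `numerical_jacobian` is a hypothetical name introduced here, not a NumPy function, and `M`, `x` are set to the example values used earlier.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-3):
    """Forward-difference estimate of d f / d x; column j is (f(x + h e_j) - f(x)) / h."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    cols = [(f(x + h * e) - fx) / h for e in np.eye(x.size)]
    return np.stack(cols, axis=1)   # shape (len(f(x)), len(x))

M = np.array([[2.0, 3.0], [5.0, 7.0]])
x = np.array([11.0, 13.0])

J = numerical_jacobian(lambda v: M @ v, x)
assert np.allclose(J, M)
```

Since $\mathbf{y} = \mathbf{M}\mathbf{x}$ is linear, the forward difference is exact up to rounding; for a nonlinear `f` the same helper would only approximate the Jacobian, with error that grows with `h`.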

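Test (2.5) numerically: for random $\mathbf{M}$, $\mathbf{x}$, and a small random perturbation `veps`, check that $\mathbf{M}(\mathbf{x} + \boldsymbol{\varepsilon}) - \mathbf{M}\mathbf{x} = \mathbf{M}\boldsymbol{\varepsilon}$, and report the largest squared error over 1000 trials: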

In [None]:
max(err.dot(err)
    for err in (((M@(x + veps) - M@x) - M@veps)
              for M,x,veps in ((np.random.randn(2,2), np.random.randn(2), np.random.randn(2) * 0.001)
                          for i in range(1000))))

### $\partial\mathbf{y} / \partial\mathbf{M}$

How does vector $\mathbf{y}$ vary with matrix $\mathbf{M}$, with vector $\mathbf{x}$ held constant? I.e. what is $\partial\mathbf{y}/\partial\mathbf{M}$?

From (2.4):
$$\begin{align}
 y_i &= \sum_j m_{ij}x_j \\
 \partial y_i &= \sum_j (\partial m_{ij})\, x_j \\
% \frac{\partial y_i}{\partial M_{ij}} &= 2
\end{align}
$$

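Then, stacking these component equations over $i$ (with $\mathbf{x}$ held constant, so only $\mathbf{M}$ is perturbed) and writing the result informally as a quotient: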
$$
 \partial\mathbf{y} = \partial\mathbf{M}\mathbf{x} \\
 \frac{\partial\mathbf{y}}{\partial\mathbf{M}} = \mathbf{x}
$$

Numeric demonstration:

In [None]:
M, x, M@x

In [None]:
k11 = np.array([[1, 0], [0, 0]])
k12 = np.fliplr(k11)
k21 = np.flipud(k11)
k22 = np.fliplr(k21)
singles = (k11, k12, k21, k22)
singles

In [None]:
[((M+(e*0.001))@x - M@x) / 0.001 for e in singles]

In [None]:
[e@x for e in singles]
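The four single-entry perturbations above can be gathered into one three-index array. Componentwise, $\partial y_i / \partial m_{kj} = \delta_{ik} x_j$, which follows from $y_i = \sum_j m_{ij} x_j$. The sketch below builds that array by finite differences and compares it with the closed form; the names `T`, `expected`, and `dM`, and the index layout `T[i, k, j]`, are choices made here for illustration.

```python
import numpy as np

M = np.array([[2.0, 3.0], [5.0, 7.0]])
x = np.array([11.0, 13.0])
h = 1e-3
m, n = M.shape

# T[i, k, j] approximates d y_i / d m_kj by perturbing one entry of M at a time
T = np.zeros((m, m, n))
for k in range(m):
    for j in range(n):
        E = np.zeros_like(M)
        E[k, j] = 1.0
        T[:, k, j] = ((M + h * E) @ x - M @ x) / h

# Closed form: d y_i / d m_kj = delta_ik * x_j
expected = np.einsum('ik,j->ikj', np.eye(m), x)
assert np.allclose(T, expected)

# Contracting the tensor with a perturbation dM recovers dy = dM @ x
dM = np.array([[0.1, -0.2], [0.3, 0.4]])
assert np.allclose(np.einsum('ikj,kj->i', T, dM), dM @ x)
```

The last check is the sense in which $\partial\mathbf{y}/\partial\mathbf{M} = \mathbf{x}$ can be read: the Kronecker delta only routes row $k$ of the perturbation to component $k$ of $\mathbf{y}$, and the remaining factor is $\mathbf{x}$.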

Test numerically: create a random vector x and random M and dM matrices. Use an approximation of (2.1) to estimate
$\partial\mathbf{y}/\partial\mathbf{M}$ numerically, and compare it to $\partial\mathbf{M}\mathbf{x}$. Find the maximum squared error over a number of random trials.

In [None]:
max(v.dot(v)
    for v in (dM@x - (((M+(dM*0.001))@x - M@x) / 0.001)
              for M,dM,x in ((np.random.randn(2,2), np.random.randn(2,2), np.random.randn(2))
                          for i in range(1000))))

# END
---