# Derivative of traits_i with respect to traits_k

This notebook tests my analytical solution for
$\frac{ \partial \mathbf{V_{i,t+1}} }{ \partial \mathbf{V_{k,t}} }$
(given in the Equations section below)
by computing the same Jacobian with automatic differentiation in the `theano` package
and comparing the two results.




## Importing packages and setting options

In [10]:
%env OMP_NUM_THREADS=4
%env THEANO_FLAGS='openmp=True'
import sympy
import theano
theano.config.cxx = ""
import theano.tensor as T
import numpy as np
import pandas as pd
from tqdm import tqdm
import math
pd.options.display.max_columns = 10

env: OMP_NUM_THREADS=4
env: THEANO_FLAGS='openmp=True'


## Equations

__Notes:__

- $*$ denotes matrix multiplication.
- ${}^\text{T}$ denotes the transpose.
- Run `vignette("model", "sauron")` in R to see more of the model and
  what each parameter means.
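
To make the notation concrete, here is a minimal NumPy sketch, assuming trait vectors are stored as $1 \times q$ row vectors (as the `reshape((1, 3))` calls later in this notebook do). The values in `V` are made up for illustration.

```python
import numpy as np

# Made-up 1 x q row vector of traits (q = 3)
V = np.array([[1.0, 2.0, 3.0]])

# V * V^T in the notation above: a 1 x 1 matrix holding the sum of squares
inner = V @ V.T

# V^T * V: a q x q outer product
outer = V.T @ V

print(inner.shape, outer.shape)
```

With row vectors, $\mathbf{V} * \mathbf{V}^\text{T}$ is effectively a scalar, which is why it can sit inside the exponentials in the equations.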
  

The equations for (1) traits for species $i$ at time $t+1$ ($\mathbf{V_{i,t+1}}$)
and (2) the partial derivative of species $i$ traits with respect
to species $k$ traits (where $k \ne i$)
are as follows:

\begin{align}
\mathbf{V_{i,t+1}} &= \mathbf{V_{i,t}} + \sigma^2
    \left[
        \left(
            N_k \text{e}^{-d \mathbf{V_{k,t}} * \mathbf{V_{k,t}}^\text{T}} +
            \mathbf{Z}
        \right) 2 g ~ \text{e}^{-\mathbf{V_{i,t}} * \mathbf{V_{i,t}}^\text{T}} * \mathbf{V_{i,t}} 
        - 2 ~ f \mathbf{V_{i,t}} \mathbf{C}
    \right] \\
    \mathbf{Z} &= N_i + \sum_{j \ne i, j \ne k}^{n}{ N_j \text{e}^{-d \mathbf{V_j} * \mathbf{V_j}^\text{T}} } \\
    \frac{ \partial \mathbf{\hat{V}_i} }{ \partial \mathbf{V_k} } &= -4 \sigma^2 N_k d g
        \left[
            ( \mathbf{V_k}^\text{T} * \text{e}^{-d \mathbf{V_k} * \mathbf{V_k}^\text{T}} ) *
            ( \text{e}^{ - \mathbf{V_i} * \mathbf{V_i}^\text{T} } * \mathbf{V_i} )
        \right]
\end{align}
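
As a sketch of where the last equation comes from (my reading of the equations above, not an additional result): only the $N_k \text{e}^{-d \mathbf{V_k} * \mathbf{V_k}^\text{T}}$ term in the first equation depends on $\mathbf{V_k}$. Assuming the derivative with respect to the row vector $\mathbf{V_k}$ is laid out as a column, the chain rule gives

\begin{align}
\frac{ \partial }{ \partial \mathbf{V_k} } \text{e}^{-d \mathbf{V_k} * \mathbf{V_k}^\text{T}}
    &= -2 d ~ \mathbf{V_k}^\text{T} * \text{e}^{-d \mathbf{V_k} * \mathbf{V_k}^\text{T}} \\
\frac{ \partial \mathbf{\hat{V}_i} }{ \partial \mathbf{V_k} }
    &= \sigma^2 N_k
        \left( -2 d ~ \mathbf{V_k}^\text{T} * \text{e}^{-d \mathbf{V_k} * \mathbf{V_k}^\text{T}} \right) *
        \left( 2 g ~ \text{e}^{ - \mathbf{V_i} * \mathbf{V_i}^\text{T} } * \mathbf{V_i} \right) \\
    &= -4 \sigma^2 N_k d g
        \left[
            ( \mathbf{V_k}^\text{T} * \text{e}^{-d \mathbf{V_k} * \mathbf{V_k}^\text{T}} ) *
            ( \text{e}^{ - \mathbf{V_i} * \mathbf{V_i}^\text{T} } * \mathbf{V_i} )
        \right]
\end{align}

The result is a $q \times q$ matrix whose rows index the elements of $\mathbf{V_k}$ and whose columns index the elements of $\mathbf{\hat{V}_i}$.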


## Read CSV of simulated datasets

In [11]:
sims = pd.read_csv("simulated_data.csv")
sims.head()

Unnamed: 0,V1,V2,V3,V4,V5,...,f,g,eta,r0,d
0,5.329784,-0.593159,0.003065,1.414273,-6.458124,...,0.137235,0.104261,0.063997,0.343463,-0.118705
1,-1.514917,-1.024847,5.413096,-4.548136,1.542865,...,0.600063,0.197839,0.103529,0.279827,-0.158496
2,-9.969353,0.930724,2.855755,8.144096,3.640262,...,0.537799,0.202685,-0.088763,0.303346,-0.159742
3,3.821274,-3.732219,-2.680385,-1.586652,-9.75577,...,0.123312,0.117315,-0.08224,0.136664,0.103837
4,3.291826,0.708288,-5.28158,6.224788,-0.271641,...,0.560044,0.054967,0.046302,0.254523,-0.125201


## Functions to compare methods

In [12]:
def automatic(i, k, N, V, d, f, g, eta, s2):
    """Automatic differentiation using theano pkg"""
    Vi = V[i,:]
    Ni = N[i]
    Nk = N[k]
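    # C has 1 on the diagonal and eta elsewhere. It is symmetric, so
    # C + C.T supplies the 2 * C that appears in the equation.
    C = np.zeros((3, 3)) + eta
    np.fill_diagonal(C,1.0)
    CCC = C + C.T
    # Summed effect of all species other than i and k (constant w.r.t. Vk)
    Z = [np.exp(-d * np.dot(V[j,:], V[j,:].T)) * N[j] 
         for j in range(0, N.size) if j != i and j != k]
    Z = np.sum(Z)
    # Symbolic input: traits of species k
    Vk_ = T.dvector('Vk_')
    Vhat = Vi + s2 * ( T.dot(Ni + Nk * T.exp(-d * T.dot(Vk_, Vk_.T)) + Z, 
                              2 * g * T.dot(T.exp(-1 * T.dot(Vi, Vi.T)), Vi)) -
                       f * T.dot(Vi, CCC) )
    # Build the Jacobian row by row: gradient of each element of Vhat w.r.t. Vk_
    J, updates = theano.scan(lambda idx, Vhat, Vk_ : T.grad(Vhat[idx], Vk_), 
                         sequences=T.arange(Vhat.shape[0]), non_sequences=[Vhat, Vk_])
    num_fun = theano.function([Vk_], J, updates=updates)
    # Transpose so rows index elements of Vk, matching `symbolic`
    out_array = num_fun(V[k,:]).T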
    return out_array

In [13]:
def symbolic(i, k, N, V, d, g, s2):
    """Symbolic differentiation using my brain"""
    Vi = V[i,:]
    Vi = Vi.reshape((1, 3))
    Vk = V[k,:]
    Vk = Vk.reshape((1, 3))
    Ni = N[i]
    Nk = N[k]
    dVhat = -4 * s2 * Nk * d * g * np.dot(
        np.dot(Vk.T, np.exp(-d * np.dot(Vk, Vk.T))),
        np.dot(np.exp(-1 * np.dot(Vi, Vi.T)), Vi))
    return dVhat
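
As a cross-check that does not depend on `theano`, the same Jacobian can be approximated with central finite differences. The sketch below is not part of the original analysis. The helper names (`v_next`, `symbolic_jac`, `finite_diff_jac`) are hypothetical, the parameter values are made up, and the update rule is my transcription of the first equation above.

```python
import numpy as np

def v_next(Vi, Vk, Ni, Nk, Z, d, f, g, C, s2):
    """Traits of species i at t+1, viewed as a function of species k's traits."""
    competition = Ni + Nk * np.exp(-d * (Vk @ Vk)) + Z   # scalar
    benefit = 2 * g * np.exp(-(Vi @ Vi)) * Vi            # length-q vector
    cost = f * (Vi @ (C + C.T))                          # length-q vector
    return Vi + s2 * (competition * benefit - cost)

def symbolic_jac(Vi, Vk, Nk, d, g, s2):
    """Analytical solution; element [a, b] is d Vhat_i[b] / d Vk[a]."""
    scale = -4 * s2 * Nk * d * g * np.exp(-d * (Vk @ Vk)) * np.exp(-(Vi @ Vi))
    return scale * np.outer(Vk, Vi)

def finite_diff_jac(Vi, Vk, h=1e-5, **params):
    """Central-difference approximation with the same layout as symbolic_jac."""
    q = Vk.size
    J = np.empty((q, q))
    for a in range(q):
        step = np.zeros(q)
        step[a] = h
        J[a, :] = (v_next(Vi, Vk + step, **params)
                   - v_next(Vi, Vk - step, **params)) / (2 * h)
    return J

# Made-up inputs for illustration only (not taken from simulated_data.csv)
Vi = np.array([0.3, -0.2, 0.5])
Vk = np.array([0.4, 0.1, -0.6])
C = np.full((3, 3), 0.05)
np.fill_diagonal(C, 1.0)
params = dict(Ni=10.0, Nk=12.0, Z=5.0, d=0.1, f=0.1, g=0.2, C=C, s2=0.01)

J_sym = symbolic_jac(Vi, Vk, Nk=params["Nk"], d=params["d"],
                     g=params["g"], s2=params["s2"])
J_fd = finite_diff_jac(Vi, Vk, **params)
print(np.abs(J_fd - J_sym).max())
```

Finite differences are less precise than automatic differentiation, so agreement here is only expected up to truncation and round-off error, not to machine precision.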

In [14]:
def compare_methods(sim_i, s2 = 0.01, abs = False):
    """Compare answers from symbolic and automatic methods"""
    N = sims.loc[sim_i, [x.startswith("N") for x in sims.columns]].values
    V = sims.loc[sim_i, [x.startswith("V") for x in sims.columns]].values
    n, q = (N.size, int(V.size / N.size))
    V = V.reshape((n, q), order = 'F')
    f = sims.loc[sim_i,"f"]
    g = sims.loc[sim_i,"g"]
    eta = sims.loc[sim_i,"eta"]
    d = sims.loc[sim_i,"d"]
    # r0 = sims.loc[sim_i,"r0"]  # don't need this one now
    diffs = np.empty((math.factorial(n) // math.factorial(n-2), 4))
    j = 0
    for i in range(0, n):
        for k in [x for x in range(0, n) if x != i]:
            num = automatic(i, k, N, V, d, f, g, eta, s2)
            sym = symbolic(i, k, N, V, d, g, s2)
            if abs:
                diff = num - sym
            else:
                diff = (num - sym) / sym
            diff = diff.flatten()
            diffs[j, 0] = i
            diffs[j, 1] = k
            diffs[j, 2] = diff.min()
            diffs[j, 3] = diff.max()
            j += 1
    return diffs
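
The number of rows allocated for `diffs` in `compare_methods`, `math.factorial(n) // math.factorial(n-2)`, is the number of ordered pairs $(i, k)$ with $i \ne k$, that is, $n (n - 1)$. A small sketch using `itertools.permutations` (my choice for illustration, not used in the function itself) shows the count and the order in which the nested loops visit the pairs, assuming $n = 4$ species as in the simulated datasets:

```python
import math
from itertools import permutations

n = 4  # number of species

# Ordered (i, k) pairs with i != k, in the same order as the nested loops
pairs = list(permutations(range(n), 2))

n_pairs = math.factorial(n) // math.factorial(n - 2)
print(n_pairs, len(pairs))
print(pairs[:4])
```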

### Example of using `compare_methods`:

In [15]:
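# Use a separate name so this example does not overwrite the `diffs`
# array that holds the results of the full 100-dataset comparison.
example_diffs = compare_methods(0)
print(example_diffs[:,2].min())
print(example_diffs[:,3].max())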

-2.4778570445282854e-16
1.1029752737394025e-15


## Comparing methods

This takes ~5-6 minutes.

In [7]:
n_per_rep = math.factorial(4) // math.factorial(4-2)
diffs = np.empty((int(n_per_rep * 100), 4))

In [8]:
for rep in tqdm(range(100)):
    diffs_r = compare_methods(rep)
    diffs[(rep * n_per_rep):((rep+1) * n_per_rep),:] = diffs_r

100%|██████████| 100/100 [05:34<00:00,  3.96s/it]


## The results
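The relative differences between the two methods are at most about $10^{-15}$ in magnitude, which is on the order of floating-point round-off, so I'm comfortable saying that my symbolic solution works. Note that the values printed here are identical to those from the single-dataset example above, and that example cell (In [15]) ran after the loop (In [8]), so `diffs` may hold only dataset 0 at this point. Re-running the loop immediately before this cell would confirm that the summary covers all 100 datasets.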

In [16]:
print(diffs[:,2].min())
print(diffs[:,3].max())

-2.4778570445282854e-16
1.1029752737394025e-15


## Write output to file

To check that the R version also works, I write the output of the symbolic version for all 100 datasets to a CSV file.

In [21]:
n = np.sum([x.startswith("N") for x in sims.columns])
q = int(np.sum([x.startswith("V") for x in sims.columns]) / n)
s2 = 0.01
n_perms = math.factorial(n) // math.factorial(n-2)
# Output array
results = np.zeros((100, n_perms * q * q))

for sim_i in range(100):
    
    # Fill info from data frame:
    N = sims.loc[sim_i, [x.startswith("N") for x in sims.columns]].values
    V = sims.loc[sim_i, [x.startswith("V") for x in sims.columns]].values
    V = V.reshape((n, q), order = 'F')
    g = sims.loc[sim_i,"g"]
    d = sims.loc[sim_i,"d"]

    # Fill output array:
    j = 0
    for i in range(0, n):
        for k in [x for x in range(0, n) if x != i]:
            sym = symbolic(i, k, N, V, d, g, s2)
            results[sim_i, (j*q*q):((j+1)*q*q)] = sym.flatten()
            j += 1

# Make sure the last row isn't zeros:
results[99, :]

array([ 4.47658622e-78, -2.90840057e-78, -4.31334666e-78, -9.85921715e-79,
        6.40545080e-79,  9.49969894e-79,  5.83904091e-78, -3.79357597e-78,
       -5.62611918e-78,  6.28814669e-57, -4.08535623e-57, -6.05884824e-57,
       -5.17308193e-57,  3.36090800e-57,  4.98444452e-57,  1.16055772e-56,
       -7.54004627e-57, -1.11823776e-56,  4.68457429e-58, -3.04352867e-58,
       -4.51375040e-58,  3.48253016e-58, -2.26257068e-58, -3.35553904e-58,
       -1.08285380e-57,  7.03521045e-58,  1.04336733e-57,  1.15887929e-86,
       -2.55231152e-87,  1.51158567e-86, -7.52914168e-87,  1.65821542e-87,
       -9.82064549e-87, -1.11662054e-86,  2.45924100e-87, -1.45646542e-86,
       -2.00880707e-73,  4.42418936e-74, -2.62019005e-73,  1.65258924e-73,
       -3.63965651e-74,  2.15555687e-73, -3.70750979e-73,  8.16540604e-74,
       -4.83589510e-73, -1.49653092e-74,  3.29595423e-75, -1.95200200e-74,
       -1.11252672e-74,  2.45022479e-75, -1.45112564e-74,  3.45927740e-74,
       -7.61869992e-75,  

In [22]:
np.savetxt('results/dVi_dVk.csv', results, delimiter=',')
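
When reading this file back (in Python or R), each row holds the `n_perms` Jacobians for one dataset, stored as consecutive blocks of `q * q` values flattened in row-major order. The sketch below shows one way to recover the blocks, assuming $n = 4$ and $q = 3$ as in the simulated data. It uses made-up numbers and a hypothetical temporary file name in place of the real `results` array and output path.

```python
import os
import tempfile
import numpy as np

n, q = 4, 3              # species and traits per species
n_perms = n * (n - 1)    # ordered (i, k) pairs with i != k

# Stand-in for `results`: made-up values, 2 rows instead of 100
rng = np.random.default_rng(0)
results_demo = rng.normal(size=(2, n_perms * q * q))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dVi_dVk_demo.csv")  # hypothetical file name
    np.savetxt(path, results_demo, delimiter=",")
    loaded = np.loadtxt(path, delimiter=",")

# Axes: (dataset, (i, k) pair, element of Vk, element of Vi)
blocks = loaded.reshape((-1, n_perms, q, q))
first_block = blocks[0, 0]  # Jacobian for the first (i, k) pair of dataset 0
print(blocks.shape)
```

In R, which fills matrices column by column by default, the equivalent reshape of each block would need `byrow = TRUE` to reproduce the same layout.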