Mushi
==
_All that the rain promises and more..._

A notebook for testing `mushi`'s ability to invert data simulated under the forward model.

API documentation can be viewed with:
```python
help(mushi.η)
help(mushi.μ)
help(mushi.kSFS)
```

In [1]:
%matplotlib notebook
import mushi
import numpy as np
from matplotlib import pyplot as plt
from scipy.special import expit
import time
import msprime
%cd stdpopsim
from stdpopsim import homo_sapiens
%cd ../

/Users/williamdewitt/Desktop/repos/dement/stdpopsim
/Users/williamdewitt/Desktop/repos/dement


In [2]:
# plt.style.use('dark_background')

### Time grid

In [3]:
t = np.logspace(0, np.log10(3e4), 300)
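
As a quick check of what this grid implies, the sketch below (plain NumPy, not part of `mushi`) counts the points involved. Prepending time 0 to the 300 grid points gives 301 values, which is why later cells build arrays with `len(t) + 1` rows. The name `epoch_starts` is illustrative only, and reading the grid points as epoch boundaries is an assumption.

```python
import numpy as np

# same log-spaced grid as in the cell above
t = np.logspace(0, np.log10(3e4), 300)

# prepend the present (t = 0), as later cells do with np.concatenate(([0], t))
epoch_starts = np.concatenate(([0], t))

n_grid_points = len(t)
n_epochs = len(epoch_starts)
first_point, last_point = t[0], t[-1]
print(n_grid_points, n_epochs)
```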

### Demographic history $\eta(t)$ from the European population in Tennessen et al.

In [4]:
model = homo_sapiens.TennessenTwoPopOutOfAfrica()
dd = msprime.DemographyDebugger(Ne=model.default_population_size,
                                population_configurations=model.population_configurations,
                                demographic_events=model.demographic_events,
                                migration_matrix=model.migration_matrix)
y = 2 * dd.population_size_trajectory(np.concatenate(([0], t)))[:, 1]
η = mushi.η(t, y)

plt.figure(figsize=(3, 3))
η.plot()
plt.show()


### Mutation rate history $\mu(t)$
A 96-dimensional history with 3 underlying signatures, each shared by 32 mutation types:
- 32 identical pulses
- 32 identical ramping increases
- 32 identical constant histories

We keep the mutation rate low, so that the $k$-SFS is noisy and reconstructing each of the 96 histories independently is difficult.

In [5]:
Z = np.zeros((len(t) + 1, 96))
tt = np.concatenate(([0], t))[:, np.newaxis]
μ0 = 0.2
flat = μ0 * (1.1 * np.ones_like(tt))
ramp = μ0 * (1 + expit(-.01 * (tt - 50)))
pulse = μ0 * (1 + expit(.01 * (tt - 80)) - 1.1 * expit(.01 * (tt - 600)))
Z[:, :32] = pulse  # + np.random.normal(0, .01, (1, 32))
Z[:, 32:64] = ramp  # + np.random.normal(0, .01, (1, 32))
Z[:, 64:] = flat  # + np.random.normal(0, .01, (1, 32))
μ = mushi.μ(t, Z)

In [6]:
plt.figure(figsize=(3, 3))
μ.plot((0, 32, 64))
plt.ylim([0, None])
plt.show()


In [7]:
plt.figure(figsize=(6, 3))
μ.heatmap()
plt.show()


### Simulate a $k$-SFS under this history
We'll sample 200 haplotypes.

In [8]:
n = 200
sfs = mushi.kSFS(η, n=n)
sfs.simulate(μ, seed=1)

plt.figure(figsize=(6, 3))
sfs.heatmap()
plt.show()
# plt.savefig('/Users/williamdewitt/Downloads/sfs.pdf', transparent=True)

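
The simulation step can be pictured as Poisson sampling around expected counts. The sketch below assumes a Poisson random field model, in which each $k$-SFS entry is an independent Poisson draw around its expectation. The `expected` matrix, its $(n - 1) \times 96$ shape, and its $1/i$ decay are made up for illustration; they are not computed by `mushi`, whose internals are not shown in this notebook.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200        # number of sampled haplotypes, as above
n_types = 96   # number of mutation types, as above

# hypothetical expected counts: one row per sample frequency i = 1..n-1,
# decaying like 1/i, identical across mutation types
freqs = np.arange(1, n)
expected = (50.0 / freqs)[:, np.newaxis] * np.ones((1, n_types))

# each entry is an independent Poisson draw around its expectation
simulated = rng.poisson(expected)
print(simulated.shape)
```

Low expected counts give noisy draws, which is the regime this notebook targets by keeping the mutation rate low.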

### TMRCA CDF

In [9]:
plt.figure(figsize=(3, 3))
plt.plot(η.change_points, sfs.tmrca_cdf())
plt.xlabel('$t$')
plt.ylabel('TMRCA CDF')
plt.ylim([0, 1])
plt.xscale('symlog')
plt.tight_layout()
plt.show()


### Invert the $k$-SFS conditioned on $\eta(t)$ to get $\boldsymbol\mu(t)$
Inference uses accelerated proximal gradient descent.

In [10]:
# time derivative regularization parameters
λ_tv = 2e4
α_tv = 0  # 0.99
# spectral regularization parameters
λ_r = 1e-1
α_r = .99
γ = 0.8
max_iter = 10000
tol = 1e-10
bins = None
# bins = np.logspace(0, np.log10(n), 5)
with np.errstate(all='raise'):
    μ_inferred = sfs.infer_μ(λ_tv=λ_tv, α_tv=α_tv, λ_r=λ_r, α_r=α_r, γ=γ,
                             max_iter=max_iter, tol=tol, bins=bins)

relative change in loss function 3.7e-11 is within tolerance 1e-10 after 1214 iterations
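
For readers unfamiliar with the optimizer, here is a minimal sketch of accelerated proximal gradient descent (FISTA) on a toy lasso problem. It is not `mushi`'s objective or implementation: the function names, the L1 penalty, and the test data are all assumptions chosen for illustration. The stopping rule mirrors the relative-change-in-loss message printed above.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Proximal operator of thresh * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def accelerated_proximal_gradient(A, b, lam, max_iter=10000, tol=1e-10):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with FISTA."""
    def loss(x):
        return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

    # fixed step size: 1 / Lipschitz constant of the smooth part's gradient
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y, momentum = x.copy(), 1.0
    previous_loss = loss(x)
    for iteration in range(1, max_iter + 1):
        # gradient step on the smooth term, evaluated at the extrapolated point
        gradient = A.T @ (A @ y - b)
        # proximal step handles the non-smooth penalty
        x_new = soft_threshold(y - step * gradient, step * lam)
        # Nesterov momentum update
        momentum_new = (1 + np.sqrt(1 + 4 * momentum ** 2)) / 2
        y = x_new + (momentum - 1) / momentum_new * (x_new - x)
        x, momentum = x_new, momentum_new
        # stop when the relative change in loss falls within tolerance
        current_loss = loss(x)
        if abs(previous_loss - current_loss) <= tol * abs(previous_loss):
            break
        previous_loss = current_loss
    return x, iteration

# toy problem: recover a sparse vector from noiseless linear measurements
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.0, 0.5]
b = A @ x_true
x_hat, n_iter = accelerated_proximal_gradient(A, b, lam=1e-3)
print(n_iter, np.round(x_hat[:3], 3))
```

The split between a gradient step on the smooth term and a proximal step on the penalty is what lets this family of methods handle non-smooth regularizers.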


The inferred histories for each mutation type superimposed on the 3 underlying signatures

In [11]:
plt.figure(figsize=(6, 3))
plt.step(tt, pulse, c='C0', ls='--', lw=2)
plt.step(tt, ramp, c='C1', ls='--', lw=2)
plt.step(tt, flat, c='C2', ls='--', lw=2)
μ_inferred.plot(range(32), c='C0', alpha=0.2, lw=2)
μ_inferred.plot(range(32, 64), c='C1', alpha=0.2, lw=2)
μ_inferred.plot(range(64, 96), c='C2', alpha=0.2, lw=2)
plt.ylabel('$μ(t)$')
plt.xscale('log')
plt.tight_layout()
plt.show()
# plt.savefig('/Users/williamdewitt/Downloads/mu.pdf', transparent=True)


Heatmap of the inferred mutation spectrum history

In [12]:
plt.figure(figsize=(6, 3))
μ_inferred.heatmap()
plt.show()


Plot the $\chi^2$ goodness of fit for each $k$-SFS matrix element, and compute a $\chi^2$ goodness-of-fit test for the $k$-SFS matrix as a whole.

In [13]:
plt.figure(figsize=(6, 3))
sfs.heatmap(μ_inferred)
plt.show()


χ² goodness of fit 18403.5890799671, p = 0.9998552784662857
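
To make the test statistic concrete, the sketch below computes a Pearson $\chi^2$ statistic and p-value from observed and expected count matrices. The inputs are hypothetical, and both the Pearson form and the choice of one degree of freedom per cell are assumptions; the notebook does not show how `mushi` computes its statistic.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# hypothetical expected counts for a (n - 1) x 96 matrix with n = 200
expected = np.full((199, 96), 5.0)
# "observed" data drawn from the same model, so the fit should look good
observed = rng.poisson(expected)

# Pearson statistic: squared residuals scaled by the expected counts
statistic = np.sum((observed - expected) ** 2 / expected)

# assumption: one degree of freedom per cell, no correction for fitted parameters
dof = observed.size
p_value = chi2.sf(statistic, dof)
print(statistic, dof, p_value)
```

A large p-value means the test finds no evidence that the observed counts depart from the expected ones.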


An example column from each of the three signatures

In [14]:
plt.figure(figsize=(9, 3))
for ct, i in enumerate((0, 32, 64), 1):
    plt.subplot(1, 3, ct)
    sfs.plot(i, μ=μ_inferred, prf_quantiles=True)
plt.tight_layout()
plt.show()


### Singular value spectrum of $Z$

In [15]:
plt.figure(figsize=(3, 3))
plt.bar(range(μ.Z.shape[1]), np.linalg.svd(μ_inferred.Z, compute_uv=False))
plt.yscale('log')
plt.tight_layout()
plt.show()

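
As a reference point for reading this spectrum, the sketch below rebuilds the true $Z$ from the three signatures defined earlier and confirms that it has rank 3, so only three of its 96 singular values are non-negligible. It uses plain NumPy with a hand-written `expit` and the ASCII name `mu0` in place of `μ0`. It says nothing about the inferred spectrum, which depends on the regularization settings.

```python
import numpy as np

def expit(x):
    """Logistic sigmoid, equivalent to scipy.special.expit for these inputs."""
    return 1.0 / (1.0 + np.exp(-x))

# same time grid and signatures as in the cells above
t = np.logspace(0, np.log10(3e4), 300)
tt = np.concatenate(([0], t))[:, np.newaxis]
mu0 = 0.2
flat = mu0 * (1.1 * np.ones_like(tt))
ramp = mu0 * (1 + expit(-.01 * (tt - 50)))
pulse = mu0 * (1 + expit(.01 * (tt - 80)) - 1.1 * expit(.01 * (tt - 600)))

# 32 copies of each signature, side by side: shape (301, 96)
Z_true = np.hstack([np.tile(pulse, 32), np.tile(ramp, 32), np.tile(flat, 32)])

singular_values = np.linalg.svd(Z_true, compute_uv=False)
rank = np.linalg.matrix_rank(Z_true)
print(rank, singular_values[:4])
```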