# Step 1: Estimate a starting ρ from the length model by maximum likelihood.

The independent loss length model is a
Markov process known as an M/M/∞ queuing model
[28] (Figure 2A). In this queuing model, customers
(i.e., spacers) arrive according to a Poisson process with
rate λ. They are served immediately and exit after an
exponentially distributed waiting time with rate μ. The
stationary distribution of the number of busy servers
(i.e., the number of spacers in the array) is a Poisson
distribution with mean ρ:

$$\rho = \frac{\lambda}{\mu}$$

with $\lambda$ = spacer insertion rate and $\mu$ = spacer deletion rate.

$$p(n|\rho) = e^{-\rho} \frac{\rho^n}{n!}$$
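As a plausibility check of the stationary distribution, the queue can be simulated directly. The sketch below is not part of the original analysis: the helper names `poisson_pmf` and `simulate_mm_inf`, the rates λ = 5 and μ = 1, and the run length are illustrative assumptions. It runs a Gillespie-style simulation and prints, for each array length, the fraction of time spent at that length next to the Poisson probability $p(n|\rho)$.

```python
import math
import random
from collections import defaultdict


def poisson_pmf(n, rho):
    """p(n|rho) = exp(-rho) * rho**n / n!, evaluated in log space."""
    return math.exp(-rho + n * math.log(rho) - math.lgamma(n + 1))


def simulate_mm_inf(lam, mu, t_end, seed=0):
    """Simulate an M/M/inf queue; return the fraction of time spent at each occupancy.

    Hypothetical helper: spacers are inserted at rate lam, and each of the n
    spacers currently present is lost independently at rate mu.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    time_in_state = defaultdict(float)
    while True:
        total_rate = lam + n * mu
        dt = rng.expovariate(total_rate)  # waiting time to the next event
        if t + dt >= t_end:
            time_in_state[n] += t_end - t
            break
        time_in_state[n] += dt
        t += dt
        # The next event is an insertion with probability lam / total_rate.
        if rng.random() < lam / total_rate:
            n += 1
        else:
            n -= 1
    return {k: v / t_end for k, v in time_in_state.items()}


lam, mu = 5.0, 1.0  # illustrative rates, not estimates from data
rho = lam / mu
occupancy = simulate_mm_inf(lam, mu, t_end=2000.0)
for n in range(11):
    print(n, round(occupancy.get(n, 0.0), 3), round(poisson_pmf(n, rho), 3))
```

The simulation starts from an empty array, so a short initial transient is included in the time averages; for a long run its effect is small.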

## The ML estimate of $\rho$ under $p(n|\rho)$ is the mean length of the arrays

In [1]:
import numpy as np

arrays = [[9, 2, 3, 4, 5], [0, 1, 2, 3, 7, 8], [1, 10, 11, 12, 13]]

# The starting value of rho is the mean array length.
rho_init = np.mean([len(s) for s in arrays])
rho_init

5.333333333333333
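The mean is the maximum-likelihood estimate because of the form of the Poisson likelihood. As a short worked derivation, write $K$ for the number of observed arrays and $n_1, \dots, n_K$ for their lengths (this notation is introduced here for illustration), and treat the lengths as independent draws from $p(n|\rho)$:

$$\log L(\rho) = \sum_{k=1}^{K} \left( -\rho + n_k \log \rho - \log n_k! \right)$$

$$\frac{d \log L}{d \rho} = -K + \frac{1}{\rho} \sum_{k=1}^{K} n_k = 0 \quad \Rightarrow \quad \hat{\rho} = \frac{1}{K} \sum_{k=1}^{K} n_k$$

For the three example arrays, with lengths 5, 6 and 5, this gives $\hat{\rho} = 16/3 \approx 5.33$.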

# Step 2: For each pair of arrays with overlap, generate the possible ancestors (the ancestral arrays do not need to be generated explicitly)

Ancestral arrays can be arbitrarily large, but the probability of observing a given length n is p(n|ρ). For practical reasons we do not consider ancestors whose length lies outside the central 99% of the stationary distribution given by the ρ estimated in step 1, since they would contribute negligibly to the likelihood. In detail, the minimum ancestor length l1 is the length at which the cumulative distribution first exceeds 0.005, and the maximum ancestor length l2 is the length at which it first exceeds 0.995. The possible ancestor lengths n then satisfy l1 ≤ n ≤ l2.
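The cutoff rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of `get_limits_ancestor_sizes` used later: the function name `ancestor_length_limits` is hypothetical, "exceeds" is read as a strict inequality, and ρ is set to the mean length of the three example arrays.

```python
import math


def poisson_pmf(n, rho):
    """p(n|rho) = exp(-rho) * rho**n / n!, evaluated in log space."""
    return math.exp(-rho + n * math.log(rho) - math.lgamma(n + 1))


def ancestor_length_limits(rho, lower=0.005, upper=0.995):
    """Return (l1, l2), the first lengths at which the Poisson CDF exceeds lower and upper."""
    cdf = 0.0
    l1 = None
    n = 0
    while True:
        cdf += poisson_pmf(n, rho)
        if l1 is None and cdf > lower:
            l1 = n
        if cdf > upper:
            return l1, n
        n += 1


rho_init = 16 / 3  # mean length of the three example arrays (assumed starting value)
l1, l2 = ancestor_length_limits(rho_init)
print(l1, l2)
```

Accumulating the probabilities one length at a time keeps the sketch free of external dependencies; a library quantile function for the Poisson distribution would serve the same purpose.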

### 2.1: Find pairs of arrays that overlap

In [2]:
from itertools import combinations

from CRISPR_functions import is_overlapping

arrays = [[9, 2, 3, 4, 5], [0, 1, 2, 3, 7, 8], [1, 10, 11, 12, 13]]

# Keep every unordered pair of arrays that is_overlapping flags as overlapping.
overlapping_arrays = [pair for pair in combinations(arrays, 2) if is_overlapping(pair[0], pair[1]) == 1]

overlapping_arrays

[([9, 2, 3, 4, 5], [0, 1, 2, 3, 7, 8]),
 ([0, 1, 2, 3, 7, 8], [1, 10, 11, 12, 13])]
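The implementation of `is_overlapping` is not shown here. One reading that is consistent with the output above is that two arrays overlap when they share at least one spacer. The sketch below illustrates that reading only; `shares_spacers` is a hypothetical stand-in, not the function from `CRISPR_functions`.

```python
from itertools import combinations


def shares_spacers(array1, array2):
    """Hypothetical stand-in: True if the two arrays have at least one spacer in common."""
    return not set(array1).isdisjoint(array2)


arrays = [[9, 2, 3, 4, 5], [0, 1, 2, 3, 7, 8], [1, 10, 11, 12, 13]]

# Keep the unordered pairs of arrays that share a spacer.
overlapping = [(a, b) for a, b in combinations(arrays, 2) if shares_spacers(a, b)]

for a, b in overlapping:
    # Print each retained pair together with the spacers it has in common.
    print(a, b, sorted(set(a) & set(b)))
```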

### 2.2: Find ancestor array size limits to exclude arrays outside the central 99% of the length distribution

In [3]:
from CRISPR_functions import get_limits_ancestor_sizes

min_ancestor_len, max_ancestor_len = get_limits_ancestor_sizes(arrays)

### 2.3: Generate all possible ancestors for a pair within the size limits

In [16]:
from CRISPR_functions import CRISPR_pair

# Take the first overlapping pair and wrap it in a CRISPR_pair object.
pair = overlapping_arrays[0]
s1, s2 = pair
PAIR = CRISPR_pair(s1, s2)

print(PAIR.get_combi.__doc__)
# One line per ancestor length: length, combinations of spacer categories, weights.
for n, v in PAIR.get_combi(min_ancestor_len, max_ancestor_len).items():
    print(n, v[0], v[1], sep='\t')

 The function get_combi outputs a dictionary of each putative ancestor length and 1) a list of all the possible combinations of spacers categories producing this ancestor size and 2) a list of their respective associated adjusted weights. 
        {n:[list([c,i,j,u]),list([ws])]} with:
        n length of ancestral array, 
        c number of spacers in common (spacers necessarily present in ancerstor), 
        i number of ancestral spacers amongst the spacers only present in array1, 
        j number of ancestral spacers amongt these only present in array 2.
        ws associated weight of each combi; probability of having this combi given the ancestral array size (relative abundance on ancestors with this combi for this size)
        l1 (min ancestor length) and l2 (max ancestor length) have to be provided
6	[[6, 0, 0, 0]]	[1.0]
7	[[6, 0, 0, 1], [6, 0, 1, 0], [6, 1, 0, 0]]	[0.7777777777777778, 0.1111111111111111, 0.1111111111111111]
8	[[6, 0, 0, 2], [6, 0, 1, 1], [6, 0, 2, 0], [6, 1

# Step 3: For all pairs with overlap ...

## 3.1 Estimate the times with fixed ρ.
The pairs can be processed one at a time, because the times of each pair
can be estimated independently of the other pairs. For each pair, the two
times are estimated alternately, one being updated while the other is held
fixed, until the likelihood has converged.
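The alternating scheme for a single pair can be sketched as a coordinate ascent. Everything specific in this example is an assumption made for illustration: the helper names `golden_section_max` and `estimate_times`, the search interval for the times, and above all the toy log-likelihood, which is a simple concave function standing in for the real pair likelihood (not given in this section).

```python
import math


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


def estimate_times(loglik, t1, t2, t_max=10.0, tol=1e-9, max_iter=100):
    """Alternately update t1 (t2 fixed) and t2 (t1 fixed) until loglik stops improving."""
    current = loglik(t1, t2)
    for _ in range(max_iter):
        t1 = golden_section_max(lambda t: loglik(t, t2), 0.0, t_max)
        t2 = golden_section_max(lambda t: loglik(t1, t), 0.0, t_max)
        new = loglik(t1, t2)
        if abs(new - current) < tol:
            current = new
            break
        current = new
    return t1, t2, current


def toy_loglik(t1, t2):
    """Placeholder objective with its maximum at (1, 2); NOT the real pair likelihood."""
    return -(t1 - 1.0) ** 2 - (t2 - 2.0) ** 2 - 0.5 * (t1 - 1.0) * (t2 - 2.0)


t1_hat, t2_hat, ll = estimate_times(toy_loglik, t1=5.0, t2=5.0)
print(round(t1_hat, 4), round(t2_hat, 4))
```

Each one-dimensional update can only increase the objective, so the log-likelihood is non-decreasing across sweeps, which makes its change a natural stopping criterion.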

## 3.2 Estimate ρ with fixed times using L(ρ|t, S).

## 3.3 Check if the log-likelihood of the estimated parameters has converged.
If it has, return the estimated parameters; otherwise repeat step 3.1
with the new parameters.
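The control flow of steps 3.1 to 3.3 can be sketched as an outer loop. This is a skeleton under stated assumptions, not the original implementation: `fit` and its three callable arguments are hypothetical names, and the toy functions used to run it are placeholders chosen only so that the loop has something to converge on.

```python
def fit(pairs, rho, estimate_times, estimate_rho, total_loglik, tol=1e-6, max_iter=100):
    """Alternate steps 3.1 and 3.2 until the total log-likelihood converges (step 3.3)."""
    previous = float("-inf")
    times, current = {}, previous
    for _ in range(max_iter):
        # Step 3.1: estimate the times of each pair independently, with rho fixed.
        times = {i: estimate_times(pair, rho) for i, pair in enumerate(pairs)}
        # Step 3.2: estimate rho with the times fixed.
        rho = estimate_rho(pairs, times)
        # Step 3.3: stop once the log-likelihood no longer changes.
        current = total_loglik(pairs, times, rho)
        if abs(current - previous) < tol:
            break
        previous = current
    return rho, times, current


# Toy placeholders (assumptions): each "pair" is a single number x, and the
# objective is -sum((t_i - x_i)**2) - sum((rho - t_i)**2).
def toy_estimate_times(x, rho):
    return (x + rho) / 2  # maximizes the toy objective in t_i for fixed rho


def toy_estimate_rho(xs, times):
    return sum(times.values()) / len(times)  # maximizes it in rho for fixed times


def toy_total_loglik(xs, times, rho):
    return -sum((times[i] - x) ** 2 + (rho - times[i]) ** 2 for i, x in enumerate(xs))


rho_hat, times_hat, ll = fit([1.0, 2.0, 3.0], 5.0,
                             toy_estimate_times, toy_estimate_rho, toy_total_loglik)
print(round(rho_hat, 3), {i: round(t, 3) for i, t in times_hat.items()})
```

Passing the three estimation steps in as callables keeps the loop itself independent of the likelihood, so the same skeleton applies once the real functions are substituted.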