In this notebook, we take a different approach to extinction. Here, we rescue a group of individuals whose population has fallen to a critically low size and ask whether genetic diversity can be recovered as this population grows.

To model this, we assign each individual a linear vector of SNPs. Each element of the vector is 0, 1, or 2: the number of copies of the mutant allele at that position. For example, for a G -> A mutation, we would record 0 for GG, 1 for GA or AG, and 2 for AA. This vector is derived from a pair of binary vectors of the same length, one per homolog, where each element indicates whether that homolog carries the mutant allele at that SNP. The 0/1/2 vector is then simply the element-wise sum of the two binary vectors.
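A minimal NumPy sketch of this encoding: two binary homologs are summed element-wise into the 0/1/2 genotype vector, and the same sum is applied to a small random population stored as an array of shape (individuals, 2, SNPs). The variable names here are illustrative only.

```python
import numpy as np

# Two binary homologs for one individual: 1 = mutant allele at that SNP.
paternal = np.array([0, 1, 1, 0, 1])
maternal = np.array([0, 0, 1, 1, 1])

# Genotype = number of mutant alleles per SNP (0, 1, or 2).
genotype = paternal + maternal
print(genotype)  # [0 1 2 1 2]

# A small random population with shape (n_individuals, 2, num_snps),
# i.e. one (2, num_snps) homolog pair per individual.
rng = np.random.default_rng(0)
population = rng.integers(0, 2, size=(4, 2, 5))

# Summing over the homolog axis gives the (n_individuals, num_snps)
# genotype matrix, with values in {0, 1, 2}.
genotype_matrix = population.sum(axis=1)
print(genotype_matrix.shape)  # (4, 5)
```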

Thus, a population can be modeled as a matrix, with one row per individual and one column per SNP. The model either generates this matrix randomly or takes one as input from the user. What it does require is either a distance array or an LD array: for $n$ SNPs, this is a vector of length $n-1$ holding either the base-pair distances or the LD values between adjacent SNPs. That is, element $i$ of the array describes the linkage between SNPs $i$ and $i+1$, so the 0th element corresponds to the linkage between the 0th and 1st SNPs.
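The text does not say how a base-pair distance is turned into a probability. One common choice, used here purely as an assumption, is to convert distance to genetic distance with a uniform rate of 1 cM per Mb and then to a recombination fraction with Haldane's map function, $r = \tfrac{1}{2}(1 - e^{-2d})$ with $d$ in Morgans. The helper `dist_to_switch_prob` and the constant `MORGANS_PER_BP` are hypothetical.

```python
import numpy as np

# ASSUMPTION: a uniform recombination rate of 1 cM per Mb (1e-8 Morgans per bp).
# Real rates vary along the genome and between species.
MORGANS_PER_BP = 1e-8

def dist_to_switch_prob(dist_array, morgans_per_bp=MORGANS_PER_BP):
    """Map base-pair distances between adjacent SNPs to recombination
    fractions using Haldane's map function r = 0.5 * (1 - exp(-2d)),
    where d is the genetic distance in Morgans."""
    d = np.asarray(dist_array, dtype=float) * morgans_per_bp
    return 0.5 * (1.0 - np.exp(-2.0 * d))

# dist_array[i] is the distance between SNP i and SNP i + 1,
# so 4 SNPs need 3 distances.
dist_array = np.array([1_000, 1_000_000, 50_000_000])
r = dist_to_switch_prob(dist_array)
print(r)
```

Under Haldane's function, closely spaced SNPs almost never recombine, while distant ones approach, but never exceed, $r = 0.5$ (independent assortment).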

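A new individual is created by choosing two parents. For each parent, we start on one of its two homologs with probability 1/2, then move along the vector, deciding at each SNP whether to keep copying from the current homolog or to switch to the other one, based on the distance or LD between that SNP and the previous one. This continues to the end of the vector, giving one haploid vector per parent; the two vectors are then combined to make the new individual.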
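One way to read this procedure is as a simple crossover model: walk along a parent's homologs and switch to the other homolog with some per-interval probability. The sketch below assumes a hypothetical array `switch_probs` of length `num_snps - 1` (for example, recombination fractions derived from the distance array); how LD values would map onto these probabilities is not specified in the text and is left out. The function names `make_gamete` and `make_offspring` are illustrative, not the notebook's own.

```python
import numpy as np

def make_gamete(homologs, switch_probs, rng):
    """Build one haploid vector from a parent's (2, num_snps) homolog array.

    ASSUMPTION: switch_probs[i] is the probability of switching homologs
    between SNP i and SNP i + 1 (e.g. a recombination fraction).
    """
    num_snps = homologs.shape[1]
    gamete = np.empty(num_snps, dtype=homologs.dtype)
    current = rng.integers(2)          # 50/50 choice of starting homolog
    gamete[0] = homologs[current, 0]
    for i in range(1, num_snps):
        if rng.random() < switch_probs[i - 1]:
            current = 1 - current      # crossover: switch to the other homolog
        gamete[i] = homologs[current, i]
    return gamete

def make_offspring(parent_a, parent_b, switch_probs, rng):
    """Stack one gamete from each parent into a new (2, num_snps) individual.
    Row 0 comes from parent_a, row 1 from parent_b."""
    return np.vstack([make_gamete(parent_a, switch_probs, rng),
                      make_gamete(parent_b, switch_probs, rng)])

rng = np.random.default_rng(42)
parent_a = np.array([[1, 1, 1, 1], [0, 0, 0, 0]])
parent_b = np.array([[0, 1, 0, 1], [1, 0, 1, 0]])
child = make_offspring(parent_a, parent_b, np.array([0.01, 0.2, 0.01]), rng)
print(child)
```

With all switch probabilities at 0, each gamete is a verbatim copy of one parental homolog; with all at 1, it alternates homologs at every SNP.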

We assume no new mutations, so every new individual in the next generation is viable, and that all parents die after each reproduction period. We also assume linear growth from the provided critical population size to the provided thriving population size; a key assumption here is that exactly one new individual is created per generation. This reflects the low reproductive rates of many endangered species, especially large animals such as mammals.
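Under the one-birth-per-generation assumption, recovering from `n_init` to `n_thriving` takes exactly `n_thriving - n_init` generations. The generator below sketches that schedule, following the function docstring's description of adding one individual at a time. Choosing the two parents uniformly at random without replacement is an assumption (the text does not say how parents are picked), and it requires at least two individuals in the population. The name `growth_schedule` is hypothetical.

```python
import numpy as np

def growth_schedule(n_init, n_thriving, rng):
    """Yield (current population size, parent indices) for each generation.

    One offspring is added per generation, so recovery takes
    n_thriving - n_init generations. ASSUMPTION: the parents are two
    distinct individuals drawn uniformly at random from the current population.
    """
    for size in range(n_init, n_thriving):
        parents = rng.choice(size, size=2, replace=False)
        yield size, parents

rng = np.random.default_rng(0)
schedule = list(growth_schedule(10, 15, rng))
print(len(schedule))  # 5
```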

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
def simulate_genetic_recovery(n_init,
                              n_thriving,
                              num_snps,
                              ld_array=None,
                              dist_array=None,
                              snp_mats=None):
    """
    Simulates a model of genetic recovery, where we start with a population of n_init and add one new individual
    until we reach n_thriving. The new individual's genotype is determined with probabilities based on the 
    provided LD or distance array. 

    Inputs:
    n_init: int, initial population size
    n_thriving: int, target population size
    num_snps: int, number of SNPs
    ld_array: 1D array of length num_snps-1, optional. This array contains the pairwise LD values between the SNPs.
    dist_array: 1D array of length num_snps-1, optional. This array contains the pairwise distances between the SNPs.
        Note that one of ld_array or dist_array must be provided.
    snp_mats: Array of 2D-arrays, optional. The number of 2D arrays must be equal to n_init. 
        Each 2D array must have shape (2, num_snps); each row is binary and contains the genotype of the individual on the paternal
        or maternal chromosome. It doesn't particularly matter which is which, but the first row is assumed to be the paternal chromosome.
        If this is not provided, the function will generate random genotypes for the initial population.

    Returns:
    snp_mats: Array of 2D-arrays. The number of 2D arrays is equal to n_thriving. 
        Each 2D array has shape (2, num_snps); each row is binary and contains the genotype of the individual on the paternal
        or maternal chromosome. It doesn't particularly matter which is which, but the first row is assumed to be the paternal chromosome.
    """

    # First, check that at least one of ld_array or dist_array was provided
    if ld_array is None and dist_array is None:
        raise ValueError("Either ld_array or dist_array must be provided")
    
