# Lecture 04: Combining natural selection with other forces of evolution

### Natural selection and genetic drift

We saw that natural selection and genetic drift can interact in complicated ways. When an allele is very rare (for example, a new mutation), its initial fate is determined almost entirely by drift. Once the number of copies of the allele exceeds an establishment number ($\sim 1/s$ following [Desai and Fisher](https://academic.oup.com/genetics/article/176/3/1759/6062236), which corresponds to a frequency of $\sim 1/(Ns)$ in a haploid population; keep in mind that this is a fuzzy boundary rather than a sharp one), its future change is determined more strongly by selection. In particular, beneficial alleles for which $2Ns\gg 1$ will grow rapidly once established.

Here, we'll explore the interaction between drift and selection in simulations. To do this, we'll use the Wright-Fisher model that we considered previously, but adapted to take natural selection into account. To make our analysis simpler, we'll also assume that we're working with a haploid population rather than diploids, allowing us to avoid complications due to dominance.

As before, let's call the two types of alleles A and B, with frequencies $p$ and $1-p$. Under the Wright-Fisher model **without selection**, the number $n$ of alleles of type A in the next generation follows a binomial distribution,

$$ P(n) = \binom{N}{n} p^n \left(1-p\right)^{N-n}\,. $$

Since we're considering haploids, the total number of alleles is equal to the population size $N$ instead of $2N$. 

To account for selection, we simply adjust the probability $p$ that appears in the binomial distribution above. Instead of depending only on the allele frequency, we now weight the probability by fitness and divide by the mean fitness $\bar{w}$, so that the probabilities of selecting A and B still sum to one. If the fitness of the A allele is $1+s$ and the fitness of the B allele is $1$, then the probability $p_A$ of selecting an A allele is 

$$ p_A = \frac{p (1 + s)}{\bar{w}} = \frac{p (1 + s)}{p (1 + s) + (1 - p)} \,. $$
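
As a quick numerical check of this formula, here is a minimal sketch that evaluates $p_A$ for a few selection coefficients. The function name `selection_weighted_probability` and the example values of $p$ and $s$ are illustrative assumptions, not part of the original notebook.

```python
def selection_weighted_probability(p, s):
    """Probability of sampling an A allele when A has fitness 1 + s and B has fitness 1.

    Hypothetical helper for illustration; assumes 0 <= p <= 1 and s > -1.
    """
    mean_fitness = p * (1 + s) + (1 - p)  # the mean fitness, w-bar
    return p * (1 + s) / mean_fitness


p = 0.1  # example allele frequency (arbitrary choice)
for s in (0.0, 0.01, 0.1, -0.1):
    p_A = selection_weighted_probability(p, s)
    print(f"s = {s:+.2f}: p_A = {p_A:.4f}, expected change in frequency = {p_A - p:+.4f}")
```

With $s=0$ the function returns $p$ unchanged, recovering the neutral model. For $s>0$ it returns a value slightly larger than $p$, and for $s<0$ a value slightly smaller.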

With this adjustment, the probability of obtaining $n$ A alleles in the next generation is 

$$ P(n) = \binom{N}{n} p_A^n \left(1-p_A\right)^{N-n}\,. $$

Let's use this expression to simulate the number of A alleles over time in a population. We'll run the simulation multiple times and plot the results to get a sense of the randomness involved in this process.

In [None]:
import numpy as np          # here we import numpy
import numpy.random as rng  # and here we import the random number generation (sub-)library


# Set the starting parameters for the simulations

N   = 100   # population size
s   = 0.01  # selection coefficient
n_A = 1     # starting number of A alleles
n_g = 100   # number of generations of evolution
n_r = 50    # number of replicate simulations
n_lost = 0  # number of simulations in which the A allele is lost
n_t = np.zeros((n_r, n_g))  # a blank matrix of A allele counts (rows: replicates, columns: generations)

In [None]:
# Next, let's fill in the matrix of allele counts over time by simulating the WF model

n_lost = 0

for i in range(n_r):

    n_t[i][0] = n_A  # set the starting number of A alleles

    for j in range(n_g-1):
        p_A = n_t[i][j] * (1 + s) / (n_t[i][j] * (1 + s) + N - n_t[i][j])  # probability to select an A allele
        n_t[i][j+1] = rng.binomial(N, p_A)  # get As in next generation
        
    if n_t[i][-1]==0:
        n_lost += 1

In [None]:
# Finally, let's make a plot

import seaborn as sns            # import seaborn
import matplotlib.pyplot as plt  # and matplotlib


# If the establishment number is positive and < N, plot it here

n_estab = 1 / s
if n_estab > 0 and n_estab < N:
    plt.hlines(y=n_estab, xmin=0, xmax=n_g, label='establishment', color='black', ls='--')


# Plot individual simulations

for i in range(n_r):
    sns.lineplot(x=np.arange(n_g), y=n_t[i], alpha=0.5)
    

plt.xlabel('Time (generations)')
plt.ylabel('Number of A alleles')
plt.yscale('log')
plt.ylim(1, 1.1*N);


# Let's also write out the fraction of simulations in which A alleles were lost

print('At the final time point, there are no A alleles in %d out of %d simulations.' % (n_lost, n_r))

**Regimes of selection.** Recall that $Ns$ (or $2Ns$ for diploids) is a critical parameter for understanding the interaction between selection and drift.

- If $Ns\ll 1$, then **drift is dominant**.

- If $Ns \sim 1$, then **drift and selection are comparable**, even at high allele frequencies.

- If $Ns \gg 1$, then **selection is dominant** at high allele frequencies.

To reinforce our understanding, let's return to the simulation above. Try different values for $N$ and $s$ that fit within these regimes. Do they behave in the way that you expect? 

In addition, explore simulations for deleterious alleles with $s<0$. Do they appear to follow rules that are similar to those for beneficial alleles?

You may also want to consider how the evolution changes when you change the starting number of alleles.
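
To make these explorations easier, one option is to wrap the simulation in a function and loop over parameter choices. The following is a minimal sketch under stated assumptions, not a replacement for the cells above: the function name `simulate_wf`, its `seed` argument, and the example parameter pairs are hypothetical. It uses NumPy's `default_rng` generator rather than the module-level `numpy.random` functions, and it updates all replicates at once instead of looping over them.

```python
import numpy as np


def simulate_wf(N, s, n_A, n_g, n_r, seed=None):
    """Haploid Wright-Fisher model with selection (illustrative sketch).

    Returns an (n_r, n_g) integer array of A allele counts.
    Assumes s > -1 so that the fitness 1 + s stays positive.
    """
    gen = np.random.default_rng(seed)       # random number generator
    n_t = np.zeros((n_r, n_g), dtype=int)   # rows: replicates, columns: generations
    n_t[:, 0] = n_A                         # starting number of A alleles
    for j in range(n_g - 1):
        n = n_t[:, j]
        p_A = n * (1 + s) / (n * (1 + s) + N - n)  # probability to select an A allele
        n_t[:, j + 1] = gen.binomial(N, p_A)       # A alleles in the next generation
    return n_t


# Example parameter pairs (arbitrary choices) with Ns below, near, and above 1
for N, s in [(100, 0.001), (100, 0.01), (1000, 0.1)]:
    n_t = simulate_wf(N=N, s=s, n_A=1, n_g=200, n_r=500, seed=0)
    frac_lost = np.mean(n_t[:, -1] == 0)   # fraction of replicates with no A alleles at the end
    frac_fixed = np.mean(n_t[:, -1] == N)  # fraction of replicates where A has fixed
    print(f"N = {N}, s = {s}, Ns = {N * s:.2f}: lost in {frac_lost:.2f}, fixed in {frac_fixed:.2f} of runs")
```

Passing a negative `s` simulates a deleterious allele, and changing `n_A` lets you start above or below the establishment number $1/s$.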