# Lecture 03: Introduction to modeling natural selection

### Modeling natural selection in a real population

In a famous early quantitative example of natural selection, [Fisher and Ford](https://www.nature.com/articles/hdy194711) examined the behavior of the _medionigra_ allele in moths near Oxford, England. 

Below, we'll compare predictions of genotype/allele frequency change with data to try to estimate the type and strength of selection that may have been acting on this allele at the time. The _medionigra_ allele is rare enough that we can assume that essentially all moths with this allele are heterozygotes. The moths reproduce once per year.

As a reminder, the expected change in the frequency of a genotype $G$ in one generation due to natural selection alone is

$$\Delta p_G = p_G\,\frac{w_G - \bar{w}}{\bar{w}}\,,\qquad \bar{w} = \sum_G p_G\,w_G\,.$$

This expression can also be used to estimate the change in allele frequencies by noting how allele frequencies are related to genotype frequencies.
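To make that relation concrete, here is a short worked version under the rare-allele assumption stated above. The symbols $q$ (allele frequency), $p_{\text{het}}$ (heterozygote frequency), and $p_{\text{hom}}$ (_medionigra_ homozygote frequency) are introduced here for illustration only. Each heterozygote carries one _medionigra_ copy out of two and each homozygote carries two, so

$$q = \tfrac{1}{2}\,p_{\text{het}} + p_{\text{hom}} \approx \tfrac{1}{2}\,p_{\text{het}}\,,\qquad \Delta q \approx \tfrac{1}{2}\,\Delta p_{\text{het}}\,.$$

For example, a heterozygote frequency of $0.18$ corresponds to an allele frequency of about $0.09$, as long as homozygotes are rare enough to ignore.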

First, let's plot the _medionigra_ allele frequency over time. The code cell below records the measured allele frequency (estimated from captured moths) for each year from 1939 to 1960 and plots it.

In [None]:
# Import libraries for plotting/data

import seaborn as sns            # import seaborn
import matplotlib.pyplot as plt  # and matplotlib

import numpy as np

# medionigra allele frequency

year = np.arange(1939, 1961, 1)
freq = [0.09, 0.11, 0.068, 0.052, 0.056, 0.042, 0.062, 0.05, 0.04, 0.038, 0.03, 0.04, 0.028, 0.039, 0.028, 0.03,
        0.01, 0.03, 0.042, 0.04, 0.022, 0.02]

sns.lineplot(x=year, y=freq, label='data')
plt.xlabel('Year')
plt.ylabel('Allele frequency')
plt.ylim(0, 0.12);

Now, let's simulate how the _medionigra_ allele frequency should change over time, assuming that all of these alleles are found in heterozygotes. Here we can make our calculations a bit simpler by remembering that, ignoring genetic drift, if the current frequency of genotype $G$ is $p_G$ and its fitness is $w_G$, then its frequency after one round of selection will be

$$ p^\prime_G = \frac{w_G}{\bar{w}}p_G\,.$$

If the post-selection heterozygote frequency $p^\prime_G$ is small, then the heterozygote frequency among newborns after random mating will be approximately the same, so we can skip the random mating step with little error.
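As a rough check on this approximation, the sketch below compares a post-selection heterozygote frequency with the heterozygote frequency expected among newborns under Hardy-Weinberg proportions. The function name `newborn_het_freq` is hypothetical, and the calculation assumes that all _medionigra_ alleles in adults are carried by heterozygotes.

```python
def newborn_het_freq(p_het):
    """Heterozygote frequency among newborns after random mating.

    ASSUMPTION: all medionigra alleles in the parents are in heterozygotes,
    so the allele frequency is q = p_het / 2.
    """
    q = p_het / 2
    return 2 * q * (1 - q)  # Hardy-Weinberg heterozygote frequency


# Compare frequencies before and after the random mating step
for p_het in [0.2, 0.1, 0.05]:
    p_new = newborn_het_freq(p_het)
    rel_err = (p_het - p_new) / p_het
    print(f"p' = {p_het:.3f}, newborn heterozygotes = {p_new:.4f}, "
          f"relative error = {rel_err:.3f}")
```

Under these assumptions the newborn heterozygote frequency is $p^\prime_G(1 - p^\prime_G/2)$, so the relative error from skipping random mating equals the allele frequency itself and shrinks as the allele becomes rarer.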

We can then plot the allele frequency over time using different choices for the heterozygote fitness to see what kinds of values match best with the data.

In [None]:
# Let's write a function to perform the simulation
# We can arbitrarily set the wild-type (WT) fitness to 1
# Then, given the current heterozygote frequency and its fitness, we compute the frequency in the next generation

def evolve(p, w):
    w_bar = p*w + (1 - p)*1  # average fitness: heterozygotes (freq p, fitness w) plus WT (freq 1-p, fitness 1)
    return w*p/w_bar

In [None]:
# Make an initial choice for the heterozygote fitness and frequency

p = 0.2  # heterozygote frequency, NOTE that allele frequency is HALF of this
w = 1  # heterozygote fitness, relative to WT w = 1


# Simulate evolution over the time range above

freq_sim = [p/2]  # simulated allele frequency over time

for i in range(len(year)-1):
    p = evolve(p, w)
    freq_sim.append(p/2)
    
    
# Plot the result

sns.lineplot(x=year, y=freq, label='data')
sns.lineplot(x=year, y=freq_sim, label='simulation, w=%.2f' % w)
plt.xlabel('Year')
plt.ylabel('Allele frequency')
plt.ylim(0, 0.12);
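Instead of adjusting `w` by hand, one could scan a range of candidate values and keep the one with the smallest squared error against the data. The cell below is a minimal sketch of that idea, not the method used by Fisher and Ford. The helper names (`evolve_het`, `simulate_allele_freq`), the candidate grid, and the choice to start the simulation at the first observed frequency are all assumptions made for illustration.

```python
import numpy as np

# Measured medionigra allele frequencies, copied from the data cell above
freq = np.array([0.09, 0.11, 0.068, 0.052, 0.056, 0.042, 0.062, 0.05,
                 0.04, 0.038, 0.03, 0.04, 0.028, 0.039, 0.028, 0.03,
                 0.01, 0.03, 0.042, 0.04, 0.022, 0.02])


def evolve_het(p, w):
    """One round of selection on heterozygote frequency p (WT fitness fixed at 1)."""
    w_bar = p * w + (1 - p) * 1.0  # average fitness
    return w * p / w_bar


def simulate_allele_freq(p0, w, n_years):
    """Simulated allele frequency (half the heterozygote frequency) for n_years."""
    p = p0
    out = [p / 2]
    for _ in range(n_years - 1):
        p = evolve_het(p, w)
        out.append(p / 2)
    return np.array(out)


# ASSUMPTION: start at the first observed allele frequency (heterozygotes = 2x allele)
p0 = 2 * freq[0]

# ASSUMPTION: candidate heterozygote fitness values to scan, in steps of 0.005
w_grid = np.linspace(0.70, 1.10, 81)

# Sum of squared errors between simulation and data for each candidate w
sse = np.array([np.sum((simulate_allele_freq(p0, w, len(freq)) - freq) ** 2)
                for w in w_grid])

best_w = w_grid[np.argmin(sse)]
print(f"best-fitting heterozygote fitness on this grid: w = {best_w:.3f}")
```

A least-squares scan like this treats every year's measurement as equally reliable and ignores sampling noise and genetic drift, so the resulting value is best read as a rough guide rather than a formal estimate.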

**Discussion.** You likely found a value of the heterozygote fitness that fits the data fairly well. This was a simple exercise, but if this were a real experiment, what additional factors would you consider? Do you think that our analysis above allows us to make a reasonable guess at the fitness advantage or disadvantage of the _medionigra_ allele?