## Population Genetics Exercise 3: Modeling Positive Selection and Genetic Drift

### Context
The zygotes (cells resulting from the combination of one egg gamete and one sperm gamete) of one generation are formed from a sub-sample of the gametes produced by the parental population. This sub-sample is inevitably an imperfect representation of the alleles found in the parental population. **This sampling error results in chance fluctuations in allele frequency across generations, known as genetic drift.**
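To make the sampling error concrete, here is a minimal sketch (not part of the exercise code) that draws the gametes for N zygotes from a parental population in which allele A has frequency 0.5. The function name `sample_allele_frequency` and the seed are illustrative assumptions, and the sketch uses only Python's standard `random` module.

```python
import random

def sample_allele_frequency(p, n_individuals, rng):
    """Draw 2N gametes with P(A) = p and return the frequency of A in the sample."""
    n_gametes = 2 * n_individuals  # diploid: two gametes per zygote
    n_A = sum(1 for _ in range(n_gametes) if rng.random() < p)
    return n_A / n_gametes

rng = random.Random(42)  # fixed seed (arbitrary) so reruns repeat the same draws

for n in (5, 500):
    freqs = [sample_allele_frequency(0.5, n, rng) for _ in range(5)]
    print(f"N = {n}:", [round(f, 2) for f in freqs])
```

The sampled frequencies for the smaller N typically scatter further from 0.5, which is why drift is stronger in small populations.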

Genetic drift is one of the five forces that can cause evolutionary change. Drift alone can lead to the loss of an allele (its frequency becomes 0) or to its fixation (its frequency becomes 1).

**Positive selection is a directional evolutionary force: it consistently favors a beneficial allele, which allows us to predict how likely it is that a beneficial mutation will eventually become fixed in the population.**

We are modifying our hypothetical example from Exercises 1-2. While we are still considering two different alleles (designated A and a) for one gene, we are now looking at the change in the allele frequency of the selectively **positive/beneficial** allele A (designated as the variable p) and the allele frequency of allele a (designated as the variable q) in a population that experiences genetic drift and positive selection.

If selection is weak compared to genetic drift, the stochasticity of reproduction can shape the trajectory an allele takes even when it is common in the population. If selection is sufficiently weak compared to genetic drift, then drift will dominate the dynamics of alleles, and they will behave as if they were effectively neutral.

For a newly arising mutation to eventually become fixed, it first has to escape loss by drift while it is rare, because even beneficial alleles are susceptible to the stochastic effects of genetic drift when they are very rare or in a small population. Population size therefore holds a fundamental place in understanding evolution: it influences how effective positive selection is at actually producing evolutionary change.
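As a rough illustration of "escaping loss when rare", the sketch below starts many replicate populations with a single copy of allele A and counts how often A reaches fixation. It is a simplified stand-in for the model used later in this notebook, under stated assumptions: the helper names `next_frequency` and `fixation_fraction` are hypothetical, and the relative fitnesses 1, 1 + s, and 1 + 2s are chosen here for readability rather than taken from the exercise.

```python
import random

def next_frequency(f_A, n_individuals, w_AA, w_Aa, w_aa, rng):
    """One generation: sample N zygotes by random mating, then apply selection."""
    counts = {0: 0, 1: 0, 2: 0}  # keyed by number of A copies in the genotype
    for _ in range(n_individuals):
        copies = (rng.random() < f_A) + (rng.random() < f_A)
        counts[copies] += 1
    weighted_A = counts[2] * w_AA + 0.5 * counts[1] * w_Aa
    mean_w = counts[2] * w_AA + counts[1] * w_Aa + counts[0] * w_aa
    return weighted_A / mean_w

def fixation_fraction(n_individuals, s, n_replicates, rng, max_generations=10_000):
    """Fraction of replicate populations in which a single new copy of A fixes."""
    w_aa, w_Aa, w_AA = 1.0, 1.0 + s, 1.0 + 2 * s  # assumed relative fitnesses
    fixed = 0
    for _ in range(n_replicates):
        f_A = 1 / (2 * n_individuals)  # one copy among 2N alleles
        for _ in range(max_generations):
            if f_A == 0.0 or f_A == 1.0:  # allele lost or fixed: stop early
                break
            f_A = next_frequency(f_A, n_individuals, w_AA, w_Aa, w_aa, rng)
        if f_A == 1.0:
            fixed += 1
    return fixed / n_replicates

rng = random.Random(2024)  # arbitrary seed so reruns repeat the same draws
for n in (10, 100):
    for s in (0.0, 0.1):
        frac = fixation_fraction(n, s, n_replicates=200, rng=rng)
        print(f"N = {n:>3}, s = {s}: fixed in {frac:.1%} of replicates")
```

Comparing s = 0 with s > 0 at each N gives a feel for how often even a beneficial allele is lost while it is still rare.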

### Running Code Cells
If you've never used a Jupyter notebook on Google Colab before, here's a quick orientation:

Below are code cells containing Python code that you will want to run.

You can run code cells individually in Colab by: 
- clicking on a code cell and hitting the "Run" (play) button at the left edge of the cell
- clicking on a code cell and hitting Cmd/Ctrl+Enter/Return

You can run all code cells in this notebook in Colab by:
- clicking on "Runtime" in the top navigation bar and selecting "Run all"

You can edit the code within a code cell by clicking into it and then deleting or typing text.

In [None]:
#this code imports several important libraries for our modeling of genetic drift and positive selection
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict
from random import random

## Code/Parameters We Invite You To Adjust!

### We encourage you to adjust the numbers for the following 2 variables in the code below:

### 1. **N** (population size; default is 10 individuals) - this adjusts the strength of genetic drift - try out the effects on allele frequency based on small and large population sizes :)

### 2. **replicate_population** (default is 5 populations) - this adjusts how many populations are plotted (how many graphs you output!)


In [None]:
#this code sets the parameters for our simulation (each one is defined in its comment below!)

#try out different values for N (population size)

#keep the other variables (p, s, and ngen) constant for now!

p = 0.2 #starting allele frequency of the selectively positive allele A  
q = 1 - p #starting allele frequency of the allele a 
s = 0.1 #selection coefficient: the fitness that each copy of allele A adds to the baseline fitness of 0.1
WAA = 0.1 + 2*s  #fitness of AA homozygotes
WAa = 0.1 + s    #fitness of Aa heterozygotes
Waa = 0.1        #fitness of aa homozygotes
N = 10 #population size (default is 10 individuals)
ngen = 200 #number of generations, which will correspond to the x-axis in our output graph

#this code defines various functions that we will use to visualize genetic drift and positive selection
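def random_genotype(f_A): #f_A -- allele frequency of allele A
    #random() returns a value in [0, 1), so '<' draws allele A with probability exactly f_A
    #(with '<=', an allele at f_A = 0 could still be drawn if random() returned 0.0)
    if random() < f_A:
        sperm_allele = 'A'
    else:
        sperm_allele = 'a'
    
    if random() < f_A:
        egg_allele = 'A'
    else:
        egg_allele = 'a'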
    
    genotype = sperm_allele+egg_allele
    return genotype

def simulate_random_mating(pop_sizes,f_A):
    genotypes = defaultdict(int)
    
    for i in range(pop_sizes):
        curr_genotype = random_genotype(f_A)
        genotypes[curr_genotype]+=1
    
    for g in genotypes.keys():
        genotypes[g] = genotypes[g]/float(pop_sizes)
    
    return genotypes

def simulate_genetic_drift_with_selection(n_generations,pop_sizes,f_A=0.5,
                                          WAA=1.0,WAa=1.0,Waa = 1.0):
    """Simulate drift plus selection; return (generations, frequency of A after each generation)"""
    generations = range(n_generations)
    allele_freqs = []
    for generation in generations:
        #drift: sample this generation's zygotes by random mating
        genotypes = simulate_random_mating(pop_sizes,f_A)
        numerator = (genotypes['AA']*WAA+\
                     0.5*genotypes['aA']*WAa+\
                     0.5*genotypes['Aa']*WAa)
        
        denominator = (genotypes['AA']*WAA+\
                       genotypes['Aa']*WAa+\
                       genotypes['aA']*WAa+\
                       genotypes['aa']*Waa)
        f_A = numerator/denominator       
        allele_freqs.append(f_A)
    return list(generations),allele_freqs
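The update at the end of `simulate_genetic_drift_with_selection` weights each sampled genotype frequency by its fitness. As a hedged worked example, the sketch below applies the same formula to exact Hardy-Weinberg genotype frequencies, so there is no sampling and therefore no drift. The function name `expected_next_frequency` is hypothetical; the parameter values are the defaults from the cell above.

```python
def expected_next_frequency(p, w_AA, w_Aa, w_aa):
    """Expected f(A) after selection, starting from Hardy-Weinberg genotype frequencies."""
    q = 1.0 - p
    f_AA, f_Aa, f_aa = p * p, 2 * p * q, q * q
    mean_fitness = f_AA * w_AA + f_Aa * w_Aa + f_aa * w_aa
    # each heterozygote carries one A allele, hence the factor 0.5
    return (f_AA * w_AA + 0.5 * f_Aa * w_Aa) / mean_fitness

s = 0.1
p_next = expected_next_frequency(0.2, 0.1 + 2 * s, 0.1 + s, 0.1)
print(round(p_next, 4))  # 0.3143
```

With these defaults the genotype fitnesses are 0.3, 0.2, and 0.1, so selection alone is expected to raise f(A) from 0.2 to about 0.31 in one generation. Only the ratios between the fitness values matter, because the mean fitness in the denominator rescales them.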

## Visualization 
The graph output by the code cell below shows a simulation of how genetic drift and positive selection can act on the allele frequencies of a selectively positive allele A and allele a. 

**You can adjust the strength of genetic drift (by changing N, the population size) and how many populations are plotted (by changing replicate_population).**

In [None]:
#this code sets some formatting for the graph you'll output below!
#set plot resolution
%config InlineBackend.figure_format = 'retina'

#set default figure parameters
plt.rcParams['figure.figsize'] = (10,5) #figure size (width, height) in inches

small_size = 9 
medium_size = 12
large_size = 15 

plt.rc('font', size=medium_size)          # default text sizes
plt.rc('xtick', labelsize=medium_size)    # xtick labels
plt.rc('ytick', labelsize=medium_size)    # ytick labels
plt.rc('legend', fontsize=medium_size)    # legend
plt.rc('axes', titlesize=large_size)      # axes title
plt.rc('axes', labelsize=large_size)      # x and y labels
plt.rc('figure', titlesize=large_size)    # figure title

#this code plots the results from running the simulate_genetic_drift_with_selection function as a graph!
xs,ps = simulate_genetic_drift_with_selection(ngen,N,p, WAA,WAa,Waa) # get the allele frequency of allele A over time
qs = [1.0 - freq for freq in ps] #get the allele frequency of allele a (loop variable renamed so it does not shadow p)

#plt.figure(dpi=800)    
plt.plot(xs,ps,'-r',label='f(A) ')
plt.plot(xs,qs,'-b',label='f(a) ')
plt.title("Genetic Drift and Positive Selection Effects on Allele Frequency")
plt.ylabel('Allele Frequency')
plt.xlabel('Generations')
plt.legend()
plt.grid()
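The plotting cell above draws a single population. As a hedged sketch of how several replicate populations could be overlaid, the code below reruns the same sampling-plus-selection model once per replicate. The helper names `simulate_replicates` and `plot_replicates` are hypothetical, and `replicate_population` is assumed here to mean the number of replicate populations; the parameter values mirror the defaults used in this notebook (N = 10, p = 0.2, s = 0.1, 200 generations).

```python
import random

def simulate_replicates(n_replicates, n_generations, n_individuals,
                        f_A, w_AA, w_Aa, w_aa, seed=0):
    """Return one f(A) trajectory (a list of floats) per replicate population."""
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n_replicates):
        freq, trajectory = f_A, []
        for _ in range(n_generations):
            # drift: each of the N zygotes receives two gametes drawn with P(A) = freq
            copies = [(rng.random() < freq) + (rng.random() < freq)
                      for _ in range(n_individuals)]
            n_AA, n_Aa = copies.count(2), copies.count(1)
            n_aa = n_individuals - n_AA - n_Aa
            # selection: weight each genotype count by its fitness
            mean_w = n_AA * w_AA + n_Aa * w_Aa + n_aa * w_aa
            freq = (n_AA * w_AA + 0.5 * n_Aa * w_Aa) / mean_w
            trajectory.append(freq)
        trajectories.append(trajectory)
    return trajectories

def plot_replicates(trajectories):
    """Overlay every replicate's f(A) trajectory on one set of axes."""
    import matplotlib.pyplot as plt  # imported here so the simulation runs without it
    for i, trajectory in enumerate(trajectories, start=1):
        plt.plot(range(len(trajectory)), trajectory, label=f"population {i}")
    plt.ylim(-0.05, 1.05)
    plt.xlabel("Generations")
    plt.ylabel("f(A)")
    plt.legend()
    plt.grid()
    plt.show()

replicate_population = 5  # assumed meaning: number of replicate populations
trajectories = simulate_replicates(replicate_population, 200, 10, 0.2, 0.3, 0.2, 0.1)
# In the notebook, call plot_replicates(trajectories) to draw the figure.
```

Changing `seed` gives a different set of replicates, and lines that reach 0 or 1 stay there because the allele has been lost or fixed.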
