## Population Genetics Exercise 2: Modeling Genetic Drift Caused By Bottleneck

### Context
The zygotes (cells resulting from the combination of one egg gamete and one sperm gamete) of one generation are formed from a sub-sampling of gametes from the parental population. This sub-sample inevitably is an imperfect representation of the alleles found in the parental population. **This sampling error results in chance fluctuations in allele frequency across generations, known as genetic drift.**
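
To make the idea of sampling error concrete, here is a minimal sketch of a single generation of gamete sampling. The names `n_individuals` and `p_parent` and the seed are assumptions chosen for illustration; they are not part of the exercise code.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed so the draw is repeatable

n_individuals = 50   # hypothetical number of diploid offspring
p_parent = 0.2       # frequency of allele A among the parental gametes

# Each offspring is formed from 2 gametes, so 2 * n_individuals gametes are sampled.
n_gametes = 2 * n_individuals
copies_of_A = rng.binomial(n_gametes, p_parent)
p_offspring = copies_of_A / n_gametes

print(f"Parental frequency of A:  {p_parent}")
print(f"Offspring frequency of A: {p_offspring}")
```

Rerunning this with different seeds should give offspring frequencies that scatter around 0.2 but rarely match it exactly; that scatter is the sampling error described above.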

Genetic drift is one of the five forces that can cause evolutionary change. Through drift alone, an allele can be lost (its frequency becomes 0) or fixed (its frequency becomes 1).
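
As a rough illustration of loss versus fixation, the sketch below lets many small populations drift until the allele is either lost or fixed and then counts the outcomes. The helper `simulate_until_absorbed`, the population size of 20, and the 200 replicates are hypothetical choices made for this example only.

```python
import numpy as np

def simulate_until_absorbed(N, p, rng, max_gen=10_000):
    """Let a neutral allele drift until it is lost (p == 0) or fixed (p == 1).

    Returns the final frequency and the number of generations simulated.
    """
    for gen in range(max_gen):
        if p == 0.0 or p == 1.0:
            return p, gen
        # Sample 2N gametes from the current generation
        p = rng.binomial(2 * N, p) / (2 * N)
    return p, max_gen

rng = np.random.default_rng(seed=2)  # arbitrary seed for repeatability
outcomes = [simulate_until_absorbed(N=20, p=0.2, rng=rng) for _ in range(200)]

n_lost = sum(1 for p_final, _ in outcomes if p_final == 0.0)
n_fixed = sum(1 for p_final, _ in outcomes if p_final == 1.0)
print(f"Lost in {n_lost} populations, fixed in {n_fixed} populations")
```

For a selectively neutral allele, the fraction of populations in which it fixes is expected to be close to its starting frequency, so most of these populations should lose the allele.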

Continuing our hypothetical example from Exercise 1, we are considering two different alleles (designated A and a) for one gene. The A allele encodes a dominant phenotype of interest, while the a allele encodes a recessive phenotype of interest. We are looking at the change in the allele frequency of the ***selectively neutral* allele A** (designated as the variable p) in a population that undergoes a **bottleneck event (where the population size is dramatically reduced)**.

**Reminder:** selectively neutral here means there is no selective advantage or disadvantage to an organism with the allele A.

### Running Code Cells
If you've never used a Jupyter notebook on Google Colab before, here's a quick orientation:

Below are code cells containing Python code that you will want to run.

You can run code cells individually in Colab by: 
- clicking on a code cell and hitting the "Run" button (depicted as the "play" arrow icon) to the top left of the cell
- clicking on a code cell and hitting Cmd/Ctrl+Enter/Return

You can run all code cells in this notebook in Colab by:
- clicking on "Runtime" in the top navigation bar and selecting "Run all"

You can edit the code within a code cell by clicking into it and then deleting or typing text.

### Run the code cell below to import the libraries needed to run the simulation!

In [None]:
#this code imports several important libraries for our modeling of genetic drift caused by a bottleneck
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Code/Parameters We Invite You To Adjust!

### We encourage you to adjust the numbers for the following 2 variables in the code below:

1. **N** (default 500 individuals) - this indicates the effective population size. Try adjusting this value to any number between 500 and 2000 individuals.

2. **bottleneck_size** (default 100 individuals) - this indicates the size the population is reduced to during the bottleneck event. Try adjusting this value to any number between 5 and 100 individuals. 

Changing N (the initial population size) and bottleneck_size (the size the population is reduced to during the bottleneck event) adjusts the strength of genetic drift caused by the bottleneck event: the smaller the population, the stronger the drift.
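
One way to see how population size sets the strength of drift: when 2N gametes are sampled binomially, the standard deviation of the one-generation change in p is sqrt(p(1 - p) / (2N)). The sketch below evaluates this for a few population sizes in the suggested ranges. The helper name `drift_sd` is an assumption made for this example.

```python
import math

def drift_sd(N, p):
    """Standard deviation of the one-generation change in p
    when 2N gametes are sampled binomially."""
    return math.sqrt(p * (1 - p) / (2 * N))

p = 0.2  # starting allele frequency used in this exercise
for N in (2000, 500, 100, 5):
    print(f"N = {N:>4}: SD of the change in p per generation = {drift_sd(N, p):.4f}")
```

Smaller population sizes give a larger standard deviation, which is why allele frequencies are expected to jump around more during the bottleneck.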

### Each time you would like to run a new simulation, change the values of the variable(s) of interest, run the code cell below, and then rerun the code cell in the next section that outputs the graph to visualize the effects of the changed parameters.

In [None]:
#this code designates 7 important parameters (defined below!)

#try out different values for N and bottleneck_size!
N = 500  # Initial effective population size (default is 500; try adjusting!)
bottleneck_size = 100  #size of population during the bottleneck (default is 100; try adjusting!)

#keep these variables constant for now!
p_init = 0.2 #starting allele frequency of the selectively neutral allele A 
replicates = 10 #number of populations simulated 
ngen = 200 # number of generations simulated 
bottleneck_generations_start = 40 #the generation when the bottleneck event starts
bottleneck_generations_end = 50 #the generation when the bottleneck event ends

#this code defines a function (drift_sim_with_bottleneck) that we will use to visualize genetic drift with a bottleneck
def drift_sim_with_bottleneck(N, p, ngen, bottleneck_generations_start, bottleneck_generations_end, bottleneck_size):
    # record the allele frequency of A in every generation, starting with the initial value p
    pvec = [p]
    for gen in range(ngen):
        # Apply the bottleneck to every sampling step between the start and end generations,
        # so the affected steps line up with the shaded region on the graph
        if bottleneck_generations_start <= gen < bottleneck_generations_end:
            N_bottleneck = bottleneck_size
        else:
            N_bottleneck = N
        
        # Genetic drift within the population
        pA = np.random.binomial(2*N_bottleneck, p)
        p = pA / (2*N_bottleneck)
        pvec.append(p)
    
    return pvec

## Visualization 

The graph output by the code cell below shows simulations (each line represents a replicate population) of how a bottleneck event can act on the allele frequency of a selectively neutral allele over 200 generations.  

The bottleneck event (where population size abruptly shrinks in generations 40-50) is indicated by a red rectangle. 
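
As a numerical companion to the graph, the sketch below simulates many replicate populations at once and compares how much the spread of allele frequencies grows in the 10 generations before the bottleneck with how much it grows during the bottleneck. It is a standalone, vectorized variant written for illustration, not the notebook's own function; the 1000 replicates, the seed, and the treatment of the bottleneck as sampling steps 40 through 49 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed, used only for repeatability

# Same defaults as the exercise, except for a larger number of replicates
# (an assumption made here so the spread can be estimated reliably).
N, bottleneck_size = 500, 100
p_init, ngen = 0.2, 200
start, end = 40, 50
n_reps = 1000

p = np.full(n_reps, p_init)  # one allele frequency per replicate population
history = [p]
for gen in range(ngen):
    # Assumption: the bottleneck covers sampling steps start, ..., end - 1
    size = bottleneck_size if start <= gen < end else N
    p = rng.binomial(2 * size, p) / (2 * size)
    history.append(p)
history = np.array(history)  # shape: (ngen + 1, n_reps)

variance = history.var(axis=1)  # variance in p across replicates, per generation
gain_before = variance[start] - variance[start - 10]
gain_during = variance[end] - variance[start]
print(f"Variance gained in the 10 generations before the bottleneck: {gain_before:.5f}")
print(f"Variance gained in the 10 generations of the bottleneck:     {gain_during:.5f}")
```

With these settings the variance gained during the bottleneck should be several times larger than the variance gained over the same number of generations beforehand, which is the fanning-out of lines you see inside the red rectangle.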

### You do not need to alter any of the code in the code cell below; just click the “Run” button to view your graph. You can save the output graph from a run by opening the image in a new tab.

In [None]:
#this code sets some formatting for the graph you'll output below!
#set plot resolution
%config InlineBackend.figure_format = 'retina'

#set default figure parameters
plt.rcParams['figure.figsize'] = (10,5) #figure size (width, height) in inches

small_size = 9 
medium_size = 12
large_size = 15 

plt.rc('font', size=medium_size)          # default text sizes
plt.rc('xtick', labelsize=medium_size)    # xtick labels
plt.rc('ytick', labelsize=medium_size)    # ytick labels
plt.rc('legend', fontsize=medium_size)    # legend
plt.rc('axes', titlesize=large_size)      # axes title
plt.rc('axes', labelsize=large_size)      # x and y labels
plt.rc('figure', titlesize=large_size)    # figure title

#this code plots the results from running the drift_sim_with_bottleneck function as a graph!
for i in range(replicates):
    plt.plot(np.linspace(0, ngen, ngen+1), drift_sim_with_bottleneck(N, p_init, ngen, bottleneck_generations_start,bottleneck_generations_end, bottleneck_size)) 

rect = Rectangle((bottleneck_generations_start, 0), bottleneck_generations_end - bottleneck_generations_start, 1, linewidth=1, edgecolor='black', facecolor='r', alpha=0.3)
plt.gca().add_patch(rect)
plt.title("Genetic Drift By Bottleneck")
plt.xlabel("Generations (ngen)")
plt.ylabel("Allele frequency of A (p)")
plt.grid()
plt.show()