<a href="https://colab.research.google.com/github/seantibor/uorganisms/blob/master/Generations_Testing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Simple Genetics Simulator
## by Kelly Paredes and Sean Tibor

This notebook is a simple genetics simulator written in Python. It is used to explore how population size and reproduction affect the size and outcome of the population over time. 

The transmission of genes from one generation to the next (the passing of genes from parent to offspring) is affected by where a species lives. Sometimes animals die from disease, predators, or loss of habitat; other times animals migrate. Either way, the size of a population is a very important factor when analyzing genetics.

Random changes in populations can sometimes cause genetic drift.

**Answer these questions:**
1. What is [genetic drift](https://kids.kiddle.co/Genetic_drift)?
2. What does genetic drift often cause?
3. How does population size affect genetic drift?

Hint: You can answer the questions in this section by double-clicking inside this box. You can type your answers below the questions in the editor on the left side.


1.



# Setting Up Our Simulation

The notebook you're using right now is a live simulation that will run Python code for you to explore genetic variation. You'll see two types of content on this page: code blocks and text blocks. Code blocks can be run to produce an output. Text blocks can't be run.

### The first thing we need to do is set up our simulation environment with some necessary code. 

Click on the block below to activate it, then press the play button next to it to run it. While the code is running, you'll see the [*] symbol show up next to your code. Once the code block has finished running, the symbol will change to something like [3].

You may see some text output in the area below the block. You can ignore this output for now. In other blocks, this output will show us the results of our processing.

In [0]:
!git init
!git remote add origin https://github.com/seantibor/uorganisms
!git pull origin master
  
import sys
sys.path.append('lib')
import generations
from organisms import Organism
import pandas as pd
import matplotlib.pyplot as plt


# Changing the Code

### This code has a few variables that can be changed by you. It is important that as a scientist, you only change ONE variable at a time. The variables are defined here:


* INITIAL_POP: The initial population size. This is the number of organisms living together in an area, a mix of both males and females. Each organism in this population has traits that are chosen at random.

* GENERATIONS: This is the number of generations that the population will reproduce. For example, from your great-grandparents to you is 4 generations. You can simulate as many generations as you want, but the more generations you have, the longer the code will take to run. You can set this number anywhere between 10 and 500. Each organism will reproduce with only one organism of the opposite gender in its generation, and each pair will produce a random "family_size".

* POPULATION_CAP: This is the maximum number of organisms in a generation. Populations are sometimes controlled in real life: a predator may no longer exist, there may be too many organisms and not enough habitat or food, or humans may simply want to manage the numbers. In this simulation, any extra organisms are culled (killed off) once the population cap is reached. This also has the practical benefit of keeping our simulation from taking too long to run.

---
### These next two variables should be changed with caution! 
These numbers affect the statistics of the simulation. The family size is set by an average (mean) number and a standard deviation. For humans, the average family size changes over time. [As of 2014, mothers in the US had an average of 2.4 children](http://www.pewsocialtrends.org/2015/05/07/family-size-among-mothers/), compared to this simulation's default reproduction rate of 2.05.

* FAMILY_SIZE: The [mean number](https://www.robertniles.com/stats/mean.shtml) of offspring from a reproductive pair.
* FAMILY_STDEV: The [standard deviation](https://www.robertniles.com/stats/stdev.shtml), a number that measures how much family sizes vary around the mean.
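
If you are curious how a mean and a standard deviation can turn into actual family sizes, here is a small sketch you can read or try in a separate code block. It is not the code from the simulator itself. It assumes family sizes are drawn from a normal (bell curve) distribution, rounded to a whole number, and never allowed to go below zero. The helper name `draw_family_size` is made up for this example.

```python
import random
import statistics

FAMILY_SIZE = 2.05   # mean offspring per pair (the notebook's default)
FAMILY_STDEV = 0.9   # standard deviation (the notebook's default)

def draw_family_size(rng, mean=FAMILY_SIZE, stdev=FAMILY_STDEV):
    """Hypothetical helper: draw one whole-number family size."""
    # Assumption: a bell-curve draw, rounded, with no negative families.
    return max(0, round(rng.gauss(mean, stdev)))

rng = random.Random(42)  # fixed seed so the sketch gives repeatable numbers
sizes = [draw_family_size(rng) for _ in range(10_000)]

print("smallest family:", min(sizes))
print("largest family:", max(sizes))
print("average family size:", round(statistics.mean(sizes), 2))
```

Each family comes out a little different, but the average over many families should land close to FAMILY_SIZE.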


In [0]:
INITIAL_POP = 1000
GENERATIONS = 500
POPULATION_CAP = 2000
FAMILY_SIZE = 2.05
FAMILY_STDEV = 0.9
POPULATIONS_CONTROL = None
print('Parameters established.')

# Running the Simulation

The section below will show the simulation of each generation. You'll see a progress bar in blue while the simulation is running. When the simulation is finished, the bar will turn green.

## How it works
The algorithm in the computer program randomly chooses a male and a female organism for reproduction. The number of offspring from each pairing is not the same each time. It is a random number calculated using FAMILY_SIZE (the mean) and FAMILY_STDEV (the standard deviation), so each family will have a different number of individuals.

In addition, if there aren't equal numbers of male and female organisms, or the total number of new organisms exceeds POPULATION_CAP, the remaining organisms are not allowed to reproduce.

For example, if there are 513 males and 487 females in the initial population, there will only be 487 reproductive pairs and 26 males will not be allowed to reproduce.
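
The pairing rule in the example above can be sketched in a few lines of Python. This is only an illustration, not the simulator's own code. The function names `count_pairs` and `next_generation_size` are made up for this sketch, and it assumes family sizes come from a rounded bell-curve draw and that reproduction simply stops once the cap is reached.

```python
import random

def count_pairs(males, females):
    """Return (reproductive pairs, organisms left without a partner)."""
    pairs = min(males, females)        # each organism pairs with only one partner
    unpaired = abs(males - females)    # the extras cannot reproduce
    return pairs, unpaired

def next_generation_size(males, females, population_cap, rng,
                         family_size=2.05, family_stdev=0.9):
    """Hypothetical sketch: count offspring, stopping at the cap."""
    pairs, _ = count_pairs(males, females)
    offspring = 0
    for _ in range(pairs):
        if offspring >= population_cap:
            break  # cap reached: remaining pairs do not reproduce
        # Assumption: family size is a rounded bell-curve draw, never negative.
        family = max(0, round(rng.gauss(family_size, family_stdev)))
        offspring = min(population_cap, offspring + family)
    return offspring

pairs, unpaired = count_pairs(513, 487)
print(pairs, "pairs,", unpaired, "males without a partner")

rng = random.Random(1)  # fixed seed so the sketch is repeatable
print("offspring in next generation:", next_generation_size(513, 487, 2000, rng))
```

The first line of output matches the example: 487 pairs, with 26 males left over.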

**Note: If your progress bar turns red and stops, that means that your population crashed! Move on to the next section to see what happened.**

In [0]:
generations.test_run(INITIAL_POP, GENERATIONS, POPULATION_CAP, FAMILY_SIZE, FAMILY_STDEV)

# Generating Trend Charts
## Total Population Size with Color Genotype
First, we will chart the total population size with a breakdown of the number of organisms with each color gene pair. Recall the difference between BB, Bb, and bb.

**Answer these questions:**

1. What does each of these genotypes represent?
2. What do you notice about the percentages of each genotype? How does this compare to the percentages from a Punnett square? Explain your answer.
3. Why do you think that the percentages and numbers are not always consistent? Explain what you think can cause the changes between the numbers of BB, Bb, and bb over time.

In [0]:
overall_stats = generations.get_overall_stats()

# create a pandas dataframe from the overall statistics
df = pd.DataFrame.from_dict(overall_stats)
# set_index returns a new dataframe, so assign the result back to df
df = df.set_index('gen')

# plot the color genes
ax = df.plot(y=['total_pop', 'BB', 'Bb', 'bb'])
ax.set_ylabel('Organisms')
ax.set_xlabel('Generations')
plt.show()

## Phenotype Percentages 

The chart below shows the percentages of the phenotype (Blue or Yellow) produced after "X" generations of reproduction and a capped population.

**Answer these questions:**

1. What is the phenotype percentage? How does this compare to what you know from Punnett squares? Explain your answer.
2. Do you notice any differences in the phenotypes in the graph? Why do you think this occurred, or did not occur?
3. Why would phenotypes change naturally in humans or in animals?




In [0]:
# plot the phenotype percentages by generation
df['BluePct'] = df['blue'] / df['total_pop'] * 100
df['YellowPct'] = df['yellow'] / df['total_pop'] * 100

ax = df.plot(y=['BluePct', 'YellowPct'])
ax.set_ylabel('% of Total Population')
ax.set_xlabel('Generations')
plt.show()

## Males and Females per Generation
The chart below shows the total population and the comparison of males to females in the population.

**Answer the questions:**


1. Is there a change in the number of males compared to females? Why or why not?
2. Look at the two charts from the [World Bank](https://data.worldbank.org/indicator/SP.POP.TOTL.MA.IN). You can see the total number of males and the total number of females. What do you notice about this data compared to the data in these charts?




In [0]:
ax = df.plot(y=['total_pop', 'male', 'female'])
ax.set_ylabel('Organisms')
ax.set_xlabel('Generations')
plt.show()

# Running the Simulation Again

What happens if you run this simulation again? Will you get the same result?

Go back to the [Running the Simulation](#Running-the-Simulation) section and run the same simulation again with the same parameters. Do you get the same results?

**If not, why?**


What happens when you change the parameters in the [Changing the Code](#Changing-the-Code) section?