# Modules and Program Structure
## Writing Functions

In [23]:
# here's our first function
def GCcontent(dna):
    # our function is called GCcontent and
    # accepts a single argument called dna;
    # assume that the input is a DNA sequence encoded
    # in a string, and make sure it's all uppercase:
    dna = dna.upper()
    # count the occurrences of each nucleotide
    numG = dna.count("G")
    numC = dna.count("C")
    numA = dna.count("A")
    numT = dna.count("T")
    # finally, calculate (G+C) / (A+T+G+C)
    return (numG+numC) / (numG + numC + numT + numA)

In [None]:
whos

In [24]:
GCcontent("AATTTCCCGGGAAA")
GCcontent("ATGCATGCATGC")

0.5
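
As written, `GCcontent` divides by the number of A, T, G and C characters, so an empty string (or a sequence made up only of other symbols such as `N`) raises a `ZeroDivisionError`. The variant below is a minimal sketch of one way to guard against that. The name `gc_fraction` and the choice to return `None` are assumptions made for illustration, not part of the original function.

```python
def gc_fraction(dna):
    """Hypothetical variant of GCcontent that tolerates empty input."""
    dna = dna.upper()
    # count G and C together, then add A and T for the denominator
    gc = dna.count("G") + dna.count("C")
    total = gc + dna.count("A") + dna.count("T")
    # no A/T/G/C characters at all: the fraction is undefined
    if total == 0:
        return None
    return gc / total

print(gc_fraction("ATGCATGCATGC"))  # 0.5
print(gc_fraction(""))              # None
```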

In [25]:
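def print_dictionary(mydic):
    # print every key of the dictionary together with its value
    for k, v in mydic.items():
        print("key: ", k, " value: ", str(v))

# return a list with results
# declare default arguments: if no input is provided,
# assume start = 1, end = 10
# (range() stops before end, so the defaults square 1 through 9)
def squared(start = 1, end = 10):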
    # create empty list to catch result of each cycle
    results = []
    for i in range(start, end):
        r = i ** 2
        # append current value to result list
        results.append(r)
    return results

In [26]:
whos

Variable           Type        Data/Info
----------------------------------------
GCcontent          function    <function GCcontent at 0x10a158e60>
a                  int         5
dic                dict        n=3
print_dictionary   function    <function print_dictionary at 0x10a19b9e0>
squared            function    <function squared at 0x10a19b950>


In [None]:
print_dictionary({"a": 3.4, "b": [1,2,3,4], "c": "astring"})

In [27]:
dic = {"a": 3.4, "b": [1,2,3,4], "c": "astring"}
type(dic.items())

dict_items

In [28]:
# specify both start and end
squared(start = 3, end = 10)

[9, 16, 25, 36, 49, 64, 81]

In [29]:
# specify only start, end has default value 10
squared(5)

[25, 36, 49, 64, 81]

In [None]:
# can you work out how you would specify just the end, and have the start be the default value 1?


In [30]:
# and what does the following call of squared() do?
squared()

[1, 4, 9, 16, 25, 36, 49, 64, 81]
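
Arguments with default values can also be passed by name, which lets you set any subset of them in any order. The sketch below restates `squared` in condensed form (a list comprehension instead of the explicit loop) only so that the block runs on its own. The argument values are arbitrary examples.

```python
def squared(start=1, end=10):
    """Squares of the integers from start up to, but not including, end."""
    return [i ** 2 for i in range(start, end)]

# name only the argument you want to change; start keeps its default of 1
print(squared(end=4))           # [1, 4, 9]

# named arguments can be given in any order
print(squared(end=4, start=2))  # [4, 9]
```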

## Importing Packages and Modules
There are four ways to load the module `mymodule` in order to access the function `my_function` in that module:
1. `import mymodule`
    access functions in the module via `mymodule.my_function()`
1. `from mymodule import my_function`
    this imports only the function `my_function` from the module `mymodule` into the current namespace, which means you can call it directly as `my_function()`
1. `import mymodule as mm`
    same as the first method, except that `mm` serves as a short name for `mymodule`, so the function call shortens to `mm.my_function()`
1. `from mymodule import *`
    this loads all the public names in `mymodule` into the _current namespace_, which means you can call all its functions directly by name. The downside is that this pollutes the namespace, potentially overwriting existing functions and variables that have the same name.
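
Since `mymodule` is only a placeholder, the sketch below uses the standard-library module `math` to show the four forms side by side. The star import is run inside a separate dictionary (the `namespace` variable, an assumption made purely for this illustration) so that its overwriting effect is visible without disturbing the rest of the session.

```python
import math                # 1. access names through the module
from math import sqrt      # 2. import a single name
import math as m           # 3. same as 1, with a short alias

print(math.sqrt(16.0))     # 4.0
print(sqrt(16.0))          # 4.0
print(m.sqrt(16.0))        # 4.0

# 4. star import: start from a namespace that already defines "pi" ...
namespace = {"pi": "my own variable"}
# ... and run the star import inside it
exec("from math import *", namespace)
# the original value has been silently replaced by math.pi
print(namespace["pi"] == math.pi)  # True
```

In ordinary code you would simply write `from math import *` at the top of a script; the dictionary is used here only to make the name clash easy to observe.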

## Program Structure
To practice coding with a more complex program structure, we will write a simulation dealing with population genetics. The goal is to simulate a population of $N$ monoecious (i.e., hermaphroditic, meaning the same animal is both male and female), diploid (i.e., carrying two homologous copies of each chromosome) organisms. We focus on a particular gene, which has two alternative forms (alleles), $A$ and $a$. Initially, each individual is assigned a genotype: each of its two chromosome copies receives allele $A$ with probability $p$ and allele $a$ with probability $1-p$. At each generation, the organisms reproduce and then die (nonoverlapping generations). We also assume that the population size is constant, that there is no mutation or selection, and that mating is completely random.
  
We will use this simulation code to explore a population genetic process called genetic drift: for small populations, even alleles that do not bring a fitness advantage can go to fixation (i.e., be present in 100% of the individuals).

To make this task manageable, we will break it down into several smaller tasks, each of which is hopefully easy to handle.

- A function that initializes the population: It should take as input the size of the population ($N$) and the probability of having an $A$ allele ($p$). This function returns an entire population.
- A function that computes the genotypic frequencies, which we will need to determine whether an allele has gone to fixation: The function should take a population as input and output the count for each genotype.
- A reproduction function that takes the current population and produces the next generation.

We also need to choose a data structure for our program. Here we represent a population as a list of tuples, where each tuple is an individual with its two chromosomes (e.g., `("A", "A")` would be a homozygous individual).
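
To make the list-of-tuples representation concrete, here is a minimal sketch of how individuals, genotype counts and one round of reproduction might look. It is an illustration under stated assumptions, not the implementation developed in this section: the function names are hypothetical, random numbers come from the standard-library `random` module, both orderings of a heterozygote are counted as one genotype, and both parents are drawn independently (so an individual may be picked twice).

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sketch_individual(p):
    """Return one diploid individual as a tuple of two alleles."""
    # each chromosome copy gets "A" with probability p, otherwise "a"
    return tuple("A" if rng.random() < p else "a" for _ in range(2))

def sketch_genotype_counts(population):
    """Count genotypes, treating ("A", "a") and ("a", "A") as the same."""
    # sorting the two alleles gives one label per genotype: AA, Aa or aa
    return Counter("".join(sorted(ind)) for ind in population)

def sketch_next_generation(population):
    """Build a new population of the same size by random mating."""
    offspring = []
    for _ in range(len(population)):
        # draw two parents at random (assumption: with replacement)
        mother = rng.choice(population)
        father = rng.choice(population)
        # each parent passes on one of its two alleles at random
        offspring.append((rng.choice(mother), rng.choice(father)))
    return offspring

population = [sketch_individual(0.5) for _ in range(10)]
print(sketch_genotype_counts(population))
print(sketch_genotype_counts(sketch_next_generation(population)))
```

An allele has gone to fixation when the counts contain only one homozygous genotype, for example when every individual is `("A", "A")`.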

We start by importing the module `SciPy`. This module contains many useful scientific functions. Here we use it only to draw random numbers. Now let's start writing the functions:


In [31]:
import scipy # for random numbers

def build_population(N,p):
    """The population consists of N individuals.
       Each individual has two chromosomes, containing
       allele "A" or "a", with probability p or 1-p, 
       respectively.
       
       The population is a list of tuples.
    """
    population = []
    

ModuleNotFoundError: No module named 'scipy'