# Calculating Expected Offspring

## The Need for Averages

Averages arise everywhere. In sports, we want to project the average number of games that a team is expected to win; in gambling, we want to project the average losses incurred playing blackjack; in business, companies want to calculate their average expected sales for the next quarter.
Molecular biology is not immune to the need for averages. Researchers need to predict the expected number of antibiotic-resistant pathogenic bacteria in a future outbreak, estimate the number of locations in the genome that will match a given motif, and study the distribution of alleles throughout an evolving population. In this problem, we will begin discussing the third issue; first, we need a better understanding of what it means to average a random process.

## Problem

For a random variable $X$ taking integer values between 1 and $n$, the expected value of $X$ is $E(X) = \sum_{k=1}^{n} k \times \mathrm{Pr}(X = k)$.
The expected value offers us a way of taking the long-term average of a random variable over a large number of trials.

As a motivating example, let $X$ be the number on a six-sided die. Over a large number of rolls, we should expect to obtain an average of 3.5 on the die (even though it's not possible to roll a 3.5). The formula for expected value confirms that $E(X) = \sum_{k=1}^{6} k \times \mathrm{Pr}(X = k) = 3.5$.
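As a quick check of the die example, the sum can be evaluated directly. This is a minimal sketch: the helper name `expected_value` and the dictionary representation of the distribution are illustrative choices, not part of the original problem.

```python
from fractions import Fraction

def expected_value(distribution):
    """Return the sum of k * Pr(X = k) over a {value: probability} mapping."""
    return sum(k * p for k, p in distribution.items())

# Fair six-sided die: each face 1..6 is assumed to have probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

die_mean = expected_value(die)
print(die_mean, float(die_mean))  # 7/2 3.5
```

Using `Fraction` keeps the arithmetic exact, so the result is exactly 7/2 rather than a rounded float.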

More generally, a random variable for which every one of a number of equally spaced outcomes has the same probability is called a uniform random variable (in the die example, this "equal spacing" is equal to 1). We can generalise our die example to find that if X is a uniform random variable with minimum possible value a and maximum possible value b, then E(X) = (a+b)/2.
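The result for a uniform random variable can be derived in a few lines. In this sketch, the symbols $m$ (the number of outcomes) and $d$ (the spacing between them) are introduced only for the derivation; the outcomes are $a, a+d, \dots, a+(m-1)d = b$, each with probability $1/m$.

```latex
\begin{aligned}
E(X) &= \sum_{j=0}^{m-1} (a + jd) \cdot \frac{1}{m} \\
     &= a + \frac{d}{m} \cdot \frac{(m-1)m}{2} \\
     &= a + \frac{(m-1)d}{2} \\
     &= a + \frac{b - a}{2} \\
     &= \frac{a + b}{2}
\end{aligned}
```

For the die, $a = 1$ and $b = 6$ give $(1 + 6)/2 = 3.5$, matching the value computed from the sum.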

### Given

Six non-negative integers, each of which does not exceed 20,000. The integers correspond to the number of couples in a population possessing each genotype pairing for a given factor. In order, the six given integers represent the number of couples having the following genotypes:

1. AA-AA
2. AA-Aa
3. AA-aa
4. Aa-Aa
5. Aa-aa
6. aa-aa

### Return

The expected number of offspring displaying the dominant phenotype in the next generation, under the assumption that every couple has exactly two offspring.

## Sample Dataset

> 1 0 0 1 0 1

## Sample Output

> 3.5


# Definitions

**Random Variable** A function that associates a real number with an event.
Random variables are typically expressed using capital letters. If we represent a coin toss event as x and the Random Variable as T, then the Random Variable would be represented as:

> T(x) = 1 If x is tails, 0 If x is heads.

We can then construct an equation for the question "What is the probability of getting tails?". Tails corresponds to T = 1, so:

> P(T = 1) = 1/2

Think of 'T=1' as the condition in an if statement: P(T = 1) is the total probability of the outcomes for which that condition is true.
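The coin-toss random variable can be written out literally as a function. This is a minimal sketch that assumes a fair coin with two equally likely outcomes; the helper `probability` and the list `outcomes` are hypothetical names used only for illustration.

```python
from fractions import Fraction

# Sample space of a single coin toss; both outcomes are assumed equally likely.
outcomes = ["heads", "tails"]

def T(x):
    """Random variable from the text: 1 if x is tails, 0 if x is heads."""
    return 1 if x == "tails" else 0

def probability(random_variable, value):
    """P(random_variable = value), counting equally likely outcomes."""
    matching = [x for x in outcomes if random_variable(x) == value]
    return Fraction(len(matching), len(outcomes))

p_tails = probability(T, 1)
print(p_tails)  # 1/2
```

The list comprehension plays the role of the "if statement": it keeps only the outcomes for which `T(x) == 1`.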

**Expected Value** is the long-run average result you can expect from a random process. For example: how many times would I get a 5 or 6 if I rolled 14d6 (fourteen six-sided dice)? Each roll has only two outcomes of interest (1-4 or 5-6), so this type of expected value is called an *expected value for a binomial variable*. This is the type of expected value we need for our offspring: each one either displays the dominant phenotype or doesn't.

The expected value of a binomial variable is the probability of the event multiplied by the number of trials. For our dice rolls:

> A(x) = 1 for roll of 5 or 6, 0 for 1-4

> P(x) * n = P(A = 1) * 14 = 1/3 * 14 = 4 2/3 

So if you roll 14d6 you can expect to get a 5 or a 6 four to five times.
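The shortcut "probability times number of trials" can be checked against the full definition of expected value. This is a sketch assuming fair dice and independent rolls; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n = 14                # number of dice rolled
p = Fraction(2, 6)    # chance of a 5 or 6 on one fair six-sided die

# Shortcut for a binomial variable: probability * number of trials.
shortcut = n * p

# Full definition: sum over k of k * Pr(exactly k successes in n rolls).
by_definition = sum(
    k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)
)

print(shortcut, by_definition)  # 14/3 14/3
```

Both routes give 14/3, which is the 4 2/3 quoted above.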

# Workings

1. Work out the probability of getting an offspring with the dominant phenotype for each couple

Calculate the probabilities from the Punnett squares for each couple.

![punnetTables](punnet.png)

* 4/4 (1.0) - pairings 1, 2, 3
* 3/4 (0.75) - pairing 4
* 2/4 (0.5) - pairing 5
* 0/4 (0.0) - pairing 6
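The fractions above can also be reproduced by enumerating the cells of each Punnett square. This is a minimal sketch that assumes each parent passes on either of its two alleles with equal probability and that a single `A` is enough to display the dominant phenotype; `dominant_probability` and `pairings` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

def dominant_probability(parent1, parent2):
    """Fraction of Punnett-square cells containing at least one 'A' allele."""
    cells = list(product(parent1, parent2))  # one allele from each parent
    dominant = sum(1 for a, b in cells if "A" in (a, b))
    return Fraction(dominant, len(cells))

# The six pairings, in the order given in the problem statement.
pairings = [("AA", "AA"), ("AA", "Aa"), ("AA", "aa"),
            ("Aa", "Aa"), ("Aa", "aa"), ("aa", "aa")]

probs = [dominant_probability(p1, p2) for p1, p2 in pairings]
print([str(p) for p in probs])  # ['1', '1', '1', '3/4', '1/2', '0']
```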


2. Use the expected value equation with the probabilities

In general, the function would take two arrays: the population integers and the probabilities. As we have only one inheritance model, the probabilities are fixed, so they can be encoded in the function and we only have to pass the population array.
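For comparison, the general two-array form described in this step might look like the sketch below. The function name `expected_dominant_offspring`, the `offspring_per_couple` parameter, and the constant `DOMINANT_PROBS` are assumptions made for illustration; they are not part of the original notebook.

```python
def expected_dominant_offspring(pop, probs, offspring_per_couple=2):
    """Expected number of dominant-phenotype offspring.

    pop   -- number of couples for each genotype pairing
    probs -- probability that one child of each pairing shows the dominant phenotype
    """
    if len(pop) != len(probs):
        raise ValueError("pop and probs must have the same length")
    # Each couple contributes offspring_per_couple * probability on average.
    return offspring_per_couple * sum(n * p for n, p in zip(pop, probs))

# Probabilities for the six pairings, taken from the Punnett squares.
DOMINANT_PROBS = [1.0, 1.0, 1.0, 0.75, 0.5, 0.0]

result = expected_dominant_offspring([1, 0, 0, 1, 0, 1], DOMINANT_PROBS)
print(result)  # 3.5
```

Keeping the probabilities as a parameter makes it possible to reuse the same function for a different inheritance model, at the cost of one extra argument.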

In [1]:
# probabilities
props = [1.0, 1.0, 1.0, 0.75, 0.5, 0]
# test populations
pop = [1, 0, 0, 1, 0, 1]

In [18]:
# for each pairing, add 2 offspring * P(dominant phenotype) * number of couples
def iev(pop):
    # P(dominant phenotype) for pairings 1-6, from the Punnett squares
    props = [1.0, 1.0, 1.0, 0.75, 0.5, 0.0]
    offs = 0
    for prob, couples in zip(props, pop):
        offs += 2 * prob * couples
    print(offs)

In [16]:
iev(pop)

3.5


In [20]:
iev([19186, 18240, 19574, 16357, 17541, 19329])

156076.5
