# Questions for evaluation
## Evolutionary rates. 
Explain why the ratio of non-synonymous to synonymous substitution rates, Ka/Ks, is regarded as an indicator of positive selection. How do you interpret the case where Ka/Ks is significantly larger in one gene than in another, although it is smaller than one in both genes?
### Response 1.
## Population size. 
Why do we say that effective population size plays the role of the inverse of an evolutionary temperature? Of two different populations, which one do you expect to evolve faster, the smaller or the larger one? Why?
### Response 2.
## Non-negligible mutation rate. 
Which phenomena may occur when there is more than one mutation in the same genome, and there are two or more different mutant individuals in the same population? 
### Response 3.

## Phylogenetic trees. 
Consider the phylogenetic tree of n protein sequences. How many branches are there? How many pairwise distances? How many internal nodes? How many possible rooted and unrooted trees?
### Response 4.
#### Rooted trees.
We consider only binary trees.
The rooted tree of $n$ protein sequences ($n$ leaf nodes) $T_n$ can be defined recursively as the set formed by its internal nodes $R_i$ and leaf nodes $L_j$ as follows:

$$T_2 = L_2 \cup L_1 \cup R_1$$
$$T_{n} = L_n \cup T_{n-1} \cup R_{n-1}$$

*Note: $T_2$ is the base case: the minimal tree has two leaf nodes and one internal node.*

From this definition, some properties follow easily:

**Number of internal nodes (cardinalR)**
By construction, the number of internal nodes appears to be *$n-1$*. We prove it by structural induction on the sets $T_n$:

$cardinalR(T_2) = 1$, as $T_2$ only has the internal node $R_1$

Now suppose that $cardinalR(T_{n-1}) = n-2$. What about $T_{n}$?

$cardinalR(T_{n}) = cardinalR(L_n \cup T_{n-1} \cup R_{n-1}) = cardinalR(L_n) + cardinalR(T_{n-1}) + cardinalR(R_{n-1}) = 0 + (n - 2) + 1 = n - 1$

This completes the proof.

**Number of branches (countBranches)**
Two branches descend from each of the $n-1$ internal nodes, so the number of branches is:
$$countBranches(n) = 2n-2$$
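
As a quick sanity check of the two counts above, the recursive definition can be sketched in Python. The representation (nested 2-tuples for internal nodes, strings for leaves) and the helper names `build_tree`, `count_internal` and `count_branches` are assumptions made for this illustration. The sketch always attaches the new leaf above the current root, which is only one of the possible shapes; the counts do not depend on that choice.

```python
def build_tree(n):
    """Build T_n following T_n = L_n U T_{n-1} U R_{n-1}.

    Leaves are strings, internal nodes are 2-tuples (hypothetical encoding).
    """
    if n < 2:
        raise ValueError("a rooted binary tree needs at least two leaves")
    tree = ("L1", "L2")  # T_2: internal node R_1 joins L_1 and L_2
    for k in range(3, n + 1):
        tree = (tree, f"L{k}")  # R_{k-1} joins T_{k-1} and the new leaf L_k
    return tree


def count_internal(tree):
    """Number of internal nodes (2-tuples) in the tree."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + count_internal(left) + count_internal(right)


def count_branches(tree):
    """Number of branches: two descend from every internal node."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 2 + count_branches(left) + count_branches(right)


for n in range(2, 10):
    t = build_tree(n)
    assert count_internal(t) == n - 1
    assert count_branches(t) == 2 * n - 2
```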

**Number of trees (countTrees)**

Consider now in how many ways a tree with $n$ leaf nodes can be obtained from a tree with $n-1$ leaf nodes.

The new leaf can be attached to any existing branch. Since it is irrelevant whether the new branch is drawn to the left or to the right of the existing one, there are $countBranches(n-1)$ ways to do this. One further possibility is to attach the new leaf above the root of the tree, which creates a new root.

So there are $countBranches(n-1) + 1$ ways to generate a tree with $n$ leaf nodes from a given tree with $n-1$ leaf nodes. Each tree with $n$ leaf nodes arises exactly once in this way, because removing leaf $n$ leaves a unique tree with $n-1$ leaf nodes. Therefore:

$countTrees(n) = (countBranches(n-1) + 1) * countTrees(n-1) = ( (2(n-1) - 2)+ 1 ) * countTrees(n-1)$

so

$countTrees(n) = (2n-3) * countTrees(n-1)$, for $n > 2$

and

$countTrees(2) = 1$, trivially, because the order of the two leaf nodes is irrelevant.

Applying the formula recursively:

$$ countTrees(n) = 1*3*5*...*(2n-3)$$
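
The recurrence can be checked by brute force for small $n$. The following Python sketch enumerates rooted trees by inserting each new leaf on every branch and above the root, and compares the number of distinct trees with the product formula. Encoding an internal node as a `frozenset` of its two children (so that left and right are not distinguished) and the names `insert_leaf`, `all_rooted_trees` and `count_trees` are assumptions of this illustration.

```python
def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to `tree`.

    Attaching above a node covers both cases of the argument:
    on the branch leading to that node, or above the root.
    """
    yield frozenset([tree, leaf])
    if isinstance(tree, frozenset):  # internal node: recurse into both children
        a, b = tuple(tree)
        for new_a in insert_leaf(a, leaf):
            yield frozenset([new_a, b])
        for new_b in insert_leaf(b, leaf):
            yield frozenset([a, new_b])


def all_rooted_trees(n):
    """Set of all rooted binary trees on the leaves 1..n (n >= 2)."""
    trees = {frozenset([1, 2])}
    for leaf in range(3, n + 1):
        trees = {new for t in trees for new in insert_leaf(t, leaf)}
    return trees


def count_trees(n):
    """Product 1 * 3 * 5 * ... * (2n - 3) from the recurrence."""
    result = 1
    for k in range(3, n + 1):
        result *= 2 * k - 3
    return result


for n in range(2, 7):
    assert len(all_rooted_trees(n)) == count_trees(n)
```

The enumeration grows very quickly (945 trees for six leaves), so it is only practical as a check for small $n$.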


**Number of pairwise distances**
It's equal to the number of different pairs we can form with the leaf nodes.

$$\binom{n}{2} = n(n-1)/2$$
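
A minimal illustration in Python, assuming five hypothetical sequence labels: the pairs of leaf nodes can be listed explicitly and their number compared with the binomial coefficient.

```python
from itertools import combinations
from math import comb

leaves = ["A", "B", "C", "D", "E"]  # hypothetical labels for n = 5 sequences
pairs = list(combinations(leaves, 2))  # every unordered pair of leaf nodes

n = len(leaves)
assert len(pairs) == comb(n, 2) == n * (n - 1) // 2
```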

#### Unrooted trees.
We consider only unrooted trees derived from a star topology, that is, $n$ leaf nodes joined to one internal node, like the starting topology of the neighbor-joining algorithm.
In the fully resolved tree, each internal node has three branches starting from it.

The construction of such an unrooted tree is as follows. We take a pair of leaf nodes out of the star topology and join them to a new internal node with three branches: one for each member of the extracted pair and one ending on the internal node of the star topology. This new internal node then acts as a leaf node of the remaining star topology, which is now a star with $n-1$ leaf nodes and the same central node.

This process is repeated recursively until the star has one internal node, three leaf nodes and three branches, which is the minimal possible unrooted tree. The case with two branches is a rooted tree.

**Number of branches**
By construction, for a tree with n leaf nodes, we have

$countBranches(n) = 2 + countBranches(n-1)$

and

$countBranches(3) = 3$

$countBranches(4) = 2 + countBranches(3) = 5$

$countBranches(5) = 2 + countBranches(4) = 7$

and so on...

This suggests the general formula:
$$countBranches(n) = 2n -3$$

Indeed, by induction on $n$:

$countBranches(n) = 2 + countBranches(n-1) = 2 + (2*(n-1) - 3) = 2 + (2n - 2 - 3) = 2n -3$

And for the base case:

$countBranches(3) = 2*3 -3 = 6 - 3 = 3$

**Number of internal nodes**
The approach is very similar:
By construction, for a tree with n leaf nodes, we have

$countNodes(n) = 1 + countNodes(n-1)$

and

$countNodes(3) = 1$

so

$countNodes(4) = 1 + countNodes(3) = 2$

$countNodes(5) = 1 + countNodes(4) = 3$

and so on...

This suggests the general formula:
$$countNodes(n) = n - 2$$

Indeed, by induction on $n$:
$countNodes(n) = 1 + countNodes(n-1) = 1 + ((n-1) - 2) = 1 + (n - 3) = n -2$

And for the base case:
$countNodes(3) = 3 - 2 = 1$
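
The star-resolution construction can also be simulated to confirm the counts of branches and internal nodes. In this Python sketch the node labels, the edge encoding and the function name `resolve_star` are assumptions for illustration. The pair to join is picked arbitrarily, whereas neighbor joining would choose it from the distances; the counts are the same either way.

```python
from collections import Counter


def resolve_star(n):
    """Resolve a star with n leaves; return (edges, internal_nodes)."""
    if n < 3:
        raise ValueError("an unrooted tree needs at least three leaves")
    center = "C"
    attached = [f"L{i}" for i in range(1, n + 1)]  # nodes joined to the centre
    edges = {(node, center) for node in attached}
    internal = [center]
    while len(attached) > 3:
        a = attached.pop()  # arbitrary pair taken out of the star
        b = attached.pop()
        new = f"I{len(internal)}"  # new internal node joining a and b
        internal.append(new)
        edges -= {(a, center), (b, center)}
        edges |= {(a, new), (b, new), (new, center)}
        attached.append(new)  # the new node now acts as a leaf of the star
    return edges, internal


for n in range(3, 10):
    edges, internal = resolve_star(n)
    degree = Counter(node for edge in edges for node in edge)
    assert len(edges) == 2 * n - 3
    assert len(internal) == n - 2
    assert all(degree[node] == 3 for node in internal)
```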

**Number of trees (countTrees)**
Consider now in how many ways a tree with $n$ leaf nodes can be obtained from a tree with $n-1$ leaf nodes.

The new leaf can be attached to any existing branch. Since it is irrelevant whether the new branch is drawn to the left or to the right of the existing one, there are $countBranches(n-1)$ ways to do this.

So there are $countBranches(n-1)$ ways to generate a tree with $n$ leaf nodes from a given tree with $n-1$ leaf nodes, and therefore:

$countTrees(n) = (countBranches(n-1)) * countTrees(n-1) = (2(n-1) - 3) * countTrees(n-1)$

so

$countTrees(n) = (2n-5) * countTrees(n-1)$, for $n > 3$

and

$countTrees(3) = 1$, trivially, because the order of the three leaf nodes is irrelevant.

Applying the formula recursively:

$$ countTrees(n) = 1*3*5*...*(2n-5)$$
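
Comparing this product with the rooted case shows that there are as many unrooted trees with $n$ leaf nodes as rooted trees with $n-1$ leaf nodes, since $2n-5 = 2(n-1)-3$. A short Python sketch of both products, with the hypothetical function names `count_rooted` and `count_unrooted`, makes the comparison concrete.

```python
def count_rooted(n):
    """1 * 3 * 5 * ... * (2n - 3), rooted binary trees with n >= 2 leaves."""
    result = 1
    for k in range(3, n + 1):
        result *= 2 * k - 3
    return result


def count_unrooted(n):
    """1 * 3 * 5 * ... * (2n - 5), unrooted binary trees with n >= 3 leaves."""
    result = 1
    for k in range(4, n + 1):
        result *= 2 * k - 5
    return result


for n in range(3, 12):
    assert count_unrooted(n) == count_rooted(n - 1)
    print(n, count_unrooted(n), count_rooted(n))
```

For ten sequences this already gives 2,027,025 unrooted trees, which illustrates why an exhaustive search over topologies is feasible only for small $n$.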


**Number of pairwise distances**
It's equal to the number of different pairs we can form with the leaf nodes.

$$\binom{n}{2} = n(n-1)/2$$

## Additive distances. 
Sequence distances are said to be additive when the distance between two sequences is the sum of the lengths of the branches that connect them. If there are $n$ species, how many equations express the additivity conditions? Are the free parameters (branch lengths) underdetermined or overdetermined? What is the number of sequences for which the number of equations is the same as the number of parameters? For this number of sequences, which condition expresses the molecular clock hypothesis?
### Response 5.