Regardless of how many reversals separate the human and mouse X chromosomes,
reversals must be rare genomic events. Indeed, genome rearrangements typically
cause the death or sterility of the mutated organism, thus preventing it from passing
the rearrangement on to the next generation. However, a tiny fraction of genome
rearrangements may have a positive effect on survival and propagate through a species
as the result of natural selection. When a population becomes isolated from the rest of
its species for long enough, rearrangements can even create a new species.

Geology provides a thought-provoking analogy for thinking about genome evolution.
You might like to think of genome rearrangements as “genomic earthquakes” that
dramatically change the chromosomal architecture of an organism. Genome rearrangements
contrast with much more frequent point mutations, which work slowly and are
analogous to “genomic erosion”.

A fundamental question in chromosome evolution studies
is whether the __breakage points of reversals__ (i.e., the ends of the inverted intervals)
occur along “fault lines” called __rearrangement hotspots__. If such hotspots exist in the
human genome, we want to locate them and determine how they might relate to genetic
disorders, which are often attributable to rearrangements.

In the figure comparing the two chromosomes, the endpoints of each reversal are shown as vertical segments; regions affected by multiple reversals are indicated by multiple vertical segments in the human X chromosome.

In 1973, Susumu Ohno proposed the __Random Breakage Model__ of chromosome evolution.
This hypothesis states that the breakage points of rearrangements are selected
randomly, implying that rearrangement hotspots in mammalian genomes do not exist.

__We would like to find the minimum number of reversals that could transform the mouse X chromosome into the human X chromosome. From a biological perspective, why do we want to do this?__

We ask for the minimum number of reversals in accordance with a principle called
__Occam’s razor__. When presented with some quandary, we should explain it using the
simplest hypothesis that is consistent with what we already know. In this case, it seems
most reasonable that evolution would take the “shortest path” between two species,
i.e., the most parsimonious evolutionary scenario. Evolution may not always take the
shortest path, but even when it does not, the number of steps in the true evolutionary
scenario often comes close to the number of steps in the most parsimonious scenario.
How, then, can we find the length of this shortest path?

Genome rearrangement studies typically ignore the lengths of synteny blocks and
represent chromosomes by __signed permutations__. Each block is labeled by a number,
which is assigned a positive/negative sign depending on the block’s direction. 

The number of elements in a signed permutation is its __length__.
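To work with signed permutations in code, it helps to turn the textual form into a list of integers. The sketch below assumes a permutation is written as in this notebook: in parentheses, with space-separated, explicitly signed blocks. `parse_signed_permutation` is a hypothetical helper and not part of any library. It also normalizes the typographic minus sign (U+2212) that appears in typeset permutations.

```python
def parse_signed_permutation(text):
    """Hypothetical helper: '(+1 -7 +6)' -> [1, -7, 6].

    Assumes one permutation in parentheses with space-separated signed blocks;
    the typographic minus sign (U+2212) is normalized to an ASCII '-'.
    """
    cleaned = text.replace("\u2212", "-").strip().strip("()")
    return [int(token) for token in cleaned.split()]


mouse_x = parse_signed_permutation("(+1 -7 +6 -10 +9 -8 +2 -11 -3 +5 +4)")
print(mouse_x, "length:", len(mouse_x))
# prints [1, -7, 6, -10, 9, -8, 2, -11, -3, 5, 4] length: 11
```

The length of the permutation is simply the number of parsed blocks.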

## A greedy heuristic for sorting by reversals

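Let’s see if we can design a greedy heuristic to approximate $d_{rev}(P)$, the reversal distance of P (the minimum number of reversals needed to transform P into the identity permutation $(+1\ +2\ \dots\ +n)$). The simplest idea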
is to perform reversals that fix +1 in the first position, followed by reversals that fix
+2 in the second position, and so on. For example, element 1 is already in the correct
position and has the correct sign (+) in the mouse X chromosome, but element 2 is not
in the correct position. We can keep element 1 fixed and move element 2 to the correct
position by applying a single reversal.

(+1 __−7 +6 −10 +9 −8 +2__ −11 −3 +5 +4)

(+1 __−2 +8 −9 +10 −6 +7__ −11 −3 +5 +4)

One more reversal flips element 2 around so that it has the correct sign:

(+1 __−2__ +8 −9 +10 −6 +7 −11 −3 +5 +4)

(+1 __+2__ +8 −9 +10 −6 +7 −11 −3 +5 +4)

We say that element $k$ in permutation $P = (p_1, \dots, p_n)$ is __sorted__ if $p_k = +k$ and __unsorted__ otherwise.

We call P __k-sorted__ if its first $k-1$ elements are sorted but element $k$ is unsorted.

For every $k$-sorted permutation P, there exists a single reversal, called the __k-sorting reversal__, that fixes the first $k-1$ elements of P and moves element $k$ to the $k$-th position.

If $-k$ is already in the $k$-th position of P, the $k$-sorting reversal simply flips $-k$ around.
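Here is a minimal sketch of the k-sorting idea applied to the mouse X chromosome. It assumes the permutation is (+1 −7 +6 −10 +9 −8 +2 −11 −3 +5 +4) and uses 1-based positions to match the notation above. The names `first_unsorted` and `k_sorting_reversal` are illustrative only. The notebook's own `kSortingReversal` below uses 0-based indices instead.

```python
def first_unsorted(P):
    """Illustrative helper: smallest 1-based k with P[k] != +k, or None if P is the identity."""
    for k, value in enumerate(P, start=1):
        if value != k:
            return k
    return None


def k_sorting_reversal(P, k):
    """Illustrative helper: apply the k-sorting reversal to a k-sorted list P; returns a new list.

    Finds the position j holding +k or -k, then reverses and negates positions k..j.
    If -k already sits at position k, then j == k and the reversal only flips its sign.
    """
    j = next(pos for pos, value in enumerate(P, start=1) if abs(value) == k)
    lo, hi = k - 1, j  # the 0-based slice P[lo:hi] covers positions k..j
    return P[:lo] + [-x for x in reversed(P[lo:hi])] + P[hi:]


mouse_x = [+1, -7, +6, -10, +9, -8, +2, -11, -3, +5, +4]
k = first_unsorted(mouse_x)             # element 1 is sorted, so k == 2
step1 = k_sorting_reversal(mouse_x, k)  # element 2 arrives at position 2 as -2
step2 = k_sorting_reversal(step1, k)    # -2 is already in place, so it is only flipped
print(step1)
print(step2)
```

The two printed lists correspond to the two reversals for element 2 described above.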

In [88]:
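import copy
# 6a Greedy Sorting

# String-based helpers: the permutation is a dict {position: signed string},
# e.g. {1: '-3', 2: '+4', ...}. GreedySorting below uses a list of ints instead.
def checkSorted(P):
    # P is sorted when every position i holds '+i'
    for i in range(1, len(P)+1):
        if P[i] != '+' + str(i):
            return False
    return True

def reverse(element):
    # flip the sign of a signed string, e.g. '-3' -> '+3'
    # (str.replace returns a new string, so the result must be built and returned)
    if element[0] == '-':
        return '+' + element[1:]
    return '-' + element[1:]

def k_sorting(P, k):
    # k-sorting reversal on the dict representation (positions 1..len(P))
    P_dict = copy.deepcopy(P)
    # position currently holding +k or -k (it cannot lie before position k)
    index_reverse = next(j for j in range(k, len(P)+1) if P[j][1:] == str(k))
    # reverse positions k..index_reverse and flip every sign;
    # if -k is already at position k, this just turns it into +k
    for i in range(k, index_reverse+1):
        P_dict[i] = reverse(P[k + index_reverse - i])
    return P_dict

def kSortingReversal(P, k):
    # list-based version with 0-based k: find the index j holding +(k+1) or -(k+1)
    j = k
    while P[j] != k+1 and P[j] != -(k+1):
        j += 1
    # reverse P[k..j] and negate each element
    P[k:j+1] = [-x for x in reversed(P[k:j+1])]
    return P

def GreedySorting(P):
    P = list(P)  # work on a copy so the caller's list is not modified
    approxReversalDist = 0
    reversals = []
    for k in range(len(P)):
        # at most two reversals per position: one to move element k+1, one to fix its sign
        while P[k] != k+1:
            P = kSortingReversal(P, k)
            reversals.append(list(P))
            approxReversalDist += 1
    return reversals, approxReversalDist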
    

In [83]:
a = '(-3 +4 +1 +5 -2)'
a_list = a.replace(')', '').replace('(', '').split(' ')
a_list

['-3', '+4', '+1', '+5', '-2']

In [90]:
reversals, dist = GreedySorting([int(x) for x in a_list])
for reversal in reversals:
    print("(" + " ".join(["+" + str(x) if x > 0 else str(x) for x in reversal]) + ")")

(-1 -4 +3 +5 -2)
(+1 -4 +3 +5 -2)
(+1 +2 -5 -3 +4)
(+1 +2 +3 +5 +4)
(+1 +2 +3 -4 -5)
(+1 +2 +3 +4 -5)
(+1 +2 +3 +4 +5)


In [99]:
with open('rosalind_ba6a.txt', 'r') as reader:
    permutation = reader.readline().strip().replace(')', '').replace('(', '').split(' ')

permutation = [int(x) for x in permutation]
reversals, dist = GreedySorting(permutation)

with open('6a.txt', 'w') as res_6a:
    for reversal in reversals:
        res_6a.write("(" + " ".join(["+" + str(x) if x > 0 else str(x) for x in reversal]) + ")" + '\n')

The pairs (+12 +13) and (-11 -10) have something in common: the second element is equal to the first element plus 1. We therefore say that consecutive elements $(p_i\ p_{i+1})$ in permutation $P = (p_1 \dots p_n)$ form an adjacency if $p_{i+1} - p_i = 1$. By definition, for any positive integer $k < n$, both $(k\ \ k+1)$ and $(-(k+1)\ \ -k)$ are __adjacencies__.

If $p_{i+1}- p_i \neq 1$, then we say that $(p_i p_{i+1})$ is a __breakpoint__.

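If we frame P by adding $p_0 = 0$ at the start and $p_{n+1} = n+1$ at the end, then P contains $n+1$ pairs of consecutive elements, and each pair is either an adjacency or a breakpoint:

$Adjacencies(P) + Breakpoints(P) = n+1$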
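This identity is easy to check numerically. The sketch below uses a hypothetical helper, `adjacencies_and_breakpoints`. It frames P with 0 and n+1 and classifies each of the n+1 consecutive pairs. The test permutation is the same one used in the 6b cell.

```python
def adjacencies_and_breakpoints(P):
    """Hypothetical helper: (Adjacencies(P), Breakpoints(P)) for P framed by 0 and n+1."""
    framed = [0] + list(P) + [len(P) + 1]
    pairs = list(zip(framed, framed[1:]))  # exactly n+1 consecutive pairs
    adjacencies = sum(1 for a, b in pairs if b - a == 1)
    return adjacencies, len(pairs) - adjacencies


P = [+3, +4, +5, -12, -8, -7, -6, +1, +2, +10, +9, -11, +13, +14]
adj, bp = adjacencies_and_breakpoints(P)
print(adj, bp, adj + bp == len(P) + 1)  # prints 7 8 True
```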

In [113]:
# 6b no of breakpoints in a permutation

p = '(+3 +4 +5 -12 -8 -7 -6 +1 +2 +10 +9 -11 +13 +14)'
p = p.replace('(', '').replace(')', '').split(' ')
p = [int(x) for x in p]
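def BreakPoints(P):
    # frame P with 0 and n+1; len(P) + 1 is used because max(P) + 1
    # is wrong whenever the largest block n appears with a negative sign
    P = [0] + P + [len(P) + 1]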
    breakpoints = 0
    for i in range(len(P) - 1):
        if P[i+1] - P[i] != 1:
            breakpoints += 1
    return breakpoints
BreakPoints(p)

8

In [114]:
with open('rosalind_ba6b.txt', 'r') as reader:
    permutation = reader.readline().replace(')', '').replace('(', '').split(' ')
permutation = [int(x) for x in permutation]
BreakPoints(permutation)

94

- Sorting by reversals can be viewed as a process of breakpoint elimination: reducing the number of breakpoints in a permutation P from Breakpoints(P) to 0.

- What is the maximum number of breakpoints that can be eliminated by a single reversal?
If $(p_i, p_{i+1})$ forms a breakpoint within the span of a reversal, then these consecutive elements remain a breakpoint after the reversal changes them into $(-p_{i+1}, -p_i)$: their difference $(-p_i) - (-p_{i+1}) = p_{i+1} - p_i$ is unchanged and therefore still not equal to 1.

Since all breakpoints inside and outside the span of a reversal remain breakpoints after the reversal, the only breakpoints a reversal can eliminate are the two located on the boundaries of the inverted interval. Hence a single reversal can eliminate at most 2 breakpoints.
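The "at most 2" argument can be sanity-checked by brute force on small permutations. The sketch below tries every reversal on every signed permutation of length n and records the largest drop in Breakpoints(P). The helper names are illustrative. Exhaustive search is only practical for tiny n, since there are $2^n \cdot n!$ signed permutations.

```python
from itertools import permutations, product


def breakpoints(P):
    """Breakpoints(P) after framing P with 0 and n+1."""
    framed = [0] + list(P) + [len(P) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def apply_reversal(P, i, j):
    """Reverse and negate the interval P[i..j] (0-indexed, inclusive); returns a new list."""
    return list(P[:i]) + [-x for x in reversed(P[i:j + 1])] + list(P[j + 1:])


def signed_permutations(n):
    """Yield every signed permutation of 1..n as a list."""
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield [s * x for s, x in zip(signs, perm)]


def max_breakpoint_drop(n):
    """Largest decrease of Breakpoints(P) caused by one reversal, over all P of length n."""
    best = 0
    for P in signed_permutations(n):
        before = breakpoints(P)
        for i in range(n):
            for j in range(i, n):
                best = max(best, before - breakpoints(apply_reversal(P, i, j)))
    return best


print(max_breakpoint_drop(4))  # prints 2
```

The bound is tight. For example, reversing $(-3\ -2)$ in $(+1\ -3\ -2\ +4)$ removes both of its breakpoints at once.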

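__Breakpoint Theorem__: $d_{rev}(P) \geq Breakpoints(P)/2$, since each reversal eliminates at most 2 breakpoints and the identity permutation has no breakpoints.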

It turns out that every permutation of length $n$ can be sorted using at most $n+1$ reversals, and for even $n$ the permutation $(+n\ +(n-1)\ \dots\ +1)$ requires $n+1$ reversals to sort (for example, $(+2\ +1)$ needs 3 reversals). Since this permutation has $n+1$ breakpoints, the Breakpoint Theorem only guarantees $d_{rev}(P) \geq (n+1)/2$, leaving a large gap between this lower bound and the actual reversal distance.
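To see the gap concretely, you can compute the exact reversal distance of very small permutations by breadth-first search over all signed permutations and compare it with Breakpoints(P)/2. The function names below are illustrative. This exhaustive search is a brute-force stand-in for intuition, not a practical algorithm: the state space grows as $2^n \cdot n!$.

```python
from collections import deque


def breakpoints(P):
    """Breakpoints(P) for P framed by 0 and n+1."""
    framed = (0,) + tuple(P) + (len(P) + 1,)
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reverse_interval(P, i, j):
    """Reverse and negate P[i..j] (0-indexed, inclusive); P is a tuple."""
    return P[:i] + tuple(-x for x in reversed(P[i:j + 1])) + P[j + 1:]


def reversal_distance(P):
    """Exact d_rev(P) by breadth-first search; only practical for very small n."""
    start = tuple(P)
    identity = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == identity:
            return dist[current]
        n = len(current)
        for i in range(n):
            for j in range(i, n):
                nxt = reverse_interval(current, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
    raise ValueError("identity not reached")  # cannot happen for a valid signed permutation


P = (2, 1)
print(breakpoints(P), breakpoints(P) / 2, reversal_distance(P))  # prints 3 1.5 3
```

For $(+2\ +1)$ the Breakpoint Theorem only promises 1.5 reversals, but 3 are actually required.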

Prove that there exists a shortest sequence of reversals sorting a permutation that never breaks a permutation at an adjacency

## Breakpoint graphs

- Prove that the red and blue edges in any breakpoint graph form alternating cycles. Hint: How many red and blue edges meet at each node of the breakpoint graph? (pg 317)

2-break: a rearrangement that replaces 2 red edges with 2 new red edges on the same 4 nodes
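A 2-break on a list of colored edges can be sketched as removing two edges and adding two new ones on the same four nodes. This sketch assumes the convention that (i1, i2) and (i3, i4) are replaced by (i1, i3) and (i2, i4). The other rejoining, (i1, i4) and (i2, i3), is also a valid 2-break. `two_break_on_genome_graph` is a hypothetical name, and edges are treated as undirected.

```python
def two_break_on_genome_graph(edges, i1, i2, i3, i4):
    """Hypothetical helper: replace edges (i1, i2) and (i3, i4) with (i1, i3) and (i2, i4).

    Edges are treated as undirected, so (6, 1) matches a request for (1, 6).
    """
    to_remove = [frozenset((i1, i2)), frozenset((i3, i4))]
    result = []
    for edge in edges:
        key = frozenset(edge)
        if key in to_remove:
            to_remove.remove(key)  # drop each requested edge only once
        else:
            result.append(tuple(edge))
    if to_remove:
        raise ValueError(f"edges not found in the graph: {to_remove}")
    return result + [(i1, i3), (i2, i4)]


graph = [(2, 4), (3, 8), (7, 5), (6, 1)]
print(two_break_on_genome_graph(graph, 1, 6, 3, 8))
# prints [(2, 4), (7, 5), (1, 3), (6, 8)]
```

The four nodes 1, 6, 3, 8 keep their degree: each still touches exactly one edge of this color.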

2-break distance, d(P, Q): min # of 2-breaks transforming genome P into genome Q

Cycles(P, Q) = # of red-blue alternating cycles in BreakPointGraph(P, Q)
Blocks(P, Q) = # of synteny blocks shared by P and Q (both genomes are assumed to consist of the same set of synteny blocks)

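When P and Q are identical, every red edge coincides with a blue edge, so the breakpoint graph consists of Blocks(P, Q) cycles of length 2 and Cycles(P, Q) = Blocks(P, Q)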

Trivial cycles = cycles of length 2

Trivial breakpoint graph = breakpoint graph formed by identical genomes

- Prove that Cycles(P, Q) < Blocks(P, Q) unless P = Q
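These quantities are connected by the 2-Break Distance Theorem, $d(P, Q) = Blocks(P, Q) - Cycles(P, Q)$. The `TwoBreakDist` function in the 6c cell also returns this difference. The sketch below counts alternating cycles by walking red and blue edges in turn. Its snake_case helpers are illustrative re-implementations of the notebook's ChromosomeToCycle/ColoredEdges, included so the block runs on its own. It assumes both genomes consist of circular chromosomes on the same synteny blocks.

```python
def chromosome_to_cycle(chromosome):
    """Illustrative helper: block +i -> nodes (2i-1, 2i); block -i -> nodes (2i, 2i-1)."""
    nodes = []
    for block in chromosome:
        if block > 0:
            nodes += [2 * block - 1, 2 * block]
        else:
            nodes += [-2 * block, -2 * block - 1]
    return nodes


def colored_edges(genome):
    """Edges joining consecutive blocks of each circular chromosome (the last wraps to the first)."""
    edges = []
    for chromosome in genome:
        nodes = chromosome_to_cycle(chromosome)
        for j in range(1, len(nodes), 2):
            edges.append((nodes[j], nodes[(j + 1) % len(nodes)]))
    return edges


def partner_map(edges):
    """Map every node to the node at the other end of its edge."""
    partner = {}
    for a, b in edges:
        partner[a], partner[b] = b, a
    return partner


def count_cycles(P, Q):
    """Cycles(P, Q): alternating cycles in the breakpoint graph with red edges from P, blue from Q."""
    red, blue = partner_map(colored_edges(P)), partner_map(colored_edges(Q))
    seen, cycles = set(), 0
    for start in red:
        if start in seen:
            continue
        cycles += 1
        node = start
        while True:  # follow a red edge, then a blue edge, until we are back at start
            seen.add(node)
            seen.add(red[node])
            node = blue[red[node]]
            if node == start:
                break
    return cycles


def two_break_distance(P, Q):
    blocks = sum(len(chromosome) for chromosome in P)  # Blocks(P, Q)
    return blocks - count_cycles(P, Q)


P = [[+1, +2, +3, +4, +5, +6]]
Q = [[+1, -3, -6, -5], [+2, -4]]
print(count_cycles(P, P), count_cycles(P, Q), two_break_distance(P, Q))  # prints 6 3 3
```

Identical genomes give Cycles(P, P) = Blocks(P, P) = 6 trivial cycles. For P ≠ Q the count drops, here to 3, so the 2-break distance is 6 - 3 = 3.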

__Chromosome To Cycle Problem__

Solve the Chromosome To Cycle Problem.

__Given__: A chromosome Chromosome containing n synteny blocks.

__Return__: The sequence Nodes of integers between 1 and 2n resulting from applying ChromosomeToCycle to Chromosome.

In [31]:
# 6f ChromosomeToCycle
def ChromosomeToCycle(Chromosome):
    # block +i becomes the nodes (2i-1, 2i); block -i becomes (2i, 2i-1)
    nodes = []
    for i in Chromosome:
        if i > 0:
            nodes.append(2*i-1)
            nodes.append(2*i)
        else:
            nodes.append(-2*i)
            nodes.append(-2*i-1)
    return nodes

In [33]:
ChromosomeToCycle([1, -2, -3, 4])

[1, 2, 4, 3, 6, 5, 7, 8]

In [30]:
test_file = 'rosalind_ba6f.txt'
with open(test_file, 'r') as reader:
    Chromosome = reader.readline().strip('\n')
    Chromosome = Chromosome.replace("(", "").replace(")", "")
    Chromosome = [int(x) for x in Chromosome.split()]

cycle = ChromosomeToCycle(Chromosome)
print("(" + " ".join(map(str, cycle)) + ")")

(2 1 3 4 6 5 8 7 10 9 11 12 14 13 15 16 17 18 20 19 22 21 23 24 26 25 28 27 30 29 31 32 34 33 35 36 38 37 39 40 42 41 44 43 45 46 48 47 50 49 52 51 54 53 56 55 58 57 59 60 62 61 63 64 66 65 67 68 70 69 71 72 74 73 75 76 78 77 80 79 81 82 83 84 85 86 87 88 90 89 91 92 93 94 96 95 98 97 99 100 101 102 104 103 106 105 107 108 109 110 112 111 113 114 116 115 117 118 120 119 121 122 123 124 126 125)


__Cycle To Chromosome Problem__

Solve the Cycle to Chromosome Problem.

__Given__: A sequence Nodes of integers between 1 and 2n.

__Return__: The chromosome Chromosome containing n synteny blocks resulting from applying CycleToChromosome to Nodes.

In [69]:
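# 6g Cycle to Chromosome
def CycleToChromosome(Nodes):
    # read the nodes in pairs: an increasing pair (2i-1, 2i) is block +i,
    # a decreasing pair (2i, 2i-1) is block -i
    Chromosome = []
    for i in range(0, len(Nodes), 2):
        if Nodes[i] < Nodes[i+1]:
            Chromosome.append(Nodes[i+1]//2)
        else:
            Chromosome.append(-(Nodes[i]//2))
    return Chromosome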

In [70]:
CycleToChromosome([int(x) for x in '1 2 4 3 6 5 7 8'.split(' ')])

[1, -2, -3, 4]
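Because ChromosomeToCycle and CycleToChromosome are inverse maps, a quick round-trip check over every small chromosome is a cheap way to test both. The snake_case functions below are illustrative re-implementations written only so the check runs on its own.

```python
from itertools import permutations, product


def chromosome_to_cycle(chromosome):
    nodes = []
    for block in chromosome:
        # +i -> (2i-1, 2i); -i -> (2i, 2i-1)
        nodes += [2 * block - 1, 2 * block] if block > 0 else [-2 * block, -2 * block - 1]
    return nodes


def cycle_to_chromosome(nodes):
    chromosome = []
    for a, b in zip(nodes[0::2], nodes[1::2]):
        # an increasing pair (2i-1, 2i) is +i; a decreasing pair (2i, 2i-1) is -i
        chromosome.append(b // 2 if a < b else -(a // 2))
    return chromosome


def signed_chromosomes(n):
    """Yield every signed arrangement of blocks 1..n."""
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield [s * x for s, x in zip(signs, perm)]


print(all(cycle_to_chromosome(chromosome_to_cycle(c)) == c for c in signed_chromosomes(4)))  # prints True
print(chromosome_to_cycle([1, -2, -3, 4]))  # prints [1, 2, 4, 3, 6, 5, 7, 8]
```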

In [68]:
test_file = 'rosalind_ba6g.txt'
with open(test_file, 'r') as reader:
    Nodes = reader.readline().strip('\n')
    Nodes = Nodes.replace("(", "").replace(")", "")
    Nodes = [int(x) for x in Nodes.split()]

Chromosome = CycleToChromosome(Nodes)
print("(" + " ".join(["+" + str(x) if x > 0 else str(x) for x in Chromosome]) + ")")

(-1 +2 -3 -4 -5 +6 +7 +8 -9 -10 +11 +12 -13 -14 +15 -16 -17 -18 -19 -20 +21 +22 +23 +24 +25 +26 +27 -28 +29 -30 +31 +32 -33 -34 +35 +36 +37 +38 -39 -40 +41 +42 +43 -44 -45 -46 +47 +48 -49 +50 -51 -52 -53 +54 -55 -56 +57 +58 -59 +60 +61 -62 -63)


In [121]:
# 6h ColoredEdges
def ColoredEdges(P):
    # colored edges join the second node of each block to the first node of the
    # next block in the same circular chromosome; the last block wraps to the first
    edges = []
    for chromosome in P:
        nodes = ChromosomeToCycle(chromosome)
        for j in range(1, len(nodes), 2):
            edges.append([nodes[j], nodes[(j + 1) % len(nodes)]])
    return edges

In [122]:
chromosomes = [[+1, -2, -3],[+4, +5, -6]]
ColoredEdges(chromosomes)


[[2, 4], [3, 6], [5, 1], [8, 9], [10, 12], [11, 7]]

In [123]:
test_file = 'rosalind_ba6h.txt'
with open(test_file, 'r') as reader:
    chromosomes = reader.readline().strip()
chromosomes = chromosomes[1:-1]
chromosomes = chromosomes.split(')(')
for i in range(len(chromosomes)):
    chromosomes[i] = [int(x) for x in chromosomes[i].split(' ')]

edges = ColoredEdges(chromosomes)
', '.join([str(tuple(edge)) for edge in edges])

'(1, 4), (3, 5), (6, 8), (7, 10), (9, 11), (12, 14), (13, 16), (15, 17), (18, 19), (20, 22), (21, 23), (24, 26), (25, 27), (28, 30), (29, 31), (32, 34), (33, 35), (36, 38), (37, 39), (40, 41), (42, 43), (44, 46), (45, 48), (47, 50), (49, 51), (52, 2), (54, 56), (55, 58), (57, 59), (60, 61), (62, 63), (64, 66), (65, 67), (68, 70), (69, 72), (71, 73), (74, 76), (75, 78), (77, 80), (79, 82), (81, 83), (84, 86), (85, 88), (87, 89), (90, 91), (92, 93), (94, 95), (96, 97), (98, 99), (100, 102), (101, 104), (103, 106), (105, 107), (108, 109), (110, 112), (111, 53), (114, 115), (116, 117), (118, 119), (120, 122), (121, 123), (124, 125), (126, 128), (127, 129), (130, 132), (131, 134), (133, 136), (135, 137), (138, 139), (140, 141), (142, 144), (143, 145), (146, 148), (147, 150), (149, 152), (151, 153), (154, 155), (156, 157), (158, 113), (159, 162), (161, 164), (163, 166), (165, 168), (167, 170), (169, 172), (171, 174), (173, 175), (176, 178), (177, 179), (180, 182), (181, 183), (184, 186), (18

In [111]:
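# 6i GraphToGenome
# Assumes the colored edges are listed chromosome by chromosome, in the order produced by ColoredEdges.

def GraphToGenome(GenomeGraph):
    P = []
    Cycles = []
    temp = []
    for i in range(len(GenomeGraph)):
        temp += GenomeGraph[i]
        if i == len(GenomeGraph) - 1:
            Cycles.append(temp)
            continue
        # edges i and i+1 belong to the same chromosome exactly when the end of edge i and
        # the start of edge i+1 are the two nodes {2k-1, 2k} of one synteny block;
        # differing by 1 is not enough (e.g. 2 and 3 belong to different blocks)
        a, b = sorted((GenomeGraph[i][1], GenomeGraph[i+1][0]))
        if not (a % 2 == 1 and b == a + 1):
            Cycles.append(temp)
            temp = []
    for cycle in Cycles:
        # the last node of the flattened cycle is the first node of the chromosome
        Chromosome = CycleToChromosome([cycle[-1]] + cycle[:-1])
        P.append(Chromosome)

    return P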

In [115]:
print(GraphToGenome([(2, 4), (3, 6), (5, 1), (8, 9), (10, 12), (11, 7)]))
print(GraphToGenome([(2, 4), (3, 6), (5, 1), (7, 9), (10, 12), (11, 8)]))

[[1, -2, -3], [4, 5, -6]]
[[1, -2, -3], [-4, 5, -6]]


__2-Break Distance Problem__

Find the 2-break distance between two genomes.

__Given__: Two genomes with circular chromosomes on the same set of synteny blocks.

__Return__: The 2-break distance between these two genomes.

In [119]:
# 6c 2-break distance problem

def find_next_edge(current, edges):
    # return the first remaining edge that shares a node with `current`, or -1 if there is none
    for edge in edges:
        if current[0] in edge or current[1] in edge:
            return edge
    return -1

def TwoBreakDist(P, Q):
    # red edges come from P, blue edges from Q
    edgesP = ColoredEdges(P)
    edgesQ = ColoredEdges(Q)
    edges = edgesP + edgesQ
    # every node appears in the graph, so the number of synteny blocks is (#nodes) / 2
    blocks = set()
    for edge in edges:
        blocks.add(edge[0])
        blocks.add(edge[1])

    Cycles = []
    # each node has one red and one blue edge, so repeatedly following an unused edge
    # that shares a node with the current one traces out one alternating cycle
    while len(edges) != 0:
        start = edges.pop(0)
        Cycle = [start]
        current = find_next_edge(start, edges)
        while current != -1:
            Cycle.append(current)
            edges.remove(current)
            current = find_next_edge(current, edges)
        Cycles.append(Cycle)

    # 2-Break Distance Theorem: d(P, Q) = Blocks(P, Q) - Cycles(P, Q)
    return len(blocks)//2 - len(Cycles)

In [124]:
P = '(+1 +2 +3 +4 +5 +6)'[1:-1].split(')(')
for i in range(len(P)):
    P[i] = [int(x) for x in P[i].split(' ')]

Q = '(+1 -3 -6 -5)(+2 -4)'[1:-1].split(')(')
for i in range(len(Q)):
    Q[i] = [int(x) for x in Q[i].split(' ')]

TwoBreakDist(P, Q)

3