In [2]:
from nussinov import Nussinov

In [3]:
T = {
    'A': 'U',
    'G': 'C',
    'C': 'G',
    'U': 'A',
}

In [4]:
s = 'GCUCGGG UUCCC UAU UCA AGAGC'.replace(' ', '')  # expected maximum: 10 pairs
ns = Nussinov(s)
ns.solve(min_padding=0)

10
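The `nussinov` module's source is not shown here, so the following is a minimal, self-contained sketch of the classic Nussinov recurrence, not the module's actual code. It assumes that `min_padding` is the minimum number of unpaired bases a pair must enclose (consistent with `min_padding=0` allowing the adjacent A-U pair at 0-based positions 13 and 14 in the structure printed in the next cell) and that only the Watson-Crick pairs in `T` count. It also guesses at the tie-break letters: D = leave the left base unpaired, L = leave the right base unpaired, M = pair the two ends, B = bifurcate. `fill`, `traceback`, and `dot_bracket` are hypothetical names.

```python
# Minimal Nussinov sketch (NOT the `nussinov` module's implementation).
# Assumptions: Watson-Crick pairs only; `min_padding` = minimum number of
# unpaired bases enclosed by a pair; the D/L/M/B move meanings are guesses.
COMPLEMENT = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}


def can_pair(seq, i, j, min_padding):
    """True if seq[i] and seq[j] are complementary and far enough apart."""
    return j - i - 1 >= min_padding and COMPLEMENT[seq[i]] == seq[j]


def fill(seq, min_padding=0):
    """dp[i][j] = maximum number of nested pairs within seq[i..j]."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = max(dp[i + 1][j], dp[i][j - 1])          # i or j left unpaired
            if can_pair(seq, i, j, min_padding):
                best = max(best, dp[i + 1][j - 1] + 1)      # i pairs with j
            for k in range(i + 1, j):                       # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp


def traceback(seq, dp, order=('M', 'D', 'L', 'B'), min_padding=0):
    """Recover one optimal structure, taking the first move in `order` that fits."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        for move in order:
            if move == 'D' and dp[i][j] == dp[i + 1][j]:
                stack.append((i + 1, j))
                break
            if move == 'L' and dp[i][j] == dp[i][j - 1]:
                stack.append((i, j - 1))
                break
            if (move == 'M' and can_pair(seq, i, j, min_padding)
                    and dp[i][j] == dp[i + 1][j - 1] + 1):
                pairs.append((i, j))
                stack.append((i + 1, j - 1))
                break
            if move == 'B':
                split = next((k for k in range(i + 1, j)
                              if dp[i][k] + dp[k + 1][j] == dp[i][j]), None)
                if split is not None:
                    stack += [(i, split), (split + 1, j)]
                    break
    return sorted(pairs)


def dot_bracket(n, pairs):
    """Render base pairs as a dot-bracket string of length n."""
    chars = ['.'] * n
    for i, j in pairs:
        chars[i], chars[j] = '(', ')'
    return ''.join(chars)


seq = 'GCUCGGGUUCCCUAUUCAAGAGC'
dp = fill(seq, min_padding=0)
print(dp[0][-1])  # 10
print(dot_bracket(len(seq), traceback(seq, dp, order=('B', 'M', 'D', 'L'))))
```

Because the traceback takes the first move in `order` that reproduces the optimal score, different orders can walk to different but equally optimal structures, which is the kind of effect `tie_break_permute` exposes.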

In [5]:
print(ns.dot_parentheses(prettify=True))

G C U C G G G U U C C C U A U U C A A G A G C
---------------------------------------------
( ( ( ( ( ( ( - - ) ) ) ( ( ) ( - ) ) ) ) ) )


In [6]:
ns.solve(min_padding=0, tie_break_permute=['B', 'M', 'D', 'L'])
print(ns.dot_parentheses(prettify=True))

G C U C G G G U U C C C U A U U C A A G A G C
---------------------------------------------
( ) ( ( ) ( ( - - ) ) ( ( ) ( ( - ) ) ) ) ( )


In [7]:
result = ns.evaluate_tie_breaks(min_padding=0, prettify=True)
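Judging by its output, `evaluate_tie_breaks` maps each distinct structure to the orderings of the four tie-break labels that produce it (24 orderings in total). A minimal sketch of that grouping step, assuming the structure is a deterministic function of the ordering; `group_by_outcome` and `toy_outcome` are hypothetical, and `toy_outcome` merely stands in for "solve, then render the structure".

```python
from collections import defaultdict
from itertools import permutations


def group_by_outcome(outcome, labels=('B', 'M', 'D', 'L')):
    """Map each distinct outcome to the list of label orderings that produce it."""
    groups = defaultdict(list)
    for order in permutations(labels):
        groups[outcome(order)].append(order)
    return dict(groups)


def toy_outcome(order):
    """Hypothetical stand-in: depends only on whether M comes before B."""
    return 'M first' if order.index('M') < order.index('B') else 'B first'


groups = group_by_outcome(toy_outcome)
print({key: len(orders) for key, orders in groups.items()})
# {'B first': 12, 'M first': 12}
```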

In [8]:
len(result.keys())

2

In [9]:
possibilities = list(result.keys())

result[possibilities[0]]

[('L', 'D', 'M', 'B'),
 ('L', 'M', 'D', 'B'),
 ('L', 'M', 'B', 'D'),
 ('D', 'L', 'M', 'B'),
 ('D', 'M', 'L', 'B'),
 ('D', 'M', 'B', 'L'),
 ('M', 'L', 'D', 'B'),
 ('M', 'L', 'B', 'D'),
 ('M', 'D', 'L', 'B'),
 ('M', 'D', 'B', 'L'),
 ('M', 'B', 'L', 'D'),
 ('M', 'B', 'D', 'L')]

In [10]:
result[possibilities[1]]

[('L', 'D', 'B', 'M'),
 ('L', 'B', 'D', 'M'),
 ('L', 'B', 'M', 'D'),
 ('D', 'L', 'B', 'M'),
 ('D', 'B', 'L', 'M'),
 ('D', 'B', 'M', 'L'),
 ('B', 'L', 'D', 'M'),
 ('B', 'L', 'M', 'D'),
 ('B', 'D', 'L', 'M'),
 ('B', 'D', 'M', 'L'),
 ('B', 'M', 'L', 'D'),
 ('B', 'M', 'D', 'L')]

There are clearly some patterns here. Only the relative order of `B` and `M` matters: every ordering that places `M` ahead of `B` yields the first structure, and every ordering that places `B` ahead of `M` yields the second, regardless of where `L` and `D` fall. We would like to investigate these patterns further, including the number of unique structures and their count distribution, the dynamics of the "dominating" options (most likely `B` and `M`) versus the remaining options, and the effect of `min_padding`.
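One way to make the `B`/`M` observation precise is to compute, for each group of orderings, which label pairs keep the same relative order in every member of the group. A minimal sketch using the two lists printed above; `shared_orderings` is a hypothetical helper, not part of the `nussinov` module.

```python
from itertools import permutations

# Orderings copied from the two outputs above.
first = [
    ('L', 'D', 'M', 'B'), ('L', 'M', 'D', 'B'), ('L', 'M', 'B', 'D'),
    ('D', 'L', 'M', 'B'), ('D', 'M', 'L', 'B'), ('D', 'M', 'B', 'L'),
    ('M', 'L', 'D', 'B'), ('M', 'L', 'B', 'D'), ('M', 'D', 'L', 'B'),
    ('M', 'D', 'B', 'L'), ('M', 'B', 'L', 'D'), ('M', 'B', 'D', 'L'),
]
second = [
    ('L', 'D', 'B', 'M'), ('L', 'B', 'D', 'M'), ('L', 'B', 'M', 'D'),
    ('D', 'L', 'B', 'M'), ('D', 'B', 'L', 'M'), ('D', 'B', 'M', 'L'),
    ('B', 'L', 'D', 'M'), ('B', 'L', 'M', 'D'), ('B', 'D', 'L', 'M'),
    ('B', 'D', 'M', 'L'), ('B', 'M', 'L', 'D'), ('B', 'M', 'D', 'L'),
]


def shared_orderings(orders):
    """Pairs (x, y) such that x precedes y in every ordering of the group."""
    labels = orders[0]
    return {(x, y) for x, y in permutations(labels, 2)
            if all(o.index(x) < o.index(y) for o in orders)}


print(shared_orderings(first))   # {('M', 'B')}
print(shared_orderings(second))  # {('B', 'M')}
```

A group containing a single ordering pins down all six pairs, so the same helper applied to larger examples shows how much of each ordering a given structure actually depends on.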

A real example is below.

In [17]:
# https://rnacentral.org/rna/URS00000DE3E2/9606
# Homo sapiens small nucleolar RNA, H/ACA box 73A (SNORA73A)

s = 'GUCUUCUCAUUGAGCUCCUUUCUGUCUAUCAGUGGCAGUUUAUGGAUUCGCACGAGAAGAAGAGAGAAUUCACAGAACUAGCAUUAUUUUACCUUCUGUCUUUACAGAGGUAUAUUUAGCUGUAUUGUGAGACAUUC'

In [18]:
print(len(s))

137


In [19]:
ns = Nussinov(s)
ns.solve(min_padding=3)

49

In [20]:
result = ns.evaluate_tie_breaks(min_padding=3, prettify=False)


In [13]:
len(result.keys())

6

In [14]:
possibilities = list(result.keys())

In [15]:
for i, p in enumerate(possibilities):
    print(i, len(result[p]))

0 4
1 4
2 12
3 1
4 1
5 2


In [16]:
result[possibilities[4]]

[('M', 'D', 'B', 'L')]

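Note: `possibilities` has only 6 entries here, so the next two cells (indices 11 and 12) raise `IndexError` if rerun. Their outputs, each a group of 3 orderings, match none of the group sizes listed above and must come from a different run.

In [18]: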
result[possibilities[11]]

[('B', 'L', 'D', 'M'), ('B', 'D', 'L', 'M'), ('B', 'D', 'M', 'L')]

In [19]:
result[possibilities[12]]

[('B', 'L', 'M', 'D'), ('B', 'M', 'L', 'D'), ('B', 'M', 'D', 'L')]

In [20]:
# reference structure (dot-bracket notation) from the RNAcentral entry linked above:
official = "..((((((.....(((((...(((((.......))))).....)))...))..))))))..........((((((.................((((((......))))))..............))))))......."


In [21]:
# number of base pairs in the reference structure
count = official.count('(')

In [22]:
count

28
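Counting `(` gives the number of pairs but not which bases pair. To compare a predicted structure with the reference pair by pair, the dot-bracket string can be decoded with a stack. A minimal sketch; `parse_dot_bracket` and `pair_overlap` are hypothetical helpers, accepting `-` as unpaired is an assumption based on the prettified output above, and sensitivity/PPV are one common choice of comparison metric rather than something this notebook defines.

```python
def parse_dot_bracket(structure):
    """Return the sorted (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == '(':
            stack.append(j)
        elif ch == ')':
            if not stack:
                raise ValueError(f'unmatched ")" at position {j}')
            pairs.append((stack.pop(), j))
        elif ch not in '.-':  # '.' or '-' both mean unpaired
            raise ValueError(f'unexpected character {ch!r} at position {j}')
    if stack:
        raise ValueError(f'unmatched "(" at position {stack[-1]}')
    return sorted(pairs)


def pair_overlap(predicted, reference):
    """Sensitivity and PPV of predicted base pairs against reference pairs."""
    pred, ref = set(predicted), set(reference)
    shared = len(pred & ref)
    sensitivity = shared / len(ref) if ref else 0.0
    ppv = shared / len(pred) if pred else 0.0
    return sensitivity, ppv


official = "..((((((.....(((((...(((((.......))))).....)))...))..))))))..........((((((.................((((((......))))))..............))))))......."
reference = parse_dot_bracket(official)
print(len(reference))  # 28
```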

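We plan to compare Nussinov to the greedy solver across a range of real inputs to see how the two trend together. In general, Nussinov seems to come reasonably close to the greedy maximum number of pairings: for this sequence it finds 49 pairs with `min_padding=3` against the greedy 60 computed below, while the reference structure above contains only 28.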

In [23]:
from greedy import GreedySolver

In [24]:
# maximum possible
gs = GreedySolver(s)
gs.solve()

60
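The value 60 agrees with a simple composition bound. If only the Watson-Crick pairs in `T` are allowed, each pair consumes one A and one U or one G and one C, so no structure, nested or not, can exceed min(#A, #U) + min(#G, #C). For this sequence that is 35 + 25 = 60; for the 23-nt toy sequence it is 4 + 6 = 10, which Nussinov attains. Whether `GreedySolver` computes exactly this is an assumption; the hypothetical `composition_bound` below only shows that the numbers agree.

```python
from collections import Counter


def composition_bound(seq):
    """Upper bound on Watson-Crick pairs: each A needs a U, each G needs a C."""
    counts = Counter(seq)
    return min(counts['A'], counts['U']) + min(counts['G'], counts['C'])


s = 'GUCUUCUCAUUGAGCUCCUUUCUGUCUAUCAGUGGCAGUUUAUGGAUUCGCACGAGAAGAAGAGAGAAUUCACAGAACUAGCAUUAUUUUACCUUCUGUCUUUACAGAGGUAUAUUUAGCUGUAUUGUGAGACAUUC'
print(composition_bound(s))                          # 60
print(composition_bound('GCUCGGGUUCCCUAUUCAAGAGC'))  # 10
```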