# Conversion from single-line to multi-line blocks of alignment showing residue pairs made by TM-align

Issue:  
Among the results [TM-align](https://zhanggroup.org/TM-align/) generates is a sequence alignment showing the pairs of residues that are close in distance in the aligned structures, i.e., 'aligned residue pairs'. However, each sequence is placed on a single line in the report. For long sequences, this is difficult to view. One option is to open the alignment in a text editor with word-wrap disabled. However, that requires scrolling to explore, making it impossible to assess the two extreme ends at the same time. Plus, this wide-view form is hard to pass along to colleagues in a report.

There are three lines. The outer two are the sequences of the aligned structures, and the center line has symbols indicating which residues have been aligned. According to the returned page that includes the report, the symbols mean the following: '":" denotes aligned residue pairs of d < 5.0 A, "." denotes other aligned residues'.
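
To make the meaning of the center line concrete, here is a minimal sketch that tallies the symbols in it. The three short strings are toy data invented for illustration, not TM-align output, and `count_aligned_pairs` is a hypothetical helper name that is not part of this notebook.

```python
def count_aligned_pairs(marker_line):
    '''
    tally the symbols in the center (marker) line of a TM-align alignment

    return a dictionary with counts of close pairs (':'), other aligned
    pairs ('.'), and positions carrying no symbol (' ')
    '''
    return {
        "close": marker_line.count(":"),      # d < 5.0 A
        "other": marker_line.count("."),      # other aligned residues
        "unmarked": marker_line.count(" "),   # no symbol at this position
    }

# toy three-line alignment, invented for illustration only
toy_top    = "--VQLY-IPLE"
toy_marker = "  ..:: :.::"
toy_bottom = "MALVQFYIPQE"
counts = count_aligned_pairs(toy_marker)
print(counts)
```

Applied to the center line of a real report, the same tally would give a quick summary of how many residue pairs fall under the 5.0 A cutoff.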

The process illustrated in this notebook takes those three lines and makes blocks of aligned sequences, reminiscent of what you'd get in Clustal format; for an example, see [here](https://gist.githubusercontent.com/fomightez/f46b0624f1d8e3abb6ff908fc447e63b/raw/6abce38569475c68fa32182c4e0eaadbb8b0cf3b/Stv1p_Vph1p_muscle_alignment.clw).


-----

### Preparation: making demo input

The demo input was made by [TM-align](https://zhanggroup.org/TM-align/) when provided chain a of [6o7v](https://www.rcsb.org/structure/6O7V) and of [6o7t](https://www.rcsb.org/structure/6O7T), Stv1p and Vph1p, respectively. The files with the two chains isolated from the rest of the PDB files for the structures were made using the script `split_into_chains.sh` as illustrated in [this notebook](Split%20pdb%20files%20into%20chains.ipynb).

Below are the three alignment lines from the 'TM-align Results' page of the submission of `6o7v_a.pdb` and `6o7t_a.pdb`. In other words, the three lines below `(":" denotes aligned residue pairs of d < 5.0 A, "." denotes other aligned residues)`:

In [19]:
s='''------------VQLY-IPLEVIREVTF-LLG-KM--------------------LRRFDEVERMVGFLN-EV--VEKH------------------LSLENVNDMVKEITDCESRARQLDESLDSLRSKLN-DLLEQRQVIFECSKFIEVYMITGSIRRTKVDILNRIL-------------------------------------------------W-R--------------------LLRGNLIFQNFPIEVEKDCFIIFTHGETLLKKVKRVIDSLNGKIVSLNTRSSEL-VDTLNRQ--IDDLQRILDTTEQTLHTELLVIHDQLPVWSAMTKREKYVYTTLNKFQQESQ-GLIAEGWVPSTELIHLQDSLKDYIE-TL-----GSEYSTV-F-N-V-------------------------------AGLATVVTFPFMFAIMFGDMGHGFILFLMALFLVLNERKFGAMHRDEIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKSMTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLFSNSYKMKLSILMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKDDKPAPGLLNMLINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNKFNFGDVMIHQVIHTIEFCLNCISHTASYLRLWALSLAHAQLSSVLWDMTISNAFSSKNSGSPLAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSK----------------------
            .... ..::::::.:: :.. ::                    ::::::::..:.... :.  ....                  .:.::::.::..:...:............ ...   .  .                   ..:::::.......                                                 . .                    ...:..::::.      .::.                    ....          .  .     .  .. ... ...... ....:..::::::::::::::::........... ...... ..  :.:........... .:     :::.... . . :                               :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::         ::::::::::::::::::::::::::::::::::::::::::::::::::::: : : :::::::::::::::::::::::::::::::::::::::::::::                      
KEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIRRLDNVERQYRYFYSLLKKHDIKLYEGDTDKYLDGSGELYVPPSGSVIDDYVRNASYLEERLIQMEDATD-QIE--VQ--K-------------------NDLEQYRFILYVTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSHGDLIIKRIRKIAESLDANLYDVDSSNEGRSQ------QLAK--------------------VNKN---------LS--D---LYT--VL-KTT-STTLES-ELYAIAKELDSWFQDVTREKAIFEILNKSNYDTNRKILIAE-GW--IPRDELATLQARLGEMIARLGIDVPSIIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFAIMFGDMGHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYNDIFSKTMTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLSILMGFIHMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVKDGKPAPGLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLH---------GDIMIHQVIHTIEFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGF-R-G-FVGVFMTVALFAMWFALTCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEYKDMEVA
'''

### Preparation: Define approach code

In [20]:
sequence_characters_per_line_in_output = 80
def chunk_record(sequence_string):
    '''
    chop up so characters per line matches setting of
    `sequence_characters_per_line_in_output`
    
    return a list of the 'chunks'
    '''
    chunk_size = sequence_characters_per_line_in_output
    seq_chunks = [sequence_string[i:i+chunk_size] for i in range(
        0, len(sequence_string),chunk_size)]
    #return "\n".join(seq_chunks)
    return seq_chunks
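
A quick check on a short string makes the chunking behavior easier to see. The sketch below restates the same slicing with the chunk size passed in as a parameter so that it runs on its own; `chunk_string` is a hypothetical name used only here, not a function from this notebook.

```python
def chunk_string(sequence_string, chunk_size=80):
    '''
    same slicing as `chunk_record`, but with the chunk size passed in
    rather than read from a module-level setting

    return a list of the 'chunks'
    '''
    return [sequence_string[i:i+chunk_size] for i in range(
        0, len(sequence_string), chunk_size)]

demo_chunks = chunk_string("ABCDEFGHIJ", chunk_size=4)
print(demo_chunks)   # ['ABCD', 'EFGH', 'IJ']
empty_chunks = chunk_string("", chunk_size=4)
print(empty_chunks)  # []
```

The last chunk is simply shorter when the length is not a multiple of the chunk size, and an empty string gives an empty list.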

### Conversion step

The conversion step takes the input defined above and converts it into blocks.

In [26]:
# take the lines each as separate string
list_o_lines = s.split("\n")
# take each line and chunk into strings of length matching setting for 
# `sequence_characters_per_line_in_output`
chunked_lines = []
for line in list_o_lines:
    chunked_lines.append(chunk_record(line))
# zip the chunked lines so they can be used to create blocks.
# Because the closing `'''` of the input string sits on its own line,
# splitting on newlines yields an empty fourth string; the `[:3]` keeps
# just the three alignment lines and NOT THE EMPTY FOURTH.
zipped_chunks = list(zip(*chunked_lines[:3]))
# Create the blocks of output
out = ""
for x in zipped_chunks:
    out += f"{x[0]}\n{x[1]}\n{x[2]}\n\n\n"
print(out)

------------VQLY-IPLEVIREVTF-LLG-KM--------------------LRRFDEVERMVGFLN-EV--VEKH-
            .... ..::::::.:: :.. ::                    ::::::::..:.... :.  .... 
KEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIRRLDNVERQYRYFYSLLKKHDIKL


-----------------LSLENVNDMVKEITDCESRARQLDESLDSLRSKLN-DLLEQRQVIFECSKFIEVYMITGSIRR
                 .:.::::.::..:...:............ ...   .  .                   ..::
YEGDTDKYLDGSGELYVPPSGSVIDDYVRNASYLEERLIQMEDATD-QIE--VQ--K-------------------NDLE


TKVDILNRIL-------------------------------------------------W-R------------------
:::.......                                                 . .                  
QYRFILYVTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSHGDLIIKRIRKIAESLDAN


--LLRGNLIFQNFPIEVEKDCFIIFTHGETLLKKVKRVIDSLNGKIVSLNTRSSEL-VDTLNRQ--IDDLQRILDTTEQT
  ...:..::::.      .::.                    ....          .  .     .  .. ... ....
LYDVDSSNEGRSQ------QLAK--------------------VNKN---------LS--D---LYT--VL-KTT-STTL


LHTELLVIHDQLPVWSAMTK
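
The steps above can also be gathered into a single function. What follows is a minimal sketch, not the notebook's own code: `alignment_to_blocks` and its `width` parameter are hypothetical names, the input is a toy alignment invented for illustration, and padding the three lines to equal length is an added assumption to guard against a shorter final line.

```python
def alignment_to_blocks(alignment_text, width=80):
    '''
    hypothetical wrapper: take the three TM-align alignment lines as one
    string and return them re-flowed into blocks of `width` characters
    '''
    # keep only non-empty lines; do NOT strip, since leading spaces in the
    # center (marker) line are meaningful
    lines = [ln for ln in alignment_text.split("\n") if ln]
    if len(lines) != 3:
        raise ValueError(f"expected 3 alignment lines, got {len(lines)}")
    # pad so all three lines have equal length before slicing
    longest = max(len(ln) for ln in lines)
    lines = [ln.ljust(longest) for ln in lines]
    blocks = []
    for i in range(0, longest, width):
        blocks.append("\n".join(ln[i:i+width] for ln in lines))
    # two blank lines between blocks, as in the output above
    return "\n\n\n".join(blocks) + "\n"

# toy three-line alignment, invented for illustration only
toy_alignment = "--VQLY-IPLE\n  ..:: :.::\nMALVQFYIPQE\n"
toy_blocks = alignment_to_blocks(toy_alignment, width=4)
print(toy_blocks)
```

Because the result is a plain string, it could be written to a text file for sharing; a width of 80 would match the setting used in this notebook.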

In [22]:
chunked_lines

[['------------VQLY-IPLEVIREVTF-LLG-KM--------------------LRRFDEVERMVGFLN-EV--VEKH-',
  '-----------------LSLENVNDMVKEITDCESRARQLDESLDSLRSKLN-DLLEQRQVIFECSKFIEVYMITGSIRR',
  'TKVDILNRIL-------------------------------------------------W-R------------------',
  '--LLRGNLIFQNFPIEVEKDCFIIFTHGETLLKKVKRVIDSLNGKIVSLNTRSSEL-VDTLNRQ--IDDLQRILDTTEQT',
  'LHTELLVIHDQLPVWSAMTKREKYVYTTLNKFQQESQ-GLIAEGWVPSTELIHLQDSLKDYIE-TL-----GSEYSTV-F',
  '-N-V-------------------------------AGLATVVTFPFMFAIMFGDMGHGFILFLMALFLVLNERKFGAMHR',
  'DEIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKSMTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLF',
  'SNSYKMKLSILMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKDDKPAPGLLNM',
  'LINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNKFNFGDVMIHQVIHTIEFCLNCISHTASYLR',
  'LWALSLAHAQLSSVLWDMTISNAFSSKNSGSPLAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSK----',
  '------------------'],
 ['            .... ..::::::.:: :.. ::                    ::::::::..:.... :.  .... ',
  '                 .:.::::.:

In [24]:
chunked_lines[:3]

[['------------VQLY-IPLEVIREVTF-LLG-KM--------------------LRRFDEVERMVGFLN-EV--VEKH-',
  '-----------------LSLENVNDMVKEITDCESRARQLDESLDSLRSKLN-DLLEQRQVIFECSKFIEVYMITGSIRR',
  'TKVDILNRIL-------------------------------------------------W-R------------------',
  '--LLRGNLIFQNFPIEVEKDCFIIFTHGETLLKKVKRVIDSLNGKIVSLNTRSSEL-VDTLNRQ--IDDLQRILDTTEQT',
  'LHTELLVIHDQLPVWSAMTKREKYVYTTLNKFQQESQ-GLIAEGWVPSTELIHLQDSLKDYIE-TL-----GSEYSTV-F',
  '-N-V-------------------------------AGLATVVTFPFMFAIMFGDMGHGFILFLMALFLVLNERKFGAMHR',
  'DEIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKSMTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLF',
  'SNSYKMKLSILMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKDDKPAPGLLNM',
  'LINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNKFNFGDVMIHQVIHTIEFCLNCISHTASYLR',
  'LWALSLAHAQLSSVLWDMTISNAFSSKNSGSPLAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSK----',
  '------------------'],
 ['            .... ..::::::.:: :.. ::                    ::::::::..:.... :.  .... ',
  '                 .:.::::.:

In [9]:
zipped_chunks

[(['------------VQLY-IPLEVIREVTF-LLG-KM--------------------LRRFDEVERMVGFLN-EV--VEKH-',
   '-----------------LSLENVNDMVKEITDCESRARQLDESLDSLRSKLN-DLLEQRQVIFECSKFIEVYMITGSIRR',
   'TKVDILNRIL-------------------------------------------------W-R------------------',
   '--LLRGNLIFQNFPIEVEKDCFIIFTHGETLLKKVKRVIDSLNGKIVSLNTRSSEL-VDTLNRQ--IDDLQRILDTTEQT',
   'LHTELLVIHDQLPVWSAMTKREKYVYTTLNKFQQESQ-GLIAEGWVPSTELIHLQDSLKDYIE-TL-----GSEYSTV-F',
   '-N-V-------------------------------AGLATVVTFPFMFAIMFGDMGHGFILFLMALFLVLNERKFGAMHR',
   'DEIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKSMTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLF',
   'SNSYKMKLSILMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKDDKPAPGLLNM',
   'LINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNKFNFGDVMIHQVIHTIEFCLNCISHTASYLR',
   'LWALSLAHAQLSSVLWDMTISNAFSSKNSGSPLAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSK----',
   '------------------'],),
 (['            .... ..::::::.:: :.. ::                    ::::::::..:.... :.  .... ',
   '           

In [14]:
list(zip(*chunked_lines))

[]
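
The empty result most likely comes from the empty fourth string that `s.split("\n")` produces after the final newline: its list of chunks is empty, and `zip` stops at the shortest input. Here is a minimal sketch of that behavior using toy data; the names `toy`, `all_chunks`, and `non_empty` are made up for this illustration.

```python
toy = "AAAA\n....\nBBBB\n"   # ends with a newline, like `s`
lines = toy.split("\n")
print(lines)  # ['AAAA', '....', 'BBBB', '']

chunk_size = 2
all_chunks = [[ln[i:i+chunk_size] for i in range(0, len(ln), chunk_size)]
              for ln in lines]
# the empty fourth entry makes zip yield nothing at all
zipped_all = list(zip(*all_chunks))
print(zipped_all)  # []

# dropping empty chunk lists before zipping restores the blocks
non_empty = [c for c in all_chunks if c]
zipped_kept = list(zip(*non_empty))
print(zipped_kept)  # [('AA', '..', 'BB'), ('AA', '..', 'BB')]
```

Slicing with `[:3]` and filtering out empty lists are two ways to reach the same result for this input.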

In [8]:
import time

def executeSomething():
    # print a dot periodically to show the session is still active
    print('.')
    time.sleep(480) # 8 minutes x 60 seconds

while True:
    executeSomething() 

.
.


KeyboardInterrupt: 