# Conversion from single-line to multi-line blocks of alignment showing residue pairs made by TM-align

**Issue**:  
As part of its results, [TM-align](https://zhanggroup.org/TM-align/), an algorithm for protein structure alignment and comparison, generates a sequence alignment showing the pairs of residues that are close in distance in the aligned structures, i.e., the 'aligned residue pairs'. However, each of these sequences is on a single line in the report, which makes long alignments difficult to view. One option is to open the alignment in a text editor with word wrap disabled. However, that requires scrolling horizontally to explore it, making it impossible to assess both ends at the same time. In addition, this wide form is hard to pass along to colleagues in a report.

There are three lines. The outer two are the sequences of the aligned structures, and the center line has symbols marking residue pairs that have been aligned and indicating whether they are under a distance threshold in the superposed structures. According to the results page that includes the report, the symbols mean the following:

> ":" denotes aligned residue pairs of d < 5.0 A, "." denotes other aligned residues
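
As a quick illustration of how the center line can be read, the sketch below tallies the two symbols in a marker line. The function name `count_alignment_symbols` and the toy strings are made up for this example; they are not part of TM-align's output or of this notebook's code.

```python
def count_alignment_symbols(marker_line):
    """Count ':' (aligned pairs with d < 5.0 A) and '.' (other aligned pairs)."""
    close_pairs = marker_line.count(":")
    other_pairs = marker_line.count(".")
    return close_pairs, other_pairs

# Toy three-line alignment (hypothetical, not real TM-align output)
top    = "--VQLY-IPLE"
marker = "  .... ..::"
bottom = "ALVQFYIPQEI"

close_pairs, other_pairs = count_alignment_symbols(marker)
print(f"aligned pairs with d < 5.0 A: {close_pairs}")
print(f"other aligned pairs: {other_pairs}")
```

Positions where the marker line has a space are positions without an aligned pair, so they are not counted in either tally.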

The process illustrated in this notebook takes those three lines and reformats them into blocks of aligned sequence of a defined length, reminiscent of what you'd get for Clustal format; for an example, see [here](https://gist.githubusercontent.com/fomightez/f46b0624f1d8e3abb6ff908fc447e63b/raw/6abce38569475c68fa32182c4e0eaadbb8b0cf3b/Stv1p_Vph1p_muscle_alignment.clw.).


-----

### Preparation: making demo input

The demo input was made by [TM-align](https://zhanggroup.org/TM-align/) when provided chain a of [6o7v](https://www.rcsb.org/structure/6O7V) and of [6o7t](https://www.rcsb.org/structure/6O7T), which are Stv1p and Vph1p, respectively. The files with the two chains isolated from the rest of the PDB files for the structures were made using the script `split_into_chains.sh` as illustrated in [this notebook](Split%20pdb%20files%20into%20chains.ipynb).

Below are the three alignment lines from the 'TM-align Results' page of the submission of `6o7v_a.pdb` and `6o7t_a.pdb`. In other words, the three lines below `(":" denotes aligned residue pairs of d < 5.0 A, "." denotes other aligned residues)` from the returned results:

In [1]:
s='''------------VQLY-IPLEVIREVTF-LLG-KM--------------------LRRFDEVERMVGFLN-EV--VEKH------------------LSLENVNDMVKEITDCESRARQLDESLDSLRSKLN-DLLEQRQVIFECSKFIEVYMITGSIRRTKVDILNRIL-------------------------------------------------W-R--------------------LLRGNLIFQNFPIEVEKDCFIIFTHGETLLKKVKRVIDSLNGKIVSLNTRSSEL-VDTLNRQ--IDDLQRILDTTEQTLHTELLVIHDQLPVWSAMTKREKYVYTTLNKFQQESQ-GLIAEGWVPSTELIHLQDSLKDYIE-TL-----GSEYSTV-F-N-V-------------------------------AGLATVVTFPFMFAIMFGDMGHGFILFLMALFLVLNERKFGAMHRDEIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKSMTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLFSNSYKMKLSILMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKDDKPAPGLLNMLINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNKFNFGDVMIHQVIHTIEFCLNCISHTASYLRLWALSLAHAQLSSVLWDMTISNAFSSKNSGSPLAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSK----------------------
            .... ..::::::.:: :.. ::                    ::::::::..:.... :.  ....                  .:.::::.::..:...:............ ...   .  .                   ..:::::.......                                                 . .                    ...:..::::.      .::.                    ....          .  .     .  .. ... ...... ....:..::::::::::::::::........... ...... ..  :.:........... .:     :::.... . . :                               :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::         ::::::::::::::::::::::::::::::::::::::::::::::::::::: : : :::::::::::::::::::::::::::::::::::::::::::::                      
KEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIRRLDNVERQYRYFYSLLKKHDIKLYEGDTDKYLDGSGELYVPPSGSVIDDYVRNASYLEERLIQMEDATD-QIE--VQ--K-------------------NDLEQYRFILYVTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSHGDLIIKRIRKIAESLDANLYDVDSSNEGRSQ------QLAK--------------------VNKN---------LS--D---LYT--VL-KTT-STTLES-ELYAIAKELDSWFQDVTREKAIFEILNKSNYDTNRKILIAE-GW--IPRDELATLQARLGEMIARLGIDVPSIIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFAIMFGDMGHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYNDIFSKTMTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLSILMGFIHMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVKDGKPAPGLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLH---------GDIMIHQVIHTIEFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGF-R-G-FVGVFMTVALFAMWFALTCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEYKDMEVA
'''

You can replace the alignment between the two sets of three ticks (`'''`) above with your own alignment to run the conversion on your own data.   
Whether the last three ticks are on their own line or contiguous with the end of the third line doesn't matter.  
The **first line has to be contiguous, following directly after the first three ticks**.
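
Before converting your own data, it may help to check that the pasted text meets these requirements. The following is a minimal sketch, not part of the original notebook; the helper name `check_alignment_input` is hypothetical. It also checks that the three lines have equal length, on the assumption that each symbol in the center line sits directly under the residue pair it describes. Python's `zip` stops at the shortest input, so a line shortened by, say, an editor trimming trailing spaces could lead to trailing blocks being left out without an error.

```python
def check_alignment_input(s):
    """Return a list of problems found in a three-line alignment string."""
    problems = []
    if s.startswith("\n"):
        problems.append("first line does not follow the opening ticks directly")
    lines = s.split("\n")
    # Closing ticks on their own line leave an empty string at the end
    if lines and lines[-1] == "":
        lines = lines[:-1]
    if len(lines) != 3:
        problems.append(f"expected 3 lines, found {len(lines)}")
    elif len({len(line) for line in lines}) != 1:
        lengths = ", ".join(str(len(line)) for line in lines)
        problems.append(f"lines differ in length: {lengths}")
    return problems

# Toy inputs (hypothetical, not real TM-align output)
good = "AB-\n: .\nABC\n"
bad = "AB-\n:\nABC"
print(check_alignment_input(good))
print(check_alignment_input(bad))
```

An empty list means no problems were found by these particular checks.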


### Preparation: Define the code for the reformatting steps

In [2]:
sequence_characters_per_line_in_output = 80
def chunk_record(sequence_string):
    '''
    chop up so characters per line matches setting of
    `sequence_characters_per_line_in_output`
    
    return a list of the 'chunks'
    '''
    chunk_size = sequence_characters_per_line_in_output
    seq_chunks = [sequence_string[i:i+chunk_size] for i in range(
        0, len(sequence_string),chunk_size)]
    #return "\n".join(seq_chunks)
    return seq_chunks
def convert_3lines_to_blocks(s):
    '''
    Takes lines of alignment with each on same line as a single
    string.
    
    Returns a string with the alignment broken up into
    blocks with the number of characters matching the
    `sequence_characters_per_line_in_output` setting.
    '''
    # take the lines each as separate string
    list_o_lines = s.split("\n")
    # Adjust it if there is an empty string at the end of the list due to the
    # closing ticks being on a separate line.
    if len(list_o_lines) == 4:
        list_o_lines = list_o_lines[:3]
    # take each line and chunk into strings of length matching setting for 
    # `sequence_characters_per_line_in_output`
    chunked_lines = []
    for l in list_o_lines:
        chunked_lines.append(chunk_record(l))
    # zip the chunked lines so they can be used to create blocks
    zipped_chunks = list(zip(*chunked_lines))
    # Create the blocks of output
    out = ""
    for x in zipped_chunks:
        out += f"{x[0]}\n{x[1]}\n{x[2]}\n\n\n"
    return out
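
To see what the chunk-and-zip approach does on a small scale, here is a compact standalone sketch with the block width passed in as a parameter. The function `to_blocks` and the toy alignment are hypothetical and only for illustration; this is not the notebook's own function.

```python
def to_blocks(three_lines, width):
    """Split each of the three lines into `width`-sized pieces and regroup them."""
    chunked = [
        [line[i:i + width] for i in range(0, len(line), width)]
        for line in three_lines
    ]
    # zip(*chunked) groups the first piece of every line, then the second, ...
    return ["\n".join(block) for block in zip(*chunked)]

# Toy three-line alignment (hypothetical, not real TM-align output)
toy = ["--VQLY-IPLE", "  .... ..::", "ALVQFYIPQEI"]
for block in to_blocks(toy, 4):
    print(block)
    print()
```

With a width of 4, the 11-character toy lines yield three blocks, the last one holding the remaining three characters of each line.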

### Conversion step

The conversion step will take the input represented above and reformat it to block form that better fits in a document.

In [3]:
converted = convert_3lines_to_blocks(s)
print(converted)

------------VQLY-IPLEVIREVTF-LLG-KM--------------------LRRFDEVERMVGFLN-EV--VEKH-
            .... ..::::::.:: :.. ::                    ::::::::..:.... :.  .... 
KEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIRRLDNVERQYRYFYSLLKKHDIKL


-----------------LSLENVNDMVKEITDCESRARQLDESLDSLRSKLN-DLLEQRQVIFECSKFIEVYMITGSIRR
                 .:.::::.::..:...:............ ...   .  .                   ..::
YEGDTDKYLDGSGELYVPPSGSVIDDYVRNASYLEERLIQMEDATD-QIE--VQ--K-------------------NDLE


TKVDILNRIL-------------------------------------------------W-R------------------
:::.......                                                 . .                  
QYRFILYVTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSHGDLIIKRIRKIAESLDAN


--LLRGNLIFQNFPIEVEKDCFIIFTHGETLLKKVKRVIDSLNGKIVSLNTRSSEL-VDTLNRQ--IDDLQRILDTTEQT
  ...:..::::.      .::.                    ....          .  .     .  .. ... ....
LYDVDSSNEGRSQ------QLAK--------------------VNKN---------LS--D---LYT--VL-KTT-STTL


LHTELLVIHDQLPVWSAMTK

Copy the reformatted alignment from above and use it where you wish.
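
If copying from the notebook output is inconvenient, the text could instead be written to a file. This is a minimal sketch under stated assumptions: the helper `save_alignment` and the file name `tm_align_blocks.txt` are hypothetical, and the short `converted` string here is a stand-in for the one produced by the conversion step.

```python
from pathlib import Path

def save_alignment(text, filename):
    """Write the block-formatted alignment to a plain-text file."""
    path = Path(filename)
    path.write_text(text, encoding="utf-8")
    return path

# Stand-in for the `converted` string produced by the conversion step
converted = "--VQ\n  ..\nALVQ\n\n\n"
saved = save_alignment(converted, "tm_align_blocks.txt")
print(f"wrote {saved}")
```

When the resulting file is viewed or pasted into a report, a fixed-width font keeps the three lines of each block lined up.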

-----

Enjoy!