# Test 02: Large Alignment

### Overview

This notebook demonstrates iterative alignment of a large testcase.

This demonstration uses testcase `1a0cA_1ubpC` from the PREFAB benchmark because it contains 50 biological sequences.

Expected runtime: 30 seconds or less

### Context

This notebook is intended to test the following requirements of MAli:

**Requirement 4.1** - Can align structural benchmarking test cases containing as many as 30 sequences.
- The notebook demonstrates alignment of a testcase of 50 sequences.


**Requirement 4.2** - Can align a structural benchmarking test case of at least 20 sequences within 60 seconds.
- The testcase of 50 sequences is aligned in less than 10 seconds.

### Installing Prerequisites

In [1]:
!pip install biopython



#### Imports

In [2]:
import os
import shutil
import subprocess
import time
from presentation_helper import PresentationHelper


#### MAli v1.31

In [3]:
ALIGNER_NAME = "MAli-v1.31"
ALIGNER_PATH = "MAli-v1.31/MAli.exe"
OUTPUT_FOLDER = "data/output"

In [4]:
# creating empty output folder
if os.path.exists(OUTPUT_FOLDER):
    shutil.rmtree(OUTPUT_FOLDER)
os.makedirs(OUTPUT_FOLDER)

#### Testcase

In [5]:
TESTCASE_NAME = "1a0cA_1ubpC"
INPUT_FILEPATH = f"data/input/{TESTCASE_NAME}"
OUTPUT_FILEPATH = f"{OUTPUT_FOLDER}/{TESTCASE_NAME}"  # reuse the folder created above

#### Viewing Testcase

In [6]:
presenter = PresentationHelper()

In [7]:
presenter.present_unaligned_fasta(INPUT_FILEPATH)

Displaying Sequences from data/input/1a0cA_1ubpC: 

>1a0cA
NKYFENVSKIKYEGPKSNNPYSFKFYNPEEVIDGKTMEEHLRFSIAYWHTFTADGTDQFGKATMQRPWNHYTDPMDIAKARVEAAFEFFDKINAPYFCFHDRDIAPEGDTLRETNKNLDTIVAMIKDYLKTSKTKVLWGTANLFSNPRFVHGASTSCNADVFAYSAAQVKKALEITKELGGENYVFWGGREGYETLLNTDMEFELDNFARFLHMAVDYAKEIGFEGQFLIEPKPKEPTKHQYDFDVANVLAFLRKYDLDKYFKVNIEANHATLAFHDFQHELRYARINGVLGSIDANTGDMLLGWDTDQFPTDIRMTTLAMYEVIKMGGFDKGGLNFDAKVRRASFEPEDLFLGHIAGMDAFAKGFKVAYKLVKDRVFDKFIEERYASYKDGIGADIVSGKADFRSLEKYALERSQIVNKSGRQELLESILNQYLFA

>13|1a0cA|gi|37526188
HRYFERVNRISYEGRQSNNPLAFRHYNPEEIILGKKMKDHLRFAVCYWHNFCWDGTDMFGSGAFERFWQKGGDALELAKLKADVAFEFFYKLNIPFYCFHDIDVAPEGCSLKEYIYNLGVMSDILADKQAETGVKLLWGTANCFTHPRYAAGASTNPDLNIFAYASAQVCQVMQMTKKLGGENYVLWGGREGYESLLNTDLRQEREQIGRFMQMVVDYKYKIGFQGTLLIEPKPQEPTKHQYDYDVATVYGFLKQFGLENEIKVNIEANHATLAGHSFQHEVATAIALGILGSIDANRGDAQLGWDTDQFPNSVEENSLVMYEILKAGGFTTGGLNFDAKVRRQSIDIDDLFYGHIGAIDTMALSLKSAVKILVDGKLDEYVAQRYSGWNSELGRDILEGKMTLDEVAHYAETLVQEPKHRSGQQELLENLINRYIY

>55|1a0cA|gi|139830
NEYWQGVDQIKYIGHQDKKSG
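
The overview states that this testcase contains 50 sequences. One way to confirm this before aligning is to count FASTA header lines. The sketch below is not part of MAli or `PresentationHelper`. The helper name `count_fasta_records` is hypothetical, and it assumes the input is plain-text FASTA in which every record starts with a `>` line.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Small in-memory demo; these sequences are made up for illustration only.
demo_lines = [">seq_a\n", "NKYFENV\n", "\n", ">seq_b\n", "HRYFERV\n"]
demo_count = count_fasta_records(demo_lines)
print(f"Demo contains {demo_count} sequences")
```

In the notebook, the same helper could be applied to the real input with `with open(INPUT_FILEPATH) as handle: count_fasta_records(handle)`, which should return 50 if the file matches the description in the overview.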

#### Performing Alignment

In [8]:
ALIGNMENT_COMMAND = f"{ALIGNER_PATH} -input {INPUT_FILEPATH} -output {OUTPUT_FILEPATH}"
print(f"CLI command to be run: '{ALIGNMENT_COMMAND}'")

CLI command to be run: 'MAli-v1.31/MAli.exe -input data/input/1a0cA_1ubpC -output data/output/1a0cA_1ubpC'
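
The command above is built as a single string. As an alternative sketch, the same invocation can be expressed as an argument list, which `subprocess.run` accepts on every platform without relying on how the string is parsed. The helper name `build_alignment_command` is hypothetical. The `-input` and `-output` flags are the ones already used in this notebook.

```python
def build_alignment_command(aligner_path, input_path, output_path):
    """Return the aligner invocation as a list of arguments."""
    return [aligner_path, "-input", input_path, "-output", output_path]


# Same paths as the cells above, repeated here so the sketch is self-contained.
command_args = build_alignment_command(
    "MAli-v1.31/MAli.exe",
    "data/input/1a0cA_1ubpC",
    "data/output/1a0cA_1ubpC",
)
print(" ".join(command_args))
```

Passing `command_args` to `subprocess.run` would avoid quoting problems if any of the paths ever contained spaces.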


In [9]:
start_time = time.perf_counter()
subprocess.run(ALIGNMENT_COMMAND)
end_time = time.perf_counter()

# elapsed wall-clock time, rounded to millisecond precision
time_in_seconds = round(end_time - start_time, 3)

print(f"Performed alignment of {TESTCASE_NAME} in {time_in_seconds} seconds")

Performed alignment of 1a0cA_1ubpC in 5.278 seconds
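
The measured time can be compared against the 60-second limit from Requirement 4.2 so that the notebook reports a pass or fail explicitly. This is a minimal sketch. The constant and function names are hypothetical, and the value `5.278` is simply the runtime printed above, which will differ between machines and runs.

```python
REQUIREMENT_4_2_LIMIT_SECONDS = 60.0  # limit stated in Requirement 4.2


def meets_time_requirement(elapsed_seconds, limit_seconds=REQUIREMENT_4_2_LIMIT_SECONDS):
    """Return True if the alignment finished within the allowed time."""
    return elapsed_seconds <= limit_seconds


measured_seconds = 5.278  # runtime reported by the cell above
requirement_met = meets_time_requirement(measured_seconds)
print(f"Requirement 4.2 met: {requirement_met}")
```

In the notebook itself, `time_in_seconds` could be passed in place of the hard-coded value.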


#### Viewing Alignment Produced by MAli

In [10]:
ALIGNMENT_FILEPATH = OUTPUT_FILEPATH + ".faa"
presenter.present_interleaved_aligned_fasta(ALIGNMENT_FILEPATH)

Displaying interleaved alignment from 'data/output/1a0cA_1ubpC.faa: 

1a0cA           -----------------------------------------------------------N
13|1a0cA|gi|375 ------------------------------------------------------------
55|1a0cA|gi|139 ---------------------------------------------NEYWQGVDQIKYIGH
24|1a0cA|gi|333 ------------------------------------------------------------
28|1a0cA|gi|134 --------------------------FFGDIQKIKYEGPDSTNPLAYRFYNPDEVVAGKR
2|1a0cA|gi|3247 ------------------------------------------------------------
23|1a0cA|gi|343 -------------EF-FP-GIPKIKYE-GPSSKNPL-A----------------------
3|1a0cA|gi|3203 -----------------------------------------------------------S
29|1a0cA|gi|229 ------------------------------------------------------------
39|1a0cA|gi|613 --------------------------N-HFESANKVLYEGKDSKNPLAFKYYNPEEVVGG
53|1a0cA|gi|613 ------------------------------------------------------------
22|1a0cA|gi|156 -------------------------FF-NDVEKVQYEGPRSTNPYAFKYYNPEEIVAGKT
43|1a0