# DiscobaMultimer for MSA retrieval only

This notebook retrieves batches of paired+unpaired Discoba MSAs, optionally combined with the MSAs coming from the ColabFold MSA server. To use the resulting MSAs to predict the structures of the complexes with LocalColabFold's implementation of AF2, see the other notebook: https://github.com/elviorodriguez/DiscobaMultimer.

Inside the downloaded ZIP file, the following folders contain the different alignments:
 - `colabfold_MSA`: MSAs retrieved from the ColabFold MSA server.
 - `discoba_mmseqs_alignments`: Discoba MSAs for the monomeric proteins.
 - `discoba_paired_unpaired`: Paired+unpaired Discoba MSAs for each combination of IDs.
 - `merged_MSA`: Combination of the paired+unpaired Discoba MSAs with the ColabFold MSAs.
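
After unzipping the results, it can help to check quickly which alignments were produced. The following is a minimal sketch, not part of DiscobaMultimer. It assumes the four folders sit directly under the extracted project folder; `summarize_results` is a hypothetical helper name.

```python
from pathlib import Path

# Folder names as listed above. Their location directly under the
# extracted project folder is an assumption.
MSA_FOLDERS = [
    "colabfold_MSA",
    "discoba_mmseqs_alignments",
    "discoba_paired_unpaired",
    "merged_MSA",
]

def summarize_results(project_dir):
    """Return {folder_name: sorted list of file names} for each MSA folder.

    Folders that are missing from project_dir map to an empty list.
    """
    root = Path(project_dir)
    summary = {}
    for name in MSA_FOLDERS:
        folder = root / name
        if folder.is_dir():
            summary[name] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            summary[name] = []
    return summary
```

For example, `summarize_results("my_job")` returns one entry per folder, which makes it easy to spot a folder that is empty or absent.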

The notebook takes two uploaded files as input: `database.fasta` and `IDs_table.txt`. See https://github.com/elviorodriguez/DiscobaMultimer for how to format them.
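
Before uploading, a quick consistency check between the two files can catch mistyped IDs early. The sketch below is illustrative only and rests on two assumptions that should be checked against the repository's formatting guide: that a FASTA ID is the first whitespace-delimited token after `>`, and that each line of `IDs_table.txt` is a whitespace-separated list of IDs. `fasta_ids` and `missing_ids` are hypothetical helper names.

```python
def fasta_ids(fasta_text):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                ids.add(header[0])
    return ids

def missing_ids(fasta_text, table_text):
    """Return table IDs with no matching FASTA header, in order of first use.

    ASSUMPTION: each non-empty table line is a whitespace-separated list of IDs.
    """
    known = fasta_ids(fasta_text)
    missing = []
    for line in table_text.splitlines():
        for seq_id in line.split():
            if seq_id not in known and seq_id not in missing:
                missing.append(seq_id)
    return missing
```

An empty list from `missing_ids(open("database.fasta").read(), open("IDs_table.txt").read())` means every ID in the table has a sequence in the database, under the assumptions above.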

In [None]:
#@title Input <database.fasta> and <IDs_table.txt> files
from google.colab import files
import os
import re

#@markdown Set your `jobname` and options, and then click on `Runtime` -> `Run all`. You will be asked to upload your <database.fasta> and <IDs_table.txt> files.
# jobname to create a filesystem
#@markdown Name of the folder that will be created for the project:
jobname = 'my_job' #@param {type:"string"}
#@markdown Set it to `yes` if you want to get a plot representation of the MSAs:
get_MSA_plots = "no" #@param ["no", "yes"]
#@markdown - MSA plots are not working right now. Leave it as `no`.
# (save it for later) - MSA plots will be stored in the msa_plots directory of the project folder.
# (save it for later) - MSA plot generation is not computationally efficient right now. If you have many MSAs to retrieve, leave it as `no`, or it will significantly impact the execution time.

# If a folder named jobname already exists, append the first free numeric suffix
if os.path.exists(jobname):
  n = 0
  while os.path.exists(f"{jobname}_{n}"): n += 1
  jobname = f"{jobname}_{n}"

# make directory to save results
os.makedirs(jobname, exist_ok=True)
print("Jobname:",jobname)

# Upload database and IDs files
print("")
print("Upload <database.fasta> file")
upload_DB = files.upload()
print("")
print("Upload <IDs_table.txt> file")
upload_IDs = files.upload()


# Move the uploaded files to the destination directory
for filename, content in upload_DB.items():
    with open(jobname + "/database.fasta", 'wb') as f:
        f.write(content)
        if os.path.exists(filename): os.remove(filename)
for filename, content in upload_IDs.items():
    with open(jobname + "/IDs_table.txt", 'wb') as f:
        f.write(content)
        if os.path.exists(filename): os.remove(filename)

In [None]:
# @title Install dependencies and DiscobaMultimer (~20 min)
%%bash

# ---------------------------- Install dependencies ----------------------------

# Update apt-get
sudo apt-get -y update

# Install curl, git and wget and dos2unix
sudo apt-get -y install curl git wget dos2unix

# Install LocalColabFold
wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
bash install_colabbatch_linux.sh

# Install biopython
pip install biopython

# Install EMBOSS and HHSUITE
sudo apt-get -y install emboss hhsuite

# --------------------------- Install DiscobaMultimer --------------------------

# Clone the repo and cd to DiscobaMultimer
git clone https://github.com/elviorodriguez/DiscobaMultimer.git
cd DiscobaMultimer

# Install DiscobaMultimer
bash install/install_MMseqs2_and_DiscobaDB.sh install

# ----------------------------- Remove temp files ------------------------------
cd ..
rm install_colabbatch_linux.sh


In [None]:
# @title Run discoba_multimer_batch to retrieve only MSAs
%%bash -s "$jobname" "$get_MSA_plots"
jobname=$1
get_MSA_plots=$2

# Add mmseqs and colabfold to PATH
export PATH="/content/localcolabfold/colabfold-conda/bin:$PATH"
export PATH="/content/DiscobaMultimer/mmseqs/bin:$PATH"

# Add env variables for DiscobaDB and DiscobaMultimerPATH
export DiscobaDB=/content/DiscobaMultimer/discoba/discoba
export DiscobaMultimerPath=/content/DiscobaMultimer

# Add aliases for DiscobaMultimer binaries
discoba_multimer_batch() {
    /content/DiscobaMultimer/scripts/discoba-multimer_batch.sh "$@"
}
discoba_monomer_batch() {
    /content/DiscobaMultimer/scripts/discoba-multimer_batch.sh "$@"
}

# Change to the project folder
cd "$jobname"

# Make sure files are unix compatible
dos2unix database.fasta
dos2unix IDs_table.txt

# Run the MSA retrieval (with MSA plots if requested)
if [ "$get_MSA_plots" == "yes" ]; then
  discoba_multimer_batch -mp database.fasta IDs_table.txt
else
  discoba_multimer_batch -m database.fasta IDs_table.txt
fi

In [109]:
# @title Zip and download results
results_zip = f"{jobname}.result.zip"
# Quote the names so that a jobname containing spaces does not break the command
os.system(f'zip -r "{results_zip}" "{jobname}"')
files.download(results_zip)
