# Tutorial
## **For Covalent Docking:**
This pipeline uses [Boltz2](https://github.com/jwohlwend/boltz).

Install Boltz2 by running the following:
```
conda create --name boltz2 python=3.12
conda activate boltz2
pip install boltz[cuda] -U
module load cuda cudnn
```
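
After installation, a quick check from Python can confirm that the package is visible in the active environment. This is a minimal sketch: `is_installed` is a hypothetical helper, and it assumes the pip package `boltz` installs an importable top-level module of the same name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the pip package `boltz` provides a top-level module named `boltz`.
print("boltz importable:", is_installed("boltz"))
```

If this prints `False`, the most likely cause is that the install ran in a different conda environment than the one currently active.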

For the Maom Lab modules that help with preprocessing for covalent docking, see [TRIAGED](https://github.com/maomlab/TRIAGED/).

Run the following to install TRIAGED: 
```
git clone https://github.com/maomlab/TRIAGED.git
conda create --name ccd_pkl python=3.12
cd TRIAGED/covalent_module/
conda env create -f ccd_pkl.yml
```

## Running Boltz2 on a single covalent ligand
All scripts can be run from a terminal in the project directory.

Use the **ccd_pkl** environment to run the code below, after installing the boltz package.
### **Main Script: `single_inference.py`**
#### **Required run Args:**
- **prot_file**: Path to either a PDB file or a TXT file with a single chain sequence.
- **res_idx**: Index of the residue to be covalently targeted by a covalent ligand. Starting at 1.
- **lig_chain**: Chain interacting with ligand in PDB file. Single character.
- **smiles**: SMILES string.
- **compound_id**: 5-character compound ID.
- **msa_path**: Path to an MSA file in CSV format. If provided, it is added to the YAML file.
- **outdir**: Output directory for all jobs.
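
The constraints listed above (1-based `res_idx`, single-character `lig_chain`, 5-character `compound_id`) can be checked before a run. The sketch below is illustrative only: `check_run_args` is a hypothetical helper and is not part of TRIAGED, which may perform its own validation.

```python
import argparse


def check_run_args(args: argparse.Namespace) -> list:
    """Return a list of problems found in the run arguments (empty if none)."""
    problems = []
    # res_idx counts residues starting at 1, so 0 or negative values are invalid.
    if int(args.res_idx) < 1:
        problems.append("res_idx is 1-based and must be >= 1")
    # lig_chain is a single chain identifier character, e.g. "A".
    if len(args.lig_chain) != 1:
        problems.append("lig_chain must be a single character")
    # compound_id must be exactly 5 characters long.
    if len(args.compound_id) != 5:
        problems.append("compound_id must be exactly 5 characters")
    return problems


example = argparse.Namespace(res_idx=25, lig_chain="A", compound_id="VM834")
print(check_run_args(example))  # prints []
```

Returning a list of messages rather than raising on the first failure lets all problems be reported at once.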
 
#### **Please write other variables into `TRIAGED/covalent_module/.env`**
Variable names are case sensitive.
#### **Required in .env:**
- **BOLTZ_CACHE**: Path to where you want the Boltz weights to be saved.
- **SLURM_TEMPLATE**: SLURM template for running jobs on Great Lakes (use `single_inference.sh` as a template for the SLURM arguments).
##### **Additional in .env:**
- **CCD_DB**: Path to pkl files.
- **VERBOSE**: Set to True for print statements.
- **DEBUG**: Set to True to skip remaking files before the Boltz run.
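
Because the variable names are case sensitive, it can help to confirm that the required ones are set before launching a job. The following is a minimal sketch using only the standard library; `missing_required` is a hypothetical helper, and the paths in the example are placeholders rather than real locations.

```python
import os

# Names taken from the lists above; matching is case sensitive.
REQUIRED = ("BOLTZ_CACHE", "SLURM_TEMPLATE")


def missing_required(env=None, required=REQUIRED):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Placeholder values for illustration only.
good = {"BOLTZ_CACHE": "/path/to/cache", "SLURM_TEMPLATE": "/path/to/template.sh"}
wrong_case = {"boltz_cache": "/path/to/cache"}

print(missing_required(good))        # prints []
print(missing_required(wrong_case))  # prints ['BOLTZ_CACHE', 'SLURM_TEMPLATE']
```

Calling `missing_required()` with no arguments checks the current process environment, which is useful after the `.env` file has been loaded.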


**If needed, correct the line `load_dotenv("TRIAGED/covalent_module/.env")` in `single_inference.py` so that it points to your `.env` file.**


```python
import sys
sys.path.append("TRIAGED/")  # change path to package as needed
import argparse
from covalent_module import single_inference
```

```python
args = argparse.Namespace(
    prot_file="TRIAGED/covalent_module/tutorial/example_input/5MAJ.pdb",
    res_idx=25,
    lig_chain="A",
    smiles="N#CC1=NC(N(C[C@H]2OCCC2)CC2CCCC2)=NC(N2CCOCC2)=N1",
    compound_id="VM834",
    msa_path="TRIAGED/covalent_module/tutorial/example_input/5MAJ_msa.csv",
    outdir="TRIAGED/covalent_module/tutorial/example_output/5MAJ"
)

single_inference.main(args)
```

## Running Boltz2 on a list of covalent ligands (virtual screening)
### **Main Script: `submit_job.py`**

Use the **ccd_pkl** environment to run the code below, after installing the boltz package.
#### **Required run Args:**
- **name** : Name of protein. Used for naming output files.
- **prot_file**: Path to either a PDB file or a TXT file with a single chain sequence.
- **res_idx**: Index of the residue to be covalently targeted by a covalent ligand. Starting at 1.
- **lig_chain**: Chain interacting with ligand in PDB file. Single character.
- **outdir**: Output directory for all jobs.
- **msa_path**: Path to an MSA file in CSV format. If provided, it is added to the YAML file.

#### **Please write other variables into `TRIAGED/covalent_module/.env`**
Variable names are case sensitive.
#### **Required in .env:**
- **BOLTZ_CACHE**: Path to where you want the Boltz weights to be saved.
- **LIGAND_CSV**: CSV with vault_id and SMILES columns.
- **SLURM_TEMPLATE**: SLURM template for running jobs on Great Lakes (use `covalent_slurm.sh` as a template).
##### **Additional in .env:**
- **CCD_DB**: Path to pkl files.
- **COMPOUND_RECORD**: CSV with vault_id and compound_id columns. Maps each vault_id to a 5-character compound_id used when making pkl files.
- **VERBOSE**: Set to True for print statements.
- **DEBUG**: Set to True to skip remaking files before the Boltz run.
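
To illustrate the expected shape of `LIGAND_CSV`, the sketch below parses a small in-memory CSV and checks for the two columns named above. It assumes the header spellings are exactly `vault_id` and `SMILES` as written in this tutorial; `check_ligand_csv` is a hypothetical helper and the vault ID shown is made up.

```python
import csv
import io


def check_ligand_csv(handle) -> list:
    """Read a ligand CSV and verify that it has vault_id and SMILES columns."""
    reader = csv.DictReader(handle)
    missing = {"vault_id", "SMILES"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"LIGAND_CSV is missing columns: {sorted(missing)}")
    return list(reader)


# Made-up vault_id; the SMILES string is the one used in the single-ligand example.
sample = io.StringIO(
    "vault_id,SMILES\n"
    "EXAMPLE-0001,N#CC1=NC(N(C[C@H]2OCCC2)CC2CCCC2)=NC(N2CCOCC2)=N1\n"
)
rows = check_ligand_csv(sample)
print(len(rows), rows[0]["vault_id"])  # prints: 1 EXAMPLE-0001
```

The same function works on a real file by passing `open(path, newline="")` as the handle.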


**If needed, correct the line `load_dotenv("TRIAGED/covalent_module/.env")` in `submit_job.py` so that it points to your `.env` file.**


```python
import sys
sys.path.append("TRIAGED/")  # change path to package as needed
import argparse
from covalent_module import submit_job
```

```python
%load_ext autoreload
%autoreload 2
```

```python
args = argparse.Namespace(
    name="5MAJ",
    prot_file="TRIAGED/covalent_module/tutorial/example_input/5MAJ.pdb",
    res_idx="25",
    lig_chain="A",
    outdir="TRIAGED/covalent_module/tutorial/example_output/5MAJ",
    msa_path="TRIAGED/covalent_module/tutorial/example_input/5MAJ_msa.csv",
)

submit_job.main(args)
```

Submitted batch job 36356723
