# Ligand Selection and Preparation, Docking and Analysis

This notebook contains the code needed to prepare the ligands for docking, run docking in smina, and analyse the docking results.

## Ligand Selection

The ligand selection step involves:

- Identification of ligands (from generative models, databases, etc.) and extraction as SMILES strings
- Filtering of ligands (based on pharmacophore matching score, eos models, synthetic likelihood, etc.)
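
The filtering step could be sketched as below. This is only an illustration: the column names (`ID`, `SMILES`, `pharmacophore_score`), the helper `filter_ligands`, and the 0.5 cut-off are hypothetical placeholders, not values taken from this project.

```python
import csv
import io


def filter_ligands(rows, score_column="pharmacophore_score", min_score=0.5):
    """Keep rows whose score is numeric and >= min_score (hypothetical column and cut-off)."""
    kept = []
    for row in rows:
        try:
            score = float(row[score_column])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric score
        if score >= min_score:
            kept.append(row)
    return kept


# Toy input standing in for a real candidate table.
example_csv = io.StringIO(
    "ID,SMILES,pharmacophore_score\n"
    "lig1,CCO,0.82\n"
    "lig2,c1ccccc1,0.31\n"
    "lig3,CC(=O)O,not_available\n"
)
selected = filter_ligands(csv.DictReader(example_csv))
print([row["ID"] for row in selected])
```

Other criteria (eos model outputs, synthetic likelihood) could be added as further columns and thresholds in the same way.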

## Ligand Preparation

- Conversion of SMILES strings to 3D conformers using RDKit
- Protonation at a specific pH (7.4) and conversion to .pdbqt via obabel
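
As a rough sketch of the protonation and conversion step, the Open Babel call can be assembled as an argument list and inspected before anything is run. The helper name `build_obabel_command` and the file names are hypothetical. The sketch uses `-p <pH>`, the option listed in Open Babel's help for pH-dependent hydrogens; the flag is left as a parameter in case a different spelling is needed.

```python
import shlex


def build_obabel_command(in_path, out_path, pH=7.4, ph_flag="-p"):
    """Return the argument list for one Open Babel conversion; nothing is executed here."""
    # The output format is inferred by obabel from the extension of out_path.
    return ["obabel", str(in_path), ph_flag, str(pH), "-O", str(out_path)]


cmd = build_obabel_command("ligand.pdb", "ligand.pdbqt", pH=7.4)
print(shlex.join(cmd))

# To actually run the conversion (requires obabel on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems when ligand IDs contain spaces or special characters.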

In [2]:
# Ensure docking-env is activated as kernel.

# Import all libraries that are required.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.rdmolfiles import MolToPDBFile


In [3]:
import os

DATAPATH = '../data'
RESULTSPATH = '../results'

In [4]:
def prepare_ligands(csv_file_name, pH, header_len=1, output_dir='', delim=',') -> list:
    """
    Take a csv file of SMILES, generate 3D coordinates, protonate at a
    given pH, and write a pdbqt file for each compound.

    csv_file_name (str): Path to a csv file containing the SMILES input, among other information.

    pH (float): pH at which the compounds will be protonated.

    header_len (int): Number of header rows to skip in the csv file.

    output_dir (str): Directory prefix for the produced file names, so output files can be
                        conveniently stored in an output directory.

    delim (str): Delimiter used in the csv file.

    returns: list of str paths to the produced pdbqt files.
    """
    
    out_pdbqts = [] # Creates an empty list to fill.
    
    print(csv_file_name)
    
    with open(csv_file_name, 'r') as csv: 
        
        
        for entry in csv.readlines()[header_len:]:
            
            Similarity, ID, SMILES = entry.strip().split(delim)[:3]  # Adjust to match the column order in the .csv file.
            
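            # Convert SMILES str to 3D coordinates
            mol = Chem.MolFromSmiles(SMILES)
            if mol is None:  # RDKit returns None for SMILES it cannot parse
                print(f"Skipping {ID}: could not parse SMILES")
                continue
            mol = Chem.AddHs(mol)
            if AllChem.EmbedMolecule(mol) == -1:  # -1 means no conformer was generated
                print(f"Skipping {ID}: 3D embedding failed")
                continue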
            
    
            # Output coords to pdb
            pdb_name = os.path.join(output_dir, f"{ID}.pdb")
            MolToPDBFile(mol, pdb_name)
            
            # Protonate according to pH
            ! obabel {pdb_name} -pH {pH} -O {pdb_name}
            
            
            # Also create a pdbqt for vina
            pdbqt_str = pdb_name + 'qt'
            ! obabel {pdb_name} -pH {pH} -O {pdbqt_str}
            
            
            out_pdbqts.append(pdbqt_str)
#             print()
            
    return out_pdbqts
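
The positional unpacking of `entry.split(delim)[:3]` depends on the column order of the input file. A header-based alternative using the standard-library `csv` module might look like the sketch below; the helper `read_smiles_table` is hypothetical, and the header names `ID` and `SMILES` are assumed and would need to match the actual file.

```python
import csv
import io


def read_smiles_table(handle, id_column="ID", smiles_column="SMILES", delim=","):
    """Yield (ligand_id, smiles) pairs from a CSV that has a header row."""
    reader = csv.DictReader(handle, delimiter=delim)
    for row in reader:
        ligand_id = (row.get(id_column) or "").strip()
        smiles = (row.get(smiles_column) or "").strip()
        if ligand_id and smiles:  # skip blank or incomplete rows
            yield ligand_id, smiles


# Toy input: the second row has an empty SMILES field and is skipped.
sample = io.StringIO("Similarity,ID,SMILES\n0.91,cmpd_1,CCO\n0.88,cmpd_2,\n")
pairs = list(read_smiles_table(sample))
print(pairs)
```

Looking columns up by name means the function keeps working if columns are added or reordered in the csv file.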
        
        
        

In [5]:
pH = 7.4  # physiological pH, as stated under Ligand Preparation

In [6]:
myfile = os.path.join(DATAPATH, "sim29", "smiles3.csv")

ligands = prepare_ligands(myfile, pH, output_dir='data/')

# Note: the first attempt passed the string literal 'myfile' instead of the variable myfile,
# which raised:
#   FileNotFoundError: [Errno 2] No such file or directory: 'myfile'
# DATAPATH is a relative path, so the file is found only when the notebook runs from a
# directory where '../data' resolves (this worked locally but not on GitHub).

myfile
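
One possible way to make the csv path independent of where the notebook is launched is to resolve it against a known base directory and check that the file exists before calling `prepare_ligands`. In this sketch `resolve_input` is a hypothetical helper, and the temporary directory only stands in for the real data folder.

```python
from pathlib import Path
import tempfile


def resolve_input(base_dir, *parts):
    """Join parts onto base_dir and fail early, showing the absolute path, if the file is missing."""
    path = Path(base_dir, *parts).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    return path


# Demonstration with a temporary directory standing in for ../data.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sim29").mkdir()
    (Path(tmp) / "sim29" / "smiles3.csv").write_text("Similarity,ID,SMILES\n")
    found = resolve_input(tmp, "sim29", "smiles3.csv")
    found_name = found.name
    print(found_name)
```

Because the error message contains the fully resolved path, a wrong working directory is visible immediately instead of surfacing later inside the ligand preparation loop.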