# Part 2: Mark compounds by Lipinski's rule of five

For a promising compound, not only potency but also drug-likeness is important. In this part, the oral bioavailability of the compounds fetched from ChEMBL is assessed using Lipinski's rule of five (Ro5).

Lipinski's rule of five is a set of criteria for estimating whether a compound is likely to be orally bioavailable. Each threshold is a multiple of 5: a molecular weight of at most 500 Da, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors, and a LogP of at most 5. In this notebook, a compound counts as compliant if it violates no more than one of these criteria. The rules are based on an analysis of compounds from the World Drug Index database.
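The criteria can be illustrated without any cheminformatics library. The following is a minimal sketch, assuming the four properties have already been computed; `RO5_LIMITS` and `violated_rules` are illustrative names that are not part of the notebook. The first example uses the property values reported for the first compound later in this part, and the second is a hypothetical compound.

```python
# Upper limits for the four Ro5 criteria (illustrative helper, not notebook code)
RO5_LIMITS = {"molecular_weight": 500, "n_hbd": 5, "n_hba": 10, "logp": 5}


def violated_rules(properties):
    """Return the names of the Ro5 criteria whose limit is exceeded."""
    return [name for name, limit in RO5_LIMITS.items() if properties[name] > limit]


# Property values of the first compound in the data set (CHEMBL3969403)
first_compound = {"molecular_weight": 429.127089, "n_hbd": 2, "n_hba": 7, "logp": 2.12408}
# Hypothetical compound that is too heavy and has too many donors
hypothetical = {"molecular_weight": 650.0, "n_hbd": 6, "n_hba": 9, "logp": 4.0}

for properties in (first_compound, hypothetical):
    violations = violated_rules(properties)
    # Compliant if no more than one criterion is violated
    print(violations, "compliant" if len(violations) <= 1 else "not compliant")
```

Returning the names of the violated criteria, rather than only a boolean, makes it easier to see why a compound was flagged.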

Import required libraries

In [1]:
from pathlib import Path
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import matplotlib.patches as mpatches
from rdkit import Chem
from rdkit.Chem import Descriptors, Draw, PandasTools

Define path to this notebook

In [2]:
HERE = Path(_dh[-1])
DATA = HERE / "data"

## Investigate compliance with Ro5

Define a function that calculates the Ro5-relevant properties of a molecule and tests whether it fulfills Lipinski's rule of five.

In [3]:
def calculate_ro5_properties(smiles):
    """
    Test if input molecule (SMILES) fulfills Lipinski's rule of five.

    Parameters
    ----------
    smiles : str
        SMILES for a molecule.

    Returns
    -------
    pandas.Series
        Molecular weight, number of hydrogen bond acceptors and donors, logP value,
        and Lipinski's rule of five compliance for the input molecule.
    """
    # RDKit molecule from SMILES
    molecule = Chem.MolFromSmiles(smiles)
    # Calculate Ro5-relevant chemical properties
    molecular_weight = Descriptors.ExactMolWt(molecule)
    n_hba = Descriptors.NumHAcceptors(molecule)
    n_hbd = Descriptors.NumHDonors(molecule)
    logp = Descriptors.MolLogP(molecule)
    # Check if Ro5 conditions fulfilled
    conditions = [molecular_weight <= 500, n_hba <= 10, n_hbd <= 5, logp <= 5]
    # Compliant (True) if no more than one out of four conditions is violated
    ro5_fulfilled = sum(conditions) >= 3
    # Return the calculated properties together with the compliance flag
    return pd.Series(
        [molecular_weight, n_hba, n_hbd, logp, ro5_fulfilled],
        index=["molecular_weight", "n_hba", "n_hbd", "logp", "ro5_fulfilled"],
    )

Read the data from part 1

In [4]:
molecules = pd.read_csv(DATA/"BACE_compounds.csv", index_col=0)
print(molecules.shape)
molecules.head()

(6691, 5)


Unnamed: 0,molecule_chembl_id,IC50,units,smiles,pIC50
0,CHEMBL3969403,0.0002,nM,CC1(C)C(N)=N[C@](C)(c2cc(NC(=O)c3ccc(C#N)cn3)c...,12.69897
1,CHEMBL3937515,0.0009,nM,COc1cnc(C(=O)Nc2ccc(F)c([C@]3(C)CS(=O)(=O)C(C)...,12.045757
2,CHEMBL3949213,0.001,nM,C[C@@]1(c2cc(NC(=O)c3ccc(C#N)cn3)ccc2F)CS(=O)(...,12.0
3,CHEMBL3955051,0.0018,nM,CC1(C)C(N)=N[C@](C)(c2cc(NC(=O)c3cnc(C(F)F)cn3...,11.744727
4,CHEMBL3936264,0.0057,nM,C[C@@]1(c2cc(NC(=O)c3ccc(OC(F)F)cn3)ccc2F)CS(=...,11.244125


Apply the function to all molecules in the data set to mark whether they fulfill the Ro5.

In [5]:
ro5_properties = molecules["smiles"].apply(calculate_ro5_properties)
ro5_properties.head()

Unnamed: 0,molecular_weight,n_hba,n_hbd,logp,ro5_fulfilled
0,429.127089,7,2,2.12408,True
1,435.137653,8,2,1.656,True
2,455.142739,7,2,2.65828,True
3,455.123895,7,2,2.585,True
4,442.092261,7,2,2.0752,True


In [6]:
molecules = pd.concat([molecules, ro5_properties], axis=1) # Combine data to full dataset with all columns
molecules.head() # Show first 5 rows

Unnamed: 0,molecule_chembl_id,IC50,units,smiles,pIC50,molecular_weight,n_hba,n_hbd,logp,ro5_fulfilled
0,CHEMBL3969403,0.0002,nM,CC1(C)C(N)=N[C@](C)(c2cc(NC(=O)c3ccc(C#N)cn3)c...,12.69897,429.127089,7,2,2.12408,True
1,CHEMBL3937515,0.0009,nM,COc1cnc(C(=O)Nc2ccc(F)c([C@]3(C)CS(=O)(=O)C(C)...,12.045757,435.137653,8,2,1.656,True
2,CHEMBL3949213,0.001,nM,C[C@@]1(c2cc(NC(=O)c3ccc(C#N)cn3)ccc2F)CS(=O)(...,12.0,455.142739,7,2,2.65828,True
3,CHEMBL3955051,0.0018,nM,CC1(C)C(N)=N[C@](C)(c2cc(NC(=O)c3cnc(C(F)F)cn3...,11.744727,455.123895,7,2,2.585,True
4,CHEMBL3936264,0.0057,nM,C[C@@]1(c2cc(NC(=O)c3ccc(OC(F)F)cn3)ccc2F)CS(=...,11.244125,442.092261,7,2,2.0752,True


Separate the compounds that violate the Ro5 from those that fulfill it, and count how many compounds fall into each group.

In [7]:
molecules_ro5_fulfilled = molecules[molecules["ro5_fulfilled"]]
molecules_ro5_violated = molecules[~molecules["ro5_fulfilled"]]

print(f"# compounds in unfiltered data set: {molecules.shape[0]}")
print(f"# compounds in filtered data set: {molecules_ro5_fulfilled.shape[0]}")
print(f"# compounds not compliant with the Ro5: {molecules_ro5_violated.shape[0]}")

# compounds in unfiltered data set: 6691
# compounds in filtered data set: 5921
# compounds not compliant with the Ro5: 770


So 770 (11.5%) of the compounds violate more than one of the Ro5 criteria, which suggests that these compounds may not be orally bioavailable. However, the rules do not describe oral bioavailability directly, so they should be viewed only as general rules of thumb. For this reason, the compounds that violate the Ro5 are not removed from the data set but only marked. Future research can examine those compounds manually.
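The percentage quoted above follows directly from the printed counts. A short sketch of the arithmetic, with the two counts copied from the output of the previous cell (the variable names are illustrative):

```python
n_total = 6691     # compounds in the unfiltered data set
n_violated = 770   # compounds not compliant with the Ro5
n_fulfilled = n_total - n_violated

# Share of compounds that violate more than one Ro5 criterion
fraction_violated = n_violated / n_total
print(f"{n_fulfilled} compliant, {n_violated} non-compliant ({fraction_violated:.1%})")
```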

Save the molecules to a CSV file, including the column that marks whether the Ro5 is fulfilled.

In [8]:
molecules.to_csv(DATA/"BACE_compounds_lipinski.csv")
molecules.head()

Unnamed: 0,molecule_chembl_id,IC50,units,smiles,pIC50,molecular_weight,n_hba,n_hbd,logp,ro5_fulfilled
0,CHEMBL3969403,0.0002,nM,CC1(C)C(N)=N[C@](C)(c2cc(NC(=O)c3ccc(C#N)cn3)c...,12.69897,429.127089,7,2,2.12408,True
1,CHEMBL3937515,0.0009,nM,COc1cnc(C(=O)Nc2ccc(F)c([C@]3(C)CS(=O)(=O)C(C)...,12.045757,435.137653,8,2,1.656,True
2,CHEMBL3949213,0.001,nM,C[C@@]1(c2cc(NC(=O)c3ccc(C#N)cn3)ccc2F)CS(=O)(...,12.0,455.142739,7,2,2.65828,True
3,CHEMBL3955051,0.0018,nM,CC1(C)C(N)=N[C@](C)(c2cc(NC(=O)c3cnc(C(F)F)cn3...,11.744727,455.123895,7,2,2.585,True
4,CHEMBL3936264,0.0057,nM,C[C@@]1(c2cc(NC(=O)c3ccc(OC(F)F)cn3)ccc2F)CS(=...,11.244125,442.092261,7,2,2.0752,True


Besides oral bioavailability, it is also important that a drug does not have unwanted side effects. This will be examined in the next part.