# Calculate NIBR filters

If you work in drug design, you will often end up designing molecules that contain unwanted substructures, especially when using generative algorithms.

Fortunately, RDKit ships the NIBR (Novartis Institutes for BioMedical Research) substructure filters in its Contrib directory, which let you flag and filter out such compounds.

This notebook is inspired by, and adapts code from, https://github.com/rdkit/rdkit/tree/master/Contrib/NIBRSubstructureFilters and
https://iwatobipen.wordpress.com/2021/03/20/novartiss-molecular-filter-for-hit-triage-chemoinformatics-rdkit/

In [7]:
# Import the required libraries:

import os
import sys
import pandas as pd
from rdkit.Chem import Draw
from rdkit.Chem import RDConfig
from rdkit.Chem import FilterCatalog
from rdkit.Chem import PandasTools
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor
from IPython import display

# Prefer CoordGen when generating 2D depictions
rdDepictor.SetPreferCoordGen(True)

# The NIBR filter code lives in RDKit's Contrib directory; add it to the import path
basedir = os.path.join(RDConfig.RDContribDir, 'NIBRSubstructureFilters')
sys.path.append(basedir)

import assignSubstructureFilters

In [8]:
# Now define this function, which builds the filter catalog from the CSV file in basedir:

def buildFilterCatalog():
    # Load the NIBR filter definitions shipped with RDKit Contrib
    inhousefilter = pd.read_csv(f'{basedir}/SubstructureFilter_HitTriaging_wPubChemExamples.csv')
    inhouseFiltersCat = FilterCatalog.FilterCatalog()
    for i in range(inhousefilter.shape[0]):
        # Default to a minimum match count of 1 when MIN_COUNT is 0
        mincount = 1
        if inhousefilter['MIN_COUNT'][i] != 0:
            mincount = int(inhousefilter['MIN_COUNT'][i])
        pname = inhousefilter['PATTERN_NAME'][i]
        sname = inhousefilter['SET_NAME'][i]
        # Pack the min count, severity score, covalent flag and special-molecule flag into the entry name
        pname_final = '{0}_min({1})__{2}__{3}__{4}'.format(pname, mincount,
                                                           inhousefilter['SEVERITY_SCORE'][i],
                                                           inhousefilter['COVALENT'][i],
                                                           inhousefilter['SPECIAL_MOL'][i])
        fil = FilterCatalog.SmartsMatcher(pname_final, inhousefilter['SMARTS'][i], mincount)
        inhouseFiltersCat.AddEntry(FilterCatalog.FilterCatalogEntry(pname_final, fil))
        # Record which filter set the pattern belongs to
        inhouseFiltersCat.GetEntry(i).SetProp('Scope', sname)
    return inhouseFiltersCat
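
Each catalog entry name built above packs five fields into one string: the pattern name, the minimum count, the severity score, the covalent flag, and the special-molecule flag. The sketch below is not part of the RDKit module. It uses hypothetical helper names (`encode_entry_name`, `decode_entry_name`) and made-up field values to illustrate how a name in this format could be split back into its parts.

```python
import re

def encode_entry_name(pname, mincount, severity, covalent, special_mol):
    # Same format string as in buildFilterCatalog above
    return '{0}_min({1})__{2}__{3}__{4}'.format(pname, mincount, severity, covalent, special_mol)

def decode_entry_name(name):
    # Split from the right so a pattern name containing '__' stays intact
    head, severity, covalent, special_mol = name.rsplit('__', 3)
    match = re.fullmatch(r'(.*)_min\((\d+)\)', head)
    if match is None:
        raise ValueError(f'unexpected entry name: {name!r}')
    return {
        'pattern_name': match.group(1),
        'min_count': int(match.group(2)),
        # Left as strings: the CSV values are formatted as-is into the name
        'severity_score': severity,
        'covalent': covalent,
        'special_mol': special_mol,
    }

# Hypothetical values, for illustration only
example_name = encode_entry_name('example_pattern', 1, 10, 0, 0)
decoded = decode_entry_name(example_name)
print(example_name)
print(decoded)
```

Splitting from the right is a deliberate choice here: the three trailing fields are always present, whereas the pattern name is free text.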

In [9]:
# Import a sample dataset containing a SMILES column:

df = pd.read_csv("data/DA_list.csv")
print(f"The dataframe consists of {df.shape[0]} molecules")
df.head(3)

The dataframe consists of 19 molecules

| Unnamed: 0 | Smiles | Energy (-315001) | Vector |
|---|---|---|---|
| 0 | c1(C)c(COS(=O)(=O)NC)c(c2cc(COC(F)(F)F)cc3cNcc... | -315180 | [5, 27, 15, 9, 11, 4, 4, 4] |
| 1 | c1(C(F)(F)F)c(COS(=O)(=O)NC)c(c2cc(COCF)cc(O)c... | -315177 | [5, 17, 15, 15, 7, 4, 4, 5] |
| 2 | c1(N(C)C)c(COS(=O)(=O)NC)c(C2C(CC(=O)CC(F)(F)F... | -315172 | [5, 13, 11, 6, 11, 4, 4, 9] |

In [10]:
# Override the module's catalog builder with the function above, score each SMILES, and merge the results with the dataframe above:

assignSubstructureFilters.buildFilterCatalog = buildFilterCatalog

res = assignSubstructureFilters.assignFilters(data=df, nameSmilesColumn='Smiles')

dfres = pd.DataFrame.from_records(res, columns=assignSubstructureFilters.FilterMatch._fields)

alldata = df.merge(dfres, how='left', left_index=True, right_index=True)
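
The first line of the cell above replaces a function inside an already imported module. This presumably is needed because the module's own `buildFilterCatalog` looks for the CSV file relative to the working directory; that is an assumption, not something verified here. The toy sketch below uses a hypothetical stand-in module (`toy_module`) rather than RDKit to show why the override takes effect: a function looks up module-level names when it is called, not when it is defined.

```python
import types

# Hypothetical stand-in for an imported module such as assignSubstructureFilters
toy_module = types.ModuleType('toy_module')
source = '''
def build_catalog():
    return "default catalog"

def assign():
    # build_catalog is resolved in the module's namespace at call time
    return build_catalog()
'''
exec(source, toy_module.__dict__)

before = toy_module.assign()

def custom_build_catalog():
    return "custom catalog"

# Same pattern as: assignSubstructureFilters.buildFilterCatalog = buildFilterCatalog
toy_module.build_catalog = custom_build_catalog

after = toy_module.assign()
print(before, '->', after)
```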

In [11]:
# Now let's look at the output:

alldata.head(3)

| Unnamed: 0 | Smiles | Energy (-315001) | Vector | SubstructureMatches | Min_N_O_filter | Frac_N_O | Covalent | SpecialMol | SeverityScore |
|---|---|---|---|---|---|---|---|---|---|
| 0 | c1(C)c(COS(=O)(=O)NC)c(c2cc(COC(F)(F)F)cc3cNcc... | -315180 | [5, 27, 15, 9, 11, 4, 4, 4] | NIBR_Screeningdeck_2019_SO3_groups_min(1) | no match | 0.242424 | 0 | 0 | 10 |
| 1 | c1(C(F)(F)F)c(COS(=O)(=O)NC)c(c2cc(COCF)cc(O)c... | -315177 | [5, 17, 15, 15, 7, 4, 4, 5] | NIBR_Screeningdeck_2019_alpha_halo_heteroatom_... | no match | 0.25 | 0 | 0 | 10 |
| 2 | c1(N(C)C)c(COS(=O)(=O)NC)c(C2C(CC(=O)CC(F)(F)F... | -315172 | [5, 13, 11, 6, 11, 4, 4, 9] | NIBR_Screeningdeck_2019_SO3_groups_min(1) | no match | 0.28125 | 0 | 0 | 10 |
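
The merge used to build `alldata` joins the two tables on their row index. That only lines up correctly if the input dataframe has the default 0..n-1 index, assuming the filter results come back as one record per input row in input order (an assumption about `assignFilters`, not checked here). The toy sketch below uses made-up data and hypothetical names (`df_toy`, `res_toy`) to show what happens otherwise and how resetting the index avoids it.

```python
import pandas as pd

# Toy input whose index is no longer 0..n-1, e.g. after filtering rows
df_toy = pd.DataFrame({'Smiles': ['CCO', 'CCN', 'CCC']}, index=[10, 11, 12])

# Stand-in for the per-row results, which get a fresh 0..n-1 index
res_toy = pd.DataFrame({'SeverityScore': [0, 3, 10]})

# Joining on the index finds no partners for rows 10, 11, 12
misaligned = df_toy.merge(res_toy, how='left', left_index=True, right_index=True)

# Resetting the index first restores the row-by-row pairing
aligned = df_toy.reset_index(drop=True).merge(
    res_toy, how='left', left_index=True, right_index=True)

print(misaligned)
print(aligned)
```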


#### This is how to interpret the computed scores:

__SubstructureMatches:__ the names of all filters that match the compound

__Min_N_O_filter:__ tests if no nitrogen or oxygen atom is contained in the molecule

__Frac_N_O:__ fraction of nitrogen and oxygen atoms compared to all heavy atoms

__Covalent:__ number of potentially covalent motifs contained in the compound

__SpecialMol:__ whether the compound, or part of it, belongs to a special class of molecules such as peptides, glycosides, fatty acids, ...

__SeverityScore:__ 0: the compound raises no flags, but might have annotations; 1-9: the number of flags the compound raises; >= 10: exclusion criterion for the newly designed NIBR screening deck
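
The three SeverityScore ranges described above can be turned into a simple triage label. This is a minimal sketch, not part of RDKit: the function name `severity_band` and the label strings are hypothetical, and it assumes scores are non-negative integers.

```python
def severity_band(score):
    """Map a SeverityScore to one of three bands (labels are illustrative)."""
    if score < 0:
        raise ValueError('SeverityScore is expected to be non-negative')
    if score == 0:
        return 'no flags'   # may still carry annotations
    if score < 10:
        return 'flagged'    # 1-9: number of flags raised
    return 'excluded'       # >= 10: exclusion criterion

bands = [severity_band(s) for s in (0, 4, 10)]
print(bands)
```

Applied to the merged dataframe, a call such as `alldata['SeverityScore'].map(severity_band)` would add the label as a new column.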