## Fingerprint File
This file calculates Morgan fingerprints with RDKit and loads optimized fingerprints from Chemprop.

Chemprop reference: Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, and Regina Barzilay, Analyzing Learned Molecular Representations for Property Prediction, Journal of Chemical Information and Modeling 2019 59 (8), 3370-3388.
DOI: 10.1021/acs.jcim.9b00237

## Calculate fingerprints from RDKit

In [11]:
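import numpy as np
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def get_rdkit_fingerprint(df):
    """Return a DataFrame of hashed Morgan count fingerprints, one row per SMILES in df."""
    MORGAN_RADIUS = 2
    MORGAN_NUM_BITS = 2048
    features = np.zeros((1,))  # resized in place by ConvertToNumpyArray
    fingerprints = np.zeros([df.shape[0], MORGAN_NUM_BITS])
    # Iterate by position so a non-default DataFrame index does not break the lookup.
    for i, smiles in enumerate(df['smiles']):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f'Could not parse SMILES at row {i}: {smiles!r}')
        # Count-based fingerprint; swap in the commented line for a binary bit vector.
        features_vec = AllChem.GetHashedMorganFingerprint(mol, MORGAN_RADIUS, nBits=MORGAN_NUM_BITS)
        #features_vec = AllChem.GetMorganFingerprintAsBitVect(mol, MORGAN_RADIUS, nBits=MORGAN_NUM_BITS)
        DataStructs.ConvertToNumpyArray(features_vec, features)
        fingerprints[i, :] = features
    return pd.DataFrame(fingerprints)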
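
The cell above can produce either a count-based or a binary Morgan fingerprint, depending on which RDKit call is active. The following is a minimal sketch of that choice on a toy input, not the notebook's own code: the helper `morgan_matrix`, its `use_counts` flag, and the two example SMILES are assumptions made for illustration. Recent RDKit releases may print a deprecation warning for these calls.

```python
import numpy as np
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_matrix(smiles_list, radius=2, n_bits=2048, use_counts=True):
    """Hypothetical helper: stack one Morgan fingerprint per SMILES into a 2-D array."""
    rows = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"Could not parse SMILES: {smiles!r}")
        if use_counts:
            # Each position holds how many times a substructure hashed to it.
            fp = AllChem.GetHashedMorganFingerprint(mol, radius, nBits=n_bits)
        else:
            # Each position only records presence (1) or absence (0).
            fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        arr = np.zeros((n_bits,))
        DataStructs.ConvertToNumpyArray(fp, arr)
        rows.append(arr)
    return np.vstack(rows)

# Toy input (assumed): ethanol and phenol.
toy_df = pd.DataFrame({"smiles": ["CCO", "c1ccccc1O"]})
counts = morgan_matrix(toy_df["smiles"], use_counts=True)
bits = morgan_matrix(toy_df["smiles"], use_counts=False)
print(counts.shape, bits.shape)
print(counts.max(), bits.max())
```

Both matrices have one row per molecule and `n_bits` columns. The count variant can hold values above 1 when a substructure repeats within a molecule, which matters for models that are sensitive to feature scale.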

## Load optimized feature vector from Chemprop
Chemprop fingerprints are the learned feature vectors produced by Chemprop's message passing network. Here they are obtained by transfer learning on the QM9 dataset, which is described in the following references.

L. Ruddigkeit, R. van Deursen, L. C. Blum, J.-L. Reymond, Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17, J. Chem. Inf. Model. 52, 2864–2875, 2012.

R. Ramakrishnan, P. O. Dral, M. Rupp, O. A. von Lilienfeld, Quantum chemistry structures and properties of 134 kilo molecules, Scientific Data 1, 140022, 2014.

In [10]:
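import pandas as pd

def get_cp_fingerprint():
    """Load the precomputed Chemprop feature vectors, one row per molecule."""
    cp_input = pd.read_csv('./Data/ohfeatures.csv')
    # Drop the SMILES identifier column and keep only the feature columns.
    cp_features = cp_input.loc[:, cp_input.columns != 'smiles']
    return cp_features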
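
To show what the column filter in `get_cp_fingerprint` does, here is a small sketch that runs the same selection on an in-memory CSV. The file layout is an assumption: the real `ohfeatures.csv` is not shown here, so the column names `fp_0` to `fp_2` and the values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for ./Data/ohfeatures.csv:
# a 'smiles' column followed by learned feature columns.
csv_text = """smiles,fp_0,fp_1,fp_2
CCO,0.12,0.00,1.53
c1ccccc1O,0.48,0.91,0.07
"""

cp_input = pd.read_csv(io.StringIO(csv_text))
# Keep every column except the SMILES identifier.
cp_features = cp_input.loc[:, cp_input.columns != "smiles"]
print(cp_features.shape)           # (2, 3)
print(list(cp_features.columns))   # ['fp_0', 'fp_1', 'fp_2']
```

Because the filter is a boolean mask rather than a `drop` call, it returns all columns unchanged if no `smiles` column is present instead of raising an error.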