# Lecture 6: Exercises and Activities

## **Activity 2a:** 
Compute the Tanimoto similarity scores between the seven compounds used in this section, using the PubChem fingerprints.

- Download the PubChem Fingerprint for the seven CIDs.
- Convert the downloaded fingerprints into bit vectors.
- Compute the pair-wise Tanimoto scores using the bit vectors.
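
For reference, the Tanimoto score of two bit vectors is the number of bits set in both divided by the number of bits set in either. The following is a minimal sketch in plain Python that works on strings of `0` and `1`. The function name `tanimoto` and the toy inputs are illustrative and not part of the activity, which uses RDKit for this step.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two equal-length strings of '0' and '1'."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints (intersection).
    both = sum(1 for a, b in zip(bits_a, bits_b) if a == "1" and b == "1")
    # Bits set in at least one fingerprint (union).
    either = sum(1 for a, b in zip(bits_a, bits_b) if a == "1" or b == "1")
    # Assumption of this sketch: two all-zero fingerprints score 0.0.
    return both / either if either else 0.0

print(tanimoto("110100", "100110"))  # 2 shared bits / 4 bits set in either
```

Identical fingerprints score 1.0, and fingerprints with no set bits in common score 0.0.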

In [1]:
cids = [    54454,  # Simvastatin (Zocor)
            54687,  # Pravastatin (Pravachol)
            60823,  # Atorvastatin (Lipitor)
           446155,  # Fluvastatin (Lescol)   
           446157,  # Rosuvastatin (Crestor)
          5282452,  # Pitavastatin (Livalo)
         97938126 ] # Lovastatin (Altoprev)

In [40]:
from base64 import b64decode
from rdkit import DataStructs, Chem
import requests
import time

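def PCFP_BitString(pcfp_base64):
    # Decode the Base64 string and write each byte as 8 binary digits.
    # Drop the first 32 bits (fingerprint length prefix) and the 7 trailing
    # padding bits, leaving the 881-bit PubChem substructure fingerprint.
    pcfp_bitstring = "".join(["{:08b}".format(x) for x in b64decode(pcfp_base64)])[32:913]
    return pcfp_bitstring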
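
To see why the slice is `[32:913]`, it can help to inspect the first few decoded bytes. The sketch below decodes only the first eight Base64 characters of the fingerprint for CID 54454, as printed later in this activity. The variable names are illustrative.

```python
from base64 import b64decode

# First eight Base64 characters of the fingerprint for CID 54454.
head = b64decode("AAADcfB4")

# The first four bytes store the fingerprint length as a big-endian integer.
n_bits = int.from_bytes(head[:4], "big")
print(n_bits)

# The fingerprint bits start right after that 32-bit prefix.
first_bits = "".join("{:08b}".format(x) for x in head[4:])
print(first_bits)
```

The prefix decodes to 881, the fingerprint length in bits. The fingerprint is stored in whole bytes (888 bits), so the last 7 bits are padding and the slice ends at 32 + 881 = 913.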

Retrieve the Base64-encoded PubChem substructure fingerprint for each CID in the list:

In [104]:
prolog = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

str_cid = ",".join([ str(x) for x in cids])

url = prolog + "/compound/cid/" + str_cid + "/property/Fingerprint2D/txt"
res = requests.get(url)
pcfps = res.text.split()
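
The later cells match `cids` to `pcfps` by position, so a response with fewer lines than requested CIDs would misalign the output without raising an error. One way to guard against that is sketched below. `pair_cids_with_fingerprints` is a hypothetical helper, the sample response text is made up for illustration, and the sketch assumes the service returns one fingerprint per line in the order the CIDs were requested.

```python
def pair_cids_with_fingerprints(cids, response_text):
    """Map each CID to its fingerprint, assuming one fingerprint per line in request order."""
    fingerprints = response_text.split()
    if len(fingerprints) != len(cids):
        raise ValueError(
            "expected {} fingerprints, got {}".format(len(cids), len(fingerprints))
        )
    return dict(zip(cids, fingerprints))

# Placeholder strings stand in for real Base64 fingerprints.
sample = pair_cids_with_fingerprints([54454, 54687], "FP_ONE\nFP_TWO\n")
print(sample)
```

In the notebook, the call would be `pair_cids_with_fingerprints(cids, res.text)`.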


Convert the downloaded fingerprints into bit vectors:


In [68]:
bitvect = [DataStructs.CreateFromBitString(PCFP_BitString(x)) for x in pcfps]

In [70]:
# zip() pairs each CID with its fingerprint, so the loop
# walks both lists in parallel:
for x,i in zip(cids, pcfps):
    print(x, ":", i, '\n', PCFP_BitString(i))

54454 : AAADcfB4OAAAAAAAAAAAAAAAAAAAAAAAAAAkQIAAAAAAAACAAAAAGgAACAAADxSggAICCAAABgCIAiDSCAAAAAAgAAAICAEAAAgIEBYAAQACQAAF4AAIgAOIzPDPgAAAAAAAAAAAAAAAAAAAAAAAAAAAAA== 
 11110000011110000011100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000100000010000000000000000000000000000000000000000000000000000000100000000000000000000000000000000001101000000000000000000000100000000000000000000000111100010100101000001000000000000010000000100000100000000000000000000000011000000000100010000000001000100000110100100000100000000000000000000000000000000000001000000000000000000000000010000000100000000001000000000000000000001000000010000001000000010110000000000000000100000000000000100100000000000000000001011110000000000000000010001000000000000011100010001100110011110000110011111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

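Compute the pair-wise similarity scores from the bit vectors. `DataStructs.FingerprintSimilarity` uses the Tanimoto metric by default, so both listings report the same scores; the second prints only pairs scoring at least 0.55: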

In [103]:
print('Fingerprint Similarity Scores:', '\n##############################\n')
for i in range(0, len(bitvect)) :
    for j in range(i+1, len(bitvect)) :
        
        score = DataStructs.FingerprintSimilarity(bitvect[i], bitvect[j])
        
        print(cids[i], "vs.", cids[j], ":", round(score,3), end='')
        
        if ( score >= 0.85 ):
            print(" ****")
        elif ( score >= 0.75 ):
            print(" ***")
        elif ( score >= 0.65 ):
            print(" **")
        elif ( score >= 0.55 ):
            print(" *")
        else:
            print(" ")
            
print('\nTanimoto Similarity Scores:', '\n###########################\n')
for i in range(0, len(bitvect)) :
    for j in range(i+1, len(bitvect)) :
        
        score = DataStructs.TanimotoSimilarity(bitvect[i], bitvect[j])
        if score >= .55:
            print(cids[i], "vs.", cids[j], ":", round(score,3))
            

Fingerprint Similarity Scores: 
##############################

54454 vs. 54687 : 0.897 ****
54454 vs. 60823 : 0.392 
54454 vs. 446155 : 0.388 
54454 vs. 446157 : 0.387 
54454 vs. 5282452 : 0.424 
54454 vs. 97938126 : 0.864 ****
54687 vs. 60823 : 0.397 
54687 vs. 446155 : 0.425 
54687 vs. 446157 : 0.416 
54687 vs. 5282452 : 0.446 
54687 vs. 97938126 : 0.813 ***
60823 vs. 446155 : 0.793 ***
60823 vs. 446157 : 0.667 **
60823 vs. 5282452 : 0.74 **
60823 vs. 97938126 : 0.377 
446155 vs. 446157 : 0.722 **
446155 vs. 5282452 : 0.868 ****
446155 vs. 97938126 : 0.372 
446157 vs. 5282452 : 0.741 **
446157 vs. 97938126 : 0.372 
5282452 vs. 97938126 : 0.407 

Tanimoto Similarity Scores: 
###########################

54454 vs. 54687 : 0.897
54454 vs. 97938126 : 0.864
54687 vs. 97938126 : 0.813
60823 vs. 446155 : 0.793
60823 vs. 446157 : 0.667
60823 vs. 5282452 : 0.74
446155 vs. 446157 : 0.722
446155 vs. 5282452 : 0.868
446157 vs. 5282452 : 0.741
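
The nested index loops above visit each unordered pair once. The same traversal can be sketched with `itertools.combinations`. `pairwise_scores` and the toy similarity function are illustrative stand-ins. In the activity, the `similarity` argument would be `DataStructs.TanimotoSimilarity` applied to the RDKit bit vectors.

```python
from itertools import combinations

def pairwise_scores(labels, items, similarity):
    """Return {(label_i, label_j): score} for every unordered pair, i before j."""
    scores = {}
    for (la, a), (lb, b) in combinations(zip(labels, items), 2):
        scores[(la, lb)] = similarity(a, b)
    return scores

# Toy similarity on non-empty sets of "on" bit positions
# (Jaccard index, which equals Tanimoto for binary fingerprints).
def jaccard(a, b):
    return len(a & b) / len(a | b)

toy = pairwise_scores(["x", "y", "z"], [{1, 2}, {2, 3}, {1, 2}], jaccard)
for pair, score in toy.items():
    print(pair, round(score, 3))
```

Keeping the scores in a dictionary keyed by CID pair makes it straightforward to filter by threshold or sort by score afterwards.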
