# **iSIM**

***
Miranda-Quintana Group, Department of Chemistry, University of Florida

***
Please cite the original iSIM paper:

***
This notebook shows how to generate simulated datasets of binary and real-value fingerprints and how to calculate iSIM values for them.

In [1]:
from isim_comp import calculate_isim
from isim_real import calculate_isim_real, pairwise_average_real
from isim_utils import pairwise_average
import numpy as np
import random

## Binary fingerprints
Please select the number of datasets, the range for the number of fingerprints in each set, and the range for the number of bits in each fingerprint. Beware that, for comparison, the average of all pairwise similarities is also computed, and this scales quadratically with the number of fingerprints. If a large number of datasets (or very large sets) is simulated, the computing time will be long.
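The difference in cost between the two calculations can be sketched as follows. This is a minimal illustration and not the code in `isim_comp` or `isim_utils`: the names `pairwise_jt_sketch` and `isim_jt_sketch` are hypothetical, and the column-sum expression is an assumption about how a Jaccard-Tanimoto (JT) value can be formed from per-bit counts. The sketch also assumes every pair of fingerprints has at least one on bit.

```python
import numpy as np
from itertools import combinations


def pairwise_jt_sketch(fps):
    """Average JT similarity over all pairs of rows: N(N-1)/2 comparisons."""
    fps = np.asarray(fps, dtype=bool)
    sims = []
    for i, j in combinations(range(len(fps)), 2):
        both = np.sum(fps[i] & fps[j])    # bits on in both fingerprints
        either = np.sum(fps[i] | fps[j])  # bits on in at least one fingerprint
        sims.append(both / either)
    return float(np.mean(sims))


def isim_jt_sketch(fps):
    """JT-style value from column sums only: one pass over the matrix."""
    fps = np.asarray(fps, dtype=bool)
    n = fps.shape[0]
    k = fps.sum(axis=0)                 # number of on bits in each column
    shared = np.sum(k * (k - 1) / 2)    # on-on coincidences, summed over columns
    mismatch = np.sum(k * (n - k))      # on-off mismatches, summed over columns
    return float(shared / (shared + mismatch))


# Hypothetical usage on one random dataset
rng = np.random.default_rng(0)
fps = rng.integers(0, 2, size=(50, 500))
print('column-sum value:', isim_jt_sketch(fps))
print('pairwise average:', pairwise_jt_sketch(fps))
```

The first function loops over every pair of fingerprints, while the second only needs the per-column counts, which is why the pairwise reference dominates the running time of the cell below.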

In [3]:
# Number of datasets to simulate
n_datasets = 100 #---> in the original paper, we used 10000 datasets

# Similarity index to use
n_ary = 'JT'

# Initialize lists to store iSIM and pairwise values
isim_values = []
pair_values = []

for rep in range(n_datasets):

    # Generate random binary fingerprints
    fp_total = random.randint(10, 100) #---> in the original paper, we used 100-1000 fingerprints
    fp_size = random.randint(166, 2048) #---> random.randint includes both endpoints; in the original paper, we used fingerprints of 166-2048 bits
    bias = np.random.uniform(0.01, 1) #---> probability of an off bit; varying it covers the whole range of possible similarity values
    total_fingerprints = np.random.choice([0, 1], size=(fp_total, fp_size), p=[bias, 1 - bias])

    # Append values 
    isim_values.append(calculate_isim(total_fingerprints, n_ary = n_ary))
    pair_values.append(pairwise_average(total_fingerprints, n_ary = n_ary))


# Print results of the comparison of the iSIM and pairwise values
print('R2:', np.corrcoef(isim_values, pair_values)[0,1]**2)
print('MAE:', np.mean(np.abs(np.array(isim_values) - np.array(pair_values))))
print('RMSE:', np.sqrt(np.mean((np.array(isim_values) - np.array(pair_values))**2)))


R2: 0.9999997890555186
MAE: 4.077067238988768e-05
RMSE: 0.00013070820419047868
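To see why drawing `bias` at random spreads the datasets over the similarity range, the expected JT value for independent random bits can be estimated from the on-bit probability `1 - bias`. The snippet below is a rough sketch under that independence assumption; `expected_jt` is a hypothetical helper, not part of the iSIM modules, and it approximates the average by a ratio of expectations.

```python
def expected_jt(p_on):
    """Approximate JT similarity when each bit is on independently with probability p_on."""
    both = p_on ** 2               # probability a bit is on in both fingerprints of a pair
    one = 2 * p_on * (1 - p_on)    # probability a bit is on in exactly one of them
    return both / (both + one)     # simplifies to p_on / (2 - p_on)


# bias is the probability of an off bit, as in p=[bias, 1 - bias] above
for bias in (0.01, 0.25, 0.5, 0.75, 0.99):
    p_on = 1 - bias
    print(f"bias={bias:.2f}  expected JT ~ {expected_jt(p_on):.3f}")
```

Under this approximation, a bias close to 0.01 gives similarities near 0.98 and a bias close to 1 gives similarities near 0, so uniform sampling of the bias covers nearly the full interval.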


## Real-value fingerprints

In [4]:
# Number of datasets to simulate
n_datasets = 100 #---> in the original paper, we used 10000 datasets

# Similarity index to use
n_ary = 'JT'

# Initialize lists to store iSIM and pairwise values
isim_values = []
pair_values = []

# Generate random real-value fingerprints and compute iSIM and pairwise values
for rep in range(n_datasets):

    # Random number of fingerprints and size
    fp_total = random.randint(10, 100) # ---> in the original paper, we used 100-1000 fingerprints
    fp_size = random.randint(166, 2048) # ---> random.randint includes both endpoints; in the original paper, we used fingerprints of 166-2048 bits

    # Random matrix generation
    total_fingerprints = np.random.random((fp_total, fp_size))

    # Append values
    isim_values.append(calculate_isim_real(total_fingerprints, n_ary=n_ary))
    pair_values.append(pairwise_average_real(total_fingerprints, n_ary = n_ary))

# Print results of the comparison of the iSIM and pairwise values
print('R2:', np.corrcoef(isim_values, pair_values)[0,1]**2)
print('MAE:', np.mean(np.abs(np.array(isim_values) - np.array(pair_values))))
print('RMSE:', np.sqrt(np.mean((np.array(isim_values) - np.array(pair_values))**2)))

R2: 0.9997910836814017
MAE: 2.2956587628044645e-05
RMSE: 3.48212293794904e-05
