# Combine Results
Author: Doug Klink (dklink@stanford.edu)

Here we combine the results generated by KNN, Random Forest, and SVM. We create a final output table containing the score each method gave to each compound, plus a "combined_acvalue" column: the weighted average of the three methods' scores, where each weight is the inverse of the RMSE that method achieved in its regression validation. Because the three tables are joined with inner merges on the compound name, only compounds scored by all three methods appear in the output.
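
As a minimal sketch of the inverse-RMSE weighting described above, the snippet below combines three scores for a single compound. The scores and RMSE values are made up for illustration, and the helper name `inverse_rmse_average` is hypothetical; it is not part of this notebook.

```python
import numpy as np

def inverse_rmse_average(scores, rmses):
    """Weighted average of scores, each weighted by 1/RMSE of its method."""
    weights = 1.0 / np.asarray(rmses, dtype=float)
    return np.average(np.asarray(scores, dtype=float), weights=weights, axis=0)

# Toy values (not from the notebook): three methods' scores for one compound.
toy_scores = [-2.0, -4.0, -3.0]
toy_rmses = [0.5, 1.0, 0.25]  # the weights become 2, 1 and 4
toy_combined = inverse_rmse_average(toy_scores, toy_rmses)
print(toy_combined)
```

The method with the lowest RMSE pulls the combined value towards its own score, which is the intended effect of the weighting.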

In [1]:
import pandas as pd
import numpy as np

In [2]:
knn = pd.read_csv('../results/knn_results.csv', index_col=0).reset_index(drop=True)
rf = pd.read_csv('../results/random_forest_results.csv', index_col=0).reset_index(drop=True)
svm = pd.read_csv('../results/svm_screening_results_no_duplicate_names.csv', index_col=0).reset_index(drop=True)

In [3]:
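# Each results file carries its method's validation RMSE in the 'RMSE' column; read it from the first row.
knn_RMSE = knn['RMSE'].iloc[0]
rf_RMSE = rf['RMSE'].iloc[0]
svm_RMSE = svm['RMSE'].iloc[0]
print('Method          RMSE     Weight (1/RMSE)')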
print(f'knn            {knn_RMSE: .2f}    {1/knn_RMSE: .2f}')
print(f'random forest  {rf_RMSE: .2f}    {1/rf_RMSE: .2f}')
print(f'svm            {svm_RMSE: .2f}    {1/svm_RMSE: .2f}')

Method          RMSE     Weight (1/RMSE)
knn             0.72     1.40
random forest   0.35     2.87
svm             0.75     1.33


In [4]:
knn.drop(columns=['source', 'RMSE'], inplace=True)
rf.drop(columns=['smiles', 'RMSE'], inplace=True)
svm.drop(columns=['source', 'RMSE'], inplace=True)

In [5]:
knn = knn.rename(columns={'predicted_acvalue(log10)': 'knn_acvalue'})
rf = rf.rename(columns={'predicted_activity(log10)': 'random_forest_acvalue'})
svm = svm.rename(columns={'pred_value': 'svm_acvalue'})

In [6]:
rf['random_forest_acvalue'] = -rf.random_forest_acvalue
svm['name'] = svm.name.str.upper()

In [7]:
combined = knn.merge(svm, on='name', how='inner').merge(rf, on='name', how='inner')
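
The choice of `how='inner'` determines which compounds survive the merge. The sketch below uses two toy tables (hypothetical names and scores, not this notebook's data) to show the difference between an inner and an outer merge, and uses the `indicator=True` option of `DataFrame.merge` to list the names an inner merge would drop.

```python
import pandas as pd

# Toy frames (hypothetical names and scores, not the notebook's data).
a = pd.DataFrame({'name': ['X', 'Y', 'Z'], 'a_score': [1.0, 2.0, 3.0]})
b = pd.DataFrame({'name': ['Y', 'Z', 'W'], 'b_score': [20.0, 30.0, 40.0]})

# Inner merge: keeps only names present in both tables.
inner = a.merge(b, on='name', how='inner')

# Outer merge: keeps every name and fills missing scores with NaN.
# indicator=True adds a '_merge' column recording where each row came from.
outer = a.merge(b, on='name', how='outer', indicator=True)

# Names that the inner merge silently drops.
dropped = sorted(outer.loc[outer['_merge'] != 'both', 'name'])
print(len(inner), len(outer), dropped)
```

Running the same check on the real tables would show how many compounds are excluded from the combined ranking because one method did not score them.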

In [8]:
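# Weighted average across the three methods; the weights are listed in the same order as the score columns.
combined['combined_acvalue'] = np.average(
    [combined.knn_acvalue, combined.svm_acvalue, combined.random_forest_acvalue],
    weights=[1/knn_RMSE, 1/svm_RMSE, 1/rf_RMSE], axis=0)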
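
`np.average` returns NaN for any row that contains a NaN, which is not an issue here because the inner merge leaves no missing scores. If the tables were instead joined with an outer merge, one possible alternative is to average only over the methods that scored each compound. The sketch below is an assumption about how that could be done, not what this notebook does; `nan_weighted_average` is a hypothetical helper and the numbers are toy values.

```python
import numpy as np

def nan_weighted_average(scores, weights):
    """Row-wise weighted average that ignores NaN entries.

    scores:  array of shape (n_compounds, n_methods), NaN where unscored
    weights: array of shape (n_methods,)
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    present = ~np.isnan(scores)                       # True where a score exists
    weighted_sum = np.nansum(scores * weights, axis=1)
    weight_sum = (present * weights).sum(axis=1)      # only weights of present scores
    return weighted_sum / weight_sum

# Toy data: the second compound has no score from the second method.
toy = np.array([[-2.0, -4.0, -3.0],
                [-2.0, np.nan, -3.0]])
toy_weights = [2.0, 1.0, 4.0]
result = nan_weighted_average(toy, toy_weights)
print(result)
```

A compound scored by fewer methods then gets a combined value based on less evidence, so it may be worth keeping a count of contributing methods alongside the score.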

In [9]:
combined.sort_values(by='combined_acvalue', inplace=True)
combined.head(20)

                            name  knn_acvalue  svm_acvalue  random_forest_acvalue  combined_acvalue
 0                    NAFAMOSTAT     -4.57386    -4.573851              -2.797827         -3.662878
 1                     RWJ-56423    -3.215892     -2.05168              -2.365654           -2.5029
 3                     RWJ-51084     -3.00353    -1.887822               -2.49919         -2.479495
 8                     RWJ-58643    -2.847431    -2.231559              -1.959521         -2.245526
52                   SUBSTANCE-P    -1.560627    -3.270026              -1.214972         -1.789869
41  SAR9, MET (O2)11-SUBSTANCE P     -1.67223    -2.878899              -1.144704         -1.688623
42  [SAR9,MET(O2)11]-SUBSTANCE-P     -1.67223    -2.878899              -1.144704         -1.688623
16                   GONADORELIN    -2.406312    -1.594884              -0.980204         -1.481841
22             GOSERELIN-ACETATE    -2.220121    -1.312182              -1.044103         -1.400975
23                  ANTAGONIST-G    -2.211524    -1.588432              -0.890731         -1.385863


In [10]:
combined.to_csv('../results/combined_results.csv')