# KNN

- **Categorical features**: conc1_type, exposure_type, control_type, media_type, application_freq_unit, class, tax_order, family, genus, species

- **Non-categorical features**: obs_duration_mean, conc1_mean, atom_number, alone_atom_number, bonds_number, doubleBond, tripleBond, ring_number, Mol, MorganDensity, LogP, oh_count

It turns out that *obs_duration_mean* has to be treated as a categorical feature in order to maximize the metrics.
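
To illustrate what the categorical treatment changes, the sketch below compares the two views of a duration column. Under a Hamming-style comparison only equality of the values matters, while the numeric view also uses the size of the gap. The values are toy data and the unit (hours) is an assumption, not taken from the dataset.

```python
import numpy as np

durations = np.array([24.0, 48.0, 96.0, 24.0, 72.0])  # toy values, hours assumed

# Categorical view: each distinct duration becomes an integer code.
levels, codes = np.unique(durations, return_inverse=True)

# Hamming-style comparison only asks whether two codes differ (0 or 1).
mismatch = (codes[:, None] != codes[None, :]).astype(float)

# Numeric view: the size of the gap between two durations matters.
gap = np.abs(durations[:, None] - durations[None, :])
```

With the categorical view, 24 vs 48 and 24 vs 96 are equally far apart, which may explain why the choice affects the nearest neighbours found.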

In [1]:
import numpy as np  # used below; made explicit instead of relying on the star import
from helper_knn import *

X_try, X_train, X_test, y_train, y_test, len_X_train = load_data_knn('data/lc_db_processed.csv',
                                                                     encoding = 'binary', seed = 42)

# Best combination
categorical = ['class', 'tax_order', 'family', 'genus', 'species', 'control_type', 'media_type',
               'application_freq_unit', 'exposure_type', 'conc1_type', 'obs_duration_mean']

non_categorical = ['ring_number', 'tripleBond', 'doubleBond', 'alone_atom_number', 'oh_count',
                   'atom_number', 'bonds_number', 'Mol', 'MorganDensity', 'LogP']

# Previous combination (lower scores): obs_duration_mean among the non-categorical features
# categorical = ['class', 'tax_order', 'family', 'genus', 'species', 'control_type', 'media_type',
#                'application_freq_unit', 'exposure_type', 'conc1_type']

# non_categorical = ['ring_number', 'tripleBond',  'doubleBond', 'alone_atom_number', 'oh_count',
#                    'atom_number', 'bonds_number', 'Mol', 'MorganDensity', 'LogP', 'obs_duration_mean']

## BINARY -- K = 1
### Finding the best alpha_1 for the problem

START: alpha_1 = 0, alpha_2 = 1, alpha_3 = 0

END: alpha_1 = 0.0069519279617756054, alpha_2 = 1, alpha_3 = 0
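
The three weights scale three distance terms: alpha_1 the Hamming distance on the categorical features, alpha_2 the Euclidean distance on the non-categorical features, and alpha_3 the Hamming distance on the Pubchem2d fingerprint (the "Hamming 1", "Euclidean 2" and "Hamming 3" of the logs). A minimal sketch of such a weighted matrix follows, assuming the terms are combined as a plain weighted sum; `combined_distance` and its arguments are hypothetical names, not the functions in `helper_knn`.

```python
import numpy as np


def hamming_matrix(X):
    """Pairwise share of positions in which two rows differ."""
    X = np.asarray(X)
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)


def euclidean_matrix(X):
    """Pairwise Euclidean distance between rows."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)


def combined_distance(X_cat, X_num, X_fp, alpha_1, alpha_3, alpha_2=1.0):
    """Weighted sum of the three matrices (the linear form is an assumption)."""
    return (alpha_1 * hamming_matrix(X_cat)
            + alpha_2 * euclidean_matrix(X_num)
            + alpha_3 * hamming_matrix(X_fp))


# Toy data: 3 samples, 2 categorical codes, 2 numeric features, 4 fingerprint bits.
X_cat = np.array([[0, 1], [0, 2], [1, 2]])
X_num = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
X_fp = np.array([[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1]])

D = combined_distance(X_cat, X_num, X_fp, alpha_1=0.5, alpha_3=2.0)
```

A matrix of this kind can be handed to a nearest-neighbour classifier that accepts precomputed distances, for example scikit-learn's `KNeighborsClassifier(metric="precomputed")`. Whether `helper_knn` does exactly this is not shown in this notebook.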

In [2]:
c = [0,0]
ham = np.logspace(-3, -1, 20) 

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_ham = ham, choice = c, ks = [1])

Mon Sep 14 11:08:06 2020
START...
Computing Euclidean ...
Adding Hamming 1 (Categorical)... alpha = 0.001
Start CV...
New best params found! alpha:0.001, k:1, leaf:10,
                                                        acc:  0.8976980539183801, st.error:  0.0026286215760152913,
                                                        rmse: 0.31974227013083456, st.error:  0.004087382206260621
New best params found! alpha:0.001, k:1, leaf:40,
                                                        acc:  0.8984766658275387, st.error:  0.0013155183512758776,
                                                        rmse: 0.3186007056220668, st.error:  0.0020569728718740753
New best params found! alpha:0.001, k:1, leaf:80,
                                                        acc:  0.8997000774068564, st.error:  0.001976147553967034,
                                                        rmse: 0.3166398848354725, st.error:  0.003126736498423271
Adding Hamming 1 (Categorical)... alpha =

### Finding the best alpha_3, fixing best_alpha_1

START: alpha_1 = 0.0069519279617756054, alpha_2 = 1, alpha_3 = 0

END: alpha_1 = 0.0069519279617756054, alpha_2 = 1, alpha_3 = 0.0069519279617756054

In [2]:
c = [0,1]
al_ham = 0.0069519279617756054
pub = np.logspace(-3, -1, 20)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [1])

Mon Sep 14 12:29:06 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 0.001
Start CV...
New best params found! alpha:0.001, k:1, leaf:10,
                                                        acc:  0.8977536249017346, st.error:  0.002539752081748882,
                                                        rmse: 0.3196624298760987, st.error:  0.003946708247816731
New best params found! alpha:0.001, k:1, leaf:40,
                                                        acc:  0.8985878387013575, st.error:  0.0013189399299914412,
                                                        rmse: 0.31842614944471437, st.error:  0.0020584368131552197
New best params found! alpha:0.001, k:1, leaf:80,
                                                        acc:  0.8998112657342301, st.error:  0.0019271779321472723,
                                                        rmse: 0.3164673314241388, st.error:  0.0030480488453350613
Adding Hammin

### Finding again the best alpha_1, fixing best_alpha_3

START: alpha_1 = 0.0069519279617756054, alpha_2 = 1, alpha_3 = 0.0069519279617756054

END: alpha_1 = 0.009473684210526315, alpha_2 = 1, alpha_3 = 0.0069519279617756054
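
The search in these sections alternates between the two weights: alpha_1 is tuned with alpha_3 fixed, then alpha_3 with alpha_1 fixed, and the cycle is repeated on a narrower grid. The loop can be sketched as follows. `toy_score` is a made-up objective standing in for the cross-validated accuracy reported by `cv_params_new`, and `alternate_search` is a hypothetical helper.

```python
import numpy as np


def alternate_search(score, grid_1, grid_3, alpha_1, alpha_3, rounds=2):
    """Tune alpha_1 and alpha_3 in turn, keeping the other one fixed."""
    for _ in range(rounds):
        # Best alpha_1 for the current alpha_3.
        alpha_1 = max(grid_1, key=lambda a: score(a, alpha_3))
        # Best alpha_3 for the alpha_1 just selected.
        alpha_3 = max(grid_3, key=lambda a: score(alpha_1, a))
    return float(alpha_1), float(alpha_3)


def toy_score(a1, a3):
    """Toy objective with its maximum at a1 = 0.01, a3 = 0.1 (not real CV accuracy)."""
    return -(np.log10(a1) + 2.0) ** 2 - (np.log10(a3) + 1.0) ** 2


grid = np.logspace(-3, 0, 4)  # 0.001, 0.01, 0.1, 1.0
best_1, best_3 = alternate_search(toy_score, grid, grid,
                                  alpha_1=grid[0], alpha_3=grid[0])
```

An alternating search like this can stop at a point that is optimal along each axis without being the joint optimum, which is one reason to repeat the cycle and to refine the grid around the previous best (here `np.linspace(0.005, 0.01, 20)`).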

In [2]:
c = [1,0]
al_pub = 0.0069519279617756054

ham = np.linspace(0.005,0.01,20)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_ham = ham, a_pub = al_pub, choice = c, ks = [1])

Mon Sep 14 16:51:44 2020
START...
Computing Euclidean and Pubchem2d Matrix...
Adding Hamming 1 (Categorical)... alpha = 0.005
Start CV...
New best params found! alpha:0.005, k:1, leaf:10,
                                                        acc:  0.8245314057187116, st.error:  0.005742672577797003,
                                                        rmse: 0.4186730767961923, st.error:  0.006735151210219632
New best params found! alpha:0.005, k:1, leaf:30,
                                                        acc:  0.827031914527006, st.error:  0.0024800733461741212,
                                                        rmse: 0.4158523496638812, st.error:  0.0029541814442851045
Adding Hamming 1 (Categorical)... alpha = 0.005263157894736842
Start CV...
Adding Hamming 1 (Categorical)... alpha = 0.005526315789473685
Start CV...
Adding Hamming 1 (Categorical)... alpha = 0.005789473684210527
Start CV...
New best params found! alpha:0.005789473684210527, k:1, leaf:30,
             

### Finding again the best alpha_3, fixing best_alpha_1

START: alpha_1 = 0.009473684210526315, alpha_2 = 1, alpha_3 = 0.0069519279617756054

END: alpha_1 = 0.009473684210526315, alpha_2 = 1, alpha_3 = 0.007105263157894737

In [2]:
c = [0,1]
al_ham = 0.009473684210526315
pub = np.linspace(0.005,0.01,20)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [1])

Mon Sep 14 21:28:13 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 0.005
Start CV...
New best params found! alpha:0.005, k:1, leaf:10,
                                                        acc:  0.8977536249017346, st.error:  0.0025060535258592765,
                                                        rmse: 0.3196648161329612, st.error:  0.003898090075453123
New best params found! alpha:0.005, k:1, leaf:40,
                                                        acc:  0.898532267718003, st.error:  0.0014565415219307638,
                                                        rmse: 0.3185074989524941, st.error:  0.002275158951814106
New best params found! alpha:0.005, k:1, leaf:80,
                                                        acc:  0.899588858172373, st.error:  0.0019329533882988246,
                                                        rmse: 0.3168182050385376, st.error:  0.0030564188107250296
Adding Hamming 3

## Final model -- BINARY -- K = 1

In [2]:
y = np.append(y_train,y_test)

del X_train, X_test, y_train, y_test

ham = 0.009473684210526315
pub = 0.007105263157894737
k = 1
leaf = 40

cv_binary_knn(X_try, y, ham, pub, k, leaf)

Basic Matrix... Tue Sep 22 00:23:49 2020
Adding pubchem2d Tue Sep 22 00:26:50 2020
End distance matrix... Tue Sep 22 00:40:05 2020
Accuracy: 	 0.9105265658811724, se: 0.001596093283967252
    RMSE: 		 0.29907364869243597, se: 0.0026639621159533205
    Sensitivity: 	 0.9305930270523051, se: 0.0025662688577057476
    Precision: 	 0.9257884455755866, se: 0.0008814897322470156
    Specificity: 	 0.877670405685049, se: 0.0016028283693755635
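
For reference, the metrics above follow from the confusion-matrix counts. The sketch below uses toy labels and assumes that class 1 is the positive class and that `se` is the standard error of the mean across cross-validation folds. Both are assumptions, since the actual computation lives in `cv_binary_knn`.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, RMSE, sensitivity, precision and specificity for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "rmse": float(np.sqrt(np.mean((y_true - y_pred) ** 2))),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "precision": tp / (tp + fp),     # share of predicted positives that are correct
        "specificity": tn / (tn + fp),   # true negative rate
    }


def standard_error(values):
    """Standard error of the mean of per-fold scores (sample std / sqrt(n))."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / np.sqrt(len(values))


m = binary_metrics([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1])
se = standard_error([1.0, 2.0, 3.0])
```

For 0/1 labels the RMSE within a fold equals sqrt(1 - accuracy), which closely matches the figures reported above (sqrt(1 - 0.9105) ≈ 0.2991).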


## BINARY -- K = 5
### Finding the best alpha_1 for the problem

START: alpha_1 = 0, alpha_2 = 1, alpha_3 = 0

END: alpha_1 = 0.004281332398719396, alpha_2 = 1, alpha_3 = 0

In [2]:
c = [0,0]
ham = np.logspace(-3, -1, 20) 

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_ham = ham, choice = c, ks = [5])

Sun Sep 20 20:36:42 2020
START...
Computing Euclidean ...
Adding Hamming 1 (Categorical)... alpha = 0.001
Start CV...
New best params found! alpha:0.001, k:5, leaf:10,
                                                        acc:  0.8761812967788764, st.error:  0.002418811517636534,
                                                        rmse: 0.3518115974103475, st.error:  0.0034388642275861754
New best params found! alpha:0.001, k:5, leaf:20,
                                                        acc:  0.880184540170975, st.error:  0.002459028612155808,
                                                        rmse: 0.34607134161526065, st.error:  0.0035385852271274066
Adding Hamming 1 (Categorical)... alpha = 0.0012742749857031334
Start CV...
Adding Hamming 1 (Categorical)... alpha = 0.001623776739188721
Start CV...
Adding Hamming 1 (Categorical)... alpha = 0.00206913808111479
Start CV...
Adding Hamming 1 (Categorical)... alpha = 0.0026366508987303583
Start CV...
Adding Hamming 1 (Cat

### Finding the best alpha_3, fixing best_alpha_1

START: alpha_1 = 0.004281332398719396, alpha_2 = 1, alpha_3 = 0

END: alpha_1 = 0.004281332398719396, alpha_2 = 1, alpha_3 = 0.0379269019073225

In [2]:
c = [0,1]
al_ham = 0.004281332398719396
pub = np.logspace(-3, -1, 20)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [5])

Sun Sep 20 22:23:14 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 0.001
Start CV...
New best params found! alpha:0.001, k:5, leaf:10,
                                                        acc:  0.8766817292466099, st.error:  0.001168452234182648,
                                                        rmse: 0.3511512650906884, st.error:  0.0016628122713671133
New best params found! alpha:0.001, k:5, leaf:20,
                                                        acc:  0.8801288301056266, st.error:  0.0026295035397560017,
                                                        rmse: 0.34614127610917766, st.error:  0.0037877060301538105
Adding Hamming 3 (Pubchem2d)... alpha = 0.0012742749857031334
Start CV...
New best params found! alpha:0.0012742749857031334, k:5, leaf:40,
                                                        acc:  0.8804630132299429, st.error:  0.0021034171628869567,
                                     

### Finding again the best alpha_1, fixing best_alpha_3

START: alpha_1 = 0.004281332398719396, alpha_2 = 1, alpha_3 = 0.0379269019073225

END: alpha_1 = 0.014399033208816327, alpha_2 = 1, alpha_3 = 0.0379269019073225

In [2]:
c = [1,0]
al_pub = 0.0379269019073225

ham = np.logspace(-2.8,-1.8,25)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_ham = ham, a_pub = al_pub, choice = c, ks = [5])

Mon Sep 21 05:45:16 2020
START...
Computing Euclidean and Pubchem2d Matrix...
Adding Hamming 1 (Categorical)... alpha = 0.001584893192461114
Start CV...
New best params found! alpha:0.001584893192461114, k:5, leaf:10,
                                                        acc:  0.8432674347392515, st.error:  0.002852271713141206,
                                                        rmse: 0.395829532552304, st.error:  0.0035897917827189354
Adding Hamming 1 (Categorical)... alpha = 0.0017444826989992542
Start CV...
Adding Hamming 1 (Categorical)... alpha = 0.001920141938638803
Start CV...
New best params found! alpha:0.001920141938638803, k:5, leaf:30,
                                                        acc:  0.8442678824287364, st.error:  0.0014235023780417933,
                                                        rmse: 0.39461252835898064, st.error:  0.001807625057665591
Adding Hamming 1 (Categorical)... alpha = 0.0021134890398366475
Start CV...
Adding Hamming 1 (Categorical)

### Finding again the best alpha_3, fixing best_alpha_1

START: alpha_1 = 0.014399033208816327, alpha_2 = 1, alpha_3 = 0.0379269019073225

END: alpha_1 = 0.014399033208816327, alpha_2 = 1, alpha_3 = 0.001920141938638803

In [2]:
c = [0,1]
al_ham = 0.014399033208816327
pub = np.logspace(-2.8,-1.8,25)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [5])

Mon Sep 21 19:55:01 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 0.001584893192461114
Start CV...
New best params found! alpha:0.001584893192461114, k:5, leaf:10,
                                                        acc:  0.8762924696526954, st.error:  0.001551797447186532,
                                                        rmse: 0.351693399161556, st.error:  0.0021956396274883747
New best params found! alpha:0.001584893192461114, k:5, leaf:20,
                                                        acc:  0.8789057585045162, st.error:  0.0026005967801567328,
                                                        rmse: 0.34790555614969515, st.error:  0.003740504499899162
New best params found! alpha:0.001584893192461114, k:5, leaf:40,
                                                        acc:  0.879294987191321, st.error:  0.0010786989108609874,
                                                        rmse: 0.347412

## Final model -- BINARY -- K = 5

In [3]:
y = np.append(y_train,y_test)

del X_train, X_test, y_train, y_test

ham = 0.014399033208816327
pub = 0.001920141938638803
k = 5
leaf = 60

cv_binary_knn(X_try, y, ham, pub, k, leaf)

Basic Matrix... Mon Sep 21 23:55:06 2020
Adding pubchem2d Mon Sep 21 23:57:34 2020
End distance matrix... Tue Sep 22 00:10:05 2020
Accuracy: 	 0.894099848325774, se: 0.0022209701187569154
    RMSE: 		 0.32535135480944166, se: 0.003414952338134494
    Sensitivity: 	 0.9221661678227073, se: 0.0019946591404273245
    Precision: 	 0.9086972914663299, se: 0.002549905255769359
    Specificity: 	 0.8481329384973784, se: 0.0035147439046552915


# KNN -- MULTICLASS

In [1]:
import numpy as np  # used below; made explicit instead of relying on the star import
from helper_knn import *

X_try, X_train, X_test, y_train, y_test, len_X_train = load_data_knn('data/lc_db_processed.csv',
                                                                     encoding = 'multiclass', seed = 42)

# Best combination
categorical = ['class', 'tax_order', 'family', 'genus', 'species', 'control_type', 'media_type',
               'application_freq_unit', 'exposure_type', 'conc1_type', 'obs_duration_mean']

non_categorical = ['ring_number', 'tripleBond', 'doubleBond', 'alone_atom_number', 'oh_count',
                   'atom_number', 'bonds_number', 'Mol', 'MorganDensity', 'LogP']

## MULTICLASS -- K = 1
### Finding the best alpha_1 for the problem

START: alpha_1 = 0, alpha_2 = 1, alpha_3 = 0

END: alpha_1 = 0.023357214690901212, alpha_2 = 1, alpha_3 = 0

In [5]:
c = [0,0]
ham = np.logspace(-3, -1, 20) 

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_ham = ham, choice = c, ks = [1])

Tue Sep 22 22:41:52 2020
START...
Computing Euclidean ...
Adding Hamming 1 (Categorical)... alpha = 0.001
Start CV...
New best params found! alpha:0.001, k:1, leaf:10,
                                                        acc:  0.7255086885294288, st.error:  0.003222136760733884,
                                                        rmse: 0.7183435526109299, st.error:  0.007361166388361987
New best params found! alpha:0.001, k:1, leaf:20,
                                                        acc:  0.7255637340919174, st.error:  0.003546667786245106,
                                                        rmse: 0.7304215945475012, st.error:  0.005548520794540555
New best params found! alpha:0.001, k:1, leaf:80,
                                                        acc:  0.7267881501523026, st.error:  0.0033562206883952293,
                                                        rmse: 0.712964330967776, st.error:  0.005733241624523589
Adding Hamming 1 (Categorical)... alpha = 0.0

### Finding the best alpha_3, fixing best_alpha_1

START: alpha_1 = 0.023357214690901212, alpha_2 = 1, alpha_3 = 0

END1: alpha_1 = 0.023357214690901212, alpha_2 = 1, alpha_3 = 0.1

END2: alpha_1 = 0.023357214690901212, alpha_2 = 1, alpha_3 = 0.4832930238571752

In [2]:
c = [0,1]
al_ham = 0.023357214690901212
pub = np.logspace(-3, -1, 20)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [1])

Wed Sep 23 00:19:31 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 0.001
Start CV...
New best params found! alpha:0.001, k:1, leaf:10,
                                                        acc:  0.7256198304961379, st.error:  0.0029769940217011497,
                                                        rmse: 0.7185721631305062, st.error:  0.007447037493213675
New best params found! alpha:0.001, k:1, leaf:80,
                                                        acc:  0.7263988596512782, st.error:  0.002992036837334821,
                                                        rmse: 0.7144185271959385, st.error:  0.0053160126795970515
Adding Hamming 3 (Pubchem2d)... alpha = 0.0012742749857031334
Start CV...
New best params found! alpha:0.0012742749857031334, k:1, leaf:10,
                                                        acc:  0.7265092289402431, st.error:  0.0037610557038928872,
                                       

This cell has to be re-run with a shifted interval, because the value selected as optimal is the last point of the interval I set.
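
A small sketch of that check: if the selected value sits on either end of the grid, the optimum may lie outside the interval, so the grid is moved in that direction. `on_grid_edge` and `shifted_logspace` are hypothetical helpers, not part of `helper_knn`.

```python
import numpy as np


def on_grid_edge(grid, best):
    """Return 'low', 'high' or None depending on where best sits in the sorted grid."""
    grid = np.sort(np.asarray(grid, dtype=float))
    if np.isclose(best, grid[-1]):
        return "high"
    if np.isclose(best, grid[0]):
        return "low"
    return None


def shifted_logspace(low_exp, high_exp, num, edge):
    """Move a log-spaced grid by its own width towards the edge that was hit."""
    width = high_exp - low_exp
    if edge == "high":
        return np.logspace(high_exp, high_exp + width, num)
    if edge == "low":
        return np.logspace(low_exp - width, low_exp, num)
    return np.logspace(low_exp, high_exp, num)


pub = np.logspace(-3, -1, 20)
edge = on_grid_edge(pub, 0.1)                  # 0.1 is the last grid point
next_pub = shifted_logspace(-3, -1, 20, edge)  # covers 0.1 to 10
```

The re-run below uses a hand-picked, narrower interval, `np.logspace(-1, 0, 20)`, rather than a full-width shift.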

In [2]:
c = [0,1]
al_ham = 0.023357214690901212
pub = np.logspace(-1, 0, 20)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [1])

Wed Sep 23 18:47:16 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 0.1
Start CV...
New best params found! alpha:0.1, k:1, leaf:10,
                                                        acc:  0.7261202475103163, st.error:  0.0030563025397978205,
                                                        rmse: 0.7166110159073625, st.error:  0.007088816416174572
New best params found! alpha:0.1, k:1, leaf:80,
                                                        acc:  0.7271772397571133, st.error:  0.0028711383187119827,
                                                        rmse: 0.7123628431239972, st.error:  0.004992303580564422
Adding Hamming 3 (Pubchem2d)... alpha = 0.11288378916846889
Start CV...
New best params found! alpha:0.11288378916846889, k:1, leaf:10,
                                                        acc:  0.7278992143876304, st.error:  0.0035340244286078725,
                                                 

### Finding again the best alpha_1, fixing best_alpha_3

START: alpha_1 = 0.023357214690901212, alpha_2 = 1, alpha_3 = 0.4832930238571752

END1: alpha_1 = 1.0, alpha_2 = 1, alpha_3 = 0.4832930238571752

END2: alpha_1 = 2.7825594022071245, alpha_2 = 1, alpha_3 = 0.4832930238571752

In [2]:
c = [1,0]
al_pub = 0.4832930238571752

ham = np.logspace(-2, 0, 20)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_ham = ham, a_pub = al_pub, choice = c, ks = [1])

Wed Sep 23 21:58:04 2020
START...
Computing Euclidean and Pubchem2d Matrix...
Adding Hamming 1 (Categorical)... alpha = 0.01
Start CV...
New best params found! alpha:0.01, k:1, leaf:10,
                                                        acc:  0.5677197800711883, st.error:  0.004939900535676926,
                                                        rmse: 0.9412787952512911, st.error:  0.005629515686018252
New best params found! alpha:0.01, k:1, leaf:30,
                                                        acc:  0.5720558930354382, st.error:  0.0034541145096843974,
                                                        rmse: 0.9381895946210109, st.error:  0.009681609567517947
New best params found! alpha:0.01, k:1, leaf:70,
                                                        acc:  0.5722778833513135, st.error:  0.004358228071446643,
                                                        rmse: 0.9318040691222567, st.error:  0.007922290370031204
Adding Hamming 1 (Categorica

In [2]:
c = [1,0]
al_pub = 0.4832930238571752

ham = np.logspace(0, 0.5, 10)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_ham = ham, a_pub = al_pub, choice = c, ks = [1])

Thu Sep 24 05:47:44 2020
START...
Computing Euclidean and Pubchem2d Matrix...
Adding Hamming 1 (Categorical)... alpha = 1.0
Start CV...
New best params found! alpha:1.0, k:1, leaf:10,
                                                        acc:  0.5684981756305785, st.error:  0.004777699496653579,
                                                        rmse: 0.9419881118866616, st.error:  0.004197384066548152
New best params found! alpha:1.0, k:1, leaf:30,
                                                        acc:  0.572500615437823, st.error:  0.004151861528614078,
                                                        rmse: 0.9365807775990893, st.error:  0.01054366643876818
Adding Hamming 1 (Categorical)... alpha = 1.1364636663857248
Start CV...
Adding Hamming 1 (Categorical)... alpha = 1.2915496650148839
Start CV...
Adding Hamming 1 (Categorical)... alpha = 1.4677992676220695
Start CV...
New best params found! alpha:1.4677992676220695, k:1, leaf:70,
                              

### Finding again the best alpha_3, fixing best_alpha_1

START: alpha_1 = 2.7825594022071245, alpha_2 = 1, alpha_3 = 0.4832930238571752

END: alpha_1 = 2.7825594022071245, alpha_2 = 1, alpha_3 = 1000

In [2]:
c = [0,1]
al_ham = 2.7825594022071245
pub = np.logspace(-0.3, 1, 20)

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [1])

Thu Sep 24 10:42:25 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 0.5011872336272722
Start CV...
New best params found! alpha:0.5011872336272722, k:1, leaf:10,
                                                        acc:  0.5185697951306776, st.error:  0.0012606025896634678,
                                                        rmse: 1.134684458293093, st.error:  0.009055758374491591
New best params found! alpha:0.5011872336272722, k:1, leaf:20,
                                                        acc:  0.5198486077042462, st.error:  0.0035048587031914543,
                                                        rmse: 1.128338364279855, st.error:  0.003999009126531808
New best params found! alpha:0.5011872336272722, k:1, leaf:30,
                                                        acc:  0.5207386861047661, st.error:  0.0024258905741088616,
                                                        rmse: 1.129454408357924

Start CV...
New best params found! alpha:1.767536622987673, k:1, leaf:10,
                                                        acc:  0.5667740070588748, st.error:  0.0024814909386639138,
                                                        rmse: 1.0662405790407656, st.error:  0.009337438643036945
New best params found! alpha:1.767536622987673, k:1, leaf:20,
                                                        acc:  0.5688306897709674, st.error:  0.0029063209284844785,
                                                        rmse: 1.048098498192307, st.error:  0.008540549678760436
New best params found! alpha:1.767536622987673, k:1, leaf:40,
                                                        acc:  0.5702763698301485, st.error:  0.0035864981171653708,
                                                        rmse: 1.0427903908696359, st.error:  0.007319886720854343
Adding Hamming 3 (Pubchem2d)... alpha = 2.0691380811147897
Start CV...
New best params found! alpha:2.06913808111

Adding Hamming 3 (Pubchem2d)... alpha = 7.297227644686393
Start CV...
New best params found! alpha:7.297227644686393, k:1, leaf:20,
                                                        acc:  0.6318250509233267, st.error:  0.003144500029685537,
                                                        rmse: 0.9334032957167342, st.error:  0.008981436605725911
New best params found! alpha:7.297227644686393, k:1, leaf:30,
                                                        acc:  0.6328247722957322, st.error:  0.004429609538339608,
                                                        rmse: 0.9301073902476468, st.error:  0.00779031401464186
New best params found! alpha:7.297227644686393, k:1, leaf:40,
                                                        acc:  0.6331036316935721, st.error:  0.0024971361411862568,
                                                        rmse: 0.9383216508263216, st.error:  0.008518692385873845
New best params found! alpha:7.297227644686393, k:1, leaf

In [2]:
c = [0,1]
al_ham = 2.7825594022071245
pub = [10,20,50,100]

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [1])

Thu Sep 24 14:06:04 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 10
Start CV...
New best params found! alpha:10, k:1, leaf:10,
                                                        acc:  0.649060107065319, st.error:  0.0039927701073128275,
                                                        rmse: 0.8970446052192381, st.error:  0.0073117770855754185
New best params found! alpha:10, k:1, leaf:70,
                                                        acc:  0.6510060186960198, st.error:  0.004674900141749234,
                                                        rmse: 0.8892442255135233, st.error:  0.007892436048516993
Adding Hamming 3 (Pubchem2d)... alpha = 20
Start CV...
New best params found! alpha:20, k:1, leaf:10,
                                                        acc:  0.6760811268361334, st.error:  0.004958342546751918,
                                                        rmse: 0.8471052725774729, st.err

In [2]:
c = [0,1]
al_ham = 2.7825594022071245
pub = [100,200,500,1000,10000]

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_pub = pub, a_ham = al_ham, choice = c, ks = [1])

Mon Sep 28 10:52:21 2020
START...
Computing Basic Matrix: Hamming 1 and Euclidean 2...
Adding Hamming 3 (Pubchem2d)... alpha = 100
Start CV...
New best params found! alpha:100, k:1, leaf:10,
                                                        acc:  0.7155011363771583, st.error:  0.004299724377315707,
                                                        rmse: 0.7488069810593526, st.error:  0.006479600703800599
New best params found! alpha:100, k:1, leaf:70,
                                                        acc:  0.7164458121870751, st.error:  0.003889738274597874,
                                                        rmse: 0.7470222994556968, st.error:  0.006509207008171904
Adding Hamming 3 (Pubchem2d)... alpha = 200
Start CV...
New best params found! alpha:200, k:1, leaf:10,
                                                        acc:  0.7252861882462425, st.error:  0.00456556607363394,
                                                        rmse: 0.7321440495009776, st.

### Finding again the best alpha_1, fixing best_alpha_3

START: alpha_1 = 2.7825594022071245, alpha_2 = 1, alpha_3 = 1000

END: alpha_1 = -----------, alpha_2 = 1, alpha_3 = 1000

In [2]:
c = [1,0]
al_pub = 1000

ham = [3, 5, 10, 100, 1000]

best_acc, best_alpha, best_k, best_leaf = cv_params_new(X_train, y_train, categorical, non_categorical,
                                                    sequence_ham = ham, a_pub = al_pub, choice = c, ks = [1])

Mon Sep 28 12:53:59 2020
START...
Computing Euclidean and Pubchem2d Matrix...
Adding Hamming 1 (Categorical)... alpha = 3
Start CV...
New best params found! alpha:3, k:1, leaf:10,
                                                        acc:  0.5694989169376061, st.error:  0.005365420350894578,
                                                        rmse: 0.9437098797254124, st.error:  0.0050949059820642835
New best params found! alpha:3, k:1, leaf:30,
                                                        acc:  0.5725560782462934, st.error:  0.0043481487470328015,
                                                        rmse: 0.9380019223084393, st.error:  0.011840056503234513
New best params found! alpha:3, k:1, leaf:70,
                                                        acc:  0.5731674672380773, st.error:  0.004254024042303233,
                                                        rmse: 0.936920913571455, st.error:  0.007937711609441634
Adding Hamming 1 (Categorical)... alpha 

## Final model -- MULTICLASS -- K = 1

In [2]:
y = np.append(y_train,y_test)

del X_train, X_test, y_train, y_test

ham = 2.7825594022071245
pub = 100
k = 1
leaf = 30

cv_multiclass_knn(X_try, y, ham, pub, k, leaf)

Basic Matrix... Thu Sep 24 15:37:29 2020
Adding pubchem2d Thu Sep 24 15:40:59 2020
End distance matrix... Thu Sep 24 15:53:58 2020
Accuracy: 	 0.7397005500575238, se: 0.0018383759458390201
RMSE: 		 0.6947909402973472, se: 0.00219610172761468


In [2]:
y = np.append(y_train,y_test)

del X_train, X_test, y_train, y_test

ham = 2.7825594022071245
pub = 1000
k = 1
leaf = 30

cv_multiclass_knn(X_try, y, ham, pub, k, leaf)

Basic Matrix... Mon Sep 28 11:39:30 2020
Adding pubchem2d Mon Sep 28 11:42:21 2020
End distance matrix... Mon Sep 28 11:58:37 2020
Accuracy: 	 0.7559785276743899, se: 0.0028283453879600776
RMSE: 		 0.6547477665180885, se: 0.005520359369533257
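
In the multiclass setting the RMSE is larger than sqrt(1 - accuracy), which suggests that the class labels are treated as ordered integers, so a prediction that is two classes off costs more than one that is one class off. A sketch under that assumption, with toy labels (`accuracy_and_rmse` is a hypothetical helper, not the code inside `cv_multiclass_knn`):

```python
import numpy as np


def accuracy_and_rmse(y_true, y_pred):
    """Accuracy and RMSE with class labels read as ordered integers (assumption)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    acc = float(np.mean(y_true == y_pred))
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    return acc, rmse


# One of five predictions is wrong, and it is off by two classes.
acc, rmse = accuracy_and_rmse([0, 1, 2, 3, 4], [0, 1, 2, 3, 2])
```

Here a single error of size 2 gives an accuracy of 0.8 but an RMSE of sqrt(4/5), larger than the sqrt(1/5) an off-by-one error would produce.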
