# TRAINING, OPTIMIZING, AND SELECTING THE MACHINE LEARNING ALGORITHMS FOR FLAVOR PREDICTION WITH RDKit DESCRIPTORS

This script covers the training, hyperparameter optimization, and testing of the machine learning algorithms for flavor prediction. The training and testing data were previously split using an 80:20 training-testing partition.

This script contains the steps for training the Random Forest and K-Nearest Neighbors (KNN) classifiers. These steps are the same for training with the Extended Connectivity Fingerprint (ECFP), which is the feature set loaded in the cells below.

In [1]:

# Import the training data for Extended Connectivity Fingerprint

import pandas as pd

ECFP_train_data = pd.read_excel('https://github.com/FabioHerrera97/FlavorMiner/raw/main/Data/ECFP_train.xlsx')

X_ECFP_train = ECFP_train_data.drop(['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet'], axis=1)

y_ECFP_train = ECFP_train_data [['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet']]



# 1. Training and optimizing the Random Forest and KNN algorithms

Three algorithms were selected for training with molecular descriptors and fingerprints; this notebook covers two of them, Random Forest and K-Nearest Neighbors. These algorithms were chosen because they have previously been used for flavor prediction and offer several tools to interpret and explain the results, providing information beyond the predictions. Additionally, optimizing their hyperparameters is relatively fast. The library used for training and optimization is scikit-learn (sklearn). The third algorithm, the support vector machine (SVM), must be trained separately because it requires more computing power and a different hyperparameter optimization procedure.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

All models undergo hyperparameter optimization with the GridSearchCV function, using 5-fold cross-validation to validate each candidate model. Recall is the metric used to select the best estimator, because it measures how well the model identifies the true positives, which are the minority class in this case.
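The argument for recall can be made concrete with a small plain-Python sketch. It computes recall, specificity, and accuracy from confusion-matrix counts (the same counts that sklearn's confusion_matrix(...).ravel() returns, in the order tn, fp, fn, tp) for a hypothetical label with 5% positives, and shows that a model predicting only negatives looks accurate while its recall is zero. The 5% positive rate is an assumption for illustration, not the class balance of the flavor data.

```python
# Minimal sketch: recall, specificity and accuracy from confusion-matrix counts.
# The toy labels are hypothetical and only illustrate the class-imbalance argument.


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels encoded as 0/1."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tn, fp, fn, tp


def recall(y_true, y_pred):
    # Fraction of actual positives that are predicted positive: TP / (TP + FN)
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    # Fraction of actual negatives that are predicted negative: TN / (TN + FP)
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def accuracy(y_true, y_pred):
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


# Hypothetical imbalanced label: 5 positives out of 100 molecules
y_true = [1] * 5 + [0] * 95
# A degenerate model that always predicts "negative"
y_all_negative = [0] * 100

print(accuracy(y_true, y_all_negative))     # 0.95
print(recall(y_true, y_all_negative))       # 0.0
print(specificity(y_true, y_all_negative))  # 1.0
```

Accuracy rewards the degenerate model, while recall exposes that it never finds a positive; this is why GridSearchCV is given scoring='recall'.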

WARNING: The training can take up to 2 hours

In [None]:

# Function to train a binary classifier and return the trained model

def train_classifier(X, y, classifier, param_grid):

    # Perform hyperparameter optimization using GridSearchCV

    grid_search = GridSearchCV(classifier, param_grid, cv=5, scoring='recall')
    grid_search.fit(X, y)

    return grid_search.best_estimator_


trained_classifiers = []

# Hyperparameter grids for each classifier

rf_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

# SVM grid (not used in this notebook; the SVM is trained separately)
svm_param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf']
}

knn_param_grid = {
    'n_neighbors': [3, 5, 7],
    'weights': ['uniform', 'distance']
}
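The two-hour warning follows from the size of the grids. As a rough sketch, the number of model fits can be counted directly: GridSearchCV fits every hyperparameter combination once per fold and, with its default refit=True, refits the best combination on the full training set. The grids are copied from the cell above; the count ignores the SVM grid, which is not used here.

```python
# Count how many model fits the grid search performs (grids copied from the cell above).
from itertools import product

rf_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}
knn_param_grid = {
    'n_neighbors': [3, 5, 7],
    'weights': ['uniform', 'distance']
}


def n_candidates(param_grid):
    """Number of hyperparameter combinations GridSearchCV evaluates for one grid."""
    return len(list(product(*param_grid.values())))


cv_folds = 5
n_labels = 7  # Bitter, Floral, Fruity, Off_flavor, Nutty, Sour, Sweet

# Each candidate is fitted once per fold, plus one final refit of the best candidate
fits_per_label = sum(n_candidates(g) * cv_folds + 1 for g in (rf_param_grid, knn_param_grid))

print(n_candidates(rf_param_grid), n_candidates(knn_param_grid))  # 27 6
print(fits_per_label, fits_per_label * n_labels)                  # 167 1169
```

Under these assumptions each flavor requires 167 fits and the whole loop 1169, most of them Random Forests with up to 200 trees; the same count applies again to each resampled training round, which reuses the same grids.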

In [None]:

for label in y_ECFP_train:
    print(f"Training classifiers for '{label}'")

    # Random Forest

    rf_classifier = RandomForestClassifier(random_state=42)
    rf_classifier = train_classifier(X_ECFP_train, y_ECFP_train[label], rf_classifier, rf_param_grid)
    trained_classifiers.append(("Random Forest", label, rf_classifier))

    # KNN

    knn_classifier = KNeighborsClassifier()
    knn_classifier = train_classifier(X_ECFP_train, y_ECFP_train[label], knn_classifier, knn_param_grid)
    trained_classifiers.append(("KNN", label, knn_classifier))


Training classifiers for 'Bitter'
Training classifiers for 'Floral'
Training classifiers for 'Fruity'
Training classifiers for 'Off_flavor'
Training classifiers for 'Nutty'
Training classifiers for 'Sour'
Training classifiers for 'Sweet'


In [None]:
import joblib
from google.colab import files

# save model with joblib
for classifier_type, label, classifier in trained_classifiers:
  filename = f'{label}_{classifier_type}.sav'
  joblib.dump(classifier, filename)
  files.download(filename)


# 2. Testing the algorithms trained with the Extended Connectivity Fingerprint

The previously trained models were saved as .sav files. This allows them to be reused, and tested multiple times, without retraining.

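After training, the best estimators are evaluated on the test set. The metrics used are recall, specificity, and the area under the ROC curve (ROC AUC, reported as "ROC Score"). Unlike accuracy, these metrics are not dominated by the majority class, which makes them suitable for classifiers trained on imbalanced data. Recall, TP / (TP + FN), measures the fraction of actual positives that are correctly identified; specificity, TN / (TN + FP), measures the fraction of actual negatives that are correctly identified; and the ROC AUC summarizes the trade-off between recall and the false positive rate (1 - specificity) across all decision thresholds applied to the predicted probabilities.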
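The ROC AUC has an equivalent and perhaps more intuitive reading: it is the probability that a randomly chosen positive molecule receives a higher predicted probability than a randomly chosen negative one, with ties counting one half. The plain-Python sketch below computes it that way from hypothetical scores; sklearn's roc_auc_score gives the same value for these inputs, but the quadratic pairwise loop is only meant for illustration, not for large datasets.

```python
# Minimal sketch: ROC AUC as the probability that a random positive outranks
# a random negative (ties count as one half). Scores below are hypothetical.


def roc_auc_pairwise(y_true, scores):
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # e.g. classifier.predict_proba(X)[:, 1]
print(roc_auc_pairwise(y_true, scores))  # 0.75
```

Because it only depends on how positives are ranked relative to negatives, the ROC AUC can stay high even when the default 0.5 threshold yields a low recall, which is the pattern seen in the test results below.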

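These metrics are calculated on both the training set and the test set, and comparing the two reveals additional pathologies in the models, such as overfitting. Note that the training metrics are obtained by predicting the full training set with the refitted best estimator, not with cross-validation.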

In [2]:
import pandas as pd

# Import the testing data for Extended Connectivity Fingerprint

ECFP_test_data = pd.read_excel('https://github.com/FabioHerrera97/FlavorMiner/raw/main/Data/ECFP_test.xlsx')

X_ECFP_test = ECFP_test_data.drop(['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet'], axis=1)

y_ECFP_test = ECFP_test_data [['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet']]

In [None]:

from sklearn.metrics import recall_score, confusion_matrix, roc_auc_score

def evaluate_classifiers(trained_classifiers, X_train, y_train, X_test, y_test):
    evaluation_metrics_train = {}
    evaluation_metrics_test = {}

    for classifier_type, label, classifier in trained_classifiers:
        # Training evaluation: predictions on the full training set (resubstitution, not cross-validation)

        y_pred_train = classifier.predict(X_train)
        recall_train = recall_score(y_train[label], y_pred_train)
        tn_train, fp_train, fn_train, tp_train = confusion_matrix(y_train[label], y_pred_train).ravel()
        specificity_train = tn_train / (tn_train + fp_train)
        roc_score_train = roc_auc_score(y_train[label], classifier.predict_proba(X_train)[:, 1])

        evaluation_metrics_train[(classifier_type, label)] = {'Recall': recall_train,
                                                              'Specificity': specificity_train,
                                                              'ROC Score': roc_score_train}

        # Testing evaluation on the held-out test set

        y_pred_test = classifier.predict(X_test)
        recall_test = recall_score(y_test[label], y_pred_test)
        tn_test, fp_test, fn_test, tp_test = confusion_matrix(y_test[label], y_pred_test).ravel()
        specificity_test = tn_test / (tn_test + fp_test)
        roc_score_test = roc_auc_score(y_test[label], classifier.predict_proba(X_test)[:, 1])

        evaluation_metrics_test[(classifier_type, label)] = {'Recall': recall_test,
                                                             'Specificity': specificity_test,
                                                             'ROC Score': roc_score_test}

    return evaluation_metrics_train, evaluation_metrics_test

In [None]:
train_metrics, test_metrics = evaluate_classifiers(trained_classifiers, X_ECFP_train, y_ECFP_train, X_ECFP_test, y_ECFP_test)

Overall, the models show good specificity and ROC scores on the test set. However, recall is considerably low, except for sweet compounds. For Bitter, Floral, Fruity, and Off_flavor the results are promising, although recall remains low, while Nutty and Sour show the worst performance. In terms of recall, KNN performs better than Random Forest for every flavor, whereas Random Forest achieves a higher specificity and ROC score in every case.

In [None]:
metrics_test_df = pd.DataFrame.from_dict(test_metrics, orient='index')

print(metrics_test_df)

                            Recall  Specificity  ROC Score
Random Forest Bitter      0.567708     0.986486   0.930585
KNN           Bitter      0.588542     0.968178   0.861551
Random Forest Floral      0.345238     0.966785   0.886229
KNN           Floral      0.483333     0.927812   0.796264
Random Forest Fruity      0.319149     0.959600   0.867763
KNN           Fruity      0.388298     0.932233   0.757413
Random Forest Off_flavor  0.395556     0.938959   0.877383
KNN           Off_flavor  0.511111     0.886894   0.803607
Random Forest Nutty       0.234940     0.975703   0.855657
KNN           Nutty       0.370482     0.950980   0.747155
Random Forest Sour        0.066667     0.995390   0.816294
KNN           Sour        0.133333     0.984633   0.711558
Random Forest Sweet       0.847405     0.945926   0.944961
KNN           Sweet       0.868319     0.874549   0.918427


The training metrics show that the training recall is above 0.85 for all models except KNN for Off_flavor (0.71) and the two Sour models (0.80 and 0.82). These models overfit: on average, the test recall drops by about half relative to the training recall. The most likely cause is the considerable imbalance between positive and negative examples. This hypothesis is supported by the fact that the gap between training and test performance is large for recall but small for specificity. Further evidence is the Sweet category, where the class imbalance is low and no severe overfitting is observed. Thus, implementing a class imbalance strategy could help solve this problem.

In [None]:
metrics_train_df = pd.DataFrame.from_dict(train_metrics, orient='index')

print(metrics_train_df)

                            Recall  Specificity  ROC Score
Random Forest Bitter      0.960784     0.997930   0.998698
KNN           Bitter      0.958824     0.998148   0.994219
Random Forest Floral      0.924331     0.992480   0.997894
KNN           Floral      0.899543     0.996513   0.995884
Random Forest Fruity      0.924448     0.990436   0.997594
KNN           Fruity      0.891661     0.994735   0.994609
Random Forest Off_flavor  0.921317     0.990243   0.997149
KNN           Off_flavor  0.708147     0.945722   0.948835
Random Forest Nutty       0.885425     0.997900   0.997831
KNN           Nutty       0.880371     0.998320   0.992893
Random Forest Sour        0.801242     0.998845   0.998499
KNN           Sour        0.819876     0.997882   0.986403
Random Forest Sweet       0.976735     0.992919   0.999041
KNN           Sweet       0.973467     0.995824   0.997741
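The size of the gap between training and test recall can be checked directly from the two tables. The sketch below copies the recall values from them and computes the relative drop, 1 - test recall / training recall, for each model and flavor.

```python
flavors = ['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet']
models = ['Random Forest', 'KNN']

# (Random Forest, KNN) recall per flavor, copied from the test and training tables
test_recall = [(0.567708, 0.588542), (0.345238, 0.483333), (0.319149, 0.388298),
               (0.395556, 0.511111), (0.234940, 0.370482), (0.066667, 0.133333),
               (0.847405, 0.868319)]
train_recall = [(0.960784, 0.958824), (0.924331, 0.899543), (0.924448, 0.891661),
                (0.921317, 0.708147), (0.885425, 0.880371), (0.801242, 0.819876),
                (0.976735, 0.973467)]

# Relative drop: 1 - test recall / training recall, keyed by (model, flavor)
relative_drop = {
    (model, flavor): 1 - test[i] / train[i]
    for flavor, test, train in zip(flavors, test_recall, train_recall)
    for i, model in enumerate(models)
}

mean_drop = sum(relative_drop.values()) / len(relative_drop)
print(f"Mean relative recall drop: {mean_drop:.2f}")  # 0.52

largest = max(relative_drop, key=relative_drop.get)
print(largest, round(relative_drop[largest], 2))      # ('Random Forest', 'Sour') 0.92
```

In the notebook itself the same quantity should follow from 1 - metrics_test_df['Recall'] / metrics_train_df['Recall'], since both DataFrames are indexed by the same (model, flavor) pairs. The smallest drops belong to the two Sweet models.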


# 3. Training with the oversampled data



The SMOTE-oversampled training sets (one file per flavor) were generated beforehand and are loaded from the repository. The next step is training the algorithms with this resampled data. The hyperparameter optimization is repeated in this step to help improve performance.
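The oversampled files are loaded ready-made, so the notebook does not show how SMOTE was applied. As an illustration of the technique only, the sketch below is a minimal NumPy re-implementation of the SMOTE idea: each synthetic positive is placed at a random point on the segment between a positive sample and one of its k nearest positive neighbours. It is not the author's code; the actual files were more likely produced with a library implementation such as imbalanced-learn's SMOTE (an assumption), possibly with different settings. The function names smote_sketch and oversample and the toy data are hypothetical.

```python
import numpy as np


def smote_sketch(X_minority, n_synthetic, k=5, seed=42):
    """Create n_synthetic points by interpolating between a random minority sample
    and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)  # requires at least two minority samples
    # Pairwise squared Euclidean distances between minority samples
    diff = X_minority[:, None, :] - X_minority[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)
    neigh = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    return X_minority[base] + gap * (X_minority[neigh] - X_minority[base])


def oversample(X, y, k=5, seed=42):
    """Add synthetic positives (label 1) until both classes have the same size."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n_needed = int((y == 0).sum() - (y == 1).sum())
    if n_needed <= 0:
        return X, y
    X_new = smote_sketch(X[y == 1], n_needed, k=k, seed=seed)
    return np.vstack([X, X_new]), np.concatenate([y, np.ones(n_needed, dtype=y.dtype)])


# Hypothetical toy data: 20 molecules with 8 binary fingerprint bits, 4 positives
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(20, 8))
y_toy = np.array([1] * 4 + [0] * 16)
X_res, y_res = oversample(X_toy, y_toy)
print(X_res.shape, int(y_res.sum()))  # (32, 8) 16
```

One consequence worth keeping in mind: interpolating between binary fingerprint bits produces fractional values, so the synthetic rows are no longer valid 0/1 fingerprints.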

WARNING: The training can take up to 2 hours

In [None]:
import pandas as pd

labels = ['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet']
trained_classifiers_SMOTE = []

for lab in labels:
    print(f"Training classifiers for '{lab}'")
    oversampled_data = pd.read_excel(f"https://github.com/FabioHerrera97/FlavorMiner/raw/main/Data/OversampledData/{lab}_ECFP_oversampled.xlsx")

    X_train = oversampled_data.drop([f"{lab}"], axis=1)
    y_train = oversampled_data [f"{lab}"]
    # Random Forest

    rf_classifier = RandomForestClassifier(random_state=42)
    rf_classifier = train_classifier(X_train, y_train, rf_classifier, rf_param_grid)
    trained_classifiers_SMOTE.append(("Random_Forest_SMOTE", lab, rf_classifier))

    # KNN

    knn_classifier = KNeighborsClassifier()
    knn_classifier = train_classifier(X_train, y_train, knn_classifier, knn_param_grid)
    trained_classifiers_SMOTE.append(("KNN_SMOTE", lab, knn_classifier))


Training classifiers for 'Bitter'
Training classifiers for 'Floral'
Training classifiers for 'Fruity'
Training classifiers for 'Off_flavor'
Training classifiers for 'Nutty'
Training classifiers for 'Sour'
Training classifiers for 'Sweet'


In [None]:
import joblib
from google.colab import files

# save model with joblib
for classifier_type, label, classifier in trained_classifiers_SMOTE:
  filename = f'{label}_{classifier_type}.sav'
  joblib.dump(classifier, filename)
  files.download(filename)


In [None]:
import joblib
from sklearn.metrics import recall_score, confusion_matrix, roc_auc_score

metrics_train_SMOTE = {}
metrics_test_SMOTE = {}

labels = ['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet']

for lab in labels:
    print(f"Testing classifiers for '{lab}'")
    oversampled_data = pd.read_excel(f"https://github.com/FabioHerrera97/FlavorMiner/raw/main/Data/OversampledData/{lab}_ECFP_oversampled.xlsx")
    X_train = oversampled_data.drop([f'{lab}'], axis=1)
    y_train = oversampled_data[f'{lab}']


    classifier_trained = {}
    RF = joblib.load(f'{lab}_Random_Forest_SMOTE.sav')
    name_RF = 'Random Forest'
    classifier_trained[RF] = name_RF
    KNN = joblib.load(f'{lab}_KNN_SMOTE.sav')
    name_KNN = 'KNN'
    classifier_trained[KNN] = name_KNN

    for classifier in classifier_trained:

      ''' Training evaluation. Use the resampled training set for evaluation'''

      y_pred_train = classifier.predict(X_train)
      recall_train = recall_score(y_train, y_pred_train)
      tn_train, fp_train, fn_train, tp_train = confusion_matrix(y_train, y_pred_train).ravel()
      specificity_train = tn_train / (tn_train + fp_train)
      roc_score_train = roc_auc_score(y_train, classifier.predict_proba(X_train)[:, 1])

      metrics_train_SMOTE[(classifier_trained[classifier], lab)] = {'Recall': recall_train, 'Specificity': specificity_train,
                                                     'ROC Score': roc_score_train}

      ''' Testing evaluation'''

      y_pred_test = classifier.predict(X_ECFP_test)
      recall_test = recall_score(y_ECFP_test[lab], y_pred_test)
      tn_test, fp_test, fn_test, tp_test = confusion_matrix(y_ECFP_test[lab], y_pred_test).ravel()
      specificity_test = tn_test / (tn_test + fp_test)
      roc_score_test = roc_auc_score(y_ECFP_test[lab], classifier.predict_proba(X_ECFP_test)[:, 1])

      metrics_test_SMOTE[(classifier_trained[classifier], lab)] = {'Recall': recall_test, 'Specificity': specificity_test,
                                                      'ROC Score': roc_score_test}

Testing classifiers for 'Bitter'
Testing classifiers for 'Floral'
Testing classifiers for 'Fruity'
Testing classifiers for 'Off_flavor'
Testing classifiers for 'Nutty'
Testing classifiers for 'Sour'
Testing classifiers for 'Sweet'


Compared with the models trained on the original data, the models trained on the SMOTE-oversampled data reach a higher test recall for every flavor (for example, KNN recall for Bitter rises from 0.59 to 0.89, and Random Forest recall for Sour from 0.07 to 0.49), at the cost of a lower specificity. KNN achieves a higher recall than Random Forest for every flavor, whereas Random Forest keeps a higher specificity in every case. The Sweet models change very little.

In [None]:
metrics_test_df_SMOTE = pd.DataFrame.from_dict(metrics_test_SMOTE, orient='index')

print(metrics_test_df_SMOTE)

                            Recall  Specificity  ROC Score
Random Forest Bitter      0.815104     0.877942   0.923010
KNN           Bitter      0.888021     0.742371   0.885952
Random Forest Floral      0.869048     0.691763   0.849641
KNN           Floral      0.904762     0.683791   0.850712
Random Forest Fruity      0.710106     0.827976   0.875428
KNN           Fruity      0.893617     0.670721   0.833911
Random Forest Off_flavor  0.817778     0.729803   0.864227
KNN           Off_flavor  0.913333     0.671005   0.841296
Random Forest Nutty       0.792169     0.718244   0.844115
KNN           Nutty       0.876506     0.682864   0.841926
Random Forest Sour        0.493333     0.905878   0.861137
KNN           Sour        0.706667     0.730695   0.775687
Random Forest Sweet       0.852053     0.942322   0.946012
KNN           Sweet       0.872967     0.861572   0.914885
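To compare this table with the test table of the models trained on the original data, the per-metric differences can be computed from the metric dictionaries, which share the same (model, flavor) keys. The sketch below uses two rows copied from both tables; the helper name metric_change is hypothetical.

```python
def metric_change(before, after):
    """Per-(model, flavor) difference after - before for every metric present in both."""
    return {key: {m: after[key][m] - before[key][m] for m in before[key]}
            for key in before if key in after}


# Two rows copied from the original-data and SMOTE test tables
baseline = {
    ('KNN', 'Bitter'): {'Recall': 0.588542, 'Specificity': 0.968178, 'ROC Score': 0.861551},
    ('Random Forest', 'Sour'): {'Recall': 0.066667, 'Specificity': 0.995390, 'ROC Score': 0.816294},
}
smote = {
    ('KNN', 'Bitter'): {'Recall': 0.888021, 'Specificity': 0.742371, 'ROC Score': 0.885952},
    ('Random Forest', 'Sour'): {'Recall': 0.493333, 'Specificity': 0.905878, 'ROC Score': 0.861137},
}

for key, deltas in metric_change(baseline, smote).items():
    print(key, {m: round(d, 3) for m, d in deltas.items()})
```

In the notebook, metric_change(test_metrics, metrics_test_SMOTE) would apply the same comparison to all fourteen model-flavor pairs, and likewise with metrics_test_CC for the undersampled models.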


In [None]:
metrics_train_df_SMOTE = pd.DataFrame.from_dict(metrics_train_SMOTE, orient='index')
print(metrics_train_df_SMOTE)

                            Recall  Specificity  ROC Score
Random Forest Bitter      0.983332     0.908923   0.991109
KNN           Bitter      0.987580     0.804990   0.990044
Random Forest Floral      0.985506     0.698779   0.943109
KNN           Floral      0.991282     0.731473   0.981230
Random Forest Fruity      0.998818     0.850956   0.974015
KNN           Fruity      0.991511     0.718891   0.980123
Random Forest Off_flavor  0.973982     0.740159   0.939973
KNN           Off_flavor  0.990019     0.718067   0.978402
Random Forest Nutty       0.986662     0.725163   0.949861
KNN           Nutty       0.988763     0.736085   0.983769
Random Forest Sour        0.998941     0.914701   0.990124
KNN           Sour        0.997593     0.788871   0.994410
Random Forest Sweet       0.977669     0.992738   0.999092
KNN           Sweet       0.974582     0.995643   0.997810


# 4. Training with the undersampled data
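As with the oversampled data, the undersampled files are loaded ready-made and the notebook does not state which undersampling method produced them; the _CC suffix of the saved models may refer to a specific technique such as cluster centroids, but that is not confirmed here. For illustration only, the sketch below uses a different and simpler technique, random undersampling, which keeps every positive and an equally sized random subset of negatives. The function name and toy data are hypothetical.

```python
import numpy as np


def random_undersample(X, y, seed=42):
    """Keep every positive (label 1) and an equally sized random subset of negatives.
    Assumes the positive class is the minority class."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_keep = min(len(pos_idx), len(neg_idx))
    keep_neg = rng.choice(neg_idx, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([pos_idx, keep_neg]))  # preserve original row order
    return X[keep], y[keep]


# Hypothetical toy data: 20 rows, 4 positives
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([1] * 4 + [0] * 16)
X_u, y_u = random_undersample(X_toy, y_toy)
print(X_u.shape, int(y_u.sum()))  # (8, 2) 4
```

Unlike oversampling, undersampling discards negative examples instead of creating synthetic positives, so every row of the resulting training set is a real molecule.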

In [None]:
import pandas as pd

labels = ['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet']
trained_classifiers_CC = []

for lab in labels:
    print(f"Training classifiers for '{lab}'")
    undersampled_data = pd.read_excel(f"https://github.com/FabioHerrera97/FlavorMiner/raw/main/Data/OversampledData/{lab}_ECFP_undersampled.xlsx")

    X_train = undersampled_data.drop([f"{lab}"], axis=1)
    y_train = undersampled_data[f"{lab}"]
    # Random Forest

    rf_classifier = RandomForestClassifier(random_state=42)
    rf_classifier = train_classifier(X_train, y_train, rf_classifier, rf_param_grid)
    trained_classifiers_CC.append(("Random_Forest_CC", lab, rf_classifier))

    # KNN

    knn_classifier = KNeighborsClassifier()
    knn_classifier = train_classifier(X_train, y_train, knn_classifier, knn_param_grid)
    trained_classifiers_CC.append(("KNN_CC", lab, knn_classifier))


Training classifiers for 'Bitter'
Training classifiers for 'Floral'
Training classifiers for 'Fruity'
Training classifiers for 'Off_flavor'
Training classifiers for 'Nutty'
Training classifiers for 'Sour'
Training classifiers for 'Sweet'


In [None]:
import joblib
from google.colab import files

# save model with joblib
for classifier_type, label, classifier in trained_classifiers_CC:
  filename = f'{label}_{classifier_type}.sav'
  joblib.dump(classifier, filename)
  files.download(filename)


In [None]:
import joblib
from sklearn.metrics import recall_score, confusion_matrix, roc_auc_score

metrics_train_CC = {}
metrics_test_CC = {}

labels = ['Bitter', 'Floral', 'Fruity', 'Off_flavor', 'Nutty', 'Sour', 'Sweet']

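for lab in labels:
    print(f"Testing classifiers for '{lab}'")
    # Load the same undersampled training set that was used for training
    undersampled_data = pd.read_excel(f"https://github.com/FabioHerrera97/FlavorMiner/raw/main/Data/OversampledData/{lab}_ECFP_undersampled.xlsx")
    X_train = undersampled_data.drop([f'{lab}'], axis=1)
    y_train = undersampled_data[f'{lab}']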


    classifier_trained = {}
    RF = joblib.load(f'{lab}_Random_Forest_CC.sav')
    name_RF = 'Random Forest'
    classifier_trained[RF] = name_RF
    KNN = joblib.load(f'{lab}_KNN_CC.sav')
    name_KNN = 'KNN'
    classifier_trained[KNN] = name_KNN

    for classifier in classifier_trained:

      ''' Training evaluation. Use the resampled training set for evaluation'''

      y_pred_train = classifier.predict(X_train)
      recall_train = recall_score(y_train, y_pred_train)
      tn_train, fp_train, fn_train, tp_train = confusion_matrix(y_train, y_pred_train).ravel()
      specificity_train = tn_train / (tn_train + fp_train)
      roc_score_train = roc_auc_score(y_train, classifier.predict_proba(X_train)[:, 1])

      metrics_train_CC[(classifier_trained[classifier], lab)] = {'Recall': recall_train, 'Specificity': specificity_train,
                                                     'ROC Score': roc_score_train}

      ''' Testing evaluation'''

      y_pred_test = classifier.predict(X_ECFP_test)
      recall_test = recall_score(y_ECFP_test[lab], y_pred_test)
      tn_test, fp_test, fn_test, tp_test = confusion_matrix(y_ECFP_test[lab], y_pred_test).ravel()
      specificity_test = tn_test / (tn_test + fp_test)
      roc_score_test = roc_auc_score(y_ECFP_test[lab], classifier.predict_proba(X_ECFP_test)[:, 1])

      metrics_test_CC[(classifier_trained[classifier], lab)] = {'Recall': recall_test, 'Specificity': specificity_test,
                                                      'ROC Score': roc_score_test}

Testing classifiers for 'Bitter'
Testing classifiers for 'Floral'
Testing classifiers for 'Fruity'
Testing classifiers for 'Off_flavor'
Testing classifiers for 'Nutty'
Testing classifiers for 'Sour'
Testing classifiers for 'Sweet'


In [None]:
metrics_test_df_CC = pd.DataFrame.from_dict(metrics_test_CC, orient='index')

print(metrics_test_df_CC)

                            Recall  Specificity  ROC Score
Random Forest Bitter      0.903646     0.651700   0.888168
KNN           Bitter      0.703125     0.841325   0.854980
Random Forest Floral      0.876190     0.652790   0.832914
KNN           Floral      0.728571     0.793180   0.832425
Random Forest Fruity      0.901596     0.652911   0.864323
KNN           Fruity      0.728723     0.807993   0.830058
Random Forest Off_flavor  0.904444     0.653052   0.849378
KNN           Off_flavor  0.713333     0.759874   0.825381
Random Forest Nutty       0.927711     0.531969   0.800746
KNN           Nutty       0.728916     0.742967   0.800184
Random Forest Sour        0.946667     0.447176   0.819429
KNN           Sour        0.746667     0.750288   0.805450
Random Forest Sweet       0.845081     0.931507   0.945261
KNN           Sweet       0.865221     0.880317   0.925679


In [None]:
metrics_train_df_CC = pd.DataFrame.from_dict(metrics_train_CC, orient='index')

print(metrics_train_df_CC)

                            Recall  Specificity  ROC Score
Random Forest Bitter      0.981046     0.973856   0.997391
KNN           Bitter      0.999346     0.999346   0.999999
Random Forest Floral      0.946510     0.836269   0.957635
KNN           Floral      0.998695     1.000000   0.999999
Random Forest Fruity      0.984319     0.965788   0.995885
KNN           Fruity      0.997862     1.000000   0.999998
Random Forest Off_flavor  0.956473     0.752232   0.940139
KNN           Off_flavor  0.996094     0.999442   0.999990
Random Forest Nutty       0.966302     0.821398   0.971098
KNN           Nutty       0.999158     1.000000   1.000000
Random Forest Sour        0.996894     0.975155   0.999064
KNN           Sour        1.000000     1.000000   1.000000
Random Forest Sweet       0.981157     0.990002   0.999207
KNN           Sweet       0.974620     0.996731   0.999286
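The KNN training scores of (almost) exactly 1.0 in the table above have a plausible explanation, assuming the grid search selected weights='distance' for these models (the chosen hyperparameters are not printed, so this is not verified). When the training set is predicted, each molecule finds itself at distance zero, and scikit-learn then lets only the zero-distance neighbours vote, so every training molecule tends to receive its own label. The NumPy sketch below mimics that rule on hypothetical 1-D data; it is not sklearn's implementation, and the function name knn_predict is hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3, weights='uniform'):
    """Binary KNN vote with Euclidean distance. With weights='distance', if any
    neighbour is at distance zero only the zero-distance neighbours vote, which
    mirrors scikit-learn's handling of exact matches."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in np.asarray(X_query, dtype=float):
        d = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        idx = np.argsort(d, kind='stable')[:k]
        if weights == 'uniform':
            w = np.ones(len(idx))
        elif np.any(d[idx] == 0):
            w = (d[idx] == 0).astype(float)
        else:
            w = 1.0 / d[idx]
        # Ties go to the negative class in this sketch
        preds.append(int(w[y_train[idx] == 1].sum() > w[y_train[idx] == 0].sum()))
    return np.array(preds)


# Hypothetical 1-D toy data with one isolated positive
X = [[0], [1], [2], [3], [4]]
y = [0, 0, 1, 0, 0]
print(knn_predict(X, y, X, weights='uniform'))   # [0 0 0 0 0]
print(knn_predict(X, y, X, weights='distance'))  # [0 0 1 0 0]
```

Values slightly below 1.0 could arise from identical fingerprints with different labels. Either way, resubstitution scores say little about KNN generalization; the cross-validated score stored in GridSearchCV's best_score_ attribute is a more informative training-side estimate.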
