## Machine Learning I (CC2008) - Practical Assignment (2023-24)

# Evaluation and Comparison of K-NN Algorithms on Imbalanced Binary Classification Tasks

#### Vítor Bruno Dantas Ramalhosa Ferreira (201109428) | G: 2.1

### Table of Contents

- [1. Algorithm Selection](#1-Algorithm)
- [2. K-NN and Data Characteristics](#2-KNN)
- [3. Class Imbalance in Binary Problems](#3-Class-imbalance)
- [4. Benchmark First Evaluation](#4-Benchmark-first-evaluation)
    - [4.1. Datasets Selection](#4.1-DS-selection)
    - [4.2. Datasets Evaluation](#4.2-DS-evaluation)
        - [4.2.1. wilt](#4.2.1-wilt) 
        - [4.2.2. sick](#4.2.2-sick)
        - [4.2.3. ozone-level-8hr](#4.2.3-ozone)
        - [4.2.4. pc1](#4.2.4-pc1)
        - [4.2.5. climate-model-simulation-crashes](#4.2.5-climate)
        - [4.2.6. pc3](#4.2.6-pc3)    
        - [4.2.7. pc4](#4.2.7-pc4)
        - [4.2.8. Internet-Advertisements](#4.2.8-internet)
        - [4.2.9. churn](#4.2.9-churn)
        - [4.2.10. kc1](#4.2.10-kc1)
    - [4.3. Results](#4.3-Results)    
- [5. K-NN's variant proposal and implementation](#5-KNN-variant-proposal)
- [6. Benchmark Second Evaluation](#6-Benchmark-second-evaluation)
    - [6.1. Datasets Evaluation](#6.1-DS-evaluation)
        - [6.1.1. wilt](#6.1.1-wilt) 
        - [6.1.2. sick](#6.1.2-sick)
        - [6.1.3. ozone-level-8hr](#6.1.3-ozone)
        - [6.1.4. pc1](#6.1.4-pc1)
        - [6.1.5. climate-model-simulation-crashes](#6.1.5-climate)
        - [6.1.6. pc3](#6.1.6-pc3)    
        - [6.1.7. pc4](#6.1.7-pc4)
        - [6.1.8. Internet-Advertisements](#6.1.8-internet)
        - [6.1.9. churn](#6.1.9-churn)
        - [6.1.10. kc1](#6.1.10-kc1)
    - [6.2. Results and Discussion](#6.2-Results)
- [7. References](#7-References)

<a id="1-Algorithm"></a>
## 1. Algorithm Selection

For the machine learning tasks outlined in this assignment, I have selected the **K-Nearest Neighbors (K-NN)** algorithm. K-NN is renowned for its simplicity and effectiveness, especially in scenarios where the decision boundary is irregular. The algorithm's reliance on feature similarity (distance metrics) to predict the labels of new data points makes it a versatile choice for many practical applications.

A standard implementation of the K-NN algorithm, which was later used, can be found at this GitHub repository: [MLAlgorithms - KNN](https://github.com/rushter/MLAlgorithms/blob/master/mla/knn.py).

This approach ensures that the fundamental aspects of the algorithm are adhered to, including distance calculation, neighbor selection, and classification based on majority voting. Below are the needed classes for the K-NN functionality:

In [None]:
from collections import Counter
import numpy as np
from scipy.spatial.distance import euclidean

class BaseEstimator:
    y_required = True
    fit_required = True

    def _setup_input(self, X, y=None):
        """Ensure inputs to an estimator are in the expected format.

        Ensures X and y are stored as numpy ndarrays by converting from an
        array-like object if necessary. Enables estimators to define whether
        they require a set of y target values or not with y_required, e.g.
        kmeans clustering requires no target labels and is fit against only X.

        Parameters
        ----------
        X : array-like
            Feature dataset.
        y : array-like
            Target values. By default is required, but if y_required = false
            then may be omitted.
        """
        if not isinstance(X, np.ndarray):
            X = np.array(X)

        if X.size == 0:
            raise ValueError("Got an empty matrix.")

        if X.ndim == 1:
            self.n_samples, self.n_features = 1, X.shape
        else:
            self.n_samples, self.n_features = X.shape[0], np.prod(X.shape[1:])

        self.X = X

        if self.y_required:
            if y is None:
                raise ValueError("Missed required argument y")

            if not isinstance(y, np.ndarray):
                y = np.array(y)

            if y.size == 0:
                raise ValueError("The targets array must be no-empty.")

        self.y = y

    def fit(self, X, y=None):
        self._setup_input(X, y)

    def predict(self, X=None):
        if not isinstance(X, np.ndarray):
            X = np.array(X)

        if self.X is not None or not self.fit_required:
            return self._predict(X)
        else:
            raise ValueError("You must call `fit` before `predict`")

    def _predict(self, X=None):
        raise NotImplementedError()


class KNNBase(BaseEstimator):
    def __init__(self, k=5, distance_func=euclidean):
        """Base class for Nearest neighbors classifier and regressor.

        Parameters
        ----------
        k : int, default 5
            The number of neighbors to take into account. If 0, all the
            training examples are used.
        distance_func : function, default euclidean distance
            A distance function taking two arguments. Any function from
            scipy.spatial.distance will do.
        """

        self.k = None if k == 0 else k  # l[:None] returns the whole list
        self.distance_func = distance_func

    def aggregate(self, neighbors_targets):
        raise NotImplementedError()

    def _predict(self, X=None):
        predictions = [self._predict_x(x) for x in X]

        return np.array(predictions)

    def _predict_x(self, x):
        """Predict the label of a single instance x."""

        # compute distances between x and all examples in the training set.
        distances = (self.distance_func(x, example) for example in self.X)

        # Sort all examples by their distance to x and keep their target value.
        neighbors = sorted(((dist, target) for (dist, target) in zip(distances, self.y)), key=lambda x: x[0])

        # Get targets of the k-nn and aggregate them (most common one or
        # average).
        neighbors_targets = [target for (_, target) in neighbors[: self.k]]

        return self.aggregate(neighbors_targets)


class KNNClassifier(KNNBase):
    """Nearest neighbors classifier.

    Note: if there is a tie for the most common label among the neighbors, then
    the predicted label is arbitrary."""

    def aggregate(self, neighbors_targets):
        """Return the most common target label."""

        most_common_label = Counter(neighbors_targets).most_common(1)[0][0]
        return most_common_label


This chosen implementation served as the base for the modifications or enhancements, which were later explored in the assignment, particularly when proposing and implementing a **variant of the K-NN algorithm**.

<a id="2-KNN"></a>
## 2. K-NN and Data Characteristics

The **K-Nearest Neighbors (K-NN)** algorithm, while straightforward and versatile, is notably sensitive to the specific characteristics of the dataset it processes. Below, I hypothesized how the standard version of K-NN reacts to various data peculiarities and which characteristics may most significantly impact its performance.

1. **Qualitative Attributes with a Large Number of Possible Values**
K-NN relies heavily on distance metrics to make predictions, which can become problematic when dealing with qualitative or categorical attributes that have a wide range of values. If these attributes are encoded improperly (e.g., using simple integer encoding), distances between categories may not be meaningful, leading to poor performance. Techniques like one-hot encoding can alleviate this issue but increase the dimensionality of the data, which can exacerbate the curse of dimensionality.

2. **Noise or Outliers**
The presence of noise or outliers in the training data can disproportionately affect K-NN because the algorithm's predictions are directly influenced by the nearest few samples in the feature space. An outlier close to a query point can lead the algorithm to make incorrect predictions, especially if the number of neighbors k is small.

3. **Class Imbalance in Binary Problems**
In binary classification tasks where one class significantly outnumbers the other, K-NN can develop a bias towards the majority class. This happens because there is a higher probability that the majority of the nearest neighbors belong to the more prevalent class, which can mislead predictions.

4. **Multiclass Classification**
In multiclass settings, K-NN’s efficacy can diminish as the distance between neighbors becomes less discriminative. When classes are numerous, the likelihood increases that neighbors belong to several different classes, which can dilute the voting process and lead to less confident predictions.

5. **Class Overlap**
K-NN's performance is also compromised in situations where class boundaries overlap significantly. In such cases, the local neighborhood of a sample may contain instances from multiple classes, making it difficult for K-NN to accurately assign a class based on majority voting.

In summary, the standard implementation of K-NN is inherently sensitive to the aforementioned data characteristics due to its dependency on local information and distance calculations. This sensitivity stems primarily from:

- **Dependency on Local Structure**: K-NN's decision-making process is based purely on the nearest neighbors, with no understanding of the overall data structure. This makes it highly susceptible to anomalies in local data patterns.
- **Equal Weight to All Features**: Standard K-NN implementation gives equal importance to all features unless explicitly programmed to do otherwise, which can lead to issues when irrelevant or less important features influence the distance calculations.
- **Curse of Dimensionality**: As the number of features grows (a common scenario when encoding categorical variables or dealing with high-dimensional data), the volume of the feature space increases exponentially, and the data becomes sparser. This sparsity makes it harder for K-NN to find meaningful neighbors, as most points are almost equidistant to one another.

<a id="3-Class-imbalance"></a>
## 3. Class Imbalance in Binary Problems

**Class imbalance** is a critical data characteristic to address in binary classification problems, especially when using algorithms like K-Nearest Neighbors (K-NN). This issue arises when one class in the dataset significantly outnumbers the other, which can alter the predictive accuracy and lead to a model that is biased toward the majority class. Understanding and tackling class imbalance is essential for several reasons:

- **Bias Towards Majority Class**: In its standard form, K-NN uses a majority voting system where the class label of a new instance is determined based on the most common class among its nearest neighbors. If one class dominates the dataset, it's more likely that a new instance will be classified into the majority class regardless of its true label. This results in poor model performance, particularly in its ability to correctly identify instances of the minority class.
- **Degradation of Performance Metrics**: Class imbalance affects key performance metrics. For instance, accuracy might appear high, but this can be misleading if the algorithm simply predicts the majority class most of the time. More critical metrics in imbalanced settings, like precision, recall, and the F1-score for the minority class, often degrade unless the imbalance is addressed.
- **Practical Implications in Real-World Scenarios**: Many real-world applications involve scenarios where the minority class is of greater interest despite its fewer occurrences, such as fraud detection, disease diagnosis, or spam detection. In these cases, failing to correctly predict the minority class can have serious consequences, making it imperative to handle class imbalance effectively.
- **Fairness and Equity in Predictive Modeling**: Addressing class imbalance also touches on ethical aspects of machine learning. Models trained on imbalanced data can perpetuate or exacerbate biases, leading to unfair outcomes. Ensuring that the model treats classes equitably is crucial to ethical AI practices.

Given these considerations, tackling class imbalance is not merely a technical issue but also a fundamental aspect of building reliable, fair, and effective machine learning models. Addressing this imbalance allows for a more nuanced understanding of the dataset and improves the robustness of the model's predictions across different classes.

<a id="4-Benchmark-first-evaluation"></a>
## 4. Benchmark First Evaluation

<a id="4.1-DS-selection"></a>
### 4.1. Datasets Selection

For the initial benchmarking of the provided K-NN implementation, the algorithm performance was evaluated using the **OpenML-CC18 Curated Classification Benchmark**. This benchmark suite provides a variety of datasets specifically curated for comprehensive benchmarking of classification algorithms. 

Below is a list of the given datasets:

In [16]:
import openml

suite = openml.study.get_suite('99')  # Use 'get_suite' to load the benchmark suite

print("List of Datasets in OpenML-CC18 Curated Classification benchmark:\n")
dataset_count = 0  # Initialize counter for datasets

for task_id in suite.tasks:
    task = openml.tasks.get_task(task_id, download_splits=False)
    dataset = openml.datasets.get_dataset(task.dataset_id, 
                                          download_data=True, 
                                          download_qualities=True, 
                                          download_features_meta_data=True)
    
    # Increment the dataset counter
    dataset_count += 1

    # Print details of each dataset
    print(f"Dataset ID: {dataset.dataset_id}, Name: {dataset.name}")

# Print the total number of datasets after the loop
print(f"\nTotal Number of Datasets: {dataset_count}\n")

List of Datasets in OpenML-CC18 Curated Classification benchmark:

Dataset ID: 3, Name: kr-vs-kp
Dataset ID: 6, Name: letter
Dataset ID: 11, Name: balance-scale
Dataset ID: 12, Name: mfeat-factors
Dataset ID: 14, Name: mfeat-fourier
Dataset ID: 15, Name: breast-w
Dataset ID: 16, Name: mfeat-karhunen
Dataset ID: 18, Name: mfeat-morphological
Dataset ID: 22, Name: mfeat-zernike
Dataset ID: 23, Name: cmc
Dataset ID: 28, Name: optdigits
Dataset ID: 29, Name: credit-approval
Dataset ID: 31, Name: credit-g
Dataset ID: 32, Name: pendigits
Dataset ID: 37, Name: diabetes
Dataset ID: 44, Name: spambase
Dataset ID: 46, Name: splice
Dataset ID: 50, Name: tic-tac-toe
Dataset ID: 54, Name: vehicle
Dataset ID: 151, Name: electricity
Dataset ID: 182, Name: satimage
Dataset ID: 188, Name: eucalyptus
Dataset ID: 38, Name: sick
Dataset ID: 307, Name: vowel
Dataset ID: 300, Name: isolet
Dataset ID: 458, Name: analcatdata_authorship
Dataset ID: 469, Name: analcatdata_dmft
Dataset ID: 554, Name: mnist_784
D

Instead of evaluating the K-NN on all available datasets, I focused on those specifically tailored to binary classification, with an emphasis on identifying and selecting those with significant class imbalances.

To streamline this evaluation, I pre-selected relevant datasets tailored to binary classification from the OpenML-CC18 suite and saved them into a CSV file named **dslist.csv**.

After loading the dataset information from dslist.csv, I calculated the **imbalance ratio** for each dataset. This ratio is determined by taking the maximum of (Class A / Class B) and (Class B / Class A). This calculation helped to identify the degree of imbalance present in each dataset:

In [17]:
import pandas as pd

# Load the CSV file
data = pd.read_csv('DS/dslist.csv')

# Ensure the column names are as expected
print("Columns in the dataset:", data.columns.tolist())

# Calculate the imbalance ratio as the maximum of (class A / class B) and (class B / class A) to ensure it's >= 1
data['IMBALANCE RATIO'] = data.apply(lambda row: max(row['CLASS A'] / row['CLASS B'], row['CLASS B'] / row['CLASS A']), axis=1)

# Print the dataset names with their corresponding imbalance ratios, sorted by the imbalance ratio in descending order
print(data[['DATASET', 'CLASS A', 'CLASS B', 'IMBALANCE RATIO']].sort_values(by='IMBALANCE RATIO', ascending=False))


Columns in the dataset: ['DATASET', 'CLASS A', 'CLASS B']
                             DATASET  CLASS A  CLASS B  IMBALANCE RATIO
1                               wilt   4578.0    261.0        17.540230
2                               sick   3541.0    231.0        15.329004
3                    ozone-level-8hr   2374.0    160.0        14.837500
4                                pc1   1032.0     77.0        13.402597
5   climate-model-simulation-crashes     46.0    494.0        10.739130
6                                pc3   1403.0    160.0         8.768750
7                     bank-marketing  39922.0   5289.0         7.548119
8                                pc4   1280.0    178.0         7.191011
9             Internet-Advertisement    459.0   2820.0         6.143791
10                             churn   4293.0    707.0         6.072136
11                               kc1   1783.0    326.0         5.469325
12                               jm1   8779.0   2106.0         4.168566
13    

Given the challenges posed by class imbalance in binary classification tasks, such as **bias towards the majority class**, I decided to only focus on those datasets where the **imbalance ratio** was **greater than 5**. This was chosen to highlight datasets where K-NN's performance could be more significantly affected by the imbalance. The final list of benchmark datasets that were selected is found below:

In [18]:
# Filter datasets where the imbalance ratio is greater than 5
filtered_data = data[data['IMBALANCE RATIO'] > 5]

# Print the filtered datasets that have an imbalance ratio greater than 5
print(filtered_data[['DATASET', 'CLASS A', 'CLASS B', 'IMBALANCE RATIO']].sort_values(by='IMBALANCE RATIO', ascending=False))

                             DATASET  CLASS A  CLASS B  IMBALANCE RATIO
1                               wilt   4578.0    261.0        17.540230
2                               sick   3541.0    231.0        15.329004
3                    ozone-level-8hr   2374.0    160.0        14.837500
4                                pc1   1032.0     77.0        13.402597
5   climate-model-simulation-crashes     46.0    494.0        10.739130
6                                pc3   1403.0    160.0         8.768750
7                     bank-marketing  39922.0   5289.0         7.548119
8                                pc4   1280.0    178.0         7.191011
9             Internet-Advertisement    459.0   2820.0         6.143791
10                             churn   4293.0    707.0         6.072136
11                               kc1   1783.0    326.0         5.469325


However, after reviewing the computational demands and performance metrics, I also decided to exclude the **bank-marketing** dataset from the current analysis. This particular dataset, due to its substantial size and high number of attributes, posed significant challenges for the K-NN algorithm, resulting in **long processing times**. Given these constraints, removing the dataset was necessary to ensure a more efficient and manageable workflow.

<a id="4.2-DS-evaluation"></a>
### 4.2. Datasets Evaluation

To effectively evaluate the **ten** pre-selected datasets, I began by copying the necessary machine learning algorithms from the GitHub repository previously mentioned to the folder "MLAlgorithms".

In [2]:
import sys
import os
sys.path.append('MLAlgorithms')

Then, I proceeded to import all the necessary libraries required for conducting the evaluations:

In [7]:
import numpy as np
import pandas as pd
from scipy.io import arff
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

For each dataset, I have implemented a **Stratified 5-Fold Cross-Validation** method to ensure a thorough and unbiased evaluation. Given that some datasets have a small number of examples, a 10-fold cross-validation seemed excessive, even though it was initially considered. This approach maintains the proportion of classes across each fold, which is crucial for datasets with varying class distributions. During the cross-validation process, I recorded the **accuracy**, **precision**, **recall**, and **F1-score** for each fold, saving these metrics in a CSV file within the 'Results' folder.

Before initiating the cross-validation, I individually **preprocessed** each dataset, where preprocessing steps were tailored to the specific characteristics and requirements of each dataset. I also compiled a brief description of the structure of each dataset.

To ensure consistency in the evaluations, especially when comparing a different algorithm variation later, I utilized a **fixed random seed**. This was crucial for replicating the exact splits in the Stratified cross-validation when the same datasets were subjected to the modified version of the algorithm, thus ensuring that any differences in performance metrics were attributable solely to the algorithmic changes and not to variations in the dataset splits.

The following function **loads the datasets arff** to dataframe and is used for each dataset:

In [8]:
def load_arff_to_dataframe(file_path):
    try:
        data, meta = arff.loadarff(file_path)
        df = pd.DataFrame(data)
        for col in df.select_dtypes([object]):
            df[col] = df[col].apply(lambda x: x.decode('utf-8') if isinstance(x, bytes) else x)
        return df
    except Exception as e:
        print(f"Error loading data from {file_path}: {e}")
        return None

<a id="4.2.1-wilt"></a>
#### 4.2.1. wilt

**Relevant aspects**:

- Distribution of target variable ("class"): "1" (4578) / "2" (261).

- All other attributes are numeric.

- No missing values.

In [22]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/wilt.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'class'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='2', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='2', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='2', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = f"Results/wilt_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/wilt_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.961777,0.9,0.339623,0.493151,KNNClassifier
1,2,0.974174,0.935484,0.557692,0.698795,KNNClassifier
2,3,0.967975,0.92,0.442308,0.597403,KNNClassifier
3,4,0.966942,0.916667,0.423077,0.578947,KNNClassifier
4,5,0.974147,0.965517,0.538462,0.691358,KNNClassifier


<a id="4.2.2-sick"></a>
#### 4.2.2. sick

**Relevant aspects**:

- Distribution of target variable ("Class"): "negative" (3541) / "sick" (231).

- Some attributes are nominal, others numeric.

- There are some missing values. 

**Key Changes** (related to data preprocessing):

- Handling Missing Values: Implemented imputation for both numeric and categorical features.
  
- Encoding Nominal Attributes: Used one-hot encoding for categorical features.
  
- Dropping Columns: Dropped the "TBG" column as it contains only missing values.


In [23]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/sick.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'Class'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

def preprocess_data(df, target_column):
    # Drop the column 'TBG' as it only contains missing values
    df = df.drop(columns=['TBG'])
    
    # Separate features and target
    X = df.drop(columns=[target_column])
    y = df[target_column].values
    
    # Identify numeric and categorical columns
    numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
    categorical_features = X.select_dtypes(include=['object']).columns
    
    # Create preprocessing pipelines for both numeric and categorical data
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='mean')),  # Impute missing values with mean
        ('scaler', StandardScaler())  # Standardize numeric features
    ])

    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='most_frequent')),  # Impute missing values with most frequent value
        ('onehot', OneHotEncoder(handle_unknown='ignore'))  # One-hot encode categorical features
    ])

    # Combine preprocessing steps
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)
        ]
    )
    
    # Fit and transform the features
    X_preprocessed = preprocessor.fit_transform(X)
    
    return X_preprocessed, y


# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='sick', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='sick', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='sick', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = f"Results/sick_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/sick_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.968212,0.84375,0.586957,0.692308,KNNClassifier
1,2,0.972185,0.964286,0.574468,0.72,KNNClassifier
2,3,0.966844,0.83871,0.565217,0.675325,KNNClassifier
3,4,0.970822,0.928571,0.565217,0.702703,KNNClassifier
4,5,0.96817,0.84375,0.586957,0.692308,KNNClassifier


<a id="4.2.3-ozone"></a>
#### 4.2.3. ozone-level-8hr

**Relevant aspects**:

- Distribution of target variable ("Class"): "1" (2374) / "2" (160).

- All other attributes are numeric.

- No missing values.

In [24]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/ozone-level-8hr.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'Class'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='2', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='2', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='2', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/ozone-level-8hr_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/ozone-level-8hr_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.930966,0.333333,0.09375,0.146341,KNNClassifier
1,2,0.936884,0.5,0.25,0.333333,KNNClassifier
2,3,0.930966,0.4,0.1875,0.255319,KNNClassifier
3,4,0.940828,0.583333,0.21875,0.318182,KNNClassifier
4,5,0.936759,0.5,0.0625,0.111111,KNNClassifier


<a id="4.2.4-pc1"></a>
#### 4.2.4. pc1

**Relevant aspects**:

- Distribution of target variable ("defects"): "false" (1032) / "true" (77).

- All other attributes are numeric.

- No missing values.

In [25]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/pc1.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'defects'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='true', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='true', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='true', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/pc1_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/pc1_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.936937,0.571429,0.266667,0.363636,KNNClassifier
1,2,0.927928,0.4,0.133333,0.2,KNNClassifier
2,3,0.914414,0.363636,0.25,0.296296,KNNClassifier
3,4,0.932432,0.6,0.1875,0.285714,KNNClassifier
4,5,0.918552,0.285714,0.133333,0.181818,KNNClassifier


<a id="4.2.5-climate"></a>
#### 4.2.5. climate-model-simulation-crashes

**Relevant aspects**:

- Distribution of target variable ("outcome"): "0" (46) / "1" (494).

- All other attributes are numeric.

- No missing values.

In [26]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/climate-model-simulation-crashes.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'outcome'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='0', average='binary', zero_division=0))  
        results['Recall'].append(recall_score(y_test, predictions, pos_label='0', average='binary', zero_division=0))  
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='0', average='binary', zero_division=0))  
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/climate-model-simulation-crashes_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/climate-model-simulation-crashes_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.916667,0.666667,0.2,0.307692,KNNClassifier
1,2,0.907407,0.0,0.0,0.0,KNNClassifier
2,3,0.935185,1.0,0.222222,0.363636,KNNClassifier
3,4,0.916667,0.0,0.0,0.0,KNNClassifier
4,5,0.916667,0.5,0.111111,0.181818,KNNClassifier


<a id="4.2.6-pc3"></a>
#### 4.2.6. pc3

**Relevant aspects**:

- Distribution of target variable ("c"): "FALSE" (1403) / "TRUE" (160).

- All other attributes are numeric.

- No missing values.

In [27]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/pc3.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'c'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='TRUE', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='TRUE', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='TRUE', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/pc3_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/pc3_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.900958,0.545455,0.1875,0.27907,KNNClassifier
1,2,0.891374,0.428571,0.1875,0.26087,KNNClassifier
2,3,0.884984,0.375,0.1875,0.25,KNNClassifier
3,4,0.878205,0.2,0.0625,0.095238,KNNClassifier
4,5,0.88141,0.272727,0.09375,0.139535,KNNClassifier


<a id="4.2.7-pc4"></a>
#### 4.2.7. pc4

**Relevant aspects**:

- Distribution of target variable ("c"): "FALSE" (1280) / "TRUE" (178).

- All other attributes are numeric.

- No missing values.

In [28]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/pc4.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'c'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='TRUE', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='TRUE', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='TRUE', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/pc4_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/pc4_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.880137,0.526316,0.277778,0.363636,KNNClassifier
1,2,0.876712,0.5,0.361111,0.419355,KNNClassifier
2,3,0.893836,0.619048,0.361111,0.45614,KNNClassifier
3,4,0.883162,0.526316,0.285714,0.37037,KNNClassifier
4,5,0.914089,0.727273,0.457143,0.561404,KNNClassifier


<a id="4.2.8-internet"></a>
#### 4.2.8. Internet-Advertisements

**Relevant aspects**:

- Distribution of target variable ("class"): "ad" (459) / "noad" (2820).

- All other attributes are nominal ("0" or "1").

- No missing values.

In [29]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/Internet-Advertisements.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'class'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='ad', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='ad', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='ad', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/Internet-Advertisements_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/Internet-Advertisements_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.946646,0.952381,0.652174,0.774194,KNNClassifier
1,2,0.96189,0.924051,0.793478,0.853801,KNNClassifier
2,3,0.949695,0.953846,0.673913,0.789809,KNNClassifier
3,4,0.954268,0.918919,0.73913,0.819277,KNNClassifier
4,5,0.961832,0.945946,0.769231,0.848485,KNNClassifier


<a id="4.2.9-churn"></a>
#### 4.2.9. churn

**Relevant aspects**:

- Distribution of target variable ("class"): "0" (4293) / "1" (707).

- Most attributes are numeric, but some are nominal.

- No missing values.

**Key Changes** (related to data preprocessing):

- Handle Numeric and Nominal Attributes: The ColumnTransformer is used to apply different preprocessing steps to numeric and categorical columns.

- One-Hot Encoding: Categorical features are one-hot encoded.

- Standardization: Numeric features are standardized using StandardScaler.

In [30]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/churn.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'class'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    # Separate features and target
    X = df.drop(columns=[target_column])
    y = df[target_column].values
    
    # Identify numeric and categorical columns
    numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
    categorical_features = X.select_dtypes(include=['object']).columns
    
    # Create preprocessing pipelines for both numeric and categorical data
    numeric_transformer = Pipeline(steps=[
        ('scaler', StandardScaler())  # Standardize numeric features
    ])

    categorical_transformer = Pipeline(steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore'))  # One-hot encode categorical features
    ])

    # Combine preprocessing steps
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)
        ]
    )
    
    # Fit and transform the features
    X_preprocessed = preprocessor.fit_transform(X)
    
    return X_preprocessed, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='1', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='1', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='1', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/churn_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/churn_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.875,0.785714,0.156028,0.260355,KNNClassifier
1,2,0.886,0.846154,0.234043,0.366667,KNNClassifier
2,3,0.891,0.970588,0.234043,0.377143,KNNClassifier
3,4,0.879,0.8,0.197183,0.316384,KNNClassifier
4,5,0.885,0.909091,0.211268,0.342857,KNNClassifier


<a id="4.2.10-kc1"></a>
#### 4.2.10. kc1

**Relevant aspects**:

- Distribution of target variable ("defects"): "false" (1783) / "true" (326).

- All other attributes are numeric.

- No missing values.

In [31]:
from knn import KNNClassifier

# Define parameters for the dataset
file_path = 'DS/kc1.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'defects'
model = KNNClassifier(k=5)
model_name = 'KNNClassifier'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='true', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='true', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='true', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/kc1_first_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/kc1_first_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.812796,0.333333,0.215385,0.261682,KNNClassifier
1,2,0.812796,0.305556,0.169231,0.217822,KNNClassifier
2,3,0.819905,0.387755,0.292308,0.333333,KNNClassifier
3,4,0.824645,0.431034,0.378788,0.403226,KNNClassifier
4,5,0.845606,0.5,0.246154,0.329897,KNNClassifier


<a id="4.3-Results"></a>
### 4.3. Results

After evaluating each dataset individually, I created a script that iterates over the CSV files and analyzes the results. The script reads each .csv file in the Results folder, calculates the **mean** and **standard deviation** for **accuracy**, **precision**, **recall**, and **F1 score**. It compiles these statistics into a summary DataFrame, providing a comprehensive overview of the model's performance across all datasets. The summary is then saved to a new CSV file named **summary_first_analysis.csv** for easy reference and further analysis.

In [3]:
import pandas as pd
import glob
import os

# Function to analyze a single results CSV file
def analyze_results(file_path, benchmark):
    df = pd.read_csv(file_path)
    
    # Calculate mean and standard deviation for each metric
    summary = {
        'Dataset': os.path.basename(file_path).replace(f'_{benchmark}_benchmark.csv', ''),
        'Mean Accuracy': df['Accuracy'].mean(),
        'Std Accuracy': df['Accuracy'].std(),
        'Mean Precision': df['Precision'].mean(),
        'Std Precision': df['Precision'].std(),
        'Mean Recall': df['Recall'].mean(),
        'Std Recall': df['Recall'].std(),
        'Mean F1 Score': df['F1 Score'].mean(),
        'Std F1 Score': df['F1 Score'].std()
    }
    
    return summary

# Main function to analyze all results
def analyze_all_results(results_folder, benchmark):
    results_files = glob.glob(os.path.join(results_folder, f'*_{benchmark}_benchmark.csv'))
    analysis_results = []

    for file_path in results_files:
        summary = analyze_results(file_path, benchmark)
        analysis_results.append(summary)

    # Create a summary DataFrame
    summary_df = pd.DataFrame(analysis_results)
    summary_df = summary_df[['Dataset', 'Mean Accuracy', 'Std Accuracy', 'Mean Precision', 'Std Precision', 'Mean Recall', 'Std Recall', 'Mean F1 Score', 'Std F1 Score']]
    
    # Sort the summary DataFrame by Dataset name
    summary_df = summary_df.sort_values(by='Dataset').reset_index(drop=True)
    
    # Save the summary DataFrame to a CSV file
    summary_file = os.path.join(results_folder, f'summary_{benchmark}_analysis.csv')
    summary_df.to_csv(summary_file, index=False)
    print(f'Summary analysis saved to {summary_file}')
    
    return summary_df

# Define the results folder and benchmark type
results_folder = 'Results'
benchmark = 'first'  # Specify the benchmark type as 'first'

# Run the analysis
summary_df = analyze_all_results(results_folder, benchmark)

# Display the summary DataFrame
print(summary_df)



Summary analysis saved to Results/summary_first_analysis.csv
                            Dataset  Mean Accuracy  Std Accuracy  \
0           Internet-Advertisements       0.954866      0.006938   
1                             churn       0.883200      0.006261   
2  climate-model-simulation-crashes       0.918519      0.010143   
3                               kc1       0.823150      0.013522   
4                   ozone-level-8hr       0.935281      0.004265   
5                               pc1       0.926053      0.009413   
6                               pc3       0.887386      0.009026   
7                               pc4       0.889587      0.015123   
8                              sick       0.969247      0.002187   
9                              wilt       0.969003      0.005261   

   Mean Precision  Std Precision  Mean Recall  Std Recall  Mean F1 Score  \
0        0.939029       0.016389     0.725585    0.060739       0.817113   
1        0.862309       0.077335     0

As expected with **imbalanced** binary datasets, there is a notable disparity between accuracy and other performance metrics such as precision, recall, and F1 score.

**Key notes**:

- The accuracy scores across datasets are generally high, ranging from 82% to 97%, which can be misleading.

- Despite high accuracy, the precision, recall, and F1 score are significantly lower in many datasets. For example, the pc1 dataset has an accuracy of 92.61%, but much lower precision (44.42%), recall (19.42%), and F1 score (26.55%).

<a id="5-KNN-variant-proposal"></a>
## 5. K-NN's variant proposal and implementation

In an effort to address the challenges associated with **class imbalance in binary classification problems**, some modifications were made to the original algorithm. These changes were implemented in a new version of the algorithm (in a file titled **knn2.py**, located in the 'MLAlgorithms' folder). The modifications involved adaptations to the original KNNClassifier, now renamed **KNNVariant**, derived from an adjusted base class KNNBase(BaseEstimator). The primary adjustments focus on dynamic neighbor selection, enhanced influence through distance weighting, and the incorporation of distance thresholds to better manage data imbalance. Here’s an expanded comparison between the original KNNClassifier and the revised KNNVariant:

**Dynamic k-Value**:
- Original: The original classifier uses a static k, defined during initialization, which does not change regardless of the data distribution or any local variations within the dataset.
- Variant: In contrast, the KNNVariant adjusts k dynamically. The adjustment is based on the average distance of the nearest neighbors:
    - If the average distance exceeds an upper threshold (set at the 75th percentile of all pairwise distances in the training data), k is increased but does not surpass max_k.
    - If the average distance falls below a lower threshold (set at the 25th percentile), k is decreased to no less than min_k.
 
**Threshold Calculation**
- Original: The concept of dynamically adjusting thresholds based on training data distances does not exist in the original model.
- Variant: The KNNVariant computes these thresholds during the fitting process. It uses the 75th and 25th percentiles of distances as upper and lower thresholds, respectively. These thresholds serve as benchmarks to dynamically adjust k, enhancing the model’s responsiveness to varying data distributions.

**Distance Weighting**:
- Original: The original method considers only the nearest k neighbors for voting, with each neighbor contributing equally to the decision, irrespective of their distance from the query point.
- Variant: The KNNVariant introduces an option to weight the influence of each neighbor by the inverse of their distance (1/dist). This weighting scheme prioritizes closer neighbors, allowing for a more nuanced decision-making process that reflects the immediate data structure around the query point.

**Aggregation Method**
- Original: Typically employs a straightforward majority voting system among the chosen k neighbors to classify a new instance.
-Variant: Provides flexibility by supporting both weighted and unweighted aggregation methods. If weight_distance is set to True, the votes are weighted by the inverse distances, promoting decisions that are more reflective of closer and potentially more relevant neighbors. This can be particularly advantageous in scenarios where minority classes are at risk of being overshadowed by more distant majority class neighbors.

These enhancements in the KNNVariant were designed to **improve** the classifier's performance on datasets with imbalanced classes by making the classification decision more sensitive to the local structure of the data and **reducing the bias** towards the majority class. The use of dynamic k values and distance-based weighting in the KNNVariant aims to boost precision, recall, and the F1 score, making it a potentially more robust choice for challenging classification tasks.

In [None]:
from collections import Counter
import numpy as np
import itertools
from scipy.spatial.distance import euclidean

# class BaseEstimator remains the same

class KNNBase(BaseEstimator):
    def __init__(self, k=5, max_k=9, min_k=3, distance_func=euclidean, weight_distance=True):
        """
        Base class for K Nearest Neighbors classifier with dynamic k
        """
        self.k = k
        self.max_k = max_k
        self.min_k = min_k
        self.distance_func = distance_func
        self.weight_distance = weight_distance
        self.upper_threshold = None
        self.lower_threshold = None

    def aggregate(self, neighbors_targets, neighbors_weights=None):
        raise NotImplementedError

    def _predict(self, X=None):
        predictions = [self._predict_x(x) for x in X]

        return np.array(predictions)

    def fit(self, X, y=None):
        super()._setup_input(X, y)
        # Calculate distance thresholds based on training data
        self._calculate_thresholds()

    def _calculate_thresholds(self):
        # Calculate distances between all pairs in training data
        distances = [self.distance_func(x, y) for x, y in itertools.combinations(self.X, 2)]
        distances = np.array(distances)
        self.upper_threshold = np.percentile(distances, 75)  # 75th percentile
        self.lower_threshold = np.percentile(distances, 25)  # 25th percentile

    def _predict_x(self, x):
        distances = np.array([self.distance_func(x, example) for example in self.X])
        sorted_neighbors = sorted(((dist, target) for (dist, target) in zip(distances, self.y)), key=lambda x: x[0])

        dynamic_k = self.k
        neighbors_targets = []
        neighbors_weights = []

        while True:
            avg_distance = np.mean([dist for (dist, _) in sorted_neighbors[:dynamic_k]])
            if avg_distance > self.upper_threshold and dynamic_k < self.max_k:
                dynamic_k = min(self.max_k, dynamic_k + 2)
            elif avg_distance < self.lower_threshold and dynamic_k > self.min_k:
                dynamic_k = max(self.min_k, dynamic_k - 2)
            else:
                break

        neighbors_targets = [target for (_, target) in sorted_neighbors[:dynamic_k]]
        neighbors_weights = [1/dist if dist != 0 else 1e-5 for (dist, _) in sorted_neighbors[:dynamic_k]]

        return self.aggregate(neighbors_targets, neighbors_weights if self.weight_distance else None)
        

class KNNVariant(KNNBase):
    """
    Nearest neighbors classifier.

    Note: if there is a tie for the most common label among the neighbors, then
    the predicted label is arbitrary. This class extends KNNBase and implements
    the aggregate method for making predictions based on neighbor voting.
    """

    def __init__(self, k=5, max_k=10, min_k=3, distance_func=euclidean, weight_distance=True):
        """
        Initialize the KNNVariant with the same parameters as KNNBase,
        ensuring all are passed correctly.
        """
        super().__init__(k=k, max_k=max_k, min_k=min_k, distance_func=distance_func, weight_distance=weight_distance)

    def aggregate(self, neighbors_targets, neighbors_weights=None):
        """
        Return the most common target label, considering weights if provided.
        """
        if neighbors_weights:
            weighted_vote = Counter()
            for label, weight in zip(neighbors_targets, neighbors_weights):
                weighted_vote[label] += weight
            most_common_label = weighted_vote.most_common(1)[0][0]
        else:
            most_common_label = Counter(neighbors_targets).most_common(1)[0][0]
        return most_common_label


<a id="6-Benchmark-second-evaluation"></a>
## 6. Benchmark Second Evaluation

<a id="6.1-DS-evaluation"></a>
### 6.1. Datasets Evaluation

The same process of evaluation, previously explained in the First Benchmark Evaluation, was applied to all datasets, but now using the **KNNVariant** included in **knn2.py**.

<a id="6.1.1-wilt"></a>
#### 6.1.1. wilt

In [4]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/wilt.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'class'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='2', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='2', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='2', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = f"Results/wilt_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/wilt_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.96281,0.793103,0.433962,0.560976,KNNVariant
1,2,0.974174,0.885714,0.596154,0.712644,KNNVariant
2,3,0.970041,0.870968,0.519231,0.650602,KNNVariant
3,4,0.970041,0.896552,0.5,0.641975,KNNVariant
4,5,0.977249,0.96875,0.596154,0.738095,KNNVariant


<a id="6.1.2-sick"></a>
#### 6.1.2. sick

In [5]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/sick.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'Class'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'


def preprocess_data(df, target_column):
    # Drop the column 'TBG' as it only contains missing values
    df = df.drop(columns=['TBG'])
    
    # Separate features and target
    X = df.drop(columns=[target_column])
    y = df[target_column].values
    
    # Identify numeric and categorical columns
    numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
    categorical_features = X.select_dtypes(include=['object']).columns
    
    # Create preprocessing pipelines for both numeric and categorical data
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='mean')),  # Impute missing values with mean
        ('scaler', StandardScaler())  # Standardize numeric features
    ])

    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='most_frequent')),  # Impute missing values with most frequent value
        ('onehot', OneHotEncoder(handle_unknown='ignore'))  # One-hot encode categorical features
    ])

    # Combine preprocessing steps
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)
        ]
    )
    
    # Fit and transform the features
    X_preprocessed = preprocessor.fit_transform(X)
    
    return X_preprocessed, y


# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='sick', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='sick', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='sick', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = f"Results/sick_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/sick_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.965563,0.794118,0.586957,0.675,KNNVariant
1,2,0.966887,0.892857,0.531915,0.666667,KNNVariant
2,3,0.962865,0.75,0.586957,0.658537,KNNVariant
3,4,0.965517,0.857143,0.521739,0.648649,KNNVariant
4,5,0.969496,0.848485,0.608696,0.708861,KNNVariant


<a id="6.1.3-ozone"></a>
#### 6.1.3. ozone-level-8hr

In [7]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/ozone-level-8hr.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'Class'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='2', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='2', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='2', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/ozone-level-8hr_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/ozone-level-8hr_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.934911,0.461538,0.1875,0.266667,KNNVariant
1,2,0.942801,0.571429,0.375,0.45283,KNNVariant
2,3,0.930966,0.428571,0.28125,0.339623,KNNVariant
3,4,0.952663,0.75,0.375,0.5,KNNVariant
4,5,0.934783,0.466667,0.21875,0.297872,KNNVariant


<a id="6.1.4-pc1"></a>
#### 6.1.4. pc1

In [8]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/pc1.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'defects'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='true', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='true', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='true', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/pc1_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/pc1_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.941441,0.6,0.4,0.48,KNNVariant
1,2,0.918919,0.333333,0.2,0.25,KNNVariant
2,3,0.90991,0.375,0.375,0.375,KNNVariant
3,4,0.923423,0.444444,0.25,0.32,KNNVariant
4,5,0.918552,0.384615,0.333333,0.357143,KNNVariant


<a id="6.1.5-climate"></a>
#### 6.1.5. climate-model-simulation-crashes

In [21]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/climate-model-simulation-crashes.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'outcome'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='0', average='binary', zero_division=0))  
        results['Recall'].append(recall_score(y_test, predictions, pos_label='0', average='binary', zero_division=0))  
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='0', average='binary', zero_division=0))  
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/climate-model-simulation-crashes_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/climate-model-simulation-crashes_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.916667,0.666667,0.2,0.307692,KNNVariant
1,2,0.898148,0.25,0.111111,0.153846,KNNVariant
2,3,0.916667,0.5,0.222222,0.307692,KNNVariant
3,4,0.916667,0.0,0.0,0.0,KNNVariant
4,5,0.907407,0.4,0.222222,0.285714,KNNVariant


<a id="6.1.6-pc3"></a>
#### 6.1.6 pc3

In [10]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/pc3.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'c'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='TRUE', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='TRUE', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='TRUE', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/pc3_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/pc3_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.891374,0.428571,0.1875,0.26087,KNNVariant
1,2,0.881789,0.368421,0.21875,0.27451,KNNVariant
2,3,0.872204,0.333333,0.25,0.285714,KNNVariant
3,4,0.891026,0.4375,0.21875,0.291667,KNNVariant
4,5,0.875,0.315789,0.1875,0.235294,KNNVariant


<a id="6.1.7-pc4"></a>
#### 6.1.7. pc4

In [11]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/pc4.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'c'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='TRUE', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='TRUE', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='TRUE', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/pc4_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/pc4_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.869863,0.464286,0.361111,0.40625,KNNVariant
1,2,0.883562,0.535714,0.416667,0.46875,KNNVariant
2,3,0.863014,0.433333,0.361111,0.393939,KNNVariant
3,4,0.900344,0.625,0.428571,0.508475,KNNVariant
4,5,0.90378,0.62069,0.514286,0.5625,KNNVariant


<a id="6.1.8-internet"></a>
#### 6.1.8. Internet-Advertisements

In [12]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/Internet-Advertisements.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'class'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='ad', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='ad', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='ad', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/Internet-Advertisements_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/Internet-Advertisements_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.949695,0.927536,0.695652,0.795031,KNNVariant
1,2,0.967988,0.908046,0.858696,0.882682,KNNVariant
2,3,0.946646,0.901408,0.695652,0.785276,KNNVariant
3,4,0.948171,0.881579,0.728261,0.797619,KNNVariant
4,5,0.958779,0.932432,0.758242,0.836364,KNNVariant


<a id="6.1.9-churn"></a>
#### 6.1.9. churn

In [13]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/churn.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'class'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    # Separate features and target
    X = df.drop(columns=[target_column])
    y = df[target_column].values
    
    # Identify numeric and categorical columns
    numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
    categorical_features = X.select_dtypes(include=['object']).columns
    
    # Create preprocessing pipelines for both numeric and categorical data
    numeric_transformer = Pipeline(steps=[
        ('scaler', StandardScaler())  # Standardize numeric features
    ])

    categorical_transformer = Pipeline(steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore'))  # One-hot encode categorical features
    ])

    # Combine preprocessing steps
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)
        ]
    )
    
    # Fit and transform the features
    X_preprocessed = preprocessor.fit_transform(X)
    
    return X_preprocessed, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='1', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='1', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='1', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/churn_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/churn_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.874,0.659574,0.219858,0.329787,KNNVariant
1,2,0.876,0.644068,0.269504,0.38,KNNVariant
2,3,0.886,0.787234,0.262411,0.393617,KNNVariant
3,4,0.866,0.595238,0.176056,0.271739,KNNVariant
4,5,0.883,0.755102,0.260563,0.387435,KNNVariant


<a id="6.1.10-kc1"></a>
#### 6.1.10. kc1

In [14]:
from knn2 import KNNVariant

# Define parameters for the dataset
file_path = 'DS/kc1.arff'
df = load_arff_to_dataframe(file_path)
target_column = 'defects'
model = KNNVariant(k=5, max_k=9, min_k=3, weight_distance=True)
model_name = 'KNNVariant'

# Preprocess the data
def preprocess_data(df, target_column):
    X = df.drop(columns=[target_column]).values
    y = df[target_column].values
    return X, y

# Model evaluation
def evaluate_model(X, y, model, model_name, n_splits, random_seed=42):
    # Stratified 5-Fold cross-validation
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_seed)
    results = {
        'Fold': [],
        'Accuracy': [],
        'Precision': [],
        'Recall': [],
        'F1 Score': []
    }
    
    fold_idx = 1
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        # Standardize the data
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        
        # Fit the model and make predictions
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        
        # Record metrics
        results['Fold'].append(fold_idx)
        results['Accuracy'].append(accuracy_score(y_test, predictions))
        results['Precision'].append(precision_score(y_test, predictions, pos_label='true', average='binary'))
        results['Recall'].append(recall_score(y_test, predictions, pos_label='true', average='binary'))
        results['F1 Score'].append(f1_score(y_test, predictions, pos_label='true', average='binary'))
        
        fold_idx += 1
    
    # Convert results to DataFrame and save to CSV
    results_df = pd.DataFrame(results)
    results_df['Classifier'] = model_name
    results_file = "Results/kc1_second_benchmark.csv"
    results_df.to_csv(results_file, index=False)
    print(f'Results saved to {results_file}')
    return results_df

# Main function to run the evaluation
def run_evaluation(file_path, target_column, model, model_name, n_splits=5):
    df = load_arff_to_dataframe(file_path)
    X, y = preprocess_data(df, target_column)
    results_df = evaluate_model(X, y, model, model_name, n_splits)
    return results_df

# Run evaluation for the wilt dataset
results_df = run_evaluation(file_path, target_column, model, model_name)

# Display the results
results_df

Results saved to Results/kc1_second_benchmark.csv


Unnamed: 0,Fold,Accuracy,Precision,Recall,F1 Score,Classifier
0,1,0.791469,0.305085,0.276923,0.290323,KNNVariant
1,2,0.800948,0.27907,0.184615,0.222222,KNNVariant
2,3,0.781991,0.304348,0.323077,0.313433,KNNVariant
3,4,0.741706,0.282828,0.424242,0.339394,KNNVariant
4,5,0.809976,0.368421,0.323077,0.344262,KNNVariant


<a id="6.2-Results"></a>
### 6.2. Results and Discussion

After evluating the datasets with the KNNVariant, I applied the script that iterates over the CSV files and analyzes the results. As prviously explained, the script reads each .csv file in the Results folder, calculates the **mean** and **standard deviation** for **accuracy**, **precision**, **recall**, and **F1 score**. It compiles these statistics into a summary DataFrame, providing a comprehensive overview of the model's performance across all datasets. The summary is then saved to a new CSV file named **summary_second_analysis.csv** for easy reference and further analysis.

In [4]:
import pandas as pd
import glob
import os

# Function to analyze a single results CSV file
def analyze_results(file_path, benchmark):
    df = pd.read_csv(file_path)
    
    # Calculate mean and standard deviation for each metric
    summary = {
        'Dataset': os.path.basename(file_path).replace(f'_{benchmark}_benchmark.csv', ''),
        'Mean Accuracy': df['Accuracy'].mean(),
        'Std Accuracy': df['Accuracy'].std(),
        'Mean Precision': df['Precision'].mean(),
        'Std Precision': df['Precision'].std(),
        'Mean Recall': df['Recall'].mean(),
        'Std Recall': df['Recall'].std(),
        'Mean F1 Score': df['F1 Score'].mean(),
        'Std F1 Score': df['F1 Score'].std()
    }
    
    return summary

# Main function to analyze all results
def analyze_all_results(results_folder, benchmark):
    results_files = glob.glob(os.path.join(results_folder, f'*_{benchmark}_benchmark.csv'))
    analysis_results = []

    for file_path in results_files:
        summary = analyze_results(file_path, benchmark)
        analysis_results.append(summary)

    # Create a summary DataFrame
    summary_df = pd.DataFrame(analysis_results)
    summary_df = summary_df[['Dataset', 'Mean Accuracy', 'Std Accuracy', 'Mean Precision', 'Std Precision', 'Mean Recall', 'Std Recall', 'Mean F1 Score', 'Std F1 Score']]
    
    # Sort the summary DataFrame by Dataset name
    summary_df = summary_df.sort_values(by='Dataset').reset_index(drop=True)
    
    # Save the summary DataFrame to a CSV file
    summary_file = os.path.join(results_folder, f'summary_{benchmark}_analysis.csv')
    summary_df.to_csv(summary_file, index=False)
    print(f'Summary analysis saved to {summary_file}')
    
    return summary_df

# Define the results folder and benchmark type
results_folder = 'Results'
benchmark = 'second'  # Specify the benchmark type as 'first'

# Run the analysis
summary_df = analyze_all_results(results_folder, benchmark)

# Display the summary DataFrame
print(summary_df)

Summary analysis saved to Results/summary_second_analysis.csv
                            Dataset  Mean Accuracy  Std Accuracy  \
0           Internet-Advertisements       0.954256      0.009010   
1                             churn       0.877000      0.007874   
2  climate-model-simulation-crashes       0.911111      0.008282   
3                               kc1       0.785218      0.026473   
4                   ozone-level-8hr       0.939225      0.008659   
5                               pc1       0.922449      0.011689   
6                               pc3       0.882279      0.008859   
7                               pc4       0.884112      0.018019   
8                              sick       0.966066      0.002411   
9                              wilt       0.970863      0.005430   

   Mean Precision  Std Precision  Mean Recall  Std Recall  Mean F1 Score  \
0        0.910200       0.020591     0.747301    0.067502       0.819394   
1        0.688243       0.080145     

The following script was then generated to **compare the performance metrics** of the two algorithms. **Statistical tests** were performed to determine if the observed differences between the two algorithms were **statistically significant**. The Shapiro-Wilk test was used to check for normality, and depending on the results, either the **paired t-test** (for normally distributed differences) or the **Wilcoxon signed-rank test** (for non-normally distributed differences) was applied. The results of the comparison was saved in a file name **comparison_results_with_normality.csv**):

In [6]:
import pandas as pd
from scipy.stats import ttest_rel, shapiro, wilcoxon

# Load the summary data for both benchmarks
results_folder = 'Results'  # Define the results folder
first_summary_file = os.path.join(results_folder, 'summary_first_analysis.csv')
second_summary_file = os.path.join(results_folder, 'summary_second_analysis.csv')

first_df = pd.read_csv(first_summary_file)
second_df = pd.read_csv(second_summary_file)

# Merge the summaries on the 'Dataset' column
merged_df = pd.merge(first_df, second_df, on='Dataset', suffixes=('_first', '_second'))

# Function to perform normality test and statistical comparison
def compare_metrics(metric):
    print(f'\nComparing {metric}:')
    
    first_values = merged_df[f'Mean {metric}_first']
    second_values = merged_df[f'Mean {metric}_second']
    differences = first_values - second_values
    
    # Shapiro-Wilk test for normality
    stat, p_value_shapiro = shapiro(differences)
    print(f'Shapiro-Wilk test for normality of differences in {metric}: statistic = {stat:.3f}, p-value = {p_value_shapiro:.3f}')
    
    if p_value_shapiro > 0.05:
        print(f'Differences in {metric} are normally distributed. Using paired t-test.')
        # Paired t-test
        t_stat, p_value_ttest = ttest_rel(first_values, second_values)
        print(f'Paired t-test for {metric}: t-statistic = {t_stat:.3f}, p-value = {p_value_ttest:.3f}')
        test_result = (t_stat, p_value_ttest, 't-test')
    else:
        print(f'Differences in {metric} are not normally distributed. Using Wilcoxon signed-rank test.')
        # Wilcoxon signed-rank test
        w_stat, p_value_wilcoxon = wilcoxon(first_values, second_values)
        print(f'Wilcoxon signed-rank test for {metric}: w-statistic = {w_stat:.3f}, p-value = {p_value_wilcoxon:.3f}')
        test_result = (w_stat, p_value_wilcoxon, 'Wilcoxon')
    
    # Calculate means and standard deviations
    mean_first = first_values.mean()
    std_first = first_values.std()
    mean_second = second_values.mean()
    std_second = second_values.std()
    
    # Determine which algorithm is better based on the means
    if test_result[1] < 0.05:  # If the difference is statistically significant
        if mean_second > mean_first:
            better = 'KNNVariant'
        else:
            better = 'KNNClassifier'
    else:
        better = 'No significant difference'
    
    return test_result + (mean_first, std_first, mean_second, std_second, better)

# Metrics to compare
metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score']

# Compare metrics
results = {}
for metric in metrics:
    stat, p_value, test_used, mean_first, std_first, mean_second, std_second, better = compare_metrics(metric)
    results[metric] = {
        'statistic': stat,
        'p-value': p_value,
        'test': test_used,
        'mean_first': mean_first,
        'std_first': std_first,
        'mean_second': mean_second,
        'std_second': std_second,
        'better': better
    }

# Display the results
results_df = pd.DataFrame(results).T
print('\nComparison Results:')
print(results_df)

# Save the results to a CSV file
comparison_results_file = os.path.join(results_folder, 'comparison_results_with_normality.csv')
results_df.to_csv(comparison_results_file)
print(f'Comparison results saved to {comparison_results_file}')


Comparing Accuracy:
Shapiro-Wilk test for normality of differences in Accuracy: statistic = 0.668, p-value = 0.000
Differences in Accuracy are not normally distributed. Using Wilcoxon signed-rank test.
Wilcoxon signed-rank test for Accuracy: w-statistic = 7.000, p-value = 0.037

Comparing Precision:
Shapiro-Wilk test for normality of differences in Precision: statistic = 0.952, p-value = 0.695
Differences in Precision are normally distributed. Using paired t-test.
Paired t-test for Precision: t-statistic = 2.133, p-value = 0.062

Comparing Recall:
Shapiro-Wilk test for normality of differences in Recall: statistic = 0.952, p-value = 0.686
Differences in Recall are normally distributed. Using paired t-test.
Paired t-test for Recall: t-statistic = -4.492, p-value = 0.002

Comparing F1 Score:
Shapiro-Wilk test for normality of differences in F1 Score: statistic = 0.964, p-value = 0.829
Differences in F1 Score are normally distributed. Using paired t-test.
Paired t-test for F1 Score: t-st

**Accuracy**:
- Shapiro-Wilk test: Differences are not normally distributed (statistic = 0.668, p-value = 0.000).
- Wilcoxon signed-rank test: The difference in Accuracy is statistically significant (w-statistic = 7.000, p-value = 0.037).
- Mean Accuracy: KNNClassifier = 0.9156, KNNVariant = 0.9093
- Conclusion: KNNClassifier has a statistically significant better Accuracy.

**Precision**:
- Shapiro-Wilk test: Differences are normally distributed (statistic = 0.952, p-value = 0.695).
- Paired t-test: The difference in Precision is not statistically significant (t-statistic = 2.133, p-value = 0.062).
- Mean Precision: KNNClassifier = 0.6289, KNNVariant = 0.5857
- Conclusion: There is no significant difference in Precision between the two algorithms.

**Recall**:
- Shapiro-Wilk test: Differences are normally distributed (statistic = 0.952, p-value = 0.686).
- Paired t-test: The difference in Recall is statistically significant (t-statistic = -4.492, p-value = 0.002).
- Mean Recall: KNNClassifier = 0.3184, KNNVariant = 0.3767
- Conclusion: KNNVariant significantly improves Recall compared to KNNClassifier.

**F1 Score**:
- Shapiro-Wilk test: Differences are normally distributed (statistic = 0.964, p-value = 0.829).
- Paired t-test: The difference in F1 Score is statistically significant (t-statistic = -2.645, p-value = 0.027).
- Mean F1 Score: KNNClassifier = 0.4076, KNNVariant = 0.4483
- Conclusion: KNNVariant significantly improves F1 Score compared to KNNClassifier.

The **KNNVariant** proposed to improve binary classification performance on imbalanced datasets has demonstrated **significant improvements** in **Recall** and **F1 Score** compared to the original KNNClassifier. These improvements suggest that KNNVariant is more **effective** at handling imbalanced data, making it a better choice for applications where identifying the minority class accurately is critical.

While **Accuracy favors KNNClassifier**, and there is **no significant difference** in **Precision**, the enhanced Recall and F1 Score indicate that KNNVariant may achieve better performance in the context of imbalanced datasets. Therefore, KNNVariant is recommended for use in imbalanced scenarios and scenarios where achieving a higher F1 Score is a priority.

To further investigate the **impact of distance weighting** on the performance of the proposed KNNVariant, an additional evaluation was conducted with the distance_weight parameter set to **False**. This process was repeated for all 10 datasets used in the initial benchmarks. The results were then summarized: 

In [19]:
import pandas as pd
import glob
import os

# Function to analyze a single results CSV file
def analyze_results(file_path, benchmark):
    df = pd.read_csv(file_path)
    
    # Calculate mean and standard deviation for each metric
    summary = {
        'Dataset': os.path.basename(file_path).replace(f'_{benchmark}_benchmark.csv', ''),
        'Mean Accuracy': df['Accuracy'].mean(),
        'Std Accuracy': df['Accuracy'].std(),
        'Mean Precision': df['Precision'].mean(),
        'Std Precision': df['Precision'].std(),
        'Mean Recall': df['Recall'].mean(),
        'Std Recall': df['Recall'].std(),
        'Mean F1 Score': df['F1 Score'].mean(),
        'Std F1 Score': df['F1 Score'].std()
    }
    
    return summary

# Main function to analyze all results
def analyze_all_results(results_folder, benchmark):
    results_files = glob.glob(os.path.join(results_folder, f'*_{benchmark}_benchmark.csv'))
    analysis_results = []

    for file_path in results_files:
        summary = analyze_results(file_path, benchmark)
        analysis_results.append(summary)

    # Create a summary DataFrame
    summary_df = pd.DataFrame(analysis_results)
    summary_df = summary_df[['Dataset', 'Mean Accuracy', 'Std Accuracy', 'Mean Precision', 'Std Precision', 'Mean Recall', 'Std Recall', 'Mean F1 Score', 'Std F1 Score']]
    
    # Sort the summary DataFrame by Dataset name
    summary_df = summary_df.sort_values(by='Dataset').reset_index(drop=True)
    
    # Save the summary DataFrame to a CSV file
    summary_file = os.path.join(results_folder, f'summary_{benchmark}_analysis.csv')
    summary_df.to_csv(summary_file, index=False)
    print(f'Summary analysis saved to {summary_file}')
    
    return summary_df

# Define the results folder and benchmark type
results_folder = 'Results'
benchmark = 'additional'  # Specify the benchmark type as additional

# Run the analysis
summary_df = analyze_all_results(results_folder, benchmark)

# Display the summary DataFrame
print(summary_df)

Summary analysis saved to Results/summary_additional_analysis.csv
                            Dataset  Mean Accuracy  Std Accuracy  \
0           Internet-Advertisements       0.958830      0.009933   
1                             churn       0.877000      0.007874   
2  climate-model-simulation-crashes       0.911111      0.008282   
3                               kc1       0.793276      0.019460   
4                   ozone-level-8hr       0.939225      0.008659   
5                               pc1       0.918841      0.015305   
6                               pc3       0.889305      0.012826   
7                               pc4       0.883416      0.012203   
8                              sick       0.966861      0.001615   
9                              wilt       0.970656      0.004979   

   Mean Precision  Std Precision  Mean Recall  Std Recall  Mean F1 Score  \
0        0.933956       0.013922     0.760416    0.082204       0.836055   
1        0.688243       0.080145 

Then, these results (**summary_additional_analysis.csv**) were compared against the results obtained in the first benchmark evaluation (and saved to a file named **comparison_results_with_normality_2.csv**):

In [20]:
import pandas as pd
from scipy.stats import ttest_rel, shapiro, wilcoxon

# Load the summary data for both benchmarks
results_folder = 'Results'  # Define the results folder
first_summary_file = os.path.join(results_folder, 'summary_first_analysis.csv')
second_summary_file = os.path.join(results_folder, 'summary_additional_analysis.csv')

first_df = pd.read_csv(first_summary_file)
second_df = pd.read_csv(second_summary_file)

# Merge the summaries on the 'Dataset' column
merged_df = pd.merge(first_df, second_df, on='Dataset', suffixes=('_first', '_additional'))

# Function to perform normality test and statistical comparison
def compare_metrics(metric):
    print(f'\nComparing {metric}:')
    
    first_values = merged_df[f'Mean {metric}_first']
    second_values = merged_df[f'Mean {metric}_additional']
    differences = first_values - second_values
    
    # Shapiro-Wilk test for normality
    stat, p_value_shapiro = shapiro(differences)
    print(f'Shapiro-Wilk test for normality of differences in {metric}: statistic = {stat:.3f}, p-value = {p_value_shapiro:.3f}')
    
    if p_value_shapiro > 0.05:
        print(f'Differences in {metric} are normally distributed. Using paired t-test.')
        # Paired t-test
        t_stat, p_value_ttest = ttest_rel(first_values, second_values)
        print(f'Paired t-test for {metric}: t-statistic = {t_stat:.3f}, p-value = {p_value_ttest:.3f}')
        test_result = (t_stat, p_value_ttest, 't-test')
    else:
        print(f'Differences in {metric} are not normally distributed. Using Wilcoxon signed-rank test.')
        # Wilcoxon signed-rank test
        w_stat, p_value_wilcoxon = wilcoxon(first_values, second_values)
        print(f'Wilcoxon signed-rank test for {metric}: w-statistic = {w_stat:.3f}, p-value = {p_value_wilcoxon:.3f}')
        test_result = (w_stat, p_value_wilcoxon, 'Wilcoxon')
    
    # Calculate means and standard deviations
    mean_first = first_values.mean()
    std_first = first_values.std()
    mean_second = second_values.mean()
    std_second = second_values.std()
    
    # Determine which algorithm is better based on the means
    if test_result[1] < 0.05:  # If the difference is statistically significant
        if mean_second > mean_first:
            better = 'KNNVariant'
        else:
            better = 'KNNClassifier'
    else:
        better = 'No significant difference'
    
    return test_result + (mean_first, std_first, mean_second, std_second, better)

# Metrics to compare
metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score']

# Compare metrics
results = {}
for metric in metrics:
    stat, p_value, test_used, mean_first, std_first, mean_second, std_second, better = compare_metrics(metric)
    results[metric] = {
        'statistic': stat,
        'p-value': p_value,
        'test': test_used,
        'mean_first': mean_first,
        'std_first': std_first,
        'mean_second': mean_second,
        'std_second': std_second,
        'better': better
    }

# Display the results
results_df = pd.DataFrame(results).T
print('\nComparison Results:')
print(results_df)

# Save the results to a CSV file
comparison_results_file = os.path.join(results_folder, 'comparison_results_with_normality_2.csv')
results_df.to_csv(comparison_results_file)
print(f'Comparison results saved to {comparison_results_file}')



Comparing Accuracy:
Shapiro-Wilk test for normality of differences in Accuracy: statistic = 0.772, p-value = 0.007
Differences in Accuracy are not normally distributed. Using Wilcoxon signed-rank test.
Wilcoxon signed-rank test for Accuracy: w-statistic = 12.000, p-value = 0.131

Comparing Precision:
Shapiro-Wilk test for normality of differences in Precision: statistic = 0.912, p-value = 0.293
Differences in Precision are normally distributed. Using paired t-test.
Paired t-test for Precision: t-statistic = 1.622, p-value = 0.139

Comparing Recall:
Shapiro-Wilk test for normality of differences in Recall: statistic = 0.912, p-value = 0.293
Differences in Recall are normally distributed. Using paired t-test.
Paired t-test for Recall: t-statistic = -4.641, p-value = 0.001

Comparing F1 Score:
Shapiro-Wilk test for normality of differences in F1 Score: statistic = 0.886, p-value = 0.155
Differences in F1 Score are normally distributed. Using paired t-test.
Paired t-test for F1 Score: t-s

**Accuracy**:
- Shapiro-Wilk test: Differences are not normally distributed (statistic = 0.772, p-value = 0.007).
- Wilcoxon signed-rank test: The difference in Accuracy is not statistically significant (w-statistic = 12.000, p-value = 0.131).
- Mean Accuracy: KNNClassifier = 0.9156, KNNVariant without Weights = 0.9109
- Conclusion: There is no significant difference in Accuracy between KNNClassifier and KNNVariant without Weights.

**Precision**:
- Shapiro-Wilk test: Differences are normally distributed (statistic = 0.912, p-value = 0.293).
- Paired t-test: The difference in Precision is not statistically significant (t-statistic = 1.622, p-value = 0.139).
- Mean Precision: KNNClassifier = 0.6289, KNNVariant without Weights = 0.5924
- Conclusion: There is no significant difference in Precision between KNNClassifier and KNNVariant without Weights.

**Recall**:
- Shapiro-Wilk test: Differences are normally distributed (statistic = 0.912, p-value = 0.293).
- Paired t-test: The difference in Recall is statistically significant (t-statistic = -4.641, p-value = 0.001).
- Mean Recall: KNNClassifier = 0.3184, KNNVariant without Weights = 0.3685
- Conclusion: KNNVariant without Weights significantly improves Recall compared to KNNClassifier.

**F1 Score**:
- Shapiro-Wilk test: Differences are normally distributed (statistic = 0.886, p-value = 0.155).
- Paired t-test: The difference in F1 Score is statistically significant (t-statistic = -2.570, p-value = 0.030).
- Mean F1 Score: KNNClassifier = 0.4076, KNNVariant without Weights = 0.4449
- Conclusion: KNNVariant without Weights significantly improves F1 Score compared to KNNClassifier.

In conclusion, the **KNNVariant**, both with and without distance weights, significantly enhances the performance in terms of recall and F1 score for imbalanced binary classification tasks. The distance weights **do not** appear to play a **crucial role** in improving these metrics. Therefore, KNNVariant with dynamic K and without distance weights might be recommended for use in imbalanced scenarios, as it simplifies the model while still achieving substantial improvements in critical performance metrics. 

<a id="7-References"></a>
## 7. References

[BOOKS]

- A General Introduction to Data Analytics, 2018. J. Moreira, A. Carvalho, and T. Horvath - John Wiley & Sons, ISBN: 978-1-119-29626-3

- Extração de Conhecimento de Dados: Data Mining, 2017. J. Gama, A. Carvalho, K. Faceli, A. Lorena, and M. Oliveira - Sílabo, ISBN: 978-972-618-914-5
  
- "k-Nearest Neighbour Classifiers 2nd Edition (with Python examples)". Cunningham, Pádraig, and Sarah Jane Delany. "k-Nearest Neighbour Classifiers 2nd Edition (with Python examples)." arXiv preprint. arXiv:2004.04523 (2020). https://arxiv.org/abs/2004.04523

[WEBSITES AND CODE]

- https://github.com/rushter/MLAlgorithms

- https://github.com/PadraigC/kNNTutorial

- https://chatgpt.com

- https://machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch/