# The ScikitLearn.jl library

The Scikit-learn library is an open-source machine learning library developed for the Python programming language, the first version of which dates back to 2010. It implements many machine learning models for classification, regression, clustering, and dimensionality reduction. These models include Support Vector Machines (SVM), decision trees, random forests, and k-means. It is currently one of the most widely used libraries in the field of machine learning, both because of the large number of functionalities it offers and because of its ease of use, since it provides a uniform interface for training and using models. The documentation for this library is available at https://scikit-learn.org/stable/.

For Julia, the ScikitLearn.jl library implements this interface and the algorithms contained in the scikit-learn library, supporting both Julia's own models and those of the scikit-learn library. The latter is done by means of the PyCall.jl library, which allows code written in Python to be executed from Julia in a transparent way for the user, who only needs to have ScikitLearn.jl installed. Documentation for this library can be found at https://scikitlearnjl.readthedocs.io/en/latest/.

However, recently, some incompatibilities have been reported with some versions of the SSL library. To avoid potential compatibility issues between Julia, PyCall, and ScikitLearn, we will use a different library for this exercise.
The library we will use is MLJ (Machine Learning in Julia), which is not strictly a library but rather a framework that allows the use of various related libraries through a common interface.
As a result, the function names used to create and train models remain the same regardless of the specific models being used.
In the practical sessions of this course, in addition to ANNs, we will use the following models, available within the MLJ framework:

- Support Vector Machines (SVM)
- Decision trees
- kNN

In order to use these models, it is first necessary to install and import the library:

In [1]:
    import Pkg;
    Pkg.add("MLJ")
    Pkg.add([
      "LIBSVM", "MLJLIBSVMInterface",
      "DecisionTree", "MLJDecisionTreeInterface",
      "NearestNeighborModels", "CategoricalArrays"
    ])
    using MLJ;
    using LIBSVM, DecisionTree, Random, MLJLIBSVMInterface, MLJDecisionTreeInterface, NearestNeighborModels, DelimitedFiles, CategoricalArrays

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `/opt/julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `/opt/julia/environments/v1.9/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `/opt/julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `/opt/julia/environments/v1.9/Manifest.toml`


Similarly, it is necessary to install the packages that contain the specific learning algorithms (e.g., LIBSVM, NearestNeighborModels, DecisionTree) as well as the packages that provide the interfaces between these algorithms and the MLJ framework (MLJLIBSVMInterface, MLJDecisionTreeInterface).
To import the models to be used, we can rely on the `MLJ.@load` macro. For example, the following lines import the three models mentioned above, which will be used in this course:

In [2]:
SVMClassifier = MLJ.@load SVC pkg=LIBSVM verbosity=0
kNNClassifier = MLJ.@load KNNClassifier pkg=NearestNeighborModels verbosity=0
DTClassifier = MLJ.@load DecisionTreeClassifier pkg=DecisionTree verbosity=0

Pkg.status()


[32m[1mStatus[22m[39m `/opt/julia/environments/v1.9/Project.toml`
  [90m[336ed68f] [39mCSV v0.10.15
[33m⌅[39m [90m[324d7699] [39mCategoricalArrays v0.10.8
[33m⌅[39m [90m[a93c6f00] [39mDataFrames v1.7.1
  [90m[7806a523] [39mDecisionTree v0.12.4
  [90m[8bb1440f] [39mDelimitedFiles v1.9.1
[32m⌃[39m [90m[5789e2e9] [39mFileIO v1.17.0
[33m⌅[39m [90m[587475ba] [39mFlux v0.14.21
[33m⌅[39m [90m[f67ccb44] [39mHDF5 v0.16.16
[33m⌅[39m [90m[7073ff75] [39mIJulia v1.26.0
  [90m[916415d5] [39mImages v0.26.2
  [90m[033835bb] [39mJLD2 v0.6.2
  [90m[b1bec4e5] [39mLIBSVM v0.8.1
  [90m[23992714] [39mMAT v0.10.7
[32m⌃[39m [90m[add582a8] [39mMLJ v0.20.0
  [90m[c6f25543] [39mMLJDecisionTreeInterface v0.4.2
  [90m[61c7150f] [39mMLJLIBSVMInterface v0.2.1
  [90m[6ee0df7b] [39mMLJLinearModels v0.10.1
  [90m[6f286f6a] [39mMultivariateStats v0.10.3
  [90m[9bbee03b] [39mNaiveBayes v0.5.6
  [90m[636a865e] [39mNearestNeighborModels v0.2.3
[33m⌅[39m [90m[91a5

As can be seen, each model is loaded from a different package. The `verbosity=0` option is simply used to suppress the output message that would otherwise be printed during the import.
This way, we obtain three model types, whose constructors are used to create each of the three models. Each constructor receives as keyword arguments the specific hyperparameters of the corresponding model.
Below are three examples, one for each type of model that will be used in these course exercises:

In [3]:
model = SVMClassifier(kernel=LIBSVM.Kernel.RadialBasis, cost=1.0, gamma=2.0, degree=Int32(3))

SVC(
  kernel = LIBSVM.Kernel.RadialBasis, 
  gamma = 2.0, 
  cost = 1.0, 
  cachesize = 200.0, 
  degree = 3, 
  coef0 = 0.0, 
  tolerance = 0.001, 
  shrinking = true)

In [4]:
model = DTClassifier(max_depth=4, rng=Random.MersenneTwister(1))

DecisionTreeClassifier(
  max_depth = 4, 
  min_samples_leaf = 1, 
  min_samples_split = 2, 
  min_purity_increase = 0.0, 
  n_subfeatures = 0, 
  post_prune = false, 
  merge_purity_threshold = 1.0, 
  display_depth = 5, 
  feature_importance = :impurity, 
  rng = MersenneTwister(1))

In [13]:
model = kNNClassifier(K=3)

KNNClassifier(
  K = 3, 
  algorithm = :kdtree, 
  metric = Euclidean(0.0), 
  leafsize = 10, 
  reorder = true, 
  weights = Uniform())

When creating a kNN model, the main hyperparameter is `K`, which defines the number of neighbors.
For decision trees, the main hyperparameter is `max_depth`, which sets the maximum depth of the tree.
In the case of decision trees, as shown earlier, there is also a parameter called `rng`. This parameter controls the randomness involved in a specific part of the model construction process.

Specifically, for decision trees, this randomness occurs during the selection of features used to split a node. The `DecisionTree` library uses a random number generator (RNG) for this step, which is updated with each call. As a result, different calls to the function (along with subsequent calls to `fit!`) may produce different models, even with the same data.

To control this randomness and make the process deterministic, it is advisable to pass a random number generator initialised with a fixed integer seed, such as `Random.MersenneTwister(1)` in the previous example.
This ensures that creating a model with a given input-output dataset and a defined set of hyperparameters becomes a reproducible process.

In general, it is preferable to control randomness across the entire model development workflow (e.g., cross-validation, training/test splits) by setting a global random seed at the beginning.
However, for the purposes of these exercises, we will use the `rng` keyword specifically for the decision tree model.
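The effect of a fixed seed can be checked with the standard `Random` module alone. The following is a minimal sketch, not part of the course code: two generators created with the same seed produce the same draws, while reusing one generator advances its state and gives different values. The variable names are illustrative.

```julia
using Random

# Two independent generators initialised with the same seed.
rngA = Random.MersenneTwister(1)
rngB = Random.MersenneTwister(1)

# Same seed, same sequence of random numbers.
drawsA = rand(rngA, 5)
drawsB = rand(rngB, 5)
println(drawsA == drawsB)

# Drawing again from rngA continues the sequence, so the values differ.
nextDrawsA = rand(rngA, 5)
println(nextDrawsA == drawsA)
```

This is why a model built with a freshly seeded generator is reproducible, whereas a generator shared across several calls is not reset between them.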

SVMs have a more complex set of hyperparameters, which depend on the kernel function being used.  
First, the hyperparameter `C` controls the trade-off between the margin width and classification error. Lower values allow for more misclassifications (more tolerance), while higher values fit the model more tightly to the data.  
In MLJ, this parameter is passed using the keyword `cost` when calling `SVMClassifier`.

Additionally, it is necessary to specify which kernel to use. This is done using the keyword `kernel`, which can take one of the following values provided by the `LIBSVM` library:

- `LIBSVM.Kernel.Linear`
- `LIBSVM.Kernel.RadialBasis`
- `LIBSVM.Kernel.Sigmoid`
- `LIBSVM.Kernel.Polynomial`

Depending on the kernel selected, different additional hyperparameters are used:

- **Linear kernel**: Only requires `C` (via `cost`).
- **RBF (Radial Basis Function) kernel**: In addition to `C`, it uses `gamma`, which controls the influence of each support vector.
- **Sigmoid kernel**: Uses `C`, `gamma`, and `coef0`. This kernel behaves similarly to a neural network, where `gamma` and `coef0` influence the shape of the decision function.
- **Polynomial kernel**: Uses `C`, `degree` (the degree of the polynomial), `gamma`, and `coef0`.
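The relation between kernels and hyperparameters listed above can be written down as a small lookup table. The sketch below uses only base Julia; the kernel strings (`"linear"`, `"rbf"`, `"sigmoid"`, `"poly"`) are the ones used later in this exercise, while `kernelHyperparameters` and `relevantHyperparameters` are hypothetical names introduced for illustration.

```julia
# Hyperparameters consulted by each kernel, as listed above.
kernelHyperparameters = Dict(
    "linear"  => ["C"],
    "rbf"     => ["C", "gamma"],
    "sigmoid" => ["C", "gamma", "coef0"],
    "poly"    => ["C", "degree", "gamma", "coef0"],
)

# Hypothetical helper: keep only the entries that the chosen kernel uses.
function relevantHyperparameters(kernel::String, hyperparameters::Dict, table::Dict)
    used = table[kernel]
    return Dict(k => v for (k, v) in hyperparameters if k in used)
end

allValues = Dict("C" => 1.0, "gamma" => 0.1, "degree" => 3, "coef0" => 0.0)
rbfValues = relevantHyperparameters("rbf", allValues, kernelHyperparameters)
println(sort(collect(keys(rbfValues))))
```

A table like this is only a reading aid: the values that a kernel does not use can still be passed to the constructor, as in the earlier RBF example that also sets `degree`.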

Typical values for `C` include:  
`0.001`, `0.1`, `1`, `10`, `100`, `1000`.

The following table summarises the different hyperparameters, the kernels that use them, and typical values they may take.

Note that when calling the `SVMClassifier` function, the hyperparameters use the same names as listed here, except for `C`, which must be passed as `cost`, as shown in the previous example.

It is also important that the arguments passed to the function have the correct type, as required by the `LIBSVM` library. Otherwise, an error may occur.

To prevent this, it is recommended to explicitly cast each hyperparameter to the appropriate type when calling the function.

| Hyperparameter | Applicable Kernels                 | Typical Values                     | Required Type in LIBSVM |
|----------------|------------------------------------|------------------------------------|--------------------------|
| `cost` (`C`)   | Linear, RBF, Sigmoid, Polynomial   | 0.001, 0.1, 1, 10, 100, 1000       | `Float64`               |
| `gamma`        | RBF, Sigmoid, Polynomial           | 0.1, 0.01, 0.001, 0.0001           | `Float64`               |
| `coef0`        | Sigmoid, Polynomial                | 0, 1, 5, 10                        | `Float64`               |
| `degree`       | Polynomial                         | 2, 3, 4, 5                         | `Int32`                 |

Although the basic SVM model is inherently binary, the implementation provided in MLJ already supports multi-class classification.  
Therefore, it is not necessary to manually apply a one-vs-all strategy for multi-class problems.

Once a model has been created, it must be wrapped in a `machine` object. This object acts as a container that associates the model with the data and handles both training and prediction.  
It is a core concept in MLJ and simplifies model workflows by centralizing model fitting (`fit!`) and prediction (`predict`) logic.

A `machine` has three main components:

- **Model**: Specifies the algorithm to be used. It has been created earlier, without any data or learned state.
- **Data**: Provides the input features and target labels (if supervised).
- **Internal State**: Stores learned parameters after the model is trained.

To create a `machine`, you can use the `machine` function, passing in the model, the input features, and the target labels.  
Note that the input data must not be plain arrays. Instead, it should be converted to a supported table format such as `Tables.table`, `DataFrame`, or a `NamedTuple`.  
If your data is currently stored as arrays, as is the case in these exercises, the following line shows how to construct the machine:

In [5]:
#Load the data from previous notebooks
if !isdefined(Main, :UtilsML1)
    include("utils_ML1.jl")
end
using .UtilsML1
using Flux  # For activation functions like σ


In [None]:
# define the model
model = SVMClassifier(kernel=LIBSVM.Kernel.RadialBasis, cost=1.0, gamma=2.0, degree=Int32(3))
# create the machine object
mach = machine(model, MLJ.table(trainingInputs), categorical(trainingTargets))


As shown, the input matrix is converted into a table, and the target vector is converted into a categorical array, since this exercise involves classification problems.  

It is important to note that the variable `targets` (the output labels) should be a **vector**, not a matrix. Each element in the vector corresponds to the label of one input sample and can be of any type (e.g., integer, string, etc.).

Although some models may support one-hot encoded labels, others do not. Therefore, in these exercises, we will use a vector of labels, with one label per instance, rather than one-hot encoding (which is typically used in neural networks).

To prevent compatibility issues with certain model implementations, we convert all label values to `String` before passing them to the model.

Once the machine object has been created, the model can be trained using the `fit!` function as follows:

In [None]:
MLJ.fit!(mach, verbosity=0)

This function only requires the `machine` object as an argument, since the training data has already been bound to it.
The optional argument `verbosity=0` is used to suppress output messages during training.

### Question 6.1

> ❓ What does the fact that the name of this function ends in bang (!) indicate?

**Answer**
By convention, it indicates that the function not only returns a result but also modifies one or more of its arguments. Such a function is called a "mutating function". In this case, `fit!` modifies the `mach` object passed as an argument.
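The same naming convention can be observed in base Julia, independently of MLJ. As a small illustration (the variable names are arbitrary), `sort` returns a sorted copy and leaves its argument untouched, whereas `sort!` reorders the vector it receives:

```julia
numbers = [3, 1, 2]

# Non-mutating version: returns a new vector, `numbers` keeps its order.
sortedCopy = sort(numbers)
beforeMutation = copy(numbers)

# Mutating version: `numbers` itself is reordered in place.
sort!(numbers)

println(beforeMutation)
println(numbers)
```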

Contrary to the Flux library, where it was necessary to write the ANN training loop, in this library the loop is already implemented, and it is called automatically when the `fit!` function is executed. Therefore, it is not necessary to write the code for the training loop.

An important aspect to consider is the layout of the data to be used.  

As shown in previous exercises, when training an Artificial Neural Network (ANN), the input samples (patterns) are arranged in **columns**, and each **row** in the input matrix represents a feature.

However, outside the scope of ANNs — and therefore for all other techniques used in this course — it is assumed that the samples are arranged in **rows**, meaning each **column** in the input matrix corresponds to a feature. This format is generally more intuitive and will be used throughout the rest of the course.
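If a matrix is available in the ANN layout, it can be converted to the row-per-sample layout by transposing it. The sketch below uses made-up numbers and base Julia only; `annInputs` and `rowInputs` are illustrative names.

```julia
# ANN layout: 2 features (rows) × 3 samples (columns).
annInputs = [1.0 2.0 3.0;
             4.0 5.0 6.0]

# Row-per-sample layout: 3 samples (rows) × 2 features (columns).
rowInputs = permutedims(annInputs)

println(size(annInputs))
println(size(rowInputs))

# The second sample is a column in one layout and a row in the other.
println(annInputs[:, 2] == rowInputs[2, :])
```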


### Question 6.2

> ❓ As in the case of ANNs, a loop is necessary for training several models. Where in the code (inside or outside the loop) will you need to create the model? Which models will need to be trained several times and which ones only once? Why?

**Answer**  
The model must be created inside the cross-validation loop, because each fold uses different training data and an untrained model is needed for each fold to ensure a fair evaluation.  
- SVM, decision trees and kNN: once per fold, because they are deterministic (for decision trees, as long as `rng` is fixed); they produce the same result when trained on the same data with the same hyperparameters.
- ANNs: several times per fold, because training multiple times and averaging the results gives more reliable performance estimates and reduces the impact of unlucky initializations.

### Question 6.3

> ❓ Which condition must the matrix of inputs and the vector of desired outputs passed as an argument to this function fulfil?

**Answer**  
matrix of inputs: The features should be stored in a 2D array where each row is a sample and each column is a feature. Before training, this array must be converted to a table using MLJ.table(inputs) so that MLJ can read it correctly.  
vector of desired outputs: The labels should be a 1D vector with the class names (not one-hot encoded). Then, convert them to categorical values using categorical(targets).

Finally, once the model has been trained, it can be used to make predictions. This is done using the `predict` function. The following is an example of how to use it:

In [None]:
testOutputs = MLJ.predict(mach, MLJ.table(testInputs));

As shown, the `predict` function requires two arguments: the `machine` object and the input matrix, which must be converted to a table format.

In classification problems, the type of the prediction result depends on the model and the underlying library:

- **For SVMs**, `predict` returns a `CategoricalArray`, which can be directly compared with the ground truth labels. No post-processing is needed.
- **For Decision Trees and kNN**, `predict` returns a `UnivariateFiniteArray`, in which each element represents a probability distribution over the possible classes for one sample.  
  To convert this into predicted labels (so they can be compared with the true values), you can apply the `mode` function element-wise (`mode.(...)`) to extract the most likely class of each sample.
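The complete sequence (model, machine, `fit!`, `predict`, `mode`) can be sketched for a decision tree as follows. This is a minimal sketch that assumes the packages installed at the beginning of this notebook are available; the toy data and its labels are made up for illustration and are not the course dataset.

```julia
using MLJ, Random

# Load the decision tree model type as done earlier in this notebook.
DTClassifier = MLJ.@load DecisionTreeClassifier pkg=DecisionTree verbosity=0

# Toy data (illustrative assumption): samples in rows, features in columns.
trainingInputs  = [0.0 0.1; 0.2 0.0; 1.0 0.9; 0.9 1.1]
trainingTargets = ["low", "low", "high", "high"]
testInputs      = [0.1 0.1; 1.0 1.0]

# Model -> machine -> training.
model = DTClassifier(max_depth=4, rng=Random.MersenneTwister(1))
mach = machine(model, MLJ.table(trainingInputs), categorical(trainingTargets))
MLJ.fit!(mach, verbosity=0)

# predict returns one probability distribution per test sample,
# and mode, applied element-wise, picks the most probable class of each.
distributions = MLJ.predict(mach, MLJ.table(testInputs))
testOutputs = mode.(distributions)
```

For an SVM the last step would be omitted, since its `predict` already returns class labels.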

The model being used is stored in memory as a structured object with several fields, and it can be very useful to inspect its contents.  
The `machine` object holds the model, the data, and the results of training. Therefore, you can access the trained model through the `machine`, or more directly through the variable `model`.

For example, when training an SVM, you can access one of its hyperparameters in either of the following ways:

In [None]:
model.gamma
mach.model.gamma

To inspect the learned parameters after training, MLJ provides several options.

One particularly interesting case is with SVMs, where it is useful to check which instances were selected as support vectors.  
This can be done in two ways:

In [None]:
mach.fitresult[1].SVs.indices

or using the higher-level MLJ interface:

In [None]:
fitted_params(mach)[:libsvm_model].SVs.indices

These commands return the indices of the support vectors in the training dataset.

In this notebook, the task will be to develop a single function that allows training the three different models using the MLJ library, and, in addition, artificial neural networks (ANNs) using the functions developed in previous exercises.

The training will be performed using cross-validation. For each fold, the specified model will be trained, and metrics will be computed on the test set.

As in the previous exercise, it is useful to generate a confusion matrix that reflects the distribution of instances across the test sets. In this case, it is simpler than before because the methods used are deterministic, so only one confusion matrix will be created per fold, and the final confusion matrix will be the sum of all of them.

Nevertheless, the considerations from the previous exercise still apply — in particular, that the metrics derived from this global confusion matrix may not match the metrics obtained through cross-validation.

In this exercise, you will develop a single function called `modelCrossValidation` that, in addition to training artificial neural networks (ANNs), performs cross-validation for SVMs, decision trees, and kNN.

The function should receive the following arguments:

- **`modelType::Symbol`**: This parameter indicates the type of model to train. It should take one of the following values:
  - `:ANN` — Artificial Neural Network
  - `:SVC` — Support Vector Machine
  - `:DecisionTreeClassifier` — Decision Trees
  - `:KNeighborsClassifier` — k-Nearest Neighbors

- **`modelHyperparameters::Dict`**: A dictionary containing the model's hyperparameters. Keys may be of type `String` or `Symbol`.
  
  To check whether a hyperparameter is defined, you can use `haskey`.  
  To retrieve a value that may or may not exist in the dictionary, the `get` function is also useful.

  - **ANN (`:ANN`)**:
    The expected hyperparameters are:
    - Topology (number of hidden layers and number of neurons in each hidden layer, required) and transfer function in each layer. In "shallow" networks such as those used in this course, the transfer function has less impact, so a standard one, such as `tansig` or `logsig`, can be used.
    - Learning rate
    - Ratio of patterns used for validation
    - Number of consecutive iterations without improving the validation loss to stop the process
    - Number of times each ANN is trained
        
### Question 6.4

> ❓ Why should a linear transfer function not be used for neurons in the hidden layers?

**Answer**

As we know from the theory classes, multiple layers using linear activation functions can always be reduced to a single layer, and this can be proven mathematically. Using linear transfer functions in the hidden layers would therefore make the neural network equivalent to a simple linear model, eliminating the advantages of a multilayer network. Non-linear activation functions are what let neural networks learn complex patterns in data.
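This collapse can be verified numerically. In the following sketch, two linear layers are represented only by their weight matrices (biases are omitted for brevity, which is a simplification, and the numbers are arbitrary). Applying them one after the other gives the same output as applying the single matrix `W2 * W1`.

```julia
# Weights of a hidden layer (2 inputs -> 3 neurons) and an output layer (3 -> 2).
W1 = [1 2; 3 4; 5 6]
W2 = [1 0 -1; 2 1 0]
x  = [1, 2]

# Two linear layers applied in sequence ...
twoLayerOutput = W2 * (W1 * x)

# ... are equivalent to one linear layer whose weights are the product W2 * W1.
singleLayerOutput = (W2 * W1) * x

println(twoLayerOutput == singleLayerOutput)
```

With a non-linear function applied between the two products, this equality no longer holds in general, which is what gives hidden layers their additional expressive power.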

  For the other models, the expected hyperparameters are:

  - **SVM (`:SVC`)**:  
    The expected hyperparameters are:
    - `C`
    - `kernel`
    - `degree`
    - `gamma`
    - `coef0`

    The `kernel` parameter should be provided as a `String` with one of the following values:
  `"linear"`, `"rbf"`, `"sigmoid"`, or `"poly"`.

    Depending on the selected kernel, some of the hyperparameters may be ignored. For example:
    - The `"poly"` kernel uses `degree`, `gamma`, and `coef0`.
    - The `"sigmoid"` kernel uses `gamma` and `coef0`.
    - The `"linear"` kernel only uses `C`.

    The `C` hyperparameter must be passed using the keyword `cost`, and the kernel must be translated to one of the predefined constants in the `LIBSVM` library:

    - `LIBSVM.Kernel.Linear`
    - `LIBSVM.Kernel.RadialBasis`
    - `LIBSVM.Kernel.Sigmoid`
    - `LIBSVM.Kernel.Polynomial`

    To avoid type errors, it is recommended to cast each value explicitly.  
    For example, to create a polynomial SVM:

  ```julia
    model = SVMClassifier(
        kernel = LIBSVM.Kernel.Polynomial,
        cost = Float64(C),
        gamma = Float64(gamma),
        degree = Int32(degree),
        coef0 = Float64(coef0)
    )
  ```

  - **Decision Tree (`:DecisionTreeClassifier`)**:

    - `max_depth`: defines the maximum depth of the tree.
    - `rng`: the random number generator. It should be set to `Random.MersenneTwister(1)` to ensure reproducibility.

  - **k-Nearest Neighbors (`:KNeighborsClassifier`)**:
    - `n_neighbors`: the value of k, which determines the number of neighbors to consider. When creating the model, this value is passed to `kNNClassifier` through the keyword `K`.

- **`dataset::Tuple{AbstractArray{<:Real,2}, AbstractArray{<:Any,1}}`**:  
  A tuple containing two elements:
  - The first is the input matrix (`X`). Unlike neural network training, there is no need to convert the data to `Float32`, since both `Float32` and `Float64` are commonly used in this library depending on the desired precision.
  - The second is the target vector (`y`), which contains the labels.

- **`crossValidationIndices::Array{Int64,1}`**:  
  This vector contains the indices used to assign each sample to a fold in the cross-validation process.

  As in the previous exercise, the fold assignment must be done **outside** the `modelCrossValidation` function.  
  This ensures that the exact same data partitioning is used when training different models, allowing fair comparisons.
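As a reminder of what such an index vector looks like, the sketch below builds one with base Julia and the `Random` module. `assignFolds` is a hypothetical helper written for this illustration, not the function developed in the previous exercise, and it does not stratify by class.

```julia
using Random

# Hypothetical helper: assign each of `numSamples` samples to one of `numFolds` folds.
function assignFolds(numSamples::Int, numFolds::Int, rng)
    # Repeat 1:numFolds until there are enough entries, then cut to size.
    indices = repeat(1:numFolds, ceil(Int, numSamples / numFolds))[1:numSamples]
    # Shuffle so that fold membership does not depend on sample order.
    return shuffle!(rng, indices)
end

folds = assignFolds(10, 3, Random.MersenneTwister(1))
println(folds)
```

Because the vector is created once, outside `modelCrossValidation`, the same `folds` can be passed to every call, so all models are evaluated on identical partitions.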


The function will begin by checking whether the model to be trained is a neural network, by examining the `modelType` parameter.  
If this is the case, it will call the `ANNCrossValidation` function, passing the hyperparameters provided in `modelHyperparameters`.

Keep in mind that many of the hyperparameters for neural networks may not be defined in the dictionary.  
As mentioned earlier, the function `haskey` can be used to check whether a key is present in a `Dict`.  
Alternatively, the `get` function can be used to safely retrieve a value with a default if the key is missing.
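Both functions belong to base Julia. A brief sketch of their use, with a made-up dictionary whose keys follow the names used in the function at the end of this notebook:

```julia
# Only some hyperparameters are defined by the caller.
modelHyperparameters = Dict("topology" => [4, 3], "learningRate" => 0.05)

# haskey checks whether a key exists.
hasTopology = haskey(modelHyperparameters, "topology")

# get returns the stored value, or the given default if the key is missing.
learningRate = get(modelHyperparameters, "learningRate", 0.01)
maxEpochs    = get(modelHyperparameters, "maxEpochs", 1000)

println((hasTopology, learningRate, maxEpochs))
```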

Once the call to `ANNCrossValidation` is made, the function returns its result and exits — meaning that no further processing will occur in this case.

If a different type of model is to be trained, the logic continues similarly to the previous exercise:

- Create seven vectors to store the results of the metrics for each fold.
- Create a 2D array to accumulate the confusion matrix, initialized with zeros.

A key modification when using models from the MLJ library is to **convert the target labels to strings** before training any model.  
This helps prevent errors caused by internal type mismatches in some model implementations.

This can be done with the following simple line:

```julia
targets = string.(targets);
```

Additionally, it will be necessary to compute the vector of unique classes, just like in the previous exercise.  
This can be done with:

```julia
classes = unique(targets);
```
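As a quick check of what these two lines produce, consider the following sketch with made-up integer labels. Note that `unique` keeps the classes in order of first appearance, not in sorted order.

```julia
rawTargets = [2, 1, 2, 3, 1]

# Broadcasting `string` converts every label to a String.
targets = string.(rawTargets)

# Classes appear in the order in which they are first found.
classes = unique(targets)

println(targets)
println(classes)
```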

Once these initial steps are completed, the cross-validation loop can begin.

In each iteration, the following steps are performed:

1. Extract the training and test input matrices and the corresponding target vectors.  
   These should be of type `AbstractArray{<:Any,1}` for the targets.

2. Create the model with the specified hyperparameters.

3. For MLJ models (SVM, Decision Tree, kNN):
   - Instantiate the model using the appropriate constructor: `SVMClassifier`, `DTClassifier`, or `kNNClassifier`, depending on `modelType`.
   - Wrap the model in a `machine` with the training data.
   - Train the model using `fit!`.

4. Perform predictions on the test data using `predict`.

   - For Decision Trees and kNN, use `mode` to convert the probabilistic predictions into categorical labels:
     ```julia
     ŷ = mode.(predict(mach, MLJ.table(Xtest)))
     ```

   - For SVMs, the output of `predict` can be compared directly with the ground truth, since it returns a `CategoricalArray`.

Although the general structure of the code will be the same for the three model types, each model requires a different constructor and may require post-processing (e.g., `mode`) depending on the prediction format.
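Step 1 of the loop, the split into training and test data, can be sketched with boolean masks in base Julia. The data below is made up, and the variable names are illustrative.

```julia
inputs  = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0]   # 4 samples × 2 features
targets = ["a", "b", "a", "b"]
crossValidationIndices = [1, 2, 1, 2]

fold = 1

# Samples assigned to this fold form the test set; all others are used for training.
testMask = crossValidationIndices .== fold

trainInputs  = inputs[.!testMask, :]
trainTargets = targets[.!testMask]
testInputs   = inputs[testMask, :]
testTargets  = targets[testMask]

println(size(trainInputs), " ", size(testInputs))
```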

Once the predicted labels for the test set are available, the evaluation metrics and the confusion matrix should be computed using the `confusionMatrix` function.

- The metrics returned should be stored in their respective positions within the metric vectors.
- The confusion matrix obtained for each fold should be **added** to a global confusion matrix for the test set.
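The accumulation itself is an element-wise sum. A minimal sketch with two made-up per-fold matrices for a two-class problem:

```julia
# Confusion matrices of two folds (illustrative values).
foldMatrices = [[2 0; 1 3], [1 1; 0 4]]

# Global matrix, initialised with zeros and with the same dimensions.
globalConfusion = zeros(Int64, 2, 2)

for foldMatrix in foldMatrices
    # In-place element-wise addition.
    globalConfusion .+= foldMatrix
end

println(globalConfusion)
```

The sum only works if every per-fold matrix has the same size and class order.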

A key difference compared to the ANN training in the previous exercise is that these models (SVM, Decision Tree, kNN) are **deterministic**.  
Therefore, each model only needs to be trained **once per fold**, without requiring multiple executions or averaging across runs.

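### Question 6.5

> ❓ The other models do not have the number of times to train them as a parameter. Why? If you train several times, which statistical properties will the results of these trainings have?

**Answer**  
Because the other models are deterministic: given the same training data and hyperparameters, they always produce the same result, so retraining is pointless. If they were trained several times, every run would yield identical metrics, so the mean would equal the value of a single run and the standard deviation would be zero.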
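The statistics of repeated runs of a deterministic model can be illustrated with the `Statistics` standard library. The accuracy value below is made up; the point is only that identical results have zero spread.

```julia
using Statistics

# Five "runs" of a deterministic model on the same fold give the same accuracy.
repeatedAccuracy = fill(0.875, 5)

# The mean equals the single-run value and the standard deviation is zero.
println(mean(repeatedAccuracy))
println(std(repeatedAccuracy))
```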

As previously described, when using techniques such as SVM, decision trees, kNN, **one-hot encoding is not used**.  
Instead, metrics are computed using the `confusionMatrix` function developed in a previous exercise, which takes three arguments:
- The predicted labels
- The true labels
- The list of class labels

All of these must be of type `AbstractArray{<:Any,1}`.

It is important to use the version of the `confusionMatrix` function that receives the vector of classes.

### Question 6.6

> ❓ What could happen if the version that does not receive the class vector is used?

**Answer**
When performing cross-validation, the data is divided into N folds. In a multi-class problem, there is a chance that one of the test folds will lack one or more classes. Then, when the code tries to accumulate the global confusion matrix, there will be a dimension mismatch.  
Using the two-argument version (without the list of class labels) is dangerous because it determines the size and order of classes only based on the current fold, leading to dimensionality mismatch errors if a class is missing. Explicitly passing the full list of classes ensures that all confusion matrices have the same structure and allows them to be correctly summed into a global cross-validation matrix.
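The size problem can be reproduced with a simplified counting function. `countConfusion` below is a hypothetical stand-in for the course's `confusionMatrix` (it only counts, and the convention of rows for true classes and columns for predicted classes is an assumption). The fold shown contains no sample of class `"c"`.

```julia
# Hypothetical stand-in: rows are true classes, columns are predicted classes.
function countConfusion(predicted, actual, classes)
    position = Dict(c => i for (i, c) in enumerate(classes))
    matrix = zeros(Int64, length(classes), length(classes))
    for (p, a) in zip(predicted, actual)
        matrix[position[a], position[p]] += 1
    end
    return matrix
end

classes       = ["a", "b", "c"]
foldActual    = ["a", "b", "a"]
foldPredicted = ["a", "b", "b"]

# With the full class list, the matrix is always 3×3.
fullMatrix = countConfusion(foldPredicted, foldActual, classes)

# With classes derived from this fold only, it shrinks to 2×2.
foldOnlyMatrix = countConfusion(foldPredicted, foldActual, unique(foldActual))

println(size(fullMatrix), " ", size(foldOnlyMatrix))
```

Adding `foldOnlyMatrix` to a 3×3 global matrix would fail, while `fullMatrix` can be summed safely.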

The `modelCrossValidation` function must return the same structure as in the previous exercise: a tuple with 8 elements.

- The first **7 elements** correspond to metrics: **accuracy**, **error rate**, **recall**, **specificity**, **PPV**, **NPV**, and **F1-score**.  
  Each of these is itself a tuple with the **mean** and **standard deviation** across the folds.

- The **8th element** is the **global confusion matrix** computed on the test sets.
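Each of the first seven elements can be computed from its per-fold vector with the `Statistics` standard library. A short sketch with made-up accuracy values for three folds:

```julia
using Statistics

# Accuracy obtained in each fold (illustrative values).
accuracyFolds = [0.9, 0.8, 1.0]

# Tuple with the mean and the standard deviation across folds.
accuracySummary = (mean(accuracyFolds), std(accuracyFolds))

println(accuracySummary)
```

The same expression would be repeated for the other six metric vectors before building the final 8-element tuple.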

Once the function has been developed, it can be used to evaluate different model configurations by comparing test results across the selected metrics.  
This process does **not return a final model ready for production**, but rather identifies the best-performing model type and hyperparameter configuration.

After selecting the best configuration, the final model should be trained **from scratch** using **all available data**, without performing cross-validation.  
This training is done just once, without setting aside a test set.  
As a result, the final model is expected to perform slightly better than during cross-validation, since it benefits from more training data.

This final model is the one intended for production use, and a confusion matrix can be computed for it as well.

### Question 6.7

> ❓ In the case of using decision trees or kNN, a corresponding function is not necessary to perform the "one-against-all" strategy, why?

**Answer**

Because decision trees and kNN can directly handle multiple classes without converting the problem into binary form.  
- In decision trees, each split is chosen to maximize class separation across all existing classes, not just two. The tree grows branches that directly lead to leaves labeled with any of the possible classes. Thus, it inherently learns to distinguish among all classes simultaneously.
- In kNN, classification is based on a majority vote among the k nearest samples, which can belong to any class. The algorithm simply counts the class frequencies and assigns the label of the most frequent one.

In [6]:
function modelCrossValidation(
        modelType::Symbol, modelHyperparameters::Dict,
        dataset::Tuple{AbstractArray{<:Real,2}, AbstractArray{<:Any,1}},
        crossValidationIndices::Array{Int64,1})
    
    # Helper function to get parameter value with flexible key types (String or Symbol)
    function getParam(dict, key_str, key_sym, default)
        if haskey(dict, key_str)
            return dict[key_str]
        elseif haskey(dict, key_sym)
            return dict[key_sym]
        else
            return default
        end
    end
    
    # Normalize model type symbols - accept both variants
    normalizedModelType = modelType
    if modelType == :SVC
        normalizedModelType = :SVMClassifier
    elseif modelType == :kNN
        normalizedModelType = :KNeighborsClassifier
    end
    
    # Handle ANN case - delegate to ANNCrossValidation
    if normalizedModelType == :ANN
        # Extract ANN hyperparameters
        topology = getParam(modelHyperparameters, "topology", :topology, nothing)
        @assert topology !== nothing "Topology is required for ANN"
        
        learningRate = getParam(modelHyperparameters, "learningRate", :learningRate, 0.01)
        validationRatio = getParam(modelHyperparameters, "validationRatio", :validationRatio, 0.0)
        numExecutions = getParam(modelHyperparameters, "numExecutions", :numExecutions, 50)
        maxEpochs = getParam(modelHyperparameters, "maxEpochs", :maxEpochs, 1000)
        maxEpochsVal = getParam(modelHyperparameters, "maxEpochsVal", :maxEpochsVal, 20)
        
        # Get transfer functions if provided
        transferFunctions = getParam(modelHyperparameters, "transferFunctions", :transferFunctions, fill(σ, length(topology)))
        
        # Call ANNCrossValidation
        return ANNCrossValidation(
            topology,
            dataset,
            crossValidationIndices;
            numExecutions = numExecutions,
            transferFunctions = transferFunctions,
            maxEpochs = maxEpochs,
            minLoss = 0.0,
            learningRate = learningRate,
            validationRatio = validationRatio,
            maxEpochsVal = maxEpochsVal
        )
    end
    
    # Handle MLJ models (SVM, Decision Tree, kNN)
    inputs, targets = dataset
    
    # Convert targets to strings to prevent type issues
    targets = string.(targets)
    
    # Get unique classes
    classes = unique(targets)
    
    # Initialize metric vectors for each fold
    accuracy_folds = Float64[]
    error_folds = Float64[]
    sensitivity_folds = Float64[]
    specificity_folds = Float64[]
    ppv_folds = Float64[]
    npv_folds = Float64[]
    f1_folds = Float64[]
    
    # Initialize global confusion matrix
    global_confusion = zeros(Int64, length(classes), length(classes))
    
    # Get number of folds
    numFolds = maximum(crossValidationIndices)
    
    # Cross-validation loop
    for fold in 1:numFolds
        # Split data into train and test for this fold
        test_mask = crossValidationIndices .== fold
        train_mask = .!test_mask
        
        train_inputs = inputs[train_mask, :]
        train_targets = targets[train_mask]
        test_inputs = inputs[test_mask, :]
        test_targets = targets[test_mask]
        
        # Create model based on model type
        model = nothing
        
        if normalizedModelType == :SVMClassifier
            # Extract SVM hyperparameters
            C = getParam(modelHyperparameters, "C", :C, 1.0)
            kernel_str = getParam(modelHyperparameters, "kernel", :kernel, "rbf")
            gamma = getParam(modelHyperparameters, "gamma", :gamma, 0.1)
            degree = getParam(modelHyperparameters, "degree", :degree, 3)
            coef0 = getParam(modelHyperparameters, "coef0", :coef0, 0.0)
            
            # Create the SVM model, passing only the parameters each kernel uses.
            # Any unrecognized kernel name falls back to RBF.
            model = if kernel_str == "linear"
                # Linear kernel: only C
                SVMClassifier(
                    kernel = LIBSVM.Kernel.Linear,
                    cost = Float64(C)
                )
            elseif kernel_str == "sigmoid"
                # Sigmoid kernel: C, gamma, and coef0
                SVMClassifier(
                    kernel = LIBSVM.Kernel.Sigmoid,
                    cost = Float64(C),
                    gamma = Float64(gamma),
                    coef0 = Float64(coef0)
                )
            elseif kernel_str == "poly" || kernel_str == "polynomial"
                # Polynomial kernel: C, degree, gamma, and coef0
                SVMClassifier(
                    kernel = LIBSVM.Kernel.Polynomial,
                    cost = Float64(C),
                    gamma = Float64(gamma),
                    degree = Int32(degree),
                    coef0 = Float64(coef0)
                )
            else
                # RBF kernel (also the default for unrecognized names): C and gamma
                SVMClassifier(
                    kernel = LIBSVM.Kernel.RadialBasis,
                    cost = Float64(C),
                    gamma = Float64(gamma)
                )
            end
            
        elseif normalizedModelType == :DecisionTreeClassifier
            # Extract Decision Tree hyperparameters
            max_depth = getParam(modelHyperparameters, "max_depth", :max_depth, -1)
            
            # Create Decision Tree model with fixed random seed for reproducibility
            model = DTClassifier(
                max_depth = max_depth,
                rng = Random.MersenneTwister(1)
            )
            
        elseif normalizedModelType == :KNeighborsClassifier
            # Extract kNN hyperparameters
            K = getParam(modelHyperparameters, "K", :K, 3)
            n_neighbors = getParam(modelHyperparameters, "n_neighbors", :n_neighbors, K)
            
            # Create kNN model
            model = kNNClassifier(K = n_neighbors)
            
        else
            error("Unknown model type: $normalizedModelType")
        end
        
        # Create machine object with training data
        mach = machine(model, MLJ.table(train_inputs), categorical(train_targets))
        
        # Train the model
        MLJ.fit!(mach, verbosity=0)
        
        # Make predictions on test set
        if normalizedModelType == :SVMClassifier
            # SVM returns CategoricalArray directly
            predictions = MLJ.predict(mach, MLJ.table(test_inputs))
            predictions = string.(predictions)  # Convert to strings for consistency
        else
            # Decision Tree and kNN return UnivariateFiniteArray
            # Need to use mode to get the most likely class
            predictions = mode.(MLJ.predict(mach, MLJ.table(test_inputs)))
            predictions = string.(predictions)  # Convert to strings for consistency
        end
        
        # Compute confusion matrix and metrics for this fold
        metrics = confusionMatrix(predictions, test_targets, classes)
        acc, err, sens, spec, ppv, npv, f1, confusion = metrics
        
        # Store metrics
        push!(accuracy_folds, acc)
        push!(error_folds, err)
        push!(sensitivity_folds, sens)
        push!(specificity_folds, spec)
        push!(ppv_folds, ppv)
        push!(npv_folds, npv)
        push!(f1_folds, f1)
        
        # Accumulate confusion matrix
        global_confusion .+= confusion
    end
    
    # Compute mean and standard deviation for each metric
    stats(vector) = (mean(vector), std(vector; corrected=false))
    
    # Return results in the same format as ANNCrossValidation
    return (stats(accuracy_folds),
            stats(error_folds),
            stats(sensitivity_folds),
            stats(specificity_folds),
            stats(ppv_folds),
            stats(npv_folds),
            stats(f1_folds),
            global_confusion)
end

modelCrossValidation (generic function with 1 method)
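
To see the fold-splitting and aggregation steps of the loop in isolation, the following sketch runs them on a tiny made-up dataset using only the standard library. The names `toyInputs`, `toyIndices`, and `foldScores` are illustrative and not part of the assignment code, and the per-fold "score" is only a placeholder (the mean of the held-out rows) standing in for a real metric.

```julia
using Statistics

# Toy data: 6 patterns with 2 features, assigned to 3 folds (illustrative values)
toyInputs = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0; 11.0 12.0]
toyIndices = [1, 2, 3, 1, 2, 3]

foldScores = Float64[]
for fold in 1:maximum(toyIndices)
    testMask = toyIndices .== fold   # patterns held out in this fold
    trainMask = .!testMask           # all remaining patterns are used for training
    @assert sum(trainMask) + sum(testMask) == size(toyInputs, 1)
    # Placeholder for a real metric: mean of the held-out rows
    push!(foldScores, mean(toyInputs[testMask, :]))
end

# Same aggregation as in modelCrossValidation: mean and population standard deviation
foldStats = (mean(foldScores), std(foldScores; corrected=false))
```

Each pattern lands in the test set of exactly one fold, so the per-fold confusion matrices can be summed into a single matrix covering the whole dataset.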

### Learn Julia

#### Symbols and Dictionaries in Julia
One Julia type that is important for this exercise is `Symbol`. A symbol is an identifier used as a value, written as a colon (":") followed by a name, for example `:ANN`. In this exercise, symbols indicate which model to train. The `modelCrossValidation` function expects one of the following:

```julia
:KNeighborsClassifier, :SVC, :DecisionTreeClassifier, :ANN
```
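
As a quick illustration of how symbols behave, the short sketch below creates one, compares it, and converts between `Symbol` and `String`. The variable names are only for illustration.

```julia
modelType = :SVC                       # a Symbol literal
println(typeof(modelType))             # prints the type of the value

# Symbols compare by name, which makes them convenient as lightweight tags
isSVM = modelType == :SVC

# A Symbol can be built from a String and converted back
fromString = Symbol("DecisionTreeClassifier")
backToString = string(fromString)
```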

#### Passing Model-Specific Parameters
The function also takes model-specific hyperparameters.
The recommended way to pass them is a variable of type `Dict`, which works much like a Python dictionary.

For instance, to define the hyperparameters for an artificial neural network (ANN):

```julia
  modelHyperparameters = Dict(
      "topology" => [5, 3],
      "learningRate" => 0.01,
      "validationRatio" => 0.2,
      "numExecutions" => 50,
      "maxEpochs" => 1000,
      "maxEpochsVal" => 6
  )
```

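Another way to build the dictionary is to start from an empty one and add the entries one by one (the values here come from variables assumed to be defined earlier):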

```julia
  modelHyperparameters = Dict()
  modelHyperparameters["topology"] = topology
  modelHyperparameters["learningRate"] = learningRate
  modelHyperparameters["validationRatio"] = validationRatio
  modelHyperparameters["numExecutions"] = numRepetitionsANNTraining
  modelHyperparameters["maxEpochs"] = numMaxEpochs
  modelHyperparameters["maxEpochsVal"] = maxEpochsVal
```
To access a value, simply use:
```julia
  modelHyperparameters["topology"]
```
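
Indexing with a key that is not in the dictionary throws a `KeyError`. When a hyperparameter is optional, the standard `get` and `haskey` functions are a safer way to read it. The dictionary contents below are illustrative:

```julia
params = Dict("topology" => [5, 3], "learningRate" => 0.01)

# get returns the stored value, or the fallback if the key is missing
learningRate = get(params, "learningRate", 0.1)   # key present: stored value is used
maxEpochs = get(params, "maxEpochs", 1000)        # key missing: fallback is used

# haskey checks for presence without reading the value
hasTopology = haskey(params, "topology")
```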
#### Example for SVM Parameters
You can also define hyperparameters for other models similarly.
For example, for an SVM:
```julia
  modelHyperparameters = Dict("C" => 1, "kernel" => "rbf", "gamma" => 2)
```
Or using the alternative form:

```julia
  modelHyperparameters = Dict()
  modelHyperparameters["C"] = 1
  modelHyperparameters["kernel"] = "rbf"
  modelHyperparameters["gamma"] = 2
```
Other kernels may require different parameters, such as `degree` and `coef0`.

When building the SVM model inside the function, you might write:
```julia
  if modelHyperparameters["kernel"] == "rbf"
      model = SVMClassifier(
          kernel = LIBSVM.Kernel.RadialBasis,
          cost = Float64(modelHyperparameters["C"]),
          gamma = Float64(modelHyperparameters["gamma"])
      )
  end
```

You can apply a similar strategy for decision trees, kNN, and DoME models.

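In the examples above, the dictionary keys are of type `String`. `modelCrossValidation` also accepts `Symbol` keys, because its `getParam` helper is called with both the `String` and the `Symbol` form of each name (for example `"topology"` and `:topology`). A plain `Dict` lookup, by contrast, does not treat the two as the same key.
For example: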
```julia
  modelHyperparameters = Dict(:C => 1, :kernel => "rbf", :gamma => 2)
```
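
One way such a dual lookup could be written is sketched below. `lookupParam` is a hypothetical name used only for illustration; it is not necessarily identical to the `getParam` helper inside `modelCrossValidation`.

```julia
# Hypothetical helper: try the String key first, then the Symbol key
function lookupParam(params::AbstractDict, name::AbstractString, default)
    if haskey(params, name)
        return params[name]
    elseif haskey(params, Symbol(name))
        return params[Symbol(name)]
    else
        return default
    end
end

stringKeys = Dict("C" => 1, "kernel" => "rbf")
symbolKeys = Dict(:C => 1, :kernel => "rbf")

costFromStrings = lookupParam(stringKeys, "C", 1.0)    # found under the String key
costFromSymbols = lookupParam(symbolKeys, "C", 1.0)    # found under the Symbol key
gammaDefault = lookupParam(symbolKeys, "gamma", 0.1)   # missing: default is returned
```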

Note that the model symbol must match one of the names that `modelCrossValidation` recognizes: `:ANN`, `:SVC` (or `:SVMClassifier`), `:DecisionTreeClassifier`, and `:KNeighborsClassifier` (or `:kNN`). Any other symbol, such as `:SVM` or `:DecisionTree`, raises an "Unknown model type" error.

**Test of modelCrossValidation function**

### Test the modelCrossValidation function with the iris dataset

# Load the iris dataset
using DelimitedFiles

# Read the iris data
data = readdlm("iris.data", ',')

# Extract inputs (first 4 columns) and targets (last column)
inputs = Float64.(data[:, 1:4])
targets = data[:, 5]

# Normalize the inputs
inputs = normalizeMinMax(inputs)

# Create cross-validation indices (5-fold cross-validation)
using Random
Random.seed!(42)  # For reproducibility
numPatterns = size(inputs, 1)
numFolds = 5
crossValidationIndices = repeat(1:numFolds, outer=ceil(Int, numPatterns/numFolds))[1:numPatterns]
crossValidationIndices = crossValidationIndices[randperm(numPatterns)]

println("Dataset loaded: $(size(inputs, 1)) samples, $(size(inputs, 2)) features")
println("Classes: $(unique(targets))")
println("Cross-validation folds: $numFolds")
println()


# Test 1: SVM with Linear kernel
println("=" ^ 60)
println("Test 1: SVM with Linear kernel")
println("=" ^ 60)
svm_linear_params = Dict(
    "C" => 1.0,
    "kernel" => "linear"
)

results_svm_linear = modelCrossValidation(
    :SVC,
    svm_linear_params,
    (inputs, targets),
    crossValidationIndices
)

acc, err, sens, spec, ppv, npv, f1, conf = results_svm_linear
println("Accuracy: $(round(acc[1], digits=4)) ± $(round(acc[2], digits=4))")
println("Error Rate: $(round(err[1], digits=4)) ± $(round(err[2], digits=4))")
println("Sensitivity: $(round(sens[1], digits=4)) ± $(round(sens[2], digits=4))")
println("Specificity: $(round(spec[1], digits=4)) ± $(round(spec[2], digits=4))")
println("PPV: $(round(ppv[1], digits=4)) ± $(round(ppv[2], digits=4))")
println("NPV: $(round(npv[1], digits=4)) ± $(round(npv[2], digits=4))")
println("F1-Score: $(round(f1[1], digits=4)) ± $(round(f1[2], digits=4))")
println("Confusion Matrix:")
println(conf)
println()

# Test 2: SVM with RBF kernel
println("=" ^ 60)
println("Test 2: SVM with RBF kernel")
println("=" ^ 60)
svm_rbf_params = Dict(
    "C" => 1.0,
    "kernel" => "rbf",
    "gamma" => 0.1
)

results_svm_rbf = modelCrossValidation(
    :SVC,
    svm_rbf_params,
    (inputs, targets),
    crossValidationIndices
)

acc, err, sens, spec, ppv, npv, f1, conf = results_svm_rbf
println("Accuracy: $(round(acc[1], digits=4)) ± $(round(acc[2], digits=4))")
println("Error Rate: $(round(err[1], digits=4)) ± $(round(err[2], digits=4))")
println("Sensitivity: $(round(sens[1], digits=4)) ± $(round(sens[2], digits=4))")
println("Specificity: $(round(spec[1], digits=4)) ± $(round(spec[2], digits=4))")
println("PPV: $(round(ppv[1], digits=4)) ± $(round(ppv[2], digits=4))")
println("NPV: $(round(npv[1], digits=4)) ± $(round(npv[2], digits=4))")
println("F1-Score: $(round(f1[1], digits=4)) ± $(round(f1[2], digits=4))")
println("Confusion Matrix:")
println(conf)
println()

# Test 3: SVM with Sigmoid kernel
println("=" ^ 60)
println("Test 3: SVM with Sigmoid kernel")
println("=" ^ 60)
svm_sigmoid_params = Dict(
    "C" => 1.0,
    "kernel" => "sigmoid",
    "gamma" => 0.1,
    "coef0" => 0.0
)

results_svm_sigmoid = modelCrossValidation(
    :SVC,
    svm_sigmoid_params,
    (inputs, targets),
    crossValidationIndices
)

acc, err, sens, spec, ppv, npv, f1, conf = results_svm_sigmoid
println("Accuracy: $(round(acc[1], digits=4)) ± $(round(acc[2], digits=4))")
println("Error Rate: $(round(err[1], digits=4)) ± $(round(err[2], digits=4))")
println("Sensitivity: $(round(sens[1], digits=4)) ± $(round(sens[2], digits=4))")
println("Specificity: $(round(spec[1], digits=4)) ± $(round(spec[2], digits=4))")
println("PPV: $(round(ppv[1], digits=4)) ± $(round(ppv[2], digits=4))")
println("NPV: $(round(npv[1], digits=4)) ± $(round(npv[2], digits=4))")
println("F1-Score: $(round(f1[1], digits=4)) ± $(round(f1[2], digits=4))")
println("Confusion Matrix:")
println(conf)
println()

# Test 4: SVM with Polynomial kernel
println("=" ^ 60)
println("Test 4: SVM with Polynomial kernel")
println("=" ^ 60)
svm_poly_params = Dict(
    "C" => 1.0,
    "kernel" => "poly",
    "degree" => 3,
    "gamma" => 0.1,
    "coef0" => 1.0
)

results_svm_poly = modelCrossValidation(
    :SVC,
    svm_poly_params,
    (inputs, targets),
    crossValidationIndices
)

acc, err, sens, spec, ppv, npv, f1, conf = results_svm_poly
println("Accuracy: $(round(acc[1], digits=4)) ± $(round(acc[2], digits=4))")
println("Error Rate: $(round(err[1], digits=4)) ± $(round(err[2], digits=4))")
println("Sensitivity: $(round(sens[1], digits=4)) ± $(round(sens[2], digits=4))")
println("Specificity: $(round(spec[1], digits=4)) ± $(round(spec[2], digits=4))")
println("PPV: $(round(ppv[1], digits=4)) ± $(round(ppv[2], digits=4))")
println("NPV: $(round(npv[1], digits=4)) ± $(round(npv[2], digits=4))")
println("F1-Score: $(round(f1[1], digits=4)) ± $(round(f1[2], digits=4))")
println("Confusion Matrix:")
println(conf)
println()

# Test 5: Decision Tree
println("=" ^ 60)
println("Test 5: Decision Tree (max_depth=4)")
println("=" ^ 60)
dt_hyperparameters = Dict(
    "max_depth" => 4
)

results_dt = modelCrossValidation(
    :DecisionTreeClassifier,
    dt_hyperparameters,
    (inputs, targets),
    crossValidationIndices
)

acc, err, sens, spec, ppv, npv, f1, conf = results_dt
println("Accuracy: $(round(acc[1], digits=4)) ± $(round(acc[2], digits=4))")
println("Error Rate: $(round(err[1], digits=4)) ± $(round(err[2], digits=4))")
println("Sensitivity: $(round(sens[1], digits=4)) ± $(round(sens[2], digits=4))")
println("Specificity: $(round(spec[1], digits=4)) ± $(round(spec[2], digits=4))")
println("PPV: $(round(ppv[1], digits=4)) ± $(round(ppv[2], digits=4))")
println("NPV: $(round(npv[1], digits=4)) ± $(round(npv[2], digits=4))")
println("F1-Score: $(round(f1[1], digits=4)) ± $(round(f1[2], digits=4))")
println("Confusion Matrix:")
println(conf)
println()

# Test 6: k-Nearest Neighbors
println("=" ^ 60)
println("Test 6: k-Nearest Neighbors (K=3)")
println("=" ^ 60)
knn_hyperparameters = Dict(
    "K" => 3
)

results_knn = modelCrossValidation(
    :KNeighborsClassifier,
    knn_hyperparameters,
    (inputs, targets),
    crossValidationIndices
)

acc, err, sens, spec, ppv, npv, f1, conf = results_knn
println("Accuracy: $(round(acc[1], digits=4)) ± $(round(acc[2], digits=4))")
println("Error Rate: $(round(err[1], digits=4)) ± $(round(err[2], digits=4))")
println("Sensitivity: $(round(sens[1], digits=4)) ± $(round(sens[2], digits=4))")
println("Specificity: $(round(spec[1], digits=4)) ± $(round(spec[2], digits=4))")
println("PPV: $(round(ppv[1], digits=4)) ± $(round(ppv[2], digits=4))")
println("NPV: $(round(npv[1], digits=4)) ± $(round(npv[2], digits=4))")
println("F1-Score: $(round(f1[1], digits=4)) ± $(round(f1[2], digits=4))")
println("Confusion Matrix:")
println(conf)
println()

# Test 7: Artificial Neural Network
println("=" ^ 60)
println("Test 7: ANN with topology [4, 3]")
println("=" ^ 60)
ann_hyperparameters = Dict(
    "topology" => [4, 3],
    "learningRate" => 0.01,
    "validationRatio" => 0.2,
    "numExecutions" => 10,  # Reduced for faster testing
    "maxEpochs" => 500,
    "maxEpochsVal" => 10
)

results_ann = modelCrossValidation(
    :ANN,
    ann_hyperparameters,
    (inputs, targets),
    crossValidationIndices
)

acc, err, sens, spec, ppv, npv, f1, conf = results_ann
println("Accuracy: $(round(acc[1], digits=4)) ± $(round(acc[2], digits=4))")
println("Error Rate: $(round(err[1], digits=4)) ± $(round(err[2], digits=4))")
println("Sensitivity: $(round(sens[1], digits=4)) ± $(round(sens[2], digits=4))")
println("Specificity: $(round(spec[1], digits=4)) ± $(round(spec[2], digits=4))")
println("PPV: $(round(ppv[1], digits=4)) ± $(round(ppv[2], digits=4))")
println("NPV: $(round(npv[1], digits=4)) ± $(round(npv[2], digits=4))")
println("F1-Score: $(round(f1[1], digits=4)) ± $(round(f1[2], digits=4))")
println("Confusion Matrix:")
println(conf)
println()

println("=" ^ 75)
println("Model Comparison Summary")
println("=" ^ 75)
println("Model   Accuracy (Mean ± Std)  F1-Score (Mean ± Std)")
println("-" ^ 75)
println("SVM (Linear) $(round(results_svm_linear[1][1], digits=4)) ± $(round(results_svm_linear[1][2], digits=4))   $(round(results_svm_linear[7][1], digits=4)) ± $(round(results_svm_linear[7][2], digits=4))")
println("SVM (RBF)   $(round(results_svm_rbf[1][1], digits=4)) ± $(round(results_svm_rbf[1][2], digits=4))   $(round(results_svm_rbf[7][1], digits=4)) ± $(round(results_svm_rbf[7][2], digits=4))")
println("SVM (Sigmoid)  $(round(results_svm_sigmoid[1][1], digits=4)) ± $(round(results_svm_sigmoid[1][2], digits=4))   $(round(results_svm_sigmoid[7][1], digits=4)) ± $(round(results_svm_sigmoid[7][2], digits=4))")
println("SVM (Polynomial)  $(round(results_svm_poly[1][1], digits=4)) ± $(round(results_svm_poly[1][2], digits=4))   $(round(results_svm_poly[7][1], digits=4)) ± $(round(results_svm_poly[7][2], digits=4))")
println("Decision Tree  $(round(results_dt[1][1], digits=4)) ± $(round(results_dt[1][2], digits=4))   $(round(results_dt[7][1], digits=4)) ± $(round(results_dt[7][2], digits=4))")
println("k-NN (K=3)   $(round(results_knn[1][1], digits=4)) ± $(round(results_knn[1][2], digits=4))   $(round(results_knn[7][1], digits=4)) ± $(round(results_knn[7][2], digits=4))")
println("ANN [4,3] $(round(results_ann[1][1], digits=4)) ± $(round(results_ann[1][2], digits=4))   $(round(results_ann[7][1], digits=4)) ± $(round(results_ann[7][2], digits=4))")
println("=" ^ 75)


Dataset loaded: 150 samples, 4 features
Classes: Any["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
Cross-validation folds: 5

Test 1: SVM with Linear kernel
Accuracy: 0.96 ± 0.0389
Error Rate: 0.04 ± 0.0389
Sensitivity: 0.96 ± 0.0389
Specificity: 0.9811 ± 0.0202
PPV: 0.9634 ± 0.0372
NPV: 0.9824 ± 0.0167
F1-Score: 0.9596 ± 0.0393
Confusion Matrix:
[50 0 0; 0 48 2; 0 4 46]

Test 2: SVM with RBF kernel
Accuracy: 0.9133 ± 0.0581
Error Rate: 0.0867 ± 0.0581
Sensitivity: 0.9133 ± 0.0581
Specificity: 0.9664 ± 0.0225
PPV: 0.9326 ± 0.0452
NPV: 0.9502 ± 0.0428
F1-Score: 0.9143 ± 0.0567
Confusion Matrix:
[50 0 0; 0 46 4; 0 9 41]

Test 3: SVM with Sigmoid kernel
Accuracy: 0.7333 ± 0.1366
Error Rate: 0.2667 ± 0.1366
Sensitivity: 0.7333 ± 0.1366
Specificity: 0.9085 ± 0.0362
PPV: 0.7656 ± 0.2167
NPV: 0.8734 ± 0.069
F1-Score: 0.6999 ± 0.1787
Confusion Matrix:
[50 0 0; 0 31 19; 0 21 29]

Test 4: SVM with Polynomial kernel
Accuracy: 0.9267 ± 0.0533
Error Rate: 0.0733 ± 0.0533
Sensitivity: 0.9267 ±