# Creating Model Wrappers for Integration with *Scikit-Learn*

In this notebook, we will demonstrate how to create a wrapper for a `Flux` model in Julia and use it in a `Pipeline` along with `GridSearchCV` to find the best hyperparameters. This procedure is identical for integrating any type of model. There are just two considerations to keep in mind.

The first is that the model must implement the *ScikitLearnBase* interface, or at least the subset of its functions that the intended use requires, as we will see later.

The second consideration is that, in Julia, Scikit-Learn is itself a wrapper around the Python package. Once control passes to the Python code, it cannot call back into a model defined in Julia, so every step must hand its results back to Julia before the next one proceeds. To give a clear example, if we tried to use the Python implementation of `GridSearchCV`, the Flux model could not be integrated, because Python has no access to the Julia model or its intermediate results. The `GridSearchCV` implemented in Julia, as we will see later, collects the results from the Python components itself and therefore works correctly.

## 1. Installing Packages

The first step is to make sure that `Flux`, a machine learning library for Julia, is installed along with the other packages used in this notebook.

``` julia
using Pkg
Pkg.add(["Flux", "ScikitLearn", "ScikitLearnBase", "Statistics"])
```

In [None]:
import ScikitLearnBase: BaseClassifier, fit!, predict, score, @declare_hyperparameters, is_classifier # Explained later
using Flux
using Flux.Losses
using Statistics

## 2. Creating a Container Structure

To create the *wrapper*, the first step in Python would be to create a class that inherits from or adheres to a certain *interface*, meaning a set of method signatures that lets the Scikit-Learn library call every model in the same, recognizable way.

In Julia, unlike Python or other object-oriented languages, there are no classes as such. However, this is not a problem, since a class can be simulated with a mutable structure. Such a structure holds values that can be changed after construction, much like the attributes of a class instance, and it also serves as the type on which the methods are dispatched.
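The idea can be sketched with a toy type before moving on to the real wrapper. The names `Counter`, `increment!`, and `describe` are hypothetical and only illustrate how a mutable struct plus dispatch stands in for a class with methods:

```julia
# Hypothetical example type: a mutable struct holding state, like a class instance
mutable struct Counter
    count::Int
end

# "Methods" are ordinary functions that dispatch on the struct type
increment!(c::Counter) = (c.count += 1; c)
describe(c::Counter) = "Counter at $(c.count)"
describe(x::Any) = "not a counter"

c = Counter(0)
increment!(c)
increment!(c)
println(describe(c))    # Counter at 2
println(describe(3.5))  # not a counter
```

The same mechanism is what lets `fit!`, `predict`, and `score` be specialized for a wrapper type later on.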

The first step is importing the interface, in this case, `ScikitLearnBase`, which will provide all the method signatures needed. Below is an example with classification networks.

In [None]:
mutable struct ClassANN <: BaseClassifier
    # Hyperparameters of the model (not learnable from data)
    topology::AbstractVector{Int}
    transferFunctions::AbstractVector{Function}
    maxEpochs::Int
    minLoss::Real
    learningRate::Real

    # Learnable Parameters (model in Flux and optimizer)
    model::Chain
    opt::ADAM

    # Constructor which takes the hyperparameters as arguments
    ClassANN(; topology=[1], transferFunctions=fill(σ, 1), maxEpochs=1000, minLoss=0.0, learningRate=0.01) =
        new(topology, transferFunctions, maxEpochs, minLoss, learningRate, Chain(), ADAM(learningRate))
end


In this code, the first group of fields holds the hyperparameters that the constructor takes, which the user is responsible for defining. Next come the utility fields, which are built from the data during training and are not exposed externally. Finally, the constructor of this "pseudo-class" is defined, and it supplies a default value for every hyperparameter.

To complete the definition, two additional things need to be defined. The first is a method that declares the type of problem the model can solve, in this case, classification. For this, the following call is necessary:

In [None]:
# Declare ClassANN as a classifier
is_classifier(::ClassANN) = true

The second necessary element is to declare the parameters that the user needs to provide to construct an instance of this type:

In [None]:
@declare_hyperparameters(ClassANN, [:topology, :transferFunctions, :maxEpochs, :minLoss, :learningRate])
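
Tools such as grid search need to read and overwrite hyperparameters by name, which is why they have to be declared. The following is a minimal plain-Julia sketch of that idea. The names `ToyModel`, `TOY_HYPERPARAMETERS`, `getToyParams`, and `setToyParams!` are hypothetical and are not part of ScikitLearnBase:

```julia
# Hypothetical toy estimator with two hyperparameters
mutable struct ToyModel
    maxEpochs::Int
    learningRate::Float64
end

# The list of fields that may be read or changed from outside
const TOY_HYPERPARAMETERS = [:maxEpochs, :learningRate]

# Read the declared hyperparameters into a Dict
getToyParams(m::ToyModel) = Dict(name => getfield(m, name) for name in TOY_HYPERPARAMETERS)

# Overwrite hyperparameters by name, rejecting undeclared ones
function setToyParams!(m::ToyModel; params...)
    for (name, value) in params
        name in TOY_HYPERPARAMETERS || error("unknown hyperparameter $name")
        setproperty!(m, name, value)
    end
    return m
end

m = ToyModel(1000, 0.01)
setToyParams!(m; learningRate=0.1)
println(getToyParams(m))
```

A grid search can then try each candidate value by setting it by name and refitting the model.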

## 3. Auxiliary Functions

Next, we define all the auxiliary functions the model needs in order to run. They can be written directly inside `fit!`, `predict`, and `score`. Another option is to reuse functions implemented earlier, such as those from the Machine Learning Fundamentals course. For this, you can use the following call:

```julia
include("MypackageFunctions.jl")
```

However, to keep this notebook self-contained, the functions needed for this example are given below.

In [None]:
# Function to perform encoding, which receives the feature vector (one per pattern) and the classes
function oneHotEncoding(feature::AbstractArray{<:Any,1}, classes::AbstractArray{<:Any,1})
    # First, ensure that all elements in the feature vector are present in the classes vector
    @assert(all([in(value, classes) for value in feature]))
    numClasses = length(classes)
    @assert(numClasses > 1)
    if (numClasses == 2)
        # If there are only two classes, return a matrix with one column
        oneHot = reshape(feature .== classes[1], :, 1)
    else
        # If there are more than two classes, return a matrix with one column per class
        # Comparing the column of features against the row of classes yields a Boolean matrix
        # The result is a BitArray{2}; an Array{Bool,2} would work equally well
        # permutedims, unlike the adjoint ', also works for non-numeric classes such as strings
        oneHot = permutedims(classes) .== feature
    end
    return oneHot
end

# This function is similar to the previous one, but if classes are not specified, they are taken from the feature variable itself
oneHotEncoding(feature::AbstractArray{<:Any,1}) = oneHotEncoding(feature, unique(feature))

# We overload the oneHotEncoding function in case a vector of boolean values is passed
# In this case, the vector is already encoded, so we simply convert it into a column matrix
oneHotEncoding(feature::AbstractArray{Bool,1}) = reshape(feature, :, 1)
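
The broadcasting comparison at the heart of the encoding is easier to see on a small, made-up label vector. This sketch does not call the functions above. It only reproduces the two cases (more than two classes, and exactly two) with illustrative data:

```julia
# Multiclass: compare a column of labels against a row of classes
feature = ["setosa", "virginica", "versicolor", "setosa"]
classes = unique(feature)                 # classes in order of first appearance
oneHot = permutedims(classes) .== feature
println(size(oneHot))                     # (4, 3): one row per pattern, one column per class
println(findfirst(oneHot[2, :]))          # 2: "virginica" is the second class

# Binary: a single Boolean column is enough
labels = ["yes", "no", "yes"]
column = reshape(labels .== "yes", :, 1)
println(size(column))                     # (3, 1)
```

Each row of `oneHot` contains exactly one `true`, in the column of the class that the pattern belongs to.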

In [None]:
# Function for Constructing the Artificial Neural Network
function buildClassANN(numInputs::Int, topology::AbstractVector{Int}, numOutputs::Int;
            transferFunctions::AbstractVector{<:Function}=fill(σ, length(topology)))
    layers = []
    numInputsLayer = numInputs

    for (i, numNeurons) in enumerate(topology)
        push!(layers, Dense(numInputsLayer, numNeurons, transferFunctions[i]))
        numInputsLayer = numNeurons
    end

    if numOutputs == 1
        push!(layers, Dense(numInputsLayer, 1, σ))
    else
        push!(layers, Dense(numInputsLayer, numOutputs, identity))
        push!(layers, softmax)
    end

    return Chain(layers...)
end
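
To see how the loop chains the layers together, the sketch below lists the input and output size of each `Dense` layer without building a network, so it does not need Flux. The helper `layerSizes` is hypothetical and exists only for this illustration:

```julia
# Hypothetical helper: list the (inputs => outputs) size of every Dense layer
function layerSizes(numInputs::Int, topology::AbstractVector{Int}, numOutputs::Int)
    sizes = Pair{Int,Int}[]
    numInputsLayer = numInputs
    for numNeurons in topology
        push!(sizes, numInputsLayer => numNeurons)
        numInputsLayer = numNeurons      # the next layer takes this layer's outputs
    end
    push!(sizes, numInputsLayer => numOutputs)   # output layer
    return sizes
end

# Four inputs, hidden layers of 3 and 4 neurons, three output classes
for (i, s) in enumerate(layerSizes(4, [3, 4], 3))
    println("Layer $i: $(first(s)) inputs, $(last(s)) outputs")
end
```

With four inputs, the topology `[3, 4]`, and three classes, this gives three layers of sizes 4 to 3, 3 to 4, and 4 to 3.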

We could also reuse the `trainANNClass` function here, but for purely educational purposes a simple training loop is written directly inside `fit!` to serve as an example.

## 4. Implementing the *Scikit-Learn* Functions

As previously mentioned, it is necessary to implement the methods required by ScikitLearnBase. For classification, these include `fit!`, `predict`, and `score`.

First, the `fit!` function fits the model, using Flux for the training. If the training loop already exists as a separate function, such as `trainANNClass`, `fit!` can simply call it.


In [None]:
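# Implementing the model fitting (fit!)
function fit!(model::ClassANN, X, y)
    numInputs = size(X, 2)
    # Sorted class labels, so that output neuron k always corresponds to classes[k]
    classes = sort(unique(y))
    # Two classes need a single sigmoid output; more classes need one output per class
    numOutputs = length(classes) > 2 ? length(classes) : 1

    # Build the model using Flux
    model.model = buildClassANN(numInputs, model.topology, numOutputs, transferFunctions=model.transferFunctions)
    model.opt = ADAM(model.learningRate)

    # Encode the labels: one Boolean column if binary, one column per class otherwise
    targets = oneHotEncoding(y, classes)

    # Define the loss function so that it matches the output layer
    loss(x, t) = numOutputs == 1 ? Flux.Losses.binarycrossentropy(model.model(x), t) :
                                   Flux.crossentropy(model.model(x), t)

    # Train the model (Flux expects one pattern per column, hence the transposes)
    # Alternatively, delegate to a separate training function:
    #(model, results) = trainANNClass(model.topology, (X', targets'), model.maxEpochs, model.minLoss, model.learningRate)
    for epoch in 1:model.maxEpochs
        Flux.train!(loss, Flux.params(model.model), [(X', targets')], model.opt)
        current_loss = loss(X', targets')
        println("Epoch: $epoch, Loss: $current_loss")
        # Stop early once the loss is low enough
        if current_loss <= model.minLoss
            break
        end
    end
    return model
end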
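
The quantity minimized in the multiclass case can be written out by hand. The following is a plain-Julia sketch of the cross-entropy formula on made-up numbers, not Flux's own implementation:

```julia
using Statistics

# Columns are patterns, rows are classes (the layout used when calling the network)
outputs = [0.7 0.2;
           0.2 0.5;
           0.1 0.3]
targets = Bool[1 0;
               0 1;
               0 0]

# Cross-entropy: minus the log-probability given to the true class, averaged over patterns
perPattern = [-sum(targets[:, j] .* log.(outputs[:, j])) for j in axes(outputs, 2)]
crossEntropy = mean(perPattern)
println(crossEntropy ≈ -(log(0.7) + log(0.5)) / 2)   # true
```

Only the probability assigned to the correct class contributes, so the loss shrinks as that probability approaches 1.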

`predict` must also be implemented so that the model can be used in cross-validation and in the evaluation of a pipeline.

In [None]:
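# Implementing the model's prediction (predict)
# Note: the result contains class indices (1:K) or 0/1 values, not the original labels
function predict(model::ClassANN, X)
    outputs = model.model(X')    # one pattern per column, one row per output neuron
    if size(outputs, 1) > 1
        # Multiclass: index of the most active output for each pattern
        return Flux.onecold(outputs, 1:size(outputs, 1))
    else
        # Binary: threshold the single output at 0.5 and return a vector
        return vec(round.(outputs))
    end
end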
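
The multiclass branch amounts to picking the largest output in each column. This plain-Julia sketch, on made-up outputs and hypothetical class names, shows the same decoding idea without Flux:

```julia
# Each column holds the network outputs for one pattern
outputs = [0.1 0.8 0.3;
           0.7 0.1 0.3;
           0.2 0.1 0.4]

# Index of the largest output in each column
predicted = [argmax(outputs[:, j]) for j in axes(outputs, 2)]
println(predicted)            # [2, 1, 3]

# Mapping the indices back to (hypothetical) class labels
classes = ["setosa", "versicolor", "virginica"]
println(classes[predicted])
```

Indexing a vector of class labels with the predicted indices is one way to recover the original labels when they are not simply 1, 2, 3.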

The `score` function is also required, because we are going to use this *wrapper* with `GridSearchCV`.

In [None]:
# Additional function to compute the model's score (accuracy)
function score(model::ClassANN, X, y)
    predictions = predict(model, X)
    return mean(predictions .== y)
end
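
The score is simply the fraction of matching entries, so predictions and labels must use the same coding. A small sketch with made-up values shows both the normal case and what happens when the labels start at 0 while the predictions start at 1:

```julia
using Statistics

predictions = [1, 2, 3, 3, 1]
labels      = [1, 2, 2, 3, 1]

# Fraction of patterns whose prediction equals the label
accuracy = mean(predictions .== labels)
println(accuracy)                          # 0.8

# The same labels coded from 0 no longer match any prediction
shiftedLabels = labels .- 1
println(mean(predictions .== shiftedLabels))   # 0.0
```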

### 4.1 Transformers and Regression

When the wrapper is for a different kind of method, such as PCA or some other unsupervised learning technique, the functions to implement are `fit!`, `transform`, and `fit_transform!`.
A regression model, in turn, needs `fit!`, `predict`, and `score`, while classifiers that report class probabilities usually also implement `predict_proba`. See the API in [Scikit Learn Model API](https://scikitlearnjl.readthedocs.io/en/latest/api/).
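
As a rough illustration of the transformer case, the sketch below defines the three functions for a toy transformer that centers each column. `CenteringTransformer` is hypothetical, and the functions are defined locally to keep the example self-contained. In a real wrapper they would extend the ones imported from ScikitLearnBase:

```julia
using Statistics

# Hypothetical transformer that subtracts the column means learned during fitting
mutable struct CenteringTransformer
    means::Matrix{Float64}
    CenteringTransformer() = new(zeros(1, 0))
end

# Learn the column means from the training data
function fit!(t::CenteringTransformer, X, y=nothing)
    t.means = mean(X, dims=1)
    return t
end

# Apply the learned means to any data with the same columns
transform(t::CenteringTransformer, X) = X .- t.means

# Fit and transform in one step
fit_transform!(t::CenteringTransformer, X, y=nothing) = transform(fit!(t, X, y), X)

X = [1.0 10.0; 3.0 30.0]
Z = fit_transform!(CenteringTransformer(), X)
println(Z)    # [-1.0 -10.0; 1.0 10.0]
```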

## 5. Interaction with the Library

The previous code can be saved in a file and loaded using an `include` statement or through a module that can be used later. In that case, remember to export the necessary functions from the module.
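
A module for this purpose could be laid out roughly as follows. All names here (`ToyWrappers`, `ToyClassifier`, `fitToy!`, `predictToy`) are hypothetical and only show how `export` makes definitions available after `using`:

```julia
# Hypothetical module layout; the names are illustrative only
module ToyWrappers

export ToyClassifier, fitToy!, predictToy

mutable struct ToyClassifier
    majority::Int
    ToyClassifier() = new(0)
end

# "Training" just remembers the most frequent label
function fitToy!(m::ToyClassifier, X, y)
    labels = unique(y)
    m.majority = labels[argmax([count(==(l), y) for l in labels])]
    return m
end

# Predict the remembered label for every row of X
predictToy(m::ToyClassifier, X) = fill(m.majority, size(X, 1))

end # module

using .ToyWrappers   # exported names are now available without a prefix

model = fitToy!(ToyClassifier(), zeros(3, 2), [1, 2, 2])
println(predictToy(model, zeros(2, 2)))   # [2, 2]
```

When the module is stored in its own file, an `include` of that file followed by `using .ToyWrappers` has the same effect.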

In any case, the following code demonstrates how to call these functions and how to integrate them within a `Pipeline`. You can modify the grid to search for the best combination of hyperparameters.

In [None]:
# Test.jl

using ScikitLearn
using ScikitLearn.Pipelines: Pipeline, named_steps
using ScikitLearn.GridSearch: GridSearchCV         # IMPORTANT: use the Julia implementation, not the Python one
#@sk_import model_selection: GridSearchCV          # This is the Python implementation; it would raise an error because it cannot find the wrapper

@sk_import decomposition: PCA
@sk_import datasets: load_iris

# Load data
iris = load_iris()
X = iris["data"]
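# load_iris codes the classes as 0, 1, 2; shift them to 1, 2, 3 to match the class indices returned by predict
y = iris["target"] .+ 1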

# Define the grid (hyperparameters to test)
# Keys are Symbols of the form <step name>__<hyperparameter>
param_grid = Dict(
    :ann__maxEpochs => [500, 1000],
    :ann__learningRate => [0.01, 0.1]
)

# Define the base parameters
topology = [3, 4]
functions = [σ, σ]
maxEpochs = 1000
minLoss = 0.0
learningRate = 0.01

# Create a pipeline that includes the wrapped model
ann = ClassANN(topology=topology, transferFunctions=functions, maxEpochs=maxEpochs, minLoss=minLoss, learningRate=learningRate)

estimators = [("pca",PCA()),("ann",ann)]
pipe = Pipeline(estimators)

# Set up the GridSearchCV
grid_search = GridSearchCV(pipe, param_grid)

# Train the model using GridSearchCV
fit!(grid_search, X, y)

# Get the best model and its hyperparameters
println("Best model: ", grid_search.best_estimator_)
println("Best hyperparameters: ", grid_search.best_params_)
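
The grid above contains two values for each of two hyperparameters, so the search evaluates four candidate configurations. A plain-Julia sketch of that expansion, independent of the library and with illustrative variable names:

```julia
# The same candidate values as in the grid above
maxEpochsValues = [500, 1000]
learningRateValues = [0.01, 0.1]

# Every combination of the two hyperparameters
combinations = vec([(epochs, rate) for epochs in maxEpochsValues, rate in learningRateValues])
println(length(combinations))   # 4

for (epochs, rate) in combinations
    println("maxEpochs = $epochs, learningRate = $rate")
end
```

Each additional hyperparameter multiplies the number of combinations, and every combination is trained once per cross-validation fold, so the grid is best kept small.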