
# Extending Aggregation and Transformation Functions

This tutorial serves as a guide for utilizing the extension tools for aggregation and transformer functions in Scikit-Criteria. After going through this tutorial, you will be able to implement your own multi-criteria decision models compatible with the data types and tools provided by the library.

## Introduction

In Scikit-Criteria, the decorators `@extend.mkagg` and `@extend.mktransformer` turn plain functions into aggregation and transformation models. This lets you write custom functions that implement domain-specific decision logic and use them with the rest of the library.

Decorators simplify the process of converting functions into model classes, promoting flexibility in model creation without complex class hierarchies. This facilitates quick prototyping and experimentation by allowing direct modification of functions. Additionally, the decorators handle hyperparameter initialization, encapsulating them within models, promoting clean, organized code and reducing the chances of errors related to parameter handling.


Example Usage:

In [1]:
# Import the decorators from the module
from skcriteria.extend import mkagg, mktransformer

# Define custom aggregation and transformation functions
@mkagg
def CustomAggregation(**kwargs):
    # Implement aggregation logic
    pass

@mktransformer
def CustomTransformation(**kwargs):
    # Implement transformation logic
    pass

Although this code is syntactically valid, the resulting models cannot be used yet, because the functions do not return the required values.


## 2. A New Aggregation Model

To create a custom aggregation model, follow these steps:

1. Declare a function with the name of your model using the [`CapWords`/`UpperCamelCase`/`PascalCase`](https://en.wikipedia.org/wiki/Camel_case) convention. This is not mandatory, but a name that does not follow the convention triggers a warning from Scikit-Criteria stating that the model name does not follow the library's standard.


In [2]:
@mkagg
def bad_model_name(**kwargs):
    pass



In [3]:
@mkagg
def GoodModelName(**kwargs):
    pass

2. The function should accept the parts of the decision matrix as returned by the `DecisionMatrix.to_dict()` method, plus a parameter `hparams` that contains the hyper-parameters of the model (explained later):


- `hparams`: Model Hyperparameters.
- `matrix`: Alternatives matrix as pandas DataFrame.
- `objectives`: Objectives for criteria as integers: `1` for maximize and `-1` for minimize.
- `weights`: Weights of the criteria.
- `dtypes`: Dtypes of the criteria.
- `alternatives`: Names of the alternatives.
- `criteria`: Names of the criteria.

Additionally, if you do not need some of those parts of the matrix, you can declare the function with [Variable Keyword Arguments (`**kwargs`)](https://www.w3schools.com/python/gloss_python_function_arbitrary_keyword_arguments.asp) to absorb the ones you leave out.


**If any parameter is forgotten and `**kwargs` is not present, a [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) is raised.**


So the next two functions are both valid aggregation functions:

In [4]:
@mkagg
def AllParameters(hparams, matrix, objectives, weights, dtypes, alternatives, criteria):
    pass

@mkagg
def OnlyTwoWithKwargs(matrix, weights, **kwargs):
    pass
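The `TypeError` mentioned above follows from Python's ordinary keyword-argument rules. The sketch below assumes, as the text suggests, that the parts of the decision matrix reach the function as keyword arguments. The names `parts`, `strict`, and `tolerant` are hypothetical and not part of Scikit-Criteria, and the values are placeholders.

```python
# Hypothetical stand-in for the parts of a decomposed decision matrix.
parts = {
    "hparams": None,
    "matrix": [[7, 5, 35], [5, 4, 26]],
    "objectives": [1, 1, -1],
    "weights": [2.0, 4.0, 1.0],
    "dtypes": [int, int, int],
    "alternatives": ["PE", "JN"],
    "criteria": ["ROE", "CAP", "RI"],
}


def strict(matrix, weights):
    # No **kwargs: Python rejects every keyword argument not listed here.
    return len(matrix), len(weights)


def tolerant(matrix, weights, **kwargs):
    # **kwargs absorbs the parts this function does not use.
    return len(matrix), len(weights), sorted(kwargs)


try:
    strict(**parts)
    strict_error = None
except TypeError as err:
    strict_error = err

tolerant_result = tolerant(**parts)

print(type(strict_error).__name__)
print(tolerant_result)
```

Only the second signature survives being called with all seven parts, which is why adding `**kwargs` is the simplest way to ignore the parts a model does not need.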

3. Utilizing the received parameters, the function should return two objects:

    1. A `numpy.array`, `list`, `tuple`, or any other sequence containing a valid ranking,
       where the `i`-th position in the returned sequence holds the ranking value for
       the `i`-th alternative in the array of alternatives received as a parameter.
    2. A `dict` with extra values from the ranking (intermediate results or other useful data for decision-making analysis).

<div class="alert alert-info">
**Note:** Understanding the Rankings

A valid ranking has the following conditions:

1. **Length:** It should have the same length as the number of alternatives received by the function.
2. **Consecutive Values Starting at 1:** The distinct rank values must start at 1 and increase in steps of 1 without gaps. They may appear in any position and may repeat when alternatives are tied. For example, `[1, 2, 3, 4]` and `[1, 2, 1]` are valid, but `[4, 2, 4, 1]` is not valid because the value 3 is missing.
3. **Integers Only:** Values must be integers. Fractional or other types of values are not allowed.

---

Suppose we have the alternatives `["banana", "apple", "orange"]` and the ranking `[1, 2, 1]`.

The meaning of the ranking in relation to the alternatives is as follows:

1. The first position in the ranking is 1, indicating that the alternative in the first position is the most preferred or the best choice.
2. The second position in the ranking is 2, suggesting that the alternative in the second position is the second-best choice.
3. The third position in the ranking is also 1, implying that the alternative in the third position is equally preferred to the alternative in the first position.

Therefore, the ranking `[1, 2, 1]` can be read as stating that "banana" and "orange" are equally preferred as the best choice, and "apple" is the second choice. Note that the ranking must meet the conditions listed above: the correct length, consecutive values starting at 1, and integer values.
</div>
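The three conditions in the note can be checked with a few lines of plain Python. This is a minimal sketch of the rules as stated above, not the validation Scikit-Criteria performs internally. `is_valid_ranking` and `by_rank` are hypothetical names introduced for illustration.

```python
import numbers


def is_valid_ranking(ranking, alternatives):
    """Check the three conditions from the note (illustrative helper)."""
    ranking = list(ranking)
    # 1. Same length as the alternatives.
    if len(ranking) != len(alternatives):
        return False
    # 3. Integers only (booleans are excluded on purpose).
    if not all(
        isinstance(r, numbers.Integral) and not isinstance(r, bool)
        for r in ranking
    ):
        return False
    # 2. The distinct values must be exactly 1, 2, ..., k without gaps.
    distinct = set(ranking)
    return distinct == set(range(1, len(distinct) + 1))


alternatives = ["banana", "apple", "orange"]
print(is_valid_ranking([1, 2, 1], alternatives))        # True
print(is_valid_ranking([1, 3, 1], alternatives))        # False: 2 is missing
print(is_valid_ranking([1.0, 2.0, 1.0], alternatives))  # False: not integers

# Group the alternatives by rank to read the ranking back.
by_rank = {}
for name, r in zip(alternatives, [1, 2, 1]):
    by_rank.setdefault(r, []).append(name)
print(by_rank)  # {1: ['banana', 'orange'], 2: ['apple']}
```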

With all of this, a complete and valid aggregation function would be:

In [5]:
import numpy as np

@mkagg
def AllAlternativesAreFirst(alternatives, **kwargs):
    # Assign a rank of 1 to each alternative
    rank = [1] * len(alternatives)
    
    # Define extra information (example: some important value)
    extra = {"some_important_value": "the_important_value"}
    
    # Return the rank and extra information
    return rank, extra

Let's test the new aggregation with a dataset.


In [6]:
import skcriteria as skc
dm = skc.datasets.load_simple_stock_selection() # load the dataset
dm

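|    | ROE[▲ 2.0] | CAP[▲ 4.0] | RI[▼ 1.0] |
|----|------------|------------|-----------|
| PE | 7 | 5 | 35 |
| JN | 5 | 4 | 26 |
| AA | 5 | 6 | 28 |
| FX | 3 | 4 | 36 |
| MM | 1 | 7 | 30 |
| GN | 5 | 8 | 30 |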


In [7]:
# Instantiate the new aggregation
agg = AllAlternativesAreFirst()
agg

<AllAlternativesAreFirst []>

In [8]:
# evaluate
rank = agg.evaluate(dm)
rank

| Alternatives | PE | JN | AA | FX | MM | GN |
|--------------|----|----|----|----|----|----|
| Rank | 1 | 1 | 1 | 1 | 1 | 1 |


In [9]:
rank.e_.some_important_value

'the_important_value'

## 3. Hyperparameters 

[Hyper-parameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)), a term borrowed from machine learning, are parameters that let you specify details of how the function carries out its aggregation. In this sense, they resemble [free parameters](https://en.wikipedia.org/wiki/Free_parameter), since they cannot be predicted or constrained by the model.

In Scikit-Criteria, hyper-parameters are defined much like in Scikit-Learn: they are parameters received by the constructor of the model (the aggregation function class), and they should **always** have a default value.

For example, in the case of Scikit-Criteria's implementation of [TOPSIS](https://en.wikipedia.org/wiki/TOPSIS), it has a hyper-parameter for the metric it will use, and by default, it is set to `"euclidean"`.


In [10]:
from skcriteria.agg import similarity
similarity.TOPSIS()

<TOPSIS [metric='euclidean']>

In [11]:
similarity.TOPSIS(metric="cityblock")

<TOPSIS [metric='cityblock']>

The hyper-parameters can be provided as named parameters to the `@mkagg` decorator, and their values can be accessed using the `hparams` parameter.



<div class="alert alert-info">
**Note:** Regarding the nature of <code>hparams</code>

If you are familiar with how methods work in Python classes, `hparams` is essentially the `self` of the model.

</div>
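To make the `self` analogy concrete, here is a toy decorator that turns a function into a class, stores the hyper-parameters on the instance, and passes that instance to the function as `hparams`. It is a hypothetical sketch for illustration only. `make_model`, `Model`, and `Scaled` are invented names, and this is not how `mkagg` is implemented.

```python
def make_model(**defaults):
    """Toy decorator: turn a function into a class with hyper-parameters.

    Illustration only; not Scikit-Criteria's implementation of mkagg.
    """

    def decorator(func):
        class Model:
            def __init__(self, **hparams):
                unknown = set(hparams) - set(defaults)
                if unknown:
                    raise TypeError(f"unknown hyper-parameters: {sorted(unknown)}")
                # Defaults first, then user overrides, stored on the instance.
                for name, value in {**defaults, **hparams}.items():
                    setattr(self, name, value)

            def evaluate(self, values):
                # The instance itself is handed to the function as `hparams`.
                return func(hparams=self, values=values)

        Model.__name__ = func.__name__
        Model.__qualname__ = func.__name__
        return Model

    return decorator


@make_model(scale=2)
def Scaled(hparams, values):
    # `hparams.scale` reads the value stored by the constructor.
    return [v * hparams.scale for v in values]


print(Scaled().evaluate([1, 2, 3]))          # [2, 4, 6]
print(Scaled(scale=10).evaluate([1, 2, 3]))  # [10, 20, 30]
```

The design point is that the function never sees the constructor: it only reads attributes from `hparams`, exactly as a method would read them from `self`.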

Now, for example, suppose we want to create a model named `MaybeWSM`: a [weighted-sum-model](https://en.wikipedia.org/wiki/Weighted_sum_model) that uses weights only when the `use_weights` hyperparameter is set to `True`, which is also its default value.

In [12]:
import numpy as np

from skcriteria.utils import rank

@mkagg(use_weights=True)
def MaybeWSM(hparams, matrix, objectives, weights, **kwargs):
    """The Maybe-Weighted Sum Model (WSM) to rank alternatives.

    If the use_weights parameter in hparams is set to True, the 
    function applies weights to the decision matrix. This is done 
    by taking the inner product of the matrix and the weights vector.
    
    """
    # Check if objectives contain -1 (minimize objectives)
    if -1 in objectives:
        raise ValueError("'MaybeWSM' can't operate with minimize objectives")

    # If use_weights is True, apply weights to the matrix
    if hparams.use_weights:
        matrix = matrix * weights

    # Calculate the scores by row/alternative
    score = np.sum(matrix, axis=1)

    # rank_values calculates the ranking based on the scores. 
    # `reverse = True` indicates that higher scores are closer to the 1st place.
    # Additionally, we will return the calculated 'score' as extra information.
    return rank.rank_values(score, reverse=True), {"score": score}


Let's use our MaybeWSM model.

First, let's see what happens if we create a `MaybeWSM` with the default (`use_weights=True`) and try to evaluate the available decision matrix (`dm`).

In [13]:
with_useweight = MaybeWSM()
with_useweight

<MaybeWSM [use_weights=True]>

If we use `dm` as it is right now, we will get an exception:  `'MaybeWSM' can't operate with minimize objectives` because, indeed, `dm` has some criteria to minimize.

In [14]:
dm.minwhere  # the criteria to minimize

ROE    False
CAP    False
RI      True
Name: minwhere, dtype: bool

For this reason, we will first use the `InvertMinimize` transformer to convert the minimize criteria into maximize criteria.


In [15]:
from skcriteria.preprocessing import invert_objectives

dm = invert_objectives.InvertMinimize().transform(dm)
dm.minwhere

ROE    False
CAP    False
RI     False
Name: minwhere, dtype: bool

In [16]:
rank_with_uw = with_useweight.evaluate(dm)
rank_with_uw

| Alternatives | PE | JN | AA | FX | MM | GN |
|--------------|----|----|----|----|----|----|
| Rank | 3 | 5 | 2 | 6 | 4 | 1 |


Now, let's try `use_weights=False`.

In [17]:
without_useweight = MaybeWSM(use_weights=False)
without_useweight

<MaybeWSM [use_weights=False]>

In [18]:
rank_without_uw = without_useweight.evaluate(dm)
rank_without_uw

| Alternatives | PE | JN | AA | FX | MM | GN |
|--------------|----|----|----|----|----|----|
| Rank | 2 | 4 | 3 | 6 | 5 | 1 |


It can be seen that depending on the configuration of the hyperparameter `use_weights`, the results are different.

In addition to this, the `score` is available within `extra_` (accessed below through its shorter alias `e_`).

In [19]:
rank_with_uw.e_.score, rank_without_uw.e_.score

(array([34.02857143, 26.03846154, 34.03571429, 22.02777778, 30.03333333,
        42.03333333]),
 array([12.02857143,  9.03846154, 11.03571429,  7.02777778,  8.03333333,
        13.03333333]))
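These scores can be reproduced by hand. The sketch below uses only the values from the decision matrix shown earlier and assumes that the minimize criterion `RI` was replaced by its reciprocal, which is consistent with the printed scores (for example, 7 + 5 + 1/35 ≈ 12.0286 for `PE`). The helpers `scores` and `rank_desc` are hypothetical. `rank_desc` is a simple dense ranking standing in for `rank_values`, which behaves the same here because there are no ties.

```python
# Values copied from the decision matrix shown earlier in this tutorial.
alternatives = ["PE", "JN", "AA", "FX", "MM", "GN"]
roe = [7, 5, 5, 3, 1, 5]
cap = [5, 4, 6, 4, 7, 8]
ri = [35, 26, 28, 36, 30, 30]
weights = [2.0, 4.0, 1.0]  # ROE, CAP, RI

# Assumption: the minimize criterion RI was replaced by its reciprocal.
rows = [(r, c, 1 / i) for r, c, i in zip(roe, cap, ri)]


def scores(rows, weights=None):
    # Weighted sum of each row; a plain sum when no weights are given.
    if weights is None:
        weights = [1.0] * len(rows[0])
    return [sum(v * w for v, w in zip(row, weights)) for row in rows]


def rank_desc(values):
    # Rank 1 goes to the highest score; equal scores share a rank.
    ordered = sorted(set(values), reverse=True)
    position = {v: i + 1 for i, v in enumerate(ordered)}
    return [position[v] for v in values]


with_w = scores(rows, weights)
without_w = scores(rows)

print([round(s, 4) for s in with_w])
print(rank_desc(with_w))     # [3, 5, 2, 6, 4, 1]
print([round(s, 4) for s in without_w])
print(rank_desc(without_w))  # [2, 4, 3, 6, 5, 1]
```

The weights matter because `CAP` carries a weight of 4: with weights, `AA` (CAP = 6) overtakes `PE` (CAP = 5), while without them the higher `ROE` of `PE` wins.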

## 4. A New Transformer