# PIRD ELECTRE Tri - PreProcess

This file develops the methods used to preprocess the input data:

- The performance of alternative $a_i$ with respect to criterion $j$ (noted $u_j(a_i)$). In the performance matrix, each column corresponds to an alternative and each row to a criterion.

- The reference profiles $b_k$.

### Python environment

The code relies on the pandas and NumPy libraries.

This code uses the methods developed in PreProcess.py and Process.py.

In [1]:
import pandas as pd
import numpy as np
from numpy import random, vstack, empty

### Monte Carlo Function (Gauthier and Viala, 2023)

The **Monte Carlo method** is used to obtain data sets from distributions and use those data sets in the ELECTRE Tri procedure. 

Monte Carlo simulation estimates the behaviour of complex systems through random sampling and statistical modelling. Here it consists of three steps:
1. Draw a value from each probability density function to obtain a performance matrix
2. Run ELECTRE Tri with the performance matrix
3. Repeat the procedure a sufficient number of times
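
The three steps above can be sketched as a simple loop. This is only an illustration under stated assumptions: the arrays `means` and `stds`, the number of runs `n_runs` and the function `run_sorting` are hypothetical, and `run_sorting` is a trivial placeholder that stands in for the ELECTRE Tri procedure, not an implementation of it.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only to make the sketch reproducible

# Hypothetical inputs: mean performances (criteria x alternatives) and a
# standard deviation of the same shape.
means = np.array([[10.0, 12.0],
                  [0.5, 0.4]])
stds = np.array([[1.0, 1.2],
                 [0.05, 0.04]])


def run_sorting(performance):
    """Placeholder for the ELECTRE Tri procedure (NOT the real method).

    Returns the index of the best alternative on the first criterion,
    only so that the loop below has a result to aggregate.
    """
    return int(np.argmax(performance[0]))


n_runs = 1000  # assumed number of repetitions
counts = np.zeros(means.shape[1], dtype=int)
for _ in range(n_runs):                      # step 3: repeat the procedure
    performance = rng.normal(means, stds)    # step 1: draw a performance matrix
    counts[run_sorting(performance)] += 1    # step 2: run the (placeholder) procedure

frequencies = counts / n_runs  # share of runs in which each alternative was selected
```

Aggregating the results over all runs, here as simple frequencies, is what turns the individual random draws into a statistical picture of the outcome.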

The first step requires probability distributions as inputs. In this study, all values are modelled as normal distributions, each described by two parameters:
- the mean value: `m`, given per scenario $S$ and per criterion $g$
- the variance: `variance`, given per criterion $g$

To obtain the spread specific to each alternative, its mean value is multiplied by the variance of the criterion. The code passes this product to `np.random.normal` as the `scale` argument, which NumPy interprets as a standard deviation.

These values are in the `data` DataFrame given as input to the code.
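
As a small worked example of the multiplication described above, the sketch below scales each mean value by the variance of its criterion using NumPy broadcasting. The table is hypothetical: the scenario names `S1` to `S3`, the criteria `g1` and `g2` and all numbers are invented for illustration; only the column name `VAR` follows the input format described in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 2 criteria (rows), 3 scenarios (columns) and a "VAR" column.
data = pd.DataFrame(
    {"S1": [100.0, 2.0], "S2": [120.0, 3.0], "S3": [80.0, 1.0], "VAR": [0.1, 0.5]},
    index=["g1", "g2"],
)

m = data[["S1", "S2", "S3"]].to_numpy()   # mean values, shape (2, 3)
variance = data["VAR"].to_numpy()         # one value per criterion, shape (2,)

# variance[:, np.newaxis] has shape (2, 1), so every row of m is scaled
# by the value of its own criterion.
v = np.abs(m * variance[:, np.newaxis])   # shape (2, 3)
```

For criterion `g1` the spreads are about 10, 12 and 8 (10 % of each mean), and for `g2` about 1, 1.5 and 0.5 (50 % of each mean). The absolute value keeps the spread non-negative when a mean is negative.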

The following function:
1. Creates the normal distributions from the input data in `data`
2. Draws a random value from each of them
3. Returns a DataFrame called `ndata` containing the drawn values

*The DataFrame returned will also contain all the parameters initially present in the `data` DataFrame.*

In [2]:
def MC(data):
    """
    Build a new performance matrix by sampling normal distributions
    defined by m (mean value) and v (spread derived from the variance per criterion)

    PARAMETERS
    ----------
    data : Data Frame
        Table with input data and parameters

    RETURNS
    -------
    ndata : Data Frame
        Copy of `data` in which the performance values are replaced by
        random values drawn from the distributions
    """
    ndata = data.copy()
    variance = ndata['VAR'].values  # general variance located in the column "VAR"
    m = ndata.iloc[:, 0:28].values  # mean values for each scenario: columns 0 to 27
    v = np.abs(m * variance[:, np.newaxis])  # spread of each entry: |mean * variance of its criterion|
    perf = np.random.normal(m, v)  # one draw per entry; v is used as `scale` (standard deviation)
    ndata.iloc[:, 0:28] = perf
    return ndata

### Reference profiles as interval

Figure 1 shows how reference profiles delimit the categories. Traditionally, each reference profile is defined by a crisp value on every criterion. This study instead defines reference profiles as intervals, which introduces fuzziness into the membership of alternatives to categories.

<center>
<figure>
  <img src="Figures/ref_profiles.png" width="50%" height="50%">
  <figcaption><i> Figure 1: Reference profiles and categories </i></figcaption>
</figure>
</center>

The effect of this change is visible in Figure 2: because the reference profiles are no longer crisp values, two adjacent categories overlap. Figure 3 shows the resulting membership functions for each possible category assignment, covering the range from strict categorization to overlapping categories.

<center>
<figure>
  <img src="Figures/ref_profiles_bis.png" width="70%" height="70%">
  <figcaption><i> Figure 2: Definition of the reference profiles </i></figcaption>
</figure>
</center>

<center>
<figure>
  <img src="Figures/degree_membership.png" width="35%" height="35%">
  <figcaption><i> Figure 3: Degree of membership of an alternative to categories </i></figcaption>
</figure>
</center>

A consequence of this definition of reference profiles is a larger number of combinations between alternatives $a_i$ and values of the reference profiles $b_k$. More data (i.e. more relationship "values") are obtained and used for the sorting process and, more importantly, for the analysis of the results.

To define the values $b_{k,min}$ and $b_{k,max}$, a percentage $P$ is used. $P$ needs to be defined by the decision maker together with the technical team, and it sets the limits of each reference profile interval:

$b_{k,min}=b_k-P \cdot b_k$

$b_{k,max}=b_k+P \cdot b_k$

The value of $P$ is set to 0.018, found by trial and error, so that the following inequality holds:

$b_{k-1,max}<b_{k,min}$
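
The interval bounds and the non-overlap condition can be checked numerically. The sketch below is a minimal illustration: the crisp profiles in `b` and `b_close` are hypothetical values for a single criterion, assumed positive and increasing, and only $P = 0.018$ comes from the text.

```python
import numpy as np

P = 0.018  # percentage quoted in the text

# Hypothetical crisp reference profiles b_0..b_5 for one criterion.
b = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

b_min = b - P * b
b_max = b + P * b

# Condition b_{k-1,max} < b_{k,min}, checked for k = 1..5.
no_overlap = b_max[:-1] < b_min[1:]

# Hypothetical profiles that are too close to each other for this P.
b_close = np.array([100.0, 102.0])
close_ok = (b_close[:-1] * (1 + P)) < (b_close[1:] * (1 - P))
```

With these values, `no_overlap` is true for every pair, whereas `close_ok` is false: the upper bound of 100 (about 101.8) exceeds the lower bound of 102 (about 100.16). For positive profiles the condition amounts to $b_k / b_{k-1} > (1+P)/(1-P)$, which is why $P$ has to be tuned to the spacing of the profiles.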

*The DataFrame returned contains, for each criterion $g$, the values $b_{k,min}$ and $b_{k,max}$, which can be considered as new reference profiles in their own right: they are treated as such in the ELECTRE Tri calculation, whether they are crisp reference profiles or boundaries of an interval.*

In [3]:
# Reference profiles matrix
def refIntervals(data):
    """
    Build a reference matrix from the reference profiles of the input data

    PARAMETERS
    ----------
    data : Data Frame
        Table with input data and parameters

    RETURNS
    -------
    nref : Data Frame
        Reference matrix with the interval bounds b0_max, b1_min, ..., b5_min
        for each criterion
    """
    nref_intermediate = pd.DataFrame(index=['g1.1', 'g1.2', 'g1.3', 'g1.4', 'g1.5',
                                'g2.1', 'g2.2', 'g2.3', 'g2.4',
                                'g3.1', 'g3.2', 'g3.3', 'g3.4',
                                'g4.1', 'g4.2', 'g4.3'],
                         columns=['b0_min', 'b0_max',
                                  'b1_min', 'b1_max',
                                  'b2_min', 'b2_max',
                                  'b3_min', 'b3_max',
                                  'b4_min', 'b4_max',
                                  'b5_min', 'b5_max'])
    
    P = data['P'].values  # general interval located in the column "P"
    bk = data.iloc[:, 30:36].values  # for each reference profile : columns 30 to 35
    bk_min = bk - bk*P[:, np.newaxis]
    bk_max = bk + bk*P[:, np.newaxis]

    # new reference matrix of bk_min and bk_max (interleaved column by column)
    for k in range(bk_min.shape[1]):
        nref_intermediate.iloc[:, 2*k] = bk_min[:, k]
        nref_intermediate.iloc[:, 2*k+1] = bk_max[:, k]
    # g2.2 is a YES/NO criterion, so the interval cannot be applied
    # TODO: do this more cleanly using the original reference profiles
    nref_intermediate.iloc[6, 6:12] = 1
    columns_to_delete = ['b0_min', 'b5_max']
    nref = nref_intermediate.drop(columns=columns_to_delete)
    nref.to_csv('new_ref_matrix.csv')
    return nref
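
To make the layout of the returned matrix easier to picture, the sketch below reproduces the same steps on a reduced, hypothetical case: two criteria (`g1`, `g2`), three reference profiles and an assumed percentage of 0.1 per criterion. It interleaves the bounds with array slicing instead of a loop, which is an alternative formulation and not the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical crisp profiles: 2 criteria (rows) x 3 profiles b0..b2 (columns).
bk = np.array([[10.0, 20.0, 30.0],
               [1.0, 2.0, 3.0]])
P = np.array([0.1, 0.1])  # one percentage per criterion (assumed values)

bk_min = bk - bk * P[:, np.newaxis]
bk_max = bk + bk * P[:, np.newaxis]

# Interleave the bounds column-wise: b0_min, b0_max, b1_min, b1_max, ...
columns = [f"b{k}_{side}" for k in range(bk.shape[1]) for side in ("min", "max")]
interleaved = np.empty((bk.shape[0], 2 * bk.shape[1]))
interleaved[:, 0::2] = bk_min   # even positions hold the lower bounds
interleaved[:, 1::2] = bk_max   # odd positions hold the upper bounds
ref = pd.DataFrame(interleaved, index=["g1", "g2"], columns=columns)

# Drop the outermost bounds, as refIntervals does with b0_min and b5_max.
ref = ref.drop(columns=[columns[0], columns[-1]])
```

The resulting table has the columns `b0_max`, `b1_min`, `b1_max`, `b2_min`, and the row for `g1` contains approximately 11, 18, 22 and 27.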