# Learning the parameters of an MR-Sort model with an adapted version of an existing metaheuristic algorithm (oso-pymcda) when preference directions are unknown

## Introduction

This document briefly describes our version of a metaheuristic algorithm that learns the parameters of an MR-Sort model without knowing in advance the preference directions of some criteria.

The original metaheuristic algorithm comes from GitHub (https://github.com/oso/pymcda) and was developed by Olivier Sobrie. It is the starting point of our approach.

More specifically, our proposal, which can be seen as an extension of the existing metaheuristic, integrates more parameters, in particular the preference direction of each criterion. We also compute the restoration rate of the preference directions and generate statistical plots of the learning results.

The code, originally written in Python 2, has been partially upgraded to Python 3. It is located in the "oso-pymcda/" directory, which sits in the same directory as this notebook.

## Some helpful articles on MRSort models

To understand the metaheuristic used in this notebook and get an overview of the MCDA methodology, here are some useful related articles :
   * [Learning monotone preferences using a majority rule sorting model](papers/Sobrie_Mousseau_Pirlot.pdf) (in particular, this article explains the procedure of the metaheuristic in detail)
   * [Learning the Parameters of a Multiple Criteria Sorting Method Based on a Majority Rule](papers/Leroy_Mousseau_Pirlot.pdf)
   * [A new decision support model for preanesthetic evaluation](papers/Sobrie_and_al.pdf)


ulimit -n 4096

## Settings

Before digging into the code, here are the requirements : 


* This notebook uses Python 3.7. Check your version by running *python --version* in a terminal. If needed, you can download this version from https://www.python.org/downloads/.

 * You may need to install Anaconda3 (the complete procedure is described here : https://docs.anaconda.com/anaconda/install/mac-os/)

* The matplotlib library needs to be installed. This can be done with the command below (preferably using pip, which can itself be installed by following the instructions at https://pip.pypa.io/en/stable/installing/):

In [1]:
pip install matplotlib

You should consider upgrading via the '/Users/pegdwendeminoungou/opt/anaconda3/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


   * Download CPLEX Optimization Studio. Go to https://www.ibm.com/products/ilog-cplex-optimization-studio (choose the free student/teacher edition) and follow the steps to download "ILOG CPLEX Optimization Studio" for your operating system. The CPLEX version used in this notebook is 12.9. You may have to create an IBMid account. 

   * Then, follow the instructions in the ReadMe file of the CPLEX directory that has been created in the Applications directory. In particular, you may need to update your Java runtime.

   * Also open the ReadMe file in the python directory of the CPLEX directory. Run this command in a terminal : *pip install docplex*

   * To set up the CPLEX Python API, follow the instructions here : https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.cplex.help/CPLEX/GettingStarted/topics/set_up/Python_setup.html. In the same directory as in the previous step, run this command in a terminal : *python setup.py install*

  * Set the PYTHONPATH environment variable in the terminal so that it contains the path from the root folder to "cplex" via "Anaconda3", and the path from the "Applications" folder to "cplex". Here is an example : *export PYTHONPATH=$PYTHONPATH:/Users/pegdwendeminoungou/anaconda3/lib/python3.7/site-packages/cplex:/Applications/CPLEX_Studio129/cplex*


* Further help can be found here : https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.studio.help/Optimization_Studio/topics/COS_home.html
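
Once these steps are done, a quick check from Python can confirm that the packages named in the list are visible to the interpreter. The snippet below is only a minimal sketch: the helper `missing_packages` is a hypothetical name introduced here, and the check only tells whether a package can be found on the import path, not whether CPLEX itself is correctly licensed or configured.

```python
import importlib.util
import sys

def missing_packages(names):
    """Return the packages from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Python version :", sys.version.split()[0])
# Package names taken from the installation steps listed above.
print("Missing packages :", missing_packages(["matplotlib", "docplex", "cplex"]))
```

An empty list means that all three packages were found. Any name that is printed points to the installation step that may need to be repeated.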
   

We need to set the global variable *DATADIR* so that it contains the full path from the root to this working directory, **MRSort-jupyter-notebook**. Here is an example :

In [2]:
%env DATADIR /Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook

env: DATADIR=/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook
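
To make sure the variable is usable before running the rest of the notebook, one possible check is sketched below. The helper `datadir_is_valid` is a hypothetical name introduced for illustration. It only verifies that *DATADIR* is set and points to an existing directory, which is an assumption about what the code needs.

```python
import os

def datadir_is_valid(environ=os.environ):
    """Return True when DATADIR is set and points to an existing directory."""
    path = environ.get("DATADIR")
    return path is not None and os.path.isdir(path)

print("DATADIR usable :", datadir_is_valid())
```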


## Brief description of the metaheuristic for learning preference directions

As stated before, our approach is based on an evolutionary algorithm.
It consists in generating and evolving heterogeneous models (models with both increasing and decreasing preference directions on some criteria) within the population.
The goal of this adaptation is to foster the evolution of good models, that is, models whose criteria have the true preference directions.

The implementation relies on 3 axes : the mechanism for generating and renewing the population of models, the core strategy of the method, which acts on model weights and profiles, and finally the decision rule for selecting the returned model together with its learned preference directions.
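
To illustrate how these 3 axes fit together, here is a deliberately simplified toy sketch. It is not the oso-pymcda implementation: the function `evolve`, its parameters, and the fitness function are all hypothetical, and a model is reduced to a vector of preference directions (+1 or -1). In the real algorithm, the core strategy acts on weights and profiles, which this placeholder does not attempt to reproduce.

```python
import random

def evolve(fitness, nb_criteria, nb_models=10, nb_iterations=5,
           renewal_rate=0.5, decision_rule=1, seed=0):
    """Toy population loop over vectors of preference directions (+1 / -1)."""
    rng = random.Random(seed)

    def new_model():
        return [rng.choice([1, -1]) for _ in range(nb_criteria)]

    # Axis 1 (generation) : start from a random, heterogeneous population.
    population = [new_model() for _ in range(nb_models)]
    for iteration in range(nb_iterations):
        # Axis 2 (core strategy, placeholder) : keep a direction flip only
        # if it strictly improves the fitness of the model.
        for model in population:
            for j in range(nb_criteria):
                before = fitness(model)
                model[j] = -model[j]
                if fitness(model) <= before:
                    model[j] = -model[j]  # revert the flip
        population.sort(key=fitness, reverse=True)
        # Axis 1 (renewal) : replace the worst models, except after the last iteration.
        nb_renewed = int(renewal_rate * nb_models)
        if nb_renewed > 0 and iteration < nb_iterations - 1:
            population[-nb_renewed:] = [new_model() for _ in range(nb_renewed)]
    # Axis 3 (decision rule) : return the model of the given rank (1 = fittest).
    return population[decision_rule - 1]

# Hypothetical ground truth and fitness (number of correctly guessed directions).
true_directions = [1, -1, 1, 1, -1]
fitness = lambda model: sum(m == t for m, t in zip(model, true_directions))
best = evolve(fitness, nb_criteria=len(true_directions))
print(best)  # [1, -1, 1, 1, -1]
```

In this toy setting the fitness is computed directly from the true directions, so the loop recovers them trivially. In the learning problem addressed here, the fitness can only be measured through the classification of the learning set.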

## Description of the code

The code contains 2 main parts : 
   * the first component covers the generation and learning of one parameterized MRSort model (one run of the learning algorithm followed by tests),
   * the second component compiles series of parameterized MRSort runs and outputs statistical plots.

### The first component

First, to load what is needed in this part, let us execute the following command :  

In [3]:
run oso-pymcda/apps/random_model_generation_msjp_meta.py

Second, we go through these steps one by one : 
   * <u>Step 1</u> : initialize both problem and algorithm parameters,
   * <u>Step 2</u> : generate a new random MR-Sort model (profile, weights, threshold), which is the ground truth model,
   * <u>Step 3</u> : randomly generate a set of alternatives and a performance table,
   * <u>Step 4</u> : assign categories to these alternatives to yield a learning set in accordance with the problem addressed,
   * <u>Step 5</u> : run the readapted MR-Sort metaheuristic algorithm,
   * <u>Step 6</u> : validate the learning of the model (classification accuracy (CA) of the learned model compared to the initial model on the learning set, restoration rate of preference directions),
   * <u>Step 7</u> : test the learned model on a benchmark of alternatives,
   * <u>Step 8</u> : display the important results (also summarized in a csv file).
   
   

#### Step 1 : initialize the required parameters

Here, we initialize the parameters for one run of the learning algorithm. 
First, we have the problem parameters : 
   * *nb_categories* : the number of categories (classes)
   * *nb_criteria* : the number of criteria taken into consideration in the problem
   * *nb_alternatives* : the number of alternatives in the learning set
   * *dir_criteria* : the list of preference directions of the original model
   * *nb_unk_criteria* : the number of criteria with unknown preference directions
   * *l_dupl_criteria* : the list of criteria (indices) with unknown preference directions
   * *meta_l* : the number of iterations of the metaheuristic algorithm (outer loop)
   * *meta_ll* : the number of iterations of the metaheuristic algorithm (inner loop)
   * *meta_nb_models* : the number of models (population) handled by the metaheuristic (evolutionary) algorithm during the learning process
   * *nb_tests* : the number of alternatives taken into account in the test set
   * *nb_models* : the number of models (problem instances) considered in order to compute averaged results

Note that *nb_unk_criteria* must be smaller than *nb_criteria*.
By default, the criteria whose preference directions are known have an increasing preference direction.
By default, the criteria whose preference directions are unknown are the first *nb_unk_criteria* criteria (in the list starting with c1, c2, c3, ....).
Then, we have the algorithm-specific parameters:

   * *version_meta* : the version of the implementation
   * *renewal_method* : the method used to renew the population, depending on the distribution of preference directions.  
   * *renewal_models* : the first element of the tuple is the renewal rate, and the second is the coefficient rate (they must not both be null, nor both be non-null)
   * *strategy* : the first element of the tuple is the starting lower bound on weights, and the second is the starting percentile for the profile interval restriction (on criteria with unknown preference directions)
   * *stopping_condition* : it corresponds to the maximal number of iterations
   * *decision_rule* : it corresponds to the rank of the chosen model among learned models of the population (sorted according to their fitness).
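
The constraints stated above can be checked before creating an instance. The function below is a minimal sketch and is not part of oso-pymcda: `check_parameters` is a hypothetical helper, and it assumes that "null" means equal to zero for the two elements of *renewal_models*.

```python
def check_parameters(nb_criteria, nb_unk_criteria, renewal_models):
    """Check the stated constraints and return the default unknown-direction indices."""
    if not 0 <= nb_unk_criteria < nb_criteria:
        raise ValueError("nb_unk_criteria must be smaller than nb_criteria")
    renewal_rate, coefficient_rate = renewal_models
    # Assumption : "null" means zero, so exactly one of the two must be zero.
    if (renewal_rate == 0) == (coefficient_rate == 0):
        raise ValueError("exactly one element of renewal_models must be zero")
    # By default, the criteria with unknown directions are the first ones.
    return list(range(nb_criteria))[:nb_unk_criteria]

print(check_parameters(5, 1, (0, 0.35)))  # [0]
```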
   


In [4]:
nb_categories = 2 # fixed
nb_criteria = 5
nb_alternatives = 50
dir_criteria = [1]*nb_criteria # fixed to 1 for all criteria
nb_unk_criteria = 1
l_dupl_criteria = list(range(nb_criteria))[:nb_unk_criteria]

# parameters of the metaheuristic MRSort
meta_l = 30
meta_ll = 20
meta_nb_models = 50

# test parameters
nb_tests = 10000
nb_models = 5

# additional parameters of the algorithm
version_meta = 8 #fixed
renewal_method = 2 #fixed
renewal_models = (0,0.35)
strategy = (0.2,25)
stopping_condition = meta_l
decision_rule = 1


Now we can create an instance for one run of the learning algorithm as follows :

In [5]:
inst = RandMRSortLearning(nb_alternatives, nb_categories, nb_criteria, dir_criteria, l_dupl_criteria, 
                          nb_tests, nb_models, meta_l, meta_ll, meta_nb_models,renewal_method = renewal_method,
                          renewal_models = renewal_models, strategy = strategy,stopping_condition = stopping_condition, 
                          decision_rule = decision_rule)

#### Steps 2 to 4 : generate a new random MRSort model, alternatives and assignments

Here, 3 steps are performed one after the other in the same function : we generate a new random MRSort model, then we generate alternatives, and finally we assign these alternatives to 2 categories according to the MRSort rule applied to the given model.

In [6]:
inst.generate_random_instance()

We can have a look at the model that has been generated :
   * generated parameters of the model MRSort

In [7]:
inst.model.cv.display() # display the weight w of each criterion of the model

     c1    c2    c3    c4    c5 
w 0.171 0.228  0.15 0.167 0.284 


In [8]:
print("Majority threshold (lambda) : \t%.7s" % inst.model.lbda) 

Majority threshold (lambda) : 	0.582


In [9]:
inst.model.bpt.display() # display the limit profile of the random model b1

      c1    c2    c3    c4    c5 
b1  0.21 0.535 0.869 0.661  0.34 


   * performance table of generated alternatives

In [10]:
inst.pt.display()

       c1    c2    c3    c4    c5 
a1   0.67 0.798 0.576 0.655  0.24 
a10 0.614 0.904 0.656 0.219 0.168 
a11 0.682 0.376 0.705 0.862 0.956 
a12 0.593 0.079 0.453 0.025 0.379 
a13 0.971 0.221 0.564  0.27 0.974 
a14  0.41 0.457 0.613 0.969 0.367 
a15 0.032 0.995 0.121  0.94 0.616 
a16 0.921 0.442 0.217  0.51 0.288 
a17 0.244 0.859 0.868 0.184  0.13 
a18  0.75 0.264 0.938 0.238 0.497 
a19 0.667 0.594 0.717 0.858 0.956 
a2  0.487 0.497 0.711 0.906 0.964 
a20 0.271 0.914 0.776  0.58  0.95 
a21 0.209 0.781  0.42 0.851 0.039 
a22 0.776 0.078 0.232 0.094 0.909 
a23 0.059 0.649 0.996 0.977 0.214 
a24 0.097 0.946 0.287 0.002 0.422 
a25 0.567  0.14 0.403 0.802 0.852 
a26 0.032 0.001  0.63 0.687 0.819 
a27 0.945 0.441 0.308  0.75  0.57 
a28 0.147 0.814 0.415 0.149 0.721 
a29  0.07 0.328 0.476 0.946 0.074 
a3  0.788 0.392 0.664 0.142 0.371 
a30 0.265 0.973 0.673 0.817  0.98 
a31 0.324 0.385 0.632 0.283 0.851 
a32 0.957 0.907 0.632 0.014 0.601 
a33 0.407 0.388 0.311 0.271 0.462 
a34 0.716 0.836 0.44

   * the result of the assignment of alternatives

In [11]:
inst.aa.display()

    category
a1      cat1
a10     cat1
a11     cat2
a12     cat1
a13     cat1
a14     cat2
a15     cat2
a16     cat1
a17     cat1
a18     cat2
a19     cat2
a2      cat2
a20     cat2
a21     cat1
a22     cat1
a23     cat1
a24     cat1
a25     cat2
a26     cat1
a27     cat2
a28     cat1
a29     cat1
a3      cat1
a30     cat2
a31     cat1
a32     cat2
a33     cat1
a34     cat2
a35     cat1
a36     cat1
a37     cat2
a38     cat2
a39     cat1
a4      cat1
a40     cat1
a41     cat1
a42     cat2
a43     cat2
a44     cat1
a45     cat1
a46     cat1
a47     cat1
a48     cat2
a49     cat1
a5      cat2
a50     cat1
a6      cat2
a7      cat2
a8      cat1
a9      cat2
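
These assignments can be checked by hand with the MRSort rule for 2 categories : an alternative goes to cat2 when the total weight of the criteria on which it is at least as good as the profile b1 reaches the majority threshold lambda, and to cat1 otherwise. The function below is a minimal sketch of that rule, not the oso-pymcda code. `mrsort_assign` and its `directions` argument are hypothetical names, and the numbers are the rounded values displayed above.

```python
def mrsort_assign(performances, profile, weights, lbda, directions=None):
    """Two-category MR-Sort rule : return "cat2" if the coalition reaches lbda."""
    if directions is None:
        directions = [1] * len(weights)  # increasing preference by default
    # Sum the weights of the criteria on which the alternative is at least as
    # good as the profile (">=" if increasing, "<=" if decreasing).
    support = sum(
        w for a, b, w, d in zip(performances, profile, weights, directions)
        if d * a >= d * b
    )
    return "cat2" if support >= lbda else "cat1"

weights = [0.171, 0.228, 0.15, 0.167, 0.284]
profile = [0.21, 0.535, 0.869, 0.661, 0.34]
lbda = 0.582
a1 = [0.67, 0.798, 0.576, 0.655, 0.24]
a11 = [0.682, 0.376, 0.705, 0.862, 0.956]
print(mrsort_assign(a1, profile, weights, lbda))   # cat1 (support 0.399)
print(mrsort_assign(a11, profile, weights, lbda))  # cat2 (support 0.622)
```

For a1, only c1 and c2 reach the profile, which gives 0.171 + 0.228 = 0.399, below 0.582. For a11, criteria c1, c4 and c5 reach it, which gives 0.622. Both results agree with the table above.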


#### Step 5: run the MRSort metaheuristic learning algorithm

This step performs one run of the metaheuristic algorithm. This execution learns a randomized model from a generated learning set (performance table and assignments of alternatives).

In [12]:
inst.num_model = 0 # the number of the current run
execution_time = inst.run_mrsort()
print("Time (s) : %f" % execution_time) # computation time of the run

Time (s) : 13.475376


We display the parameters of the learned model :

In [13]:
inst.model2.bpt.display()
inst.model2.cv.display()
print("lambda\t%.7s" % inst.model2.lbda) 

      c1      c2      c3    c4    c5 
b1 0.245 0.52601 0.77601 0.666 0.367 
      c1     c2     c3     c4     c5 
w 0.3333 0.1667 0.1666 0.1667 0.1667 
lambda	0.5001


We also display the learned preference directions : (+) for an increasing preference direction and (-) for a decreasing one.

In [14]:
print(list(inst.model2.criteria))

[c1 (+), c2 (+), c3 (+), c4 (+), c5 (+)]


#### Step 6 : validate the learning of the random model

We can calculate the CA of the learned model on the learning set, which validates the learning.

In [15]:
ca_v,cag_v = inst.eval_model_validation() # calculating the validation rate
print("validation rate : %f" % ca_v)

validation rate : 1.000000


We can also draw the confusion matrix of the validation phase :

In [16]:
matrix = compute_confusion_matrix(inst.aa, inst.aa_learned, inst.model.categories) # construction of the confusion matrix
print_confusion_matrix(matrix, inst.model.categories) # printing the confusion matrix

     cat1 cat2 
cat1   29    0 
cat2    0   21 
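
The CA can be read directly from this confusion matrix : it is the number of alternatives on the diagonal divided by the total number of alternatives. The sketch below assumes the matrix is given as a plain nested list, which is not necessarily the structure returned by `compute_confusion_matrix`. The helper `classification_accuracy` is a hypothetical name.

```python
def classification_accuracy(confusion):
    """Share of correctly classified alternatives (diagonal over total)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Values copied from the confusion matrix printed above.
print(classification_accuracy([[29, 0], [0, 21]]))  # 1.0
```

Here (29 + 21) / 50 = 1.0, which matches the validation rate reported in the previous cell.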


#### Step 7 : test the learned model on a benchmark of alternatives

Analogously, we can calculate the CA for the test phase on a test set.

In [17]:
ao_tests,al_tests,ca_t,cag_t = inst.eval_model_test()
print("test rate : %f" % ca_t)

test rate : 0.889300


#### Step 8 : show the important results

In order to present generalized statistics, we need to run the algorithm several times, yielding *nb_models* learned models. To do so, we can simply execute :  

In [18]:
inst.run_mrsort_all_models()

In [19]:
DATADIR

'/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook'

As a result, all the tests are done and we have also generated a csv file that summarizes the tests and gives details on each one. This file is found in the directory *rand_valid_test_na50_nca2_ncr5-0_dupl1*, located in the root directory of this notebook. The file name begins with "valid_test_dupl...." .

Another csv file contains more compact data that facilitates the drawing of different plots. This file is generated with the command :

In [20]:
inst.report_plot_results_csv()

It yields a csv file whose name begins with "plot_results....", in the same directory as the previous file.

The final function of this section outputs an instance of the learning algorithm (criteria, categories, performance tables and assignments, all encoded in a customized syntax).

In [21]:
inst.build_osomcda_instance_random()

'/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook/rand_valid_test_na50_nca2_ncr5-0_dupl1/osomcda_rand-50-2-5-1-20210706-123344.csv'

This output file is also in the same directory as the previous files.

In [22]:
print("Statistics :")
print("CA (validation) : " ,inst.stats_cav)
print("CA (generalization) : " ,inst.stats_cag)
print("CA (preference direction) : " ,inst.stats_capd)
print("Time execution (seconds) : " ,inst.stats_time)

Statistics :
CA (validation) :  1.0
CA (generalization) :  0.90212
CA (preference direction) :  0.8
Time execution (seconds) :  12.155595350265504
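
As an illustration of the "CA (preference direction)" statistic, the sketch below computes a restoration rate under the assumption that it is the share of unknown-direction criteria whose learned direction equals the true one, averaged over the runs. The helper `restoration_rate` and the per-run directions are hypothetical : the notebook does not show the individual runs. The scenario simply shows one case that is consistent with the value 0.8 (1 unknown criterion, restored in 4 runs out of 5).

```python
def restoration_rate(true_dirs, learned_dirs, unknown_indices):
    """Share of unknown-direction criteria whose learned direction is correct."""
    hits = sum(1 for j in unknown_indices if true_dirs[j] == learned_dirs[j])
    return hits / len(unknown_indices)

true_dirs = [1, 1, 1, 1, 1]  # dir_criteria of the original model
unknown = [0]                # l_dupl_criteria : only c1 is unknown
# Hypothetical learned directions for 5 runs : c1 is wrong in the last one.
runs = [[1, 1, 1, 1, 1]] * 4 + [[-1, 1, 1, 1, 1]]
rates = [restoration_rate(true_dirs, learned, unknown) for learned in runs]
print(sum(rates) / len(rates))  # 0.8
```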
