# Learning the parameters of an MR-Sort model using an adapted version of an existing metaheuristic algorithm (oso-pymcda) when preference directions are unknown

## Introduction

In this document, we briefly describe our version of a metaheuristic algorithm for learning the parameters of an MR-Sort model when the preference directions of some criteria are not known in advance.

The initial metaheuristic algorithm at our disposal comes from GitHub (https://github.com/oso/pymcda) and was developed by Olivier Sobrie. It is the starting point of the approach presented here.

More specifically, this proposal, which can be seen as an extension of the existing metaheuristic, integrates more parameters, in particular the preference direction of each criterion. We also include the computation of the preference direction restoration rate, as well as the generation of statistical plots on the learning results.

The code, originally written in Python 2, has been partially upgraded to Python 3. It is located in the "oso-pymcda/" directory, which is in the same directory as this notebook.

## Some helpful articles on MRSort models

To grasp the metaheuristic used in this notebook and get an overview of the MCDA methodology, here are some useful related articles:
   * [Learning monotone preferences using a majority rule sorting model](papers/Sobrie_Mousseau_Pirlot.pdf) (in particular, this one explains the procedure of the metaheuristic in detail)
   * [Learning the Parameters of a Multiple Criteria Sorting Method Based on a Majority Rule](papers/Leroy_Mousseau_Pirlot.pdf)
   * [A new decision support model for preanesthetic evaluation](papers/Sobrie_and_al.pdf)


## Settings

Before digging into the code, make sure the following requirements are met:


* The version of Python used for this notebook is 3.7. You can check your version by running *python --version* in a terminal; if needed, you can download this version from https://www.python.org/downloads/.

* You may need to download Anaconda3 (the complete installation procedure is available here: https://docs.anaconda.com/anaconda/install/mac-os/).

* The matplotlib library (matplotlib.pyplot) needs to be installed. This can be done with the command below, preferably using pip (which can itself be installed by following the instructions at https://pip.pypa.io/en/stable/installing/):

In [1]:
%pip install matplotlib

Note: you may need to restart the kernel to use updated packages.


   * Download CPLEX Optimization Studio: go to https://www.ibm.com/products/ilog-cplex-optimization-studio (choose the free student/teacher edition) and follow the steps to download "ILOG CPLEX Optimization Studio" for your operating system. The CPLEX version used in this notebook is 12.9. You may have to create an IBMid account.

   * Then, read the instructions in the ReadMe file of the CPLEX directory that has been created in the Applications directory. In particular, you may need to update your Java runtime.

   * Also open the ReadMe file in the python directory of the CPLEX directory, then run this command in a terminal: *pip install docplex*

   * To set up the CPLEX Python API, follow the instructions here: https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.cplex.help/CPLEX/GettingStarted/topics/set_up/Python_setup.html. In the same directory as before, run this command in a terminal: *python setup.py install*

   * Set the environment variable PYTHONPATH in the terminal so that it contains both the path from the root folder to "cplex" via "Anaconda3" and the path from the "Applications" folder to "cplex". Here is an example: *export PYTHONPATH=$PYTHONPATH:/Users/pegdwendeminoungou/anaconda3/lib/python3.7/site-packages/cplex:/Applications/CPLEX_Studio129/cplex*


* Further help can be found here: https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.studio.help/Optimization_Studio/topics/COS_home.html
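
After these installation steps, a quick check from the notebook can tell whether Python actually finds the required packages on the current path, which is what the PYTHONPATH setting above is meant to ensure. `check_environment` is our own helper, not part of oso-pymcda or CPLEX. It relies on `importlib.util.find_spec`, which only locates a module without importing it, so a broken installation may still fail later at import time.

```python
import importlib.util
import sys


def check_environment(modules=("matplotlib", "docplex", "cplex")):
    """Report the Python version and whether each module can be located.

    find_spec only locates a top-level module; it does not import it.
    """
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in check_environment().items():
    print(key, ":", value)
```

A `False` next to `cplex` usually points to a PYTHONPATH problem, or to running the notebook with a different interpreter than the one on which the CPLEX API was installed.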
   

We need to set the environment variable *DATADIR* so that it contains the full path from the root to this working directory, **MRSort-jupyter-notebook**. Here is an example:

In [2]:
%env DATADIR /Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook

env: DATADIR=/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook
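
Because the code relies on *DATADIR* to locate the working directory, a small sanity check can catch a wrong path early. The sketch below is our own helper (`check_datadir` is a hypothetical name); testing for the `oso-pymcda` subdirectory is only a heuristic based on the directory layout described in the introduction.

```python
import os


def check_datadir(var="DATADIR", subdir="oso-pymcda"):
    """Return the path stored in the environment variable `var`; raise
    RuntimeError if it is unset or has no `subdir` directory inside."""
    path = os.environ.get(var)
    if path is None:
        raise RuntimeError("%s is not set (use %%env %s <path>)" % (var, var))
    if not os.path.isdir(os.path.join(path, subdir)):
        raise RuntimeError("%s=%s has no %s/ subdirectory" % (var, path, subdir))
    return path
```

In the notebook, `check_datadir()` should simply return the path set with `%env` above.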


## Brief description of the metaheuristic for learning preference directions

As stated before, our approach is based on an evolutionary algorithm.
It consists of generating and evolving heterogeneous models (models with both increasing and decreasing preference directions on some criteria) within the population.
The goal of this adaptation is to foster the evolution of good models, i.e. those whose criteria have the true preference directions.

The implementation relies on three axes: the mechanism for generating and renewing the population of models, the core strategy of the method acting on model weights and profiles, and finally the decision rule for selecting the yielded model, together with the learned preference directions.
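
The notion of a heterogeneous population can be illustrated with a toy sketch. It is not the oso-pymcda implementation: here each model is reduced to a vector of preference directions (+1 increasing, -1 decreasing), drawn at random on the criteria with unknown direction and fixed to +1 elsewhere. `increasing_share` measures the distribution of directions in the population, the kind of quantity a renewal mechanism based on that distribution could act on. Both function names are hypothetical.

```python
import random


def init_directions(nb_models, nb_criteria, unknown, rng):
    """Draw one vector of preference directions per model: +1 or -1 at
    random on the criteria listed in `unknown`, +1 on all the others."""
    population = []
    for _ in range(nb_models):
        dirs = [1] * nb_criteria
        for j in unknown:
            dirs[j] = rng.choice((1, -1))
        population.append(dirs)
    return population


def increasing_share(population, j):
    """Fraction of the models in which criterion j is increasing."""
    return sum(1 for dirs in population if dirs[j] == 1) / len(population)


# sizes of Step 1: 50 models, 5 criteria, c1 (index 0) with unknown direction
pop = init_directions(50, 5, unknown=[0], rng=random.Random(0))
print("share of models with c1 increasing: %.2f" % increasing_share(pop, 0))
```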

## Description of the code

The code mainly contains two parts:
   * the first component handles the generation and learning of one parameterized MR-Sort model (one run of the learning algorithm followed by tests),
   * the second component compiles series of parameterized MR-Sort runs and outputs useful statistical plots.

### The first component

First, to load what is needed in this part, let us execute the following command:

In [3]:
%run oso-pymcda/apps/random_model_generation_msjp.py

Second, we follow these steps in order:
   * <u>Step 1</u>: initialize both the problem and the algorithm parameters,
   * <u>Step 2</u>: generate a new random MR-Sort model (profile, weights, threshold); this is the ground truth model,
   * <u>Step 3</u>: randomly generate a set of alternatives and their performance table,
   * <u>Step 4</u>: assign categories to these alternatives to yield a learning set in accordance with the problem addressed,
   * <u>Step 5</u>: run the adapted MR-Sort metaheuristic algorithm,
   * <u>Step 6</u>: validate the learning of the model (classification accuracy (CA, in %) of the learned model with respect to the initial model on the learning set, restoration rate of the preference directions),
   * <u>Step 7</u>: test the learned model on a benchmark set of alternatives,
   * <u>Step 8</u>: display the important results (also summarized in a csv file).
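
The preference direction restoration rate of Step 6 is not formally defined in this notebook. One natural reading, assumed in the sketch below, is the fraction of the criteria with unknown direction whose direction is correctly recovered by the learned model. The function name is ours, and the actual computation in the code may differ (for instance by averaging over several learned models).

```python
def direction_restoration_rate(true_dirs, learned_dirs, unknown):
    """Share of the criteria in `unknown` whose learned direction equals
    the true one (directions encoded as +1 / -1)."""
    if not unknown:
        raise ValueError("no criterion with unknown preference direction")
    hits = sum(1 for j in unknown if learned_dirs[j] == true_dirs[j])
    return hits / len(unknown)


# hypothetical example: c1 and c2 have unknown directions, only c1 is recovered
print(direction_restoration_rate([1, 1, 1], [1, -1, 1], unknown=[0, 1]))  # 0.5
```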
   
   

#### Step 1 : initialize the required parameters

Here, we initialize the parameters for one run of the learning algorithm.
First, we have the problem parameters : 
   * *nb_categories*: the number of categories (classes)
   * *nb_criteria*: the number of criteria taken into consideration in the problem
   * *nb_alternatives*: the number of alternatives in the learning set
   * *dir_criteria*: the list of preference directions of the original model
   * *nb_unk_criteria*: the number of criteria with unknown preference directions
   * *l_dupl_criteria*: the list of criteria (indices) with unknown preference directions
   * *nb_tests*: the number of alternatives in the test set
   * *nb_models*: the number of models learned independently, i.e. the number of runs of the learning algorithm
   * *meta_l*: the number of iterations of the metaheuristic algorithm (outer loop)
   * *meta_ll*: the number of iterations of the metaheuristic algorithm (inner loop)
   * *meta_nb_models*: the number of models (population) handled by the metaheuristic (evolutionary) algorithm during the learning process

Note that *nb_unk_criteria* must be smaller than *nb_criteria*.
By default, the criteria whose preference directions are known have an increasing preference direction.
By default, the criteria whose preference directions are unknown are the first *nb_unk_criteria* criteria (c1, c2, c3, ...).
Then, we have the algorithm-specific parameters:

   * *version_meta*: the implementation version
   * *renewal_method*: the method used to renew the population depending on the distribution of preference directions
   * *renewal_models*: the first element of the tuple is the renewal rate and the second is the coefficient rate (exactly one of them must be non-zero)
   * *strategy*: the first element of the tuple is the starting lower bound on weights, and the second is the starting percentile for the profile interval restriction (on criteria with unknown preference directions)
   * *stopping_condition*: the maximal number of iterations
   * *decision_rule*: the rank of the chosen model among the learned models of the population (sorted according to their fitness)
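
As we read it, *decision_rule* = 1 picks the fittest model of the final population. The sketch below illustrates such a rank-based selection on stand-in values; `select_by_rank` is a hypothetical helper, and the fitness measure and tie-breaking used by oso-pymcda may differ.

```python
def select_by_rank(models, fitness, rank=1):
    """Return the model with the given rank (1 = best) after sorting the
    population by decreasing fitness; ties keep the original order."""
    if not 1 <= rank <= len(models):
        raise ValueError("rank must be between 1 and %d" % len(models))
    order = sorted(range(len(models)), key=lambda i: fitness[i], reverse=True)
    return models[order[rank - 1]]


population = ["m1", "m2", "m3"]  # stand-ins for learned models
fitness = [0.92, 0.98, 0.95]     # e.g. CA on the learning set
print(select_by_rank(population, fitness, rank=1))  # m2
print(select_by_rank(population, fitness, rank=2))  # m3
```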
   


In [4]:
nb_categories = 2  # fixed
nb_criteria = 5
nb_alternatives = 50
dir_criteria = [1] * nb_criteria  # fixed to 1 (increasing) for all criteria
nb_unk_criteria = 1
l_dupl_criteria = list(range(nb_criteria))[:nb_unk_criteria]  # the first nb_unk_criteria criteria

# test parameters
nb_tests = 10000
nb_models = 5

# parameters of the MR-Sort metaheuristic
meta_l = 30
meta_ll = 20
meta_nb_models = 50

# additional parameters of the algorithm
version_meta = 8  # fixed
renewal_method = 2  # fixed
renewal_models = (0, 0.35)  # (renewal rate, coefficient rate)
strategy = (0.2, 25)  # (starting lower bound on weights, starting percentile)
stopping_condition = meta_l
decision_rule = 1  # rank of the chosen model in the population sorted by fitness
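
The constraints stated above can be checked before launching a long run. The helper below is our own sketch, not part of oso-pymcda; it reads "must not be both null or both non null" as "exactly one of the two renewal values is non-zero".

```python
def check_parameters(nb_criteria, nb_unk_criteria, l_dupl_criteria, renewal_models):
    """Raise ValueError if the Step 1 parameters break the stated constraints."""
    if not 0 <= nb_unk_criteria < nb_criteria:
        raise ValueError("nb_unk_criteria must be smaller than nb_criteria")
    if len(l_dupl_criteria) != nb_unk_criteria:
        raise ValueError("l_dupl_criteria must list nb_unk_criteria indices")
    if any(not 0 <= j < nb_criteria for j in l_dupl_criteria):
        raise ValueError("criterion index out of range")
    rate, coef = renewal_models
    # renewal rate and coefficient rate: exactly one of them is non-zero
    if (rate == 0) == (coef == 0):
        raise ValueError("exactly one of the two renewal values must be non-zero")


# values of the cell above: no exception is raised
check_parameters(nb_criteria=5, nb_unk_criteria=1, l_dupl_criteria=[0],
                 renewal_models=(0, 0.35))
```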


Now we can create an instance for one run of the learning algorithm as follows:

In [15]:
inst = RandMRSortLearning(nb_alternatives, nb_categories, nb_criteria, dir_criteria, l_dupl_criteria,
                          nb_tests, nb_models, meta_l, meta_ll, meta_nb_models,
                          renewal_method=renewal_method, renewal_models=renewal_models,
                          strategy=strategy, stopping_condition=stopping_condition,
                          decision_rule=decision_rule)

#### Steps 2 to 4 : generate a new random MR-Sort model, alternatives and assignments

Here, three steps are performed one after the other by the same function: we generate a new random MR-Sort model, then we generate alternatives, and finally we assign these alternatives to the 2 categories according to the MR-Sort rule applied to the generated model.
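
To make Steps 2 to 4 concrete, here is a toy, self-contained generator for a 2-category problem. It is a sketch under our own assumptions, not the code of `generate_random_instance`: weights are normalized uniform draws, the majority threshold is drawn in [0.5, 1], and profile and performances are uniform in [0, 1) (ranges consistent with the values displayed below, but not confirmed by the source). All criteria are increasing here.

```python
import random


def random_instance(nb_alternatives, nb_criteria, rng):
    """Toy 2-category MR-Sort instance: model, performance table, assignments."""
    raw = [rng.random() for _ in range(nb_criteria)]
    total = sum(raw)
    weights = [x / total for x in raw]                    # weights sum to 1
    lbda = rng.uniform(0.5, 1.0)                          # majority threshold (assumed range)
    profile = [rng.random() for _ in range(nb_criteria)]  # limit profile b1
    pt = [[rng.random() for _ in range(nb_criteria)]
          for _ in range(nb_alternatives)]                # performances in [0, 1)
    assignments = []
    for a in pt:
        # sum the weights of the criteria on which a is at least as good as b1
        support = sum(w for w, x, b in zip(weights, a, profile) if x >= b)
        assignments.append("cat2" if support >= lbda else "cat1")
    return weights, lbda, profile, pt, assignments


weights, lbda, profile, pt, assignments = random_instance(50, 5, random.Random(42))
print("lambda = %.3f, alternatives in cat2: %d" % (lbda, assignments.count("cat2")))
```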

In [6]:
inst.generate_random_instance()

We can have a look at the model that has been generated:
   * the generated parameters of the MR-Sort model

In [7]:
inst.model.cv.display()  # display the weight w of each criterion of the model

     c1    c2    c3    c4    c5 
w 0.373 0.219 0.224  0.07 0.114 


In [8]:
print("Majority threshold (lambda) : \t%.7s" % inst.model.lbda) 

Majority threshold (lambda) : 	0.656


In [9]:
inst.model.bpt.display()  # display the limit profile b1 of the random model

      c1    c2    c3    c4    c5 
b1 0.286 0.796  0.28 0.259 0.594 


   * performance table of generated alternatives

In [10]:
inst.pt.display()

       c1    c2    c3    c4    c5 
a1  0.399 0.685 0.802 0.755 0.392 
a10 0.267 0.672 0.367 0.349 0.637 
a11 0.624 0.411 0.055  0.99 0.263 
a12 0.286 0.956 0.565 0.755 0.664 
a13 0.165  0.61 0.371 0.989 0.335 
a14 0.246 0.502 0.862 0.634 0.007 
a15  0.72 0.578 0.548 0.019  0.54 
a16 0.723 0.169 0.579 0.093 0.655 
a17 0.854 0.022 0.793 0.487 0.331 
a18 0.456 0.719 0.708 0.553 0.668 
a19 0.005 0.993 0.848 0.104 0.434 
a2  0.768 0.867 0.931 0.687 0.214 
a20 0.674  0.89 0.606 0.117 0.271 
a21 0.081 0.129 0.168 0.616  0.85 
a22 0.127 0.272 0.081 0.327 0.032 
a23 0.623 0.782 0.863 0.392 0.994 
a24 0.869   0.9 0.803 0.469 0.812 
a25  0.92 0.289 0.352 0.948 0.459 
a26 0.115 0.246 0.674   0.5 0.204 
a27 0.558 0.236 0.571 0.983 0.578 
a28 0.125 0.993 0.179 0.815 0.111 
a29 0.289 0.954 0.875 0.609 0.006 
a3  0.873 0.136 0.106   0.5 0.778 
a30 0.431 0.737 0.725 0.139 0.806 
a31 0.792 0.259 0.707  0.38 0.909 
a32 0.661 0.723 0.999 0.811 0.323 
a33 0.514 0.904 0.577 0.772 0.362 
a34 0.749 0.158 0.76

   * the result of the assignment of alternatives

In [11]:
inst.aa.display()

    category
a1      cat2
a10     cat1
a11     cat1
a12     cat2
a13     cat1
a14     cat1
a15     cat1
a16     cat2
a17     cat2
a18     cat2
a19     cat1
a2      cat2
a20     cat2
a21     cat1
a22     cat1
a23     cat2
a24     cat2
a25     cat2
a26     cat1
a27     cat2
a28     cat1
a29     cat2
a3      cat1
a30     cat2
a31     cat2
a32     cat2
a33     cat2
a34     cat2
a35     cat2
a36     cat2
a37     cat1
a38     cat1
a39     cat1
a4      cat2
a40     cat2
a41     cat2
a42     cat1
a43     cat1
a44     cat1
a45     cat2
a46     cat1
a47     cat2
a48     cat2
a49     cat2
a5      cat2
a50     cat1
a6      cat2
a7      cat2
a8      cat2
a9      cat1
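
The assignments above follow the MR-Sort rule: with two categories, an alternative goes to cat2 when the criteria on which it is at least as good as the profile b1 gather a total weight of at least λ. As a check, the sketch below re-applies this rule to two alternatives using the rounded values displayed earlier, and shows how a decreasing direction on c1 would change the outcome. `mrsort_assign` is our own helper, not an oso-pymcda function; because the displayed values are rounded, alternatives lying very close to the profile or to the threshold may not be reproduced exactly.

```python
def mrsort_assign(alternative, profile, weights, lbda, directions):
    """Two-category MR-Sort rule.

    Returns 'cat2' when the criteria on which the alternative is at least as
    good as the limit profile gather a total weight >= lbda, else 'cat1'.
    directions[j] is +1 (increasing) or -1 (decreasing) for criterion j.
    """
    support = 0.0
    for x, b, w, d in zip(alternative, profile, weights, directions):
        at_least_as_good = x >= b if d == 1 else x <= b
        if at_least_as_good:
            support += w
    return "cat2" if support >= lbda else "cat1"


# rounded values displayed above for the ground truth model
weights = [0.373, 0.219, 0.224, 0.07, 0.114]
lbda = 0.656
b1 = [0.286, 0.796, 0.28, 0.259, 0.594]
increasing = [1, 1, 1, 1, 1]

a1 = [0.399, 0.685, 0.802, 0.755, 0.392]  # at least as good on c1, c3, c4: 0.667
a15 = [0.72, 0.578, 0.548, 0.019, 0.54]   # at least as good on c1, c3: 0.597
print(mrsort_assign(a1, b1, weights, lbda, increasing))   # cat2
print(mrsort_assign(a15, b1, weights, lbda, increasing))  # cat1
# with a decreasing direction on c1, a1 only gathers 0.224 + 0.07 = 0.294
print(mrsort_assign(a1, b1, weights, lbda, [-1, 1, 1, 1, 1]))  # cat1
```

Both results match the displayed assignments of a1 and a15.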


#### Step 5: run the MRSort metaheuristic learning algorithm

The following step performs one run of the metaheuristic algorithm, which learns a model from the generated learning set (performance table and assignments of alternatives).

In [13]:
inst.num_model = 0  # index of the current run
execution_time = inst.run_mrsort()
print("Time (s) : %f" % execution_time)  # computation time of the run

Traceback (most recent call last):
  File "/Users/pegdwendeminoungou/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
  File "<ipython-input-13-a797cd8df882>", line 2, in <module>
    execution_time = inst.run_mrsort()
  File "/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook/oso-pymcda/apps/random_model_generation_msjp.py", line 357, in run_mrsort
  File "/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook/oso-pymcda/apps/../pymcda/learning/meta_mrsortvc4_impl8.py", line 481, in optimize
  File "/Users/pegdwendeminoungou/anaconda3/lib/python3.7/multiprocessing/context.py", line 102, in Queue
  File "/Users/pegdwendeminoungou/anaconda3/lib/python3.7/multiprocessing/queues.py", line 41, in __init__
  File "/Users/pegdwendeminoungou/anaconda3/lib/python3.7/multiprocessing/connection.py", line 517, in Pipe
OSError: [Errno 24] Too many open files
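
The run above stopped with `OSError: [Errno 24] Too many open files`. The traceback points to a multiprocessing `Queue` being created in `optimize` (meta_mrsortvc4_impl8.py), so the process apparently ran out of file descriptors, whose default per-process limit can be quite low on macOS. A possible workaround, which we have not verified against this code, is to raise the soft limit with Python's standard `resource` module (POSIX only) before calling `inst.run_mrsort()` again, or to restart the kernel. If descriptors are leaked at each call, this only delays the problem. The helper name `raise_open_file_limit` and the target value 4096 are our own choices.

```python
import resource  # standard library, POSIX only (macOS, Linux)


def raise_open_file_limit(target=4096):
    """Raise the soft limit on open files towards `target`.

    The soft limit is never lowered and never set above the hard limit.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print("open-file limits (soft, hard):", raise_open_file_limit())
```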


We display the parameters of the learned model:

In [14]:
inst.model2.bpt.display()
inst.model2.cv.display()
print("lambda\t%.7s" % inst.model2.lbda) 

Traceback (most recent call last):
  File "/Users/pegdwendeminoungou/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
  File "<ipython-input-14-3f58ae61fc88>", line 1, in <module>
    inst.model2.bpt.display()
AttributeError: 'RandMRSortLearning' object has no attribute 'model2'


We also display the learned preference directions: (+) for an increasing and (-) for a decreasing preference direction.

In [None]:
print(list(inst.model2.criteria))

#### Step 6 : validate the learning of the random model

We can compute the classification accuracy (CA) of the learned model on the learning set in order to validate it.

In [None]:
ca_v,cag_v = inst.eval_model_validation() # calculating the validation rate
print("validation rate : %f" % ca_v)
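
The validation rate is a classification accuracy: the share of alternatives of the learning set that the learned model assigns to the same category as the ground truth model. Here is a minimal sketch on hypothetical assignments stored as plain dictionaries; `classification_accuracy` is our own name, the oso-pymcda objects `inst.aa` and `inst.aa_learned` are richer than dictionaries, and the second value returned by `eval_model_validation` (`cag_v`) is not covered here.

```python
def classification_accuracy(reference, learned):
    """Share of alternatives that the learned model puts in the same
    category as the reference assignment (dicts: alternative -> category)."""
    if set(reference) != set(learned):
        raise ValueError("both assignments must cover the same alternatives")
    agree = sum(1 for a in reference if reference[a] == learned[a])
    return agree / len(reference)


reference = {"a1": "cat2", "a2": "cat2", "a3": "cat1", "a4": "cat2"}
learned = {"a1": "cat2", "a2": "cat1", "a3": "cat1", "a4": "cat2"}
print(classification_accuracy(reference, learned))  # 0.75
```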

We can also draw the confusion matrix of the validation phase :

In [None]:
matrix = compute_confusion_matrix(inst.aa, inst.aa_learned, inst.model.categories) # construction of the confusion matrix
print_confusion_matrix(matrix, inst.model.categories) # printing the confusion matrix
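
`compute_confusion_matrix` and `print_confusion_matrix` come from the code loaded at the beginning. To show what such a matrix contains, here is an independent plain-Python sketch on hypothetical assignments. We assume rows are the categories of the reference (ground truth) assignment and columns those of the learned model; the layout used by the library may differ.

```python
from collections import Counter


def confusion_matrix(reference, learned, categories):
    """matrix[i][j] = number of alternatives of reference category i that
    the learned model assigns to category j."""
    counts = Counter((reference[a], learned[a]) for a in reference)
    return [[counts[(ci, cj)] for cj in categories] for ci in categories]


reference = {"a1": "cat2", "a2": "cat2", "a3": "cat1", "a4": "cat2"}
learned = {"a1": "cat2", "a2": "cat1", "a3": "cat1", "a4": "cat2"}
for row in confusion_matrix(reference, learned, ["cat1", "cat2"]):
    print(row)
# [1, 0]
# [1, 2]
```

The diagonal holds the correctly assigned alternatives, so the CA is the trace divided by the total count (here 3 / 4).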

#### Step 7 : test the learned model on a benchmark of alternatives

Analogously, we can compute the CA of the learned model on a test set (of *nb_tests* alternatives).

In [None]:
ao_tests,al_tests,ca_t,cag_t = inst.eval_model_test()
print("test rate : %f" % ca_t)

#### Step 8 : show the important results

To obtain generalized statistics, we need to run the learning algorithm several times, yielding *nb_models* learned models. To do so, we can simply execute:

In [16]:
inst.run_mrsort_all_models()

Traceback (most recent call last):
  File "/Users/pegdwendeminoungou/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
  File "<ipython-input-16-58331af88069>", line 1, in <module>
    inst.run_mrsort_all_models()
  File "/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook/oso-pymcda/apps/random_model_generation_msjp.py", line 616, in run_mrsort_all_models
  File "/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook/oso-pymcda/apps/random_model_generation_msjp.py", line 723, in report_stats_parameters_csv
OSError: [Errno 24] Too many open files: '/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook/rand_valid_test_na50_nca2_ncr5-0_dupl1//valid_test_dupl_meta_mrsort3-rand-50-2-5-1-20200427-215703.csv'


In [None]:
DATADIR

As a result, all the tests are done, and a csv file summarizing the tests and giving details on each one has also been generated. For the parameters used here, this file is located in the directory *rand_valid_test_na50_nca2_ncr5-0_dupl1*, visible from the root directory of this notebook. The file name begins with "valid_test_dupl".

A second csv file contains more compact data, which makes it easier to draw the various plots. This file is generated with the command:

In [None]:
inst.report_plot_results_csv()

It yields a csv file whose name begins with "plot_results", in the same directory as the previous file.
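
To locate these output files from the notebook, a small helper can search the `rand_valid_test_*` directories under *DATADIR* for a given file-name prefix. This is a sketch: `find_result_files` and `read_header` are our own names, and we assume the files are comma-separated with a header row, which we have not checked.

```python
import csv
import glob
import os


def find_result_files(datadir, prefix):
    """csv files starting with `prefix` in the rand_valid_test_* directories
    under `datadir`, oldest first (by modification time)."""
    pattern = os.path.join(datadir, "rand_valid_test_*", prefix + "*.csv")
    return sorted(glob.glob(pattern), key=os.path.getmtime)


def read_header(path, delimiter=","):
    """Return the first row of a csv file (assumed to be its header)."""
    with open(path, newline="") as f:
        return next(csv.reader(f, delimiter=delimiter), [])
```

For instance, `find_result_files(os.environ["DATADIR"], "plot_results")[-1]` should give the most recent plot file.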

The last function of this section outputs an instance of the learning problem (criteria, categories, performance table and assignments, all encoded in a custom syntax):

In [None]:
inst.build_osomcda_instance_random()

This output file is written to the same directory as the previous files.

In [None]:
print("CA (validation) : ", inst.stats_cav)
print("CA (generalization) : ", inst.stats_cag)
print("CA (preference direction) : ", inst.stats_capd)
print("Execution time (seconds) : ", inst.stats_time)
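
These statistics can be summarized over the *nb_models* runs. The sketch below uses Python's standard `statistics` module and assumes that attributes such as `inst.stats_cav` are sequences with one value per learned model (we have not checked their exact type). `summarize` is our own helper and the values in the example are hypothetical.

```python
import statistics


def summarize(name, values):
    """Mean, standard deviation and range of a per-run statistic."""
    values = list(values)
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return "%s: mean %.3f, sd %.3f, min %.3f, max %.3f" % (
        name, statistics.mean(values), sd, min(values), max(values))


print(summarize("CA (validation)", [0.96, 0.98, 1.0, 0.94, 0.98]))
```

With the real results, one would call for instance `summarize("CA (validation)", inst.stats_cav)`.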