# Learning the parameters of an MRSort model using the metaheuristic algorithm (oso-pymcda)

## Introduction

The metaheuristic code used here comes from GitHub (https://github.com/oso/pymcda) and was developed by Olivier Sobrie. On top of this code, we mainly added two adaptations: learning an MRSort model from a "duplicated" data set, and generating statistical plots of the learning results. The code, originally written in Python 2, has been partially ported to Python 3. It is located in the "oso-pymcda/" directory, next to this notebook.

## Settings

Before digging into the code, a few requirements must be met:
   * The matplotlib library (which provides matplotlib.pyplot) needs to be installed. This can be done with the command below, preferably using pip (pip itself can be installed by following the instructions at https://pip.pypa.io/en/stable/installing/):

In [1]:
pip install matplotlib

Note: you may need to restart the kernel to use updated packages.


   * Set up the Python API of CPLEX by following the steps at https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.cplex.help/CPLEX/GettingStarted/topics/set_up/Python_setup.html. The CPLEX Python API manual (https://www.ibm.com/support/knowledgecenter/SSSA5P_12.9.0/ilog.odms.cplex.help/CPLEX/Python/topics/cplex_python_overview.html) can also help.
   

We need to set the environment variable *DATADIR* to the absolute path of this working directory, **MRSort-jupyter-notebook**. Here is an example:

In [2]:
%env DATADIR /Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook

env: DATADIR=/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook
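`%env` sets the variable in the environment of the running kernel, so Python code can read it back through `os.environ`. A minimal sketch, assuming you want a clear error when the variable has not been set yet (the helper `get_datadir` is hypothetical, not part of oso-pymcda):

```python
import os


def get_datadir():
    """Return the DATADIR environment variable, failing loudly if it is unset."""
    datadir = os.environ.get("DATADIR")
    if not datadir:
        raise RuntimeError("DATADIR is not set; run `%env DATADIR <path>` first")
    return datadir


# Example: build a sub-directory path without hard-coding separators
# plots_dir = os.path.join(get_datadir(), "learning_results_plots")
```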


## Description of the code


The code consists of 2 main parts:
   * the first component generates, learns and tests a parameterized MRSort model (one run of the learning algorithm followed by tests),
   * the second component compiles series of parameterized MRSort runs and outputs useful statistical plots.

### The first component

First, we move to our working environment and run the main file so that its functions and classes are loaded into the notebook's namespace:

In [3]:
run oso-pymcda/apps/random_model_generation_msjp.py

Second, to achieve our goal we follow these steps:
   * <u>Step 1</u> : initialize the required parameters,
   * <u>Step 2</u> : generate a new random MRSort model (profile, weights, threshold),
   * <u>Step 3</u> : randomly generate a set of alternatives and their performance table,
   * <u>Step 4</u> : assign categories to these alternatives to obtain a learning set,
   * <u>Step 5</u> : run the MRSort metaheuristic learning algorithm,
   * <u>Step 6</u> : validate the learning of the random model (share of the learning set classified identically by the initial model and the learned model),
   * <u>Step 7</u> : test the learned model on a benchmark of example alternatives,
   * <u>Step 8</u> : show the important results (also summarized in a csv file).
   
   

#### Step 1 : initialize the required parameters

Here, we initialize the parameters of one run of the learning algorithm. They are:
   * *nb_categories* : the number of categories (classes),
   * *nb_criteria* : the number of criteria of the MCDA problem,
   * *nb_alternatives* : the number of alternatives in the learning set,
   * *dir_criteria* : the list of preference directions of the criteria (1 for a criterion to maximize),
   * *l_dupl_criteria* : the list of indices of the criteria to duplicate during the learning process,
   * *nb_tests* : the number of test alternatives used to compare the learned model with the initial model,
   * *nb_models* : the number of models learned independently during one run of the learning algorithm,
   * *meta_l* : the number of iterations of the metaheuristic algorithm (outer loop),
   * *meta_ll* : the number of iterations of the metaheuristic algorithm (inner loop),
   * *meta_nb_models* : the number of models (population size) handled by the metaheuristic (evolutionary) algorithm during the learning process.

In [4]:
nb_categories = 2 # fixed
nb_criteria = 5
nb_alternatives = 100
dir_criteria = [1]*nb_criteria # fixed to 1 for all criteria
l_dupl_criteria = list(range(nb_criteria))[:1]

# parameters of test
nb_tests = 10000
nb_models = 10

# parameters of the metaheuristic MRSort
meta_l = 10
meta_ll = 10
meta_nb_models = 10
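With `l_dupl_criteria = list(range(nb_criteria))[:1]` the list is `[0]`: only the first criterion, c1, is duplicated, and slicing with `[:k]` would select the first k criteria instead. In the learned model displayed in Step 5, the duplicate of c1 shows up as an extra column `c1d` right after c1. The sketch below reproduces that naming for any selection; the helper `expanded_criteria` is hypothetical and the `d` suffix is inferred from that output, not taken from the oso-pymcda source.

```python
def expanded_criteria(nb_criteria, l_dupl_criteria):
    """List criterion names after duplication (c1 -> c1, c1d).

    Indices in l_dupl_criteria are 0-based, like list(range(nb_criteria)).
    """
    names = []
    for i in range(nb_criteria):
        names.append("c%d" % (i + 1))
        if i in l_dupl_criteria:
            # assumed naming convention for the duplicated criterion
            names.append("c%dd" % (i + 1))
    return names


print(expanded_criteria(5, list(range(5))[:1]))  # ['c1', 'c1d', 'c2', 'c3', 'c4', 'c5']
```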

Now we can create an instance representing one run of the learning algorithm:

In [5]:
inst = RandMRSortLearning(nb_alternatives, nb_categories, 
        nb_criteria, dir_criteria, l_dupl_criteria, 
        nb_tests, nb_models, meta_l, meta_ll, meta_nb_models)

#### Steps 2 to 4 : generate a new random MRSort model, alternatives and assignments

Here, three steps are performed one after the other by the same function: we generate a new random MRSort model, then we generate alternatives, and finally we assign these alternatives to the 2 categories according to the MRSort rule of the generated model. In addition, a coefficient lets us control the balance between the categories, i.e. how many alternatives are assigned to each of them.

In [6]:
inst.generate_random_instance()

We can have a look at the model that has been generated:
   * generated parameters of the model MRSort

In [7]:
inst.model.bpt.display() # display the limit profile b1 of the random model

      c1    c2    c3    c4    c5 
b1 0.148 0.541 0.352 0.269 0.061 


In [8]:
inst.model.cv.display() # display the weights w of the criteria of the model

     c1    c2    c3    c4    c5 
w 0.319 0.087 0.012 0.478 0.103 


In [9]:
print("lambda\t%.7s" % inst.model.lbda) 

lambda	0.867


   * performance table of generated alternatives

In [10]:
inst.pt.display()

        c1    c2    c3    c4    c5 
a1   0.697 0.274 0.829  0.97 0.332 
a10  0.507 0.982 0.895  0.51 0.524 
a100 0.887 0.547 0.084 0.113 0.723 
a11  0.203 0.054 0.091 0.432 0.496 
a12   0.92 0.494 0.935 0.893 0.971 
a13  0.467 0.672 0.482 0.113 0.092 
a14  0.147 0.599 0.544 0.892 0.838 
a15  0.589 0.024 0.399 0.285 0.903 
a16   0.75 0.526 0.824 0.076 0.471 
a17  0.611 0.678 0.295 0.295 0.662 
a18  0.515 0.503 0.695 0.441 0.726 
a19  0.815 0.045  0.89 0.948 0.713 
a2   0.183 0.495 0.428 0.644 0.137 
a20  0.387 0.882 0.026 0.962 0.899 
a21  0.429 0.753 0.298   0.7 0.051 
a22  0.727 0.533  0.94 0.262 0.452 
a23  0.064 0.617 0.746  0.09 0.459 
a24   0.47 0.717 0.323 0.557 0.453 
a25  0.082 0.866 0.863 0.224 0.299 
a26  0.968 0.491 0.028 0.257 0.449 
a27  0.954 0.415 0.747  0.74 0.466 
a28  0.138 0.902 0.399 0.496 0.702 
a29  0.391 0.904 0.788 0.394 0.408 
a3   0.952 0.723 0.052 0.119 0.766 
a30  0.253 0.342 0.033  0.77 0.407 
a31  0.681 0.382 0.488  0.52 0.377 
a32  0.105 0.927 0.218 0.739

   * the result of the assignment of alternatives

In [11]:
inst.aa.display()

     category
a1       cat1
a10      cat1
a100     cat2
a11      cat1
a12      cat1
a13      cat2
a14      cat2
a15      cat1
a16      cat2
a17      cat1
a18      cat1
a19      cat1
a2       cat1
a20      cat1
a21      cat1
a22      cat2
a23      cat2
a24      cat1
a25      cat2
a26      cat2
a27      cat1
a28      cat2
a29      cat1
a3       cat2
a30      cat1
a31      cat1
a32      cat2
a33      cat1
a34      cat2
a35      cat1
a36      cat1
a37      cat2
a38      cat1
a39      cat1
a4       cat1
a40      cat2
a41      cat2
a42      cat1
a43      cat2
a44      cat2
a45      cat1
a46      cat1
a47      cat1
a48      cat1
a49      cat1
a5       cat2
a50      cat1
a51      cat2
a52      cat1
a53      cat2
a54      cat1
a55      cat2
a56      cat2
a57      cat2
a58      cat1
a59      cat1
a6       cat2
a60      cat2
a61      cat1
a62      cat1
a63      cat2
a64      cat2
a65      cat1
a66      cat1
a67      cat2
a68      cat1
a69      cat1
a7       cat2
a70      cat1
a71      cat1
a72   
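The assignments above follow the MRSort rule: with a single limit profile b1, an alternative goes to the upper category when the total weight of the criteria on which it is at least as good as b1 reaches the majority threshold lambda, and to the lower category otherwise. A minimal sketch of this two-category rule, assuming all criteria are maximized and using the rounded values displayed above (the function name `mrsort_assign` is ours, not oso-pymcda's):

```python
def mrsort_assign(performances, profile, weights, lbda):
    """Two-category MRSort rule (all criteria maximized).

    Returns "upper" when the total weight of the criteria on which the
    alternative is at least as good as the profile reaches lbda, else "lower".
    """
    support = sum(w for x, b, w in zip(performances, profile, weights) if x >= b)
    return "upper" if support >= lbda else "lower"


# Rounded parameters of the random model displayed above (criteria c1..c5)
b1 = [0.148, 0.541, 0.352, 0.269, 0.061]
w = [0.319, 0.087, 0.012, 0.478, 0.103]
lbda = 0.867

# A few rows of the displayed performance table
alternatives = {
    "a1": [0.697, 0.274, 0.829, 0.97, 0.332],
    "a13": [0.467, 0.672, 0.482, 0.113, 0.092],
    "a100": [0.887, 0.547, 0.084, 0.113, 0.723],
}

for name, perf in alternatives.items():
    print(name, mrsort_assign(perf, b1, w, lbda))
```

With these values the sketch returns `upper` for a1 (support 0.912) and `lower` for a13 (0.521) and a100 (0.509), matching the cat1/cat2 split shown above, which suggests that cat1 is the upper category in this run.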

#### Step 5: run the MRSort metaheuristic learning algorithm

This step performs one run of the metaheuristic algorithm for a single model: from the previous learning set (performance table and assignments of the alternatives), it learns a model that reproduces the initial model as closely as possible.

In [12]:
inst.num_model = 0 # initialization of the position of the model that is currently learning
execution_time = inst.run_mrsort()
print("Time (s) : %f" % execution_time) # computational time of the running

Time (s) : 0.097778
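To build intuition for how `meta_l` (outer iterations), `meta_ll` (inner iterations) and `meta_nb_models` (population size) can interact, here is a toy population-based random search for a two-category MRSort model. It is a stand-in written for illustration only, not the algorithm implemented in oso-pymcda: the local moves, the lambda range [0.5, 1] and the "keep the better half, re-initialise the rest" policy are all assumptions of this sketch, and every function name is hypothetical.

```python
import random


def mrsort_assign(perf, profile, weights, lbda):
    """Two-category MRSort rule: 'upper' if the supporting weight reaches lbda."""
    support = sum(w for x, b, w in zip(perf, profile, weights) if x >= b)
    return "upper" if support >= lbda else "lower"


def normalized(values):
    total = sum(values)
    return [v / total for v in values]


def random_model(n_crit, rng):
    """Random (profile, weights, lambda); lambda drawn in [0.5, 1] (assumption)."""
    profile = [rng.random() for _ in range(n_crit)]
    weights = normalized([rng.random() for _ in range(n_crit)])
    return profile, weights, rng.uniform(0.5, 1.0)


def perturb(model, rng, step=0.05):
    """Random local move on every parameter, kept inside valid bounds."""
    profile, weights, lbda = model
    profile = [min(1.0, max(0.0, b + rng.uniform(-step, step))) for b in profile]
    weights = normalized([max(1e-6, w + rng.uniform(-step, step)) for w in weights])
    lbda = min(1.0, max(0.5, lbda + rng.uniform(-step, step)))
    return profile, weights, lbda


def accuracy(model, alternatives, labels):
    """Share of alternatives the model assigns to their given category."""
    hits = sum(mrsort_assign(a, *model) == y for a, y in zip(alternatives, labels))
    return hits / len(labels)


def toy_meta_learn(alternatives, labels, n_crit, meta_l=10, meta_ll=10,
                   meta_nb_models=10, seed=0):
    """Toy search; returns (best model, best accuracy after each outer iteration)."""
    rng = random.Random(seed)
    population = [random_model(n_crit, rng) for _ in range(meta_nb_models)]
    history = []
    for _ in range(meta_l):                      # outer loop
        scored = []
        for model in population:
            best, best_acc = model, accuracy(model, alternatives, labels)
            for _ in range(meta_ll):             # inner loop: local moves
                cand = perturb(best, rng)
                cand_acc = accuracy(cand, alternatives, labels)
                if cand_acc >= best_acc:         # keep non-worsening moves
                    best, best_acc = cand, cand_acc
            scored.append((best_acc, best))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        history.append(scored[0][0])
        # Keep the better half, re-initialise the rest at random
        keep = [m for _, m in scored[:max(1, meta_nb_models // 2)]]
        population = keep + [random_model(n_crit, rng)
                             for _ in range(meta_nb_models - len(keep))]
    return scored[0][1], history
```

Because the best model is always carried into the next outer iteration and only non-worsening moves are accepted, the best accuracy in `history` never decreases in this toy; this says nothing about how fast oso-pymcda's own algorithm converges.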


We display the parameters of the learned model:

In [13]:
inst.model2.bpt.display()
inst.model2.cv.display()
print("lambda\t%.7s" % inst.model2.lbda) 

        c1   c1d    c2      c3      c4    c5 
b1 0.14701 0.886 0.379 0.97901 0.26301 0.707 
      c1   c1d    c2    c3     c4    c5 
w 0.0001     0     0     0 0.9999     0 
lambda	1.0


#### Step 6 : validate the learning of the random model

We can compute the validation rate of the model on the learning set, i.e. the proportion of alternatives of the learning set that the learned model assigns to the same category as the original model:

In [14]:
ca_v,cag_v = inst.eval_model_validation() # calculating the validation rate
print("validation rate : %f" % ca_v)

validation rate : 1.000000


We can also print the confusion matrix of the validation phase:

In [15]:
matrix = compute_confusion_matrix(inst.aa, inst.aa_learned, inst.model.categories) # construction of the confusion matrix
print_confusion_matrix(matrix, inst.model.categories) # printing the confusion matrix

     cat2 cat1 
cat2   40    0 
cat1    0   60 
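The validation rate equals the sum of the diagonal of the confusion matrix divided by the number of alternatives; here (40 + 60) / 100 = 1.0. A plain-Python sketch of both computations on toy assignments, taking assignments as dictionaries from alternative id to category. This is not oso-pymcda's `compute_confusion_matrix` (whose return type may differ); in this sketch, rows are the original assignments and columns the learned ones.

```python
from collections import Counter


def confusion_matrix(expected, predicted, categories):
    """matrix[true][pred] = number of alternatives with that pair of categories."""
    counts = Counter((expected[a], predicted[a]) for a in expected)
    return {t: {p: counts[(t, p)] for p in categories} for t in categories}


def classification_rate(matrix):
    """Share of alternatives on the diagonal (same category in both assignments)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    return correct / total


# Toy assignments (hypothetical alternatives x1..x4)
original = {"x1": "cat1", "x2": "cat1", "x3": "cat2", "x4": "cat2"}
learned = {"x1": "cat1", "x2": "cat2", "x3": "cat2", "x4": "cat2"}
m = confusion_matrix(original, learned, ["cat2", "cat1"])
print(m)                       # {'cat2': {'cat2': 2, 'cat1': 0}, 'cat1': {'cat2': 1, 'cat1': 1}}
print(classification_rate(m))  # 0.75
```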


#### Step 7 : test the learned model on a benchmark of example alternatives

Analogously, we can compute the test rate, i.e. the proportion of alternatives of a test set (*nb_tests* alternatives) that the learned model assigns to the same category as the original model:

In [16]:
ao_tests,al_tests,ca_t,cag_t = inst.eval_model_test()
print("test rate : %f" % ca_t)

test rate : 0.973800
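Unlike the validation rate, the test rate is measured on fresh alternatives rather than on the learning set. A minimal Monte Carlo sketch of such a comparison, assuming test alternatives are drawn uniformly in [0, 1] on each criterion and that both models use the same criteria; the functions are hypothetical and do not reproduce oso-pymcda's own test-set generation or its handling of duplicated criteria.

```python
import random


def mrsort_assign(perf, profile, weights, lbda):
    """Two-category MRSort rule (criteria maximized): 'upper' or 'lower'."""
    support = sum(w for x, b, w in zip(perf, profile, weights) if x >= b)
    return "upper" if support >= lbda else "lower"


def agreement_rate(model_a, model_b, n_crit, nb_tests=10000, seed=0):
    """Share of nb_tests random alternatives (uniform in [0, 1] on each
    criterion) that both models assign to the same category."""
    rng = random.Random(seed)
    same = 0
    for _ in range(nb_tests):
        alt = [rng.random() for _ in range(n_crit)]
        same += mrsort_assign(alt, *model_a) == mrsort_assign(alt, *model_b)
    return same / nb_tests


# (profile, weights, lambda) of the random model displayed earlier (rounded)
original = ([0.148, 0.541, 0.352, 0.269, 0.061],
            [0.319, 0.087, 0.012, 0.478, 0.103], 0.867)
print(agreement_rate(original, original, n_crit=5))  # 1.0: a model always agrees with itself
```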


#### Step 8 : show the important results

To show the final results, we need to run all the tests: so far we have computed only one learned model, whereas one run of the learning algorithm yields *nb_models* learned models. All of them can be computed with a single call:

In [17]:
inst.run_mrsort_all_models()

In [18]:
DATADIR

'/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook'

As a result, all the tests are done, and a csv file summarizing the tests and detailing each of them has been generated. This file is located in the directory *rand_valid_test_na100_nca2_ncr5-0_dupl1*, inside the root directory of this notebook; its name begins with "valid_test_dupl".

A second csv file, containing more compact data that makes it easier to draw different plots, is generated with the command:

In [19]:
inst.report_plot_results_csv()

It produces a csv file whose name begins with "plot_results", in the same directory as the previous file.
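The result files are identified by a name prefix ("valid_test_dupl", "plot_results") rather than by a fixed name. A small standard-library sketch that lists the csv files with a given prefix in a result directory, most recently modified first; the helper `find_csv` is hypothetical and the directory layout is assumed to be the one described above.

```python
import glob
import os


def find_csv(result_dir, prefix):
    """Return paths of csv files in result_dir whose name starts with prefix,
    most recently modified first."""
    paths = glob.glob(os.path.join(result_dir, prefix + "*.csv"))
    return sorted(paths, key=os.path.getmtime, reverse=True)


# Example (adjust to your own DATADIR):
# result_dir = os.path.join(DATADIR, "rand_valid_test_na100_nca2_ncr5-0_dupl1")
# find_csv(result_dir, "valid_test_dupl")
# find_csv(result_dir, "plot_results")
```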

The last function of this section outputs a learning instance (criteria, categories, performance table and assignments), encoded in a custom syntax:

In [20]:
inst.build_osomcda_instance_random()

'/Users/pegdwendeminoungou/python_workspace/MRSort-jupyter-notebook/rand_valid_test_na100_nca2_ncr5-0_dupl1//osomcda_rand-100-2-5-1-20191015-164258.csv'

This output file is written to the same directory as the previous files.

### The second component

In this section, we treat what has been done in the previous section as a unit test, and we repeat it several times while varying different parameters.

A unit test is performed by the following call:

In [21]:
inst.learning_process()

To load the implementation of the functions used in this part, we run:

In [22]:
run oso-pymcda/apps/learning_random_models_results.py

At the beginning of the series of tests, some parameters must be set:

In [23]:
nb_categories = 2 #fixed
nb_criteria = 6

ticks_criteria = list(range(0,nb_criteria+1,2)) # ticks on the result plots: numbers of duplicated criteria
ticks_alternatives = list(range(50,200,50)) # ticks on the result plots: numbers of alternatives

nb_tests = 10000
nb_models = 10

#Parameters of the metaheuristic MRSort
meta_l = 10
meta_ll = 10
meta_nb_models = 10
directory = DATADIR
output_dir = DATADIR + "/learning_results_plots"

Four variables have not been explained yet:
   * *ticks_criteria* : the tick values for the number of duplicated criteria; each value is the number of duplicated criteria used in a unit test.
   * *ticks_alternatives* : the tick values for the number of alternatives; each value is the number of alternatives in the learning set of a unit test.
   * *directory* : the base working directory of the tests (here DATADIR).
   * *output_dir* : the directory where the comparative plots are written (here the "learning_results_plots" folder inside DATADIR).
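The log printed by `exec_all_tests` further below goes through the grid with the number of alternatives in the outer loop and the number of duplicated criteria in the inner loop. With the values set above, the grid contains 3 × 4 = 12 unit tests; this sketch only enumerates them in that order and is not the oso-pymcda implementation.

```python
nb_criteria = 6
ticks_criteria = list(range(0, nb_criteria + 1, 2))  # [0, 2, 4, 6]
ticks_alternatives = list(range(50, 200, 50))        # [50, 100, 150]

# Outer loop: number of alternatives; inner loop: number of duplicated criteria
grid = [(na, nd) for na in ticks_alternatives for nd in ticks_criteria]

for na, nd in grid:
    print(" ... unit test nb_alternatives = %d, nb_duplicated_criteria = %d" % (na, nd))
```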

We can now apply a function that computes a series of unit tests over the given ranges of parameter values.

We defined 3 functions to execute this batch of tests, each following a different experimental protocol:
   * **exec_all_tests** : each unit test uses a different random model and a different set of alternatives.
   * **exec_all_tests2** : each unit test uses the same random model (same parameters), but a different set of alternatives.
   * **exec_all_tests3** : each unit test uses the same random model, and the set of alternatives grows progressively. For instance, after running a unit test with <u>n</u> alternatives, this procedure keeps these alternatives and adds <u>n</u> new ones for the next unit test, resulting in a unit test with <u>2n</u> alternatives.
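The three protocols differ only in which parts of the data are regenerated between unit tests. The sketch below makes that difference concrete along the alternatives axis (duplicated criteria are left out). `random_model`, `random_alternatives` and `protocol_data` are hypothetical helpers, and for the third protocol the nested sets are obtained by taking growing prefixes of one pool of alternatives, which is one way of keeping the previous alternatives and adding new ones.

```python
import random


def random_model(n_crit, rng):
    """Hypothetical helper: random profile, normalized weights, lambda in [0.5, 1]."""
    raw = [rng.random() for _ in range(n_crit)]
    total = sum(raw)
    profile = [rng.random() for _ in range(n_crit)]
    return profile, [r / total for r in raw], rng.uniform(0.5, 1.0)


def random_alternatives(n, n_crit, rng):
    """Hypothetical helper: n alternatives with performances drawn in [0, 1]."""
    return [[rng.random() for _ in range(n_crit)] for _ in range(n)]


def protocol_data(protocol, ticks_alternatives, n_crit, seed=0):
    """Return one (model, alternatives) pair per tick for protocol 1, 2 or 3."""
    rng = random.Random(seed)
    shared_model = random_model(n_crit, rng)
    pool = random_alternatives(max(ticks_alternatives), n_crit, rng)
    runs = []
    for na in ticks_alternatives:
        if protocol == 1:    # like exec_all_tests: new model, new alternatives
            runs.append((random_model(n_crit, rng),
                         random_alternatives(na, n_crit, rng)))
        elif protocol == 2:  # like exec_all_tests2: same model, new alternatives
            runs.append((shared_model, random_alternatives(na, n_crit, rng)))
        else:                # like exec_all_tests3: same model, nested sets
            runs.append((shared_model, pool[:na]))
    return runs
```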

Let's construct an instance of *MRSortLearningResults* and then run ***exec_all_tests***:

In [24]:
tests_instance = MRSortLearningResults(directory, output_dir, nb_categories, nb_criteria,
                ticks_criteria, ticks_alternatives, nb_tests, nb_models, meta_l, meta_ll, meta_nb_models)

In [25]:
tests_instance.exec_all_tests()

 ... unit test nb_alternatives = 50, nb_duplicated_criteria = 0
 ... unit test nb_alternatives = 50, nb_duplicated_criteria = 2
 ... unit test nb_alternatives = 50, nb_duplicated_criteria = 4
 ... unit test nb_alternatives = 50, nb_duplicated_criteria = 6
 ... unit test nb_alternatives = 100, nb_duplicated_criteria = 0
 ... unit test nb_alternatives = 100, nb_duplicated_criteria = 2
 ... unit test nb_alternatives = 100, nb_duplicated_criteria = 4
 ... unit test nb_alternatives = 100, nb_duplicated_criteria = 6
 ... unit test nb_alternatives = 150, nb_duplicated_criteria = 0
 ... unit test nb_alternatives = 150, nb_duplicated_criteria = 2
 ... unit test nb_alternatives = 150, nb_duplicated_criteria = 4
 ... unit test nb_alternatives = 150, nb_duplicated_criteria = 6


The results are stored in several folders whose names begin with "rand_valid_test"; each of them contains the results of a single unit test.

Finally, once the results are available, we can run the program that draws the graphical representations:

In [26]:
tests_instance.plot_all_results()

The output is a folder named "learning_results_plots" containing different comparative plots.