Implementation of the MC4 and MCT rank aggregation algorithms in Python
This project implements two popular rank aggregation algorithms, Markov Chain Type 4 (MC4) and MCT. In machine learning and many other scientific problems, items often need to be ranked according to some criterion. Different ranking schemes, however, order the items by different preference criteria, so the rankings they produce may differ greatly.
A rank aggregation technique is therefore used to combine the individual rank lists into a single aggregated ranking. Of the many rank aggregation algorithms available, MC4 and MCT are two of the best known.
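To make the idea concrete, here is a minimal pure-Python sketch of MC4 as it is usually described in the rank aggregation literature: from the current item, pick another item uniformly at random and move to it only if a majority of the input lists rank it better. This is not the package's own code. The function name mc4_sketch is hypothetical, the parameter names merely mirror the package's documented options, and the way erg_number is mixed into the transition matrix (a PageRank-style uniform jump) is an assumption.

```python
def mc4_sketch(rank_lists, erg_number=0.15, precision=1e-7, iterations=200):
    """rank_lists[k][i] is the rank that list k assigns to item i (1 = best)."""
    n_lists = len(rank_lists)
    n = len(rank_lists[0])

    # Transition matrix: from item i, pick item j uniformly at random and
    # move there only if a strict majority of lists rank j better than i.
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            votes = sum(1 for ranks in rank_lists if ranks[j] < ranks[i])
            if votes > n_lists / 2:
                P[i][j] = 1.0 / n
        P[i][i] = 1.0 - sum(P[i])  # remaining probability: stay at item i

    # Assumed ergodic adjustment: mix in a uniform jump with weight erg_number.
    P = [[(1 - erg_number) * p + erg_number / n for p in row] for row in P]

    # Power iteration towards the stationary distribution.
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt
        if change < precision:
            break

    # Items with a larger stationary probability get a better (smaller) rank.
    order = sorted(range(n), key=lambda i: -pi[i])
    ranks = [0] * n
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks, pi


# Three hypothetical rank lists over three items.
ranks, pi = mc4_sketch([[1, 2, 3], [2, 1, 3], [1, 3, 2]])
print(ranks)  # [1, 2, 3]
```

In this toy input a majority of lists prefers item 0 over item 1 and item 1 over item 2, so the chain spends most of its time at item 0 and the aggregated ranking follows that order.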
Links to the original contents
For the latest release, pip install mc4
For a specific release, pip install mc4=={version}, such as pip install mc4==1.0.0
Using this package is very easy.
- Prepare a dataset containing the ranks of all the items as provided by the different algorithms. See here for sample datasets and more info.
- Use the following lines of code to run the package. Make sure to pass arguments that match your dataset; otherwise the results will be incorrect.
from mc4.algorithm import mc4_aggregator
import pandas as pd
# Method 1
aggregated_ranks = mc4_aggregator('test_dataset_1.csv', header_row = 0, index_col = 0)
# or Method 2
df = pd.read_csv('test_dataset_1.csv', header = 0, index_col = 0)
aggregated_ranks = mc4_aggregator(df, header_row = 0, index_col = 0)
print(aggregated_ranks)
Here, test_dataset_1.csv is a sample dataset containing ranks of different items provided by different algorithms.
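If you do not have a dataset yet, the sketch below writes a small CSV that has a header row and an index column, matching the header_row = 0 and index_col = 0 arguments used above. The file name, item names, algorithm names and ranks are all made up. The layout (one row per algorithm, one column per item) is an assumption, so check the sample datasets to see whether it corresponds to the row or the column order.

```python
import csv
import os
import tempfile

# Hypothetical data: three ranking algorithms, four items (1 = best).
items = ["item_a", "item_b", "item_c", "item_d"]
rankings = {
    "algo_1": [1, 2, 3, 4],
    "algo_2": [2, 1, 3, 4],
    "algo_3": [1, 3, 2, 4],
}


def write_rank_csv(path):
    """Write the ranks with a header row (row 0) and an index column (column 0)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["algorithm"] + items)
        for name, ranks in rankings.items():
            writer.writerow([name] + ranks)


path = os.path.join(tempfile.mkdtemp(), "my_ranks.csv")
write_rank_csv(path)
print(path)
```

The resulting path can then be passed to mc4_aggregator in place of test_dataset_1.csv.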
mc4_aggregator takes some mandatory and optional arguments:
- algo (string): algorithm for rank aggregation, mc4 or mct; default is mc4
- order (string): order of the dataset, row or column; default is row. More on this, here.
- header_row (int or None): row number of the dataset containing the header; default is None
- index_col (int or None): column number of the dataset containing the index; default is None
- precision (float): acceptable error margin for convergence; default is 1e-07
- iterations (int): number of iterations to reach stationary distribution; default is 200
- erg_number (float): small, positive number used to calculate ergodic transition matrix; default is 0.15
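The following sketch shows one way to keep the documented defaults in a single place and override only what differs for a given dataset. The helper names build_options and aggregate are hypothetical. It also assumes that mc4_aggregator accepts every option listed above as a keyword argument, in the same way as header_row and index_col in the earlier example.

```python
# Defaults as documented in the argument list for mc4_aggregator.
DOCUMENTED_DEFAULTS = {
    "algo": "mc4",
    "order": "row",
    "header_row": None,
    "index_col": None,
    "precision": 1e-07,
    "iterations": 200,
    "erg_number": 0.15,
}


def build_options(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


def aggregate(dataset, **overrides):
    """Call mc4_aggregator with the merged options (requires pip install mc4)."""
    # Imported lazily so build_options can be used without the package installed.
    from mc4.algorithm import mc4_aggregator
    return mc4_aggregator(dataset, **build_options(**overrides))


options = build_options(algo="mct", header_row=0, index_col=0)
print(options)
```

With the package installed, aggregate('test_dataset_1.csv', algo='mct', header_row=0, index_col=0) would forward those options to mc4_aggregator.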
You can use this package directly from the command line if you already have the dataset prepared.
- To get help and usage details, use -h or --help
  ~$ mc4_aggregator -h
- Use with default settings,
  ~$ mc4_aggregator dataset.csv
- Specify the algorithm for rank aggregation using -a or --algo, options: mc4 or mct, default is mc4
  ~$ mc4_aggregator dataset.csv -a mct
- Specify order using -o or --order, options: row or column, default is row
  ~$ mc4_aggregator dataset.csv -o column
- Specify header row using -hr or --header_row, default is None
  ~$ mc4_aggregator dataset.csv -hr 0
- Specify index column using -ic or --index_col, default is None
  ~$ mc4_aggregator dataset.csv -ic 0
- Specify precision using -p or --precision, default is 1e-07
  ~$ mc4_aggregator dataset.csv -p 0.000001
- Specify iterations using -i or --iterations, default is 200
  ~$ mc4_aggregator dataset.csv -i 300
- Specify ergodic number using -e or --erg_number, default is 0.15
  ~$ mc4_aggregator dataset.csv -e 0.20
- All together,
  ~$ mc4_aggregator dataset.csv -a mct -o column -hr 0 -ic 0 -p 0.000001 -i 300 -e 0.20
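To see how these flags line up with the arguments of mc4_aggregator, here is a hypothetical reconstruction of the command line interface using argparse. It is only a sketch built from the flags and defaults listed above; the package's actual parser may be written differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="mc4_aggregator")
    parser.add_argument("dataset")
    parser.add_argument("-a", "--algo", choices=["mc4", "mct"], default="mc4")
    parser.add_argument("-o", "--order", choices=["row", "column"], default="row")
    parser.add_argument("-hr", "--header_row", type=int, default=None)
    parser.add_argument("-ic", "--index_col", type=int, default=None)
    parser.add_argument("-p", "--precision", type=float, default=1e-07)
    parser.add_argument("-i", "--iterations", type=int, default=200)
    parser.add_argument("-e", "--erg_number", type=float, default=0.15)
    return parser


# Parse the "all together" command from the list above.
command = "dataset.csv -a mct -o column -hr 0 -ic 0 -p 0.000001 -i 300 -e 0.20"
args = build_parser().parse_args(command.split())
print(vars(args))
```

Each parsed attribute carries the same name as the corresponding mc4_aggregator argument, so the parsed values could be forwarded to the function as keyword arguments.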
The output of mc4_aggregator is a dictionary containing itemwise ranks. In the absence of item names, items are represented by integers.
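As a small illustration of working with such a dictionary, the sketch below sorts items from best to worst. The dictionary contents are made up, and it assumes that keys are item names and values are ranks with 1 as the best.

```python
# Hypothetical result: item name -> aggregated rank (1 = best).
aggregated_ranks = {"item_c": 3, "item_a": 1, "item_b": 2}

# Sort the items by their rank, best first.
ordered_items = sorted(aggregated_ranks, key=aggregated_ranks.get)

for item in ordered_items:
    print(aggregated_ranks[item], item)
```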
- For full documentation, please visit Documentation Wiki
- To report issues or request a feature, please visit Issues
- Follow author, Ayan Kumar Saha - GitHub | LinkedIn