<a href="https://colab.research.google.com/github/fani-lab/OpeNTF/blob/main/ipynb/fair.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

`OpeNTF-Fair` via [`Adila`](https://github.com/fani-lab/Adila)

<p align="center"><img src='https://github.com/fani-lab/OpeNTF/blob/main/docs/figs/adila_.png?raw=true' width="400" ></p>

With [`Adila`](https://github.com/fani-lab/Adila) integrated as a submodule in `OpeNTF`, fairness-aware reranking methods can rerank a model's final recommendation of experts for a team in order to debias `popularity` or `gender` disparities. To apply them, add `fair` to the command list (`cmd=[..., fair]`) and set the `fair.*` options in [`./src/__config__.yaml`](https://github.com/fani-lab/OpeNTF/blob/main/src/__config__.yaml#L95C1-L110C24).

  - **fair.**`fgender`, the file containing the ids of `female` members, who form the `minority` group and also the `protected` group.

  > In `Adila`, we keep the labels of the `minority` group only, for efficiency, as its members are very few relative to the majority group. The `protected` group, however, can be either the `minority` group, as with the `gender` protected attribute, or the `majority` group, as with the `popularity` protected attribute, where non-popular experts outnumber popular ones.

  - **fair.**`algorithm`, list of reranking algorithms from `det_greedy`, `det_cons`, `det_relaxed`, `det_const_sort`, `fa-ir`
  - **fair.**`notion`, list of notions of fairness, from equality of odds (`eo`), demographic parity (`dp`)
  - **fair.**`attribute`, list of protected attributes from `popularity`, `gender`
  - **fair.**`is_popular_alg`, how popularity is determined: `avg`, where any expert whose number of teams is above the average teams per expert is popular; `auc`, where experts in the head of the distribution are popular; or both
  - **fair.**`k_max`, cutoff for the reranking algorithms
  - **fair.**`alpha`, the significance level for the `fa-ir` algorithm
  - **fair.**`acceleration`, 'cpu' to use all CPU cores but one, or 'cpu:n' to use `n` cores

```
fair:
  fgender: ../data/dblp/toy.dblp.v12.json.females.csv
  algorithm: [fa-ir, det_greedy, det_relaxed, det_const_sort, det_cons]
  notion: [eo,dp]
  attribute: [gender, popularity]
  is_popular_alg: [avg, auc]
  k_max: 5
  alpha: 0.1
  acceleration: 'cpu:1'
```
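
To give a feel for what a deterministic reranker such as `det_greedy` does, here is a minimal sketch of the greedy procedure usually attributed to Geyik et al. (2019). It is not `Adila`'s implementation: the function name `det_greedy`, its arguments, and the toy expert ids are assumptions made for illustration. At each position `k`, a group that is still below `floor(k * p)` of its desired proportion `p` is served first; otherwise the best-scored candidate among groups below `ceil(k * p)` is taken.

```python
import math
from typing import Dict, List, Sequence, Tuple


def det_greedy(ranked: Sequence[Tuple[str, str]],
               desired: Dict[str, float],
               k_max: int) -> List[str]:
    """Rerank `ranked` so that each prefix roughly follows `desired`.

    ranked  : (expert_id, group) pairs sorted by descending model score
    desired : target proportion per group (hypothetical input, sums to 1);
              every group appearing in `ranked` must be a key of `desired`
    k_max   : number of positions to fill
    """
    # One queue per group, preserving the original score order.
    queues: Dict[str, List[str]] = {g: [] for g in desired}
    rank_of: Dict[str, int] = {}
    for pos, (expert, group) in enumerate(ranked):
        queues[group].append(expert)
        rank_of[expert] = pos

    counts = {g: 0 for g in desired}  # experts already placed per group
    heads = {g: 0 for g in desired}   # index of the next unused expert per group
    out: List[str] = []

    for k in range(1, min(k_max, len(ranked)) + 1):
        available = [g for g in desired if heads[g] < len(queues[g])]
        below_min = [g for g in available
                     if counts[g] < math.floor(k * desired[g])]
        below_max = [g for g in available
                     if math.floor(k * desired[g]) <= counts[g] < math.ceil(k * desired[g])]
        # Fall back to any non-exhausted group if neither set has candidates.
        candidates = below_min or below_max or available
        # Among eligible groups, take the one whose next expert was ranked highest.
        best = min(candidates, key=lambda g: rank_of[queues[g][heads[g]]])
        out.append(queues[best][heads[best]])
        heads[best] += 1
        counts[best] += 1
    return out


# Toy ranking: three majority experts lead the list, two minority experts trail.
toy = [("e1", "m"), ("e2", "m"), ("e3", "m"), ("e4", "f"), ("e5", "m"), ("e6", "f")]
reranked = det_greedy(toy, {"m": 0.5, "f": 0.5}, k_max=4)
```

With equal desired proportions, the sketch interleaves the two groups while keeping the original order within each group.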


---


To **evaluate** the fairness-accuracy trade-off, fairness and accuracy metrics are calculated `before` and `after` applying the reranking algorithms, based on `OpeNTF`'s evaluation settings in [`./src/__config__.yaml`](https://github.com/fani-lab/OpeNTF/blob/main/src/__config__.yaml#L84C1-L93C60):

- **eval.**`fair`, fairness measures `before` and `after` reranking using `ndkl` and `skew`
- **eval.**`topk`, cutoffs for measuring accuracy (utility) metrics
- **eval.**`trec`, measuring ranking-based accuracy (utility) metrics `before` and `after` reranking
- **eval.**`other`, measuring other utility metrics `before` and `after` reranking like `aucroc`, `skill_coverage`, ...
- **eval.**`per_epoch`, also apply rerankings on per-epoch predictions
- **eval.**`per_instance`, fairness and accuracy metrics for each test team, needed for paired significance tests

```
eval:
  fair: [ndkl, skew]
  topk: '2,5,10'
  trec: [P_topk, recall_topk, ndcg_cut_topk, map_cut_topk, success_topk]
  other: [skill_coverage_topk, aucroc]
  per_epoch: 10
  per_instance: True
```
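
As a rough guide to what `ndkl` and `skew` measure, the sketch below follows their common textbook definitions: skew is the log ratio of a group's observed to desired proportion in the top-k, and NDKL is the rank-discounted, normalized KL divergence between the observed and desired group distributions over every prefix of the ranking. The function names, the `eps` smoothing term, and the toy group labels are assumptions; `Adila`'s own implementation may differ in details such as cutoffs and smoothing.

```python
import math
from typing import Dict, Sequence


def skew_at_k(groups: Sequence[str], group: str, desired: Dict[str, float],
              k: int, eps: float = 1e-12) -> float:
    """ln(observed proportion / desired proportion) of `group` in the top-k."""
    top = groups[:k]
    observed = sum(1 for g in top if g == group) / len(top)
    # eps avoids log(0) when the group is absent from the top-k.
    return math.log((observed + eps) / (desired[group] + eps))


def ndkl(groups: Sequence[str], desired: Dict[str, float]) -> float:
    """Normalized discounted KL divergence over all prefixes of the ranking.

    Assumes every desired proportion is strictly positive.
    """
    total, norm = 0.0, 0.0
    for i in range(1, len(groups) + 1):
        top = groups[:i]
        kl = 0.0
        for g, q in desired.items():
            p = sum(1 for x in top if x == g) / i
            if p > 0:  # the term 0 * log(0 / q) is taken as 0
                kl += p * math.log(p / q)
        weight = 1.0 / math.log2(i + 1)  # earlier prefixes weigh more
        total += weight * kl
        norm += weight
    return total / norm


# Toy ranking of group labels, best-ranked expert first.
labels = ["m", "f"]
target = {"m": 0.5, "f": 0.5}
balanced_skew = skew_at_k(labels, "m", target, k=2)  # 0.0: top-2 matches the target
score = ndkl(labels, target)
```

A skew of zero means the group is represented exactly as desired, negative values mean under-representation, and NDKL is zero only when every prefix matches the desired distribution.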

---


**Sample Command**
```
python main.py "cmd=[prep,train,test,eval,fair]" \
               "models.instances=[mdl.rnd.Rnd]" \
               data.domain=cmn.publication.Publication \
               data.source=../data/dblp/toy.dblp.v12.json \
               data.output=../output/dblp/toy.dblp.v12.json \
               ~data.filter \
               fair.fgender=../data/dblp/toy.dblp.v12.json.females.csv
```




**Setup & Quickstart**

From `OpeNTF`'s [`quickstart`](https://colab.research.google.com/github/fani-lab/OpeNTF/blob/main/ipynb/quickstart.ipynb) script:


In [1]:
# set up python 3.8
!sudo apt-get update -y
!sudo apt-get install -y python3.8 python3.8-venv python3.8-distutils python3-pip
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 10
!python --version


In [2]:
# get OpeNTF and Adila submodule
!rm -R opentf/
!git clone --recurse-submodules https://github.com/Fani-Lab/opentf
!pip install --upgrade pip setuptools
!pip install -r opentf/requirements.txt

rm: cannot remove 'opentf/': No such file or directory
Cloning into 'opentf'...
remote: Enumerating objects: 27105, done.
remote: Counting objects: 100% (280/280), done.
remote: Compressing objects: 100% (213/213), done.
remote: Total 27105 (delta 125), reused 159 (delta 66), pack-reused 26825 (from 3)
Receiving objects: 100% (27105/27105), 1.32 GiB | 20.94 MiB/s, done.
Resolving deltas: 100% (13396/13396), done.
Updating files: 100% (4379/4379), done.
Submodule 'src/Adila' (https://github.com/fani-lab/Adila.git) registered for path 'src/Adila'
Cloning into '/content/opentf/src/Adila'...
remote: Enumerating objects: 1903, done.        
remote: Counting objects: 100% (155/155), done.        
remote: Compressing objects: 100% (93/93), done.        
remote: Total 1903 (delta 110), reused 69 (delta 62), pack-reused 1748 (from 1)        
Receiving objects: 100% (1903/1903), 17.64 MiB | 16.87 MiB/s, done.
Resolving deltas: 100% (1038/1038), done.
Submodule path 'src/Adila': check

Collecting hydra-core==1.3.2 (from -r opentf/requirements.txt (line 3))
  Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting scipy==1.10.1 (from -r opentf/requirements.txt (line 4))
  Downloading scipy-1.10.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
Collecting numpy==1.24.4 (from -r opentf/requirements.txt (line 5))
  Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Collecting omegaconf<2.4,>=2.2 (from hydra-core==1.3.2->-r opentf/requirements.txt (line 3))
  Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core==1.3.2->-r opentf/requirements.txt (line 3))
  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
  Preparing metadata (setup.py) ... done
Collecting packaging (from hydra-core==1.3.2->-r opentf/requirements.txt (line 3))
  Downloading packaging-26.0-py3-none-any.whl.metadata (3.3 kB)
Col

In [3]:
%cd opentf/src/
!python main.py  "cmd=[prep,train,test,eval,fair]" "models.instances=[mdl.rnd.Rnd]" data.domain=cmn.publication.Publication data.source=../data/dblp/toy.dblp.v12.json data.output=../output/dblp/toy.dblp.v12.json ~data.filter

/content/opentf/src
[2026-02-19 18:13:43,036][cmn.team][INFO] - Loading teamsvecs matrices from ../output/dblp/toy.dblp.v12.json/teamsvecs.pkl ...
[2026-02-19 18:13:43,039][pkgmgr][INFO] - tqdm not found.
[2026-02-19 18:13:43,039][pkgmgr][INFO] - Installing tqdm...
[2026-02-19 18:13:44,395][pkgmgr][INFO] - Collecting tqdm==4.65.0
  Downloading tqdm-4.65.0-py3-none-any.whl.metadata (56 kB)
Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Installing collected packages: tqdm
Successfully installed tqdm-4.65.0

[2026-02-19 18:13:44,400][cmn.team][INFO] - Loading indexes pickle from ../output/dblp/toy.dblp.v12.json/indexes.pkl ...
[2026-02-19 18:13:44,400][cmn.team][INFO] - Indexes pickle is loaded.
[2026-02-19 18:13:44,400][cmn.team][INFO] - Teamsvecs matrices and indexes for skills (31, 10), members (31, 13), and locations (31, 29) are loaded.
[2026-02-19 18:13:44,401][__main__][INFO] - Loading splits from ../output/dblp/toy.dblp.v12.json/splits.f3.r0.85.pkl ...
[2026-02-19 18:13:44,401][

**Output Folders and Files**

From `Adila`'s [`quickstart`](https://colab.research.google.com/github/fani-lab/Adila/blob/main/quickstart.ipynb) script:

- `adila/{popularity,gender}/stats.pkl`, the distribution of popular/non-popular, or male/female experts in the entire dataset
- `adila/{popularity,gender}/labels.csv`, the ids for popular or female experts. While the female ids are fixed and obtained from the `data.fgender` file, the popular ids depend on `is_popular_alg` and are calculated from the data distribution

- `adila/{popularity,gender}/{notion}`, the subfolder containing the results based on the fairness notion
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred`, the reranked version of the recommended members from `data.fpred`
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.fair.instance.csv`, the fairness metric values for `data.fpred` (before) and the reranked version (after) for each team instance in the test set
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.fair.mean.csv`, the average of the fairness metric values for `data.fpred` (before) and the reranked version (after) over all teams of the test set
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.utility.instance.csv`, the accuracy metric values for `data.fpred` (before) and the reranked version (after) for each team instance in the test set
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.utility.mean.csv`, the average of the accuracy metric values for `data.fpred` (before) and the reranked version (after) over all teams of the test set
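
The popular ids stored in `labels.csv` depend on `is_popular_alg`. As an illustration of the `avg` rule only, the sketch below labels an expert as popular when their number of teams is above the mean number of teams per expert. The function name `popular_by_avg` and the toy teams are hypothetical, and the `auc` rule is not covered here.

```python
from typing import Dict, Iterable, Set


def popular_by_avg(teams: Iterable[Iterable[str]]) -> Set[str]:
    """Return the experts whose team count is above the mean teams per expert."""
    counts: Dict[str, int] = {}
    for team in teams:
        for expert in set(team):  # count each expert once per team
            counts[expert] = counts.get(expert, 0) + 1
    if not counts:
        return set()
    avg = sum(counts.values()) / len(counts)
    return {expert for expert, c in counts.items() if c > avg}


# Toy data: team counts are a=3, b=2, c=2, d=1, so the average is 2.
toy_teams = [["a", "b"], ["a", "c"], ["a", "d"], ["b", "c"]]
popular = popular_by_avg(toy_teams)                       # minority group
non_popular = {e for t in toy_teams for e in t} - popular  # protected group
```

In this toy example only `a` is popular, which mirrors the point made earlier: for the `popularity` attribute, the stored labels are the (few) popular experts while the protected group is the non-popular majority.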

In [4]:
!find ../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/ -print | sed 's;[^/]*/;│   ;g;s;│   \([^│]\);├── \1;'

│   │   │   │   │   │   │   
│   │   │   │   │   │   ├── popularity.auc
│   │   │   │   │   │   │   ├── stats.pkl
│   │   │   │   │   │   │   ├── dp
│   │   │   │   │   │   │   │   ├── f0.test.pred.det_greedy.auc.5.rerank.pred.eval.fair.mean.csv
│   │   │   │   │   │   │   │   ├── f1.test.pred.det_relaxed.auc.5.rerank.pred
│   │   │   │   │   │   │   │   ├── f1.test.pred.det_const_sort.auc.5.rerank.pred.eval.fair.mean.csv
│   │   │   │   │   │   │   │   ├── f1.test.pred.det_const_sort.auc.5.rerank.pred.eval.utility.mean.csv
│   │   │   │   │   │   │   │   ├── f1.test.pred.det_cons.auc.5.rerank.pred.eval.fair.instance.csv
│   │   │   │   │   │   │   │   ├── f1.test.pred.det_relaxed.auc.5.rerank.pred.eval.utility.instance.csv
│   │   │   │   │   │   │   │   ├── f2.test.pred.det_cons.auc.5.rerank.pred.eval.utility.mean.csv
│   │   │   │   │   │   │   │   ├── f1.test.pred.det_const_sort.auc.5.rerank.pred.eval.utility.instance.csv
│   │   │   │   │   │   │   │   ├── f1.test.pred.fa-ir.auc.1
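
The same listing can be gathered from Python, which is convenient when comparing many algorithm and notion combinations. The following is a small sketch using only the standard library; the helper name `collect_results` and its default suffix are assumptions chosen to match the file names shown above.

```python
from pathlib import Path
from typing import Dict, List


def collect_results(root: str,
                    suffix: str = ".eval.fair.mean.csv") -> Dict[str, List[Path]]:
    """Group result files under `root` by their parent folder, relative to `root`."""
    base = Path(root)
    found: Dict[str, List[Path]] = {}
    # rglob walks all subfolders, e.g. popularity.auc/dp, gender/eo, ...
    for path in sorted(base.rglob(f"*{suffix}")):
        key = path.parent.relative_to(base).as_posix()
        found.setdefault(key, []).append(path)
    return found
```

For example, `collect_results('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/')` would return a dictionary keyed by subfolders such as `popularity.auc/dp`, each mapping to the averaged fairness files found there.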

**Average fairness measures**






In [7]:
import pandas as pd
pd.read_csv('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity.avg/eo/f0.test.pred.fa-ir.avg.10.5.rerank.pred.eval.fair.mean.csv', index_col=0)

metrics,mean.after,mean.before
ndkl,0.171038,0.171038
skew.majority,0.025318,0.025318
skew.minority,-0.262364,-0.262364


**Fairness measures per each test team**

In [8]:
import pandas as pd
pd.read_csv('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity.avg/eo/f0.test.pred.fa-ir.avg.10.5.rerank.pred.eval.fair.instance.csv')

Unnamed: 0,before.ndkl,after.ndkl,before.skew.minority,before.skew.majority,after.skew.minority,after.skew.majority
0,0.10035,0.10035,-0.262364,0.025318,-0.262364,0.025318
1,0.066075,0.066075,-0.262364,0.025318,-0.262364,0.025318
2,0.055013,0.055013,-0.262364,0.025318,-0.262364,0.025318
3,0.522018,0.522018,-0.262364,0.025318,-0.262364,0.025318
4,0.111735,0.111735,-0.262364,0.025318,-0.262364,0.025318
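
Per-instance files like this one are what a paired significance test needs. Below is a minimal sketch that computes the paired t statistic on the `before.ndkl` and `after.ndkl` columns using only the standard library. The helper `paired_t` is hypothetical, and the inline sample reuses the first three rows shown above; in practice one might instead pass the two columns to `scipy.stats.ttest_rel` to also obtain a p-value.

```python
import csv
import io
import math
from statistics import mean, stdev
from typing import List


def paired_t(before: List[float], after: List[float]) -> float:
    """t statistic of the paired differences after - before (df = n - 1).

    Needs at least two pairs. Returns nan when all differences are zero.
    """
    diffs = [a - b for a, b in zip(after, before)]
    sd = stdev(diffs)
    if sd == 0:
        # Constant differences: undefined if they are all zero, unbounded otherwise.
        m = mean(diffs)
        return float("nan") if m == 0 else math.copysign(math.inf, m)
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# First three rows of the per-instance output shown above.
SAMPLE = """,before.ndkl,after.ndkl
0,0.10035,0.10035
1,0.066075,0.066075
2,0.055013,0.055013
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
before = [float(r["before.ndkl"]) for r in rows]
after = [float(r["after.ndkl"]) for r in rows]
t_stat = paired_t(before, after)  # nan: reranking left every value unchanged
```

In this toy run the `before` and `after` values are identical, so there is no difference to test; on a larger dataset the statistic would be compared against a t distribution with `n - 1` degrees of freedom.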


**Average accuracy values (Utility)**


In [9]:
import pandas as pd
pd.read_csv('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity.avg/eo/f0.test.pred.fa-ir.avg.10.5.rerank.pred.eval.utility.mean.csv', index_col = 0)

metric,mean.before,mean.after
P_2,0.2,0.2
P_5,0.12,0.12
P_10,0.14,0.14
recall_2,0.2,0.2
recall_5,0.3,0.3
recall_10,0.666667,0.666667
ndcg_cut_2,0.2,0.2
ndcg_cut_5,0.252814,0.252814
ndcg_cut_10,0.400669,0.400669
map_cut_2,0.15,0.15


**Accuracy values per each test team (Utility)**


In [10]:
import pandas as pd
pd.read_csv('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity.avg/eo/f0.test.pred.fa-ir.avg.10.5.rerank.pred.eval.utility.instance.csv')

Unnamed: 0,P_2.before,P_5.before,P_10.before,recall_2.before,recall_5.before,recall_10.before,ndcg_cut_2.before,ndcg_cut_5.before,ndcg_cut_10.before,map_cut_2.before,...,ndcg_cut_10.after,map_cut_2.after,map_cut_5.after,map_cut_10.after,success_2.after,success_5.after,success_10.after,skill_coverage_2.after,skill_coverage_5.after,skill_coverage_10.after
0,0.0,0.0,0.1,0.0,0.0,0.5,0.0,0.0,0.19343,0.0,...,0.19343,0.0,0.0,0.0625,0.0,0.0,1.0,1.0,1.0,1.0
1,0.5,0.2,0.2,0.5,0.5,1.0,0.61315,0.61315,0.81753,0.5,...,0.81753,0.5,0.5,0.64286,1.0,1.0,1.0,1.0,1.0,1.0
2,0.0,0.0,0.1,0.0,0.0,0.33333,0.0,0.0,0.14804,0.0,...,0.14804,0.0,0.0,0.04167,0.0,0.0,1.0,1.0,1.0,1.0
3,0.0,0.2,0.1,0.0,0.5,0.5,0.0,0.26407,0.26407,0.0,...,0.26407,0.0,0.125,0.125,0.0,1.0,1.0,1.0,1.0,1.0
4,0.5,0.2,0.2,0.5,0.5,1.0,0.38685,0.38685,0.58028,0.25,...,0.58028,0.25,0.25,0.375,1.0,1.0,1.0,1.0,1.0,1.0
