<a href="https://colab.research.google.com/github/fani-lab/OpeNTF/blob/main/ipynb/fair.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

`OpeNTF-Fair` via [`Adila`](https://github.com/fani-lab/Adila)

<p align="center"><img src='https://github.com/fani-lab/OpeNTF/blob/main/docs/figs/adila_.png?raw=true' width="400" ></p>

With [`Adila`](https://github.com/fani-lab/Adila) as a submodule of `OpeNTF`, fairness-aware reranking methods are now integrated to rerank the final model's recommendation of experts for a team in order to debias `popularity` or `gender` disparities. To apply them, set `cmd=[..., fair]` and the `fair.*` settings in [`./src/__config__.yaml`](https://github.com/fani-lab/OpeNTF/blob/main/src/__config__.yaml#L95C1-L110C24).

  - **fair.**`fgender`, the file that contains the `female` member ids, who are the `minority` and also the `protected` group.

  > In `Adila`, we keep the labels of the `minority` group only, for efficiency, as its members are very few relative to the `majority` group. The `protected` group, however, could be the same as the `minority` group, as for the `gender` protected attribute, or it could be the `majority` group, as for the `popularity` protected attribute, where non-popular experts outnumber popular ones.

  - **fair.**`algorithm`, list of reranking algorithms from `det_greedy`, `det_cons`, `det_relaxed`, `det_const_sort`, `fa-ir`
  - **fair.**`notion`, list of notions of fairness, from equality of odds (`eo`), demographic parity (`dp`)
  - **fair.**`attribute`, list of protected attributes from `popularity`, `gender`
  - **fair.**`is_popular_alg`, how popularity is determined: `avg`, where whoever has more teams than the average number of teams per expert is popular; `auc`, where whoever is in the head of the distribution figure is popular; or both
  - **fair.**`k_max`, cutoff for the reranking algorithms
  - **fair.**`alpha`, the significance value for the `fa-ir` algorithm
  - **fair.**`acceleration`, `cpu` to use all cpu cores but one, `cpu:n` to use `n` cores

```
fair:
  fgender: ../data/dblp/toy.dblp.v12.json.females.csv
  algorithm: [fa-ir, det_greedy, det_relaxed, det_const_sort, det_cons]
  notion: [eo,dp]
  attribute: [gender, popularity]
  is_popular_alg: [avg, auc]
  k_max: 5
  alpha: 0.1
  acceleration: 'cpu:1'
```
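
To make the `avg` option of `fair.is_popular_alg` concrete, the sketch below labels an expert as popular when their number of teams is above the average number of teams per expert. It is only an illustration of the rule as described above, not `Adila`'s implementation: the function name `popular_by_avg` and the toy team counts are hypothetical, and the `auc` option is not covered.

```python
from statistics import mean

def popular_by_avg(teams_per_expert):
    """Return the ids of experts whose team count is above the average.

    teams_per_expert: dict mapping a (hypothetical) expert id to the
    number of teams that expert has been a member of.
    """
    threshold = mean(teams_per_expert.values())
    return {e for e, n in teams_per_expert.items() if n > threshold}

# Toy counts, made up for illustration: one prolific expert, four occasional ones.
counts = {0: 9, 1: 1, 2: 2, 3: 1, 4: 2}
popular = popular_by_avg(counts)       # experts above the average of 3 teams
nonpopular = set(counts) - popular     # everyone else
print(sorted(popular), sorted(nonpopular))
```

In this toy input the popular experts are the `minority` and the non-popular experts are the `majority`, which is the situation the note above describes for the `popularity` attribute.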


---


To **evaluate** the fairness-accuracy trade-offs, fairness and accuracy metrics `before` and `after` applying the reranking algorithms are calculated based on `OpeNTF`'s evaluation settings in [`./src/__config__.yaml`](https://github.com/fani-lab/OpeNTF/blob/main/src/__config__.yaml#L84C1-L93C60):

- **eval.**`fair`, fairness measures `before` and `after` reranking using `ndkl` and `skew`
- **eval.**`topk`, cutoffs for measuring accuracy (utility) metrics
- **eval.**`trec`, measuring ranking-based accuracy (utility) metrics `before` and `after` reranking
- **eval.**`other`, measuring other utility metrics `before` and `after` reranking like `aucroc`, `skill_coverage`, ...
- **eval.**`per_epoch`, also apply rerankings on per-epoch predictions
- **eval.**`per_instance`, fairness and accuracy metrics per test team, needed for paired significance tests

```
eval:
  fair: [ndkl, skew]
  topk: '2,5,10'
  trec: [P_topk, recall_topk, ndcg_cut_topk, map_cut_topk, success_topk]
  other: [skill_coverage_topk, aucroc]
  per_epoch: 10
  per_instance: True
```
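
For readers unfamiliar with the two fairness measures in `eval.fair`, the following is a minimal sketch of `ndkl` and `skew` for a two-group setting, assuming their commonly used definitions: `skew` is the log-ratio of a group's observed share in the top-k to its desired share, and `ndkl` is the position-discounted average of the KL divergence between the observed and desired group distributions over all prefixes of the ranking. The function names, the toy ranking, and the desired distribution are hypothetical; `Adila`'s own computation may differ in details such as the logarithm base or the handling of empty groups.

```python
import math

def group_distribution(ranking, minority):
    """Share of (minority, majority) members in a ranked list of member ids."""
    p_min = sum(1 for m in ranking if m in minority) / len(ranking)
    return [p_min, 1.0 - p_min]

def kl_divergence(p, q):
    """KL(p || q) with natural log; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ndkl(ranking, minority, desired):
    """Discounted KL divergence over all prefixes, normalized by the sum of discounts."""
    weights = [1.0 / math.log2(i + 1) for i in range(1, len(ranking) + 1)]
    total = sum(w * kl_divergence(group_distribution(ranking[:i], minority), desired)
                for i, w in enumerate(weights, start=1))
    return total / sum(weights)

def skew(ranking, minority, desired, k):
    """Log-ratio of observed to desired share in the top-k, as (minority, majority)."""
    observed = group_distribution(ranking[:k], minority)
    return tuple(math.log(o / d) if o > 0 else float('-inf')
                 for o, d in zip(observed, desired))

# Toy example: members 3 and 4 form the minority; the desired split is 50/50.
minority, desired = {3, 4}, [0.5, 0.5]
unbalanced = [1, 2, 3, 4]   # minority members only at the bottom
balanced = [1, 3, 2, 4]     # groups alternate
print(ndkl(unbalanced, minority, desired), ndkl(balanced, minority, desired))
print(skew(balanced, minority, desired, k=2))
```

Under these definitions a lower `ndkl` means the ranking is closer to the desired distribution, and a negative `skew` for a group means that group is under-represented in the top-k.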

---


**Sample Command**
```
python main.py  "cmd=[prep,train,test,eval,fair]" \
                "models.instances=[mdl.bnn.Bnn, mdl.emb.gnn.Gnn]" \
                data.domain=cmn.publication.Publication \
                data.source=../data/dblp/toy.dblp.v12.json \
                data.output=../output/dblp/toy.dblp.v12.json \
                ~data.filter \
                data.embedding.class_method=mdl.emb.gnn.Gnn_gs \
                "+data.embedding.model.gnn.graph.structure=[[[skill, to, team], [member, to, team], [loc, to, team]], stml]" \
                fair.fgender=../data/dblp/toy.dblp.v12.json.females.csv
```




**Setup & Quickstart**

From `OpeNTF`'s [`quickstart`](https://colab.research.google.com/github/fani-lab/OpeNTF/blob/main/ipynb/quickstart.ipynb) script:


In [1]:
# set up python 3.8
!sudo apt-get update -y
!sudo apt-get install -y python3.8 python3.8-venv python3.8-distutils python3-pip
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 10
!python --version

Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:2 https://cli.github.com/packages stable InRelease [3,917 B]
Get:3 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [2,202 kB]
Get:6 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:7 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:8 https://cli.github.com/packages stable/main amd64 Packages [345 B]
Get:9 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:10 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [83.6 kB]
Get:11 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease [18.1 kB]
Hit:12 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:13 https://ppa.

In [5]:
# get OpeNTF and Adila submodule
!rm -R opentf/
!git clone --recurse-submodules https://github.com/Fani-Lab/opentf
!pip install --upgrade pip setuptools
!pip install -r opentf/requirements.txt

Cloning into 'opentf'...
remote: Enumerating objects: 26729, done.
remote: Counting objects: 100% (711/711), done.
remote: Compressing objects: 100% (341/341), done.
remote: Total 26729 (delta 440), reused 577 (delta 359), pack-reused 26018 (from 2)
Receiving objects: 100% (26729/26729), 1.32 GiB | 33.47 MiB/s, done.
Resolving deltas: 100% (13208/13208), done.
Updating files: 100% (4294/4294), done.
Submodule 'src/Adila' (https://github.com/fani-lab/Adila.git) registered for path 'src/Adila'
Cloning into '/content/opentf/src/Adila'...
remote: Enumerating objects: 1743, done.        
remote: Counting objects: 100% (301/301), done.        
remote: Compressing objects: 100% (179/179), done.        
remote: Total 1743 (delta 177), reused 170 (delta 99), pack-reused 1442 (from 2)        
Receiving objects: 100% (1743/1743), 17.51 MiB | 19.15 MiB/s, done.
Resolving deltas: 100% (930/930), done.
Submodule path 'src/Adila': checked out '48f3beb387725972660c08bdc78524923dcfe2ea'


In [7]:
%cd opentf/src/
!python main.py  "cmd=[prep,train,test,eval,fair]" "models.instances=[mdl.rnd.Rnd]" data.domain=cmn.publication.Publication data.source=../data/dblp/toy.dblp.v12.json fair.fgender=../data/dblp/toy.dblp.v12.json.females.csv data.output=../output/dblp/toy.dblp.v12.json ~data.filter data.embedding.class_method=mdl.emb.gnn.Gnn_gs "+data.embedding.model.gnn.graph.structure=[[[skill, to, team], [member, to, team], [loc, to, team]], stml]"

[Errno 2] No such file or directory: 'opentf/src/'
/content/opentf/src
[2025-12-11 03:51:07,504][cmn.team][INFO] - Loading teamsvecs matrices from ../output/dblp/toy.dblp.v12.json/teamsvecs.pkl ...
[2025-12-11 03:51:07,518][cmn.team][INFO] - Loading indexes pickle from ../output/dblp/toy.dblp.v12.json/indexes.pkl ...
[2025-12-11 03:51:07,519][cmn.team][INFO] - Indexes pickle is loaded.
[2025-12-11 03:51:07,519][cmn.team][INFO] - Teamsvecs matrices and indexes for skills (31, 10), members (31, 13), and locations (31, 29) are loaded.
[2025-12-11 03:51:07,520][__main__][INFO] - Loading splits from ../output/dblp/toy.dblp.v12.json/splits.f3.r0.85.pkl ...
[2025-12-11 03:51:07,521][cmn.team][INFO] - Loading member-skill co-occurrence matrix (13, 10) from ../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/skillcoverage.pkl ...
[2025-12-11 03:51:07,577][pkgmgr][INFO] - torch not found.
[2025-12-11 03:51:07,577][pkgmgr][INFO] - Installing torch...
[2025-12-11 03:52:19,120][pkgmgr][INFO] - Collect

**Output Folders and Files**

From `Adila`'s [`quickstart`](https://colab.research.google.com/github/fani-lab/Adila/blob/main/quickstart.ipynb) script:

- `adila/{popularity,gender}/stats.pkl`, the distribution of popular/non-popular, or male/female experts in the entire dataset
- `adila/{popularity,gender}/labels.csv`, the ids of the popular or female experts. While the female ids are fixed and obtained from the `data.fgender` file, the popular ids depend on `is_popular_alg` and are calculated from the data distribution

- `adila/{popularity,gender}/{notion}`, the subfolder containing the results for the given fairness notion
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred`, the reranked version of the recommended members from `data.fpred`
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.fair.instance.csv`, the fairness metric values for `data.fpred` (before) and the reranked version (after) per each team instance in the test set  
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.fair.mean.csv`, the average of fairness metric values for `data.fpred` (before) and the reranked version (after) over all teams of the test set  
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.utility.instance.csv`, the accuracy metric values for `data.fpred` (before) and the reranked version (after) per each team instance in the test set  
- `adila/{popularity,gender}/{notion}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.utility.mean.csv`, the average of accuracy metric values for `data.fpred` (before) and the reranked version (after) over all teams of the test set
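
Since the results are spread over `adila/{attribute}/{notion}/` subfolders, a small helper can be handy to see which result files exist for which attribute and notion. The sketch below assumes only the folder layout listed above; the function name `index_adila_outputs` and its `suffix` parameter are hypothetical and not part of `OpeNTF` or `Adila`.

```python
from pathlib import Path

def index_adila_outputs(adila_dir, suffix='.eval.fair.mean.csv'):
    """Group result files under adila/{attribute}/{notion}/ by (attribute, notion).

    Only file names ending with `suffix` are kept, e.g. the averaged
    fairness results by default.
    """
    index = {}
    for path in sorted(Path(adila_dir).glob(f'*/*/*{suffix}')):
        attribute, notion = path.parts[-3], path.parts[-2]
        index.setdefault((attribute, notion), []).append(path.name)
    return index
```

For example, calling `index_adila_outputs('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila')` after a run should return a dictionary keyed by pairs such as `('gender', 'dp')`, and passing `suffix='.eval.utility.mean.csv'` lists the averaged accuracy files instead.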

In [8]:
!find ../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/ -print | sed 's;[^/]*/;│   ;g;s;│   \([^│]\);├── \1;'

│   │   │   │   │   │   │   
│   │   │   │   │   │   ├── gender
│   │   │   │   │   │   │   ├── dp
│   │   │   │   │   │   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.utility.instance.csv
│   │   │   │   │   │   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.utility.mean.csv
│   │   │   │   │   │   │   │   ├── f1.test.pred.fa-ir.10.5.rerank.pred.eval.fair.instance.csv
│   │   │   │   │   │   │   │   ├── f2.test.pred.fa-ir.10.5.rerank.pred.eval.fair.mean.csv
│   │   │   │   │   │   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred
│   │   │   │   │   │   │   │   ├── f2.test.pred.fa-ir.10.5.rerank.pred.eval.utility.mean.csv
│   │   │   │   │   │   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.fair.mean.csv
│   │   │   │   │   │   │   │   ├── f1.test.pred.fa-ir.10.5.rerank.pred
│   │   │   │   │   │   │   │   ├── f2.test.pred.fa-ir.10.5.rerank.pred
│   │   │   │   │   │   │   │   ├── f1.test.pred.fa-ir.10.5.rerank.pred.eval.fair.mean.csv
│   │   │   │   │   │   │   │  

**Average fairness measures**






In [9]:
import pandas as pd
pd.read_csv('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity/eo/f0.test.pred.fa-ir.avg.10.5.rerank.pred.eval.fair.mean.csv', index_col=0)

metrics,mean.after,mean.before
ndkl,0.088981,0.171038
skew.majority,0.025318,0.025318
skew.minority,-0.262364,-0.262364


**Fairness measures per each test team**

In [10]:
import pandas as pd
pd.read_csv('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity/eo/f0.test.pred.fa-ir.avg.10.5.rerank.pred.eval.fair.instance.csv')

Unnamed: 0,before.ndkl,after.ndkl,before.skew.minority,before.skew.majority,after.skew.minority,after.skew.majority
0,0.10035,0.10035,-0.262364,0.025318,-0.262364,0.025318
1,0.066075,0.066075,-0.262364,0.025318,-0.262364,0.025318
2,0.055013,0.055013,-0.262364,0.025318,-0.262364,0.025318
3,0.522018,0.111735,-0.262364,0.025318,-0.262364,0.025318
4,0.111735,0.111735,-0.262364,0.025318,-0.262364,0.025318
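
The per-team values are what a paired significance test needs. As a minimal sketch, the snippet below copies the `before.ndkl` and `after.ndkl` columns from the table above, checks that their averages agree with the averaged file up to rounding, and computes a paired t statistic by hand. The helper `paired_t` is hypothetical and shown only for illustration; in practice a library routine such as `scipy.stats.ttest_rel` would also give the p-value.

```python
from math import sqrt
from statistics import mean, stdev

# before.ndkl and after.ndkl, copied from the per-team table above (5 test teams)
before = [0.10035, 0.066075, 0.055013, 0.522018, 0.111735]
after = [0.10035, 0.066075, 0.055013, 0.111735, 0.111735]

def paired_t(x, y):
    """Paired t statistic for two equally long samples (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# The averages should match mean.before and mean.after of ndkl up to rounding.
print(mean(before), mean(after))
# Positive values mean ndkl decreased after reranking.
print(paired_t(before, after))
```

Only one of the five teams changes in this toy run, so the statistic is small; the example shows the mechanics of the test rather than a significant result.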


**Average accuracy values (Utility)**


In [11]:
import pandas as pd
pd.read_csv('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity/eo/f0.test.pred.fa-ir.avg.10.5.rerank.pred.eval.utility.mean.csv', index_col = 0)

metric,mean.before,mean.after
P_2,0.2,0.2
P_5,0.12,0.12
P_10,0.14,0.14
recall_2,0.2,0.2
recall_5,0.3,0.3
recall_10,0.666667,0.666667
ndcg_cut_2,0.2,0.2
ndcg_cut_5,0.252814,0.252814
ndcg_cut_10,0.400669,0.400669
map_cut_2,0.15,0.15


**Accuracy values per each test team (Utility)**


In [12]:
import pandas as pd
pd.read_csv('../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity/eo/f0.test.pred.fa-ir.avg.10.5.rerank.pred.eval.utility.instance.csv')

Unnamed: 0,P_2.before,P_5.before,P_10.before,recall_2.before,recall_5.before,recall_10.before,ndcg_cut_2.before,ndcg_cut_5.before,ndcg_cut_10.before,map_cut_2.before,...,ndcg_cut_10.after,map_cut_2.after,map_cut_5.after,map_cut_10.after,success_2.after,success_5.after,success_10.after,skill_coverage_2.after,skill_coverage_5.after,skill_coverage_10.after
0,0.0,0.0,0.1,0.0,0.0,0.5,0.0,0.0,0.19343,0.0,...,0.19343,0.0,0.0,0.0625,0.0,0.0,1.0,1.0,1.0,1.0
1,0.5,0.2,0.2,0.5,0.5,1.0,0.61315,0.61315,0.81753,0.5,...,0.81753,0.5,0.5,0.64286,1.0,1.0,1.0,1.0,1.0,1.0
2,0.0,0.0,0.1,0.0,0.0,0.33333,0.0,0.0,0.14804,0.0,...,0.14804,0.0,0.0,0.04167,0.0,0.0,1.0,1.0,1.0,1.0
3,0.0,0.2,0.1,0.0,0.5,0.5,0.0,0.26407,0.26407,0.0,...,0.26407,0.0,0.125,0.125,0.0,1.0,1.0,1.0,1.0,1.0
4,0.5,0.2,0.2,0.5,0.5,1.0,0.38685,0.38685,0.58028,0.25,...,0.58028,0.25,0.25,0.375,1.0,1.0,1.0,1.0,1.0,1.0
