# What is OpeNTF?
OpeNTF is an open-source framework hosting large-scale training datasets and canonical neural team formation models that are trained using fairness-aware and time-sensitive methods.

## Prerequisites
Before using OpeNTF, install the following libraries in addition to those listed in `requirements.txt`:

```bash
pip install torch==1.9.0
pip install pytrec-eval-terrier==0.5.2
pip install gensim==3.8.3
```

Then clone the repository with its submodules and install the remaining requirements:

```bash
git clone --recursive https://github.com/Fani-Lab/opentf
cd opentf
pip install -r requirements.txt
```
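To confirm that the pinned prerequisites are in place, a small standard-library check (Python 3.8+) can compare installed versions against the pins listed above. This is a convenience sketch, not part of OpeNTF; `check_pins` is a hypothetical helper.

```python
from importlib import metadata

# Version pins copied from the prerequisite list.
REQUIRED = {
    "torch": "1.9.0",
    "pytrec-eval-terrier": "0.5.2",
    "gensim": "3.8.3",
}


def check_pins(required):
    """Return {package: problem} for every pin that is not satisfied (hypothetical helper)."""
    problems = {}
    for package, wanted in required.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems[package] = "not installed"
            continue
        # torch wheels may carry a local suffix such as "1.9.0+cu111"; compare the public part only.
        if installed.split("+")[0] != wanted:
            problems[package] = f"found {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(REQUIRED)
    for package, problem in issues.items():
        print(f"{package}: {problem}")
    if not issues:
        print("All pinned prerequisites are installed.")
```

An empty result means every pin matched; anything else lists the package and what was found instead.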

## Quickstart on OpeNTF

OpeNTF has the following required arguments:

- `-data`: path to the input dataset.
- `-domain`: domain of the input dataset (e.g., `dblp`, `uspt`, or `imdb`).
- `-model`: one or more neural team formation models to run (e.g., `fnn`, `bnn`).

Optional arguments include:
- `-attribute`: the sensitive attribute(s) to consider (e.g., `popularity`).
- `-fairness`: the fairness-aware reranking algorithm(s) to apply (e.g., `det_greedy`), used to reduce popularity bias.
- `-np-ratio`: the desired ratio of non-popular experts after reranking.
- `-k_max`: cutoff for the reranking algorithms.
- `-filter`: remove outliers, if needed.
- `-future`: predict future, if needed.
- `-exp_id`: ID of the experiment.
- `-output`: path of the baseline output.
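The arguments above map naturally onto an `argparse` interface. The following is a hypothetical sketch, not OpeNTF's actual `main.py`: the argument names come from the list, but the `nargs` values, types, defaults, the `-domain` choices (taken from the `dblp`/`uspt`/`imdb` domains in `param.py`), and treating `-filter` and `-future` as on/off switches are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the OpeNTF command line (types and defaults assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the OpeNTF CLI")
    # Required arguments
    parser.add_argument("-data", required=True, help="path to the input dataset")
    parser.add_argument("-domain", required=True, choices=["dblp", "uspt", "imdb"],
                        help="domain of the input dataset")
    parser.add_argument("-model", required=True, nargs="+", help="model names, e.g. fnn bnn")
    # Optional arguments
    parser.add_argument("-attribute", nargs="+", default=None, help="sensitive attribute(s)")
    parser.add_argument("-fairness", nargs="+", default=None, help="reranking algorithm(s)")
    parser.add_argument("-np-ratio", type=float, default=None,  # stored as args.np_ratio
                        help="desired ratio of non-popular experts after reranking")
    parser.add_argument("-k_max", type=int, default=None, help="cutoff for reranking")
    parser.add_argument("-filter", action="store_true", help="remove outliers (assumed switch)")
    parser.add_argument("-future", action="store_true", help="predict future (assumed switch)")
    parser.add_argument("-exp_id", default=None, help="ID of the experiment")
    parser.add_argument("-output", default=None, help="path of the baseline output")
    return parser


if __name__ == "__main__":
    # Tokens of the sample command used in this section.
    quickstart = ["-data", "../data/raw/dblp/toy.dblp.v12.json", "-domain", "dblp",
                  "-model", "fnn", "bnn", "-fairness", "det_greedy", "-attribute", "popularity"]
    print(build_parser().parse_args(quickstart))
```

Because `-model` takes `nargs="+"`, several models can be listed in one run, and `argparse` stops collecting them at the next flag.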

The following sample run uses the toy dataset `toy.dblp.v12.json`, which is modelled after DBLP, a dataset of authorship and skill information for more than 4 million computer science publications. It trains two neural models, a feedforward network (`fnn`) and a Bayesian neural network (`bnn`), and reranks their output with `det_greedy` on the `popularity` attribute.

```bash
cd src
python -u main.py -data ../data/raw/dblp/toy.dblp.v12.json -domain dblp -model fnn bnn -fairness det_greedy -attribute popularity
```

## Setting Parameters
OpeNTF's codebase groups the parameters for its neural team formation methods into the following sections:

### `model`
- Contains the baseline hyperparameters in the form `'model-name': { params }`, so that each model can be integrated into the baseline with its own parameters.
- Selects which stages of the pipeline to run through `cmd` (e.g., `train`, `test`, `eval`, `fair`).
- Contains other training parameters, such as `nfolds`, `train_test_split`, and `step_ahead`.
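Together, `nfolds` and `train_test_split` suggest a held-out test set plus k-fold cross-validation on the remaining teams. The sketch below assumes a random split and contiguous folds; `make_splits` is a hypothetical helper, and OpeNTF's own splitting may differ. Temporal runs that use `step_ahead` would instead hold out the teams in the last time intervals.

```python
import numpy as np


def make_splits(n_teams, train_test_split=0.85, nfolds=3, seed=0):
    """Hypothetical sketch: hold out a test set, then cut the rest into nfolds train/valid folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_teams)
    n_train = int(round(n_teams * train_test_split))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    folds = []
    # Each chunk of the training indices serves once as the validation set.
    for valid_idx in np.array_split(train_idx, nfolds):
        fold_train = np.setdiff1d(train_idx, valid_idx)
        folds.append({"train": fold_train, "valid": valid_idx})
    return {"test": test_idx, "folds": folds}


if __name__ == "__main__":
    splits = make_splits(100)
    print(len(splits["test"]), [(len(f["train"]), len(f["valid"])) for f in splits["folds"]])
```

Seeding the generator keeps the split reproducible, in the same spirit as the fixed seeds at the top of `param.py`.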

### `data`
- Contains parameters for manipulating datasets, including dataset filters (e.g., minimum team size) and bucket size for sparse matrix parallel generation.
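One reading of the `filter` settings is: drop experts who appear in fewer than `min_nteam` teams, then drop teams left with fewer than `min_team_size` members. The sketch below assumes that reading and repeats both steps until the data stops changing, since removing one can trigger the other; OpeNTF's own filter may apply them differently. `filter_teams` and the `(skills, members)` team representation are hypothetical.

```python
from collections import Counter


def filter_teams(teams, min_nteam=5, min_team_size=2):
    """Hypothetical sketch of the 'filter' settings.

    teams: list of (skills, members) tuples, members being a list of expert ids.
    Repeatedly removes experts in fewer than `min_nteam` teams and teams left
    with fewer than `min_team_size` members, until nothing changes.
    """
    current = [(skills, list(members)) for skills, members in teams]
    while True:
        counts = Counter(m for _, members in current for m in members)
        pruned = []
        for skills, members in current:
            kept = [m for m in members if counts[m] >= min_nteam]
            if len(kept) >= min_team_size:
                pruned.append((skills, kept))
        if pruned == current:  # fixed point reached
            return pruned
        current = pruned


if __name__ == "__main__":
    teams = [(["ml"], ["a", "b"]) for _ in range(5)] + [(["db"], ["a", "c"])]
    print(filter_teams(teams))  # expert "c" is too rare, so the "db" team shrinks and is dropped
```

Each pass either removes at least one membership or returns, so the loop always terminates.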

### `fair`
- Contains parameters for fairness-aware reranking during team formation, such as the reranking algorithm, the fairness and utility metrics, and the sensitive attributes.
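The default reranker, `det_greedy`, appears to refer to the DetGreedy algorithm of Geyik et al. (KDD 2019). Below is a minimal sketch, assuming each expert has a model score and a group label (for `popularity`, e.g. `popular`/`nonpopular`) and that target proportions are derived from `np_ratio`. At each rank `k`, groups below their minimum share `floor(k * p)` are served first; otherwise any group below its maximum share `ceil(k * p)` is eligible, and the eligible group with the best next candidate wins. The fallback when both sets are empty is an assumption of this sketch.

```python
import math


def det_greedy(candidates, proportions, k_max):
    """Sketch of DetGreedy reranking.

    candidates:  list of (expert_id, score, group) tuples; groups missing from
                 `proportions` are ignored.
    proportions: dict mapping group -> desired share of the ranking (sums to 1).
    Returns at most k_max expert ids, best first.
    """
    # One queue per group, best score first.
    queues = {g: sorted((c for c in candidates if c[2] == g), key=lambda c: -c[1])
              for g in proportions}
    counts = {g: 0 for g in proportions}
    ranking = []
    for k in range(1, k_max + 1):
        # Groups that still have unranked candidates.
        available = [g for g in proportions if counts[g] < len(queues[g])]
        if not available:
            break
        floors = {g: math.floor(k * proportions[g]) for g in available}
        ceils = {g: math.ceil(k * proportions[g]) for g in available}
        below_min = [g for g in available if counts[g] < floors[g]]
        below_max = [g for g in available if floors[g] <= counts[g] < ceils[g]]
        # Assumption of this sketch: fall back to all available groups if both sets are empty.
        pool = below_min or below_max or available
        # Pick the eligible group whose next candidate has the highest score.
        best = max(pool, key=lambda g: queues[g][counts[g]][1])
        ranking.append(queues[best][counts[best]][0])
        counts[best] += 1
    return ranking


if __name__ == "__main__":
    np_ratio = 0.5  # desired share of non-popular experts
    experts = [("p1", 0.9, "popular"), ("p2", 0.8, "popular"), ("p3", 0.7, "popular"),
               ("n1", 0.5, "nonpopular"), ("n2", 0.4, "nonpopular")]
    print(det_greedy(experts, {"popular": 1 - np_ratio, "nonpopular": np_ratio}, k_max=4))
    # ['p1', 'n1', 'p2', 'n2'], whereas ranking by score alone gives p1, p2, p3, n1
```

Here `k_max` plays the role of the reranking cutoff: only the top `k_max` positions are constrained.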

An excerpt of the settings in `param.py` follows:

```python
import random

import numpy as np
import torch

# Fix every random seed for reproducible runs.
random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
np.random.seed(0)

settings = {
    'model': {
        'baseline': {
            'random': {
                'b': 128,  # batch size
            },
            'fnn': {
                'l': [100],  # number of nodes in each layer
                'lr': 0.001,  # learning rate
                'b': 128,  # batch size
                'e': 10,  # number of epochs
                'nns': 3,  # number of negative samples
                'ns': 'none',  # negative sampling: 'none', 'uniform', 'unigram', 'unigram_b'
                'loss': 'SL',  # 'SL' -> superloss, 'DP' -> data parameters, 'normal' -> binary cross-entropy
            },
            'bnn': {
                'l': [128],  # number of nodes in each layer
                'lr': 0.1,  # learning rate
                'b': 128,  # batch size
                'e': 5,  # number of epochs
                'nns': 3,  # number of negative samples
                'ns': 'unigram_b',  # negative sampling: 'uniform', 'unigram', 'unigram_b'
                's': 1,  # number of samples for sample_elbo in bnn
                'loss': 'SL',  # 'SL' -> superloss, 'DP' -> data parameters, 'normal' -> binary cross-entropy
            },
        },
        'cmd': ['train', 'test', 'eval', 'fair'],  # any of 'train', 'test', 'eval', 'plot', 'agg', 'fair'
        'nfolds': 3,
        'train_test_split': 0.85,
        'step_ahead': 2,  # for now, teams in the last [step_ahead] time intervals form the test set
    },
    'data': {
        'domain': {
            'dblp': {},
            'uspt': {},
            'imdb': {},
        },
        # One of 'city', 'state', 'country'; the location of members in teams (not of the teams themselves).
        'location_type': 'country',
        'filter': {
            'min_nteam': 5,
            'min_team_size': 2,
        },
        'parallel': 1,
        'ncore': 0,  # <= 0 uses all cores
        'bucket_size': 1000,
    },
    'fair': {
        'np_ratio': None,  # desired ratio of non-popular experts after reranking
        'fairness': ['det_greedy'],  # reranking algorithm(s)
        'k_max': None,  # cutoff for the reranking algorithms
        'fairness_metrics': {'ndkl'},
        'utility_metrics': {'map_cut_2,5,10'},
        'eq_op': False,
        'mode': 0,
        'core': -1,
        'attribute': ['gender', 'popularity'],
    },
}
```
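The `fair` settings list `ndkl` as the fairness metric. Assuming this is the normalized discounted KL divergence from the same Geyik et al. paper, it averages, with a logarithmic rank discount, the KL divergence between the group distribution of each top-`i` prefix and the desired distribution, so 0 means every prefix matches the target. The sketch uses natural logarithms and assumes every desired proportion is positive; OpeNTF's implementation may differ in such details.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """KL(p || q) over the groups in p; terms with p(g) == 0 contribute 0 by convention."""
    return sum(pg * math.log(pg / q[g]) for g, pg in p.items() if pg > 0)


def ndkl(ranked_groups, desired):
    """Normalized discounted KL divergence of a ranking (sketch).

    ranked_groups: group label of each ranked expert, top first.
    desired:       dict group -> target proportion (all values > 0).
    """
    total, z = 0.0, 0.0
    counts = Counter()
    for i, group in enumerate(ranked_groups, start=1):
        counts[group] += 1
        prefix = {g: counts[g] / i for g in desired}  # group distribution of the top-i prefix
        weight = 1.0 / math.log2(i + 1)  # rank discount, as in DCG
        total += weight * kl_divergence(prefix, desired)
        z += weight
    return total / z if z else 0.0


if __name__ == "__main__":
    target = {"popular": 0.5, "nonpopular": 0.5}
    # The alternating ranking stays closer to the target at every prefix, so it scores lower.
    print(ndkl(["popular", "nonpopular", "popular", "nonpopular"], target))
    print(ndkl(["popular", "popular", "nonpopular", "nonpopular"], target))
```

Because the discount weighs early ranks more, an imbalance at the top of a team list is penalized more than the same imbalance further down.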

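## Structure and Inheritance

The diagrams below show the class hierarchy for datasets (`src/cmn`) and the inheritance hierarchy for team formation models (`src/mdl`).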

<p align="center"><img src='../../../../src/cmn/dataset_hierarchy.png' width="500" ></p>
<p align="center"><img src='../../../../src/mdl/team_inheritance_hierarchy.png' width="500" ></p>