[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/havelhakimi/TLA/blob/main/Run_HTLA.ipynb)


# Install dependencies

In [None]:
! pip install transformers

# Clone the repository

In [None]:
! git clone https://github.com/havelhakimi/TLA.git

# Change the working directory to TLA

In [3]:
import os

os.chdir('/content/TLA')

# Script to run our HTLA model
* We use a fixed random seed of 3 for this run.

In [None]:
! python train.py --name='ckp_htla' --seed 3 --batch 10 --data='wos' --graph 1 --graph_type='GPTrans' --edge_dim 30 --tla 1 --tl_temp 0.07 --device cuda

Some important arguments:
- `--name`: Name of the directory in which your model will be saved. For example, the model above will be saved in `./TLA/data/wos/ckp_htla`.
- `--data`: Name of the dataset directory that contains your data and related files.
- `--graph`: Whether to use a graph encoder (1) or not (0).
- `--graph_type`: Type of graph encoder. Possible choices are 'GCN', 'GAT', 'graphormer' and 'GPTrans'. HTLA uses GPTrans as the graph encoder.
- `--edge_dim`: Edge feature size for GPTrans. (We use 30 as the edge feature size for each dataset.)
- `--tla`: Whether to use the Text-Label Alignment (TLA) loss. If set to 0, the model is optimized only on the BCE loss; we refer to this variant as BERT-GPTrans in the paper.
- `--tl_temp`: Temperature value for the TLA loss. (We use 0.07 as the temperature for all datasets.)
- `--device`: Set to 'cpu' or 'cuda' to run on a CPU or GPU device.
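
To see why a small temperature such as 0.07 matters, the sketch below applies a temperature-scaled softmax to a few made-up similarity scores. It is not taken from `criterion.py`: it assumes that the TLA loss, like common contrastive losses, divides text-label similarities by `--tl_temp` before a softmax. The function name and the scores are hypothetical.

```python
import math


def softmax_with_temperature(similarities, temperature):
    """Softmax over similarity scores after dividing them by a temperature.

    Assumption: this mirrors how contrastive losses typically use a
    temperature; it is not the repository's implementation.
    """
    scaled = [s / temperature for s in similarities]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical similarities between one text and three labels.
sims = [0.9, 0.5, 0.1]
sharp = softmax_with_temperature(sims, 0.07)  # value used for --tl_temp
soft = softmax_with_temperature(sims, 1.0)    # no temperature scaling
print(sharp)
print(soft)
```

With the lower temperature, almost all of the probability mass goes to the most similar label, so the loss penalizes near misses much more strongly.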

# In the paper, we report the average score of 8 random runs with unfixed seeds (`--seed=None`)
* In `train.py`, set `--seed=None` for multiple random runs.
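
A minimal sketch of how the 8 unseeded runs could be launched and averaged from Python. It assumes that `--seed` defaults to `None` in `train.py` when the flag is omitted; the per-run checkpoint names (`ckp_htla_run0`, ...) and the helpers `build_run_commands`, `run_all` and `summarize` are hypothetical. How `train.py` reports its scores is not covered here, so the scores have to be collected from each run's output by hand.

```python
import statistics
import subprocess

# Same arguments as the HTLA command above, without --seed.
BASE_CMD = [
    "python", "train.py", "--batch", "10", "--data=wos",
    "--graph", "1", "--graph_type=GPTrans", "--edge_dim", "30",
    "--tla", "1", "--tl_temp", "0.07", "--device", "cuda",
]


def build_run_commands(num_runs=8):
    """Return one command per run, each with its own (hypothetical) name."""
    return [BASE_CMD + [f"--name=ckp_htla_run{run}"] for run in range(num_runs)]


def run_all(num_runs=8):
    """Launch the runs one after another; stops if a run fails."""
    for cmd in build_run_commands(num_runs):
        subprocess.run(cmd, check=True)


def summarize(scores):
    """Mean and sample standard deviation of per-run scores."""
    return statistics.mean(scores), statistics.stdev(scores)
```

Call `run_all()` from the `TLA` directory, then pass the per-run scores you collected to `summarize`.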

# Other arguments to tune the results and improve performance scores
<i>(We did not use these arguments in our paper, to avoid the complexity of additional hyper-parameter tuning.)</i>

The following arguments in `train.py` can be tuned:
- `--norm`: Set to 1 to normalize embeddings before applying the TLA loss.
- `--proj`: Set to 1 to apply a transformation to the text and label embeddings before applying the TLA loss. The transformation is a 2-hidden-layer FFN defined in `criterion.py`.
- `--hsize`: Size of the hidden layer in the transformation.
- `--tl_wt`: Weight of the TLA loss component, which can be tuned within the range (0, 1].
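
The sketch below illustrates what normalization and the loss weight do, using plain Python lists in place of tensors. It rests on two assumptions that are not confirmed by the repository code: that normalization means scaling each embedding to unit L2 norm, and that the total loss is the BCE loss plus `tl_wt` times the TLA loss. Both helper names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (assumed meaning of --norm 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def total_loss(bce_loss, tla_loss, tl_wt=1.0):
    """Assumed combination: BCE loss plus the weighted TLA loss."""
    if not 0 < tl_wt <= 1:
        raise ValueError("tl_wt is expected to lie in (0, 1]")
    return bce_loss + tl_wt * tla_loss


unit = l2_normalize([3.0, 4.0])     # direction is kept, length becomes 1
combined = total_loss(0.5, 0.2, tl_wt=0.5)
print(unit, combined)
```

A smaller `tl_wt` reduces the influence of the alignment term relative to the classification term.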


# `train.py` can also be used to train other model variants by setting different arguments

### For BERT (does flat multi-label classification)


In [None]:
! python train.py --name='ckp_bert' --batch 10 --data='wos' --graph 0 --device cuda

### For BERT+GPTrans without TLA loss (does hierarchical text classification)


In [None]:
! python train.py --name='ckp_bertgptrans' --batch 10 --data='wos' --graph 1 --graph_type='GPTrans' --edge_dim 30 --tla 0