
RegHNT

Relational Graph enhanced Hybrid table-text Numerical reasoning model with Tree decoder.

This repository contains the source code for the COLING 2022 paper Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder.

Please cite our work if you find our code useful:

@inproceedings{lei2022answering,
  title={Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder},
  author={Lei, Fangyu and He, Shizhu and Li, Xiang and Zhao, Jun and Liu, Kang},
  booktitle={Proceedings of the 29th International Conference on Computational Linguistics},
  pages={1379--1390},
  year={2022}
}

Requirements

Create an environment with conda, activate it, and install the dependencies:

conda create -n reghnt python=3.7
conda activate reghnt
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html     # Adjust according to your CUDA version
pip install allennlp==0.8.4 transformers==4.21.1 nltk==3.5 pandas==1.1.5 numpy==1.21.6
pip install dgl_cu110==0.6.0    # Adjust according to your CUDA version
pip install sentencepiece

Next, install torch-scatter==2.0.5 (Python 3.7, CUDA 11.1) from the torch-scatter wheel that is already included in this repository. Alternatively, download another version matching your Python, PyTorch, and CUDA versions, then move it to RegHNT/.

pip install torch_scatter-2.0.5-cp37-cp37m-linux_x86_64.whl
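Before moving on, it may be worth confirming that the CUDA builds of torch, dgl, and torch-scatter all import cleanly. A minimal sanity check (our suggestion, not part of the repository):

python -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda, torch.cuda.is_available())"
python -c "import dgl, torch_scatter; print('dgl', dgl.__version__, '- torch_scatter imported ok')"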

We adopt RoBERTa as the encoder of RegHNT. Use the following commands to prepare the RoBERTa model:

cd dataset_reghnt
mkdir roberta.large && cd roberta.large
wget -O pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-pytorch_model.bin
wget -O config.json https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json
wget -O vocab.json https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json
wget -O merges.txt https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt
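To verify that the four files were downloaded correctly, you can load them with transformers. A minimal sketch (run from the repository root; this check is our suggestion, not part of the repo):

cd ../..    # back to the repository root from dataset_reghnt/roberta.large
python -c "
from transformers import RobertaModel, RobertaTokenizer
tok = RobertaTokenizer.from_pretrained('dataset_reghnt/roberta.large')
model = RobertaModel.from_pretrained('dataset_reghnt/roberta.large')
print('hidden size:', model.config.hidden_size)  # 1024 for roberta-large
"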

Training

Preprocessing dataset

We use the data preprocessed by the TagOp model; it is already included in this repository.

Prepare dataset

PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/reg_hnt python reg_hnt/prepare_dataset.py --mode [train/dev/test]

Note: The results will be written to the folder ./reg_hnt/cache by default.
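To preprocess all three splits in one pass, the command above can simply be iterated (a convenience loop, not a script shipped with the repo):

for mode in train dev test; do
    PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/reg_hnt python reg_hnt/prepare_dataset.py --mode $mode
done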

Train

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$PYTHONPATH:$(pwd) python reg_hnt/trainer.py --data_dir reg_hnt/cache/ \
--save_dir ./try --batch_size 48 --eval_batch_size 1 --max_epoch 100 --warmup 0.06 --optimizer adam --learning_rate 1e-4 \
--weight_decay 5e-5 --seed 42 --gradient_accumulation_steps 12 --bert_learning_rate 1e-5 --bert_weight_decay 0.01 \
--log_per_updates 50 --eps 1e-6 --encoder roberta_large --roberta_model dataset_reghnt/roberta.large

Evaluation

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$PYTHONPATH:$(pwd) python reg_hnt/predictor.py --data_dir reg_hnt/cache/ \
--test_data_dir reg_hnt/cache/ --save_dir reg_hnt --eval_batch_size 1 --model_path ./try \
--encoder roberta_large --roberta_model dataset_reghnt/roberta.large --mode dev
python tatqa_eval.py --gold_path=dataset_reghnt/tatqa_dataset_dev.json --pred_path=reg_hnt/pred_result_on_dev.json
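tatqa_eval.py scores the predictions against the gold dev file. If you want to inspect the prediction file before scoring, a quick peek looks like the following (the key/value layout, question id mapped to answer and scale, is an assumption based on the TAT-QA prediction format):

python -c "
import json
preds = json.load(open('reg_hnt/pred_result_on_dev.json'))
print(len(preds), 'predictions')
print(next(iter(preds.items())))  # one (question id, prediction) pair
"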

Testing

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$PYTHONPATH:$(pwd) python reg_hnt/predictor.py \
--data_dir reg_hnt/cache/ --test_data_dir reg_hnt/cache/ --save_dir reg_hnt \
--eval_batch_size 1 --model_path ./try --encoder roberta_large --roberta_model dataset_reghnt/roberta.large --mode test

Note: Training may take around 3 days on a single 24 GB RTX 3090.

Any Questions?

For any issues, please open an issue on GitHub or email Fangyu Lei at 843265183@qq.com.
