This repository contains the code and dataset for the IJCAI 2023 paper "A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram". We propose a new neural solver, PGPSNet, which fuses multi-modal information through structural and semantic pre-training, data augmentation, and self-limited decoding. We also construct a large-scale dataset, PGPS9K, labeled with both fine-grained diagram annotations and interpretable solution programs. PGPSNet significantly outperforms existing neural solvers and achieves results comparable to well-designed symbolic solvers.
- Python version: 3.8
- CUDA version: 10.2
- Other dependencies: see requirements.txt
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
For all experiments, we use one GTX-RTX GPU or two TITAN Xp GPUs for training.
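After installation, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the function names `version_matches` and `check_environment` are hypothetical, and the expected versions are copied from the settings above.

```python
import importlib.util
import sys

# Expected versions, copied from the environment settings above.
EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH = "1.7.1"
EXPECTED_CUDA = "10.2"


def version_matches(found, expected):
    """Compare versions while ignoring a local build suffix such as '+cu102'."""
    return found.split("+")[0] == expected


def check_environment():
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:2])
        warnings.append(f"Python {found} found, 3.8 expected")

    # Look for PyTorch without importing it, so a missing install is reported cleanly.
    if importlib.util.find_spec("torch") is None:
        warnings.append("PyTorch is not installed")
        return warnings

    import torch

    if not version_matches(torch.__version__, EXPECTED_TORCH):
        warnings.append(f"PyTorch {torch.__version__} found, {EXPECTED_TORCH} expected")
    if torch.version.cuda != EXPECTED_CUDA:
        warnings.append(f"PyTorch built for CUDA {torch.version.cuda}, {EXPECTED_CUDA} expected")
    if not torch.cuda.is_available():
        warnings.append("No CUDA device is visible to PyTorch")
    return warnings


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the documented versions."]:
        print(line)
```

A mismatch reported here is only a warning; other version combinations may still work, but they are not the ones documented above.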
You can download the dataset from the Dataset Homepage; it will be available before the IJCAI 2023 conference.
By default, unzip the dataset file into the folder ./datasets.
For structural and semantic pre-training, you can train the language model from scratch here, or use the pre-trained language model LM_MODEL.pth, which we provide via the BaiduYun link (keyword: tkbd) or the GoogleDrive link. By default, unzip the file into the folder ./ (the project root).
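Before training, a quick check that the downloaded files are in their default locations can save a failed launch. The sketch below is an illustration only: `missing_assets` is a hypothetical helper, and it assumes that the archive unpacks to LM_MODEL.pth directly under the project root, which may differ from the actual archive layout.

```python
from pathlib import Path


def missing_assets(root="."):
    """Return descriptions of the default assets that are absent under root."""
    root = Path(root)
    # Default locations taken from the instructions above; the exact position of
    # LM_MODEL.pth after unzipping is an assumption.
    expected = {
        "dataset folder": root / "datasets",
        "pre-trained language model": root / "LM_MODEL.pth",
    }
    return [f"{label}: {path}" for label, path in expected.items() if not path.exists()]


if __name__ == "__main__":
    problems = missing_assets()
    if problems:
        print("Missing:")
        for problem in problems:
            print("  " + problem)
    else:
        print("Dataset folder and language model found.")
```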
The default parameter configurations are set in the config file ./config/config_default.py, and the default training modes are shown in ./sh_files/train.sh. For example:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch \
--nproc_per_node=1 \
--master_port=$((RANDOM + 10000)) \
start.py \
--dataset Geometry3K \
--use_MLM_pretrain
You can choose the dataset (Geometry3K / PGPS9K) and whether to use the pre-trained language model. The training records of PGPSNet are saved in the folder ./log.
The default parameter configurations are set in the config file ./config/config_default.py, and the default test modes are shown in ./sh_files/test.sh. For example:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch \
--nproc_per_node=1 \
--master_port=$((RANDOM + 10000)) \
start.py \
--dataset Geometry3K \
--use_MLM_pretrain \
--evaluate_only \
--eval_method completion \
--resume_model log/*/best_model.pth
You can choose the dataset (Geometry3K / PGPS9K), whether to use the pre-trained language model, and the evaluation method (completion / choice / top3). The test records are also saved in the folder ./log. (The results of this code are 2% higher than those reported in the paper due to fine-tuning of hyperparameters.)
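Note that the pattern log/*/best_model.pth in the test command is expanded by the shell, so it yields several paths once ./log holds more than one training run. One way to pass a single checkpoint is to pick the most recently modified one. The sketch below assumes the layout implied by that pattern (one subfolder per run, each containing best_model.pth); `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def latest_checkpoint(log_dir="log", name="best_model.pth"):
    """Return the most recently modified <log_dir>/<run>/<name>, or None if none exists."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return None
    # One level of run folders, matching the log/*/best_model.pth pattern.
    candidates = list(log_dir.glob(f"*/{name}"))
    if not candidates:
        return None
    # The newest modification time is used as a proxy for the latest run.
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    checkpoint = latest_checkpoint()
    print(checkpoint if checkpoint is not None else "No checkpoint found under ./log")
```

The printed path can then be given to --resume_model in place of the wildcard. Modification time is only a heuristic; if an older run performed better, pass its path explicitly.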
If the paper, the dataset, or the code helps you, please cite the following papers:
@inproceedings{Zhang2023PGPS,
title = {A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram},
author = {Zhang, Ming-Liang and Yin, Fei and Liu, Cheng-Lin},
booktitle = {IJCAI},
year = {2023},
}
@inproceedings{Zhang2022PGDP,
title = {Plane Geometry Diagram Parsing},
author = {Zhang, Ming-Liang and Yin, Fei and Hao, Yi-Han and Liu, Cheng-Lin},
booktitle = {Proceedings of the Thirty-First International Joint Conference on
Artificial Intelligence, {IJCAI-22}},
pages = {1636--1643},
year = {2022},
month = {7},
doi = {10.24963/ijcai.2022/228},
}
@article{Hao2022PGDP5KAD,
title={PGDP5K: A Diagram Parsing Dataset for Plane Geometry Problems},
author={Yihan Hao and Mingliang Zhang and Fei Yin and Linlin Huang},
journal={2022 26th International Conference on Pattern Recognition (ICPR)},
year={2022},
pages={1763--1769}
}
Please let us know if you encounter any issues. You can contact the first author (zhangmingliang2018@ia.ac.cn) or open an issue in the GitHub repo.