Original Paper
Master Thesis adding different Methods
This repo extends the official implementation of CoMER and adds:
- Support for PyTorch 1.13 and PyTorch Lightning 1.9
- Improved beam search with the pruning methods from Freitag et al. (Beam Search Strategies for Neural Machine Translation)
  - Adds Constant Pruning
  - Speeds up inference on the CROHME19 training data roughly 7-fold
- Self-Training Methods like FixMatch
- Calibration Methods via learnable Temperature Scaling and LogitNorm
- Multiple new Confidence Measures for further improving Calibration
- RandAug with two augmentation lists, the modified version being better suited for long formulae
- Support for different vocabularies
- Support for synthetic Pre-Training with a generated NTCIR12 MathIR Dataset
- Support for HME100K Dataset
- A partial-labeling heuristic to replace a hard threshold while filtering generated pseudo-labels
- Multi-GPU Evaluation support
- Evaluation with Augmentations
- Tools and scripts to visualize the data, test the implementation, and benchmark the modified beam search
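The threshold-based pruning rules from Freitag et al. mentioned in the list above can be illustrated with a minimal sketch. It is not this repo's implementation: the names `prune_beam`, `rel_threshold`, and `abs_threshold` are hypothetical, treating scores as probabilities is an assumption, and Constant Pruning (this repo's own addition) is not shown.

```python
import math
from typing import List, Tuple

# A hypothesis is (token ids, cumulative log-probability).
Hypothesis = Tuple[List[int], float]


def prune_beam(
    candidates: List[Hypothesis],
    beam_size: int,
    rel_threshold: float = 0.0,
    abs_threshold: float = 1.0,
) -> List[Hypothesis]:
    """Keep the top `beam_size` candidates, then drop weak ones.

    Relative threshold pruning drops a candidate whose probability is
    at most rel_threshold * best probability.
    Absolute threshold pruning drops a candidate whose probability is
    at most best probability - abs_threshold.
    The best candidate is always kept.
    """
    if not candidates or beam_size <= 0:
        return []
    # Standard beam step: sort by score and truncate to the beam size.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    best_prob = math.exp(ranked[0][1])
    kept = [ranked[0]]
    for tokens, log_prob in ranked[1:]:
        prob = math.exp(log_prob)
        if prob <= rel_threshold * best_prob:
            continue  # relative threshold pruning
        if prob <= best_prob - abs_threshold:
            continue  # absolute threshold pruning
        kept.append((tokens, log_prob))
    return kept
```

With the default thresholds the function reduces to plain top-k truncation; tighter thresholds shrink the beam whenever one hypothesis clearly dominates, which is where the speed-up of such pruning comes from.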
The features are split across the following branches:
- feature/ssl, no helpers (visualization), no HME100K support
- feature/ssl_hme, no helpers (visualization)
- feature/ssl_helpers, no HME100K support
├── README.md
├── comer # model definition folder
├── convert2symLG # official tool to convert latex to symLG format
├── lgeval # official tool to compare symLGs in two folders
├── config.yaml # config for CoMER hyperparameter
├── data.zip
├── eval_all.sh # script to evaluate model on all CROHME test sets
├── example
│   ├── UN19_1041_em_595.bmp
│   └── example.ipynb # HMER demo
├── lightning_logs # training logs
│   └── version_0
│       ├── checkpoints
│       │   └── epoch=151-step=57151-val_ExpRate=0.6365.ckpt
│       ├── config.yaml
│       └── hparams.yaml
├── requirements.txt
├── scripts # evaluation scripts
├── setup.cfg
├── setup.py
└── train.py
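The checkpoint file name in the tree above encodes the epoch, step, and validation ExpRate. As a minimal sketch, assuming other checkpoints follow the same naming pattern as the one shown, a hypothetical helper `best_checkpoint` (not part of this repo) could pick the file with the highest val_ExpRate:

```python
import re
from typing import Iterable, Optional

# Pattern derived from the single checkpoint name shown in the tree;
# that every checkpoint follows it is an assumption.
CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)-val_ExpRate=(\d+\.\d+)\.ckpt$")


def best_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the name with the highest val_ExpRate, or None if none match."""
    best_name, best_score = None, float("-inf")
    for name in names:
        match = CKPT_PATTERN.search(name)
        if match is None:
            continue  # skip files that do not follow the pattern
        score = float(match.group(3))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Possible usage with the folder layout above:
# from pathlib import Path
# ckpt_dir = Path("lightning_logs/version_0/checkpoints")
# print(best_checkpoint(p.name for p in ckpt_dir.glob("*.ckpt")))
```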
cd CoMER
# install project
# python >= 3.7 required. Tested with 3.7 & 3.10
conda create -y -n CoMER python=3.7
conda activate CoMER
# install pytorch >= 1.8 & torchvision >= 0.2 with cudatoolkit / rocm.
conda install pytorch=1.8.1 torchvision=0.2.2 cudatoolkit=11.1 -c pytorch -c nvidia
pip install -e .
# evaluation dependency
conda install pandoc=1.19.2.1 -c conda-forge
Next, navigate to the CoMER folder and run train.py. Training may take 7~8 hours on 4 NVIDIA 2080Ti GPUs using DDP.
# train CoMER(Fusion) model using 2 gpus and ddp
python train.py -c config.yaml fit
You may change the config.yaml file to train different models:
# train BTTR(baseline) model
cross_coverage: false
self_coverage: false
# train CoMER(Self) model
cross_coverage: false
self_coverage: true
# train CoMER(Cross) model
cross_coverage: true
self_coverage: false
# train CoMER(Fusion) model
cross_coverage: true
self_coverage: true
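The four variants above differ only in the two coverage flags. A minimal sketch of that mapping follows; `VARIANTS` and `coverage_flags` are hypothetical names used for illustration and are not part of this repo.

```python
from typing import Dict

# Flag combinations copied from the config snippets above.
VARIANTS: Dict[str, Dict[str, bool]] = {
    "BTTR": {"cross_coverage": False, "self_coverage": False},
    "CoMER(Self)": {"cross_coverage": False, "self_coverage": True},
    "CoMER(Cross)": {"cross_coverage": True, "self_coverage": False},
    "CoMER(Fusion)": {"cross_coverage": True, "self_coverage": True},
}


def coverage_flags(variant: str) -> Dict[str, bool]:
    """Return a copy of the coverage flags for the named model variant."""
    try:
        return dict(VARIANTS[variant])
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; choose from {sorted(VARIANTS)}"
        ) from None
```

For example, `coverage_flags("CoMER(Self)")` returns the pair of values to place under the corresponding keys in config.yaml.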
For single-GPU usage, you may edit the config.yaml:
accelerator: 'gpu'
devices: 0
For CPU-only usage, you may edit the config.yaml:
accelerator: 'cpu'
# devices: 0
The metrics computed during validation in the training process are not accurate.
For the accurate metrics reported in the paper, please use the tools officially provided by the CROHME 2019 organizers.
A trained CoMER(Fusion) checkpoint is saved in lightning_logs/version_0.
perl --version # make sure you have installed perl 5
unzip -q data.zip
# evaluation
# evaluate model in lightning_logs/version_0 on all CROHME test sets
# results will be printed on the screen and saved to the lightning_logs/version_0 folder
bash eval_all.sh 0
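The val_ExpRate value in the checkpoint name is an expression recognition rate. A minimal sketch of such a metric is given here, assuming a prediction counts as correct only when its token sequence matches the ground truth exactly. The function name `exp_rate` is hypothetical, and this is not the official evaluation, which compares symLG files with the CROHME tools.

```python
from typing import List, Sequence


def exp_rate(predictions: Sequence[List[str]], targets: Sequence[List[str]]) -> float:
    """Fraction of predictions whose token sequence equals the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0  # no samples, so no correct expressions
    correct = sum(pred == truth for pred, truth in zip(predictions, targets))
    return correct / len(targets)
```

An exact-match rate of this kind is strict: a single wrong token makes the whole expression count as an error.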