Third-Party-Supervised-Wordaligner

This is the implementation of our work, "Third-Party Supervised Fine-tuning for Neural Word Alignments".

Introduction

This work offers a simple and effective way to improve an existing aligner. Using supervision signals from other aligners, the pretrained model achieves a lower alignment error rate (AER).

Usage

Data Preparation

To get the data used in our paper, follow the instructions here.

Get Adapted Subword-level Supervised Alignment From Other Aligner

In our preliminary experiments, we found that the alignments from the third-party aligner are used most effectively at the subword level. You therefore need to tokenize the words into subwords with the same tokenizer used by the pretrained model that will be fine-tuned.

Here we offer a simple version that tokenizes words into subwords.
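The idea can be sketched as follows. This is a minimal illustration, not the script shipped in this repository: the function `words_to_subwords` and the stand-in `toy_tokenize` are hypothetical names introduced here. In practice you would pass the `tokenize` method of the tokenizer that belongs to the model being fine-tuned (for mBERT, presumably the Hugging Face `bert-base-multilingual-cased` tokenizer, which is an assumption on our part).

```python
from typing import Callable, List, Tuple


def words_to_subwords(
    words: List[str], tokenize: Callable[[str], List[str]]
) -> Tuple[List[str], List[int]]:
    """Split each word into subwords and record which word each subword came from."""
    subwords: List[str] = []
    word_ids: List[int] = []
    for idx, word in enumerate(words):
        # Fall back to the word itself if the tokenizer returns nothing for it.
        pieces = tokenize(word) or [word]
        subwords.extend(pieces)
        word_ids.extend([idx] * len(pieces))
    return subwords, word_ids


def toy_tokenize(word: str, size: int = 4) -> List[str]:
    """Hypothetical stand-in tokenizer: fixed-size chunks with WordPiece-style '##' marks."""
    pieces = [word[i:i + size] for i in range(0, len(word), size)]
    return [p if i == 0 else "##" + p for i, p in enumerate(pieces)]


subwords, word_ids = words_to_subwords("unbelievable results".split(), toy_tokenize)
print(" ".join(subwords))  # subword-level sentence to feed to the third-party aligner
print(word_ids)            # subword index -> original word index
```

The subword-level sentence is what the third-party aligner should be run on. The `word_ids` mapping is only needed if you later want to relate subword links back to words.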

Then follow the guidance for the third-party aligner.

Note that some aligners convert subword alignment results to word alignment results as a final step. Do not apply this conversion: keep the alignment at the subword level.
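To make clear which step to skip, here is a sketch of the kind of conversion many aligners apply. The function name and the data are hypothetical and only illustrate how subword links collapse into word links; your aligner's actual post-processing may differ.

```python
from typing import List, Tuple


def subword_to_word_alignment(
    sub_align: List[Tuple[int, int]],
    src_word_ids: List[int],
    tgt_word_ids: List[int],
) -> List[Tuple[int, int]]:
    """Collapse subword links (i, j) into word links. This is the step to disable."""
    return sorted({(src_word_ids[i], tgt_word_ids[j]) for i, j in sub_align})


# Hypothetical example: 5 source subwords from 2 words, 3 target subwords from 2 words.
src_word_ids = [0, 0, 0, 1, 1]
tgt_word_ids = [0, 1, 1]
sub_align = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2)]

word_align = subword_to_word_alignment(sub_align, src_word_ids, tgt_word_ids)
print(len(sub_align), "subword links ->", len(word_align), "word links")
```

The collapsed output discards the subword-level links that the fine-tuning step relies on, which is why `sub_align` itself should be kept as the supervision signal.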

Here we offer a subword alignment result produced by Maskalign, which is used to fine-tune mBERT. We extracted the first 80,000 texts from the Chinese-English LDC corpus as an example fine-tuning training set.

Finetune the pretrained model

Fine-tune the pretrained model (mBERT) by running this script.

Evaluate the result

After fine-tuning the pretrained model, you can evaluate its performance by running this script.
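For reference, AER is commonly defined as 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted alignment, S the sure gold links, and P the possible gold links (with S contained in P). The sketch below computes it for a single sentence pair. It assumes alignments are written in the Pharaoh `i-j` format, which may not match the format the evaluation script in this repository expects, and the helper names are hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[int, int]


def parse_pharaoh(line: str) -> Set[Link]:
    """Parse a line such as '0-0 1-2' into a set of (source, target) links."""
    return {tuple(int(x) for x in pair.split("-")) for pair in line.split()}


def aer(hyp: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Alignment error rate for one sentence pair (lower is better)."""
    possible = possible | sure  # sure links always count as possible
    denominator = len(hyp) + len(sure)
    if denominator == 0:
        return 0.0
    return 1.0 - (len(hyp & sure) + len(hyp & possible)) / denominator


# Hypothetical example data.
hyp = parse_pharaoh("0-0 1-1 2-3")
sure = parse_pharaoh("0-0 1-1")
possible = parse_pharaoh("2-2")
score = aer(hyp, sure, possible)
print(f"AER = {score:.3f}")
```

For a corpus-level score, the intersection and size counts are usually summed over all sentence pairs before the ratio is taken, rather than averaging per-sentence values.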
