All dependencies can be installed via:
pip3 install -r requirements.txt
mkdir "Envs"
cd "Envs"
# fairseq
echo setup fairseq...
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
# moses
echo setup moses...
git clone https://github.com/moses-smt/mosesdecoder.git
Dictionary Generation: Generates a new dictionary from the prepared bi-text
python3 Scripts/generate_dict.py dicts_path
Example:
# generate dict for pair data in Dictionaries/pair_dict
python3 Scripts/generate_dict.py Dictionaries/pair_dict
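The exact format and method used by generate_dict.py are defined by the script itself; as a rough illustration (function and variable names here are hypothetical), a bilingual dictionary can be approximated from sentence-aligned bi-text by mapping each source word to its most frequent co-occurring target word:

```python
from collections import Counter, defaultdict

def build_dictionary(src_sents, tgt_sents):
    """Map each source word to its most frequent co-occurring target word.
    A crude stand-in for proper word alignment (e.g. fast_align or GIZA++)."""
    cooc = defaultdict(Counter)
    for src, tgt in zip(src_sents, tgt_sents):
        for s in src.split():
            cooc[s].update(tgt.split())
    return {s: counts.most_common(1)[0][0] for s, counts in cooc.items()}

pairs = [
    ("the house", "la casa"),
    ("the dog", "el perro"),
    ("the house is big", "la casa es grande"),
]
d = build_dictionary([p[0] for p in pairs], [p[1] for p in pairs])
```

Plain co-occurrence counting is noisy; real pipelines run a word aligner first, but the idea of distilling bi-text into a word-level dictionary is the same.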
Sentences AA: Uses the bi-text dictionary to translate words from one language into the other. Note that the replacement proportion is 100% in our script; you can easily adjust it in the script
python3 Scripts/add_AA.py src_lang tgt_lang dicts_path input_file output_file
Example:
# translate train.en_XX to train.ro_RO by Dictionaries/pair_dict
python3 Scripts/add_AA.py EN RO Dictionaries/pair_dict train.en_XX train.ro_RO
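add_AA.py's exact behavior is defined by the script itself; conceptually, dictionary-based augmentation replaces a proportion of the tokens in each sentence with their dictionary translations. A hypothetical sketch with a configurable proportion (the script defaults to 100%):

```python
import random

def dict_translate(sentence, dictionary, proportion=1.0, seed=0):
    """Replace a fraction of tokens with their dictionary translations.
    Tokens missing from the dictionary are left unchanged."""
    rng = random.Random(seed)
    out = []
    for tok in sentence.split():
        if tok in dictionary and rng.random() < proportion:
            out.append(dictionary[tok])
        else:
            out.append(tok)
    return " ".join(out)

en_ro = {"the": "cel", "house": "casa"}  # toy dictionary, not real data
print(dict_translate("the house is big", en_ro))
```

With proportion=1.0 every dictionary hit is replaced; lowering it yields code-switched sentences that mix both languages.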
Data Split: Divides the source and target corpora into train, valid, and test sets. Note that the valid and test sets are 2k sentences each; you can easily adjust this in the script
python3 Scripts/split_data.py src_lang tgt_lang src_corpus tgt_corpus
Example:
# divide corpus.en and corpus.ro to train, valid and test data
# valid and test data are both 2k
python3 Scripts/split_data.py en_XX ro_RO corpus.en corpus.ro
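How split_data.py slices the corpus is defined by the script; a minimal sketch of the underlying operation, keeping the source and target sides aligned and taking the valid and test sets (2k lines each by default, mirroring the script) from the end of the corpus (an assumption for illustration):

```python
def split_corpus(src_lines, tgt_lines, n_valid=2000, n_test=2000):
    """Split parallel lines into train/valid/test, keeping pairs aligned."""
    assert len(src_lines) == len(tgt_lines), "corpora must be parallel"
    n_train = len(src_lines) - n_valid - n_test
    assert n_train > 0, "corpus too small for the requested splits"
    splits = {}
    for name, lo, hi in [("train", 0, n_train),
                         ("valid", n_train, n_train + n_valid),
                         ("test", n_train + n_valid, len(src_lines))]:
        splits[name] = (src_lines[lo:hi], tgt_lines[lo:hi])
    return splits
```

The key point is that source and target lines are sliced with identical indices, so sentence pairs never fall into different splits.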
SentencePiece Sub-word: Applies SentencePiece sub-word tokenization to the experimental data
python3 Scripts/spm.py model_path < input_file > output_file
Example:
SPM=spm.py
MODEL=mbart.cc25/sentence.bpe.model
DATA=prepare_data
TRAIN=train
VALID=valid
TEST=test
SRC=en_XX
TGT=ro_RO
python3 ${SPM} ${MODEL} < ${DATA}/${TRAIN}.${SRC} > ${DATA}/${TRAIN}.spm.${SRC}
python3 ${SPM} ${MODEL} < ${DATA}/${TRAIN}.${TGT} > ${DATA}/${TRAIN}.spm.${TGT}
python3 ${SPM} ${MODEL} < ${DATA}/${VALID}.${SRC} > ${DATA}/${VALID}.spm.${SRC}
python3 ${SPM} ${MODEL} < ${DATA}/${VALID}.${TGT} > ${DATA}/${VALID}.spm.${TGT}
python3 ${SPM} ${MODEL} < ${DATA}/${TEST}.${SRC} > ${DATA}/${TEST}.spm.${SRC}
python3 ${SPM} ${MODEL} < ${DATA}/${TEST}.${TGT} > ${DATA}/${TEST}.spm.${TGT}
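spm.py applies a trained SentencePiece model, which segments text with a learned BPE or unigram model. As a rough standard-library illustration of the sub-word idea only (not the actual SentencePiece algorithm, and the vocabulary here is made up), greedy longest-match segmentation against a fixed vocabulary looks like:

```python
def segment(word, vocab):
    """Greedy longest-match sub-word segmentation (illustration only;
    SentencePiece itself uses learned BPE or unigram LM segmentation).
    Unknown spans fall back to single characters."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

vocab = {"trans", "lat", "ion", "un"}  # toy vocabulary for illustration
print(segment("translation", vocab))
```

Rare words split into frequent pieces this way, which is what lets the model handle open vocabularies with a fixed sub-word inventory.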
To run the Python scripts and calculate the MT evaluation metrics on your machine translation output, you need to have two files:
- ref.txt: the human (reference/target) translation file of your test dataset.
- hyp.txt: the machine-translated output (hypothesis) generated by the MT model for the source side of the same test dataset.
Corpus BLEU: Calculates the BLEU score for the whole corpus and prints the result.
python3 MT-Evaluation/BLEU/compute-bleu.py ref.txt hyp.txt
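compute-bleu.py's internals are not shown here; for intuition, a minimal standard-library version of corpus BLEU (clipped n-gram precisions up to 4-grams plus a brevity penalty, no smoothing) could look like:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(refs, hyps, max_n=4):
    """Minimal corpus BLEU: clipped n-gram precision + brevity penalty."""
    match, total = [0] * max_n, [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        ref_len += len(r)
        hyp_len += len(h)
        for n in range(1, max_n + 1):
            rg, hg = ngrams(r, n), ngrams(h, n)
            match[n - 1] += sum(min(c, rg[g]) for g, c in hg.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:  # unsmoothed: any empty precision zeroes the score
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * bp * math.exp(log_prec)
```

Production implementations (e.g. sacreBLEU) add smoothing and standardized tokenization, which matter for comparable scores.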
Sentence BLEU: Calculates the BLEU score sentence by sentence and saves the results to a file.
python3 MT-Evaluation/BLEU/compute-bleu-sentence.py ref.txt hyp.txt
Corpus TER: Calculates the TER score for the whole corpus and prints the result.
python3 MT-Evaluation/TER/compute-ter.py ref.txt hyp.txt
Sentence TER: Calculates the TER score sentence by sentence and saves the results to a file.
python3 MT-Evaluation/TER/compute-ter-sentence.py ref.txt hyp.txt
Corpus CHRF: Calculates the CHRF score for the whole corpus and prints the result.
python3 MT-Evaluation/CHRF/compute-chrf.py ref.txt hyp.txt
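For intuition (the script's own implementation may differ), chrF is a character n-gram F-score, by default an F-beta with beta = 2 averaged over n-gram orders up to 6; a simplified standard-library sketch:

```python
from collections import Counter

def char_ngrams(text, n):
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(ref, hyp, max_n=6, beta=2.0):
    """Character n-gram F-beta score, averaged over n = 1..max_n."""
    scores = []
    for n in range(1, max_n + 1):
        rg, hg = char_ngrams(ref, n), char_ngrams(hyp, n)
        if not rg and not hg:
            continue  # both strings too short for this order
        overlap = sum((rg & hg).values())
        prec = overlap / max(sum(hg.values()), 1)
        rec = overlap / max(sum(rg.values()), 1)
        if prec + rec == 0:
            scores.append(0.0)
        else:
            scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

Because it works on characters rather than words, chrF is more forgiving of morphological variation than BLEU, which is why it is popular for morphologically rich languages.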
Sentence CHRF: Calculates the CHRF score sentence by sentence and saves the results to a file.
python3 MT-Evaluation/CHRF/compute-chrf-sentence.py ref.txt hyp.txt
Sentence METEOR: Note that METEOR works at the sentence level only.
python3 MT-Evaluation/METEOR/sentence-meteor.py ref.txt hyp.txt
Corpus WER: Calculates the WER score for the whole corpus and prints the result.
python3 MT-Evaluation/WER/corpus-wer.py ref.txt hyp.txt
Sentence WER: Calculates the WER score sentence by sentence and saves the results to a file.
python3 MT-Evaluation/WER/sentence-wer.py ref.txt hyp.txt
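WER is word-level Levenshtein distance (substitutions, insertions, and deletions) normalized by the reference length; a minimal sketch of the computation the scripts perform:

```python
def wer(ref, hyp):
    """Word error rate: word-level edit distance / reference length."""
    r, h = ref.split(), hyp.split()
    # prev[j] holds the edit distance between r[:i-1] and h[:j]
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (rw != hw))  # substitution or match
        prev = cur
    return prev[-1] / max(len(r), 1)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions relative to a short reference.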