esalesky/NYFNN


project code for 11-747

Single-step postprocessing

./external_scripts/run-bleu-score.sh

Usage: run-bleu-score.sh [-d] [-h] output ref.txt ref.xml src.xml tgt_lang
Options:
    -d Detokenize the test file after removing bpe splits
    -h Display this help
    Note: tgt_lang should be written out, eg czech or english

useful preprocessing/postprocessing commands

to strip XML from a data directory (currently runs on every file in the directory):

python3 data_xml_to_txt.py -d dir

to tokenize/detokenize (English by default; pass -l for other languages):

perl external_scripts/tokenizer.perl (-l [en|cs|...]) (-threads 4) < textfile > tokenizedfile
perl external_scripts/detokenizer.perl (-l [en|cs|...]) < tokenizedfile > detokenizedfile

to tokenize Czech (runs on all files in a dir, currently):

python3 morphology/run_czech_transform.py -d dir -y morphodita_dict -t morphodita_parser_file

to generate bpe:

./subword-nmt/learn_bpe.py -s {num_operations} < {train_file} > {codes_file}
./subword-nmt/apply_bpe.py -c {codes_file} < {test_file}

to de-bpe:

sed -r 's/(@@ )|(@@ ?$)//g'
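
For use inside a Python script, the same substitution can be sketched with the `re` module. This is a minimal equivalent of the sed command above, not code from this repository; the function name `remove_bpe` is a hypothetical helper chosen for illustration.

```python
import re

# Same pattern as the sed command: remove "@@ " between subword pieces,
# and a dangling "@@" (optionally followed by one space) at the end of a line.
_BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")


def remove_bpe(line: str) -> str:
    """Merge BPE subword pieces back into whole words (hypothetical helper)."""
    return _BPE_MARKER.sub("", line)


if __name__ == "__main__":
    print(remove_bpe("the new@@ est low@@ er model"))
```

Like the sed version, this only strips the `@@` continuation markers; it does not detokenize punctuation, which is still left to detokenizer.perl.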

to score with BLEU:

perl external_scripts/multi-bleu.perl -lc ref < hyp
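
To make it clearer what the score measures, here is a simplified sketch of corpus-level BLEU with a single reference per sentence and optional lowercasing (the role played by `-lc` above). It is an illustration only, not a replacement for multi-bleu.perl, and its numbers should not be reported in place of the script's output. The names `ngram_counts` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, max_n=4, lowercase=True):
    """Simplified single-reference corpus BLEU on a 0-100 scale (a sketch)."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        if lowercase:
            hyp, ref = hyp.lower(), ref.lower()
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = ["The cat sat on the mat"]
    print(corpus_bleu(hyps, refs))
```

Inputs are expected to be tokenized, whitespace-separated sentences, one hypothesis per reference, in the same order.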
