To clean and tokenize a parallel corpus, use
nlp_preprocess clean_tok_para_corpus --help
To learn a subword tokenizer, use
nlp_preprocess learn_subword --help
To apply the learned subword tokenizer, use
nlp_preprocess apply_subword --help
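
As a small convenience, the three help commands above can be printed in one go from Python. This is only a sketch: the helper names `help_command` and `show_all_help` are hypothetical and not part of `nlp_preprocess`, and it passes only `--help`, since no other flags are documented here.

```python
import shutil
import subprocess

# Subcommand names are taken from the text above, in the order listed.
STEPS = ["clean_tok_para_corpus", "learn_subword", "apply_subword"]


def help_command(step, executable="nlp_preprocess"):
    """Build the argv list that prints the help text for one step."""
    return [executable, step, "--help"]


def show_all_help(executable="nlp_preprocess"):
    """Print the help for each step; return False if the CLI is not on PATH."""
    if shutil.which(executable) is None:
        print(f"{executable} was not found on PATH")
        return False
    for step in STEPS:
        # check=False: a non-zero exit from one step does not stop the others.
        subprocess.run(help_command(step, executable), check=False)
    return True


if __name__ == "__main__":
    show_all_help()
```

Running the script prints each subcommand's help text in turn, or a short notice if `nlp_preprocess` is not installed.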