Split Broad Operational Language Translation corpus into train/dev/test set

hankcs/bolt_splits

bolt_splits

Split the Broad Operational Language Translation (BOLT) corpus into train/dev/test sets.

The pseudo-code for splitting goes as follows:

For files in each genre:
  For files in each ext:
    For files in each length of filename:
      Sort files by filename
      Split files to trn, dev, tst with 8:1:1 ratio
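
The pseudo-code above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual script: the function name `split_bolt_files` is hypothetical, the genre is assumed to be the name of each file's parent directory, and split boundaries are assumed to be rounded down (the README does not say how remainders are handled).

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_bolt_files(paths, ratios=(8, 1, 1)):
    """Group paths by (genre, extension, filename length), sort, then split."""
    groups = defaultdict(list)
    for p in map(PurePosixPath, paths):
        # Assumption: the genre is the name of the file's parent directory.
        key = (p.parent.name, p.suffix, len(p.name))
        groups[key].append(p)

    splits = {"trn": [], "dev": [], "tst": []}
    total = sum(ratios)
    for key in sorted(groups):
        # Sort files by filename within each group.
        files = [str(p) for p in sorted(groups[key], key=lambda p: p.name)]
        n = len(files)
        # Assumption: boundaries are rounded down; the actual rounding
        # used by the repository is not stated in the README.
        trn_end = n * ratios[0] // total
        dev_end = n * (ratios[0] + ratios[1]) // total
        splits["trn"] += files[:trn_end]
        splits["dev"] += files[trn_end:dev_end]
        splits["tst"] += files[dev_end:]
    return splits
```

Called on a list of relative paths such as `genre_dir/file_name.ext` (placeholder names), the function returns a dict with `trn`, `dev`, and `tst` lists. Grouping by filename length before sorting presumably keeps filenames with differently sized numeric parts from being interleaved by a plain string sort, though the README does not state the motivation.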
