First, install OpusCleaner on your system. Then clone this repository and install the additional requirements (at this point, only urwid beyond what a working OpusCleaner install already needs).
Set up the DATA_PATH (and perhaps the SAMPLE_SIZE) environment variables; OpusCleaner uses these as usual. Then run the app with ./main.py.
For example:
export DATA_PATH='/home/helcl/hplt/translation-models/en-cs/*.*.gz'
export SAMPLE_SIZE=100
cd path/to/clianer/
./main.py
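Before launching, it can help to check which files the DATA_PATH pattern actually matches. The following is a minimal sketch, assuming DATA_PATH is interpreted as a shell-style glob (as the example value suggests); matching_datasets is a hypothetical helper name and is not part of this repository or of OpusCleaner.

```python
import glob
import os


def matching_datasets(pattern):
    """Return the sorted list of files matched by a DATA_PATH-style glob.

    Hypothetical helper for illustration; not part of the app itself.
    """
    return sorted(glob.glob(os.path.expanduser(pattern)))


if __name__ == "__main__":
    # Read the same environment variable that is exported in the example above.
    pattern = os.environ.get("DATA_PATH", "")
    files = matching_datasets(pattern) if pattern else []
    if not files:
        print(f"DATA_PATH={pattern!r} matches no files")
    for path in files:
        print(path)
```

If the script reports no matches, the pattern (or the directory it points to) is worth fixing before starting the app.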
Most of the controls are listed in the bottom bar of the app frame. However, some other controls depend on the current application focus. Move the focus between the filter view and the dataset view using the Left and Right arrow keys.
These work regardless of whether the focus is in the filter view or in the dataset view:
- F2 opens up a new dataset
- F3 adds a new filter
- F6 shows the clean version of the data in the dataset view
- F7 assigns categories to the current dataset
- F10, q exit the application
- Down, Up move within the focused window (PgUp and PgDn also work)
These work when the focus is in the filter view:
- F4 edits the filter
- F5 imports a filter pipeline from a different dataset (careful, this overwrites the current pipeline)
- F8 removes the filter
- w, s move the selected filter up or down
- d marks the filter for diffing
- r resets diffing
These work when the focus is in the dataset view:
- F4 shows the diff (select which filter steps to diff in the filter view)
- F5 shows the clean version of the data
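Because F4 and F5 appear twice in the lists above with different meanings, the active binding depends on where the focus is. The sketch below illustrates one way such focus-dependent lookup could work. It assumes that the second group of keys applies in the filter view and the last two in the dataset view; the table names, the focus labels "filters" and "dataset", and resolve_key are hypothetical and do not reflect how the app is actually implemented.

```python
# Key tables transcribed from the lists above (illustration only).
GLOBAL_KEYS = {
    "f2": "open dataset",
    "f3": "add filter",
    "f6": "show clean data",
    "f7": "assign categories",
    "f10": "exit",
    "q": "exit",
}

# Assumed split: filter-view keys versus dataset-view keys.
FOCUS_KEYS = {
    "filters": {
        "f4": "edit filter",
        "f5": "import pipeline",
        "f8": "remove filter",
        "w": "move filter up",
        "s": "move filter down",
        "d": "mark for diffing",
        "r": "reset diffing",
    },
    "dataset": {
        "f4": "show diff",
        "f5": "show clean data",
    },
}


def resolve_key(key, focus):
    """Return the action for a key, preferring the focused view's bindings."""
    action = FOCUS_KEYS.get(focus, {}).get(key)
    if action is None:
        # Fall back to the bindings that work regardless of focus.
        action = GLOBAL_KEYS.get(key)
    return action
```

For instance, resolve_key("f4", "filters") yields "edit filter", while resolve_key("f4", "dataset") yields "show diff"; a key such as "f2" resolves to the same action in either view.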