
What inputs does a QUETCH model take? #2

Open
warlock2k opened this issue Apr 12, 2020 · 3 comments
warlock2k commented Apr 12, 2020

Could you point me to relevant documentation? If not, would you be kind enough to explain how QUETCH works with the WMT dataset and what kind of inputs are required? The documentation available online is vague and unclear.

juliakreutzer (Owner) commented

Hi @warlock2k, the pre-processing is described in the README. It works with data provided by WMT14 and WMT15. If the data format has changed since then, you need to adjust it accordingly. The additional pre-processing that is mentioned there uses the preprocessing scripts of the Moses decoder and fast-align for token alignments.
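
As a side note, fast-align expects one sentence pair per line in the form `source tokens ||| target tokens`. A minimal sketch of producing that input from tokenized, lowercased parallel files (the function name and file names here are illustrative, not part of this repo):

```python
# Illustrative sketch: build fast-align input from parallel files.
# fast-align reads one sentence pair per line: "source ||| target".
def write_fast_align_input(src_path, tgt_path, out_path):
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt, \
         open(out_path, "w", encoding="utf-8") as out:
        for s, t in zip(src, tgt):
            out.write(f"{s.strip()} ||| {t.strip()}\n")

# Hypothetical file names, one tokenized sentence per line:
write_fast_align_input("train.source.lc", "train.target.lc", "train.src-tgt")
```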

Please note that this implementation is based on a Theano version from five years ago, so I don't know whether it will work with newer versions.
For an up-to-date implementation in PyTorch, please use OpenKiwi.

warlock2k (Author) commented

Thanks for the response. However, from a consumer's perspective (please correct me if I am wrong):

One needs to use WMT data (source sentences and target translations) to train QUETCH, producing a model, and then apply this model to real MT output to generate a result.

I wanted to know what this result contains: a tag file with OK and BAD tags?

juliakreutzer (Owner) commented Apr 16, 2020

Hi @warlock2k, one needs WMT QE data (source sentences, target sentences) as provided in the shared task, and token alignments, preprocessed as described in the README:

- Training source data, lowercased (`WMT15-data/task2_en-es_train_comb/train.source.lc.comb`): `0 0 we *`, i.e. the sentence id, the word id, the source word, and a placeholder.

- Training target data, combined with features, lowercased (`WMT15-data/task2_en-es_train_comb/train.target.lc.comb.feat`): `0 0 sólo OK 6.0 5.0 1.2 sólo start utilizamos only we use 0 0 1 0 0`, i.e. sentence id, word id, target word, word-level label, and features (here: the WMT15 baseline features). The features are optional and not required for the QUETCH model.

- Source-to-target alignments (`WMT15-data/task2_en-es_train_comb/train.align`): `0	1-0 2-1 3-2 4-3 5-4`, i.e. the sentence id, separated by a tab from the source-target alignment index pairs.
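
To make the field layout concrete, here is a minimal reader sketch for these three files, assuming exactly the formats shown above (function names and paths are illustrative, not part of this repo):

```python
# Illustrative readers for the three preprocessed files described above.
def read_source(path):
    """Yield (sent_id, word_id, word) from e.g. train.source.lc.comb."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            sent_id, word_id, word, _placeholder = line.split()[:4]
            yield int(sent_id), int(word_id), word

def read_target(path):
    """Yield (sent_id, word_id, word, label, features) from e.g. train.target.lc.comb.feat."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            sent_id, word_id, word, label = fields[:4]
            features = fields[4:]  # optional WMT15 baseline features
            yield int(sent_id), int(word_id), word, label, features

def read_alignments(path):
    """Yield (sent_id, [(src_idx, tgt_idx), ...]) from e.g. train.align."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            sent_id, pairs = line.rstrip("\n").split("\t")
            align = [tuple(map(int, p.split("-"))) for p in pairs.split()]
            yield int(sent_id), align
```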

For testing, every MT output has to be processed in the same way. For each of the tokens, QUETCH will predict OK or BAD.
The exact output format is specified here: https://github.com/juliakreutzer/quetch/blob/master/src/QUETCH.py#L75 and here https://github.com/juliakreutzer/quetch/blob/master/src/QUETCH.py#L104, depending on the task (WMT14 or WMT15).
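
For orientation only, a sketch of what dumping per-token predictions to a tab-separated file might look like; the authoritative formats depend on the task and are defined at the `QUETCH.py` lines linked above, so treat the columns below as an assumption:

```python
# Illustrative only: write one prediction per line as
# sentence id <TAB> word id <TAB> word <TAB> OK/BAD.
def write_predictions(tokens, labels, out_path):
    with open(out_path, "w", encoding="utf-8") as out:
        for (sent_id, word_id, word), label in zip(tokens, labels):
            out.write(f"{sent_id}\t{word_id}\t{word}\t{label}\n")
```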
