What inputs does a QUETCH model take? #2
Hi @warlock2k, the pre-processing is described in the README. It works with the data provided by WMT14 and WMT15. If the data format has changed since then, you need to adjust it accordingly. The additional pre-processing that is mentioned uses the preprocessing scripts of the Mosesdecoder and fast-align for token alignments. Please note that this implementation is based on a Theano version from five years ago, so I don't know whether it will work with newer versions.
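As a side note on the alignment step: fast-align reads a parallel corpus with one sentence pair per line, source and target separated by ` ||| `. A minimal sketch of preparing that input from two tokenized files (the helper name is my own, not part of QUETCH):

```python
# Sketch: build the "source ||| target" parallel input that fast-align reads,
# assuming tokenized source/target files with one sentence per line.
def make_fast_align_input(src_lines, tgt_lines):
    """Pair tokenized sentences in fast-align's one-pair-per-line format."""
    return ["{} ||| {}".format(s.strip(), t.strip())
            for s, t in zip(src_lines, tgt_lines)]

if __name__ == "__main__":
    src = ["das ist ein Test"]
    tgt = ["this is a test"]
    # Prints: das ist ein Test ||| this is a test
    print(make_fast_align_input(src, tgt)[0])
```

The resulting file is what you would pass to fast-align to obtain the token alignments the README mentions.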
Thanks for the response. However, from a consumer's perspective (please correct me if I am wrong): one needs to use WMT data (source and target translations) to train QUETCH, and then use the resulting model on real MT output to generate a result. I wanted to know what this result contains. Is it a tag file showing OK and BAD tags?
Hi @warlock2k, one needs WMT QE data (source sentences, target sentences) as provided in the shared task, plus token alignments, all preprocessed as described in the README.
For testing, every MT output has to be processed in the same way. For each of its tokens, QUETCH will predict OK or BAD.
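To make the output format concrete, here is an illustrative sketch of the per-token OK/BAD tagging described above. The `score_token` function is a hypothetical stand-in for the trained model's per-token score, not QUETCH's actual API:

```python
# Sketch of the kind of per-token output QUETCH produces: one OK/BAD tag
# per target token. `score_token` is a hypothetical stand-in for the model.
def tag_sentence(target_tokens, score_token, threshold=0.5):
    """Return an OK/BAD tag for each token in the MT output."""
    return ["OK" if score_token(tok) >= threshold else "BAD"
            for tok in target_tokens]

if __name__ == "__main__":
    # Toy scorer for illustration only: pretend short tokens are reliable.
    toy_scorer = lambda tok: 1.0 if len(tok) <= 4 else 0.0
    # Prints: ['OK', 'OK', 'BAD']
    print(tag_sentence(["this", "is", "mistranslated"], toy_scorer))
```

In the WMT QE shared task setting, such tags are written out one sequence per sentence, parallel to the MT output tokens.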
Could you point me to relevant documentation? If not, would you be kind enough to explain how QUETCH works with the WMT dataset and what kind of inputs are required? The documentation available online is vague and unclear.