## Quality Estimation for Machine/Human Translation (Supervised Approach)

**Author:** Jessica Silva

**Keywords:** Quality Estimation; Machine Translation; Word-level; Sentence-level

**Date:** 20/07/2020

## SUMMARY <a class="tocSkip">

### Context <a class="tocSkip">

The aim of this research is to understand and evaluate the quality estimation task for machine/human translation. The goal of this task is to assess the quality of a translation without access to reference translations.

### Questions <a class="tocSkip">

1) Which strategies can be used for measuring quality of translations? In which cases can they be applied? 

2) What kind of data, and how much of it, is needed to train these approaches?

3) Gains with the new approach.

### Main Outcomes <a class="tocSkip">

1) Which strategies can be used for measuring quality of machine translations? In which cases can they be applied? 

-

2) What kind of data, and how much of it, is needed to train these approaches?

-

3) Gains with the new approach.

-

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc">
	<ul class="toc-item">
		<li>
			<span><a href="#Problem-Statement" data-toc-modified-id="Problem-Statement-1">
				<span class="toc-item-num">1&nbsp;&nbsp;</span>Problem Statement</a>
			</span>
		</li>
		<li>
			<span><a href="#Experimental-Setup" data-toc-modified-id="Experimental-Setup-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Experimental Setup</a></span>
		</li>
		<li>
			<span><a href="#Predictor-Estimator" data-toc-modified-id="Predictor-Estimator-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Predictor-Estimator</a></span>
			<ul class="toc-item">
				<li>
					<span><a href="#Dataset" data-toc-modified-id="Dataset-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Dataset</a></span>
				</li>
                <li>
					<span><a href="#TrainPredict" data-toc-modified-id="TrainPredict-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Train and Predict</a></span>
				</li>
                <li>
                    <span><a href="#Evaluation" data-toc-modified-id="Evaluation-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Evaluation</a></span>
                </li>
			</ul>
		</li>
		<li>
			<span><a href="#Benchmarks" data-toc-modified-id="Benchmarks-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Benchmarks</a></span>
		</li>
		<li>
			<span><a href="#Future-Work" data-toc-modified-id="Future-Work-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Future Work</a></span>
		</li>
	</ul>
</div>

## 1 - Problem Statement

Assess the quality of a translation, whether produced by a machine translation system or a human translator, without access to reference translations.

## 2 - Experimental Setup 

Running the Quality Estimation tutorial requires a small amount of setup. Please check the README.md before continuing.

## 3 - Predictor-Estimator

The Predictor-Estimator architecture can estimate quality at both the word level and the sentence level:

* **Word-level**:
The goal of the word-level QE task using the Predictor-Estimator is to assign quality labels (OK or BAD) to each translated word, as well as to gaps between words (to account for context that needs to be inserted), and source words (to denote words in the original sentence that have been mistranslated or omitted in the target).

* **Sentence-level**:
The goal of the sentence-level QE task using the Predictor-Estimator is to predict the quality of the whole translated sentence, based on how many edit operations are required to fix it, expressed as HTER (Human-targeted Translation Edit Rate).
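As a rough illustration of the sentence-level target, HTER can be approximated as the word-level edit distance between the machine translation and its post-edited version, divided by the number of words in the post-edit. The sketch below is a simplification under stated assumptions: the actual TER metric also counts a block shift as a single edit, which plain Levenshtein distance does not, and the function names and toy sentences are illustrative only.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def approx_hter(mt, pe):
    """Approximate HTER: edits needed to turn `mt` into `pe`, per post-edit word.

    Assumption: whitespace tokenization and no shift operations.
    """
    mt_tokens, pe_tokens = mt.split(), pe.split()
    if not pe_tokens:
        raise ValueError("post-edit must contain at least one token")
    return edit_distance(mt_tokens, pe_tokens) / len(pe_tokens)


# Toy example: one missing word in a six-word post-edit.
score = approx_hter("the cat sat on mat", "the cat sat on the mat")
print(round(score, 3))  # 0.167
```

A score of 0 means the translation needed no edits; higher values mean more post-editing effort.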

### Architecture

* Predictor: trained to predict each token of the target sentence given the source and the left and right context of the target sentence (one biLSTM)

* Estimator: takes features produced by the predictor and uses them to classify each word as OK or BAD (two LSTMs)

* Multi-task architecture for sentence-level HTER scores

<img src='images/predictor-estimator.png' width='400'>

[Paper](https://www.aclweb.org/anthology/W17-4763.pdf)

## 3.1 - Dataset

The Quality Estimation task requires two types of corpora: one to train the Predictor model and another to train the Estimator model.

### Predictor data

The Predictor is trained to predict each token of the target sentence given the source and the left and right context of the target sentence (one biLSTM)

* **Format**: ( _src, tgt_ )

_src: Sentences in the source language_

_tgt: Sentences in the target language_

* **tags**: _no tags (raw data)_

<img src='images/dataset-table1.png' width='500'>
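Since the Predictor only needs raw, sentence-aligned parallel text, preparing its data mostly comes down to pairing source and target lines and checking that they line up. The following is a minimal sketch, assuming one sentence per line on each side; the helper name `pair_sentences` and the toy sentences are hypothetical and not part of any library.

```python
def pair_sentences(src_lines, tgt_lines):
    """Zip two sentence-aligned line sequences into (src, tgt) pairs."""
    src_lines, tgt_lines = list(src_lines), list(tgt_lines)
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"misaligned corpus: {len(src_lines)} source vs "
            f"{len(tgt_lines)} target lines"
        )
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:  # drop pairs where either side is empty
            pairs.append((src, tgt))
    return pairs


# Toy in-memory "files"; in practice these would be lines read from two text files.
src_lines = ["the house is small .\n", "\n", "click the button .\n"]
tgt_lines = ["das Haus ist klein .\n", "leer\n", "klicken Sie auf die Schaltfläche .\n"]

for src, tgt in pair_sentences(src_lines, tgt_lines):
    print(src, "|||", tgt)
```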

### Estimator data

The Estimator takes features produced by the predictor and uses them to classify each word as OK or BAD (two LSTMs) and to predict HTER.

* **Format**: ( _src, mt, pe_ )

_src: Sentences in the source language_

_mt: Machine translated sentences (target language)_

_pe: post-edited sentences (target language)_

* **tags**: _binary tags (OK and BAD), HTER score_

<img src='images/dataset-table2.png' width='800'>
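To make the relationship between the ( _src, mt, pe_ ) triples and the binary tags more concrete, the sketch below derives OK/BAD tags for target words and gaps by aligning the machine translation with its post-edit. It is an approximation under stated assumptions: it uses Python's `difflib.SequenceMatcher` for the alignment, whereas shared-task data is typically tagged from TER alignments, and the function name is illustrative.

```python
from difflib import SequenceMatcher


def word_and_gap_tags(mt_tokens, pe_tokens):
    """Tag MT words and the gaps around them as OK or BAD.

    A word is OK if it is aligned to an identical post-edit word.
    Gap i sits before MT word i (gap len(mt_tokens) is after the last word)
    and is BAD if the post-edit inserts material at that position.
    """
    word_tags = ["BAD"] * len(mt_tokens)
    gap_tags = ["OK"] * (len(mt_tokens) + 1)
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op == "equal":
            word_tags[i1:i2] = ["OK"] * (i2 - i1)
        elif op == "insert":
            gap_tags[i1] = "BAD"
    return word_tags, gap_tags


# Toy example: the post-edit inserts "the" before "mat".
mt = "the cat sat on mat".split()
pe = "the cat sat on the mat".split()
word_tags, gap_tags = word_and_gap_tags(mt, pe)
print(word_tags)  # ['OK', 'OK', 'OK', 'OK', 'OK']
print(gap_tags)   # ['OK', 'OK', 'OK', 'OK', 'BAD', 'OK']
```

Note that an MT sentence with n words yields n word tags and n + 1 gap tags.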

**Ways to create the post-edited data**
* By human translators
* By automatic post-editing (APE) systems




## 3.2 - Train and Predict

In [3]:
from src import utils
import yaml
import kiwi
from ipywidgets import interact, fixed, Textarea
from functools import partial
%load_ext yamlmagic



In [6]:
!wget https://github.com/Unbabel/KiwiCutter/releases/download/v1.0/estimator_en_de.torch.zip -P ../data/interim/


In [14]:
!unzip ../data/interim/estimator_en_de.torch.zip -d ../data/interim/

Archive:  ../data/interim/estimator_en_de.torch.zip
  inflating: ../data/interim/estimator_en_de.torch  


In [16]:
model = kiwi.load_model('../data/interim/estimator_en_de.torch')

In [17]:
source = ['to convert a smooth point to a corner point without direction lines , click the smooth point .']
target = ['soll ein Übergangspunkt in einen Eckpunkt ohne Grifflinien umgewandelt werden , klicken Sie auf den Glättungspunkt .']
examples = {'source': source, 'target': target}

In [18]:
predictions = model.predict(examples)

In [19]:
predictions

{'tags': [[0.06895697116851807,
   0.4229596257209778,
   0.21621333062648773,
   0.05693240091204643,
   0.14355583488941193,
   0.08891278505325317,
   0.2814420461654663,
   0.7454012036323547,
   0.45772698521614075,
   0.21983414888381958,
   0.0320119746029377,
   0.04847194254398346,
   0.024298403412103653,
   0.2455146610736847,
   0.25810056924819946,
   0.6393271684646606,
   0.011077080853283405]],
 'gap_tags': [[0.005333846900612116,
   0.5178709030151367,
   0.010363646782934666,
   0.2994779050350189,
   0.009245308116078377,
   0.0038670392241328955,
   0.33894485235214233,
   0.3496887683868408,
   0.056504152715206146,
   0.0010272195795550942,
   0.012129525654017925,
   0.02781379036605358,
   0.00022671371698379517,
   0.16732671856880188,
   0.019436508417129517,
   0.18383103609085083,
   0.09625448286533356,
   0.008517248556017876]],
 'sentence_scores': [0.1539330631494522]}
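The output above contains one score per target token in `tags` (17 values), one per gap in `gap_tags` (18 values), and one sentence-level score. A minimal sketch of turning the token scores into OK/BAD labels follows. It assumes each score is the predicted probability that the token is BAD, and the 0.5 threshold is an arbitrary illustrative choice; the scores are copied from the `tags` output above and rounded to three decimals.

```python
def label_tokens(tokens, bad_probs, threshold=0.5):
    """Pair each token with OK/BAD, assuming scores are probabilities of BAD."""
    if len(tokens) != len(bad_probs):
        raise ValueError("need exactly one score per token")
    return [(tok, "BAD" if p >= threshold else "OK")
            for tok, p in zip(tokens, bad_probs)]


target = ("soll ein Übergangspunkt in einen Eckpunkt ohne Grifflinien "
          "umgewandelt werden , klicken Sie auf den Glättungspunkt .")

# Rounded values from the 'tags' entry of the predictions shown above.
bad_probs = [0.069, 0.423, 0.216, 0.057, 0.144, 0.089, 0.281, 0.745, 0.458,
             0.220, 0.032, 0.048, 0.024, 0.246, 0.258, 0.639, 0.011]

labels = label_tokens(target.split(), bad_probs)
flagged = [tok for tok, lab in labels if lab == "BAD"]
print(flagged)  # ['Grifflinien', 'Glättungspunkt']
```

Lowering the threshold flags more tokens as BAD, and raising it flags fewer, so the value is best tuned on held-out data.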

In [20]:
SOURCE = Textarea(value=source[0], layout={'width': '90%'})
MT = Textarea(value=target[0], layout={'width': '90%'})
_interact = interact(utils.KiwiViz, model=fixed(model), source=SOURCE, mt=MT, threshold=(0.0, 1.0))

HTER: 0.14097408950328827


 <span style='color:green'>soll</span> <span style='color:green'>ein</span> <span style='color:red'>*Übergangspunkt*</span> <span style='color:green'>in</span> <span style='color:green'>einen</span> <span style='color:green'>Eckpunkt</span> <span style='color:green'>ohne</span> <span style='color:red'>*Grifflinien*</span> <span style='color:green'>umgewandelt</span> <span style='color:green'>werden</span> <span style='color:green'>,</span> <span style='color:green'>klicken</span> <span style='color:green'>Sie</span> <span style='color:green'>auf</span> <span style='color:green'>den</span> <span style='color:red'>*Glättungspunkt*</span> <span style='color:green'>.</span>

## 3.3 - Evaluation

## Readings

I suggest the following complementary readings on Quality Estimation:

* A really good [book](https://www.morganclaypool.com/doi/abs/10.2200/S00854ED1V01Y201805HLT039) on Quality Estimation by Lucia Specia et al.
* [Predictor-Estimator architecture paper](https://www.aclweb.org/anthology/W17-4763.pdf)
* The [main conference on Machine Translation](http://www.statmt.org/wmt20/) and the [Quality Estimation](http://www.statmt.org/wmt20/quality-estimation-task.html) shared task.
