# Machine Translation (MT)

***Machine Translation (MT)*** is a technology that automatically translates text from one language to another using termbases and advanced grammatical, syntactic and semantic analysis techniques.

# Types of Machine Translation

1. Rule-Based Machine Translation (RBMT)
2. Statistical Machine Translation (SMT)
3. Example-Based Machine Translation (EBMT)
4. Neural Machine Translation (NMT)
5. Hybrid Methods

## Rule Based Machine Translation

Also called Knowledge-Based Machine Translation, RBMT parses a source sentence to identify its words and analyze its structure, and then converts it into the target language using a set of rules manually encoded by linguistic experts. The rules attempt to define correspondences between the structure of the source language and that of the target language.

**Disadvantages**
1. Developing an RBMT system is time-consuming and labor-intensive; it may take several years for a single language pair.
2. Human-encoded rules cannot cover all possible linguistic phenomena, and conflicts between existing rules may lead to poor translation quality on real-life texts. RBMT engines also do not deal well with slang or metaphorical text.

There are three types of RBMT systems:

**1. Direct Method (Dictionary-Based Machine Translation)**
<br>
Source-language text is translated without passing through an intermediary representation. Words are translated one by one, as a dictionary would translate them, usually without much correlation of meaning between them.
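A minimal sketch of the direct method, assuming a tiny hypothetical German-to-English dictionary (`DICTIONARY` and `direct_translate` are illustrative names, not from any real system). Each word is looked up on its own, so the source word order is carried straight into the output.

```python
# Hypothetical toy bilingual dictionary (German -> English); a real direct
# system would use a large lexicon plus some morphological processing.
DICTIONARY = {
    "ich": "I",
    "habe": "have",
    "den": "the",
    "hund": "dog",
    "gesehen": "seen",
}

def direct_translate(sentence: str) -> str:
    """Translate word by word; unknown words are copied through unchanged."""
    words = sentence.lower().split()
    return " ".join(DICTIONARY.get(w, w) for w in words)

print(direct_translate("Ich habe den Hund gesehen"))
# -> I have the dog seen
```

The output keeps the German verb-final order ("the dog seen"), which illustrates why word-by-word translation without structural analysis often produces unnatural text.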

**2. Transfer Rule-Based Machine Translation**
<br>
Morphological and syntactic analysis are the fundamental steps in transfer-based systems. The source-language text is first converted into a less language-specific representation; grammar rules and bilingual dictionaries then map it to a target-language representation at the same level of abstraction, from which the target text is generated.

Example: [Mantra](https://mantra-rajbhasha.rb-aai.in/), a transfer-based tool by the Government of India.
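The analysis, transfer and generation stages can be sketched on a toy English-to-French example. This is a minimal illustration under stated assumptions: the `LEXICON`, its part-of-speech tags and the single ADJ NOUN reordering rule are hypothetical, and gender agreement (e.g. "noire" for feminine nouns) is ignored.

```python
# Hypothetical toy lexicon: English word -> (part of speech, French word).
LEXICON = {
    "the": ("DET", "le"),
    "black": ("ADJ", "noir"),
    "cat": ("NOUN", "chat"),
    "sleeps": ("VERB", "dort"),
}

def analyze(sentence):
    """Analysis: tag each source word with its part of speech."""
    return [(w, LEXICON[w][0]) for w in sentence.lower().split()]

def transfer(tagged):
    """Transfer rule: English ADJ NOUN becomes French NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return out

def generate(tagged):
    """Generation: replace each word with its bilingual dictionary entry."""
    return " ".join(LEXICON[w][1] for w, _ in tagged)

print(generate(transfer(analyze("The black cat sleeps"))))
# -> le chat noir dort
```

Real transfer systems use full parsers and large rule sets, but the division of labor is the same: the rules operate on the analyzed structure, not on raw word strings.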

**3. Interlingual RBMT Systems**
<br>
In this method, the source language is translated into an intermediary representation (an interlingua) that does not depend on any particular language. The target language is then generated from this language-independent representation.
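A toy sketch of the interlingua idea, assuming a hypothetical frame of concepts and roles (`JOHN`, `EAT`, `APPLE_PL`, `agent`, `predicate`, `patient`) as the interlingua. Only three-word English sentences are analyzed; each target language needs just its own lexicon and word order to generate from the same frame.

```python
# Hypothetical per-language lexicons, keyed by interlingua concepts.
LEXICONS = {
    "en": {"JOHN": "John", "EAT": "eats", "APPLE_PL": "apples"},
    "de": {"JOHN": "John", "EAT": "isst", "APPLE_PL": "Äpfel"},
    "ja": {"JOHN": "John wa", "EAT": "taberu", "APPLE_PL": "ringo o"},  # romaji
}
# Order of semantic roles in each language (Japanese is verb-final).
ORDER = {
    "en": ("agent", "predicate", "patient"),
    "de": ("agent", "predicate", "patient"),
    "ja": ("agent", "patient", "predicate"),
}

def analyze(sentence):
    """Analysis: map an English '<agent> <verb> <object>' sentence to a frame."""
    to_concept = {word: concept for concept, word in LEXICONS["en"].items()}
    agent, verb, obj = sentence.split()
    return {"agent": to_concept[agent], "predicate": to_concept[verb],
            "patient": to_concept[obj]}

def generate(frame, lang):
    """Generation: realize the language-independent frame in one language."""
    lexicon = LEXICONS[lang]
    return " ".join(lexicon[frame[role]] for role in ORDER[lang])

frame = analyze("John eats apples")
print(frame)                   # {'agent': 'JOHN', 'predicate': 'EAT', 'patient': 'APPLE_PL'}
print(generate(frame, "de"))   # John isst Äpfel
print(generate(frame, "ja"))   # John wa ringo o taberu
```

The design choice this illustrates: adding a language means writing one analyzer and one generator for the interlingua, rather than a separate transfer module for every language pair.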

## Statistical Machine Translation (SMT)

SMT works by training the translation engine on a very large volume of bilingual (source texts and their translations) and monolingual corpora. The system looks for statistical correlations between source texts and their translations, both for entire segments and for shorter phrases within each segment, building a so-called translation model. It then generates confidence scores for how likely a given source text is to map to a particular translation. The translation engine itself has no notion of rules or grammar. SMT was the core of earlier versions of Google Translate and Bing Translator, both of which have since moved to neural machine translation.
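One ingredient of the translation model, phrase translation probabilities, can be estimated by relative frequency: $\phi(e \mid f) = \text{count}(f, e) / \text{count}(f)$. The sketch below assumes the phrase pairs have already been extracted from a word-aligned corpus (the pairs themselves are made up). A full SMT system would combine these scores with a language model and a reordering model inside a decoder, which is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical (German phrase, English phrase) pairs, as if extracted
# from a word-aligned bilingual corpus.
phrase_pairs = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the home"),
    ("ist klein", "is small"),
    ("ist klein", "is little"),
    ("ist klein", "is small"),
    ("ist klein", "is small"),
]

def phrase_table(pairs):
    """Estimate phi(e | f) = count(f, e) / count(f) by relative frequency."""
    pair_counts = Counter(pairs)
    source_counts = Counter(f for f, _ in pairs)
    table = defaultdict(dict)
    for (f, e), count in pair_counts.items():
        table[f][e] = count / source_counts[f]
    return table

table = phrase_table(phrase_pairs)
print(table["das haus"])  # {'the house': 0.6666666666666666, 'the home': 0.3333333333333333}
# The most probable English phrase for "ist klein":
print(max(table["ist klein"], key=table["ist klein"].get))  # is small
```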

**Disadvantages**

1. It requires very large, well-organized bilingual corpora for each language pair.
2. SMT engines fail when presented with texts that are not similar to the material in the training corpora. It is therefore important to train the engine on texts similar to the material that will be translated.

## Example Based Machine Translation

In an EBMT system, a sentence is translated by analogy. A set of existing translation pairs (source sentences and their target translations) serves as examples. When a new source sentence is to be translated, the example base is searched for source sentences similar to it, and the target sentence is generated by imitating the translations of the matched examples. Because matches for long sentences are rare, the examples and the input sentence are usually broken down into smaller fragments.
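A minimal sketch of translation by analogy, assuming a hypothetical two-entry example base and word dictionary. The closest example is retrieved with word-level similarity from Python's `difflib.SequenceMatcher` (a stand-in for the more elaborate matching real EBMT systems use), and single differing words are substituted in its translation.

```python
from difflib import SequenceMatcher

# Hypothetical example base of (English source, German target) pairs.
EXAMPLES = [
    ("I would like a coffee", "Ich hätte gern einen Kaffee"),
    ("Where is the station", "Wo ist der Bahnhof"),
]
# Hypothetical word dictionary used to adapt the retrieved example.
WORD_DICT = {"coffee": "Kaffee", "tea": "Tee",
             "station": "Bahnhof", "airport": "Flughafen"}

def similarity(a, b):
    """Word-level similarity in [0, 1] between two sentences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def translate(sentence):
    # Retrieval: find the example whose source is most similar to the input.
    source, target = max(EXAMPLES, key=lambda ex: similarity(sentence, ex[0]))
    new_words, old_words = sentence.split(), source.split()
    swaps = {}
    if len(new_words) == len(old_words):  # toy: only same-length word swaps
        for old, new in zip(old_words, new_words):
            if old != new and old in WORD_DICT and new in WORD_DICT:
                swaps[WORD_DICT[old]] = WORD_DICT[new]
    # Adaptation: imitate the example's translation with the swapped words.
    return " ".join(swaps.get(w, w) for w in target.split())

print(translate("I would like a tea"))    # Ich hätte gern einen Tee
print(translate("Where is the airport"))  # Wo ist der Flughafen
```

If the input resembles no example, the sketch still returns the translation of whichever example scores highest, which mirrors the second disadvantage listed below.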

**Disadvantages**

1. It requires a large number of examples.
2. When no similar example is found, translation quality may be very low.

## Neural Machine Translation

Neural machine translation (NMT) is based on the machine learning paradigm and is the most recent of these approaches. NMT uses an artificial neural network, trained on bilingual text, to predict the target sentence from the source sentence. Words, phrases and longer segments are represented as learned vectors, so the relationships between them are captured by the network's parameters rather than by explicit rules or phrase tables.
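In the standard formulation used by encoder-decoder models such as the Seq2Seq model discussed in this document, the probability of a target sentence $y_1,\dots,y_{T'}$ given a source sentence $x_1,\dots,x_T$ is factorized one word at a time, and training minimizes the negative log-likelihood of the reference translations ($T'$ is our label for the target length):

$$p(y_1,\dots,y_{T'} \mid x_1,\dots,x_T) = \prod_{t=1}^{T'} p(y_t \mid y_1,\dots,y_{t-1}, x_1,\dots,x_T)$$

$$\mathcal{L} = -\sum_{t=1}^{T'} \log p(y_t \mid y_1,\dots,y_{t-1}, x_1,\dots,x_T)$$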

## Hybrid Machine Translation

All the above methods have their shortcomings, and many hybrid MT approaches have been proposed. The main categories of hybrid systems are:

1. Rule-based engines that use statistical translation for post-processing and cleanup.
2. Statistical systems guided by rule-based engines.
3. Either of the above with some input from a neural machine translation system.

Many practical MT systems have adopted hybrid approaches to some extent, combining rule-based and statistical methods. More recently, more and more systems also take advantage of NMT to different degrees.

# Evaluation Metrics for Machine Translation

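1. **Word error rate (WER)** is the word-level edit (Levenshtein) distance between the system output and the reference translation, i.e. the minimum number of word substitutions, insertions and deletions, divided by the number of words in the reference.
2. **Position-independent error rate (PER)** is computed like WER, but treats each sentence as a bag of words and ignores word order.
3. **Bilingual Evaluation Understudy (BLEU)** computes modified (clipped) n-gram precision rather than an error rate, usually for n = 1 to 4, combining the precisions with a geometric mean and applying a brevity penalty to outputs shorter than the reference.
4. **Metric for Evaluation of Translation with Explicit Ordering (METEOR)** matches unigrams between the output and the reference, taking stemming and synonyms into account, and combines precision and recall with a penalty for fragmented word order.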
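A minimal sketch of two of these metrics, assuming whitespace tokenization and a single reference: WER as word-level Levenshtein distance divided by the reference length, and sentence-level BLEU with uniform weights, clipped n-gram counts, a brevity penalty and no smoothing. PER and METEOR are not shown, and for reported results a corpus-level BLEU from an established tool such as sacreBLEU is the usual choice.

```python
import math
from collections import Counter

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

def ngrams(words, n):
    """Count all n-grams (as tuples) in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def sentence_bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """BLEU with one reference, uniform weights and no smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        # Clip each n-gram count by how often it occurs in the reference.
        overlap = sum((hyp_counts & ngrams(ref, n)).values())
        if overlap == 0:
            return 0.0  # a zero precision makes unsmoothed BLEU zero
        log_precisions.append(math.log(overlap / sum(hyp_counts.values())))
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

ref, hyp = "the cat is on the mat", "the cat sat on the mat"
print(wer(ref, hyp))                     # 0.16666666666666666
print(sentence_bleu(ref, hyp, max_n=2))  # ~0.7071
print(sentence_bleu(ref, hyp))           # 0.0
```

With the default `max_n=4`, the single substitution leaves no matching 4-gram, so unsmoothed sentence-level BLEU drops to 0 even though WER is only 1/6. This is one reason BLEU is normally computed over a whole test corpus rather than per sentence.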

## Neural Machine Translation Architectures

Recurrent neural networks (RNNs) were at the heart of early neural machine translation, and many RNN-based architectures have been proposed over the years. In this section, we look at some well-known architectures, explain how they work, and later implement them from scratch in PyTorch.

## 1. Seq2Seq Model


Research paper: [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215)<br>
Implementation code: [Colab link](colab link)

The most common sequence-to-sequence (seq2seq) models are encoder-decoder models, which typically use a recurrent neural network (RNN) to encode the source (input) sentence into a single vector. In this notebook, we refer to this single vector as the context vector. You can think of the context vector as an abstract representation of the entire input sentence. This vector is then decoded by a second RNN, which learns to output the target (output) sentence by generating it one word at a time.

![](images/encoder_decoder.png)

**Encoder Sequence**

At each time step $t$:

1. Input to the encoder RNN:
    - the current word, $x_t$
    - the hidden state from the previous time step, $h_{t-1}$
2. Output from the encoder RNN:
    - a new hidden state, $h_t$

We can represent the encoder as
$$h_t = \text{EncoderRNN}(x_t, h_{t-1})$$
 
Once the final word, $x_T$, has been passed into the RNN, we use the final hidden state, $h_T$, as the context vector, i.e. $z = h_T$. This is a vector representation of the entire source sentence.
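A minimal PyTorch sketch of the encoder, assuming an LSTM as EncoderRNN (the paper uses multi-layer LSTMs) and hypothetical vocabulary and layer sizes. Because an LSTM carries a cell state alongside its hidden state, the sketch returns both, and together they play the role of the context vector $z$. Training details from the paper, such as reversing the source sentence, are left out.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Embeds source tokens and runs them through an LSTM: h_t = EncoderRNN(x_t, h_{t-1})."""

    def __init__(self, vocab_size, emb_dim, hid_dim, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, num_layers=n_layers)

    def forward(self, src):
        # src: [src_len, batch] token indices (sequence-first, LSTM default)
        embedded = self.embedding(src)               # [src_len, batch, emb_dim]
        outputs, (hidden, cell) = self.rnn(embedded)
        # hidden, cell: [n_layers, batch, hid_dim], the states after x_T;
        # they serve as the context vector z = h_T.
        return hidden, cell

# Hypothetical sizes: vocabulary of 100, 7 source tokens, batch of 3.
encoder = Encoder(vocab_size=100, emb_dim=16, hid_dim=32)
src = torch.randint(0, 100, (7, 3))
hidden, cell = encoder(src)
print(hidden.shape)  # torch.Size([2, 3, 32])
```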

**Decoder Sequence**

At each time step $t$:

1. Input to the decoder RNN:
    - the current word, $y_t$
    - the hidden state from the previous time step, $s_{t-1}$, where the initial decoder hidden state, $s_0$, is the context vector, $s_0 = z = h_T$; i.e. the initial decoder hidden state is the final encoder hidden state
2. Output from the decoder RNN:
    - a new hidden state, $s_t$, which is passed through a Linear layer (shown in purple) to predict what we think is the next word in the sequence, $\hat{y}_{t+1}$

We can represent the decoder as
$$s_t = \text{DecoderRNN}(y_t, s_{t-1})$$
and the prediction as
$$\hat{y}_{t+1} = f(s_t)$$
where $f$ is the Linear layer.
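A matching PyTorch sketch of the decoder, under stated assumptions: an LSTM plays the role of DecoderRNN, `fc_out` is the Linear layer $f$, the sizes and `sos_idx` (a hypothetical index for a start-of-sentence token) are made up, and decoding is greedy with no end-of-sentence check. The context vector is replaced by zeros so the block runs on its own; in the full model it would be the encoder's final hidden and cell states.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """One step: s_t = DecoderRNN(y_t, s_{t-1}), then logits = f(s_t)."""

    def __init__(self, vocab_size, emb_dim, hid_dim, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, num_layers=n_layers)
        self.fc_out = nn.Linear(hid_dim, vocab_size)  # the Linear layer f

    def forward(self, token, hidden, cell):
        # token: [batch] indices of y_t; hidden, cell: [n_layers, batch, hid_dim]
        embedded = self.embedding(token.unsqueeze(0))  # [1, batch, emb_dim]
        output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
        logits = self.fc_out(output.squeeze(0))        # [batch, vocab_size]
        return logits, hidden, cell

def greedy_decode(decoder, hidden, cell, sos_idx, max_len):
    """Start from <sos> and feed each predicted word back in as the next input."""
    token = torch.full((hidden.size(1),), sos_idx, dtype=torch.long)
    predictions = []
    for _ in range(max_len):
        logits, hidden, cell = decoder(token, hidden, cell)
        token = logits.argmax(dim=1)  # \hat{y}_{t+1}, the next input word
        predictions.append(token)
    return torch.stack(predictions)   # [max_len, batch]

# Stand-in context vector (s_0): zeros instead of the encoder's final states.
hidden, cell = torch.zeros(2, 3, 32), torch.zeros(2, 3, 32)
decoder = Decoder(vocab_size=100, emb_dim=16, hid_dim=32)
with torch.no_grad():
    out = greedy_decode(decoder, hidden, cell, sos_idx=1, max_len=5)
print(out.shape)  # torch.Size([5, 3])
```

During training, the ground-truth word $y_t$ is usually fed in at each step instead of the model's own prediction (teacher forcing); the greedy loop above is the inference-time behavior.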


In this notebook, we cover all the concepts involved in implementing the model from the **Sequence to Sequence Learning with Neural Networks** paper. Use this notebook as a reference to understand any part of the implementation code.

The model is implemented using PyTorch and TorchText. We apply it to German-to-English translation, but the model can be applied to any problem that involves going from one sequence to another, such as summarization.



# References

1. [A Study of Machine Translation Methods and Their Challenges](https://www.ijarse.com/images/fullpdf/320.pdf)
2. [Machine Translation: Introduction](https://www.andovar.com/machine-translation/)
3. [Comparison of different machine translation approaches (Wikipedia)](https://en.wikipedia.org/wiki/Comparison_of_different_machine_translation_approaches)
4. [Develop a Neural Machine Translation System in Keras](https://machinelearningmastery.com/develop-neural-machine-translation-system-keras/)
5. [Perplexity](https://towardsdatascience.com/perplexity-intuition-and-derivation-105dd481c8f3)
6. [CS124 (Stanford)](https://web.stanford.edu/class/cs124/)