<a href="https://colab.research.google.com/github/samyxandz/NMT-English-To-Portugeese/blob/main/NMT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Machine Translation

English-to-Portuguese neural machine translation (NMT) model using Long Short-Term Memory (LSTM) networks with attention.

Implementing this using just a Recurrent Neural Network (RNN) with LSTMs can work for short to medium length sentences but can result in vanishing gradients for very long sequences. To help with this, we will be adding an attention mechanism to allow the decoder to access all relevant parts of the input sentence regardless of its length.


## Following are the main steps in the program:
- Implement an encoder-decoder system with attention
- Build the NMT model from scratch using Tensorflow
- Generate translations using greedy and Minimum Bayes Risk (MBR) decoding


#### Importing the required libraries

In [1]:
!pip install numpy tensorflow tensorflow_text

Collecting tensorflow_text
  Downloading tensorflow_text-2.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m5.2/5.2 MB[0m [31m20.5 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tensorflow_text
Successfully installed tensorflow_text-2.15.0


In [2]:
import numpy as np
import tensorflow as tf
import tensorflow_text as tf_text
import pathlib


from collections import Counter


In [3]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

path_to_file = pathlib.Path("por-eng/por.txt")#/content/por-eng/por.txt

In [4]:
np.random.seed(1234)
tf.random.set_seed(1234)

#### Adding helper functions


In [5]:

def load_data(path):
    text = path.read_text(encoding="utf-8")

    lines = text.splitlines()
    pairs = [line.split("\t") for line in lines]

    context = np.array([context for target, context, _ in pairs])
    target = np.array([target for target, context, _ in pairs])

    return context, target


In [6]:
portuguese_sentences, english_sentences = load_data(path_to_file)
sentences = (portuguese_sentences, english_sentences)

Did u know ?

Due to some problem the above code doent work on a cpu environment,

but when swited to a gpu environmnet it did

may be coz it had 19000+ line ☠

## Data Processing wrapper functions

##### Dividing into batches

In [7]:
BUFFER_SIZE = len(english_sentences)
BATCH_SIZE = 64
is_train = np.random.uniform(size=(len(portuguese_sentences),)) < 0.8

train_raw = (
    tf.data.Dataset.from_tensor_slices(
        (english_sentences[is_train], portuguese_sentences[is_train])
    )
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE)
)
val_raw = (
    tf.data.Dataset.from_tensor_slices(
        (english_sentences[~is_train], portuguese_sentences[~is_train])
    )
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE)
)

#### Normalising the Text

In [8]:
def tf_lower_and_split_punct(text):
    text = tf_text.normalize_utf8(text, "NFKD")
    text = tf.strings.lower(text)
    text = tf.strings.regex_replace(text, "[^ a-z.?!,¿]", "")
    text = tf.strings.regex_replace(text, "[.?!,¿]", r" \0 ")
    text = tf.strings.strip(text)
    text = tf.strings.join(["[SOS]", text, "[EOS]"], separator=" ")
    return text

In [9]:
max_vocab_size = 12000

####Vectorizing the Sentences

In [10]:
english_vectorizer = tf.keras.layers.TextVectorization(standardize=tf_lower_and_split_punct, max_tokens=max_vocab_size, ragged=True)
english_vectorizer.adapt(train_raw.map(lambda context, target: context))

portuguese_vectorizer = tf.keras.layers.TextVectorization(standardize=tf_lower_and_split_punct, max_tokens=max_vocab_size, ragged=True)
portuguese_vectorizer.adapt(train_raw.map(lambda context, target: target))


#### Text Processor wrapper

In [11]:
def process_text(context, target):
    context = english_vectorizer(context).to_tensor()
    target = portuguese_vectorizer(target)
    targ_in = target[:, :-1].to_tensor()
    targ_out = target[:, 1:].to_tensor()
    return (context, targ_in), targ_out

## Data Preparation

checking the above functions

In [12]:
train_data = train_raw.map(process_text, tf.data.AUTOTUNE)
val_data = val_raw.map(process_text, tf.data.AUTOTUNE)

In [13]:

print(f"English (to translate) sentence:\n\n{english_sentences[-5]}\n")
print(f"Portuguese (translation) sentence:\n\n{portuguese_sentences[-5]}")

English (to translate) sentence:

No matter how much you try to convince people that chocolate is vanilla, it'll still be chocolate, even though you may manage to convince yourself and a few others that it's vanilla.

Portuguese (translation) sentence:

Não importa o quanto você tenta convencer os outros de que chocolate é baunilha, ele ainda será chocolate, mesmo que você possa convencer a si mesmo e poucos outros de que é baunilha.


In [14]:
print(f"First 10 words of the english vocabulary:\n\n{english_vectorizer.get_vocabulary()[:10]}\n")
print(f"First 10 words of the portuguese vocabulary:\n\n{portuguese_vectorizer.get_vocabulary()[:10]}")

First 10 words of the english vocabulary:

['', '[UNK]', '[SOS]', '[EOS]', '.', 'tom', 'i', 'to', 'you', 'the']

First 10 words of the portuguese vocabulary:

['', '[UNK]', '[SOS]', '[EOS]', '.', 'tom', 'que', 'o', 'nao', 'eu']


In [15]:
vocab_size = portuguese_vectorizer.vocabulary_size()

 Using [tf.keras.layers.StringLookup](https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup) objects help map from words to ids and vice versa.

In [16]:
word_to_id = tf.keras.layers.StringLookup(
    vocabulary=portuguese_vectorizer.get_vocabulary(),
    mask_token="",
    oov_token="[UNK]"
)


id_to_word = tf.keras.layers.StringLookup(
    vocabulary=portuguese_vectorizer.get_vocabulary(),
    mask_token="",
    oov_token="[UNK]",
    invert=True,
)

In [32]:
unk_id = word_to_id("[UNK]")
sos_id = word_to_id("[SOS]")
eos_id = word_to_id("[EOS]")
baunilha_id = word_to_id("baunilha")

In [17]:
for (to_translate, sr_translation), translation in train_data.take(1):
    print(f"Tokenized english sentence:\n{to_translate[0, :].numpy()}\n\n")
    print(f"Tokenized portuguese sentence (shifted to the right):\n{sr_translation[0, :].numpy()}\n\n")
    print(f"Tokenized portuguese sentence:\n{translation[0, :].numpy()}\n\n")

Tokenized english sentence:
[   2  210    9  146  123   38    9 1672    4    3    0    0    0    0]


Tokenized portuguese sentence (shifted to the right):
[   2 1085    7  128   11  389   37 2038    4    0    0    0    0    0
    0]


Tokenized portuguese sentence:
[1085    7  128   11  389   37 2038    4    3    0    0    0    0    0
    0]




# NMT model with attention


*   The model you will uses an encoder-decoder architecture.




This **Recurrent Neural Network (RNN)** takes in a tokenized version of a sentence in its *encoder*, then passes it on to the *decoder* for translation.

You can picture it like the figure below where all of the context of the input sentence **is compressed into one vector** that is passed into the decoder block.
You can see how this will be an issue for very long sentences (e.g. 100 tokens or more) because the context of the first parts of the input will have very little effect on the final vector passed to the decoder.

<img src='/images/plain_rnn.png'>


Adding an **attention layer** to this model avoids this problem by giving the decoder access to all parts of the input sentence.

To illustrate, let's just use a 4-word input sentence as shown below. These are all passed to the attention layer and each are given a score given the current activation (i.e. hidden state) of the decoder.

To produce the next prediction, the attention layer will first receive all the encoder hidden states as well as the decoder hidden state when producing the word "como" .

 Given this information, it will score each of the encoder hidden states to know which one the decoder should focus on to produce the next word. As a result of training, the model might have learned that it should align to the second encoder hidden state and subsequently assigns a high probability to the word "você".

<img src='images/attention_overview.png'>


There are different ways to implement attention and the one we'll use for is the Scaled Dot Product Attention which has the form:

$$Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V$$

 This context vector is fed to the decoder RNN to get a set of probabilities for the next predicted word. The division by square root of the keys dimensionality ($\sqrt{d_k}$) is for improving model performance and you'll also learn more about it next week.

In [18]:
VOCAB_SIZE = 12000
UNITS = 256

## Encoder
Making a c**ustom Encoder Layer**

The encoder consists of the following layers:

- [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding)
    
+ [Bidirectional](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional) [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM): bidirectional behaviour for RNN-like layers.   
  


In [19]:
class Encoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, units):
        super(Encoder, self).__init__()

        self.embedding = tf.keras.layers.Embedding(
            input_dim=vocab_size,
            output_dim=units,
            mask_zero=True
        )

        self.rnn = tf.keras.layers.Bidirectional(
            merge_mode="sum",
            layer=tf.keras.layers.LSTM(
                units=units,
                return_sequences=True
            ),
        )


    def call(self, context):

        x = self.embedding(context)
        x = self.rnn(x)

        return x

In [20]:
#testing if any breaks
encoder = Encoder(VOCAB_SIZE, UNITS)

## Cross Attention

The cross attention consists of the following layers:

- [MultiHeadAttention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention). The reason why this layer is preferred over [Attention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention) is that it allows simpler code during the forward pass.
    

- Need to pass both the output of the attention alongside the shifted-to-the-right translation (since this cross attention happens in the decoder side). For this you will use an [Add](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add) layer so that the original dimension is preserved
+ Layer normalization is also performed for better stability of the network by using  [LayerNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization) .


In [21]:
class CrossAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.mha = (
            tf.keras.layers.MultiHeadAttention(
                key_dim=units,
                num_heads=1
            )
        )

        self.layernorm = tf.keras.layers.LayerNormalization()
        self.add = tf.keras.layers.Add()

    def call(self, context, target):
        attn_output = self.mha(
            query=target,
            value=context
        )

        x = self.add([target, attn_output])
        x = self.layernorm(x)

        return x

In [22]:
#testing if any breaks
attention_layer = CrossAttention(UNITS)

## Decoder



The decoder consists of the following layers:

- [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding)
  
  
+ Pre-attention [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM)

  *   A vanilla LSTM
  *   The LSTM returns the full sequence and not only the last output
  *   It is very important that this layer returns the state since this will be needed for inference

- The attention layer that performs cross attention between the sentence to translate and the right-shifted translation. Here you need to use the `CrossAttention` layer you defined in the previous exercise.

+ Post-attention [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM)

- [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layer.
 Make sure to use a `logsoftmax` activation function for this one, which you

  *   Make sure to use a `logsoftmax` activation function for this one, which you
  can get as [tf.nn.log_softmax](https://www.tensorflow.org/api_docs/python/tf/nn/log_softmax).



In [23]:
class Decoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, units):
        super(Decoder, self).__init__()
        self.embedding =tf.keras.layers.Embedding(
            input_dim=vocab_size,
            output_dim=units,
            mask_zero=True
        )
        # The RNN before attention
        self.pre_attention_rnn = tf.keras.layers.LSTM(
            units=units,
            return_sequences=True,
            return_state=True
        )
        self.attention = CrossAttention(units=units)

        # The RNN after attention
        self.post_attention_rnn = tf.keras.layers.LSTM(
            units=units,
            return_sequences=True
        )

        # The dense layer with logsoftmax activation
        self.output_layer = tf.keras.layers.Dense(
            units=vocab_size,
            activation=tf.nn.log_softmax
        )

    def call(self, context, target, state=None, return_state=False):
        x = self.embedding(target)
        x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=state)

        x = self.attention(context, x)
        x = self.post_attention_rnn(x)

        # Compute the logits
        logits = self.output_layer(x)

        if return_state:
            return logits, [hidden_state, cell_state]

        return logits

In [34]:
decoder = Decoder(VOCAB_SIZE, UNITS)

## Translator

Putting together all of the layers you previously coded into an actual model.

 The Translator class inherits from `tf.keras.layers.Model`.

Remember that `train_data` will yield a tuple with the sentence to translate and the shifted-to-the-right translation, which are the "features" of the model.

In [24]:
class Translator(tf.keras.Model):
    def __init__(self, vocab_size, units):

        super().__init__()
        self.encoder = Encoder(vocab_size=vocab_size, units=units)
        self.decoder = Decoder(vocab_size=vocab_size, units=units)
    def call(self, inputs):
        context, target = inputs
        encoded_context= self.encoder(context)

        logits = self.decoder(encoded_context,target)
        return logits

In [25]:
translator = Translator(VOCAB_SIZE, UNITS)
logits = translator((to_translate, sr_translation))

# Training

took a long time man

  Also Hi there !

  If u reading this come say Hi at discord(samyxandy) or [LinkedIn](https://www.linkedin.com/in/tamaghna-choudhuri/) or [Github](https://github.com/samyxandz)

In [26]:
def masked_loss(y_true, y_pred):

    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
    loss = loss_fn(y_true, y_pred)

    # Check which elements of y_true are padding
    mask = tf.cast(y_true != 0, loss.dtype)

    loss *= mask
    # Return the total.
    return tf.reduce_sum(loss)/tf.reduce_sum(mask)


def masked_acc(y_true, y_pred):
    y_pred = tf.argmax(y_pred, axis=-1)
    y_pred = tf.cast(y_pred, y_true.dtype)
    match = tf.cast(y_true == y_pred, tf.float32)
    mask = tf.cast(y_true != 0, tf.float32)

    return tf.reduce_sum(match)/tf.reduce_sum(mask)


def tokens_to_text(tokens, id_to_word):
    words = id_to_word(tokens)
    result = tf.strings.reduce_join(words, axis=-1, separator=" ")
    return result


In [27]:
def compile_and_train(model, epochs=20, steps_per_epoch=500):
    model.compile(optimizer="adam", loss=masked_loss, metrics=[masked_acc, masked_loss])

    history = model.fit(
        train_data.repeat(),
        epochs=epochs,
        steps_per_epoch=steps_per_epoch,
        validation_data=val_data,
        validation_steps=50,
        callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)],
    )

    return model, history

In [28]:
trained_translator, history = compile_and_train(translator)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


# Model Inference


The model trained is used for inference.

This function is meant to be used inside a for-loop, so you feed to it the information of the previous step to generate the information of the next step

In [29]:
def generate_next_token(decoder, context, next_token, done, state, temperature=0.0):

    logits, state = decoder(context, next_token, state=state, return_state=True)
    logits = logits[:, -1, :]

    # If temp is 0 then next_token is the argmax of logits
    if temperature == 0.0:
        next_token = tf.argmax(logits, axis=-1)

    # If temp is not 0 then next_token is sampled out of logits
    else:
        logits = logits / temperature
        next_token = tf.random.categorical(logits, num_samples=1)

    logits = tf.squeeze(logits)
    next_token = tf.squeeze(next_token)
    logit = logits[next_token].numpy()

    # Reshape to (1,1) since this is the expected shape for text encoded as TF tensors
    next_token = tf.reshape(next_token, shape=(1,1))

    # If next_token is End-of-Sentence token you are done
    if next_token == eos_id:
        done = True

    return next_token, logit, state, done

In [35]:
eng_sentence = "I love languages"
texts = tf.convert_to_tensor(eng_sentence)[tf.newaxis]
# Vectorize it and pass it through the encoder
context = english_vectorizer(texts).to_tensor()
context = encoder(context)

next_token = tf.fill((1,1), sos_id)

state = [tf.random.uniform((1, UNITS)), tf.random.uniform((1, UNITS))]
done = False

next_token, logit, state, done = generate_next_token(decoder, context, next_token, done, state, temperature=0.5)
print(f"Next token: {next_token}\nLogit: {logit:.4f}\nDone? {done}")

Next token: [[6365]]
Logit: -18.7164
Done? False


# Translation

 This function will take care of the following steps:

- Process the sentence to translate and encode it

+ Set the initial state of the decoder

- Get predictions of the next token (starting with the \<SOS> token) for a maximum of iterations (in case the \<EOS> token is never returned)
    
+ Return the translated text (as a string), the logit of the last iteration (this helps measure how certain was that the sequence was translated in its totality) and the translation in token format.



In [36]:
def translate(model, text, max_length=50, temperature=0.0):
    tokens, logits = [], []
    text = tf.constant([text])
    # Vectorize the text using the correct vectorizer
    context = english_vectorizer(text).to_tensor()
    context = model.encoder(context)
    next_token = tf.constant([[2]])

    state = [tf.zeros((1,UNITS)), tf.zeros((1,UNITS))]

    done = False

    # Iterate for max_length iterations
    for _ in range(max_length):
        next_token, logit, state, done =generate_next_token(decoder, context, next_token, done, state, temperature=0.5)
        if done:
            break

    # Add next_token to the list of tokens
    tokens.append(next_token)

    # Add logit to the list of logits
    logits.append(logit)

    # Concatenate all tokens into a tensor
    tokens = tf.concat(tokens, axis=-1)

    translation = tf.squeeze(tokens_to_text(tokens, id_to_word))
    translation = translation.numpy().decode()

    return translation, logits[-1], tokens

In [39]:

temp = 0.0
original_sentence = "I love languages"

translation, logit, tokens = translate(trained_translator, original_sentence, temperature=temp)

print(f"Temperature: {temp}\n\nOriginal sentence: {original_sentence}\nTranslation: {translation}\nTranslation tokens:{tokens}\nLogit: {logit:.3f}")

Temperature: 0.0

Original sentence: I love languages
Translation: desmontar
Translation tokens:[[6923]]
Logit: -18.764



# Minimum Bayes-Risk Decoding

Getting the most probable token at each step may not necessarily produce the best results. Another approach is to do Minimum Bayes Risk Decoding or MBR.

 The general steps to implement this are:

- Take several random samples
+ Score each sample against all other samples
- Select the one with the highest score


In [40]:
# This function will return any desired number of candidate translations alongside the log-probability for each one
def generate_samples(model, text, n_samples=4, temperature=0.6):

    samples, log_probs = [], []
    for _ in range(n_samples):

        _, logp, sample = translate(model, text, temperature=temperature)

        # Save the translated tensors
        samples.append(np.squeeze(sample.numpy()).tolist())

        # Save the logits
        log_probs.append(logp)

    return samples, log_probs

In [41]:
samples, log_probs = generate_samples(trained_translator, 'I love languages')

for s, l in zip(samples, log_probs):
    print(f"Translated tensor: {s} has logit: {l:.3f}")

Translated tensor: 2576 has logit: -18.799
Translated tensor: 7195 has logit: -18.798
Translated tensor: 10243 has logit: -18.781
Translated tensor: 1336 has logit: -18.772


## Similarity Scores


In order to evaluate multiple translations effectively, it's essential to establish a method for measuring their quality. One approach is to compare each translation sample with the others.

Several metrics, particularly focusing on unigram overlaps, can aid in this assessment.

### Jaccard Similarity

One such metric is the widely used Jaccard similarity, which calculates the intersection over union of two sets. The jaccard_similarity function is dedicated to providing this metric for any pair of candidate and reference translations

In [42]:
def jaccard_similarity(candidate, reference):
    candidate_set = set(candidate)
    reference_set = set(reference)
    common_tokens = candidate_set.intersection(reference_set)
    all_tokens = candidate_set.union(reference_set)
    overlap = len(common_tokens) / len(all_tokens)

    return overlap


### rouge1_similarity

Jaccard similarity is good but a more commonly used metric in machine translation is the ROUGE score. For unigrams, this is called ROUGE-1 you can output the scores for both precision and recall when comparing two samples.

To get the final score, compute the F1-score as given by:



$$score = 2* \frac{(precision * recall)}{(precision + recall)}$$


In [50]:
def rouge1_similarity(candidate, reference):
    candidate = str(candidate)
    reference = str(reference)
    candidate_word_counts = Counter(candidate)
    reference_word_counts = Counter(reference)

    overlap = 0
    for token in candidate_word_counts.keys():
        token_count_candidate = candidate_word_counts[token]
        token_count_reference = reference_word_counts[token]

        overlap += min(token_count_candidate, token_count_reference)
    precision = overlap / len(candidate)

    recall = overlap / len(reference)

    if precision + recall != 0:
        f1_score = 2 * (precision * recall) / (precision + recall)
        return f1_score



    return 0

## Computing the Overall Score

 the steps to generate the scores of a 4-sample list.

- Get similarity score between sample 1 and sample 2
+ Get similarity score between sample 1 and sample 3
- Get similarity score between sample 1 and sample 4
+ Get average score of the first 3 steps. This will be the overall score of sample 1
- Iterate and repeat until samples 1 to 4 have overall scores.


The results will be stored in a dictionary for easy lookups.



In [51]:
def average_overlap(samples, similarity_fn):
    scores = {}

    # Iterate through all samples
    for index_candidate, candidate in enumerate(samples):

        overlap = 0

        for index_sample, sample in enumerate(samples):
            if index_candidate == index_sample:
                continue

            # Get the overlap between candidate and sample using the similarity function
            sample_overlap = similarity_fn(candidate, sample)


            overlap += sample_overlap

        score = overlap / (len(samples) - 1)
        score = round(score, 3)
        scores[index_candidate] = score

    return scores

In [52]:
def weighted_avg_overlap(samples, log_probs, similarity_fn):
    scores = {}
    for index_candidate, candidate in enumerate(samples):
        overlap, weight_sum = 0.0, 0.0
        for index_sample, (sample, logp) in enumerate(zip(samples, log_probs)):
            if index_candidate == index_sample:
                continue

            sample_p = float(np.exp(logp))
            weight_sum += sample_p
            sample_overlap = similarity_fn(candidate, sample)
            overlap += sample_p * sample_overlap

        score = overlap / weight_sum
        score = round(score, 3)
        scores[index_candidate] = score

    return scores

## mbr_decode


In [53]:
def mbr_decode(model, text, n_samples=5, temperature=0.6, similarity_fn=rouge1_similarity):
    samples, log_probs = generate_samples(model, text, n_samples=n_samples, temperature=temperature)
    scores = weighted_avg_overlap(samples, log_probs, similarity_fn)
    decoded_translations = [tokens_to_text(s, id_to_word).numpy().decode('utf-8') for s in samples]

    # Find the key with the highest score
    max_score_key = max(scores, key=lambda k: scores[k])
    translation = decoded_translations[max_score_key]

    return translation, decoded_translations

In [55]:
english_sentence = "I love languages"

translation, candidates = mbr_decode(trained_translator, english_sentence, n_samples=10, temperature=0.6)

print("Translation candidates:")
for c in candidates:
    print(c)

print(f"\nSelected translation: {translation}")


Selected translation: eu amo línguas
