# Fancier NLP

Continuing our study of NLP methods, this notebook is based on the [Chapter 16](https://github.com/ageron/handson-ml3/blob/main/16_nlp_with_rnns_and_attention.ipynb) notebook from the Scikit-learn book (*Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow*, 3rd edition).

In [None]:
# Connect google drive for persistence
from google.colab import drive
from pathlib import Path

drive.mount("/content/drive")
model_root = Path("/content/drive/MyDrive/SavedModels/")

Mounted at /content/drive


In [None]:
# import stuff and check for GPU
import sys  # needed for the sys.modules checks below

import tensorflow as tf
import tensorflow_datasets as tfds

if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. Neural nets can be very slow without a GPU.")
    if "google.colab" in sys.modules:
        print("Go to Runtime > Change runtime and select a GPU hardware "
              "accelerator.")
    if "kaggle_secrets" in sys.modules:
        print("Go to Settings > Accelerator and select GPU.")

In [None]:
# Load the IMDB dataset again
raw_train_set, raw_valid_set, raw_test_set = tfds.load(
    name="imdb_reviews",
    split=["train[:90%]", "train[90%:]", "test"],
    as_supervised=True
)
tf.random.set_seed(42)
train_set = raw_train_set.shuffle(5000, seed=42).batch(16).prefetch(1)
valid_set = raw_valid_set.batch(16).prefetch(1)
test_set = raw_test_set.batch(16).prefetch(1)



Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...



Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.
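
As a quick sanity check on the pipeline above, the number of batches per epoch follows directly from the split sizes. This is a minimal sketch that assumes the full IMDB `train` split holds 25,000 reviews (not stated in this notebook); `batches_per_epoch` is a made-up helper name.

```python
import math

def batches_per_epoch(num_examples, batch_size):
    """Number of batches when the final, partial batch is kept."""
    return math.ceil(num_examples / batch_size)

imdb_train_size = 25_000                     # assumed size of the "train" split
train_size = imdb_train_size * 90 // 100     # "train[:90%]"
valid_size = imdb_train_size - train_size    # "train[90%:]"

print(train_size, valid_size)                # 22500 2500
print(batches_per_epoch(train_size, 16))     # 1407
```

Under that assumption the result lines up with the 1407 steps per epoch reported in the training logs further down.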


In [None]:
# look at some of the data
batch = next(iter(train_set))

# batch is a tuple of X, y tensors, each length of the batch size
print(f"Positive review: {batch[1][0]}")
print(batch[0][0])


Positive review: 1
tf.Tensor(b'This is a documentary that came out of the splendid work of a Canadian landscape photographer whose interest has long been in the ravages left on earth by the excavations or buildings of man. It begins with a vast factory complex crammed with people making a great variety of little things, parts of high-tech equipment presumably; it isn\'t really made very clear. The emphasis is on how big the place is and how many people are there and how they\'re herded around outside in little yellow jackets. The film also shows the photographer working on a tall structure to do a still of the array of these people outside the factory, and talking with his crew as he does so. This is a world of relentless industrialization. It\'s a relief at least to know these soulless images aren\'t going to be presented without a human voice, as is the case in Nikolaus Geyrhalter\'s gleefully cold documentary about the food industry, \'Our Daily Bread.\' \'Manufactured Landscapes\' 

## Reusing Pretrained Embeddings and Language Models
As with image processing, we can use transfer learning to leverage someone else's hard work, which may or may not be advantageous depending on the task.

Again, the move to Keras 3 means that this doesn't just work the way it did in the source notebook. I've tried to modify it to use [KerasHub](https://keras.io/keras_hub/) instead.

In [None]:
!pip install --upgrade --quiet keras-hub-nightly keras-nightly


In [None]:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "jax" or "torch"
import keras_hub

bert_path = model_root / "tiny_bert.keras"
if bert_path.exists():
    classifier = tf.keras.models.load_model(bert_path)
else:
    arch = "bert_tiny_en_uncased"

    classifier = keras_hub.models.TextClassifier.from_preset(
        arch,
        num_classes=1,
        load_weights=True,
    )

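    # The classifier outputs raw logits (is_positive below applies the
    # sigmoid itself), so tell the loss to expect logits
    classifier.compile(
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        optimizer="nadam", metrics=["accuracy"])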

    classifier.fit(train_set, validation_data=valid_set, epochs=5)
    classifier.save(bert_path)

classifier.summary()



In [None]:
def is_positive(review):
    return tf.keras.activations.sigmoid(classifier.predict(tf.constant([review]))) > 0.5

is_positive("I am Groot")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 938ms/step


<tf.Tensor: shape=(1, 1), dtype=bool, numpy=array([[ True]])>
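
The `sigmoid` in `is_positive` suggests the classifier returns an unbounded score (a logit) rather than a probability; that reading is an assumption here. A small pure-Python sketch of what the threshold does, with made-up logit values:

```python
import math

def sigmoid(x):
    """Squash an unbounded score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def is_positive_from_logit(logit):
    # Same decision rule as is_positive, applied to a single raw score
    return sigmoid(logit) > 0.5

for logit in (-2.0, -0.1, 0.1, 3.0):   # hypothetical model outputs
    print(logit, round(sigmoid(logit), 3), is_positive_from_logit(logit))
```

Since `sigmoid(0) == 0.5`, thresholding the probability at 0.5 is the same as checking whether the logit is positive.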

## A DIY bidirectional RNN for sentiment analysis
Keras makes it almost too easy to make an RNN bidirectional: just wrap the recurrent layer in a [tf.keras.layers.Bidirectional](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional) layer.


In [None]:
vocab_size = 1000

sb_path = model_root / "sentiment_bidir.keras"
if sb_path.exists():
    sentiment_bidir = tf.keras.models.load_model(sb_path)
else:
    text_vec_layer = tf.keras.layers.TextVectorization(max_tokens=vocab_size)
    text_vec_layer.adapt(train_set.map(lambda reviews, labels: reviews))
    print(text_vec_layer(["Great movie!", "This is DiCaprio's best role."]))

    embed_size = 128
    tf.random.set_seed(42)
    sentiment_bidir = tf.keras.Sequential([
        text_vec_layer,
        tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128)),
        tf.keras.layers.Dense(1, activation="sigmoid")
    ])

    sentiment_bidir.compile(loss="binary_crossentropy", optimizer="nadam",
                metrics=["accuracy"])
    history = sentiment_bidir.fit(train_set, validation_data=valid_set, epochs=5)
    # Save the trained model so the load_model branch above can reuse it
    sentiment_bidir.save(sb_path)

sentiment_bidir.summary()

tf.Tensor(
[[ 86  18   0   0   0]
 [ 11   7   1 116 217]], shape=(2, 5), dtype=int64)
Epoch 1/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 73s 49ms/step - accuracy: 0.6895 - loss: 0.5554 - val_accuracy: 0.8272 - val_loss: 0.3759
Epoch 2/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 67s 47ms/step - accuracy: 0.8567 - loss: 0.3391 - val_accuracy: 0.8660 - val_loss: 0.3146
Epoch 3/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 81s 47ms/step - accuracy: 0.8847 - loss: 0.2774 - val_accuracy: 0.8668 - val_loss: 0.3086
Epoch 4/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 66s 47ms/step - accuracy: 0.8989 - loss: 0.2454 - val_accuracy: 0.8696 - val_loss: 0.3165
Epoch 5/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 66s 47ms/step - accuracy: 0.9168 - loss: 0.2122 - val_accuracy: 0.8616 - val_loss: 0.3325


In [None]:
sentiment_bidir.predict(tf.constant(["I don't not like this movie"])) > 0.5

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 261ms/step


array([[False]])
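
To get a feel for what the wrapper does conceptually, here is a toy pure-Python sketch: one pass reads the sequence left to right, another reads it right to left, and the two final states are concatenated. The recurrence `toy_step` is made up for illustration, and unlike Keras (where the forward and backward layers have their own weights) both directions share it.

```python
def run_rnn(step, inputs, initial_state=0.0):
    """Run a one-unit 'RNN' over a sequence and return its final state."""
    state = initial_state
    for x in inputs:
        state = step(state, x)
    return state

def toy_step(state, x):
    # Hypothetical recurrence: decay the old state, then add the new input
    return 0.5 * state + x

def bidirectional(step, inputs):
    forward = run_rnn(step, inputs)          # reads left to right
    backward = run_rnn(step, inputs[::-1])   # reads right to left
    return [forward, backward]               # the two outputs, concatenated

print(bidirectional(toy_step, [1.0, 2.0, 4.0]))  # [5.25, 3.0]
```

With the default concatenation, this is also why `Bidirectional(GRU(128))` hands 256 features to the `Dense` layer.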

## An Encoder–Decoder Network for Neural Machine Translation

In this section we're loading a set of corresponding English/Spanish phrases and training an encoder/decoder to perform translation.

In [None]:
from pathlib import Path
dl_path = tf.keras.utils.get_file(
    fname="spa-eng.zip",
    origin="http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip",
    extract=True,
    cache_dir="sample_data"
)
print(f"Downloaded to {dl_path}")
text_file = Path(dl_path) / "spa-eng" / "spa.txt"

Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip
Downloaded to sample_data/datasets/spa-eng_extracted


In [None]:
# The file contains accented characters, so be explicit about the encoding
with open(text_file, encoding="utf-8") as f:
    text = f.read()

In [None]:
import numpy as np

text = text.replace("¡", "").replace("¿", "")
pairs = [line.split("\t") for line in text.splitlines()]
np.random.seed(42)  # extra code – ensures reproducibility on CPU
np.random.shuffle(pairs)
sentences_en, sentences_es = zip(*pairs)  # separates the pairs into 2 tuples

In [None]:
for i in range(10):
    print(sentences_en[i], "=>", sentences_es[i])

How boring! => Qué aburrimiento!
I love sports. => Adoro el deporte.
Would you like to swap jobs? => Te gustaría que intercambiemos los trabajos?
My mother did nothing but weep. => Mi madre no hizo nada sino llorar.
Croatia is in the southeastern part of Europe. => Croacia está en el sudeste de Europa.
I have never eaten a mango before. => Nunca he comido un mango.
Tell the taxi driver to drive faster. => Decile al taxista que maneje más rápido.
Tom and I work together. => Tom y yo trabajamos juntos.
I would prefer an honorable death. => Preferiría una muerte honorable.
Tom married a much younger woman. => Tom se ha casado con una mujer mucho más joven.
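
The parsing steps above are easy to try on a tiny sample. This sketch uses three made-up tab-separated lines standing in for the contents of `spa.txt`:

```python
# Made-up sample in the same "english<TAB>spanish" layout as spa.txt
raw = "Hello.\tHola.\nThank you.\tGracias.\nWhere is it?\t¿Dónde está?"

text = raw.replace("¡", "").replace("¿", "")              # drop opening marks
pairs = [line.split("\t") for line in text.splitlines()]  # [[en, es], ...]
sentences_en, sentences_es = zip(*pairs)                  # transpose the pairs

print(sentences_en)  # ('Hello.', 'Thank you.', 'Where is it?')
print(sentences_es)  # ('Hola.', 'Gracias.', 'Dónde está?')
```

`zip(*pairs)` works like a transpose: it turns a list of `[en, es]` rows into one tuple of English sentences and one tuple of Spanish sentences.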


In [None]:
# Tokenize both the English and Spanish sentences, including start/end tokens for Spanish
vocab_size = 1000
max_length = 50
text_vec_layer_en = tf.keras.layers.TextVectorization(
    vocab_size, output_sequence_length=max_length)
text_vec_layer_es = tf.keras.layers.TextVectorization(
    vocab_size, output_sequence_length=max_length)
text_vec_layer_en.adapt(sentences_en)
text_vec_layer_es.adapt([f"startofseq {s} endofseq" for s in sentences_es])

In [None]:
text_vec_layer_en.get_vocabulary()[:10]

['',
 '[UNK]',
 np.str_('the'),
 np.str_('i'),
 np.str_('to'),
 np.str_('you'),
 np.str_('tom'),
 np.str_('a'),
 np.str_('is'),
 np.str_('he')]

In [None]:
text_vec_layer_es.get_vocabulary()[:10]

['',
 '[UNK]',
 np.str_('startofseq'),
 np.str_('endofseq'),
 np.str_('de'),
 np.str_('que'),
 np.str_('a'),
 np.str_('no'),
 np.str_('tom'),
 np.str_('la')]
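
The vocabularies above start with `''` (padding) and `'[UNK]'` (unknown word), followed by the most frequent tokens. Here is a rough pure-Python imitation of that behaviour, not the actual `TextVectorization` implementation; `standardize`, `build_vocab`, and `encode` are made-up names, and the punctuation handling is only an approximation.

```python
from collections import Counter
import string

_STRIP = str.maketrans("", "", string.punctuation)

def standardize(sentence):
    """Lowercase, strip ASCII punctuation, split on whitespace."""
    return sentence.lower().translate(_STRIP).split()

def build_vocab(sentences, max_tokens):
    counts = Counter()
    for sentence in sentences:
        counts.update(standardize(sentence))
    # Reserve two slots: index 0 for padding, index 1 for unknown words
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common

def encode(sentence, vocab, length):
    index = {word: i for i, word in enumerate(vocab)}
    ids = [index.get(word, 1) for word in standardize(sentence)][:length]
    return ids + [0] * (length - len(ids))   # pad with zeros

toy_sentences = ["the cat saw the dog", "the dog ran", "a cat"]
toy_vocab = build_vocab(toy_sentences, max_tokens=3)
print(toy_vocab)                         # ['', '[UNK]', 'the']
print(encode("The cat!", toy_vocab, 4))  # [2, 1, 0, 0]
```

With such a small `max_tokens`, "cat" falls outside the vocabulary and maps to index 1, which is what happens to rare words with `vocab_size = 1000` in this notebook.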

In [None]:
# tf.constant just converts to a Tensor
X_train = tf.constant(sentences_en[:100_000])
X_valid = tf.constant(sentences_en[100_000:])
# the _dec stuff is the actual Spanish for teacher forcing
X_train_dec = tf.constant([f"startofseq {s}" for s in sentences_es[:100_000]])
X_valid_dec = tf.constant([f"startofseq {s}" for s in sentences_es[100_000:]])
Y_train = text_vec_layer_es([f"{s} endofseq" for s in sentences_es[:100_000]])
Y_valid = text_vec_layer_es([f"{s} endofseq" for s in sentences_es[100_000:]])
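
To see how the decoder inputs and targets line up for teacher forcing, here is a word-level sketch on one made-up sentence (`teacher_forcing_pair` is a hypothetical helper, not part of the notebook):

```python
def teacher_forcing_pair(sentence_es):
    """Return (decoder input tokens, target tokens) for one sentence."""
    tokens = sentence_es.split()
    decoder_input = ["startofseq"] + tokens   # what the decoder is fed
    target = tokens + ["endofseq"]            # what it should predict
    return decoder_input, target

dec_in, target = teacher_forcing_pair("me gusta el fútbol")
for step, (seen, expected) in enumerate(zip(dec_in, target)):
    print(step, seen, "->", expected)
# 0 startofseq -> me
# 1 me -> gusta
# 2 gusta -> el
# 3 el -> fútbol
# 4 fútbol -> endofseq
```

At every step the decoder sees the true previous word and is asked for the next one, so the target is just the input shifted by one position.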

In [None]:
tf.random.set_seed(42)  # extra code – ensures reproducibility on CPU
# Let the model take plain strings as inputs
encoder_inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)
decoder_inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)

In [None]:
# fairly arbitrary size for the word embeddings
# You could probably sub in pre-trained embeddings here
embed_size = 128
encoder_input_ids = text_vec_layer_en(encoder_inputs)
decoder_input_ids = text_vec_layer_es(decoder_inputs)
encoder_embedding_layer = tf.keras.layers.Embedding(vocab_size, embed_size,
                                                    mask_zero=True)
decoder_embedding_layer = tf.keras.layers.Embedding(vocab_size, embed_size,
                                                    mask_zero=True)
encoder_embeddings = encoder_embedding_layer(encoder_input_ids)
decoder_embeddings = decoder_embedding_layer(decoder_input_ids)

In [None]:
# return_state=True means that the output from the encoder also includes the
# short-term (h) and long-term (c) hidden states of the LSTM
encoder = tf.keras.layers.LSTM(512, return_state=True)
# * is the unpacking operator, so this is just splitting apart the output (y(t))
# from the hidden states (h(t) and c(t))
# Interestingly, encoder_outputs isn't actually used; we just want the hidden states
encoder_outputs, *encoder_state = encoder(encoder_embeddings)
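
If the starred assignment looks unfamiliar, this plain-Python sketch shows the same pattern on a stand-in list (the strings are placeholders for the tensors an LSTM with `return_state=True` returns):

```python
# Placeholder for the LSTM result: the output first, then the two states
lstm_result = ["output", "h_state", "c_state"]

# The first item goes to the plain name; everything else lands in the list
encoder_outputs, *encoder_state = lstm_result

print(encoder_outputs)  # output
print(encoder_state)    # ['h_state', 'c_state']
```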

In [None]:
# Decoder is symmetric to the encoder, but we want to return the entire sequence
# this time (from t0 to tn)
decoder = tf.keras.layers.LSTM(512, return_sequences=True)
# Pass the hidden state along to the decoder
decoder_outputs = decoder(decoder_embeddings, initial_state=encoder_state)

In [None]:
# As usual, a fully connected head that uses a softmax to find the most likely
# word in the vocabulary
output_layer = tf.keras.layers.Dense(vocab_size, activation="softmax")
Y_proba = output_layer(decoder_outputs)
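
For a single time step, the softmax head boils down to the following. This is a pure-Python sketch with a made-up five-word vocabulary and made-up scores, not the Keras layer itself:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                         # subtract max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_vocab = ["", "[UNK]", "startofseq", "endofseq", "playa"]  # hypothetical
logits = [0.1, 0.3, -1.0, 0.5, 2.0]                           # hypothetical

probas = softmax(logits)
best = max(range(len(probas)), key=probas.__getitem__)  # argmax
print(toy_vocab[best])  # playa
```

In the real model there is one such probability vector of length `vocab_size` for each of the `max_length` positions in the output sequence.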

**Warning**: the following cell will take a while to run (possibly a couple of hours if you are not using a GPU).

In [None]:
# Finally, mash it all together and train
ed_path = model_root / "basic_encoder_decoder.keras"
if ed_path.exists():
    enc_dec_model = tf.keras.models.load_model(ed_path)
else:
    enc_dec_model = tf.keras.Model(inputs=[encoder_inputs, decoder_inputs],
                        outputs=[Y_proba])
    enc_dec_model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam",
                metrics=["accuracy"])
    enc_dec_model.fit((X_train, X_train_dec), Y_train, epochs=10,
            validation_data=((X_valid, X_valid_dec), Y_valid))
    # Save the trained model so the load_model branch above can reuse it
    enc_dec_model.save(ed_path)

enc_dec_model.summary()

Epoch 1/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 84s 26ms/step - accuracy: 0.0522 - loss: 3.4681 - val_accuracy: 0.0765 - val_loss: 2.0602
Epoch 2/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 76s 24ms/step - accuracy: 0.0808 - loss: 1.8916 - val_accuracy: 0.0889 - val_loss: 1.5604
Epoch 3/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 85s 25ms/step - accuracy: 0.0929 - loss: 1.4149 - val_accuracy: 0.0935 - val_loss: 1.3848
Epoch 4/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 85s 26ms/step - accuracy: 0.1000 - loss: 1.1637 - val_accuracy: 0.0953 - val_loss: 1.3312
Epoch 5/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 80s 25ms/step - accuracy: 0.1053 - loss: 0.9872 - val_accuracy: 0.0953 - val_loss: 1.3369
Epoch 6/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 81s 25ms/step - accuracy: 0.1098 - loss: 0.8458 - val_accuracy: 0.0950 - val_loss: 1.3633
Epoc

In [None]:
# encode/decode one word at a time until we predict endofseq
def translate(model, sentence_en):
    translation = ""
    for word_idx in range(max_length):
        X = tf.constant([sentence_en])  # encoder input
        X_dec = tf.constant(["startofseq " + translation])  # decoder input
        y_proba = model.predict((X, X_dec))[0, word_idx]  # last token's probas
        predicted_word_id = np.argmax(y_proba)
        predicted_word = text_vec_layer_es.get_vocabulary()[predicted_word_id]
        if predicted_word == "endofseq":
            break
        translation += " " + predicted_word
    return translation.strip()

translate(enc_dec_model, "Where is the beach?")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 359ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step


'dónde está la playa'
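
The control flow of `translate` is a greedy decoding loop: ask for the next word, stop on `endofseq`, otherwise append and repeat. The sketch below isolates that loop by swapping the trained model for a canned lookup; `fake_next_word` and `greedy_translate` are made-up names, and the canned words simply replay the output above.

```python
def fake_next_word(sentence_en, words_so_far):
    """Hypothetical stand-in for the model: returns a canned next word."""
    canned = ["dónde", "está", "la", "playa", "endofseq"]
    return canned[len(words_so_far)]

def greedy_translate(next_word_fn, sentence_en, max_length=50):
    words = []
    for _ in range(max_length):
        word = next_word_fn(sentence_en, words)  # most likely next word
        if word == "endofseq":                   # the model says it is done
            break
        words.append(word)
    return " ".join(words)

print(greedy_translate(fake_next_word, "Where is the beach?"))
# dónde está la playa
```

Because each step commits to the single most likely word, an early mistake cannot be undone later, which is one plausible reason longer sentences go off the rails.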

Nice! However, the model struggles with longer sentences:

In [None]:
translate(enc_dec_model, "I love to go to the beach, do you know where I can find it?")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 42ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 41ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 48

'me gusta ir a la persona porque ves cualquier cosa que está buscando'

## Bidirectional RNNs

Just like with the sentiment analysis model, we can wrap Bidirectional around our encoder RNN.

❓ Why can't the decoder RNN be Bidirectional?

In [None]:
tf.random.set_seed(42)  # extra code – ensures reproducibility on CPU
encoder = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(256, return_state=True))

In [None]:
encoder_outputs, *encoder_state = encoder(encoder_embeddings)

# concatenate the bidirectional hidden states to initialize a standard LSTM that's twice as big
encoder_state = [tf.keras.layers.Concatenate(axis=-1)([encoder_state[0], encoder_state[2]]),  # short-term (0 & 2)
                 tf.keras.layers.Concatenate(axis=-1)([encoder_state[1], encoder_state[3]])]  # long-term (1 & 3)
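
The index juggling above is easier to follow with plain lists standing in for tensors. This sketch assumes the four states arrive in the order forward h, forward c, backward h, backward c, which is what the `0 & 2` / `1 & 3` pairing in the comments implies; list concatenation stands in for `Concatenate(axis=-1)`.

```python
def concat(a, b):
    # List concatenation as a stand-in for Concatenate(axis=-1)
    return a + b

units = 256
# Placeholder state vectors, one entry per LSTM unit
fw_h, fw_c = ["fh"] * units, ["fc"] * units   # forward direction
bw_h, bw_c = ["bh"] * units, ["bc"] * units   # backward direction
encoder_state = [fw_h, fw_c, bw_h, bw_c]      # assumed ordering

merged = [concat(encoder_state[0], encoder_state[2]),   # short-term (h)
          concat(encoder_state[1], encoder_state[3])]   # long-term (c)

print([len(state) for state in merged])  # [512, 512]
```

Two 256-unit directions give 512-wide states, which is why the decoder below is an `LSTM(512)`.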

**Warning**: the following cell will take a while to run (possibly a couple of hours if you are not using a GPU).

In [None]:
bidir_path = model_root / "bidir_model.keras"
if bidir_path.exists():
    bidir_model = tf.keras.models.load_model(bidir_path)
else:

    # extra code — completes the model and trains it
    decoder = tf.keras.layers.LSTM(512, return_sequences=True)
    decoder_outputs = decoder(decoder_embeddings, initial_state=encoder_state)
    output_layer = tf.keras.layers.Dense(vocab_size, activation="softmax")
    Y_proba = output_layer(decoder_outputs)
    bidir_model = tf.keras.Model(inputs=[encoder_inputs, decoder_inputs],
                        outputs=[Y_proba])
    bidir_model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam",
                metrics=["accuracy"])
    bidir_model.fit((X_train, X_train_dec), Y_train, epochs=10,
            validation_data=((X_valid, X_valid_dec), Y_valid))
    # Save the trained model so the load_model branch above can reuse it
    bidir_model.save(bidir_path)

bidir_model.summary()

Epoch 1/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 85s 26ms/step - accuracy: 0.0670 - loss: 2.7271 - val_accuracy: 0.0913 - val_loss: 1.4649
Epoch 2/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 76s 24ms/step - accuracy: 0.0948 - loss: 1.3480 - val_accuracy: 0.0963 - val_loss: 1.2827
Epoch 3/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 83s 24ms/step - accuracy: 0.1019 - loss: 1.0950 - val_accuracy: 0.0976 - val_loss: 1.2384
Epoch 4/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 79s 25ms/step - accuracy: 0.1070 - loss: 0.9294 - val_accuracy: 0.0977 - val_loss: 1.2430
Epoch 5/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 79s 24ms/step - accuracy: 0.1112 - loss: 0.7989 - val_accuracy: 0.0976 - val_loss: 1.2719
Epoch 6/10
3125/3125 ━━━━━━━━━━━━━━━━━━━━ 76s 24ms/step - accuracy: 0.1148 - loss: 0.6916 - val_accuracy: 0.0971 - val_loss: 1.3184
Epoc

In [None]:
translate(bidir_model, "Where is the beach?")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 487ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step


'dónde está la playa'

In [None]:
translate(bidir_model, "I love to go to the beach, do you know where I can find it?")

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 41ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step


'me vayas a [UNK] a la playa yo lo que sé'