# Quijote NLP Project

**Objective**
Build a Deep Learning model trained on the first **50 chapters of Don Quijote**, so that it learns the book's vocabulary and phrasing and can predict new text from a few given words.

**Requirements**
* Python 3.8
* TensorFlow 2.x

**Build steps**
* Obtain the text of Don Quijote in digital format. https://www.gutenberg.org/ebooks/search/?query=quijote&submit_search=Go%21
* Data preprocessing
* Model creation using LSTM (Long Short-Term Memory) networks
* Training
* Results and validation
* Prediction
* Saving the model

In [2]:
# Load the required libraries
import tensorflow as tf

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
import numpy as np
import re

from codecarbon import EmissionsTracker

### Tokenizer
As the name suggests, tokenizing means splitting a sentence into a series of tokens, here individual words. The text is split wherever there is a space, and every distinct word is assigned a unique integer value.
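As a rough illustration of the idea, the sketch below builds a word index in plain Python. It is a simplified stand-in for the Keras `Tokenizer`, not its implementation: the helper names `build_word_index` and `to_sequence` are hypothetical, and punctuation filtering is left out. Indices start at 1 and follow word frequency, with 0 left free for padding.

```python
from collections import Counter

def build_word_index(lines):
    """Map each distinct word to an integer, most frequent first, starting at 1."""
    counts = Counter(word for line in lines for word in line.lower().split())
    # Index 0 is left unused so it can later serve as the padding value.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequence(line, word_index):
    """Replace every known word with its index; unknown words are skipped."""
    return [word_index[w] for w in line.lower().split() if w in word_index]

toy_corpus = ["en un lugar de la mancha", "de cuyo nombre no quiero acordarme"]
word_index = build_word_index(toy_corpus)

print(word_index)
print(to_sequence("un lugar de la mancha", word_index))
```

Here "de" appears twice in the toy corpus, so it receives the lowest index.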

In [3]:
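tokenizer = Tokenizer()

# Read the whole text; the context manager closes the file afterwards
with open('Dataset/quijote_Lite.txt', encoding="utf8") as f:
    data = f.read()
#data = open('/content/drive/My Drive/Colab Notebooks/Datasets/Vuelta_al_mundo.txt').read()

# Strip symbols: keep letters, digits, ¿, ?, periods and newlines.
# Note: the range á-ú does not cover uppercase accented letters (e.g. Á) or ü,
# so those become spaces because lowercasing only happens on the next line.
data = re.sub(r'[^a-zA-Z0-9á-ú¿?\n.]', ' ', data)
corpus = data.lower().split("\n")

# Show the cleaned text
#print(corpus)
# Remove duplicate lines and sort them
corpus = sorted(set(corpus))

# Build the vocabulary; +1 because index 0 is reserved for padding
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1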



### Result after generating the **tokens**

In [4]:
#print(tokenizer.word_index)
#print(total_words)
#print(corpus)
#print(len(corpus))

#### Data preparation
To improve training and get more samples out of the available text, a technique called **sequencing** is applied. Each sentence is split into progressively longer pieces, like a staircase, so that the model gets one training prediction (the last word given the preceding ones) for every subdivision of that sentence.

<img src="images/secuencias.png">
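A small worked example may make the staircase clearer. This is a minimal sketch on a made-up list of token indices; the helper name `staircase_sequences` is hypothetical and only mirrors the slicing idea described above.

```python
def staircase_sequences(token_list):
    """Return every prefix of token_list that has at least two tokens."""
    return [token_list[:i + 1] for i in range(1, len(token_list))]

tokens = [3, 4, 1, 5, 6]  # made-up indices for a five-word sentence
for seq in staircase_sequences(tokens):
    # Everything but the last token is the input, the last token is the label
    print(seq[:-1], "->", seq[-1])
```

A sentence of five tokens therefore yields four training examples.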

Next comes **padding**, a method that converts variable-length integer sequences into fixed-length ones: sequences longer than the chosen maximum length are truncated, and shorter ones are filled with 0 until they reach it.

<img src="images/Padding.png">
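The following sketch shows the effect of left ("pre") padding in plain Python. The function `pad_pre` is a hypothetical helper written for illustration; it assumes zeros are added at the front and that overlong sequences lose their oldest tokens.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) every sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

sequences = [[3, 4], [3, 4, 1], [3, 4, 1, 5, 6]]
for row in pad_pre(sequences, maxlen=4):
    print(row)
```

Padding at the front keeps the most recent words next to the position where the label sits, which suits next-word prediction.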

In [5]:
input_sequences = []
# Convert every line into its list of token indices
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    # Build shorter sequences from the original line, staircase-style,
    # so that each line yields several training examples
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

# Pad sequences:
# find the longest sequence and left-pad all the others with zeros to match it
max_sequence_len = max(len(x) for x in input_sequences)
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

# Create predictors and label:
# the last token of each sequence is the label (y), the rest is the input (x)
xs, labels = input_sequences[:,:-1],input_sequences[:,-1]

# One-hot encode the labels over the whole vocabulary
ys = tf.keras.utils.to_categorical(labels, num_classes=total_words)

**Some examples from the process**

In [6]:
print(xs[5])
print(ys[5])
print(ys.shape)
print(xs.shape)

[   0    0    0    0    0    0    0    0    0    0    0    0  119 1725
    1 4532    5   25]
[0. 0. 0. ... 0. 0. 0.]
(153835, 13850)
(153835, 18)
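The shapes printed above also explain the memory allocation warning that appears in the training log. As a quick check, assuming `to_categorical` returns 32-bit floats (its default dtype), the one-hot label matrix alone occupies:

```python
num_samples = 153835   # rows of ys, from ys.shape above
vocab_size = 13850     # columns of ys (total_words)
bytes_per_float32 = 4  # assumption: to_categorical's default float32 dtype

one_hot_bytes = num_samples * vocab_size * bytes_per_float32
print(one_hot_bytes)                  # 8522459000
print(round(one_hot_bytes / 1e9, 2))  # 8.52 (GB)

# Keeping the labels as plain integers needs one value per sample instead
int_label_bytes = num_samples * 4     # assuming 32-bit integers
print(int_label_bytes)                # 615340
```

That figure matches the 8522459000-byte allocation reported when training starts. If memory is tight, one option is to pass the integer `labels` directly and compile with `loss='sparse_categorical_crossentropy'`, which Keras provides for integer targets. This is a suggestion, not what the notebook does.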


### Creación del modelo

* The first layer (`Embedding`) receives the vocabulary size and the embedding dimension: each word index is mapped to a 50-dimensional vector. The number of words to generate (also 50 in this notebook) is set separately in the prediction step.

* Adam is used as the optimizer, although RMSprop also gives good results.

In [7]:
model = Sequential()
model.add(Embedding(total_words, 50, input_length=max_sequence_len-1))
#model.add(Bidirectional(LSTM(128)))
#model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(100,activation='relu'))
model.add(Dense(total_words, activation='softmax'))
adam = Adam(learning_rate=0.001)  # `lr` is a deprecated alias of learning_rate
rms = RMSprop(learning_rate=0.01)  # alternative optimizer, not used below

model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#filepath="/content/drive/My Drive/Colab Notebooks/Projects/QuijoteNLP/model/weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
filepath="model/weights-QuijoteNLP.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='accuracy', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]

model.summary()




Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 18, 50)            692500    
                                                                 
 lstm (LSTM)                 (None, 18, 100)           60400     
                                                                 
 lstm_1 (LSTM)               (None, 100)               80400     
                                                                 
 dense (Dense)               (None, 100)               10100     
                                                                 
 dense_1 (Dense)             (None, 13850)             1398850   
                                                                 
Total params: 2,242,250
Trainable params: 2,242,250
Non-trainable params: 0
_________________________________________________________________
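The parameter counts in the summary above can be reproduced by hand. The sketch below uses the standard formulas for embedding, LSTM and dense layers; the helper function names are hypothetical, and the sizes are taken from the model definition and the printed vocabulary size.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit
    return input_dim * units + units

vocab_size, embed_dim = 13850, 50
counts = {
    "embedding": embedding_params(vocab_size, embed_dim),
    "lstm": lstm_params(embed_dim, 100),
    "lstm_1": lstm_params(100, 100),
    "dense": dense_params(100, 100),
    "dense_1": dense_params(100, vocab_size),
}
for name, n in counts.items():
    print(f"{name:10s} {n:>9,}")
print(f"{'total':10s} {sum(counts.values()):>9,}")
```

More than half of the parameters sit in the final softmax layer, because its width equals the vocabulary size.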


In [8]:
with EmissionsTracker() as tracker:
    history = model.fit(xs, ys, epochs=10, batch_size=128, verbose=1,callbacks=callbacks_list)

[codecarbon INFO @ 21:45:47] [setup] RAM Tracking...
[codecarbon INFO @ 21:45:47] [setup] GPU Tracking...
[codecarbon INFO @ 21:45:47] No GPU found.
[codecarbon INFO @ 21:45:47] [setup] CPU Tracking...
[codecarbon INFO @ 21:45:48] CPU Model on constant consumption mode: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
[codecarbon INFO @ 21:45:48] >>> Tracker's metadata:
[codecarbon INFO @ 21:45:48]   Platform system: Linux-6.5.0-14-generic-x86_64-with-glibc2.35
[codecarbon INFO @ 21:45:48]   Python version: 3.10.9
[codecarbon INFO @ 21:45:48]   CodeCarbon version: 2.3.2
[codecarbon INFO @ 21:45:48]   Available RAM : 15.458 GB
[codecarbon INFO @ 21:45:48]   CPU count: 8
[codecarbon INFO @ 21:45:48]   CPU model: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
[codecarbon INFO @ 21:45:48]   GPU count: None
[codecarbon INFO @ 21:45:48]   GPU model: None


Epoch 1/10


2024-01-15 21:45:51.896197: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 8522459000 exceeds 10% of free system memory.

  76/1202 [>.............................] - ETA: 1:18 - loss: 7.5570 - accuracy: 0.0591

[codecarbon INFO @ 21:46:06] Energy consumed for RAM : 0.000024 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:46:06] Energy consumed for all CPUs : 0.000031 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:46:06] 0.000055 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:46:21] Energy consumed for RAM : 0.000048 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:46:21] Energy consumed for all CPUs : 0.000063 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:46:21] 0.000111 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:46:36] Energy consumed for RAM : 0.000072 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:46:36] Energy consumed for all CPUs : 0.000094 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:46:36] 0.000166 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:46:51] Energy consumed for RAM : 0.000097 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:46:51] Energy consumed for all CPUs : 0.000125 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:46:51] 0.000222 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:47:06] Energy consumed for RAM : 0.000121 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:47:06] Energy consumed for all CPUs : 0.000156 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:47:06] 0.000277 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:47:21] Energy consumed for RAM : 0.000145 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:47:21] Energy consumed for all CPUs : 0.000187 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:47:21] 0.000332 kWh of electricity used since the beginning.


Epoch 1: accuracy improved from -inf to 0.06107, saving model to model/weights-QuijoteNLP.hdf5
Epoch 2/10
 123/1202 [==>...........................] - ETA: 1:26 - loss: 6.1891 - accuracy: 0.0681

[codecarbon INFO @ 21:47:36] Energy consumed for RAM : 0.000169 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:47:36] Energy consumed for all CPUs : 0.000219 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:47:36] 0.000388 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:47:51] Energy consumed for RAM : 0.000193 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:47:51] Energy consumed for all CPUs : 0.000250 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:47:51] 0.000443 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:48:06] Energy consumed for RAM : 0.000217 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:48:06] Energy consumed for all CPUs : 0.000281 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:48:06] 0.000499 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:48:21] Energy consumed for RAM : 0.000241 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:48:21] Energy consumed for all CPUs : 0.000312 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:48:21] 0.000554 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:48:36] Energy consumed for RAM : 0.000266 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:48:36] Energy consumed for all CPUs : 0.000344 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:48:36] 0.000609 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:48:51] Energy consumed for RAM : 0.000290 kWh. RAM Power : 5.7967658042907715 W




[codecarbon INFO @ 21:48:51] Energy consumed for all CPUs : 0.000375 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:48:51] 0.000665 kWh of electricity used since the beginning.


Epoch 2: accuracy improved from 0.06107 to 0.07401, saving model to model/weights-QuijoteNLP.hdf5
Epoch 3/10
 199/1202 [===>..........................] - ETA: 1:10 - loss: 5.8607 - accuracy: 0.0921

[codecarbon INFO @ 21:49:06] Energy consumed for RAM : 0.000314 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:49:06] Energy consumed for all CPUs : 0.000406 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:49:06] 0.000720 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:49:21] Energy consumed for RAM : 0.000338 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:49:21] Energy consumed for all CPUs : 0.000437 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:49:21] 0.000775 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:49:36] Energy consumed for RAM : 0.000362 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:49:36] Energy consumed for all CPUs : 0.000469 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:49:36] 0.000831 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:49:51] Energy consumed for RAM : 0.000386 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:49:51] Energy consumed for all CPUs : 0.000500 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:49:51] 0.000886 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:50:06] Energy consumed for RAM : 0.000410 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:50:06] Energy consumed for all CPUs : 0.000531 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:50:06] 0.000942 kWh of electricity used since the beginning.


Epoch 3: accuracy improved from 0.07401 to 0.09495, saving model to model/weights-QuijoteNLP.hdf5
Epoch 4/10
  25/1202 [..............................] - ETA: 1:20 - loss: 5.6630 - accuracy: 0.1050

[codecarbon INFO @ 21:50:21] Energy consumed for RAM : 0.000435 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:50:21] Energy consumed for all CPUs : 0.000562 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:50:21] 0.000997 kWh of electricity used since the beginning.


 237/1202 [====>.........................] - ETA: 1:07 - loss: 5.6780 - accuracy: 0.1053

[codecarbon INFO @ 21:50:36] Energy consumed for RAM : 0.000459 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:50:36] Energy consumed for all CPUs : 0.000594 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:50:36] 0.001052 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:50:51] Energy consumed for RAM : 0.000483 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:50:51] Energy consumed for all CPUs : 0.000625 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:50:51] 0.001108 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:51:06] Energy consumed for RAM : 0.000507 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:51:06] Energy consumed for all CPUs : 0.000656 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:51:06] 0.001163 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:51:21] Energy consumed for RAM : 0.000531 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:51:21] Energy consumed for all CPUs : 0.000687 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:51:21] 0.001219 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:51:36] Energy consumed for RAM : 0.000555 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:51:36] Energy consumed for all CPUs : 0.000719 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:51:36] 0.001274 kWh of electricity used since the beginning.


Epoch 4: accuracy improved from 0.09495 to 0.10636, saving model to model/weights-QuijoteNLP.hdf5
Epoch 5/10
  99/1202 [=>............................] - ETA: 1:21 - loss: 5.5624 - accuracy: 0.1160

[codecarbon INFO @ 21:51:51] Energy consumed for RAM : 0.000579 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:51:51] Energy consumed for all CPUs : 0.000750 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:51:51] 0.001329 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:52:06] Energy consumed for RAM : 0.000604 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:52:06] Energy consumed for all CPUs : 0.000781 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:52:06] 0.001385 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:52:21] Energy consumed for RAM : 0.000628 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:52:21] Energy consumed for all CPUs : 0.000812 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:52:21] 0.001440 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:52:36] Energy consumed for RAM : 0.000652 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:52:36] Energy consumed for all CPUs : 0.000844 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:52:36] 0.001496 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:52:51] Energy consumed for RAM : 0.000676 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:52:51] Energy consumed for all CPUs : 0.000875 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:52:51] 0.001551 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:53:06] Energy consumed for RAM : 0.000700 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:53:06] Energy consumed for all CPUs : 0.000906 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:53:06] 0.001606 kWh of electricity used since the beginning.


Epoch 5: accuracy improved from 0.10636 to 0.11760, saving model to model/weights-QuijoteNLP.hdf5
Epoch 6/10
 209/1202 [====>.........................] - ETA: 1:07 - loss: 5.3701 - accuracy: 0.1278

[codecarbon INFO @ 21:53:21] Energy consumed for RAM : 0.000724 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:53:21] Energy consumed for all CPUs : 0.000937 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:53:21] 0.001662 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:53:36] Energy consumed for RAM : 0.000748 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:53:36] Energy consumed for all CPUs : 0.000969 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:53:36] 0.001717 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:53:51] Energy consumed for RAM : 0.000773 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:53:51] Energy consumed for all CPUs : 0.001000 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:53:51] 0.001772 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:54:06] Energy consumed for RAM : 0.000797 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:54:06] Energy consumed for all CPUs : 0.001031 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:54:06] 0.001828 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:54:21] Energy consumed for RAM : 0.000821 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:54:21] Energy consumed for all CPUs : 0.001062 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:54:21] 0.001883 kWh of electricity used since the beginning.


Epoch 6: accuracy improved from 0.11760 to 0.12596, saving model to model/weights-QuijoteNLP.hdf5
Epoch 7/10
  93/1202 [=>............................] - ETA: 1:15 - loss: 5.2543 - accuracy: 0.1365

[codecarbon INFO @ 21:54:36] Energy consumed for RAM : 0.000845 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:54:36] Energy consumed for all CPUs : 0.001094 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:54:36] 0.001939 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:54:51] Energy consumed for RAM : 0.000869 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:54:51] Energy consumed for all CPUs : 0.001125 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:54:51] 0.001994 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:55:06] Energy consumed for RAM : 0.000893 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:55:06] Energy consumed for all CPUs : 0.001156 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:55:06] 0.002049 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:55:21] Energy consumed for RAM : 0.000918 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:55:21] Energy consumed for all CPUs : 0.001187 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:55:21] 0.002105 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:55:36] Energy consumed for RAM : 0.000942 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:55:36] Energy consumed for all CPUs : 0.001219 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:55:36] 0.002160 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:55:51] Energy consumed for RAM : 0.000966 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:55:51] Energy consumed for all CPUs : 0.001250 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:55:51] 0.002216 kWh of electricity used since the beginning.


Epoch 7: accuracy improved from 0.12596 to 0.13271, saving model to model/weights-QuijoteNLP.hdf5
Epoch 8/10
 183/1202 [===>..........................] - ETA: 1:10 - loss: 5.1641 - accuracy: 0.1398

[codecarbon INFO @ 21:56:06] Energy consumed for RAM : 0.000990 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:56:06] Energy consumed for all CPUs : 0.001281 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:56:06] 0.002271 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:56:21] Energy consumed for RAM : 0.001014 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:56:21] Energy consumed for all CPUs : 0.001312 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:56:21] 0.002326 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:56:36] Energy consumed for RAM : 0.001038 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:56:36] Energy consumed for all CPUs : 0.001344 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:56:36] 0.002382 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:56:51] Energy consumed for RAM : 0.001062 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:56:51] Energy consumed for all CPUs : 0.001375 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:56:51] 0.002437 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:57:06] Energy consumed for RAM : 0.001087 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:57:06] Energy consumed for all CPUs : 0.001406 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:57:06] 0.002493 kWh of electricity used since the beginning.


Epoch 8: accuracy improved from 0.13271 to 0.13818, saving model to model/weights-QuijoteNLP.hdf5
Epoch 9/10
  59/1202 [>.............................] - ETA: 1:17 - loss: 5.0554 - accuracy: 0.1427

[codecarbon INFO @ 21:57:21] Energy consumed for RAM : 0.001111 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:57:21] Energy consumed for all CPUs : 0.001437 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:57:21] 0.002548 kWh of electricity used since the beginning.


 273/1202 [=====>........................] - ETA: 1:04 - loss: 5.0585 - accuracy: 0.1430

[codecarbon INFO @ 21:57:36] Energy consumed for RAM : 0.001135 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:57:36] Energy consumed for all CPUs : 0.001469 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:57:36] 0.002603 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:57:51] Energy consumed for RAM : 0.001159 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:57:51] Energy consumed for all CPUs : 0.001500 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:57:51] 0.002659 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:58:06] Energy consumed for RAM : 0.001183 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:58:06] Energy consumed for all CPUs : 0.001531 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:58:06] 0.002714 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:58:21] Energy consumed for RAM : 0.001207 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:58:21] Energy consumed for all CPUs : 0.001562 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:58:21] 0.002770 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:58:36] Energy consumed for RAM : 0.001231 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:58:36] Energy consumed for all CPUs : 0.001594 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:58:36] 0.002825 kWh of electricity used since the beginning.


Epoch 9: accuracy improved from 0.13818 to 0.14218, saving model to model/weights-QuijoteNLP.hdf5
Epoch 10/10
 142/1202 [==>...........................] - ETA: 1:14 - loss: 4.9903 - accuracy: 0.1487

[codecarbon INFO @ 21:58:51] Energy consumed for RAM : 0.001256 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:58:51] Energy consumed for all CPUs : 0.001625 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:58:51] 0.002880 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:59:06] Energy consumed for RAM : 0.001280 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:59:06] Energy consumed for all CPUs : 0.001656 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:59:06] 0.002936 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:59:21] Energy consumed for RAM : 0.001304 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:59:21] Energy consumed for all CPUs : 0.001687 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:59:21] 0.002991 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:59:36] Energy consumed for RAM : 0.001328 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:59:36] Energy consumed for all CPUs : 0.001718 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:59:36] 0.003046 kWh of electricity used since the beginning.




[codecarbon INFO @ 21:59:51] Energy consumed for RAM : 0.001352 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 21:59:51] Energy consumed for all CPUs : 0.001750 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 21:59:51] 0.003102 kWh of electricity used since the beginning.


Epoch 10: accuracy improved from 0.14218 to 0.14637, saving model to model/weights-QuijoteNLP.hdf5


[codecarbon INFO @ 22:00:05] Energy consumed for RAM : 0.001375 kWh. RAM Power : 5.7967658042907715 W
[codecarbon INFO @ 22:00:05] Energy consumed for all CPUs : 0.001779 kWh. Total CPU Power : 7.5 W
[codecarbon INFO @ 22:00:05] 0.003153 kWh of electricity used since the beginning.


In [None]:
!carbonboard --filepath="emissions.csv" --port=3333

Dash is running on http://127.0.0.1:3333/

 * Serving Flask app 'codecarbon.viz.carbonboard'
 * Debug mode: off
 * Running on http://127.0.0.1:3333
Press CTRL+C to quit
127.0.0.1 - - [15/Jan/2024 22:01:54] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [15/Jan/2024 22:01:55] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [15/Jan/2024 22:01:55] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [15/Jan/2024 22:01:55] "GET /_favicon.ico?v=2.12.1 HTTP/1.1" 200 -
127.0.0.1 - - [15/Jan/2024 22:01:55] "GET /_dash-component-suites/dash/dcc/async-dropdown.js HTTP/1.1" 304 -
127.0.0.1 - - [15/Jan/2024 22:01:55] "GET /_dash-component-suites/dash/dcc/async-graph.js HTTP/1.1" 304 -
127.0.0.1 - - [15/Jan/2024 22:01:55] "GET /_dash-component-suites/dash/dcc/async-plotlyjs.js HTTP/1.1" 304 -
127.0.0.1 - - [15/Jan/2024 22:01:55] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [15/Jan/2024 22:01:55] "GET /_dash-component-suites/dash/dash_table/async-high

In [None]:
import matplotlib.pyplot as plt


def plot_graphs(history, string):
  plt.plot(history.history[string])
  plt.xlabel("Epochs")
  plt.ylabel(string)
  plt.show()

In [None]:
plot_graphs(history, 'accuracy')


## Predicting new text
* We give the model a few seed words so that it generates several lines in the style of Don Quijote.

In [None]:
seed_text = "leer libros de caballerías"
next_words = 50

for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
    # predict_classes is no longer available in recent TensorFlow 2.x releases;
    # take the argmax of the softmax output instead
    predicted = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
    # Map the predicted index back to its word (empty string for the padding index 0)
    output_word = tokenizer.index_word.get(predicted, "")
    seed_text += " " + output_word
print(seed_text)

#### Save the prediction to a text file

In [None]:
# Write the generated text; the context manager closes the file
with open("prediction.txt", "w", encoding="utf-8") as file:
    file.write(seed_text)

In [None]:
e = model.layers[0]
weights = e.get_weights()[0]
print(weights.shape) # shape: (vocab_size, embedding_dim)

In [None]:
# Number of unique words
len(tokenizer.word_index.keys())

In [None]:
# Convert the dictionary into a list, keeping only the words
words_list = list(tokenizer.word_index.keys())

In [None]:
import io

# Write one embedding vector per line (vecs.tsv) and the matching word per line (meta.tsv)
with io.open('vecs.tsv', 'w', encoding='utf-8') as out_v, \
     io.open('meta.tsv', 'w', encoding='utf-8') as out_m:
    for num, word in enumerate(words_list):
        vec = weights[num+1] # skip 0, it's padding.
        out_m.write(word + "\n")
        out_v.write('\t'.join([str(x) for x in vec]) + "\n")

# Download the files when running on Google Colab
try:
    from google.colab import files
except ImportError:
    pass
else:
    files.download('vecs.tsv')
    files.download('meta.tsv')