# Text Generation with LSTM
This notebook works through the hands-on example of generating text based on Nietzsche's writings, taken from chapter 8, "Deep Learning generativo" (generative deep learning), of François Chollet's book "Deep Learning con Python".

## Implementing character-level text generation
For this example we use a text that Keras can download through its *utils* module: a work by Nietzsche in English. The key to text generation is a text, or a collection of texts, large enough for a model to learn the writing style.

In [1]:
import keras
import numpy as np

# Download the corpus (cached locally after the first call) and lowercase it
path = keras.utils.get_file('nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600901


Next we extract partially overlapping sequences. Each sequence has length *maxlen*, and a new one starts every *step* characters after the start of the previous one. We then one-hot encode the sequences and pack them into a 3D array *x* of shape (*sequences*, *maxlen*, *unique_characters*). We also prepare an array *y* with the corresponding targets: the character that follows each sequence, also one-hot encoded.

In [4]:
maxlen = 60  # Length of each extracted sequence
step = 3     # Offset between the starts of consecutive sequences
sentences = []   # Extracted sequences
next_chars = []  # Targets: the character that follows each sequence

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# Collect the unique characters in the corpus.
chars = sorted(set(text))
print('Unique characters:', len(chars))
# Build a dictionary that maps each unique character to its index in 'chars'.
char_indices = {char: index for index, char in enumerate(chars)}

# One-hot encode the characters.
print('Vectorization...')

# The np.bool alias was removed in NumPy 1.24; the built-in bool works across versions.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

print('\nSample example in X:\n')
print(x[0])
print('\nTarget example in Y:\n')
print(y[0])

Number of sequences: 200281
Unique characters: 59
Vectorization...

Sample example in X:

[[False False False ... False False False]
 [False False False ... False False False]
 [False False False ... False False False]
 ...
 [False False False ... False False False]
 [False False False ... False False False]
 [False False False ... False False False]]

Target example in Y:

[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False  True False False False
 False False False False False False False False False False False]
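To make the windowing easier to follow, here is a minimal sketch that applies the same loop to a tiny string. The helper name `make_windows` and the toy text are assumptions made for illustration; they are not part of the notebook.

```python
def make_windows(text, maxlen, step):
    """Return overlapping windows of maxlen characters and the character after each."""
    windows, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])
        targets.append(text[i + maxlen])
    return windows, targets

toy_text = "hello world"  # hypothetical stand-in for the corpus
windows, targets = make_windows(toy_text, maxlen=4, step=3)
vocab = sorted(set(toy_text))

# Shape the one-hot array x would have for this toy text
x_shape = (len(windows), 4, len(vocab))

for window, target in zip(windows, targets):
    print(repr(window), '->', repr(target))
print('x shape:', x_shape)

# The same range() call explains the sequence count reported for the corpus
n_sequences = len(range(0, 600901 - 60, 3))
print('Sequences for the full corpus:', n_sequences)
```

Each window shares its last character with the start of the next one because *step* is smaller than *maxlen*; a larger *step* would produce fewer, less overlapping training samples.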


## Building the network
The network is simple: a single *LSTM* layer followed by a *Dense* classifier with a *softmax* activation over all possible characters.

In [5]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
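As a rough sanity check on the size of this network, the sketch below counts trainable parameters by hand. It assumes the usual LSTM formulation (four gates, each with an input kernel, a recurrent kernel and one bias vector) and the 59 unique characters reported above. The helper names `lstm_param_count` and `dense_param_count` are hypothetical, and the result is meant to be compared against what `model.summary()` prints rather than taken as authoritative.

```python
def lstm_param_count(input_dim, units):
    # Four gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units) and a bias vector (units).
    return 4 * (input_dim * units + units * units + units)

def dense_param_count(input_dim, units):
    # One weight per input/output pair plus one bias per output unit.
    return input_dim * units + units

vocab_size = 59   # unique characters reported in the output above
lstm_units = 128  # size of the LSTM layer in the model

lstm_params = lstm_param_count(vocab_size, lstm_units)
dense_params = dense_param_count(lstm_units, vocab_size)
total_params = lstm_params + dense_params
print('LSTM:', lstm_params, 'Dense:', dense_params, 'Total:', total_params)
```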

Because the targets are one-hot encoded, we can use *categorical_crossentropy* as the loss function.

In [6]:
# Newer Keras releases name this argument learning_rate; lr is deprecated
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
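To see why one-hot targets pair naturally with this loss, the sketch below computes categorical cross-entropy by hand for a single sample. It is a plain-Python illustration of the formula, not the Keras implementation, and the function name and toy distributions are made up for the example.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """-sum(t_i * log(p_i)); with a one-hot target only the true class contributes."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t)

target = [0, 0, 1, 0]                  # the true next character is class 2
confident = [0.05, 0.05, 0.85, 0.05]   # hypothetical prediction, mostly right
unsure = [0.25, 0.25, 0.25, 0.25]      # hypothetical uniform prediction

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print('confident:', loss_confident)
print('unsure:', loss_unsure)
```

The loss reduces to the negative log-probability assigned to the correct character, so it shrinks as the model grows more confident in the right answer.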

## Training the model and sampling
Given a trained model and a seed text fragment, we can generate new text by repeating these steps:

1. Obtain from the model the probability distribution of the next character, given the text generated so far.
2. Reweight the distribution according to the chosen temperature.
3. Sample the next character at random from the reweighted distribution.
4. Append the character to the end of the available text.
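The four steps above can be sketched without Keras. In this minimal example a bigram frequency table stands in for the trained network; that substitution, the toy corpus and the names `predict_next`, `reweight` and `generate` are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

# Toy stand-in for the corpus (hypothetical, not the Nietzsche text).
corpus = "the strength of the spirit of the sense of the self "
vocab = sorted(set(corpus))

# Hypothetical stand-in for the trained network: bigram counts.
bigram_counts = defaultdict(Counter)
for prev_char, next_char in zip(corpus, corpus[1:]):
    bigram_counts[prev_char][next_char] += 1

def predict_next(window):
    """Step 1: distribution over vocab for the character after the window."""
    followers = bigram_counts[window[-1]]
    total = sum(followers.values())
    return [followers[ch] / total for ch in vocab]

def reweight(probs, temperature):
    """Step 2: raise each probability to 1/T and renormalize."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

def generate(seed, length, temperature, rng):
    window = seed
    generated = []
    for _ in range(length):
        probs = reweight(predict_next(window), temperature)
        next_char = rng.choices(vocab, weights=probs, k=1)[0]  # Step 3
        generated.append(next_char)
        window = window[1:] + next_char  # Step 4: slide the fixed-length window
    return "".join(generated)

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_text = generate("the ", length=40, temperature=0.5, rng=rng)
print(sample_text)
```

The bigram table only looks at the last character of the window, whereas the LSTM conditions on all *maxlen* characters; the loop structure is the part that carries over.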

Next we define the character sampling function.

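In [7]:
def sample(preds, temperature=1.0):
    # Cast to float64 to limit rounding error when renormalizing
    preds = np.asarray(preds).astype('float64')
    # Reweight: log(p) / T. np.log emits a RuntimeWarning for entries equal to 0,
    # but exp(-inf) == 0 keeps the result valid.
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    # Renormalize so the reweighted values sum to 1
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample from the reweighted distribution (returned as a one-hot row)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)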
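To get a feel for what the temperature does, the sketch below applies the same reweighting, exp(log(p) / T) followed by renormalization, to a small made-up distribution and reports its entropy. The distribution `toy` and the helper names are assumptions for illustration only.

```python
import math

def reweight(probs, temperature):
    """Rescale a distribution as exp(log(p) / T), then renormalize."""
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

def entropy(probs):
    """Shannon entropy in nats; higher means closer to uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)

toy = [0.6, 0.3, 0.1]  # hypothetical next-character distribution

for temperature in (0.2, 0.5, 1.0, 1.2):
    reweighted = reweight(toy, temperature)
    print(temperature,
          [round(p, 3) for p in reweighted],
          'entropy:', round(entropy(reweighted), 3))
```

A temperature of 1.0 leaves the distribution unchanged. Lower values concentrate probability on the most likely character, and higher values flatten the distribution toward uniform.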

Now we train the model. The loop below trains for one epoch at a time and generates text after each epoch at several temperatures, which lets us see how the model evolves and how the temperature affects the generated text.

**Note**: The 59 epochs configured by default (*range(1, 60)*) can take a long time to complete.

In [8]:
import random
import sys

# range(1, 60) runs 59 epochs, numbered 1 through 59
for epoch in range(1, 60):
    print('Epoch', epoch)
    # Train the model for one epoch on the training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Pick a random text seed
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    # In every epoch, generate text at temperatures 0.2, 0.5, 1.0 and 1.2.
    # The seed is not reset between temperatures, so each run continues
    # from the last maxlen characters produced by the previous one.
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ Temperature:', temperature)
        sys.stdout.write(generated_text)

        # Generate 400 characters each time
        for i in range(400):
            # One-hot encode the current window of maxlen characters
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            # Predict the next-character distribution and sample from it
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            # Slide the window: append the new character, drop the oldest
            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

Epoch 1
--- Generating with seed: "l of women, in the marriage customs, in the
relations of old"
------ Temperature: 0.2
l of women, in the marriage customs, in the
relations of old and the something the self-he still the considerate of the will the self-profers and with the considerate of the consideration of the interpical and still the considerate of the considerate of the considerate of the considerate of the will the self-chand of the self the self-profers of the self-profers of the self-profering in the self-profertion of the considerate of the self-man and the stringt
------ Temperature: 0.5
rofertion of the considerate of the self-man and the stringtion of the fires to in its a profer it self-men and the worlds he self har a that is the sufferent that the entitions that the desprofering and the standination of the man a thing and strongs and the man the herself believe and though of that our the interpian and them from the some the conseds the conscience the conseas the cha

simwand it gives on itfle"hay what something which
in.ten.s of intellict with
the "tance, which onled
some
physiolosrical seemed, alone! rat embed and ready relig
------ Temperature: 1.2
some
physiolosrical seemed, alone! rat embed and ready religious gre sints permitt umonul
extest, effect discawace themselves as because oigs termire as ceoulations has notrised, at here, may make a
viexteen"!


fromest oncoushe
mus, to be be the
happiness, andsquarous fatharn moral, which he same to loved it who in the heart for inderines--estemoning all possible is malticry and goutager
joyed it finest, bottochs, with 1 is not
sectlestal of onouth, verft
Epoch 9
--- Generating with seed: " also implies
the same as "more complete beasts").

258. cor"
------ Temperature: 0.2
 also implies
the same as "more complete beasts").

258. corresponded the strength of the strength of the strengther of the strength of the strength of the superiority of the most problem of the strength of the superiority of t

th of the strength of the person of the personal person of the considerable emotions the persisonces has also itself a command, until a thing and the fact the experience of the comparent in itself, not mankind and to have been the desire which the greatest strength of presence of the distingurable them that is all supernes in the well have them as the opposing the sense. the destruction of far too passions of the whole sense of the artists of what worth th
------ Temperature: 1.0
 passions of the whole sense of the artists of what worth theriotion, people with here of feeling is a fortions and period, lardwwze comes o takes them, as eevariates and value
still and christianity in himself, with his ospulality: matter in contardet met--fact, his will--it distibe of the
world
of melaphiln to cast itself now-pertwich them nejewt our one's and science world how may go proved thepe itamefice, but "allearing and power there in times to men
------ Temperature: 1.2
pe itamefice, but "allearing a

  preds = np.log(preds) / temperature


ety" that the sight to suffers as to the fundamental still, the feeling and nothing
as in the
------ Temperature: 1.0
 to the fundamental still, the feeling and nothing
as in the way, always, esy only of the unconviction, of laws, "flwudly have
been generations upon
only ust social immens, and in musicapes, would
really, and to ? : always a
smis as in fact, is it with the clashing in great relarding, very question little on woman of things of the deplacantry hard the
garduration the were rough in every day he suight to cruerrty, when in fell through in the religious ficul
------ Temperature: 1.2
ght to cruerrty, when in fell through in the religious ficula commonpleasings of just always sain of a notic
phenomenanceben of
the volundation: la.--but shof longer of thexc--by her
seaseful, and decayds-moralationy to vennly (and tilling the , is one from previoutle the tim as, "old last pire, as act will ray with
are
too "falve alundly occasion
to germanyt
of nocquiorly the cose togation" wh

and present the same and promise and standard of the conscience and account that the conscience, and present the standard of the power and the standard to the standard in the standard and still and self-conscience of the standard to the reason to the word is always as a man of the standard and all the present the philosopher of the standard and promise and promise and standard of the strength the cond
------ Temperature: 0.5
nd promise and promise and standard of the strength the condition of the command, with stupidity and standard of the entire to him who has been morality for the power that we thinks the saints its moral my share is according to many the world with enough to the strength of the sense and more destruction of the standander that should be the complaited to the philosopher who is lacking in soul is only the philosopher of the word is such a would be the should
------ Temperature: 1.0
ly the philosopher of the word is such a would be the should be the sunvernge,--and is

with remace, upon the power with deemls of taried on ; not what each wqurates's of congraved a testar, whics have he? been, which we
have promonedment and thit is if an apgroseation: alter, in dealiant
son, and that struct! it is it, ye pvility has objective
is (poiny-clyite! there is be after ionow-ecrpace of which in sort of according
moridable worlds!
buteollouth will be fol, it is
on, religious, will in wh-venitate or eispive, e
Epoch 34
--- Generating with seed: "d also intelligible enough; what is more difficult
to unders"
------ Temperature: 0.2
d also intelligible enough; what is more difficult
to understand the superiority and subjection of the subjection of the strength of the subjections of the conscience of the strength of the strength of the standard of the strength of the subjecticoments of the stronger and strength, and the standard of the subjection of the subjection of the stand instinct of the secret the same that the senses of the subjection of the strength of th

in the sense of the commands of the commanding that it is all the deeper the perspectify of the influence of a innaus of a still the every reasonably a concealed to artists, in the superioists of his spirit of the sense and profoundly and there is a merely one is althro and soul and deside thereof--what is profound himself of the constant man with the senses of this advantage and history of the end of the world about the deep of all the delieners of endure
------ Temperature: 1.0
d of the world about the deep of all the delieners of endure.


39

=sanct of
religious and bad dissequedly
gain perhaps one is would likes with eye in gee some beast of all praid of subtle; which therefore--there is there is one must a" certain hows
little insorable and humble waily in old sray, in
time will-overan one is
act one be fine, no morality, thus jesult
and liom-wond
fines of machding and only attendly a stone of a fain outkle, in slie had find 
------ Temperature: 1.2
nd only attendly a stone of a 

something to himself, and in a loud voice; the subjection and the self-relic and deeper a christian and subject of the suffering and self-all the sense of the present desires and subject of the suffering and sublim the sense of the most spirit in the suffering and perspection of the suffering that the sense of the subjection of the most desire and although they are not the suffering and something who are something of the spirit is not that
------ Temperature: 0.5
ng and something who are something of the spirit is not that it was one must faith seems the other hand again lack of the supertfully influences and properes of the problem and experience of the delicate and consequence of the faith
are that beings of the real with the suffering all the part of the profound and discoveres and self-philosopher and subject and the faith
own type of the faith which is not being a thing--and realists in the similar, and perhap
------ Temperature: 1.0
s not being a thing--and realists in the simila

--the sensuate when saked for us. the being outs. present being the invaluations and way which they antasy which
woses
whichh view only it manned; the wos
its flaster, gramp--for
"hosquems delightly kantists of thith, hatey the boodomres world util inwardly to delacks
of philosophek possessed from
surrluse
-much-aristoqtix whict. the entirality "nature. to significances,
in
order to the german is self-phenomening and badly, asroatuous
deem thoughts
alvorwa
Epoch 57
--- Generating with seed: "atters.

295. the genius of the heart, as that great mysteri"
------ Temperature: 0.2
atters.

295. the genius of the heart, as that great mysterious intellectual that the same desire the transcient, the sense to the profound to the spirit of the sense of the spirit the contradict of the spirit of the sense of the sense of the spirit of the superiors of the philosophers of the spirit of the strength the soul and the same to be a man is a man is the same believe as the same the commander the sam

## Discussion
As we can see, a low temperature produces extremely predictable and repetitive text in which almost every word is a real English word. As we raise the temperature the text becomes more creative, sometimes producing entirely invented words that nonetheless sound plausible, such as "dorphe" or "huther".

At high temperatures the text breaks down and loses almost all meaning, turning into semi-random strings of characters.

Choosing a good temperature is clearly important in order to keep some randomness without losing the sense of the text. For this particular example, 0.5 looks like a good temperature.