<h1>Text Generation with an LSTM</h1>

### **Parókai Dominik - R9XG1T**

## Task overview
In this task we generate text with an LSTM neural network, using Dan Brown's The Da Vinci Code as the training text.

### Importing the required libraries

In [26]:
import numpy as np
import pandas as pd
import keras
import os

### Loading and inspecting the source text file

In [27]:
with open("../content/sample_data/davincicode.txt", "r", encoding="utf8") as f:
    text = f.read().lower()
print('Text length:', len(text))

Text length: 90761


### Vectorizing the character sequences

In [28]:
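maxlen = 40      # length of each input sequence, in characters
step = 3         # start a new sequence every 3 characters
sentences = []   # extracted input sequences
next_chars = []  # target character that follows each sequence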

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

print('Number of sequences:', len(sentences))
chars = sorted(set(text))
print('Unique characters', len(chars))
# Map each character to its index in the sorted vocabulary
char_indices = {char: i for i, char in enumerate(chars)}

print('Vectorization...')

# One-hot encode inputs and targets (np.bool is deprecated since NumPy 1.20; use the built-in bool)
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 30241
Unique characters 53
Vectorization...
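
The windowing and one-hot encoding above can be easier to follow on a tiny input. The sketch below repeats the same steps on a hypothetical toy string with a shorter window; all `toy_*` names and the string itself are illustrative assumptions, not part of the notebook. The last line applies the same window-count rule to the notebook's own numbers (text length 90761, `maxlen` 40, `step` 3).

```python
import numpy as np

# Hypothetical toy input; the notebook uses the full novel text instead.
toy_text = "hello world"
toy_maxlen, toy_step = 4, 3  # shorter than the notebook's 40 and 3, for readability

# Slide a window over the text; the character right after each window is its target.
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
toy_windows = [toy_text[i:i + toy_maxlen] for i in starts]
toy_targets = [toy_text[i + toy_maxlen] for i in starts]

toy_chars = sorted(set(toy_text))
toy_indices = {c: i for i, c in enumerate(toy_chars)}

# One-hot encode: exactly one True per (window, position) and per target.
toy_x = np.zeros((len(toy_windows), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(toy_windows):
    for t, c in enumerate(window):
        toy_x[i, t, toy_indices[c]] = True
    toy_y[i, toy_indices[toy_targets[i]]] = True

print(toy_windows, toy_targets)
print(toy_x.shape, toy_y.shape)

# The same counting rule with the notebook's values.
n_sequences = len(range(0, 90761 - 40, 3))
print(n_sequences)  # 30241
```

The final count matches the "Number of sequences" value printed by the cell above.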


## Building the network
### Single-layer LSTM model for predicting the next character

In [19]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
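
To get a feel for the size of this model, the trainable parameters can be counted by hand. The arithmetic below is a rough sketch that assumes the standard LSTM formulation with bias terms (four gates, each with an input kernel, a recurrent kernel and a bias) and takes the vocabulary size of 53 from the vectorization output; calling `model.summary()` would be the way to confirm it for the actual model.

```python
units = 128      # LSTM width used in the model above
vocab_size = 53  # "Unique characters" reported by the vectorization cell

# Each of the 4 LSTM gates has an input kernel, a recurrent kernel and a bias vector.
lstm_params = 4 * (vocab_size * units + units * units + units)

# The Dense softmax layer maps the 128-dimensional LSTM output to one score per character.
dense_params = units * vocab_size + vocab_size

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)  # 93184 6837 100021
```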

### Configuring model compilation

In [20]:
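# `lr` is deprecated in recent Keras releases; `learning_rate` is the current argument name
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)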
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
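
Because the targets are one-hot vectors, categorical cross-entropy reduces to the negative log of the probability assigned to the correct character. The NumPy sketch below illustrates this with hypothetical predictions; the helper function and the target index are assumptions for illustration and ignore details such as the numerical clipping Keras applies internally.

```python
import numpy as np

vocab_size = 53  # "Unique characters" reported by the vectorization cell

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy for one one-hot target and one predicted distribution."""
    return float(-np.sum(y_true * np.log(y_pred)))

y_true = np.zeros(vocab_size)
y_true[7] = 1.0  # arbitrary target index, chosen only for illustration

# An untrained model that spreads probability evenly over all characters.
uniform_pred = np.full(vocab_size, 1.0 / vocab_size)
uniform_loss = categorical_crossentropy(y_true, uniform_pred)

# A model that puts 90% of the probability on the correct character.
confident_pred = np.full(vocab_size, 0.1 / (vocab_size - 1))
confident_pred[7] = 0.9
confident_loss = categorical_crossentropy(y_true, confident_pred)

print(round(uniform_loss, 3), round(confident_loss, 3))
```

The uniform case gives ln(53), roughly 3.97, which is a useful baseline when reading the loss reported during training.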



### Function that samples the next character from the model's predictions

In [21]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds)/ temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
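
The effect of `temperature` is easier to see on a small distribution. The sketch below isolates the reweighting step of `sample` in a hypothetical helper, `reweight`, and applies it to a made-up three-character distribution; neither the helper nor the toy probabilities come from the notebook.

```python
import numpy as np

def reweight(preds, temperature):
    """Return the temperature-adjusted distribution that sample() draws from."""
    logits = np.log(np.asarray(preds, dtype="float64")) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

toy_preds = [0.5, 0.3, 0.2]  # hypothetical softmax output over 3 characters

for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(toy_preds, temperature), 3))
```

Temperatures below 1.0 sharpen the distribution toward the most likely character, 1.0 leaves it unchanged, and values above 1.0 flatten it. This is in line with the generated samples further down, which are repetitive at 0.2 and increasingly erratic at 1.2.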

### Text generation loop

In [25]:
import random
import sys

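for epoch in range(1, 30):  # epochs 1 to 29
    print('\n epoch', epoch)
    # Train for one more pass over the data
    model.fit(x, y, batch_size=128, epochs=1)
    # Pick a random seed of maxlen characters from the source text
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('Generating with seed:  "' + generated_text + '"')

    # generated_text is not reset between temperatures, so each run
    # continues from the last window produced by the previous one
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('\n temperature:', temperature)
        sys.stdout.write(generated_text)

        for i in range(100):
            # One-hot encode the current window of maxlen characters
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1

            # Sample the next character from the predicted distribution
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            # Slide the window forward by one character
            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)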


 epoch 1
Generating with seed:  "urch's elimination of the sacred feminin"

 temperature: 0.2
urch's elimination of the sacred feminin the the the the the the the the the coule the the the the s an the the the the the the was an the 

 temperature: 0.5
 an the the the the the the was an the 







" 

" 

the the we tho nof an ae tol pe feco the cos the the ad he the mad an we al the fal t
 temperature: 1.0
the the ad he the mad an we al the fal th ves inin ocess bota alod le de be watl emr.e" 
lito" ae nzt lathett —mviling mes hr iri s ofrial t
 temperature: 1.2
t lathett —mviling mes hr iri s ofrial tes aomec wstoted ? oas. 
oade.- "eles 
vowine, "ocl'n ea, mhautre coe. "al. "eing havy rous ontyepto
 epoch 2
Generating with seed:  "w commanding seemed 
impossible. "but th"

 temperature: 0.2
w commanding seemed 
impossible. "but the the the saring an and the was the the bat an the salled an the the the sared the the the pare the 
 temperature: 0.5
 the the the sared the the the par