# БФИ2001 Фаттахов Тагир

## Lab 8 (Text generation based on "Alice in Wonderland")

In [40]:
import numpy
import sys
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint, Callback, TensorBoard
from keras.utils import np_utils

Load the text and convert it to lowercase

In [2]:
filename = "wonderland.txt"
raw_text = open(filename, encoding="utf-8").read()
raw_text = raw_text.lower()

Build the character-to-integer mapping

In [3]:
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
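
To make the mapping concrete, here is a minimal sketch on a toy string. The string `sample` and the `_demo` names are assumptions for illustration only; they are not the book text or variables from the notebook. It builds the same kind of dictionary plus its inverse and checks that encoding followed by decoding returns the original text.

```python
# Toy illustration of the character-to-integer mapping.
# `sample` is a made-up string, not the contents of wonderland.txt.
sample = "alice"

chars_demo = sorted(set(sample))  # unique characters in sorted order
char_to_int_demo = {c: i for i, c in enumerate(chars_demo)}
int_to_char_demo = {i: c for c, i in char_to_int_demo.items()}

encoded = [char_to_int_demo[c] for c in sample]           # text -> integers
decoded = "".join(int_to_char_demo[i] for i in encoded)   # integers -> text

print(chars_demo)
print(encoded)
print(decoded)
```

Sorting the character set makes the mapping reproducible between runs, which matters when weights are saved and reloaded later.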

Summarize the dataset

In [4]:
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters:", n_chars)
print("Total Vocab:", n_vocab)

Total Characters: 144679
Total Vocab: 49


Split the book into sequences of 100 characters

In [5]:
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])

n_patterns = len(dataX)
print("Total Patterns:", n_patterns)

Total Patterns: 144579
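
The sliding-window split can be sketched on a short string. The values `text_demo` and `window` below are illustrative assumptions; the notebook itself uses the full book and a window of 100.

```python
# Toy sliding-window split; `text_demo` and `window` are illustrative values.
text_demo = "hello world"
window = 5

pairs = []
for i in range(len(text_demo) - window):
    seq_in = text_demo[i:i + window]   # input: `window` consecutive characters
    seq_out = text_demo[i + window]    # target: the character that follows
    pairs.append((seq_in, seq_out))

for seq_in, seq_out in pairs[:3]:
    print(repr(seq_in), "->", repr(seq_out))

# The number of patterns is always len(text) - window.
n_pairs = len(pairs)
print(n_pairs)

# Same formula applied to the figures printed above.
expected_patterns = 144679 - 100
print(expected_patterns)
```

The last line reproduces the pattern count reported by the notebook from the character count and the window length.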


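Prepare the data for training: reshape to [samples, time steps, features], scale the inputs to the range 0 to 1, and one-hot encode the targets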

In [6]:
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
X = X / float(n_vocab)
y = np_utils.to_categorical(dataY)
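
The three preparation steps can be tried on a tiny made-up dataset. This sketch uses only NumPy; `np.eye` stands in for `np_utils.to_categorical` and is a substitution for illustration, not the call used in the notebook. All names and values are assumptions.

```python
import numpy as np

# Toy encoded data: 3 patterns of length 4 over a 5-symbol vocabulary (made-up values).
data_x = [[0, 4, 3, 1], [4, 3, 1, 2], [3, 1, 2, 0]]
data_y = [2, 0, 4]
vocab_size = 5

# Layout [samples, time steps, features], then scale the integers into [0, 1).
x_demo = np.reshape(data_x, (len(data_x), 4, 1)) / float(vocab_size)

# One-hot encode the targets: row i has a single 1 in column data_y[i].
y_demo = np.eye(vocab_size)[data_y]

print(x_demo.shape)
print(y_demo)
```

Passing the vocabulary size explicitly fixes the number of output columns. `to_categorical` without a `num_classes` argument infers it from the largest label present in the targets.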

Define the network architecture

In [8]:
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
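
As a rough check on model size, the number of trainable parameters can be computed by hand. The sketch assumes the standard Keras LSTM layout (four gates, each with input weights, recurrent weights and a bias) and assumes `y.shape[1]` equals the vocabulary size of 49 printed earlier; `model.summary()` would give the authoritative numbers.

```python
# Hand-computed parameter count for LSTM(256) -> Dropout -> Dense(softmax).
units = 256
input_dim = 1    # one scaled integer per time step
n_classes = 49   # assumption: y.shape[1] equals the vocabulary size

# Four gates, each with input weights, recurrent weights and a bias vector.
lstm_params = 4 * (units * input_dim + units * units + units)

# Dense layer: weight matrix plus bias. Dropout has no parameters.
dense_params = units * n_classes + n_classes

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)
```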

Save the network weights after every epoch in which the loss improves

In [11]:
filepath= "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
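
The placeholders in `filepath` are ordinary Python format fields that `ModelCheckpoint` fills in with the epoch number and the logged metrics. A minimal sketch of the same formatting, using the epoch and loss from the first line of the training log:

```python
# How the checkpoint filename template is filled in.
# The epoch and loss values come from the first epoch of the training log.
template = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
name = template.format(epoch=1, loss=3.02829)
print(name)
```

`{epoch:02d}` pads the epoch to two digits so the files sort correctly, and `{loss:.4f}` rounds the loss to four decimal places.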

Train the network

In [12]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20
Epoch 1: loss improved from inf to 3.02829, saving model to weights-improvement-01-3.0283.hdf5
Epoch 2/20
Epoch 2: loss improved from 3.02829 to 2.81682, saving model to weights-improvement-02-2.8168.hdf5
Epoch 3/20
Epoch 3: loss improved from 2.81682 to 2.70670, saving model to weights-improvement-03-2.7067.hdf5
Epoch 4/20
Epoch 4: loss improved from 2.70670 to 2.63173, saving model to weights-improvement-04-2.6317.hdf5
Epoch 5/20
Epoch 5: loss improved from 2.63173 to 2.56454, saving model to weights-improvement-05-2.5645.hdf5
Epoch 6/20
Epoch 6: loss improved from 2.56454 to 2.50734, saving model to weights-improvement-06-2.5073.hdf5
Epoch 7/20
Epoch 7: loss improved from 2.50734 to 2.45442, saving model to weights-improvement-07-2.4544.hdf5
Epoch 8/20
Epoch 8: loss improved from 2.45442 to 2.41100, saving model to weights-improvement-08-2.4110.hdf5
Epoch 9/20
Epoch 9: loss improved from 2.41100 to 2.36279, saving model to weights-improvement-09-2.3628.hdf5
Epoch 10/20
Ep

<keras.callbacks.History at 0x1a6379a52a0>

Load the best weights

In [9]:
filename = "weights-improvement-20-1.9834.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
int_to_char = dict((i, c) for i, c in enumerate(chars))
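
The loss in the checkpoint filename is the mean categorical cross-entropy, measured in nats. Two reference points help interpret it: the loss of a model that guesses uniformly over the vocabulary, and the perplexity, which is the exponential of the loss. The sketch below is an interpretation aid under those assumptions, not part of the original lab.

```python
import math

vocab_size = 49      # "Total Vocab" printed earlier
final_loss = 1.9834  # loss encoded in the checkpoint filename

# Cross-entropy of a model that assigns equal probability to every character.
uniform_loss = math.log(vocab_size)

# Perplexity: the effective number of equally likely next characters.
perplexity = math.exp(final_loss)

print(round(uniform_loss, 4))
print(round(perplexity, 2))
```

A loss well below the uniform baseline means the network has learned character statistics, even though the generated words are still mostly misspelled.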

Print the generated text

In [12]:
# randint's upper bound is exclusive, so len(dataX) covers every pattern.
start = numpy.random.randint(0, len(dataX))
# Copy the seed so that appending below does not modify dataX in place.
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# Generate 1000 characters, one at a time.
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = int(numpy.argmax(prediction))  # greedy: pick the most probable character
    result = int_to_char[index]
    sys.stdout.write(result)
    # Slide the window: append the new character, drop the oldest one.
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")

Seed:
" so _very_ remarkable in that; nor did alice think it
so _very_ much out of the way to hear the rabbi "
t soee and the tabbit soeered an in sar  “hn whu  whl  you knew thet would bell ho the was  soen i sesuld thing you dan  iy was an in ”

“h whst io ”ou saad the mertee dirse to tee seet ”ou aan,” said the maccht  “io would hes here to tee shet mote you dan  io do weu,”

“ho mo wou dan ”our seit tou dan,” said the katter. “io soedt the seaet saad to the sabbit soee if the had so the taabi saad 
“hn mo would hane toet io the haree so tee seat soedl then ”ou whue tel ohee ano the sabe th the seae the sabd th the sabbit soee if the had so the tabli. and she was soin a gin fren ti thel  and see toon the tabdit to tee shet sare all the was so tie sare bnd the tan oo the sabbit so the shoee the was soe kintee  and see then she was soe winle taadi to tee shet sere the was so tay an the cauerrirl raatit she saadit soee in the tael  she had not the har hn the wan oo the taile  and saed 
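
The generation loop keeps a fixed-length window: each predicted character is appended and the oldest one is dropped. The sketch below isolates that mechanism. `predict_next` is a hypothetical stand-in for `model.predict`, and the seed and vocabulary size are made up.

```python
import numpy as np

def predict_next(window, vocab_size):
    # Hypothetical stand-in for model.predict: puts all probability
    # on (last symbol + 1) modulo vocab_size.
    probs = np.zeros(vocab_size)
    probs[(window[-1] + 1) % vocab_size] = 1.0
    return probs

vocab_size = 5
pattern = [0, 1, 2]  # seed window (made-up)
generated = []
for _ in range(6):
    probs = predict_next(pattern, vocab_size)
    index = int(np.argmax(probs))    # greedy choice of the most probable symbol
    generated.append(index)
    pattern = pattern[1:] + [index]  # slide: drop the oldest, append the newest

print(generated)
print(pattern)
```

Because `pattern[1:] + [index]` builds a new list each time, the seed taken from the dataset is never modified.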

Create a custom callback

In [30]:
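class MyCallback(Callback):
    """Prints a short generated sample every fifth epoch."""

    def __init__(self, data, int_to_char):
        super().__init__()
        self.dataX = data  # use the argument, not the global dataX
        self.int_to_char = int_to_char

    def text_generation(self):
        start = numpy.random.randint(0, len(self.dataX))
        # Copy the seed so that appending below does not modify the dataset.
        pattern = list(self.dataX[start])
        text = []
        for i in range(100):
            x = numpy.reshape(pattern, (1, len(pattern), 1))
            x = x / float(n_vocab)
            # self.model is the model being trained (set by Keras), not the global `model`.
            prediction = self.model.predict(x, verbose=0)
            index = int(numpy.argmax(prediction))
            text.append(self.int_to_char[index])
            pattern.append(index)
            pattern = pattern[1:]
        return "".join(text)

    def on_epoch_end(self, epoch, logs=None):
        # `epoch` is zero-based, so this fires after the 1st, 6th, 11th and 16th epochs.
        if epoch % 5 == 0:
            print("Epoch", epoch, "\n")
            text_gen = self.text_generation()
            print("Generated text:", text_gen, "\n")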
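
Keras passes a zero-based epoch index to `on_epoch_end`, while the progress log counts from 1. The short sketch below lists which epochs satisfy `epoch % 5 == 0` over a 20-epoch run; `total_epochs` is taken from the `fit` calls in this notebook.

```python
# Which epochs trigger text generation when the check is `epoch % 5 == 0`.
total_epochs = 20
triggered = [epoch for epoch in range(total_epochs) if epoch % 5 == 0]
log_numbers = [epoch + 1 for epoch in triggered]

print(triggered)    # zero-based indices passed to on_epoch_end
print(log_numbers)  # the matching "Epoch N/20" numbers in the log
```

This is why the sample labelled "Epoch 0" appears right after "Epoch 1/20" in the training output, and "Epoch 5" right after "Epoch 6/20".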

Create the TensorBoard callback

In [22]:
CallTB = TensorBoard(log_dir="tb_logs", histogram_freq=1)

Combine the callbacks

In [31]:
filepath= "myCallback-weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint, CallTB, MyCallback(dataX, int_to_char)]

Define the network architecture

In [32]:
model1 = Sequential()
model1.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model1.add(Dropout(0.2))
model1.add(Dense(y.shape[1], activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam')

In [33]:
model1.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20
Epoch 1: loss improved from inf to 3.01823, saving model to myCallback-weights-improvement-01-3.0182.hdf5
Epoch 0 

Generated text:  *      *      *      *      *      *      *      *      *      *      *      *      *      *      * 

Epoch 2/20
Epoch 2: loss improved from 3.01823 to 2.80620, saving model to myCallback-weights-improvement-02-2.8062.hdf5
Epoch 3/20
Epoch 3: loss improved from 2.80620 to 2.70609, saving model to myCallback-weights-improvement-03-2.7061.hdf5
Epoch 4/20
Epoch 4: loss improved from 2.70609 to 2.63157, saving model to myCallback-weights-improvement-04-2.6316.hdf5
Epoch 5/20
Epoch 5: loss improved from 2.63157 to 2.56693, saving model to myCallback-weights-improvement-05-2.5669.hdf5
Epoch 6/20
Epoch 6: loss improved from 2.56693 to 2.50932, saving model to myCallback-weights-improvement-06-2.5093.hdf5
Epoch 5 

Generated text:                                                                                                      

Epoch 7/20
Epoch 7: 

<keras.callbacks.History at 0x1459280a050>

Load the best weights

In [35]:
filename = "myCallback-weights-improvement-15-2.1428.hdf5"
model1.load_weights(filename)
model1.compile(loss='categorical_crossentropy', optimizer='adam')
int_to_char = dict((i, c) for i, c in enumerate(chars))

Print the generated text

In [38]:
# randint's upper bound is exclusive, so len(dataX) covers every pattern.
start = numpy.random.randint(0, len(dataX))
# Copy the seed so that appending below does not modify dataX in place.
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# Generate 1000 characters, one at a time.
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model1.predict(x, verbose=0)
    index = int(numpy.argmax(prediction))  # greedy: pick the most probable character
    result = int_to_char[index]
    sys.stdout.write(result)
    # Slide the window: append the new character, drop the oldest one.
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")

Seed:
" eir heads are gone, if it please your majesty!” the soldiers shouted
in reply.

“that’s right!” shou "
ght alice, “in wou doe’t know the toiee ”h
toedk th the looe tf the lorte ”

“he mo thet ao a loog taad ” said the caterpillar.

“ier ai _ou d toop then toe seit ”ou ”enl the seit ” said the qoeen, “and the more tf tha lott an in the toiee ”h
toenk the mooe tf the loot ”oth ier hend ”ou toone ”hu  the moee tu the toiee ”huh the wou sf the loote ”

“he mo thet ao a loog taad ” said the caterpillar.

“ier ai _ou d toop then toe seit ”ou ”enl the seit ” said the qoeen, “and the more tf tha lott an in the toiee ”h
toenk the mooe tf the loot ”oth ier hend ”ou toone ”hu  the moee tu the toiee ”huh the wou sf the loote ”

“he mo thet ao a loog taad ” said the caterpillar.

“ier ai _ou d toop then toe seit ”ou ”enl the seit ” said the qoeen, “and the more tf tha lott an in the toiee ”h
toenk the mooe tf the loot ”oth ier hend ”ou toone ”hu  the moee tu the toiee ”huh the wou sf the lo
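
The output above falls into a repeating loop, which is a typical side effect of always taking the most probable character with `argmax`. One common way to reduce this, not used in the original lab, is to sample from the predicted distribution with a temperature. The helper `sample_with_temperature` below is a hypothetical name, and the probabilities are made up; it is a sketch of the idea rather than a tested replacement.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Draw an index from `probs` after rescaling by `temperature`."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    # Low temperature sharpens the distribution, high temperature flattens it.
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # subtract the maximum for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

rng = np.random.default_rng(0)
probs = [0.6, 0.3, 0.1]  # made-up next-character distribution
draws = [sample_with_temperature(probs, 0.8, rng) for _ in range(1000)]
counts = np.bincount(draws, minlength=3)
print(counts)
```

In the generation loop this would take the place of `numpy.argmax(prediction)`, called as `sample_with_temperature(prediction[0], temperature)`, since `model.predict` returns one row of probabilities per input.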

Time series for the Dense layer

![](dense.png)

Time series for the LSTM layer

![](lstm.png)

Loss curve for our model

![](epoch.png)

Histogram of the value distribution in the Dense layer

![](dist_dense.png)

Histogram of the value distribution in the LSTM layer

![](dist_lstm.png)