Your task is to build a generative language model on Armenian text. This [blog post](https://machinelearningmastery.com/gentle-introduction-generative-long-short-term-memory-networks/) is a great introduction to generative models, so read through it first.
You should:
* collect data<br>
You can collect it from anywhere; it's up to you. Some suggested sources:
    - [https://hy.wikisource.org/wiki/%D4%BF%D5%A1%D5...](https://hy.wikisource.org/wiki/%D4%BF%D5%A1%D5%BF%D5%A5%D5%A3%D5%B8%D6%80%D5%AB%D5%A1:%D5%80%D5%B8%D5%BE%D5%B0%D5%A1%D5%B6%D5%B6%D5%A5%D5%BD_%D4%B9%D5%B8%D6%82%D5%B4%D5%A1%D5%B6%D5%B5%D5%A1%D5%B6%D5%AB_%D5%B0%D5%A5%D6%84%D5%AB%D5%A1%D5%A9%D5%B6%D5%A5%D6%80)
    - [http://grapaharan.org/%D4%BF%D5%A1%D5%BF%...](http://grapaharan.org/%D4%BF%D5%A1%D5%BF%D5%A5%D5%A3%D5%B8%D6%80%D5%AB%D5%A1:%D5%80%D5%A5%D6%84%D5%AB%D5%A1%D5%A9)
* preprocess data
* find word embeddings (or train them simultaneously with the model)
* build a model and train it
* impress me with creative sentences!

In [2]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
import numpy as np
import string

Using TensorFlow backend.


In [3]:
# load the Armenian text (assumed to be UTF-8 encoded) and convert it to lowercase
filename = "stories.txt"
with open(filename, encoding="utf-8") as f:
    raw_text = f.read()
raw_text = raw_text.lower()

In [4]:
raw_text = raw_text.replace('՞','')

In [5]:
raw_text = raw_text.replace('։','')

In [6]:
raw_text = raw_text.replace('՜','')
raw_text = raw_text.replace(',','')
raw_text = raw_text.replace('․','')
raw_text = raw_text.replace('«','')
raw_text = raw_text.replace('»','')
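
The chained `replace` calls above can also be written as a single pass with `str.translate`. This is a minimal sketch, assuming the same seven characters should be dropped; `PUNCTUATION`, `DELETE_TABLE` and `strip_punctuation` are illustrative names that do not appear in the original notebook.

```python
# The seven characters removed by the cells above.
PUNCTUATION = "՞։՜,․«»"

# With three arguments, str.maketrans maps every character in the third
# argument to None, which makes str.translate delete it.
DELETE_TABLE = str.maketrans("", "", PUNCTUATION)


def strip_punctuation(text):
    """Return text lowercased, with the listed punctuation removed."""
    return text.lower().translate(DELETE_TABLE)


sample = "«Բարև, Հարրի»։"
print(strip_punctuation(sample))
```

Keeping the characters in one constant makes it easy to extend the list later if other punctuation turns up in the vocabulary.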

In [7]:
# character vocabulary and a lookup from each character to an integer id
chars = sorted(set(raw_text))
char_to_int = {c: i for i, c in enumerate(chars)}

In [8]:
n_chars = len(raw_text)
n_vocab = len(chars)
print ("Total Characters: ", n_chars)
print ("Total Vocab: ", n_vocab)

Total Characters:  203114
Total Vocab:  74


In [9]:
# each input is a window of seq_length characters; the target is the character that follows it
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print ("Total Patterns: ", n_patterns)

Total Patterns:  203014
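
To see what the windowing loop above produces, here is the same idea on a toy string. It is only a sketch: `make_windows` and `toy_text` are hypothetical names, and the window length of 3 is chosen purely for readability.

```python
def make_windows(text, seq_length):
    """Pair every seq_length-character window with the character after it."""
    pairs = []
    for i in range(len(text) - seq_length):
        window = text[i:i + seq_length]
        target = text[i + seq_length]
        pairs.append((window, target))
    return pairs


toy_text = "abcdefg"
for window, target in make_windows(toy_text, seq_length=3):
    print(window, "->", target)
# abc -> d
# bcd -> e
# cde -> f
# def -> g
```

A text of length n yields n - seq_length pairs, which matches the notebook's count: 203114 - 100 = 203014 patterns.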


In [10]:
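# reshape X to be [samples, time steps, features]; each time step has one feature, the character id
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# scale the integer ids into the range [0, 1)
X = X / float(n_vocab)
# one hot encode the output variable (one column per character class)
y = np_utils.to_categorical(dataY)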
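
The two encodings used in the cell above are easy to mix up, so here is a small pure-Python illustration of each. It is a sketch, not the Keras implementation: `scale_ids`, `one_hot` and `toy_vocab` are hypothetical names introduced only for this example.

```python
def scale_ids(ids, n_vocab):
    """Map integer character ids into [0, 1) by dividing by the vocabulary size."""
    return [i / float(n_vocab) for i in ids]


def one_hot(index, n_classes):
    """Return a list of n_classes zeros with a single 1.0 at position index."""
    vector = [0.0] * n_classes
    vector[index] = 1.0
    return vector


toy_vocab = 4  # hypothetical vocabulary size, much smaller than the real 74

print(scale_ids([0, 1, 3], toy_vocab))  # input side: [0.0, 0.25, 0.75]
print(one_hot(2, toy_vocab))            # target side: [0.0, 0.0, 1.0, 0.0]
```

The inputs stay a single number per time step, while each target becomes a vector the softmax layer can be compared against.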

In [11]:
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

In [12]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100, 256)          264192    
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 256)          0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 256)               525312    
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 74)                19018     
=================================================================
Total params: 808,522
Trainable params: 808,522
Non-trainable params: 0
_________________________________________________________________
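
The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias. The helpers below (`lstm_params` and `dense_params`, names made up for this sketch) just spell out that arithmetic.

```python
def lstm_params(units, input_dim):
    """4 gates, each with input weights, recurrent weights and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """One weight per input/output pair plus one bias per output."""
    return input_dim * units + units


lstm_1 = lstm_params(units=256, input_dim=1)     # one feature per time step
lstm_2 = lstm_params(units=256, input_dim=256)   # fed by lstm_1's 256 outputs
dense_1 = dense_params(units=74, input_dim=256)  # one output per character

print(lstm_1, lstm_2, dense_1, lstm_1 + lstm_2 + dense_1)
# 264192 525312 19018 808522
```

Dropout layers add no parameters, which is why they show 0 in the table.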


In [13]:
# define the checkpoint
filepath="weights/weights-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

In [14]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20

Epoch 00001: loss improved from inf to 2.97606, saving model to weights/weights-01-2.9761.hdf5
Epoch 2/20

Epoch 00002: loss improved from 2.97606 to 2.68954, saving model to weights/weights-02-2.6895.hdf5
Epoch 3/20

Epoch 00003: loss improved from 2.68954 to 2.53720, saving model to weights/weights-03-2.5372.hdf5
Epoch 4/20

Epoch 00004: loss improved from 2.53720 to 2.42751, saving model to weights/weights-04-2.4275.hdf5
Epoch 5/20

Epoch 00005: loss improved from 2.42751 to 2.34044, saving model to weights/weights-05-2.3404.hdf5
Epoch 6/20

Epoch 00006: loss improved from 2.34044 to 2.27297, saving model to weights/weights-06-2.2730.hdf5
Epoch 7/20

Epoch 00007: loss improved from 2.27297 to 2.21633, saving model to weights/weights-07-2.2163.hdf5
Epoch 8/20

Epoch 00008: loss improved from 2.21633 to 2.16938, saving model to weights/weights-08-2.1694.hdf5
Epoch 9/20

Epoch 00009: loss improved from 2.16938 to 2.12830, saving model to weights/weights-09-2.1283.hdf5
Epoch

<keras.callbacks.History at 0x7eff9f85c588>

In [17]:
# load the network weights
filename = "weights/weights-20-1.8777.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
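
Instead of typing the checkpoint name by hand, the file with the lowest loss can be picked from the names themselves, since the `filepath` pattern encodes the loss. This is a sketch under that assumption; `CHECKPOINT_RE` and `best_checkpoint` are hypothetical helpers, and the sample names are copied from the training log above.

```python
import re

# Matches names written with the pattern weights-{epoch:02d}-{loss:.4f}.hdf5
CHECKPOINT_RE = re.compile(r"weights-(\d+)-(\d+\.\d+)\.hdf5$")


def best_checkpoint(filenames):
    """Return the filename whose encoded loss is lowest, or None if none match."""
    best_name, best_loss = None, float("inf")
    for name in filenames:
        match = CHECKPOINT_RE.search(name)
        if match is None:
            continue  # skip files that do not follow the naming pattern
        loss = float(match.group(2))
        if loss < best_loss:
            best_name, best_loss = name, loss
    return best_name


names = [
    "weights/weights-01-2.9761.hdf5",
    "weights/weights-02-2.6895.hdf5",
    "weights/weights-09-2.1283.hdf5",
]
print(best_checkpoint(names))
```

In the notebook the list of names could come from something like `glob.glob("weights/*.hdf5")`, and the result would be passed to `model.load_weights`.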

In [18]:
int_to_char = dict((i, c) for i, c in enumerate(chars))

In [32]:
import sys
# pick a random seed sequence from the training patterns
start = np.random.randint(0, len(dataX)-1)
pattern = list(dataX[start])  # copy, so the stored training pattern is not modified
print ("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(250):
    # same reshaping and scaling as the training input
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # greedy decoding: always take the most probable next character
    index = np.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window forward by one character
    pattern.append(index)
    pattern = pattern[1:]
print ("\"")
print ("The fanfic-inator's quest has been completed")

" երկար սև պարեգոտը մադամ մալբինը հարրիին կանգնեցրեց տղայի կողքին ՝ մեկ ուրիշ աթոռակի վրա հարրիի գլխով "
 անցած մի բան չէ կարող եր անն էր որ նա անհանգիստ աննպաս հարու համար աննպաս մարդ էր այն արալի անաամ մարդ առան մեջ մակատի մեջ մակատի մեծ կարարել էր որ նա արաե կարարել էր որ նա արաե կարարել էր որ նա արաե կարարել էր որ նա արաե կարարել էր որ նա արաե կարար"
The fanfic-inator's quest has been completed
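
The generated text above falls into a repeating loop, which is a common side effect of always taking the `argmax`. One standard remedy is to sample the next character from the predicted distribution, rescaled by a temperature. The sketch below is a swapped-in alternative, not what the notebook ran. `sample_with_temperature` is a hypothetical helper, and it assumes at least one probability is positive.

```python
import math
import random


def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Draw an index from probs after sharpening or flattening by temperature.

    temperature < 1 favours the most likely characters (closer to argmax),
    temperature > 1 makes the choice more random.
    """
    # Rescale in log space; zero probabilities stay impossible.
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    top = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(logit - top) for logit in logits]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


toy_probs = [0.1, 0.7, 0.2]
rng = random.Random(0)  # seeded so the demo is repeatable
print([sample_with_temperature(toy_probs, temperature=1.0, rng=rng) for _ in range(10)])
```

In the generation loop, this would replace `index = np.argmax(prediction)` with something like `index = sample_with_temperature(prediction[0], temperature=0.5)`. The value 0.5 is only a starting point to experiment with.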
