# PART 3- DEEP NETWORK & TEXT GENERATION

### 3.0 Preparations

first, we will import the following libraries:  
*theano*- use it in order to optimize and evaluate mathematical expressions involving multi-dimensional arrays.  
*keras*- we will use it to build high-level neural networks API.  
(about the others we explained at the previous parts)

In [3]:
import numpy as np
import theano
import keras
import matplotlib.pyplot as plt
import pandas as pd
import csv

Using TensorFlow backend.


we will use theano as backend of keras (for define,optimize and evaluate multi-dimensional arrays.)

In [4]:
from keras import backend as K
K.set_image_dim_ordering('th')

we also print some tables in this part, and we found it more easy using tabulate, in order to install it:  
(install by conda prompt)

In [3]:
# $ pip install tabulate

In [4]:
from tabulate import tabulate

### 3.1 Import the data

let's read the csv file of the original data we imported by gmail API at part 1

In [6]:
df = pd.read_csv("./saveData/orginalData.csv")

convert to a list for our comfort:

In [7]:
lst=list(df.values.flatten())

let's take a look on the data:

In [8]:
lst

['illonashachar78@gmail.com',
 ' Hi [image: =F0=9F=99=82] no, I will be free for almost all Day!\r\nShavit I am not going out today but If You would like to go I would\r\nrecomend You club called Rado=C5=9B=C4=87 for tonight!\r\n',
 'illonashachar78@gmail.com',
 ' You can go free before 10 pm. At 10 people will start letting a lot of\r\nlanterns in the air and then will be a concert .\r\nPeople just want to beat a record (a lot of lanterns) without any reason.\r\nDo what are Your plans for tomorrow? Are You going to Kazimierz?\r\n',
 'illonashachar78@gmail.com',
 ' What time are You going to Kazimierz?\r\n',
 'illonashachar78@gmail.com',
 ' I think it is better for You to go there by bus because my parents will\r\nstart travel around 8.20. IT is to early!\r\nI will check schedul of buses for You now\r\nBus to Kazimierz Dolny 9.25 - You will be in Kazimierz at 10.35 (big bus )\r\nor 10.20 -You will be in Kazimierz at 11.40 (small bus )\r\nThe places where the bus start is : 9.25 : ul. R

### 3.2 Preprocessing the text to fit into a network¶

we will divide the preprocessing to 2 parts, preprocessing for ALL the data, and preprocrssing to each one of the senders. (that will be more suitable to kind of the model we will use).  
so, first general preprocessing: we can notice that many of the messages includes links, we will clean it: 

In [9]:
import re
lst1=lst
for i in  np.arange(1,len(lst1),2):
            lst1[i] = re.sub(r'\[.*?\]|\(.*http.+\)|\<.*http.+\>','',lst1[i])
            lst1[i] = re.sub(r'Rado([^\s]+)|Skarp([^\s]+)', '', lst1[i])
            lst1[i] = re.sub(r'\=[A-Z|0-9][A-Z|0-9]|\=', '', lst1[i])

EXPLAIN: we use np.arange(1,len(lst1),2) in order to clean in jump of 2 cells starting with the second cell in order to clean ONLY the messages and not change the addresses. for cleaning we use regex and therefore import re library.

In [10]:
lst1

['illonashachar78@gmail.com',
 ' Hi  no, I will be free for almost all Day!\r\nShavit I am not going out today but If You would like to go I would\r\nrecomend You club called  for tonight!\r\n',
 'illonashachar78@gmail.com',
 ' You can go free before 10 pm. At 10 people will start letting a lot of\r\nlanterns in the air and then will be a concert .\r\nPeople just want to beat a record (a lot of lanterns) without any reason.\r\nDo what are Your plans for tomorrow? Are You going to Kazimierz?\r\n',
 'illonashachar78@gmail.com',
 ' What time are You going to Kazimierz?\r\n',
 'illonashachar78@gmail.com',
 ' I think it is better for You to go there by bus because my parents will\r\nstart travel around 8.20. IT is to early!\r\nI will check schedul of buses for You now\r\nBus to Kazimierz Dolny 9.25 - You will be in Kazimierz at 10.35 (big bus )\r\nor 10.20 -You will be in Kazimierz at 11.40 (small bus )\r\nThe places where the bus start is : 9.25 : ul. Ruska 7/pod  10.\r\n20:\r\nul. Ruska\r

ok, now the text is much better... but there are still some characters that can make noises to our model, like punctuations and newLine signs. but we will not remove them in advance, because different senders have different message structure and they use this characters in different way, so in order to try to generate most similat message as we can, we have to keep them. (and if the model result will not be good enough for us, we will try to remove some of them...)  

so.. before we start buliding models and train them, we have to divide the data to 5 lists, one for each sender. let's do this:

In [11]:
def classify_lists(email,lst):
    newlst=""
    for i in  np.arange(0,len(lst),2):
        if lst[i]==email:
              newlst = newlst + lst[i+1] + " " + "\r\n" + " "
    return newlst

EXPLAIN: we defined a function that get list and email and return list of all the messages of this email in the list of got as parameter. (in order to be coherent with the data, we used \r\n as new line character between messages)

In [12]:
ilonaText=classify_lists('illonashachar78@gmail.com',lst1)
dvirText=classify_lists('dvirnimrod84@gmail.com',lst1)
asafText=classify_lists('asafdavid08@gmail.com',lst1)
itaiText=classify_lists('itaicohen266@gmail.com',lst1)
leaText=classify_lists('leapeturel@gmail.com',lst1)

### 3.3 building model for each type

in this subpart, we will explain for each type our preprocessing we made (if any) , and how we define the model for it.  
for all the models we will use n-sequence-architecture as described in class. at the first type we will explain each step, after it we will do all the steps together. and use this function with changing the parameters in the next types.

we will define the following function that will print us some general details  about the text:

In [12]:
def get_general_details(txt):
    header = ["Punctuation", "Appearance "]
    rows = [(":",txt.count(":")),(",",txt.count(",")),("-",txt.count("-")),("=",txt.count("=")),(";",txt.count(";")),
            (".",txt.count(".")),("!",txt.count("!")),("?",txt.count("?")),("(",txt.count("(")),(")",txt.count(")")),
           ("'",txt.count("'")),("#",txt.count("#")),("\\",txt.count("\\"))]
    print(tabulate(rows,headers=header,numalign="center",tablefmt="grid") +"\n")
    print("Number of words: " , len(leaText.split(' ')))
    print("Number of characters: " , len(txt))
    print("Capital letters appearance: " , sum(1 for c in txt if c.isupper())/len(txt))
    print("Uniqe capital letters: " , len(set(list(c for c in leaText if c.isupper()))))
    print("Uniqe lower letters: " , len(set(list(c for c in leaText if c.islower()))))
    print("digits characters appearance: " , sum(1 for c in txt if c.isdigit())/len(txt))


### Type 1: sender address - leapeturel@gmail.com

*examine the data*

In [13]:
leaText

'Hey you ! how are you ? you don\'t feel depressed ? and presents for your\r\ngirl friend, she liked ?! bisous\r\n \r\n I will take more time later to answer you because I have to take a boat to\r\ngo to Koh tao.. give me some news about you, and everything that you want\r\nto say ( remember, I like when you speak, even if it is better when I can\r\nwatch you..)\r\n \r\n oh...hard for you... I think a lot about you, and I remember all moments\r\ntogether.. for exemple, one week ago, we were watching girls thai who made\r\nsport near to the river !\r\nok, not a good idea to think about that.. but I hope we could have news\r\nsouvenirs ( good word?!) together ?in an other city / country ?!\r\nSo, about me, I\'m fine, I\'m in koh Pahgan, it\'s nice but a little\r\nexpensive, and too much tourists..! I drink beers but it\'s not the same\r\ntaste without you..\r\nI will think about you tomorow when you will begin your study.. when is the\r\nend ? (I\'m sorry, I forget..)\r\neh, send me some

In [14]:
get_general_details(leaText)

+---------------+---------------+
| Punctuation   |  Appearance   |
| :             |       0       |
+---------------+---------------+
| ,             |      105      |
+---------------+---------------+
| -             |       5       |
+---------------+---------------+
| =             |       0       |
+---------------+---------------+
| ;             |       1       |
+---------------+---------------+
| .             |      100      |
+---------------+---------------+
| !             |      58       |
+---------------+---------------+
| ?             |      43       |
+---------------+---------------+
| (             |       8       |
+---------------+---------------+
| )             |       8       |
+---------------+---------------+
| '             |      50       |
+---------------+---------------+
| #             |       0       |
+---------------+---------------+
| \             |       1       |
+---------------+---------------+

Number of words:  1442
Number of characters:  7

first, we can notice that we have 7671 words in total, this amount of words is small. hence, we decided to use model that read each 10 characters seperately, and not a model that read each word seperately, because it will be more difficult to get good result with small amount of practise samples.  
because we treat at the character level, there is important for each character that appear in the text, because each one will make the vocabulary bigger. so, before we define the model, we will clean the text a little bit more.  
first we will clean punctuation that appear rarely:

In [15]:
leaText_=re.sub(r'[\(\)\;\-\\]','',leaText)

we also want to represent new line so we can preserve the structure of the message after generation. at the plain text newLine represent as: \r\n , we will substitute it to another sign that will take one character only. we will choose "#" character. 

In [16]:
leaText_=re.sub(r'\r\n','#',leaText_)

*remark* - before each " ' " character there is '\' but '\' is only because the way python print the text.  
as you can notice by the following snippet:

In [17]:
leaText_.count('\\')

0

we can notice that there are 203 instances of capital letters, that include 21 uniqe letters. this is only ~0.02% from all the characters in the text, so if we will lowercase all the characters the change in the text will not be significant, and our vocabulary will be smaller, so it will be more easy for train.  
let's do that:

In [18]:
leaText_=leaText_.lower()

let's take a look again at the text before we build our model:

In [19]:
leaText_

'hey you ! how are you ? you don\'t feel depressed ? and presents for your#girl friend, she liked ?! bisous# # i will take more time later to answer you because i have to take a boat to#go to koh tao.. give me some news about you, and everything that you want#to say  remember, i like when you speak, even if it is better when i can#watch you..# # oh...hard for you... i think a lot about you, and i remember all moments#together.. for exemple, one week ago, we were watching girls thai who made#sport near to the river !#ok, not a good idea to think about that.. but i hope we could have news#souvenirs  good word?! together ?in an other city / country ?!#so, about me, i\'m fine, i\'m in koh pahgan, it\'s nice but a little#expensive, and too much tourists..! i drink beers but it\'s not the same#taste without you..#i will think about you tomorow when you will begin your study.. when is the#end ? i\'m sorry, i forget..#eh, send me some photos  and photos of you !  !# # hey shavit ! i\'m in kao 

alright, we think we can try to build our mode and check the results:

*building the model*

In [18]:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

first, we want to build vocabulary of unique characters, the easiest way to do so is by convert the text to set.

In [21]:
chars = sorted(list(set(leaText_)))
''.join(chars)

' !"#\',./01234568?abcdefghijklmnopqrstuvwxyz'

let's check the size of the vocabulary:

In [22]:
n_chars = len(leaText_)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

Total Characters:  7479
Total Vocab:  43


there are 43 different uniqe characters. this is much more than letters in english language.

second, we have to make correlation between chars and indices:

In [23]:
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))

now we create our input and output data for the model, seq_length holds the length of sequence of characters that our model read in order to predict the next character.  
seq_in will holds 10 sequence of characters and seq_y the character to predict after this sequence for each i.  
and dataX and dataY will hold the sequences (patterns) and the character to predict respectively.

In [24]:
seq_length = 10
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = leaText_[i:i + seq_length]
    seq_out = leaText_[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("number of patterns: ", n_patterns)

number of patterns:  7469


now we will reshape X to be [samples, time steps, features], and normalize it


In [25]:
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
X = X / float(n_vocab)

Converts a class vector to binary class matrix.

In [26]:
y = np_utils.to_categorical(dataY)

#### define LSTM sequential model:  
**Parameter tuning:**  
we define a single hidden LSTM layer with 256 memory units. (units recommended to be in jump of 32 units)   
The network uses dropout probability of 0.2.  
The output layer is a Dense layer using the softmax activation, this function calculate for each character a prediction between 0 and 1.

In [27]:
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))

**compile the model:**  
we use rmsprop function as optimizer  
and we define loss and metrics, so we can calculate the loss and accuracy of the model. because this is not classification problem , and we are not interested to acheive the most accurate model, the loss function is more important for us, our goal will be to minimize the loss function in order to get generalization.

In [28]:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

**define the checkpoint:**  
because training the model will take a lot of time , and we train it in the labs, it's more secure if we will save the best weight result each time, so in case something happen, the results will be saved.



In [29]:
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

**fit the model:**  
epochs=350 , we set it with high value, In case we notice that the value is not small we will stop the run and save us the best result with checkPoint.  
batch_size- we read that in general case, larger batch_size result in faster progress in training, but don't always converge as fast. Smaller batch sizes train slower, but can converge faster. our data set is small so we decided to set this parameter with value of 32 character each time (small value).
verbose- set with 1 so it will plot the proccess.

In [None]:
model.fit(X, y, epochs=350, batch_size=32,verbose=1, callbacks=callbacks_list)

load the best result weights:

In [242]:
model.load_weights('./weights/weights-improvement-348-0.1319-leapetural.hdf5')

let's make evaluation to find the loss and score:

In [32]:
model.evaluate(X,y)



[0.082437949676940972, 0.96465390279823271]

these results are really good! 0.96 accuracy, and low lose rate. let's generate new text by this model!!!

*Text Generation*

we are picking seed, this seed will be in length of one sequence (10 characters in this case), and for this seed we generate the next characters in order to get new similar message.  
message length will be with 350 characters, and 8 messages in total (30%).  
we decided these values as a result of the analysis done in part one.

In [29]:
def generate_text(numOfMess,length,email,model,int_to_char,dataX,n_vocab):
    lst=[]
    for j in range(0,numOfMess):
           # pick a random seed
            start = numpy.random.randint(0, len(dataX)-1)
            pattern = dataX[start]
            message=''.join([int_to_char[value] for value in pattern])
            # generate characters
            for i in range(length):
                x = numpy.reshape(pattern, (1, len(pattern), 1))
                #normalize
                x = x / float(n_vocab)
                #predict
                prediction = model.predict(x, verbose=0)[0]
                prediction = np.log(prediction)
                pv = np.exp(prediction)/np.sum(np.exp(prediction)) 
                rangev = range(len(prediction)) 
                index = np.random.choice(rangev, p=pv)
                #mapping to char
                result = int_to_char[index]
                seq_in = [int_to_char[value] for value in pattern]
                message=message+result
                pattern.append(index)
                pattern = pattern[1:len(pattern)]
            lst.append([email,message])
    return lst

*** for the next types we will use this function in generate new text by n-seq-architecture model ***

In [None]:
lst_generate=generate_text(1,350,"leapeturel@gmail.com",model,int_to_char,dataX,n_vocab)

the result:

In [2313]:
lst_generate

[['leapeturel@gmail.com',
  " long  well, comparing to my 5 months  the regtln oo ieee a ireteh itiend wlce bot,a let ab ut miiek. anh mo maneong . i we nean anirue mo tans aapnndell   shley, th'oem tou yolrdod # #eemr anhathe io ioatef  #  ah ah  iacap !#   rea seollt and aod  nowtrs aih tfr shne   cd wrurio snay roiee arw n lepe d woe tiil wour atu o leke mt ffm snnidhh t ande fa caro"],
 ['leapeturel@gmail.com',
  " study.. when is tou#enn y feee dayke tece ttut toir to europe. and you# are you oki? thur hrddef tourrt shay anhyte ind pe#gnr do .o laseth#i mat tb moet wnu,#teay i sasn't#myself, but i 'm very tired, a little bit stress and tomorrow i'm going back#to poland#it was really nice, i was studying hnre naec to tolnk weed#you siilld .bte not wau iaie  inr aeeuu "],
 ['leapeturel@gmail.com',
  "nd#from france for 1 week, and we meet only french people so i don't somak weir a toedhddtl ail theinsne,b a don seeemder, i like when you will go whailand, i will go in cambodga only e

convert back each '#' to newLine:

In [2314]:
for i in  np.arange(0,len(lst_generate),1):
            lst_generate[i][1] = re.sub(r'#','/r/n',lst_generate[i][1])#\r\n

In [2315]:
lst_generate

[['leapeturel@gmail.com',
  " long  well, comparing to my 5 months  the regtln oo ieee a ireteh itiend wlce bot,a let ab ut miiek. anh mo maneong . i we nean anirue mo tans aapnndell   shley, th'oem tou yolrdod /r/n /r/neemr anhathe io ioatef  /r/n  ah ah  iacap !/r/n   rea seollt and aod  nowtrs aih tfr shne   cd wrurio snay roiee arw n lepe d woe tiil wour atu o leke mt ffm snnidhh t ande fa caro"],
 ['leapeturel@gmail.com',
  " study.. when is tou/r/nenn y feee dayke tece ttut toir to europe. and you/r/n are you oki? thur hrddef tourrt shay anhyte ind pe/r/ngnr do .o laseth/r/ni mat tb moet wnu,/r/nteay i sasn't/r/nmyself, but i 'm very tired, a little bit stress and tomorrow i'm going back/r/nto poland/r/nit was really nice, i was studying hnre naec to tolnk weed/r/nyou siilld .bte not wau iaie  inr aeeuu "],
 ['leapeturel@gmail.com',
  "nd/r/nfrom france for 1 week, and we meet only french people so i don't somak weir a toedhddtl ail theinsne,b a don seeemder, i like when you will

save the data:

In [2317]:
title=["email address","message"]
lst_generate.insert(0,title)
with open('./generate/leaGeneratedData.csv','w',newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(lst_generate)

we could remove also numbers because there are bery rare, but we've already received satisfactory results so there's no need for that.

### Type 2: sender address - illonashachar78@gmail.com

**examine the data**

In [2448]:
ilonaText

" Hi  no, I will be free for almost all Day!\r\nShavit I am not going out today but If You would like to go I would\r\nrecomend You club called  for tonight!\r\n \r\n  You can go free before 10 pm. At 10 people will start letting a lot of\r\nlanterns in the air and then will be a concert .\r\nPeople just want to beat a record (a lot of lanterns) without any reason.\r\nDo what are Your plans for tomorrow? Are You going to Kazimierz?\r\n \r\n  What time are You going to Kazimierz?\r\n \r\n  I think it is better for You to go there by bus because my parents will\r\nstart travel around 8.20. IT is to early!\r\nI will check schedul of buses for You now\r\nBus to Kazimierz Dolny 9.25 - You will be in Kazimierz at 10.35 (big bus )\r\nor 10.20 -You will be in Kazimierz at 11.40 (small bus )\r\nThe places where the bus start is : 9.25 : ul. Ruska 7/pod  10.\r\n20:\r\nul. Ruska\r\nBoth are behind bus station close to this Castle where we where today\r\n\r\nShavi\r\n \r\n  Go to Riviera! \r\nI sh

In [2449]:
get_general_details(ilonaText)

+---------------+---------------+
| Punctuation   |  Appearance   |
| :             |       8       |
+---------------+---------------+
| ,             |      17       |
+---------------+---------------+
| -             |       3       |
+---------------+---------------+
| =             |       0       |
+---------------+---------------+
| ;             |       0       |
+---------------+---------------+
| .             |      108      |
+---------------+---------------+
| !             |      48       |
+---------------+---------------+
| ?             |      24       |
+---------------+---------------+
| (             |      17       |
+---------------+---------------+
| )             |      16       |
+---------------+---------------+
| '             |       5       |
+---------------+---------------+
| #             |       0       |
+---------------+---------------+
| \             |       0       |
+---------------+---------------+

Number of words:  1442
Number of characters:  8

the proccess we will do here is similar to the previous type, so we will not explain deeply.  
first: we will remove all the characters that are not commonly used. {:,-,(,),'} by removing them we are not change the message significantly but we will reduce the vocabulary, so our model will train better. we will not remove '?' because existence of this character is crucial for sentence meaning structure.  
second: we can see that capital letters are not commonly use (just 3% from WHOLE the messages), so we will remove them from the same reason as before.  
we also change \r\n to '#' like in the previous type.

In [246]:
ilonaText_=re.sub(r'[\(\)\;\-\\]','',ilonaText)
ilonaText_=re.sub(r'\r\n','#',ilonaText_)
ilonaText_=ilonaText_.lower()
ilonaText_

" hi  no, i will be free for almost all day!#shavit i am not going out today but if you would like to go i would#recomend you club called  for tonight!# #  you can go free before 10 pm. at 10 people will start letting a lot of#lanterns in the air and then will be a concert .#people just want to beat a record a lot of lanterns without any reason.#do what are your plans for tomorrow? are you going to kazimierz?# #  what time are you going to kazimierz?# #  i think it is better for you to go there by bus because my parents will#start travel around 8.20. it is to early!#i will check schedul of buses for you now#bus to kazimierz dolny 9.25  you will be in kazimierz at 10.35 big bus #or 10.20 you will be in kazimierz at 11.40 small bus #the places where the bus start is : 9.25 : ul. ruska 7/pod  10.#20:#ul. ruska#both are behind bus station close to this castle where we where today##shavi# #  go to riviera! #i should go there yesterday because it was latino night [image: #] have a#nice eveni

NOTICE: we can also notice that there are numbers in the text but the number has important meaning because most of the text contains explanations of bus numbers and hours so the information is relevant to create a similar message.

**building the model**

number of words and number of characters are almost similar like in the previous type. we will EXACTLY the same model, because we were very satisfied from previous model results. 

In [19]:
import numpy
def n_seq_model(text,seq_length,n_hidden):
    chars = sorted(list(set(text)))
    n_chars = len(text)
    n_vocab = len(chars)
    print("Total Characters: ", n_chars)
    print("Total Vocab: ", n_vocab)
    char_to_int = dict((c, i) for i, c in enumerate(chars))
    int_to_char = dict((i, c) for i, c in enumerate(chars))
    dataX = []
    dataY = []
    for i in range(0, n_chars - seq_length, 1):
        seq_in = text[i:i + seq_length]
        seq_out = text[i + seq_length]
        dataX.append([char_to_int[char] for char in seq_in])
        dataY.append(char_to_int[seq_out])
    n_patterns = len(dataX)
    print("number of patterns: ", n_patterns)
    X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
    X = X / float(n_vocab)
    y = np_utils.to_categorical(dataY)
    model = Sequential()
    model.add(LSTM(n_hidden, input_shape=(X.shape[1], X.shape[2])))
    model.add(Dropout(0.2))
    model.add(Dense(y.shape[1], activation='softmax'))
    return n_vocab,int_to_char,char_to_int,dataX,X,y,model

*** for the next types we will use this function in order to create model in n-seq-architecture ***

In [247]:
n_vocab,int_to_char,char_to_int,dataX,X,y,model1=n_seq_model(ilonaText_,10,256)

Total Characters:  7819
Total Vocab:  47
number of patterns:  7809


In [248]:
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-ilona1.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model1.fit(X, y, epochs=350, batch_size=32,verbose=1, callbacks=callbacks_list)

load the best results:

In [249]:
model1.load_weights('./weights/weights-improvement-345-0.1442-ilona.hdf5')

we will evaluate to find the loss and accurate:

In [2518]:
model1.evaluate(X,y,verbose=1)



[0.082060105017654594, 0.96375976437443978]

again, really impressive results, let's generate some text:

In [2537]:
lst_generate_1=generate_text(8,350,"illonashachar78@gmail.com",model1,int_to_char,dataX,n_vocab)
lst_generate_1

[['illonashachar78@gmail.com',
  'ce you left poland!  i#mean: without rain!#all of my friend cut she is going home soon.#i will be  happy to#see how people in iorael  1925/27.04 .? would it be good#time for u to meet with you even#one time.... # # yes i am stilg wor cou weaso fac#  nois i cn gotb us wee ca tog rest   ....#! #ot thaep bocbmee boomtd aa to ys ealioire bs  uotbnh wous ala moap'],
 ['illonashachar78@gmail.com',
  'n kazimierz at 10.35 big bus #or 10.20 you will be il kazimierz at 10.35 big bus #or 10.20 you will be il kazimierz at 10.35 big bus #or 10.20 you will be il kazimierz at 10.35 big bus #or 10.20 you will be il kazimierz at 10.35 big bus #or 10.20 you will be il kazimierz at 10.35 big bus #or 10.20 you will be il kazimierz at 10.35 big bus #or 10.20 you will '],
 ['illonashachar78@gmail.com',
  'hi shavit ! !how are you?  i moved to warsaw tomarr wialewn are  aagoes tamgusou,iira iashen io #5a..0 7i vo bu so i wouldnt like you to change your plans aod make any#pr

convert # to \r\n and save it:

In [2547]:
for i in  np.arange(0,len(lst_generate_1),1):
            lst_generate_1[i][1] = re.sub(r'#','/r/n',lst_generate_1[i][1])#\r\n
title=["email address","message"]
lst_generate_1.insert(0,title)
with open('./generate/ilonaGeneratedData.csv','w',newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(lst_generate_1)

### Type 3 - dvirnimrod84@gmail.com

**examine the data**

In [120]:
dvirText

'I\'m not even sure I have depression, I think this may just be \'me\'.\r\n\r\nI have always been pretty useless socially, but have had a normal\r\nupbringing, no horrible experiences, have gone through education fine and\r\neven got jobs but I\'ve never enjoyed life, never really cared and normally\r\nfeel like I\'m not really worth anything and, inevitably enventually, will\r\nbecome a burden.\r\n\r\nI\'m not going to be dramatic and say I\'m going to end it, it sounds so\r\npathetic (no offence intended to anyone) but I have thoughts of \'going\' or\r\neven dreams that I could die \'blamelessly\' thorugh accident or illness.\r\nSelfish I know.\r\n\r\nI have a cycle. Get job, put on confident easy going persona, get\r\nphysically tired from doing that, lose energy to maintain job, focus on\r\nnegative, leave job in some form (fired/quit). Friends are much the same,\r\nplay easy going fun, can\'t keep it up, lose or push them away.\r\nRelationships, get attracted to ones who need help

In [118]:
get_general_details(dvirText)

+---------------+---------------+
| Punctuation   |  Appearance   |
| :             |       5       |
+---------------+---------------+
| ,             |      483      |
+---------------+---------------+
| -             |      97       |
+---------------+---------------+
| =             |       0       |
+---------------+---------------+
| ;             |      18       |
+---------------+---------------+
| .             |      525      |
+---------------+---------------+
| !             |      128      |
+---------------+---------------+
| ?             |      32       |
+---------------+---------------+
| (             |      51       |
+---------------+---------------+
| )             |      51       |
+---------------+---------------+
| '             |      641      |
+---------------+---------------+
| #             |       0       |
+---------------+---------------+
| \             |       0       |
+---------------+---------------+

Number of words:  1442
Number of characters:  5

In this type, unlike the previous ones, there are many characters , we can see that the number of digits in the text are very small so for our goal, it will not effet if we will remove them and thereby reduce the size of the dictionary. the same about capital letters (convert to lower letters).  
we will also remove ':' because it exist only 5 times. and as we did before we change \r\n to #.  
we do all this steps to reduce the size of the vocabulary so the model will be trained better and faster.

In [16]:
dvirText_=re.sub(r'[\(\)\;\-\\\:]','',dvirText)
dvirText_=re.sub(r'\r\n',' # ',dvirText_)
dvirText_=re.sub(r'[0-9]',' '+ ''+ ' ',dvirText_)
dvirText_=dvirText_.lower()
dvirText_

'i\'m not even sure i have depression, i think this may just be \'me\'. #  # i have always been pretty useless socially, but have had a normal # upbringing, no horrible experiences, have gone through education fine and # even got jobs but i\'ve never enjoyed life, never really cared and normally # feel like i\'m not really worth anything and, inevitably enventually, will # become a burden. #  # i\'m not going to be dramatic and say i\'m going to end it, it sounds so # pathetic no offence intended to anyone but i have thoughts of \'going\' or # even dreams that i could die \'blamelessly\' thorugh accident or illness. # selfish i know. #  # i have a cycle. get job, put on confident easy going persona, get # physically tired from doing that, lose energy to maintain job, focus on # negative, leave job in some form fired/quit. friends are much the same, # play easy going fun, can\'t keep it up, lose or push them away. # relationships, get attracted to ones who need help, help them in anyway

In [2554]:
get_general_details(dvirText_)

+---------------+---------------+
| Punctuation   |  Appearance   |
| :             |       0       |
+---------------+---------------+
| ,             |      483      |
+---------------+---------------+
| -             |       0       |
+---------------+---------------+
| =             |       0       |
+---------------+---------------+
| ;             |       0       |
+---------------+---------------+
| .             |      525      |
+---------------+---------------+
| !             |      128      |
+---------------+---------------+
| ?             |      32       |
+---------------+---------------+
| (             |       0       |
+---------------+---------------+
| )             |       0       |
+---------------+---------------+
| '             |      641      |
+---------------+---------------+
| #             |     1053      |
+---------------+---------------+
| \             |       0       |
+---------------+---------------+

Number of words:  1442
Number of characters:  5

*** building the model ***

we set sequence of 50, this is less than 1% from one message text, so we think this is not too much. we also set 128 memory units because with 256 it will take much more time to train, and we have limited time.

In [20]:
n_vocab,int_to_char,char_to_int,dataX,X,y,model2=n_seq_model(dvirText_,50,256)

Total Characters:  60297
Total Vocab:  39
number of patterns:  60247


here we will try another optimizer: adam.

In [21]:
model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

** parameter tuning: **  
epochs- as we said, much better, and we can always stop it in case of convergence.  
batch_size- we decided to set this value with 128, because the data is bigger and it will take a lot of time to train.

In [None]:
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-dvirN.hdf5"#ilona1
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model2.fit(X, y, epochs=650, batch_size=32,verbose=1, callbacks=callbacks_list)

In [22]:
model2.load_weights('./weights/weights-improvement-302-0.3309-dvirN.hdf5')

In [143]:
model2.evaluate(X,y,verbose=1)



[0.10928729084657406, 0.9755174531511942]

In [27]:
lst_generate_2=generate_text(9,1200,"dvirnimrod84@gmail.com",model2,int_to_char,dataX,n_vocab)
lst_generate_2

[['dvirnimrod84@gmail.com',
  "y but, nice as he was, he tried a religious route with me # never going to work and i put on my fake chatty character for the hour, # even if i was tired afterwards. it didn't do anything. even if i try to get # refered again by the doctor, i'd actually be scared that i'd be seeing him # agatn, or be seen to have asked not for him  andka's soope # realls it woir toued to got yous feel in enptherg you # hnow wit ana you taem to bn a pol to tae as i hae # ae erterelr # fve th the depsert on them asa leoeey bempsse and maksed #      i mose the say you write! it soulds really feml then # uould seal is peapll seated i meee to therp abdut wourd # focaase in iiss me an a ceteer anf cad to eete avt there is a limit you want to # talk about that soueane the lotge st ditces ant roee # aserytion it shis till to soy onn thet. i al no ls esue thme # to rooue to the point not she hosge men. so an suos dace ti them # aweaale to get ma i've lred ao a lts lo e bat ee a lk

In [28]:
for i in  np.arange(0,len(lst_generate_2),1):
            lst_generate_2[i][1] = re.sub(r'#','/r/n',lst_generate_2[i][1])#\r\n
title=["email address","message"]
lst_generate_2.insert(0,title)
with open('./generate/dvirGeneratedData.csv','w',newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(lst_generate_2)

### Type 4- asafdavid08@gmail.com

**examine the data**

In [19]:
asafText



In [17]:
get_general_details(asafText)

+---------------+---------------+
| Punctuation   |  Appearance   |
| :             |      24       |
+---------------+---------------+
| ,             |      220      |
+---------------+---------------+
| -             |      41       |
+---------------+---------------+
| =             |       0       |
+---------------+---------------+
| ;             |      16       |
+---------------+---------------+
| .             |      427      |
+---------------+---------------+
| !             |      15       |
+---------------+---------------+
| ?             |      59       |
+---------------+---------------+
| (             |      10       |
+---------------+---------------+
| )             |      10       |
+---------------+---------------+
| '             |      159      |
+---------------+---------------+
| #             |       1       |
+---------------+---------------+
| \             |       0       |
+---------------+---------------+

Number of words:  1442
Number of characters:  2

In [20]:
asafText_=re.sub(r'[\(\)\;\\\:\#]','',asafText)
asafText_=re.sub(r'\r\n','#',asafText_)
asafText_=re.sub(r'[0-9]',' '+ ''+ ' ',asafText_)
asafText_=asafText_.lower()
asafText_



**building the model**

In [30]:
n_vocab,int_to_char,char_to_int,dataX,X,y,model3=n_seq_model(asafText_,20,256)

Total Characters:  29315
Total Vocab:  40
number of patterns:  29295


In [31]:
model3.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

In [34]:
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-asaf1.hdf5"#ilona1
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model3.fit(X, y, epochs=650, batch_size=40,verbose=1, callbacks=callbacks_list)

Epoch 1/650
Epoch 00001: loss improved from inf to 3.00996, saving model to weights-improvement-01-3.0100-asaf1.hdf5
Epoch 2/650
Epoch 00002: loss improved from 3.00996 to 2.91390, saving model to weights-improvement-02-2.9139-asaf1.hdf5
Epoch 3/650
Epoch 00003: loss improved from 2.91390 to 2.86563, saving model to weights-improvement-03-2.8656-asaf1.hdf5
Epoch 4/650
Epoch 00004: loss improved from 2.86563 to 2.83487, saving model to weights-improvement-04-2.8349-asaf1.hdf5
Epoch 5/650
Epoch 00005: loss improved from 2.83487 to 2.81121, saving model to weights-improvement-05-2.8112-asaf1.hdf5
Epoch 6/650
Epoch 00006: loss improved from 2.81121 to 2.78917, saving model to weights-improvement-06-2.7892-asaf1.hdf5
Epoch 7/650
Epoch 00007: loss improved from 2.78917 to 2.76945, saving model to weights-improvement-07-2.7695-asaf1.hdf5
Epoch 8/650
Epoch 00008: loss improved from 2.76945 to 2.75049, saving model to weights-improvement-08-2.7505-asaf1.hdf5
Epoch 9/650
Epoch 00009: loss improv

Epoch 29/650
Epoch 00029: loss improved from 1.67460 to 1.60742, saving model to weights-improvement-29-1.6074-asaf1.hdf5
Epoch 30/650
Epoch 00030: loss improved from 1.60742 to 1.53751, saving model to weights-improvement-30-1.5375-asaf1.hdf5
Epoch 31/650
Epoch 00031: loss improved from 1.53751 to 1.47632, saving model to weights-improvement-31-1.4763-asaf1.hdf5
Epoch 32/650
Epoch 00032: loss improved from 1.47632 to 1.40353, saving model to weights-improvement-32-1.4035-asaf1.hdf5
Epoch 33/650
Epoch 00033: loss improved from 1.40353 to 1.34368, saving model to weights-improvement-33-1.3437-asaf1.hdf5
Epoch 34/650
Epoch 00034: loss improved from 1.34368 to 1.28428, saving model to weights-improvement-34-1.2843-asaf1.hdf5
Epoch 35/650
Epoch 00035: loss improved from 1.28428 to 1.22851, saving model to weights-improvement-35-1.2285-asaf1.hdf5
Epoch 36/650
Epoch 00036: loss improved from 1.22851 to 1.17807, saving model to weights-improvement-36-1.1781-asaf1.hdf5
Epoch 37/650
Epoch 00037

Epoch 57/650
Epoch 00057: loss improved from 0.58546 to 0.56252, saving model to weights-improvement-57-0.5625-asaf1.hdf5
Epoch 58/650
Epoch 00058: loss improved from 0.56252 to 0.55500, saving model to weights-improvement-58-0.5550-asaf1.hdf5
Epoch 59/650
Epoch 00059: loss improved from 0.55500 to 0.53975, saving model to weights-improvement-59-0.5398-asaf1.hdf5
Epoch 60/650
Epoch 00060: loss improved from 0.53975 to 0.53495, saving model to weights-improvement-60-0.5349-asaf1.hdf5
Epoch 61/650
Epoch 00061: loss improved from 0.53495 to 0.52440, saving model to weights-improvement-61-0.5244-asaf1.hdf5
Epoch 62/650
Epoch 00062: loss improved from 0.52440 to 0.51173, saving model to weights-improvement-62-0.5117-asaf1.hdf5
Epoch 63/650
Epoch 00063: loss improved from 0.51173 to 0.49771, saving model to weights-improvement-63-0.4977-asaf1.hdf5
Epoch 64/650
Epoch 00064: loss improved from 0.49771 to 0.48740, saving model to weights-improvement-64-0.4874-asaf1.hdf5
Epoch 65/650
Epoch 00065

Epoch 86/650
Epoch 00086: loss did not improve
Epoch 87/650
Epoch 00087: loss improved from 0.35954 to 0.35173, saving model to weights-improvement-87-0.3517-asaf1.hdf5
Epoch 88/650
Epoch 00088: loss improved from 0.35173 to 0.35046, saving model to weights-improvement-88-0.3505-asaf1.hdf5
Epoch 89/650
Epoch 00089: loss did not improve
Epoch 90/650
Epoch 00090: loss improved from 0.35046 to 0.35032, saving model to weights-improvement-90-0.3503-asaf1.hdf5
Epoch 91/650
Epoch 00091: loss improved from 0.35032 to 0.34929, saving model to weights-improvement-91-0.3493-asaf1.hdf5
Epoch 92/650
Epoch 00092: loss improved from 0.34929 to 0.34269, saving model to weights-improvement-92-0.3427-asaf1.hdf5
Epoch 93/650
Epoch 00093: loss improved from 0.34269 to 0.33192, saving model to weights-improvement-93-0.3319-asaf1.hdf5
Epoch 94/650
Epoch 00094: loss did not improve
Epoch 95/650
Epoch 00095: loss did not improve
Epoch 96/650
Epoch 00096: loss improved from 0.33192 to 0.32618, saving model to

Epoch 117/650
Epoch 00117: loss improved from 0.28610 to 0.28551, saving model to weights-improvement-117-0.2855-asaf1.hdf5
Epoch 118/650
Epoch 00118: loss improved from 0.28551 to 0.28500, saving model to weights-improvement-118-0.2850-asaf1.hdf5
Epoch 119/650
Epoch 00119: loss improved from 0.28500 to 0.27802, saving model to weights-improvement-119-0.2780-asaf1.hdf5
Epoch 120/650
Epoch 00120: loss did not improve
Epoch 121/650
Epoch 00121: loss did not improve
Epoch 122/650
Epoch 00122: loss improved from 0.27802 to 0.27512, saving model to weights-improvement-122-0.2751-asaf1.hdf5
Epoch 123/650
Epoch 00123: loss did not improve
Epoch 124/650
Epoch 00124: loss did not improve
Epoch 125/650
Epoch 00125: loss did not improve
Epoch 126/650
Epoch 00126: loss did not improve
Epoch 127/650
Epoch 00127: loss improved from 0.27512 to 0.26713, saving model to weights-improvement-127-0.2671-asaf1.hdf5
Epoch 128/650
Epoch 00128: loss did not improve
Epoch 129/650
Epoch 00129: loss did not impr

Epoch 00182: loss improved from 0.21455 to 0.21269, saving model to weights-improvement-182-0.2127-asaf1.hdf5
Epoch 183/650
Epoch 00183: loss did not improve
Epoch 184/650
Epoch 00184: loss improved from 0.21269 to 0.21239, saving model to weights-improvement-184-0.2124-asaf1.hdf5
Epoch 185/650
Epoch 00185: loss improved from 0.21239 to 0.21173, saving model to weights-improvement-185-0.2117-asaf1.hdf5
Epoch 186/650
Epoch 00186: loss improved from 0.21173 to 0.20618, saving model to weights-improvement-186-0.2062-asaf1.hdf5
Epoch 187/650
Epoch 00187: loss did not improve
Epoch 188/650
Epoch 00188: loss did not improve
Epoch 189/650
Epoch 00189: loss did not improve
Epoch 190/650
Epoch 00190: loss did not improve
Epoch 191/650
Epoch 00191: loss did not improve
Epoch 192/650
Epoch 00192: loss did not improve
Epoch 193/650
Epoch 00193: loss did not improve
Epoch 194/650
Epoch 00194: loss improved from 0.20618 to 0.20356, saving model to weights-improvement-194-0.2036-asaf1.hdf5
Epoch 195/

Epoch 217/650
Epoch 00217: loss did not improve
Epoch 218/650
Epoch 00218: loss improved from 0.19293 to 0.19182, saving model to weights-improvement-218-0.1918-asaf1.hdf5
Epoch 219/650
Epoch 00219: loss did not improve
Epoch 220/650
Epoch 00220: loss did not improve
Epoch 221/650
Epoch 00221: loss did not improve
Epoch 222/650
Epoch 00222: loss improved from 0.19182 to 0.19018, saving model to weights-improvement-222-0.1902-asaf1.hdf5
Epoch 223/650
Epoch 00223: loss improved from 0.19018 to 0.18547, saving model to weights-improvement-223-0.1855-asaf1.hdf5
Epoch 224/650
Epoch 00224: loss did not improve
Epoch 225/650
Epoch 00225: loss did not improve
Epoch 226/650
Epoch 00226: loss did not improve
Epoch 227/650
Epoch 00227: loss did not improve
Epoch 228/650
Epoch 00228: loss did not improve
Epoch 229/650
Epoch 00229: loss did not improve
Epoch 230/650

KeyboardInterrupt: 

In [35]:
model3.load_weights('./weights/weights-improvement-223-0.1855-asaf1.hdf5')

In [36]:
model3.evaluate(X,y,verbose=1)



[0.066968457580609708, 0.98433179724112696]

In [43]:
lst_generate_3=generate_text(9,500,"asafdavid08@gmail.com",model3,int_to_char,dataX,n_vocab)
lst_generate_3

[['asafdavid08@gmail.com',
  "e?# # most of us a pretty happy with last year's phone purchase be it a pixel,#iphone, galaxy or something else.##yet, try as they might, no oem can produce a truly perfect phone. as#technolo,y ca foebeg ot se#eits.siiu crnicteit, whes will be#mama ioialiig to toree bn f#cueo codu'essoayears,cehen ahhinsdt s#  uss, nt you sot sis'des'rlek sf say prdg. tailldne anple toirl#a   moo iivf sp tuarene tisseg #aeainn an whss pisect thet ir eg pacuuengy  do sanpen. thnnen ar somsh kneoi oo bntn ere oftas#bomagssnnlee#.ten "],
 ['asafdavid08@gmail.com',
  "odel. and while it's been a#great device, i can't help but feel as though i want a larger device.##but not for battery life. i simply like my watches on the larger size. i#think a     mm size option would suit me just fine.##what do you all think of sc  x##let yesl thes iase cee#aeare no ss eede teo then th tey #nhtm tou lrs toer aallsee##toi saes ssm aelisellel asdrg tielly andow # cot soil sfey oete aoh ihnngn 

In [44]:
for i in  np.arange(0,len(lst_generate_3),1):
            lst_generate_3[i][1] = re.sub(r'#','/r/n',lst_generate_3[i][1])#\r\n
title=["email address","message"]
lst_generate_3.insert(0,title)
with open('./generate/asafGeneratedData.csv','w',newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(lst_generate_3)

### Type 5- itaicohen266@gmail.com

**examine the data**

In [45]:
itaiText

'No  the intent of this guide isn\'t a how-to or strategy guide. My \r\nintent\r\nis to build an index of all the great information that available on pet\r\nbattling that is currently scattered across sites and posts and condense it\r\nto one location. I\'m hoping to keep this fairly up-to-date as best I can.\r\n\r\nIf you spot that I\'ve got some information incorrect, some information is\r\nmissing, or you have some information/link/macro/addon you\'d like to share,\r\nplease let me know in the comments and I will update.\r\n \r\n Pet battles are one of the features Blizzard introduced with the Mists of\r\nPandaria expansion for WoW. Currently, they\'re mini-games within the game,\r\nintended as an alternative action or for when there\'s down-time in the\r\ngame. As there are really no rewards outside of pet-battle rewards (more\r\npets, achievements, titles, pet battle supplies), they are not an\r\nalternative to gearing up your character.\r\n \r\n You need to have a character that 

In [46]:
get_general_details(itaiText)

+---------------+---------------+
| Punctuation   |  Appearance   |
| :             |      25       |
+---------------+---------------+
| ,             |      224      |
+---------------+---------------+
| -             |      58       |
+---------------+---------------+
| =             |       0       |
+---------------+---------------+
| ;             |       0       |
+---------------+---------------+
| .             |      269      |
+---------------+---------------+
| !             |      13       |
+---------------+---------------+
| ?             |      30       |
+---------------+---------------+
| (             |      80       |
+---------------+---------------+
| )             |      81       |
+---------------+---------------+
| '             |      79       |
+---------------+---------------+
| #             |       1       |
+---------------+---------------+
| \             |       0       |
+---------------+---------------+

Number of words:  1442
Number of characters:  2

In [19]:
itaiText_ = re.sub(r'https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)',' ',itaiText)
itaiText_=re.sub(r'\r\n',' # ',itaiText_)
itaiText_=re.sub(r'[0-9]',' '+ ''+ ' ',itaiText_)
itaiText_=re.sub(r'\([^)]*\)|\<[^)]*\>', '', itaiText_)
itaiText_=re.sub(r'[\;\\\:\#\<\>\*\%]',' ',itaiText_)
itaiText_= ' '.join(itaiText_.split()) # substitute multiply whitespace to one whitespace
itaiText_=itaiText_.lower()
itaiText_

'no the intent of this guide isn\'t a how-to or strategy guide. my intent is to build an index of all the great information that available on pet battling that is currently scattered across sites and posts and condense it to one location. i\'m hoping to keep this fairly up-to-date as best i can. if you spot that i\'ve got some information incorrect, some information is missing, or you have some information/link/macro/addon you\'d like to share, please let me know in the comments and i will update. pet battles are one of the features blizzard introduced with the mists of pandaria expansion for wow. currently, they\'re mini-games within the game, intended as an alternative action or for when there\'s down-time in the game. as there are really no rewards outside of pet-battle rewards , they are not an alternative to gearing up your character. you need to have a character that is level and have gold. you also need to have a full account - pet battles will not work on starter/free to play a

In [31]:
n_vocab,int_to_char,char_to_int,dataX,X,y,model4=n_seq_model(itaiText_,15,128)

Total Characters:  22057
Total Vocab:  37
number of patterns:  22042


In [32]:
model4.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

In [None]:
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-itai.hdf5"#ilona1
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model4.fit(X, y, epochs=950, batch_size=32,verbose=1, callbacks=callbacks_list)

In [33]:
model4.load_weights('./weights/weights-improvement-929-0.3824-itai.hdf5')

In [24]:
model4.evaluate(X,y,verbose=1)



[0.16239810233572177, 0.96071136917179678]

In [34]:
lst_generate_4=generate_text(9,500,"itaicohen266@gmail.com",model4,int_to_char,dataX,n_vocab)
lst_generate_4

[[' itaicohen266@gmail.com',
  'capture increase the looer whth tee cemtl.tep bacturt ho a doiab oroekh o teiuhu lee  mo yhu bla burimi nn yel aastles aadtuia whr batoles wilt bantanaeey lut.a man kors  nogits you hen  uelahte pr tet atu s toodse dr ilt  tor cano ws bo fageiyh uoen  aso psem pne iet roa petse pees a fcisac iat tou agdou te mak  woer ir daloui pats  ongead.safw oo nnnni h tecltent te and ihcnab tueatornn sess aoe wbtsldt te meser t a crrollt on teu bantstsn toar wour tuirtion toe metev auome tot eoeoon .eaiist you cev f te ca'],
 [' itaicohen266@gmail.com',
  'a battling and capturing wild pets . many battle pets are able to be caged. vhedee mats wn thakn  ioeedmef tr bece co eani to bet has hase tnen a carels paa maetoi gfden,achuceu  entiti lhns won psns ll reee ro toal d ceadec lfl aoipalesd i to pald thel shok sees tiitl ii news woulu mege  hli foenllen is tit has seer.t demise teneiss,wn wiakl i iandd iets.frddd yhu lo tins thl cetill heneso fy msoeeig yparte orvee

In [35]:
for i in  np.arange(0,len(lst_generate_4),1):
            lst_generate_4[i][1] = re.sub(r'#','/r/n',lst_generate_4[i][1])#\r\n
title=["email address","message"]
lst_generate_4.insert(0,title)
with open('./generate/itaiGeneratedData.csv','w',newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerows(lst_generate_4)

## what could we do to get even better results?

as you know we were limited on time,because we worked in the labs, so we couldn't train the model for more than a few hours. but some of the following things we tried to do and others we think may help us but due to time constraints just mention them.  

* try to reduce more the vocabulary by remove all punctuation from the source text.
* train the model on padded sentences rather than random sequences of characters, so it will be more precise.
* increase the number of epochs.
* try to Tune the batch size. 
* add more memory units or more layers.

** in conclusion ** , we think that the results we received were surprising, although the texts are not entirely readable in most of them you can still see sequences of words completely logical, and in the other parts of the text, where the generated text was bot so good, you can still understand the general idea. We think that with more training time and by exploring more the affects of different values for each parameter, we could achieved even better results.  
in long texts we saw that this architecture doesnt work well, and the texts were almost not readable, but even there we can see that the structure of word and sentences are logical.  
In addition, it is important to note that before we chose to use the above architecture, we debated to use word distribution. After investigation, we realized that the models that are divided into words are generally less precise. In addition, the number of words in e-mail messages is small in our case.

### Bibliography

we used the following resources in addition to lectures:  
* lstm keras example- https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py
* keras documentation in keras site. - https://keras.io/