# Final project
## Basics of approximation
### A recurrent neural network model that learns to build sentences in English
The aim of the project is to build and tune a model that learns to form correct English sentences. The Harry Potter book series by J.K. Rowling was used as the training text. The neural network builds sentences character by character: it estimates the probability of each possible next character from the characters that precede it in the sequence.

In [1]:
using Flux
using Flux: onehot, argmax, chunk, batchseq, throttle, crossentropy
using StatsBase: wsample
using Base.Iterators: partition
using BSON



In [2]:
text = collect(read("Harry_Potter.txt",String));
alphabet = [unique(text)..., '_'];

### Exploratory analysis
The training text consists of the 7 books of the Harry Potter saga, nearly 211 thousand lines of text. It is made up of upper- and lower-case letters of the English alphabet and special characters. Interestingly, the text contains no digits. This may be due to poor character recognition by the scanner that was probably used to digitize the books; for example, on one of the first pages the letter "D" is rendered as "|)". As far as I remember, the original books did contain dates, but given the volume of the text it is not feasible to restore them manually.

In [3]:
println(alphabet)

['\ufeff', 'T', 'H', 'E', ' ', 'B', 'O', 'Y', 'W', 'L', 'I', 'V', 'D', '\r', '\n', 'M', 'r', '.', 'a', 'n', 'd', 's', 'u', 'l', 'e', 'y', ',', 'o', 'f', 'm', 'b', 'P', 'i', 'v', 't', 'w', 'p', 'h', 'c', 'k', '’', 'x', 'g', 'j', 'G', ';', '-', 'N', 'A', '“', '”', '—', 'F', 'z', '?', '!', 'q', 'C', 'S', 'R', 'K', '(', ')', ':', 'J', 'U', '"', '‘', 'Z', 'Q', '_']
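The observation that the text contains no digits can be checked directly against the printed alphabet. The sketch below copies the character list from the output above into a variable and tests it with Base Julia's `isdigit`; it assumes only that the printed list is the complete alphabet.

```julia
# Character list copied from the printed output of the notebook cell above.
alphabet = ['\ufeff', 'T', 'H', 'E', ' ', 'B', 'O', 'Y', 'W', 'L',
            'I', 'V', 'D', '\r', '\n', 'M', 'r', '.', 'a', 'n',
            'd', 's', 'u', 'l', 'e', 'y', ',', 'o', 'f', 'm',
            'b', 'P', 'i', 'v', 't', 'w', 'p', 'h', 'c', 'k',
            '’', 'x', 'g', 'j', 'G', ';', '-', 'N', 'A', '“',
            '”', '—', 'F', 'z', '?', '!', 'q', 'C', 'S', 'R',
            'K', '(', ')', ':', 'J', 'U', '"', '‘', 'Z', 'Q',
            '_']

# True if at least one symbol of the alphabet is a decimal digit.
has_digits = any(isdigit, alphabet)

println("alphabet size: ", length(alphabet))
println("contains digits: ", has_digits)
```

The same pattern works for other character classes, for example `count(isuppercase, alphabet)`.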


One-hot encoding was used to improve the learning properties of the network: each character is represented as a vector with a single 1 at its position in the alphabet.

In [4]:
text = map(ch -> onehot(ch, alphabet), text);
stop = onehot('_', alphabet);
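To make the encoding concrete, here is a minimal hand-written sketch of one-hot encoding in Base Julia. It is an illustration only, not Flux's implementation; `onehot_vector` and `toy_alphabet` are hypothetical names introduced for this example.

```julia
# Hand-rolled stand-in for a one-hot encoder (illustration only).
# Returns a Boolean vector with a single `true` at the character's index.
function onehot_vector(ch::Char, alphabet::Vector{Char})
    idx = findfirst(==(ch), alphabet)
    idx === nothing && throw(ArgumentError("character $(repr(ch)) is not in the alphabet"))
    v = falses(length(alphabet))
    v[idx] = true
    return v
end

# A tiny alphabet, including the '_' padding symbol used in the notebook.
toy_alphabet = ['H', 'a', 'r', 'y', '_']

# Encode a word character by character, then decode it again.
encoded = [onehot_vector(ch, toy_alphabet) for ch in "Harry"]
decoded = String([toy_alphabet[findfirst(v)] for v in encoded])

println(Int.(encoded[1]))  # position of 'H' in the toy alphabet
println(decoded)
```

Decoding simply looks up the index of the single `true` entry, which is why the encoding loses no information.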

### Architecture and parameters
The sequence length is set to 100 characters because, according to the articles found, the average English sentence has 14.4 words and the average word has 6.47 letters, which gives an average of 93.17 characters per sentence, not counting spaces. The project investigated how well the network trains for various values of batch_size, which defines into how many chunks the text is split; these chunks are fed to the model in parallel.
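The arithmetic behind the sequence length can be reproduced in a few lines. The two averages are the figures quoted above; the estimate with spaces is an extra step added here, assuming one space between consecutive words.

```julia
# Averages quoted in the text.
words_per_sentence = 14.4
letters_per_word = 6.47

# Average sentence length without spaces.
letters_per_sentence = words_per_sentence * letters_per_word

# Assumption for illustration: one space between consecutive words.
spaces_per_sentence = words_per_sentence - 1
chars_with_spaces = letters_per_sentence + spaces_per_sentence

println("letters per sentence: ", round(letters_per_sentence, digits = 2))
println("with spaces:          ", round(chars_with_spaces, digits = 2))
```

Both estimates are close to 100, so a window of 100 characters covers roughly one average sentence.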

The model uses LSTM layers, which, thanks to their gated architecture, can filter information and use it even when its source lies far back in the sequence. The network's ability to learn for batch_size values of 32, 64 and 128 was assessed using the value of the loss function. The expected result is that the network will learn better (reach a lower loss value) for smaller batches.

The network consists of 2 hidden LSTM layers and an output layer that returns probabilities. Dropout is applied after each LSTM layer for regularization.
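To get a feeling for the size of such a network, the sketch below estimates the number of trainable parameters using the textbook LSTM formula (four gates, each with an input matrix, a recurrent matrix and a bias). This is an approximation under that assumption: Flux may store additional state vectors, so its own count can differ slightly. The helper names are hypothetical, and the alphabet size 71 is taken from the printed alphabet.

```julia
# Textbook parameter count of one LSTM layer: four gates, each with an
# input weight matrix, a recurrent weight matrix and a bias vector.
lstm_params(n_in, hidden) = 4 * hidden * (n_in + hidden + 1)

# Fully connected layer: weight matrix plus bias.
dense_params(n_in, n_out) = n_out * (n_in + 1)

# Two stacked LSTM layers followed by a dense output layer over the alphabet.
model_params(n_symbols, hidden) =
    lstm_params(n_symbols, hidden) +
    lstm_params(hidden, hidden) +
    dense_params(hidden, n_symbols)

n_symbols = 71  # alphabet size, counted from the printed alphabet

for hidden in (128, 256, 512)
    println("hidden = $hidden: ", model_params(n_symbols, hidden), " parameters")
end
```

Dropout layers and softmax have no trainable parameters, so they do not appear in the count.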

In [5]:
N = length(alphabet);
seqlen = 100;
batch_size = 32;

In [6]:
Xs = collect(partition(batchseq(chunk(text, batch_size), stop), seqlen));
Ys = collect(partition(batchseq(chunk(text[2:end], batch_size), stop), seqlen));
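The cell above pairs every input character with the character that follows it, which is what lets the network learn next-character prediction. The sketch below shows that shift on a toy string. It is a simplification: it leaves out the split into batch_size parallel chunks and the padding with the stop symbol, and `toy_text` and `toy_seqlen` are hypothetical names.

```julia
using Base.Iterators: partition

toy_text = collect("Harry Potter")

# Targets are the inputs shifted by one character.
inputs  = toy_text[1:end-1]
targets = toy_text[2:end]

# Cut both streams into windows of equal length (the last one may be shorter).
toy_seqlen = 4
input_windows  = [String(collect(w)) for w in partition(inputs, toy_seqlen)]
target_windows = [String(collect(w)) for w in partition(targets, toy_seqlen)]

for (x, y) in zip(input_windows, target_windows)
    println(repr(x), " -> ", repr(y))
end
```

At every position the target window holds the character the model should predict after reading the input window up to that position.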

In [7]:
m = Chain(
    LSTM(N, 128),
    Dropout(0.3), 
    LSTM(128, 128),
    Dropout(0.3),
    Dense(128, N),
    softmax)

function loss(xs, ys)
  l = sum(crossentropy.(m.(xs), ys))
  return l
end

opt = ADAM(0.001)


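# Generate `len` characters from model `m`, starting from a random character.
# Note: `temp` is accepted but not used; sampling always uses the raw probabilities.
function sample(m, alphabet, len; temp = 1)
  Flux.reset!(m)  # clear the recurrent state before generating
  buf = IOBuffer()
  c = rand(alphabet)
  for i = 1:len
    write(buf, c)
    # draw the next character, weighted by the predicted probabilities
    c = wsample(alphabet, m(onehot(c, alphabet)))
  end
  return String(take!(buf))
end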
evalcb = function ()
    @show loss(Xs[5], Ys[5])
    println(sample(m, alphabet, 100))
end

#4 (generic function with 1 method)
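The `sample` function accepts a `temp` keyword but does not use it. One common way to apply a sampling temperature to a probability vector is sketched below in Base Julia. This is an assumption about what the parameter was meant to do, not the notebook's method; `apply_temperature` and `sample_index` are hypothetical helpers.

```julia
using Random

# Reweight a probability vector by a temperature and renormalise.
# temp < 1 sharpens the distribution, temp > 1 flattens it, temp = 1 keeps it.
function apply_temperature(probs::AbstractVector{<:Real}, temp::Real)
    temp > 0 || throw(ArgumentError("temp must be positive"))
    scaled = probs .^ (1 / temp)
    return scaled ./ sum(scaled)
end

# Draw one index according to the given weights (inverse-CDF sampling).
function sample_index(rng::AbstractRNG, probs::AbstractVector{<:Real})
    r = rand(rng) * sum(probs)
    acc = 0.0
    for (i, p) in enumerate(probs)
        acc += p
        r < acc && return i
    end
    return length(probs)  # guard against floating-point round-off
end

probs = [0.7, 0.2, 0.1]
cold = apply_temperature(probs, 0.5)  # more peaked than `probs`
hot  = apply_temperature(probs, 2.0)  # closer to uniform than `probs`

println("cold: ", round.(cold, digits = 3))
println("hot:  ", round.(hot, digits = 3))
println("draw: ", sample_index(MersenneTwister(1), hot))
```

In the notebook's setting, the reweighted vector could be passed to `wsample` in place of the raw model output.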

In [9]:
@info("Beginning training loop...")
best_ls = Inf
last_improvement = 0
bs32 = []
for epoch = 1:10
    @info "Epoch: $epoch"
    global best_ls, last_improvement
    Flux.train!(loss, params(m), zip(Xs, Ys), opt, cb=throttle(evalcb, 240))
    ls = loss(Xs[5], Ys[5])
    if ls <= best_ls
        @info "New best result: $ls"
        BSON.@save "char_model.bson" m
        best_ls = ls
        last_improvement = epoch
    end
    append!(bs32,ls)
end

┌ Info: Beginning training loop...
└ @ Main In[9]:1
┌ Info: Epoch: 1
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 422.07043f0
wdng!zuEowggATOkP’S
)JHw”WOt"(sd)uIYDT—al;kw”OpwzTZR“NFg‘p—t’UFYmnFiVV:l?)?:uIOlw﻿M‘,vJ"ktw"﻿vV QOxs


┌ Info: New best result: 321.77805
└ @ Main In[9]:11
┌ Info: Epoch: 2
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 321.6836f0
x.—Kimdv ifayn oron n’litiu y-e“ etsu
Uorvwaf.osw
aos
ul 
ku  ouapifra  Ww,m edo frV tg,tom drHpecm


┌ Info: New best result: 320.03
└ @ Main In[9]:11
┌ Info: Epoch: 3
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 319.11584f0
B"mcor goiadam
e o

hratnmnuhur
orenr
ih asut nD u t w?ac
car
jaehl sn yaotoyoc”e  o
,nalth
e”d


┌ Info: New best result: 289.3968
└ @ Main In[9]:11
┌ Info: Epoch: 4
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 289.0309f0
HU.b‘ g  ode rdn “ekat p lnrom rnejqmo”mraoi”yeey w plihdm Iot
utt oot
f
t l c ba
whmocwou.he.age


┌ Info: New best result: 274.95068
└ @ Main In[9]:11
┌ Info: Epoch: 5
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 274.74323f0
L—a —gpedt drd hrth i f dli
 
hhen faaht melionr d. trabpWapf eo
Ula p, 

 ni
to

yu

m
?on T


┌ Info: New best result: 267.01474
└ @ Main In[9]:11
┌ Info: Epoch: 6
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 266.82718f0
iDetqat s Kn d,. ry  amihc oce iuvlrwfn?d” meeugn Ad twis. g duvehint dr weonme totte hy at..greed e


┌ Info: New best result: 260.9972
└ @ Main In[9]:11
┌ Info: Epoch: 7
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 260.6558f0
kJvritritionclimu st. un thar inlrecwe, tla ehas, haagI uogd “a ree’m onll triand’wia g row pipoin


┌ Info: New best result: 257.77185
└ @ Main In[9]:11
┌ Info: Epoch: 8
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 257.54782f0
vYMs!itue han d ry siuript 
ehileRevosd thbi
“
“recr” sr Viit  n nd Him tns rapn nand 
Y!ab angse


┌ Info: New best result: 255.05707
└ @ Main In[9]:11
┌ Info: Epoch: 9
└ @ Main In[9]:6


loss(Xs[5], Ys[5]) = 254.91763f0
L;LzhYronzr m, Heaaf reetisr?” ripitie id Dalasligme 

I” nd hehos.  biofs toavanmo!vwicnct hry re!


┌ Info: New best result: 252.10231
└ @ Main In[9]:11


loss(Xs[5], Ys[5]) = 251.78633f0
yg;nh, gngafatteslcovald titt ang n . phhia row tibd y. awe shyeshehy. Yaas gs se  nis in sit ichih’

┌ Info: Epoch: 10
└ @ Main In[9]:6





┌ Info: New best result: 248.94778
└ @ Main In[9]:11


In [10]:
N = length(alphabet);
seqlen = 100;
batch_size = 64;
Xs = collect(partition(batchseq(chunk(text, batch_size), stop), seqlen));
Ys = collect(partition(batchseq(chunk(text[2:end], batch_size), stop), seqlen));

In [11]:
m = Chain(
    LSTM(N, 256),
    Dropout(0.3), 
    LSTM(256, 256),
    Dropout(0.3),
    Dense(256, N),
    softmax)

#9 (generic function with 1 method)

In [12]:
@info("Beginning training loop...")
best_ls = Inf
last_improvement = 0
bs64 = []
for epoch = 1:10
    @info "Epoch: $epoch"
    global best_ls, last_improvement
    Flux.train!(loss, params(m), zip(Xs, Ys), opt, cb=throttle(evalcb, 240))
    ls = loss(Xs[5], Ys[5])
    if ls <= best_ls
        @info "New best result: $ls"
        BSON.@save "char_model.bson" m
        best_ls = ls
        last_improvement = epoch
    end
    append!(bs64,ls)
end

┌ Info: Beginning training loop...
└ @ Main In[12]:1
┌ Info: Epoch: 1
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 411.23767f0
r﻿B—UyaApqfoId!C_?rvb"V“jKEE﻿
)_IL“K﻿aZ"(SO)ODF-jo_xFa’C‘n—
Qa—EpQptwti;r’:GtJOns.﻿LjYB’Q”— dnU_Z)zO


┌ Info: New best result: 320.70786
└ @ Main In[12]:11
┌ Info: Epoch: 2
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 320.67307f0
. l.t;r’oa ucl a
 lgtemeuaerido ftrmsrmirHstitirgT stot
aas”
nu dwe rer


┌ Info: New best result: 319.2849
└ @ Main In[12]:11
┌ Info: Epoch: 3
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 319.25943f0
.uBcW

ltu  ooah  s
eht r  .  eIiso.oirolohutntcaaraioss
rwst
 sma,ntdttloa hrao  ep;F tlo ng o


┌ Info: New best result: 315.2238
└ @ Main In[12]:11
┌ Info: Epoch: 4
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 314.4044f0
Wlmztmenretoe’ai“ u.  e ci
rmycgsth nrtldkt fdeoaokosy.hw olthodh r Iae ftihero, yp ytCu‘ y deetv 


┌ Info: New best result: 296.87393
└ @ Main In[12]:11
┌ Info: Epoch: 5
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 296.01035f0
:JCyBp. ti
l
cseBrsoadeH 
ra o nothlhaooBh nsshkoooue seear lrwua htd ter.y o. —o  suslvib vol


┌ Info: New best result: 284.33936
└ @ Main In[12]:11
┌ Info: Epoch: 6
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 283.6893f0
”“ee  ened  ” at omeesiy na?iriusraeSrug og souad wt r?, r he uibhcaanriqheotne as y ot cfVoapd  s


┌ Info: New best result: 270.1659
└ @ Main In[12]:11
┌ Info: Epoch: 7
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 269.0389f0
—’sLy”tfofr.zally, t adr cknrerld ldewag,iuuelwn dd’ypr Hrctt de ud ed Sarotrette s bve gy d el’ynd 


┌ Info: New best result: 263.5062
└ @ Main In[12]:11
┌ Info: Epoch: 8
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 263.40918f0
hhJyoica
w’sed d’ jhoats t tnd rrr uwreulavt ig she Hencogo.lay Hharlyuhanloue ehmat gnhy le 
ng 


┌ Info: New best result: 260.25836
└ @ Main In[12]:11
┌ Info: Epoch: 9
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 260.05893f0
yokUhkaot’ 
he cid tutooe thaf d car relkt arecpofdo oceoih os cauorheat an a itnsocheolns woiait n


┌ Info: New best result: 258.0029
└ @ Main In[12]:11
┌ Info: Epoch: 10
└ @ Main In[12]:6


loss(Xs[5], Ys[5]) = 257.88885f0
JEvqpflumbiy itt, 

 wle sir ahe 
HyT ust,’ sapuy sre tly b 
I
bh
Hh
s, srd 
foHoleso


n 
loss(Xs[5], Ys[5]) = 256.7641f0
(meWweiagtsseslereosasd B sceHad troya
 sintofl o reomeicat d bitrigm efr o . wiseu
hlirk meofgror


┌ Info: New best result: 255.86424
└ @ Main In[12]:11


In [23]:
N = length(alphabet);
seqlen = 100;
batch_size = 128;
Xs = collect(partition(batchseq(chunk(text, batch_size), stop), seqlen));
Ys = collect(partition(batchseq(chunk(text[2:end], batch_size), stop), seqlen));

In [24]:
m = Chain(
    LSTM(N, 512),
    Dropout(0.3), 
    LSTM(512, 512),
    Dropout(0.3),
    Dense(512, N),
    softmax)

#14 (generic function with 1 method)

In [25]:
@info("Beginning training loop...")
best_ls = Inf
last_improvement = 0
bs128 = []
for epoch = 1:10
    @info "Epoch: $epoch"
    global best_ls, last_improvement
    Flux.train!(loss, params(m), zip(Xs, Ys), opt, cb=throttle(evalcb, 240))
    ls = loss(Xs[5], Ys[5])
    if ls <= best_ls
        @info "New best result: $ls"
        BSON.@save "char_model.bson" m
        best_ls = ls
        last_improvement = epoch
    end
    append!(bs128,ls)
end

┌ Info: Beginning training loop...
└ @ Main In[25]:1
┌ Info: Epoch: 1
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 398.93127f0
sSNBO
ce
KAcydPaVL?"?:gBzvJ,’NDih RaqIjPg 
dgjohd xR"x,DvY“Mf’Vfed:WU?F—toG"jq()-toVg_i
loss(Xs[5], Ys[5]) = 320.28763f0
lyw?ro—oagi t    ’rorxdoodadoesnireish.nso r s g yb oe
hecr iec— adnryrha 


┌ Info: New best result: 320.21646
└ @ Main In[25]:11
┌ Info: Epoch: 2
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 320.27628f0
E(!Pynfhdi) ewI
i    
nrn”aisGsfoasane nDgeKr,td dleophyim” ee HH stMedgh trhnuuoa
oolvkshnrawsr
loss(Xs[5], Ys[5]) = 319.4048f0
GRz”
h fsellPte   d
wmrreae sr p
r 
eqGt.nseott u sp.g  i 
hsumDd wwoanfon
trefes.e


┌ Info: New best result: 319.54654
└ @ Main In[25]:11
┌ Info: Epoch: 3
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 319.40466f0
v-Whmvhloatt’omoe b.c
tergh wdd  oe
  hHqe ;ve
e hter.yt
ks tdmvyniof 
Hhu
loss(Xs[5], Ys[5]) = 318.5011f0
jIcn’s”RcHew nr.
rOr  igalb bwmara)
m i ntn elge  tatuitwdet   d  ar  iio.oyhroklryb w


┌ Info: New best result: 318.542
└ @ Main In[25]:11
┌ Info: Epoch: 4
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 318.53098f0
ov, lpcsus cayon raesyenr  
sabe
dasgl wooHPhroa"ltaksha!rsnar 
H rdlyg dlnelchofe
loss(Xs[5], Ys[5]) = 316.27087f0
cJN
isv
nibnaeohcee araaed.wh seymffes nyhn eotoeyr sc bd”oa


┌ Info: New best result: 315.23526
└ @ Main In[25]:11
┌ Info: Epoch: 5
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 314.8634f0
I’
did andigsqldn
o Cer’l aoh ictenrk eear w Pia   reghiloae
 swhea
loss(Xs[5], Ys[5]) = 308.5589f0
rtam tt.t
HitaanosNiensfwnkleu
 sslklbdu
ywatr ealhvke?’rtorgeynn rsy—raywfne t’aow
y ks


┌ Info: New best result: 308.84573
└ @ Main In[25]:11
┌ Info: Epoch: 6
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 309.14264f0
Y

t b
mtyh. s Z dawoiartT
e be,e sek ursiMso Oy fl﻿ois 
tls )”,l nraok Ahgc  rl
loss(Xs[5], Ys[5]) = 299.9339f0
IRE(E absed.ehe t a pnrGooeKw arahihylaHhoi bointtmiii
e hPabiweAh ggit,.aafseou vshwl 


┌ Info: New best result: 302.9027
└ @ Main In[25]:11
┌ Info: Epoch: 7
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 307.96176f0
 aha s , .woat chariee .s     nlnoe o w a n wr cftain 
tochee”yu  th n un  B  t in 
loss(Xs[5], Ys[5]) = 290.04468f0
OQ’ne
nin,mcSs’tasse el yn— w a gton d ye
ea 
tr w 


┌ Info: New best result: 293.20596
└ @ Main In[25]:11
┌ Info: Epoch: 8
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 289.34213f0
)!VU
h  wg .
 ehaotSofhifyghtnl id woktrd,p
leiifd Md 
loss(Xs[5], Ys[5]) = 283.75586f0
WaCer egsn d od.nbNHli  n
napediwe’d t tnCar
c
naotf
na 
 hHegrtg tovn wol
y s 


┌ Info: New best result: 292.48535
└ @ Main In[25]:11


loss(Xs[5], Ys[5]) = 285.50076f0


┌ Info: Epoch: 9
└ @ Main In[25]:6


ECwH

c
nTovm e iharit atsor cf, t e Pg “o t’hr’ lsh d eig
loss(Xs[5], Ys[5]) = 277.3621f0
cinnsrs ew, rih. poH I. e Dals peeuigdbssorh ondr
nt wi—,s” ulerTteid  


┌ Info: New best result: 275.6532
└ @ Main In[25]:11
┌ Info: Epoch: 10
└ @ Main In[25]:6


loss(Xs[5], Ys[5]) = 276.12375f0
ud  t ulkAHkrbn hy” nitc htrRan ..” 
“tecanenS wty  fsis hanero
nho biiY 
loss(Xs[5], Ys[5]) = 272.46347f0
_Aabh
hhohtot hemvt eop w ome und ,aahe r, v cutd.,.
rg  llddr. auit n n cn tisdrsrvus


┌ Info: New best result: 268.07843
└ @ Main In[25]:11


In [1]:
using Plots
plot(bs32,label = "Batch_size 32")
plot!(bs64, label = "Batch_size 64")
plot!(bs128, label = "Batch_size 128")


### Summary
As expected, the best result was achieved by the network with batch_size = 32, the smallest value examined: its final loss was about 248.9, compared with about 255.9 for batch_size = 64 and about 268.1 for batch_size = 128. Note that the hidden-layer width was increased together with the batch size (128, 256 and 512 units), so the comparison does not isolate the effect of batch size alone.

The size of the data set relative to the computer's performance turned out to be a major problem in training. The network was therefore trained for only 10 epochs to reduce computation time, and the conclusions are based on these runs. Within this number of epochs the network learned roughly the typical word length of English. In a first attempt the network was set to train for 50 epochs, but training ran for nearly 40 hours, after which the computer could no longer continue the run. The results of that first attempt were lost, but by around epoch 40 the network was already spelling the main character's name correctly in most cases, which is quite a surprise. This is probably because "Harry" is one of the most frequent words in the saga.
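The loss values are easier to interpret per character. The sketch below converts the final losses from the three training logs into nats per character and perplexity, and compares them with a uniform guess over the alphabet. It assumes that the loss is a sum over the 100 time steps of a batch-averaged cross-entropy, which matches how `loss` is written but depends on the Flux version used; the alphabet size 71 is taken from the printed alphabet.

```julia
seqlen = 100  # time steps summed in one loss value
n_symbols = 71  # alphabet size, counted from the printed alphabet

# Final "New best result" values from the three training logs.
final_losses = Dict(32 => 248.94778, 64 => 255.86424, 128 => 268.07843)

# Average cross-entropy per character (in nats) and the matching perplexity.
per_char(loss) = loss / seqlen
perplexity(loss) = exp(per_char(loss))

# Loss of a model that assigns equal probability to every symbol.
uniform_loss = seqlen * log(n_symbols)

println("uniform baseline: ", round(uniform_loss, digits = 1))
for bs in sort(collect(keys(final_losses)))
    l = final_losses[bs]
    println("batch_size = $bs: ",
            round(per_char(l), digits = 3), " nats/char, perplexity ",
            round(perplexity(l), digits = 1))
end
```

Under these assumptions, the first logged losses (roughly 400 to 420) sit close to the uniform baseline, which is consistent with an untrained network guessing almost at random.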

### Bibliography
* http://karpathy.github.io/2015/05/21/rnn-effectiveness/
* https://fluxml.ai/Flux.jl/stable/
* G. A. Miller, E. B. Newman, E. A. Friedman, Length-frequency statistics for written English, Harvard University, 1958
* https://medium.com/@theacropolitan/sentence-length-has-declined-75-in-the-past-500-years-2e40f80f589f