## Julia on Colaboratory ##

[Colaboratory](https://colab.research.google.com) does not provide native support for the [Julia programming language](https://julialang.org). However, since Colaboratory gives you root access to the machine that runs your notebook (the *“runtime”* in Colaboratory terminology), we can install Julia support by uploading a specially crafted Julia notebook  – *this* notebook. We then install Julia and [IJulia](https://github.com/JuliaLang/IJulia.jl) ([Jupyter](https://jupyter.org)/Colaboratory notebook support) and reload the notebook so that Colaboratory detects and initiates what we installed.

In brief:

1. **Run the cell below**
2. **Reload the page**
3. **Edit the notebook name and start hacking Julia code below**

**If your runtime resets**, either manually or if left idle for some time, **repeat steps 1 and 2**.

### Acknowledgements ###

This hack by Pontus Stenetorp is an adaptation of [James Bradbury’s original Colaboratory Julia hack](https://discourse.julialang.org/t/julia-on-google-colab-free-gpu-accelerated-shareable-notebooks/15319/27), that broke some time in September 2019 as Colaboratory increased their level of notebook runtime isolation. There also appears to be CUDA compilation support installed by default for each notebook runtime type in October 2019, which shaves off a good 15 minutes or so from the original hack’s installation time.

In [0]:
# Installation cell
%%shell
if ! command -v julia 2>&1 > /dev/null
then
    wget 'https://julialang-s3.julialang.org/bin/linux/x64/1.2/julia-1.2.0-linux-x86_64.tar.gz' \
        -O /tmp/julia.tar.gz
    tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
    rm /tmp/julia.tar.gz
fi
julia -e 'using Pkg; pkg"add Plots; add PyPlot; add IJulia; add Knet; precompile"'
julia -e 'using Pkg; pkg"build Knet;"'

--2020-01-05 23:15:06--  https://julialang-s3.julialang.org/bin/linux/x64/1.2/julia-1.2.0-linux-x86_64.tar.gz
Resolving julialang-s3.julialang.org (julialang-s3.julialang.org)... 151.101.2.49, 151.101.66.49, 151.101.130.49, ...
Connecting to julialang-s3.julialang.org (julialang-s3.julialang.org)|151.101.2.49|:443... connected.
HTTP request sent, awaiting response... 302 gce internal redirect trigger
Location: https://storage.googleapis.com/julialang2/bin/linux/x64/1.2/julia-1.2.0-linux-x86_64.tar.gz [following]
--2020-01-05 23:15:06--  https://storage.googleapis.com/julialang2/bin/linux/x64/1.2/julia-1.2.0-linux-x86_64.tar.gz
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.142.128, 2607:f8b0:400e:c08::80
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.142.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 91990555 (88M) [application/x-tar]
Saving to: ‘/tmp/julia.tar.gz’


2020-01-05 23:15:06 (121 MB/s) - ‘/tmp/julia.t



In [1]:
using Knet
# Test if Knet is using gpu
Knet.gpu()

┌ Info: Recompiling stale cache file /root/.julia/compiled/v1.2/Knet/f4vSz.ji for Knet [1902f260-5fb4-5aff-8c31-6271790ab950]
└ @ Base loading.jl:1240


0

In [0]:
a = KnetArray(randn(4,4))
sigm.(a)

4×4 KnetArray{Float64,2}:
 0.54592   0.554279  0.343429   0.838957
 0.806902  0.768127  0.285473   0.488519
 0.568271  0.857464  0.0606523  0.424611
 0.53042   0.396642  0.694845   0.125767

In [13]:

#imports
import Pkg
using Pkg; for p in ("Knet","IterTools","WordTokenizers","Test","Random","Statistics","Dates","LinearAlgebra","CuArrays"); haskey(Pkg.installed(),p) || Pkg.add(p); end
using Statistics, IterTools, WordTokenizers, Test, Knet, Random, Dates, Base.Iterators, LinearAlgebra
# Update and list all packages
Pkg.update()
pkgs = Pkg.installed()

for package in keys(pkgs)
    if pkgs[package] == nothing
        pkgs[package] = VersionNumber("0.0.1")
    end
    println("Package name: ", package, " Version: ", pkgs[package])
end
using CuArrays: CuArrays, usage_limit
CuArrays.usage_limit[] = 8_000_000_000
BATCH_SIZE = 64

Knet.atype() = KnetArray{Float32} #Array{Float32}
is_lstm_strategy_on = true # if true rnn type becomes lstm, otherwise we preferred to use relu
gpu() # GPU test must result as 0


# Vocabulary Structure
struct Vocab
    w2i::Dict{String,Int}
    i2w::Vector{String}
    tags::Vector{String}
    unk::Int
    eos::Int
    tokenizer    
end

function Vocab(file::String; tokenizer=split, vocabsize=Inf, mincount=1, unk="<unk>", eos="<s>")
    vocab_freq = Dict{String,Int64}(unk => 0, eos => 0)
    w2i = Dict{String, Int64}(unk => 2, eos => 1)
    i2w = Vector{String}()
    tags = Vector{String}()

    push!(i2w, eos)
    push!(i2w, unk)
        
    open(file) do f
        for line in eachline(f)
            tag, sentence = split(strip(lowercase(line))," ||| ")
            !(tag in tags) && push!(tags, tag)
            
            sentence = tokenizer(sentence, [' '], keepempty = false)
            
            for word in sentence
                word == unk && continue
                word == eos && continue # They are default ones to be added later
                vocab_freq[word] = get!(vocab_freq, word, 0) + 1
            end
        end
        close(f)
    end


    # End of vanilla implementation of the vocabulary
    # From here we must add the mincount and vocabsize properties
    # We must change the first two property of the vocab wrt those paramaters
    vocab_freq = sort!(
        collect(vocab_freq),
        by = tuple -> last(tuple),
        rev = true,
    )

    if length(vocab_freq)>vocabsize - 2 # eos and unk ones
        vocab_freq = vocab_freq[1:vocabsize-2] # trim to fit the size
    end

    #vocab_freq = reverse(vocab_freq)

    while true
        length(vocab_freq)==0 && break
        word,freq = vocab_freq[end]
        freq>=mincount && break # since it is already ordered
        vocab_freq = vocab_freq[1:(end - 1)]
    end
    #pushfirst!(vocab_freq,unk=>1,eos=>1) # freq does not matter, just adding the
    for i in 1:length(vocab_freq)
        word, freq = vocab_freq[i]
        ind = (get!(w2i, word, 1+length(w2i)))
        (length(i2w) < ind) && push!(i2w, word)
    end

    return Vocab(w2i, i2w, tags, 2, 1, tokenizer)
end



# Special reader for the task
struct TextReader
    file::String
    vocab::Vocab
end

word2ind(dict,x) = get(dict, x, 2)
findtagid(tag,vector) = findall(x -> x == tag,vector)[1]

#Implementing the iterate function
function Base.iterate(r::TextReader, s=nothing)
    if s == nothing
        state = open(r.file)
        Base.iterate(r,state)
    else
        if eof(s) == true
            close(s)
            return nothing
        else
            line = readline(s)
            sent_ind = Int[]
            
            # Tagification
            tag, sentence = split(strip(lowercase(line))," ||| ")
            
            tagind = findtagid(tag,r.vocab.tags)
            push!(sent_ind,tagind)
            
            sent = r.vocab.tokenizer(strip(lowercase(sentence)), [' '], keepempty = false)
            
            for word in sent
                ind = word2ind(r.vocab.w2i,word)
                push!(sent_ind,ind)
            end
            push!(sent_ind,r.vocab.eos)
            return (sent_ind, s)
        end
    end
end


Base.IteratorSize(::Type{TextReader}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{TextReader}) = Base.HasEltype()
Base.eltype(::Type{TextReader}) = Vector{Int}



# File 
const datadir = "nn4nlp-code/data/classes"
isdir(datadir) || run(`git clone https://github.com/neubig/nn4nlp-code.git`)

if !isdefined(Main, :a_vocab)
    vocab = Vocab("$datadir/train.txt", mincount=2)

    train = TextReader("$datadir/train.txt", vocab)
    test = TextReader("$datadir/test.txt", vocab)

end

# Minibatching
struct LMData
    src::TextReader
    batchsize::Int
    maxlength::Int
    bucketwidth::Int
    buckets
end

function LMData(src::TextReader; batchsize = BATCH_SIZE, maxlength = typemax(Int), bucketwidth = 10)
    numbuckets = min(128, maxlength ÷ bucketwidth)
    buckets = [ [] for i in 1:numbuckets ]
    LMData(src, batchsize, maxlength, bucketwidth, buckets)
end

Base.IteratorSize(::Type{LMData}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{LMData}) = Base.HasEltype()
Base.eltype(::Type{LMData}) = Matrix{Int}

function Base.iterate(d::LMData, state=nothing)
    if state == nothing
        for b in d.buckets; empty!(b); end
    end
    bucket,ibucket = nothing,nothing
    while true
        iter = (state === nothing ? iterate(d.src) : iterate(d.src, state))
        if iter === nothing
            ibucket = findfirst(x -> !isempty(x), d.buckets)
            bucket = (ibucket === nothing ? nothing : d.buckets[ibucket])
            break
        else
            sent, state = iter
            if length(sent) > d.maxlength || length(sent) == 0; continue; end
            ibucket = min(1 + (length(sent)-1) ÷ d.bucketwidth, length(d.buckets))
            bucket = d.buckets[ibucket]
            push!(bucket, sent)
            if length(bucket) === d.batchsize; break; end
        end
    end
    if bucket === nothing; return nothing; end
    batchsize = length(bucket)
    maxlen = maximum(length.(bucket))
    batch = fill(d.src.vocab.eos, batchsize, maxlen + 1)
    for i in 1:batchsize
        batch[i, 1:length(bucket[i])] = bucket[i]
    end
    empty!(bucket)
    return batch, state
end

# Mask!
function mask!(a,pad)
    matr = a
    for j in 1:size(matr)[1]
        i=0
        while i<(length(matr[j,:])-1)
            matr[j,length(matr[j,:])-i-1]!=pad && break

            if matr[j,length(matr[j,:])-i]== pad
                matr[j,length(matr[j,:])-i]= 0
            end
            i+=1
        end
    end
    matr
    end
# Embed format updated with respect to task
struct Embed; w; end

function Embed(tagsize::Int, embedsize::Int)
    Embed(param(embedsize,tagsize))
end

function (l::Embed)(x)
    l.w[:,x]
end

#Linear
struct Linear; w; b; end

function Linear(inputsize::Int, outputsize::Int)
    Linear(param(outputsize,inputsize), param0(outputsize))
end

function (l::Linear)(x)
    l.w * mat(x,dims=1) .+ l.b
end


struct RNNsent_model
    embed::Embed        # language embedding
    rnn::RNN            # RNN (can be bidirectional)
    projection::Linear  # converts output to vocab scores
    dropout::Real       # dropout probability to prevent overfitting
    vocab::Vocab        # language vocabulary
end

function RNNsent_model(hidden::Int,      # hidden size for both the encoder and decoder RNN
                embsz::Int,          # embedding size
                vocab::Vocab;        # language vocabulary
                layers=1,            # number of layers
                bidirectional=false, # whether encoder RNN is bidirectional
                dropout=0)           # dropout probability

    embed = Embed(length(vocab.i2w),embsz)

    rnn = RNN(embsz,hidden;rnnType=is_lstm_strategy_on ? :lstm : :relu, numLayers=layers,bidirectional=bidirectional, dropout= dropout)
    
    layerMultiplier = bidirectional ? 2 : 1
    
    projection = Linear(layerMultiplier*hidden,length(vocab.tags))

    RNNsent_model(embed,rnn,projection,dropout,vocab)

end

print(2)

function calc_scores(rm::RNNsent_model, data; average=true)
    B, Tx = size(data)
    
    emb = rm.embed(data)
    
    y = sum(rm.rnn(emb), dims=3) # nature of nll allows us to sum along each sentence

    return rm.projection(reshape(y,:,B))    

end


function loss_f(model, batch)  
    verify = deepcopy(batch[:,1]) # only tags allowed to in
    #mask!(verify,vocab.eos) no need to mask more :)
        
    scores = calc_scores(model,batch[:,2:end]) # trim one end
   
    return nll(scores,verify)/size(verify,1) # Loss for each sentence

end

function maploss(lossfn, model, data)
    total_loss = 0.0
    for part in collect(data)
        total_loss += lossfn(model,part)
    end

    return total_loss
end

model = RNNsent_model(512, 512, vocab; bidirectional=true, dropout=0.2)
rm = model

train_batches = collect(LMData(train))
test_batches = collect(LMData(test))
train_batches50 = train_batches[1:50] # Small sample for quick loss calculation

epoch = adam(loss_f, ((model, batch) for batch in train_batches))
bestmodel, bestloss = deepcopy(model), maploss(loss_f, model, test_batches)

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[?25l[2K[?25h[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.2/Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.2/Manifest.toml`
[90m [no changes][39m
Package name: Statistics Version: 0.0.1
Package name: Test Version: 0.0.1
Package name: Random Version: 0.0.1
Package name: WordTokenizers Version: 0.5.3
Package name: IterTools Version: 1.3.0
Package name: LinearAlgebra Version: 0.0.1
Package name: CuArrays Version: 1.6.0
Package name: IJulia Version: 1.20.2
Package name: Plots Version: 0.28.4
Package name: PyPlot Version: 2.8.2
Package name: Dates Version: 0.0.1
Package name: Knet Version: 1.2.7
2

(RNNsent_model(Embed(P(KnetArray{Float32,2}(512,8218))), LSTM(input=512,hidden=512,bidirectional,dropout=0.2), Linear(P(KnetArray{Float32,2}(5,1024)), P(KnetArray{Float32,1}(5))), 0.2, Vocab(Dict("aimlessly" => 4907,"offend" => 3845,"enjoy" => 435,"chocolate" => 4908,"fight" => 1823,"nicholas" => 1775,"everywhere" => 4593,"princess" => 7582,"uniformly" => 3846,"larky" => 6921…), ["<s>", "<unk>", ".", "the", ",", "a", "and", "of", "to", "is"  …  "scarifying", "sealed", "effectiveness", "wraps", "na", "wills", "circles", "sharper", "pluto", "pleaser"], ["3", "4", "2", "1", "0"], 2, 1, split)), 6.721138363704085)

137-element Array{Array{Int64,2},1}:
 [1 301 … 1 1; 2 6 … 1 1; … ; 1 4 … 1 1; 1 20 … 1 1]         
 [3 24 … 1 1; 2 59 … 1 1; … ; 1 12 … 1 1; 3 28 … 1 1]        
 [1 237 … 1 1; 3 660 … 1 1; … ; 1 22 … 1 1; 2 2974 … 1 1]    
 [1 12 … 1 1; 1 52 … 1 1; … ; 1 16 … 1 1; 2 139 … 1 1]       
 [1 4 … 1 1; 2 4 … 1 1; … ; 1 317 … 1 1; 3 377 … 1 1]        
 [1 176 … 1 1; 1 63 … 1 1; … ; 1 22 … 1 1; 2 4 … 1 1]        
 [2 2168 … 1 1; 1 75 … 1 1; … ; 1 6 … 1 1; 3 2034 … 1 1]     
 [1 20 … 1 1; 1 4 … 1 1; … ; 1 6 … 1 1; 2 3808 … 1 1]        
 [1 6 … 1 1; 3 4 … 1 1; … ; 2 12 … 1 1; 2 6257 … 1 1]        
 [1 6 … 1 1; 1 12 … 1 1; … ; 2 4 … 1 1; 2 2974 … 1 1]        
 [1 6 … 1 1; 3 69 … 1 1; … ; 1 12 … 1 1; 3 4 … 1 1]          
 [1 3663 … 1 1; 3 3223 … 1 1; … ; 2 4 … 1 1; 1 634 … 1 1]    
 [2 738 … 1 1; 1 2961 … 1 1; … ; 2 28 … 1 1; 1 6 … 1 1]      
 ⋮                                                           
 [4 4 … 1 1; 5 28 … 1 1; … ; 5 28 … 1 1; 3 24 … 1 1]         
 [5 52 … 1 1; 4 4 … 1 1; … ; 5 85

In [21]:
#progress!(ncycle(epoch, 100), seconds=5) do x
j = 0
start = time()
println("Start Time=",start)

for i in ncycle(epoch,100)
    j+= 1
    global bestmodel, bestloss
    ## Report gradient norm for the first batch
    f = @diff loss_f(model,train_batches[1])
    gnorm = sqrt(sum(norm(grad(f,x))^2 for x in params(model)))
    ## Report training and validation loss
    trnloss = maploss(loss_f,model, train_batches50)
    println("iter=",j ,"     train loss/sent=", trnloss , "    time=",time()-start)
    devloss = maploss(loss_f,model, test_batches)
    
    totacc = 0
    totsents= 0
    print(1)
    for part in (test_batches)
      y = calc_scores(model,part[:,2:end])
      acc,sents = accuracy(y,part[:,1],average=false )
      totacc+=acc
      totsents +=sents
    end

    ## Save model that does best on validation data
    if devloss < bestloss
        bestmodel, bestloss = deepcopy(model), devloss
    end
    println("iter =",j, "test acc=",totacc/totsents )
end

Start Time=1.57827023839858e9
iter=1     train loss/sent=5.960024325642735    time=0.2696108818054199
1iter =1test acc=0.18416289592760182
iter=2     train loss/sent=6.546866671880707    time=0.7315590381622314
1iter =2test acc=0.22941176470588234
iter=3     train loss/sent=6.3758391798473895    time=1.1392178535461426
1iter =3test acc=0.21764705882352942
iter=4     train loss/sent=6.486251290421933    time=1.542524814605713
1iter =4test acc=0.19366515837104073
iter=5     train loss/sent=4.18316938506905    time=1.9494240283966064
1iter =5test acc=0.2158371040723982
iter=6     train loss/sent=6.2334997821599245    time=2.347327947616577
1iter =6test acc=0.22217194570135745
iter=7     train loss/sent=6.91528778988868    time=2.7554328441619873
1iter =7test acc=0.21945701357466063
iter=8     train loss/sent=7.212218166328967    time=3.167346954345703
1iter =8test acc=0.2506787330316742
iter=9     train loss/sent=9.372003868222237    time=3.576476812362671
1iter =9test acc=0.2778280542986

InterruptException: ignored

In [23]:
totacc = 0
totsents = 0
for part in (test_batches)
      y = calc_scores(bestmodel,part[:,2:end])
      acc,sents = accuracy(y,part[:,1],average=false )
      totacc+=acc
      totsents +=sents
    end
print(totacc/totsents)

0.36515837104072396