# Trace Generation and Modeling

To test our DeepValidate approach, we generate a dataset of test traces from a chain of relatively simple arithmetic functions operating on a series of randomized inputs. Given the generated program traces, we train an LSTM classifier to predict whether the output will be valid or result in an error.

The trace generation is performed by `output_trace.jl`, which reproduces much of the functionality of `varextract.jl` with some important differences. Rather than sending trace information to `stdout`, we append the traces to a file, `traces.dat`.

(Note that this is not possible within an IJulia notebook due to restrictions on [task switching in staged functions](https://github.com/JuliaLang/julia/issues/18568), which prevent the trace outputs from being written to a file recursively. However, this works just fine from the command line.)

In [None]:
function Cassette.overdub(ctx::TraceCtx,
                          f,
                          args...)
    open("traces.dat", "a") do file
        write(file, string(f))
        write(file, string(args))
    end
    
    # if we are supposed to descend, we call Cassette.recurse
    if Cassette.canrecurse(ctx, f, args...)
        subtrace = (Any[],Any[])
        push!(ctx.metadata[1], (f, args) => subtrace)
        newctx = Cassette.similarcontext(ctx, metadata = subtrace)
        retval = Cassette.recurse(newctx, f, args...)
        # push!(ctx.metadata[2], subtrace[2])
    else
        retval = Cassette.fallback(ctx, f, args...)
        push!(ctx.metadata[1], :t)
        push!(ctx.metadata[2], retval)
    end
    @info "returning"
    @show retval
    return retval
end
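
To make the shape of a raw trace concrete, here is a minimal sketch of the two `write` calls above without Cassette. The helper `record_call` and the temporary path are hypothetical names used only for illustration; the real tracing happens inside `Cassette.overdub`.

```julia
# Hypothetical helper: append the callee and its arguments to a trace file,
# then perform the call. This mirrors only the file-writing part of overdub.
function record_call(path, f, args...)
    open(path, "a") do file
        write(file, string(f))      # e.g. "sqrt"
        write(file, string(args))   # e.g. "(4.0,)"
    end
    return f(args...)
end

path = joinpath(mktempdir(), "traces.dat")   # scratch location, not the real traces.dat
record_call(path, sqrt, 4.0)
record_call(path, +, 1, 2)

# Calls are concatenated with no separator; a newline is only added per run.
trace = read(path, String)
println(trace)
```

Because nothing separates consecutive calls, one run of the test function becomes a single long line, which is why the model later treats each line as one character sequence.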

We then modify our `@testset` so that it creates the `traces.dat` file and then loops through a large number of randomized runs of our arithmetic tests. Error conditions happen most often when our inputs are sufficiently close to zero, so a `Normal(0,2)` distribution gives us a range of values that produces a reasonable percentage of "bad" traces on which to train. Empirically, the share of "bad" traces generated is about 15-17%.

In [None]:
@testset "TraceExtract" begin
    g(x) = begin
        y = add(x.*x, -x)
        z = 1
        v = y .- z
        s = sum(v)
        return s
    end
    h(x) = begin
        z = g(x)
        zed = sqrt(z)
        return zed
    end

    open("traces.dat", "w") do f
        write(f, "")
    end

    seeds = rand(Normal(0,2),30000,3)
    
    for i=1:size(seeds,1)
        ctx = TraceCtx(pass=ExtractPass, metadata = (Any[], Any[]))
        try
            result = Cassette.overdub(ctx, h, seeds[i,:])
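        catch err
            # `catch DomainError` would bind any exception to a variable named
            # DomainError, so test the type explicitly and rethrow anything else
            err isa DomainError || rethrow()
            dump(ctx.metadata)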
        finally
            open("traces.dat", "a") do f
                write(f, "\n")
            end
        end
        if i%1000 == 0
            @info string(i)
        end
    end
end
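
The 15-17% share quoted above can be sanity-checked without Cassette by simulating only the arithmetic. The sketch below rests on two assumptions that are not confirmed by this notebook: that `add` is plain elementwise addition (its definition lives elsewhere in the project), and that `Normal(0,2)` means mean 0 and standard deviation 2.

```julia
using Random

# Assumption: `add` is elementwise addition.
add(a, b) = a .+ b

# Same arithmetic as g in the test set above.
g(x) = sum(add(x .* x, -x) .- 1)

# sqrt(z) throws a DomainError exactly when z is negative.
is_bad(x) = g(x) < 0

rng = MersenneTwister(42)
n = 100_000

# Assumption: standard deviation 2, so scale standard normal draws by 2.
bad_share = count(_ -> is_bad(2 .* randn(rng, 3)), 1:n) / n
println("estimated share of bad traces: ", round(bad_share, digits = 3))
```

If both assumptions hold, the estimate should land in the same neighbourhood as the empirical figure; a large gap would suggest that `add` or the distribution parameters differ from what is assumed here.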


After generating our raw traces, a small amount of pre-processing is required before attempting to model around them. First, we classify our "good" and "bad" traces based on whether they have resulted in an error. 

We then need to strip out the actual error dump information from our "bad" traces, as this would too easily give away the prediction game. All traces end just before they would error, allowing the validation model to predict the next outcome.

In [None]:
text = split(String(read("traces.dat")), "\n");
Ys = Int.(occursin.(Ref(r"(Base[\S(?!\))]+error)"i), text));

text = split.(text, Ref(r"(Base[\S(?!\))]+error)"i));
text = [t[1] for t in text];

sum(Ys)
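
The labelling and stripping steps can be illustrated on two toy traces. The trace strings below are hypothetical and heavily shortened, and `err_pattern` is a simplified stand-in for the pattern used above ("Base", a run of non-whitespace characters, then "error"), not a replacement for it.

```julia
# Hypothetical, shortened trace lines; real traces are far longer.
traces = [
    "sqrt(4.2)Base.Math.sqrt_llvm(4.2)",
    "sqrt(-1.3)Base.Math.throw_complex_domainerror(:sqrt, -1.3)",
]

# Simplified stand-in pattern, case-insensitive like the original.
err_pattern = r"Base\S+error"i

# Label: 1 if the trace mentions an error-raising call, 0 otherwise.
labels = Int.(occursin.(Ref(err_pattern), traces))

# Strip: keep only the text before the first error-raising call.
stripped = [first(split(t, err_pattern)) for t in traces]

println(labels)
println(stripped)
```

A "good" trace passes through unchanged, while a "bad" trace is cut just before the call that raises, so the label cannot be read directly off the remaining text.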

Finally, we save our traces and our labels as CSV files for easy ingestion by our model.

In [None]:
writedlm( "traces.csv",  text[1:end-1], ',')
writedlm( "y_results.csv",  Ys[1:end-1], ',')
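
As a quick check that the label file round-trips, the following sketch writes a few hypothetical labels to a scratch file and reads them back with `readdlm`. The path and label values are made up for illustration.

```julia
using DelimitedFiles

labels = [0, 1, 0, 0, 1]                          # hypothetical labels
path = joinpath(mktempdir(), "y_results.csv")     # scratch file, not the real output

writedlm(path, labels, ',')

# readdlm returns a matrix (here 5×1); vec flattens it back to a vector.
restored = vec(readdlm(path, ',', Int))
println(restored)
```

The same round trip is less straightforward for the trace file: trace lines themselves contain commas, so reading `traces.csv` back with `,` as the delimiter would likely split each trace into several fields.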

## Validation Classifier Model
For our modeling, we use [Flux.jl](https://github.com/FluxML/Flux.jl) and train an LSTM encoder/decoder classifier on our traces.

In [2]:
using Pkg

In [3]:
Pkg.add(["Flux", "MLDataPattern", "DelimitedFiles"])

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[2K[?25h[32m[1m  Updating[22m[39m registry at `~/.julia/registries/Uncurated`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/Uncurated.git`
[2K[?25h[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.0/Project.toml`
 [90m [587475ba][39m[92m + Flux v0.7.3[39m
 [90m [9920b226][39m[92m + MLDataPattern v0.5.0[39m
 [90m [8bb1440f][39m[92m + DelimitedFiles [39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.0/Manifest.toml`
 [90m [9920b226][39m[92m + MLDataPattern v0.5.0[39m
 [90m [66a33bbf][39m[92m + MLLabelUtils v0.5.1[39m
 [90m [dbb5928d][39m[92m + MappedArrays v0.2.1[39m


In [1]:
using DelimitedFiles
using Flux
using Flux: onehot, throttle, crossentropy, onehotbatch, params, shuffle
using MLDataPattern: stratifiedobs
using Base.Iterators: partition

include("../../src/validation/utils.jl")


get_data (generic function with 1 method)

In [2]:
#
# Set up inputs for model
#

# Read lines from traces.dat into arrays of characters
# Convert to one-hot matrices

cd(@__DIR__)

text, alphabet, N = get_data("traces.dat")
stop = onehot('\n', alphabet);
alphabet

58-element Array{Char,1}:
 'g' 
 'e' 
 't' 
 'f' 
 'i' 
 'l' 
 'd' 
 '(' 
 'M' 
 'a' 
 'n' 
 ',' 
 ' ' 
 ⋮   
 'A' 
 '3' 
 '6' 
 '+' 
 'q' 
 'v' 
 'F' 
 '<' 
 '_' 
 'w' 
 'x' 
 '\n'
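
For readers unfamiliar with one-hot encoding, here is a hand-rolled sketch of the idea over a tiny hypothetical alphabet. It does not use Flux and is not how `onehot` or `onehotbatch` are implemented; `onehot_col` and `onehot_matrix` are illustrative names.

```julia
# Hypothetical tiny alphabet: the distinct characters of one short trace.
alphabet = unique("sqrt(4.2)\n")

# One column per character: true in the row matching the character's
# position in the alphabet, false everywhere else.
onehot_col(c, alphabet) = alphabet .== c

# One matrix per string: columns are the characters in order.
onehot_matrix(s, alphabet) = hcat((onehot_col(c, alphabet) for c in s)...)

m = onehot_matrix("sqrt", alphabet)
println(size(m))   # rows = alphabet size, columns = string length
```

With the real 58-character alphabet, each chunk of 50 characters becomes a 58×50 matrix with exactly one `true` per column.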

In [3]:
# Partition into subsequences to input to our model

seq_len = 50

Xs = [collect(partition(t,seq_len)) for t in text];
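
A small sketch of what `partition` does to a trace, using a hypothetical 26-character string and a chunk length of 10 instead of the 50 used above:

```julia
using Base.Iterators: partition

chars = collect("Base.Broadcast.materialize")   # Vector{Char}, 26 characters

# Split into consecutive chunks of at most 10 characters.
chunks = collect(partition(chars, 10))

# Join each chunk back into a string for display.
joined = join.(chunks)
println(joined)
```

The final chunk is shorter than the rest whenever the trace length is not a multiple of the chunk length, so traces need padding or truncation before they can share a fixed input width.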

In [4]:
prod.(Xs[7])

32-element Array{String,1}:
 "getfield(Main, Symbol(\"#h#9\")){getfield(Main, Symb"  
 "ol(\"#g#8\"))}(getfield(Main, Symbol(\"#g#8\"))())([0."
 "842063, -1.67386, 2.31813],)getfield(getfield(Main"    
 ", Symbol(\"#h#9\")){getfield(Main, Symbol(\"#g#8\"))}("
 "getfield(Main, Symbol(\"#g#8\"))()), :g)getfield(Mai"  
 "n, Symbol(\"#g#8\"))()([0.842063, -1.67386, 2.31813]"  
 ",)Base.Broadcast.broadcasted(*, [0.842063, -1.6738"    
 "6, 2.31813], [0.842063, -1.67386, 2.31813])Base.Br"    
 "oadcast.materialize(Base.Broadcast.Broadcasted(*, "    
 "([0.842063, -1.67386, 2.31813], [0.842063, -1.6738"    
 "6, 2.31813])),)Base.Broadcast.instantiate(Base.Bro"    
 "adcast.Broadcasted(*, ([0.842063, -1.67386, 2.3181"    
 "3], [0.842063, -1.67386, 2.31813])),)copy(Base.Bro"    
 ⋮                                                       
 "e.Broadcast.materialize(Base.Broadcast.Broadcasted"    
 "(-, ([-0.132993, 4.47568, 3.0556], 1)),)Base.Broad"    
 "cast.instantiate(Base.Broadcast.Broadcaste

In [5]:
Ys = (map(t->occursin("sqrt_llvm", prod(t)), text));

In [6]:
sum(Ys), length(Ys)

(2479, 3000)

In [7]:
Xs_vec = [[onehotbatch(x, alphabet) for x in Xs[i]] for i in 1:length(Xs)];
Xs_vec[1][1]

58×50 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
  true  false  false  false  false  …  false  false  false  false  false
 false   true  false  false  false     false  false  false  false  false
 false  false   true  false  false     false  false  false  false  false
 false  false  false   true  false     false  false  false  false  false
 false  false  false  false   true     false  false  false  false  false
 false  false  false  false  false  …  false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false  …  false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false      true  false  false  false  fals

In [21]:
#Ys = readdlm("y_results.csv");
labelset = unique(Ys)
#dataset = [(onehotbatch(x, alphabet, '\n'), onehot(Ys[i],labelset))
#           for i in 1:length(Ys) for x in Xs[i]] |> shuffle
dataset = shuffle([(Xs_vec[i], onehot(Ys[i], labelset)) for i in 1:length(Xs_vec)])
Ys = last.(dataset)

# Pad sequences to equal lengths
#Xs_padded = [hcat(x,repeat(stop,1,seq_len-size(x)[1])) for x in first.(dataset)]
Xs_padded = [hcat(x[1:25]...) for x in first.(dataset)]
map(length, Xs_padded)

3000-element Array{Int64,1}:
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
     ⋮
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
 72500
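
Every entry is 72500 because each trace is cut to its first 25 chunks of 50 columns, giving a 58×1250 matrix (58 × 1250 = 72500). The commented-out line in the cell hints at the alternative of padding with the `stop` vector. A minimal sketch of a helper that does either, assuming plain Boolean matrices; `fit_width` is a hypothetical name and is not part of `utils.jl`:

```julia
# Hypothetical helper: force a one-hot matrix to exactly `width` columns,
# truncating long inputs and padding short ones with the `stop` column.
function fit_width(m::AbstractMatrix{Bool}, stop::AbstractVector{Bool}, width::Int)
    n = size(m, 2)
    n >= width && return m[:, 1:width]
    return hcat(m, repeat(stop, 1, width - n))
end

stop = [false, false, true]           # toy stop vector for a 3-symbol alphabet
m = Bool[1 0; 0 1; 0 0]               # toy 3×2 one-hot matrix

padded = fit_width(m, stop, 4)        # 3×4, last two columns are `stop`
truncated = fit_width(m, stop, 1)     # 3×1, first column only
println(size(padded), " ", size(truncated))
```

Truncation alone, as in the cell above, only works when every trace has at least 25 chunks; padding would also cover shorter traces.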

In [22]:
# Our data holds 3000 padded traces (see the length output above). We use a train:test split of 90:10,
# stratified to ensure we have the same share of "bad" and "good" traces in our train and test sets.

(Xtrain, Ytrain), (Xtest, Ytest) = stratifiedobs((Xs_padded, Ys), p=0.9)

train = [(Xtrain[i], Ytrain[i]) for i in 1:length(Ytrain)];
test = [(Xtest[i], Ytest[i]) for i in 1:length(Ytest)];
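
To show what stratification buys us, here is a stdlib-only sketch of a stratified index split. It is an illustration of the idea, not MLDataPattern's implementation of `stratifiedobs`; `stratified_split` and the toy labels are hypothetical.

```julia
using Random

# Hypothetical helper: split indices so each class keeps roughly the
# fraction `p` of its members in the training set.
function stratified_split(rng::AbstractRNG, labels::AbstractVector, p::Real)
    train_idx, test_idx = Int[], Int[]
    for class in unique(labels)
        idx = shuffle(rng, findall(l -> l == class, labels))
        cut = round(Int, p * length(idx))
        append!(train_idx, idx[1:cut])
        append!(test_idx, idx[cut+1:end])
    end
    return train_idx, test_idx
end

# Toy labels with an 80:20 class imbalance.
labels = vcat(fill(1, 80), fill(2, 20))
train_idx, test_idx = stratified_split(MersenneTwister(1), labels, 0.9)

println(count(labels[train_idx] .== 2), " of ", length(train_idx), " training items are class 2")
println(count(labels[test_idx] .== 2), " of ", length(test_idx), " test items are class 2")
```

Both halves keep the 20% minority share, which matters here because "bad" traces are the smaller class and a plain random split could leave the test set with too few of them.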


In [25]:
train[1][1]

58×1250 Array{Bool,2}:
  true  false  false  false  false  …  false  false  false  false  false
 false   true  false  false  false     false  false  false  false  false
 false  false   true  false  false     false  false  false  false  false
 false  false  false   true  false     false  false  false  false  false
 false  false  false  false   true     false  false  false  false  false
 false  false  false  false  false  …  false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false      true  false  false  false   true
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false  …  false  false  false  false  false
 false  false  false  false  false     false  false   true  false  false
 false  false  false  false  false     false  false  false   true  false
     ⋮                      

In [26]:
# We set up our model architecture

scanner = Chain(Dense(length(alphabet), seq_len, σ), LSTM(seq_len, seq_len))
encoder = Dense(seq_len, 2)

function model(x)
  state = scanner.([x])[end]
  Flux.reset!(scanner)
  softmax(encoder(state))
end

loss(tup...) = begin
    #@show typeof.(tup)
    #@show size.(tup)
    crossentropy(model(tup[1]), tup[2])
end
accuracy(tup...) = mean(argmax(model(tup[1])) .== argmax(tup[2]))

opt = ADAM(0.000001)
ps = params(scanner, encoder)
#ps = params(model)

Params([Float32[0.092204 0.12312 … -0.0509083 0.141771; -0.154509 -0.0309254 … -0.146467 -0.114301; … ; 0.229347 -0.216933 … 0.152333 0.0923766; -0.0323819 0.235028 … 0.196169 -0.203647] (tracked), Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] (tracked), Float32[-0.00201754 0.0422538 … -0.0476042 0.0998157; -0.03872 0.0183181 … -0.121725 0.0561775; … ; -0.000976615 -0.00398163 … -0.0539271 -0.0958087; 0.0658745 -0.128838 … -0.0471692 0.118311] (tracked), Float32[-0.00880199 0.094397 … 0.132086 -0.106352; 0.113541 -0.147772 … -0.0814755 -0.0401412; … ; 0.0469816 -0.15177 … -0.0944487 -0.151429; -0.0274256 -0.153434 … 0.0438874 0.0299447] (tracked), Float32[-0.114926, 0.148934, 0.047811, 0.0609061, 0.137653, 0.0148457, 0.0898842, 0.0250348, -0.106333, -0.0821758  …  0.0843505, 0.133302, 0.0510322, 0.0745461, -0.136328, -0.0389513, -0.0623016, -0.0483884, -0.0873799, 0.167702] (tracked), Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0,

In [27]:
# Finally, we set up our callbacks for reporting on training progress.
mean(x) = sum(x)/length(x)
#testacc() = mean(accuracy(t) for t in test)
testloss() = mean(loss(t...) for t in test)

evalcb = () -> @show testloss()#, testacc()


#45 (generic function with 1 method)

In [43]:
# Now, train!
Flux.train!(loss, ps, train, opt, cb = throttle(evalcb, 10))


testloss() = 781.3041f0 (tracked)
testloss() = 776.22095f0 (tracked)
testloss() = 772.70044f0 (tracked)
testloss() = 768.3754f0 (tracked)
testloss() = 763.0578f0 (tracked)
testloss() = 758.77075f0 (tracked)
testloss() = 753.8968f0 (tracked)
testloss() = 748.3758f0 (tracked)
testloss() = 744.4518f0 (tracked)
testloss() = 738.4357f0 (tracked)


In [29]:
ps

Params([Float32[0.0940121 0.124935 … -0.0509083 0.141771; -0.156174 -0.0326008 … -0.146467 -0.114301; … ; 0.227655 -0.218624 … 0.152333 0.0923766; -0.0340876 0.233315 … 0.196169 -0.203647] (tracked), Float32[0.00181466, -0.00167447, 0.00206472, 0.00261433, 0.00243351, 0.00178143, -0.0016603, -0.00173892, -0.00125623, 0.0017878  …  -0.00171839, -0.00162966, 0.000603553, 0.00186516, 0.0018682, 0.001819, -0.00171817, -0.00153048, -0.00169567, -0.00171098] (tracked), Float32[-8.42247e-5 0.0441859 … -0.045672 0.101749; -0.0368795 0.0201577 … -0.119885 0.0580172; … ; 0.000853458 -0.00215264 … -0.0520977 -0.0939795; 0.0643062 -0.130405 … -0.0487364 0.116744] (tracked), Float32[-0.0115358 0.0971321 … 0.134828 -0.109086; 0.110875 -0.145105 … -0.0788017 -0.0428075; … ; 0.044323 -0.14911 … -0.0917821 -0.154088; -0.0249757 -0.155885 … 0.041429 0.0323947] (tracked), Float32[-0.112993, 0.150775, 0.0499197, 0.0626871, 0.139467, 0.0166777, 0.088253, 0.0233359, -0.104539, -0.0838754  …  0.0826611, 0.13

In [30]:
#model.(Xs_padded)[:,end]

In [44]:
sumη = [0.0,0.0]
for i in 1:300
    yhat = model(train[i][1])[:,end]
    η = crossentropy(yhat, train[i][2])
    sumη += η.*train[i][2]
    υ = Flux.onecold(train[i][2])
    # yhat[1]/yhat[2] is the odds ratio between the two classes (not its logarithm)
    odds = round(Flux.data(yhat[1]/yhat[2]);digits=3)
    println("$υ\t$odds\t$η")
end
ηbar = sumη./sum(train[i][2] for i in 1:length(train))


1	1.471	0.51874804f0 (tracked)
1	1.462	0.5211691f0 (tracked)
1	1.455	0.5230118f0 (tracked)
1	1.456	0.5227492f0 (tracked)
1	1.496	0.5118083f0 (tracked)
2	1.476	0.9068327f0 (tracked)
2	1.452	0.89673f0 (tracked)
1	1.452	0.5240819f0 (tracked)
1	1.456	0.5227492f0 (tracked)
1	1.462	0.5211691f0 (tracked)
1	1.452	0.5240819f0 (tracked)
1	1.456	0.5228054f0 (tracked)
1	1.456	0.5227492f0 (tracked)
1	1.451	0.5243726f0 (tracked)
1	1.471	0.51874804f0 (tracked)
2	1.462	0.90112954f0 (tracked)
1	1.456	0.5227492f0 (tracked)
1	1.455	0.5230118f0 (tracked)
1	1.452	0.5240819f0 (tracked)
1	1.469	0.51913625f0 (tracked)
1	1.452	0.5240819f0 (tracked)
1	1.451	0.5243726f0 (tracked)
1	1.461	0.52153325f0 (tracked)
1	1.469	0.51913625f0 (tracked)
1	1.462	0.5211691f0 (tracked)
1	1.446	0.5255746f0 (tracked)
1	1.44	0.527337f0 (tracked)
1	1.455	0.5230118f0 (tracked)
1	1.496	0.5118083f0 (tracked)
1	1.461	0.52153325f0 (tracked)
2	1.462	0.9010203f0 (tracked)
1	1.46	0.52179074f0 (tracked)
2	1.448	0.8954526f0 (tracked)
2	1.442

Tracked 2-element Array{Float64,1}:
 0.056582545202871655
 0.11129543280550666 
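
The second and third columns printed above are tied together by the definition of cross-entropy for a one-hot target. The sketch below reconstructs the class probabilities from a printed ratio and recomputes the loss in plain Julia; `xent` and `probs_from_odds` are illustrative names, and the inputs are the rounded values from the listing, so agreement is only approximate.

```julia
# Cross-entropy of a one-hot target y against predicted probabilities yhat.
xent(yhat, y) = -sum(y .* log.(yhat))

# Recover two class probabilities from the ratio yhat[1] / yhat[2].
probs_from_odds(odds) = [odds / (1 + odds), 1 / (1 + odds)]

# First printed row: class 1, ratio 1.471, loss 0.51874804.
loss_class1 = xent(probs_from_odds(1.471), [1, 0])

# Sixth printed row: class 2, ratio 1.476, loss 0.9068327.
loss_class2 = xent(probs_from_odds(1.476), [0, 1])

println(loss_class1, " ", loss_class2)
```

Since the ratio stays near 1.45 to 1.5 for both classes, the model at this stage leans towards class 1 for every trace, and class 2 examples simply incur the larger loss.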

In [33]:
map(x->size(x[end]), Xs_padded)

3000-element Array{Tuple{},1}:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ⋮ 
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()

In [143]:
train[1][1]

58×1500 Array{Bool,2}:
  true  false  false  false  false  …  false  false  false  false  false
 false   true  false  false  false     false  false  false  false  false
 false  false   true  false  false     false  false  false  false  false
 false  false  false   true  false     false  false  false  false  false
 false  false  false  false   true     false  false  false  false  false
 false  false  false  false  false  …  false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false  …  false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
 false  false  false  false  false     false  false  false  false  false
     ⋮                      