# Inspection of Intermediate Results

In iterative algorithms like DSEA, IBU, and RUN, you may want to inspect intermediate results. This tutorial assumes you are already familiar with the getting-started notebook at `doc/01-getting-started.ipynb`.

In [1]:
using CherenkovDeconvolution
using ScikitLearn, MLDataUtils, Random
using Discretizers: encode, CategoricalDiscretizer

# load the example data, encode labels with integers
X, y_labels, _ = load_iris()
y = encode(CategoricalDiscretizer(y_labels), y_labels)

# split the data into training and observed data sets
Random.seed!(42) # make split reproducible
(X_train, y_train), (X_data, y_data) = splitobs(shuffleobs((X', y), obsdim = 1), obsdim = 1)

# discretize the feature space
td = DeconvLearn.TreeDiscretizer(X_train, y_train, 3) # obtain up to 3 clusters
x_train = encode(td, X_train)
x_data  = encode(td, X_data)

# also prepare the classifier for DSEA
@sk_import naive_bayes : GaussianNB
tp_function = DeconvLearn.train_and_predict_proba(GaussianNB());

## Inspection in DSEA

Inspection is realized through the keyword argument `inspect`, which is available in all iterative algorithms. This argument accepts a `Function` object, which is called in each iteration of the deconvolution algorithm. The required signature of this `Function` object depends on the algorithm.
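The callback mechanism itself can be illustrated with a toy loop. The following is a minimal sketch, not the library's implementation: `toy_iterate` is a hypothetical stand-in that halves a number a few times and reports every intermediate value through an `inspect` keyword argument, including the initial value as iteration 0.

```julia
# Hypothetical stand-in for an iterative algorithm; NOT part of CherenkovDeconvolution.
# It halves a value K times and reports each intermediate result through `inspect`.
function toy_iterate(x0::Float64; K::Int = 3, inspect::Function = (args...) -> nothing)
    x = x0
    inspect(x, 0)        # report the initial value as iteration 0
    for k in 1:K
        x = x / 2        # one "iteration" of the toy algorithm
        inspect(x, k)    # the callback sees every intermediate result
    end
    return x             # only the final result is returned to the caller
end

# Collect all intermediate results in a vector that lives outside the algorithm.
trace = Tuple{Float64, Int}[]
result = toy_iterate(8.0, K = 3, inspect = (x, k) -> push!(trace, (x, k)))
```

The caller only receives the final value, while `trace` holds one entry per call of the callback. The real algorithms pass more arguments to `inspect`, as described in the remainder of this tutorial.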

For DSEA, the `inspect` function has to have the following signature:

    (f_k::Vector, k::Int, chi2s::Float64, alpha::Float64) -> Any

You do not have to stick to the parameter names (`f_k`, `k`, etc.), and you do not even have to annotate the argument types. However, these are the types the arguments will have, so do not expect anything else. The return value of the `inspect` function is never used, so you can return any value, including `nothing`.

The first parameter of `inspect`, `f_k`, refers to the intermediate result of the `k`-th iteration. `chi2s` is the Chi-Square distance between `f_k` and the previous estimate. This distance may be used to check convergence. `alpha` is the step size used in the `k`-th iteration of DSEA.
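As a sketch of how `chi2s` might be used to monitor convergence, the callback below records every distance and prints a message once it falls below a threshold. The names `track_convergence`, `chi2s_history`, and `epsilon` are hypothetical, and the threshold is an arbitrary choice for illustration. The callback is driven here by replaying the rounded values that the DSEA run further below reports, so the block runs without the package; iteration 0 is assumed to carry `NaN`, as it does in the IBU output of this notebook.

```julia
# Distances seen so far; the callback below appends to this vector.
chi2s_history = Float64[]

# Assumed threshold, for illustration only.
epsilon = 1e-6

# The callback follows the DSEA signature (f_k, k, chi2s, alpha).
function track_convergence(f_k, k, chi2s, alpha)
    push!(chi2s_history, chi2s)
    # NaN compares false with everything, so iteration 0 never triggers the message
    if chi2s < epsilon
        println("iteration $k: chi2s = $chi2s is below the threshold $epsilon")
    end
    return nothing   # the return value is ignored by the algorithm anyway
end

# Replay of the rounded values from the DSEA run in this notebook.
track_convergence([0.333333, 0.333333, 0.333333], 0, NaN, NaN)   # NaN is assumed here
track_convergence([0.333333, 0.354929, 0.311738], 1, 0.00280117, 1.0)
track_convergence([0.333333, 0.356746, 0.30992],  2, 1.99047e-05, 1.0)
track_convergence([0.333333, 0.356901, 0.309766], 3, 1.43925e-07, 1.0)
```

In an actual run, the function would be handed to the algorithm through the keyword argument, i.e. `inspect=track_convergence`, instead of being called by hand.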

In [2]:
# we want to store all inspection results in a single DataFrame
using DataFrames
df = DataFrame(f=Vector{Float64}[], k=Int[], chi2s=Float64[], a=Float64[]) # empty frame with fixed types

# set up the inspection function
inspect_function = (f, k, chi2s, a) -> push!(df, [f, k, chi2s, a]) # store results in df

# provide inspect_function as a keyword argument to DSEA, run 3 iterations, and return the final result
f_dsea = dsea(X_data, X_train, y_train, tp_function,
              K=3, inspect=inspect_function)

┌ Info: DSEA iteration 1/3 uses alpha = 1.0 (chi2s = 0.0028011676660929232)
└ @ CherenkovDeconvolution /home/bunse/.julia/dev/CherenkovDeconvolution/src/methods/dsea.jl:154
┌ Info: DSEA iteration 2/3 uses alpha = 1.0 (chi2s = 1.9904724192671642e-5)
└ @ CherenkovDeconvolution /home/bunse/.julia/dev/CherenkovDeconvolution/src/methods/dsea.jl:154
┌ Info: DSEA iteration 3/3 uses alpha = 1.0 (chi2s = 1.4392542975822753e-7)
└ @ CherenkovDeconvolution /home/bunse/.julia/dev/CherenkovDeconvolution/src/methods/dsea.jl:154


3-element Array{Float64,1}:
 0.3333333327459093 
 0.35690069352953485
 0.30976597372455583

In [3]:
# let's have a look at the DataFrame - beautiful, isn't it?
df

| Row | f | k | chi2s | a |
|-----|---|---|-------|---|
|     | Array… | Int64 | Float64 | Float64 |
| 1 | [0.333333, 0.333333, 0.333333] | 0 | NaN | NaN |
| 2 | [0.333333, 0.354929, 0.311738] | 1 | 0.00280117 | 1.0 |
| 3 | [0.333333, 0.356746, 0.30992] | 2 | 1.99047e-05 | 1.0 |
| 4 | [0.333333, 0.356901, 0.309766] | 3 | 1.43925e-07 | 1.0 |


## Inspection in RUN and IBU

IBU is inspected just like DSEA. The `inspect` function of RUN, however, has a different signature:

    (f_k::Array, k::Int, ldiff::Float64, tau::Float64) -> Any

Here, `ldiff` stores the difference in the likelihood loss of RUN between two iterations. `tau` is the regularization parameter chosen in the `k`-th iteration.
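A callback for RUN could be sketched as follows. This is only an illustration of the signature above: the names `inspect_run` and `taus` are hypothetical, and the direct call at the end uses arbitrary placeholder values rather than results of an actual RUN deconvolution.

```julia
# Regularization parameters seen so far; filled by the callback.
taus = Float64[]

# Callback following the RUN signature (f_k, k, ldiff, tau).
inspect_run = (f_k, k, ldiff, tau) -> begin
    push!(taus, tau)                                    # remember the chosen tau
    println("RUN iteration $k: ldiff=$ldiff, tau=$tau") # and report the progress
end

# Direct call with arbitrary placeholder values, only to show the shape of the arguments.
inspect_run([0.3, 0.4, 0.3], 1, -0.5, 0.1)
```

Like the other inspection functions, it would be passed to the algorithm through the `inspect` keyword argument.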

In [4]:
# set up an inspection function for IBU - do not store anything, just print
inspect_ibu = (f, k, chi2s, alpha) -> println("This is iteration $k with chi2s=$chi2s")

f_ibu = ibu(x_data, x_train, y_train, inspect=inspect_ibu) # by default, IBU stops after 3 iterations

This is iteration 0 with chi2s=NaN
This is iteration 1 with chi2s=0.0010670935040682943
This is iteration 2 with chi2s=4.204562475856572e-5
This is iteration 3 with chi2s=1.78948873920693e-6


3-element Array{Float64,1}:
 0.3333333333333333 
 0.31680990030938466
 0.349856766357282  