## An introduction to MLJ

This is a first experiment to see how the MLJ library from the Alan Turing Institute fits together. Later I want to look at how it compares with other machine-learning APIs available in Julia, as well as with the `Flux` library.

The MLJ cheat sheet can be found [here](https://github.com/alan-turing-institute/MLJ.jl/blob/master/docs/src/mlj_cheatsheet.md).

[Tutorial to follow](https://github.com/alan-turing-institute/MLJ.jl/blob/master/examples/xgboost.jl)

```julia
using Pkg
Pkg.add("MLJModels")
```

```text
 Resolving package versions...
  Updating `~/.julia/environments/v1.0/Project.toml`
  [d491faf4] + MLJModels v0.2.5
  Updating `~/.julia/environments/v1.0/Manifest.toml`
  [no changes]
```


```julia
using MLJ
```

```julia
using Pkg
# Pkg.add("MLJ")         # uncomment on first run if MLJ is not yet installed
Pkg.add("DataFrames")
Pkg.add("Statistics")    # Statistics is a standard library; this only records it in Project.toml
import MLJ
```

```text
 Resolving package versions...
  Updating `~/.julia/environments/v1.0/Project.toml`
  [a93c6f00] + DataFrames v0.18.4
  Updating `~/.julia/environments/v1.0/Manifest.toml`
  [no changes]
 Resolving package versions...
  Updating `~/.julia/environments/v1.0/Project.toml`
  [10745b16] + Statistics
  Updating `~/.julia/environments/v1.0/Manifest.toml`
  [no changes]
```


The code below is taken from the MLJ GitHub documentation; I will probably modify it later to work out what is going on in the various sections.

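```julia
using MLJ
using DataFrames, Statistics

Xraw = rand(300,3)                # 300×3 matrix of features, uniform on [0, 1)
y = exp.(Xraw[:,1] - Xraw[:,2] - 2Xraw[:,3] + 0.1*rand(300))  # synthetic target plus a little noise
X = DataFrame(Xraw)               # columns x1, x2, x3 (newer DataFrames versions need DataFrame(Xraw, :auto))

train, test = partition(eachindex(y), 0.70); # 70:30 split of the row indices
```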
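Note that `partition` splits the row *indices*, not the data. As far as I can tell, without shuffling it just takes the leading 70% of the indices for training and the rest for testing. A minimal sketch of that behaviour in plain Julia, using a hypothetical helper `simple_partition` (not part of MLJ):

```julia
# Hypothetical stand-in for MLJ's partition(rows, fraction), assuming no shuffling:
# the first round(Int, fraction * n) indices go to training, the rest to testing.
function simple_partition(rows::AbstractVector, fraction::Real)
    0 < fraction < 1 || throw(ArgumentError("fraction must lie in (0, 1)"))
    ntrain = round(Int, fraction * length(rows))
    return rows[1:ntrain], rows[ntrain+1:end]
end

train, test = simple_partition(collect(1:300), 0.70)
println(length(train), " training rows, ", length(test), " test rows")  # 210 training rows, 90 test rows
```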


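`rand` generates arrays of random numbers: `rand(300,3)` gives a 300×3 matrix of values uniform on [0, 1), and passing a range such as `1:0.1:100` draws from that range instead. `DataFrame` wraps the matrix in a data frame with automatically named columns `x1`, `x2`, `x3`, as the cells below show.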
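A short sketch of the `rand` calls used in this notebook, with a seeded generator (`MersenneTwister(42)` is my choice, only so the example is reproducible):

```julia
using Random

rng = MersenneTwister(42)          # seeded generator so results repeat between runs
A = rand(rng, 300, 3)              # 300×3 Float64 matrix, entries uniform on [0, 1)
B = rand(rng, 1:0.1:100, 10, 5)    # 10×5 matrix of values drawn from 1.0, 1.1, ..., 100.0
noise = 0.1 * rand(rng, 300)       # vector of small noise terms, like the one added to y

println(size(A), " ", size(B))     # (300, 3) (10, 5)
println(all(x -> 0 <= x < 1, A))   # true
```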

```julia
X[1:5,1:2]
```

| Row | x1 (Float64) | x2 (Float64) |
|---:|---:|---:|
| 1 | 0.264755 | 0.107134 |
| 2 | 0.674143 | 0.53762 |
| 3 | 0.643489 | 0.859872 |
| 4 | 0.867383 | 0.164025 |
| 5 | 0.368138 | 0.558377 |


```julia
X[1:10,:]
```

| Row | x1 (Float64) | x2 (Float64) | x3 (Float64) |
|---:|---:|---:|---:|
| 1 | 0.264755 | 0.107134 | 0.967214 |
| 2 | 0.674143 | 0.53762 | 0.649284 |
| 3 | 0.643489 | 0.859872 | 0.25699 |
| 4 | 0.867383 | 0.164025 | 0.431323 |
| 5 | 0.368138 | 0.558377 | 0.900089 |
| 6 | 0.835311 | 0.758922 | 0.598778 |
| 7 | 0.48551 | 0.705731 | 0.105429 |
| 8 | 0.969575 | 0.772435 | 0.784534 |
| 9 | 0.422349 | 0.302194 | 0.671541 |
| 10 | 0.00892118 | 0.687196 | 0.844213 |


```julia
rand(1:0.1:100, 10, 5)
```

```text
10×5 Array{Float64,2}:
 76.5  41.3   9.9  63.7  53.8
 60.8  71.4  91.3  65.3  99.7
 90.0  33.6  69.9  99.5  83.5
 78.6  62.5  39.2  26.7  18.4
 72.2  57.0  99.9  61.1  26.7
  9.2  10.5  36.6  28.8   5.6
 56.2  49.3  65.4  69.6  44.9
 28.9  32.4  75.8  25.3  66.6
 48.2  74.7  83.0  74.0  42.7
 42.4  33.5  88.2   4.5  48.3
```

```julia
knn_model=KNNRegressor(K=10)
```

```text
KNNRegressor(K = 10,
             metric = MLJ.KNN.euclidean,
             kernel = MLJ.KNN.reciprocal,) @ 1…63
```
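The printed hyperparameters hint at how a prediction is formed: find the `K` nearest training rows under the Euclidean metric and average their targets with weights from a `reciprocal` kernel. Assuming that kernel means weights proportional to 1/distance (my reading of the name, not checked against MLJ's source), a from-scratch sketch with hypothetical function names `knn_predict_one` and `knn_predict`:

```julia
using LinearAlgebra   # for norm

# Hypothetical from-scratch KNN regressor (a sketch, not MLJ's implementation).
# Predicts the target at query point x from the K nearest rows of Xtrain.
function knn_predict_one(Xtrain::AbstractMatrix, ytrain::AbstractVector, x::AbstractVector; K::Int=10)
    dists = [norm(view(Xtrain, i, :) .- x) for i in axes(Xtrain, 1)]  # Euclidean distances
    nearest = partialsortperm(dists, 1:min(K, length(dists)))          # indices of the K closest rows
    d, yk = dists[nearest], ytrain[nearest]
    exact = iszero.(d)
    if any(exact)                    # query coincides with training row(s): 1/0 would be Inf
        return sum(yk[exact]) / count(exact)
    end
    w = 1 ./ d                       # assumed "reciprocal" kernel: weight = 1 / distance
    return sum(w .* yk) / sum(w)
end

# Apply knn_predict_one to every row of Xnew.
knn_predict(Xtrain, ytrain, Xnew; K::Int=10) =
    [knn_predict_one(Xtrain, ytrain, view(Xnew, j, :); K=K) for j in axes(Xnew, 1)]

Xtrain = [0.0 0.0; 1.0 0.0; 0.0 1.0; 1.0 1.0]
ytrain = [0.0, 1.0, 1.0, 2.0]
println(knn_predict(Xtrain, ytrain, [0.5 0.5]; K=4))  # equidistant neighbours: (0+1+1+2)/4 ≈ 1.0
println(knn_predict(Xtrain, ytrain, [0.9 0.9]; K=1))  # nearest row is (1, 1), so [2.0]
```

Weighting by inverse distance lets close neighbours dominate, so a larger `K` costs less than it would with a plain average.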

```julia
knn = machine(knn_model, X, y)
```

```text
Machine{KNNRegressor} @ 3…88
```

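In MLJ the model (`knn_model`) appears to hold only hyperparameters, while the machine binds it to data and stores whatever `fit!` learns; that is presumably why setting `knn_model.K = 20` further down changes what the machine does. A toy illustration of that separation, using hypothetical types `ToyKNN` and `ToyMachine` and a hypothetical `toy_fit!` (none of these are MLJ's):

```julia
# Hypothetical types illustrating the model/machine split (not MLJ's actual types).
mutable struct ToyKNN          # "model": hyperparameters only, no data
    K::Int
end

mutable struct ToyMachine      # "machine": a model bound to data plus learned state
    model::ToyKNN
    X::Matrix{Float64}
    y::Vector{Float64}
    fitresult::Any             # filled in by toy_fit!
end

ToyMachine(model, X, y) = ToyMachine(model, X, y, nothing)

# For KNN, "training" just memorises the selected rows; K is read from the model at fit time.
function toy_fit!(mach::ToyMachine; rows=axes(mach.X, 1))
    mach.fitresult = (X = mach.X[rows, :], y = mach.y[rows], K = mach.model.K)
    return mach
end

model = ToyKNN(10)
mach = ToyMachine(model, rand(300, 3), rand(300))
toy_fit!(mach; rows=1:210)
model.K = 20                   # mutate the model the machine refers to...
toy_fit!(mach; rows=1:210)     # ...and refitting picks up the new value
println(mach.fitresult.K)      # 20
```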

```julia
fit!(knn, rows=train)          # train only on the training rows
yhat = predict(knn, X[test,:]) # predict the held-out rows
rms(yhat, y[test])             # root-mean-square error on the test rows
```

```text
┌ Info: Training Machine{KNNRegressor} @ 3…88.
└ @ MLJ /home/joe/.julia/packages/MLJ/XYSFt/src/machines.jl:135

0.11248451413808438
```
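`rms` is the root-mean-square error, sqrt(mean((yhat - y)^2)). A plain-Julia equivalent for checking numbers by hand (`my_rms` is a hypothetical name, not an MLJ function):

```julia
using Statistics   # for mean

# Root-mean-square error between predictions yhat and targets y.
my_rms(yhat, y) = sqrt(mean((yhat .- y) .^ 2))

println(my_rms([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # sqrt(4/3) ≈ 1.1547
```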

```julia
evaluate!(knn, resampling=Holdout(fraction_train=0.7), measure=rms)
```

```text
┌ Info: Evaluating using a holdout set. 
│ fraction_train=0.7 
│ shuffle=false 
│ measure=MLJ.rms 
│ operation=StatsBase.predict 
│ Resampling from all rows. 
└ @ MLJ /home/joe/.julia/packages/MLJ/XYSFt/src/resampling.jl:92

0.11248451413808438
```
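The holdout score is identical to the hand-computed `rms` above. That is consistent with `Holdout(fraction_train=0.7)` with `shuffle=false` (as the log reports) training on the same first 70% of rows that `partition` picked. A sketch of holdout evaluation under that assumption, with a hypothetical `holdout_score` and a deliberately trivial "predict the training mean" model standing in for KNN:

```julia
using Statistics

# Hypothetical holdout evaluation without shuffling: fit on the first
# fraction_train of the rows, score on the remainder.
function holdout_score(fitfn, predictfn, X::AbstractMatrix, y::AbstractVector;
                       fraction_train::Real=0.7,
                       measure=(yhat, y) -> sqrt(mean((yhat .- y) .^ 2)))
    n = size(X, 1)
    ntrain = round(Int, fraction_train * n)
    train, test = 1:ntrain, ntrain+1:n
    fitted = fitfn(X[train, :], y[train])
    return measure(predictfn(fitted, X[test, :]), y[test])
end

# Trivial stand-in model: always predict the mean of the training targets.
fit_mean(X, y) = mean(y)
predict_mean(m, X) = fill(m, size(X, 1))

X = reshape(collect(1.0:10.0), 10, 1)
y = collect(1.0:10.0)
println(holdout_score(fit_mean, predict_mean, X, y))  # trains on rows 1:7, scores rows 8:10
```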

```julia
knn_model.K = 20
evaluate!(knn, resampling=Holdout(fraction_train=0.7))  # `default_measure(knn) == rms` so `measure` kwarg can be dropped
```

```text
┌ Info: Evaluating using a holdout set. 
│ fraction_train=0.7 
│ shuffle=false 
│ measure=MLJ.rms 
│ operation=StatsBase.predict 
│ Resampling from all rows. 
└ @ MLJ /home/joe/.julia/packages/MLJ/XYSFt/src/resampling.jl:92

0.1322170417086064
```
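Raising `K` from 10 to 20 made the holdout error worse here (about 0.132 against 0.112). A natural next step is to try several values of `K` and keep the best. The sketch below does that on synthetic data of the same form, but it is a simplification: it uses a plain unweighted neighbour average instead of the reciprocal kernel, its own seeded data, and hypothetical helpers `knn_mean_predict` and `rmse`, so its scores will not match the MLJ numbers above.

```julia
using Random, Statistics, LinearAlgebra

# Hypothetical unweighted KNN: mean target of the K nearest training rows, per row of Xnew.
function knn_mean_predict(Xtr, ytr, Xnew, K)
    map(axes(Xnew, 1)) do j
        d = [norm(view(Xtr, i, :) .- view(Xnew, j, :)) for i in axes(Xtr, 1)]
        mean(ytr[partialsortperm(d, 1:K)])
    end
end

rmse(yhat, y) = sqrt(mean((yhat .- y) .^ 2))

rng = MersenneTwister(0)
Xraw = rand(rng, 300, 3)
y = exp.(Xraw[:, 1] .- Xraw[:, 2] .- 2 .* Xraw[:, 3] .+ 0.1 .* rand(rng, 300))
train, test = 1:210, 211:300        # same unshuffled 70:30 split as above

Ks = [1, 5, 10, 20, 50]
scores = [rmse(knn_mean_predict(Xraw[train, :], y[train], Xraw[test, :], K), y[test]) for K in Ks]
best_score, idx = findmin(scores)
println("best K = ", Ks[idx], " (rmse ", round(best_score; digits=4), ")")
```

Picking `K` on the test rows themselves overfits the choice; a separate validation split or cross-validation would be the more careful design.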

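## Homogeneous ensembles

It looks very easy to create ensembles in Julia with MLJ. A homogeneous ensemble is made of many copies of the same model type, whose predictions are combined.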
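A common way to build a homogeneous ensemble is bagging: each member is fitted on a bootstrap resample (rows drawn with replacement) and the predictions are averaged. I have not tried MLJ's own ensemble API yet, so this is only a from-scratch sketch of the idea, with least-squares linear regression as a hypothetical base model and hypothetical names `fit_ols`, `bag_fit` and `bag_predict`:

```julia
using Random, Statistics

# Hypothetical base learner: ordinary least squares with an intercept column.
fit_ols(X, y) = hcat(ones(size(X, 1)), X) \ y
predict_ols(beta, X) = hcat(ones(size(X, 1)), X) * beta

# Bagging: fit n_members copies of the same model on bootstrap resamples.
function bag_fit(rng::AbstractRNG, X::AbstractMatrix, y::AbstractVector; n_members::Int=20)
    n = size(X, 1)
    members = Vector{Vector{Float64}}(undef, n_members)
    for m in 1:n_members
        idx = rand(rng, 1:n, n)              # bootstrap sample: n row indices drawn with replacement
        members[m] = fit_ols(X[idx, :], y[idx])
    end
    return members
end

# Ensemble prediction: average of the members' predictions.
bag_predict(members, X) = sum(predict_ols(b, X) for b in members) / length(members)

rng = MersenneTwister(1)
X = rand(rng, 100, 2)
y = 1.0 .+ 2.0 .* X[:, 1] .- 3.0 .* X[:, 2]     # noiseless linear target
members = bag_fit(rng, X, y; n_members=10)
yhat = bag_predict(members, X)
println(maximum(abs.(yhat .- y)))               # should be tiny: every member recovers the same line
```

On noiseless data bagging changes nothing; its benefit is reducing the variance of unstable base models (deep trees, small-`K` KNN) on noisy data.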