# Introduction to Machine Learning in Julia with MLJ

Welcome to this little Jupyter Notebook for getting to know MLJ, the go-to ML platform within Julia.

To start with, take a look at [MLJ's GitHub page](https://github.com/alan-turing-institute/MLJ.jl):
* super well organized: it has its own [GitHub organization "JuliaAI"](https://github.com/JuliaAI)
* well maintained and supported: see the maintainers and supporting organizations below

> -----------------------------
>
> <div align="center">
>     <img src="https://github.com/alan-turing-institute/MLJ.jl/raw/dev/material/MLJLogo2.svg" alt="MLJ" width="200">
> </div>
> 
> <h2 align="center">A Machine Learning Framework for Julia
> </h2>
> 
> 
> MLJ (Machine Learning in Julia) is a toolbox written in Julia
> providing a common interface and meta-algorithms for selecting,
> tuning, evaluating, composing and comparing over [160 machine learning
> models](https://alan-turing-institute.github.io/MLJ.jl/dev/list_of_supported_models/)
> written in Julia and other languages.
> 
> **New to MLJ?** Start [here](https://alan-turing-institute.github.io/MLJ.jl/dev/).
> 
> **Integrating an existing machine learning model into the MLJ
> framework?** Start [here](https://alan-turing-institute.github.io/MLJ.jl/dev/quick_start_guide_to_adding_models/).
> 
> MLJ was initially created as a Tools, Practices and Systems project at
> the [Alan Turing Institute](https://www.turing.ac.uk/)
> in 2019. Current funding is provided by a [New Zealand Strategic
> Science Investment
> Fund](https://www.mbie.govt.nz/science-and-technology/science-and-innovation/funding-information-and-opportunities/investment-funds/strategic-science-investment-fund/ssif-funded-programmes/university-of-auckland/)
> awarded to the University of Auckland.
> 
> MLJ has been developed with the support of the following organizations:
> 
> <div align="center">
>     <img src="https://github.com/alan-turing-institute/MLJ.jl/raw/dev/material/Turing_logo.png" width = 100/>
>     <img src="https://github.com/alan-turing-institute/MLJ.jl/raw/dev/material/UoA_logo.png" width = 100/>
>     <img src="https://github.com/alan-turing-institute/MLJ.jl/raw/dev/material/IQVIA_logo.png" width = 100/>
>     <img src="https://github.com/alan-turing-institute/MLJ.jl/raw/dev/material/warwick.png" width = 100/>
>     <img src="https://github.com/alan-turing-institute/MLJ.jl/raw/dev/material/julia.png" width = 100/>
> </div>
> 
> 
> ### The MLJ Universe
> 
> The functionality of MLJ is distributed over a number of repositories
> illustrated in the dependency chart below. These repositories live at
> the [JuliaAI](https://github.com/JuliaAI) umbrella organization.
> 
> <div align="center">
>     <img src="https://github.com/alan-turing-institute/MLJ.jl/raw/dev/material/MLJ_stack.svg" alt="Dependency Chart">
> </div>
> 
> *Dependency chart for MLJ repositories. Repositories with dashed
> connections do not currently exist but are planned/proposed.*
> 
> <br>
> <p align="center">
> <a href="CONTRIBUTING.md">Contributing</a> &nbsp;•&nbsp;
> <a href="ORGANIZATION.md">Code Organization</a> &nbsp;•&nbsp;
> <a href="ROADMAP.md">Road Map</a>
> </p>
> 
> #### Contributors
> 
> *Core design*: A. Blaom, F. Kiraly, S. Vollmer
> 
> *Lead contributor*: A. Blaom
> 
> *Active maintainers*: A. Blaom, S. Okon, T. Lienart, D. Aluthge
> 
>
> ------------------------

Disclaimer: Many examples and text snippets are taken directly from documentation and examples provided by MLJ.

# Let's jump into it: Supervised Learning

In [None]:
using MLJ

### Loading a Machine Learning Model

In [None]:
@iload DecisionTreeClassifier  # interactive model loading

In [None]:
Tree = @load DecisionTreeClassifier pkg=DecisionTree  # declarative model loading; @load returns the model type
tree = Tree()  # instance with default hyperparameters

MLJ is essentially a big wrapper that provides unified access to models implemented in other packages.

### Loading Data

In [None]:
import RDatasets
iris = RDatasets.dataset("datasets", "iris"); # a DataFrame
y, X = unpack(iris, ==(:Species), colname -> true); # y = a vector, and X = a DataFrame 
first(X, 3) |> pretty

In [None]:
?unpack

----------------
### Fit & Predict

In [None]:
mach = machine(tree, X, y)  # adding a mutable cache to the model+data for performant training 

In [None]:
train, test = partition(eachindex(y), 0.7, shuffle=false); # 70:30 split; iris is sorted by species, so without shuffling the test rows are all one species

In [None]:
fit!(mach, rows=train)
yhat = predict(mach, X[test,:]);
yhat[3:5]

In [None]:
using Distributions
isa(yhat[1], Distribution)

In [None]:
Distributions.mode.(yhat[3:5])

In [None]:
log_loss(yhat, y[test]) |> mean
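
To make the number above less opaque: the log loss of a single prediction is the negative logarithm of the probability that the predicted distribution assigns to the class that was actually observed. Here is a minimal sketch in plain Julia. The probabilities are made-up illustrative values, not output of the tree above, and the names `p_true`, `per_observation` and `average_loss` are hypothetical.

```julia
using Statistics

# Hypothetical probabilities that three predictions assign to the class
# that was actually observed (illustrative values only).
p_true = [0.9, 0.5, 0.1]

# Per-observation log loss: a confident correct prediction gives a small
# loss, a confident wrong prediction gives a large one.
per_observation = -log.(p_true)

# Averaging yields a single score for the whole test set.
average_loss = mean(per_observation)
```

Lower is better, and a single very confident mistake can dominate the average.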

In [None]:
measures()

In [None]:
for m in measures()
    if "log_loss" in m.instances
        display(m)
    end
end

### Evaluate = auto fit/predict

In [None]:
mach = machine(tree, X, y)  # adding a mutable cache to the model for performant training 
evaluate!(mach, resampling=Holdout(fraction_train=0.7, shuffle=false),
    measures=[log_loss, brier_score], verbosity=0)

In [None]:
tree.max_depth = 3
evaluate!(mach, resampling=CV(shuffle=true), measure=[accuracy, balanced_accuracy], operation=predict_mode, verbosity=0)

# Unsupervised Learning: fit!, transform, inverse_transform

In [None]:
v = [1, 2, 3, 4]
mach2 = machine(UnivariateStandardizer(), v)
fit!(mach2)
w = transform(mach2, v)

In [None]:
inverse_transform(mach2, w)
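
What the standardizer learns and applies can be sketched by hand. The snippet below assumes that fitting stores the mean and the sample standard deviation, that `transform` subtracts the mean and divides by the standard deviation, and that `inverse_transform` undoes this. The names `mu`, `sigma`, `w_manual` and `v_back` are only illustrative.

```julia
using Statistics

v = [1, 2, 3, 4]

# "fit!": learn the parameters from the data
mu, sigma = mean(v), std(v)

# "transform": center and rescale
w_manual = (v .- mu) ./ sigma

# "inverse_transform": map back to the original scale
v_back = w_manual .* sigma .+ mu
```

The transformed vector has mean 0 and standard deviation 1, and `v_back` recovers `v` up to floating-point error.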

# Model Registry

MLJ has a model registry that lets you search for models and inspect their properties without loading the model code.

In [None]:
models(matching(X,y))

In [None]:
?models

In [None]:
info("DecisionTreeClassifier", pkg="DecisionTree")
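
The registry can also be filtered with an arbitrary predicate on the model metadata. The following is a sketch, assuming the built-in iris data from `@load_iris` and the metadata fields `prediction_type`, `is_pure_julia`, `name` and `package_name`; the variable name `candidates` is only illustrative.

```julia
using MLJ

X, y = @load_iris  # small built-in dataset shipped with MLJ

# Keep only models that are compatible with this data, predict
# probabilities, and are implemented in pure Julia.
candidates = models() do model
    matching(model, X, y) &&
        model.prediction_type == :probabilistic &&
        model.is_pure_julia
end

# Each entry is a named tuple of metadata; list name and providing package.
[(m.name, m.package_name) for m in candidates]
```

Because the search only touches metadata, none of the listed packages needs to be installed to run it.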

--------------------------------

# MLJ features


MLJ (Machine Learning in Julia) is a toolbox written in Julia
providing a common interface and meta-algorithms for selecting,
tuning, evaluating, composing and comparing machine learning models
written in Julia and other languages. In particular, MLJ wraps a large
number of [scikit-learn](https://scikit-learn.org/stable/) models.


* Data agnostic: train models on any data supported by the
  [Tables.jl](https://github.com/JuliaData/Tables.jl) interface.

* Extensive support for model composition (*pipelines* and *learning
  networks*).

* Convenient syntax to tune and evaluate (composite) models.

* Consistent interface to handle probabilistic predictions.

* Extensible [tuning
  interface](https://github.com/alan-turing-institute/MLJTuning.jl),
  to support a growing number of optimization strategies, and designed
  to play well with model composition.


More information is available in the [MLJ design paper](https://github.com/alan-turing-institute/MLJ.jl/blob/master/paper/paper.md).
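
As an illustration of the tuning syntax mentioned in the list, here is a minimal sketch that grid-searches the `max_depth` of a decision tree. It assumes the DecisionTree interface package is installed and uses the built-in iris data; the range bounds, the number of folds and the names `Tree`, `tuned` and `best_depth` are arbitrary choices for this example.

```julia
using MLJ

X, y = @load_iris
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
tree = Tree()

# Candidate values for one hyperparameter (bounds chosen for illustration).
r = range(tree, :max_depth, lower=1, upper=5)

# Wrapping the model yields a new, self-tuning model.
tuned = TunedModel(model=tree,
                   tuning=Grid(),
                   resampling=CV(nfolds=3, shuffle=true, rng=123),
                   range=r,
                   measure=log_loss)

mach_tuned = machine(tuned, X, y)
fit!(mach_tuned, verbosity=0)

# Hyperparameter value of the best model found by the search.
best_depth = fitted_params(mach_tuned).best_model.max_depth
```

The wrapped model behaves like any other MLJ model, so it can be passed to `evaluate!` or placed inside a composite model.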

# Model composition

MLJ supports extremely flexible, multi-purpose model composition. It is described in detail in [the documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/) and in the [accompanying paper](https://arxiv.org/abs/2012.15505).

MLJ calls these compositions "learning networks", and the best place to start with them is the [learning networks tutorial by MLJ](https://alan-turing-institute.github.io/DataScienceTutorials.jl/getting-started/learning-networks/).
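
The simplest form of composition is a linear pipeline. The sketch below assumes a recent MLJ version in which `|>` chains models into a pipeline, and that the DecisionTree interface package is installed; the chosen components, `max_depth=3` and the names `pipe` and `result` are only for illustration.

```julia
using MLJ

X, y = @load_iris
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

# Standardize the features, then pass them to the tree.
pipe = Standardizer() |> Tree(max_depth=3)

# The pipeline is itself a model, so it can be evaluated like any other.
result = evaluate(pipe, X, y,
                  resampling=CV(nfolds=5, shuffle=true, rng=123),
                  measure=log_loss,
                  verbosity=0)

result.measurement  # one aggregated score per measure
```

Learning networks generalize this idea to non-linear data flows, for example when the target is transformed as well.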

In [None]:
# TODO explore the learning-networks-tutorial and see that it works

# Thank you for being here

Further information:
* MLJ repository: https://github.com/alan-turing-institute/MLJ.jl
* MLJ docs: https://alan-turing-institute.github.io/MLJ.jl/dev/
* MLJ tutorials: https://alan-turing-institute.github.io/DataScienceTutorials.jl/

In case you have more questions or suggestions, always feel welcome to reach out to me at Meetup and Julia User Group Munich, or directly at Stephan.Sahm@gmx.de