# Shogun in Scala #

Shogun is an open-source machine learning library that offers a wide range of efficient and unified machine learning methods.

It is implemented in C++ and provides the Java integration needed to use it from any JVM-based language:
- Java
- Scala
- Groovy
- Kotlin
- etc

This short introduction shows how to use Shogun from Scala, training a GaussianNaiveBayes classifier to predict the classes of the Iris data.
We use Jupyter with the BeakerX (http://beakerx.com) kernel.

## Setup ##
Before you can start, you need to install the Shogun binaries with 'conda install -c pschatzmann shogun-jvm'.


Unfortunately, Shogun is not available via Maven. To simplify the use of Shogun in any JVM environment, I created the Shogun-JVM project, which provides the binaries via conda and the jars via Maven. These Java libraries can be used with JDK 1.8 or later.

We also add DL4J to simplify the pre-processing of the data.

In [46]:
%classpath config resolver maven-public http://pschatzmann.ch/repository/maven-public/
%%classpath add mvn 
org.shogun:shogun-jvm:0.0.1-SNAPSHOT

We import the Shogun and jblas packages so that we can use their classes without package prefixes:

In [47]:
// shogun
import org.jblas._
import org.shogun._



Here we use shogun-jvm to load the correct libshogun.so.
Alternatively, you could set LD_LIBRARY_PATH, DYLD_LIBRARY_PATH, or java.library.path before the JVM is started, or call System.load() yourself.

In [48]:
ShogunNative.load()
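
For readers who want to try the manual route, here is a minimal sketch using only standard JVM calls. It assumes the native library is named "shogun" (so that it maps to libshogun.so on Linux) and that java.library.path was set when the JVM was started. Both are assumptions about the local installation, not something shogun-jvm guarantees.

```scala
import java.io.File

// Directories the JVM searches for native libraries
// (set with -Djava.library.path=... before the JVM starts).
val searchPath: Seq[String] =
  sys.props.getOrElse("java.library.path", "")
    .split(File.pathSeparator)
    .filter(_.nonEmpty)
    .toSeq

// "shogun" is an assumed library name; mapLibraryName turns it into the
// platform-specific file name, for example libshogun.so on Linux.
val libName = "shogun"
val fileName = System.mapLibraryName(libName)

// Report which search directories actually contain the file.
val candidates: Seq[File] =
  searchPath.map(dir => new File(dir, fileName)).filter(_.isFile)
candidates.foreach(f => println(s"Found ${f.getAbsolutePath}"))

// Try to load the library. UnsatisfiedLinkError is a LinkageError, which
// scala.util.Try does not catch, so we use a plain try/catch here.
val loaded: Boolean =
  try { System.loadLibrary(libName); true }
  catch {
    case e: UnsatisfiedLinkError =>
      println(s"Could not load $fileName: ${e.getMessage}")
      false
  }
```

If the library lives outside the search path, System.load() with the absolute path of the file is the other option mentioned above.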

## Loading of the Shogun Data ##
After we have prepared our data, we can load it from the files. For details on how to set up the data, see the document DataSetup.ipnb. If we already have DoubleMatrix objects, we can pass them into the constructors of the Features and Labels classes.

In [49]:
var shogunFeaturesTrain =  new RealFeatures(new CSVFile("iris_train_features.csv"))
var shogunLabelsTrain =  new MulticlassLabels(new CSVFile("iris_train_labels.csv"))

var shogunFeaturesTest =  new RealFeatures(new CSVFile("iris_test_features.csv"))
var shogunLabelsTest =  new MulticlassLabels(new CSVFile("iris_test_labels.csv"))

shogunLabelsTest.get_labels()

[0.000000, 0.000000, 0.000000, 1.000000, 0.000000, 1.000000, 1.000000, 0.000000, 1.000000, 0.000000, 2.000000, 2.000000, 0.000000, 2.000000, 2.000000, 1.000000, 0.000000, 2.000000, 1.000000, 2.000000, 1.000000, 1.000000, 2.000000]
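
If the data is already in memory, the DoubleMatrix route mentioned above needs the values in a features-by-vectors layout: one row per feature and one column per sample. The sketch below prepares such arrays in plain Scala. The sample values are illustrative, and the constructor calls in the closing comment are an assumption based on the text, including the guess that labels are passed as a single row.

```scala
// Illustrative samples, one inner array per flower (sample-by-feature layout).
val samples: Array[Array[Double]] = Array(
  Array(5.4, 3.4, 1.5, 0.4),
  Array(6.1, 2.8, 4.7, 1.2),
  Array(6.3, 3.3, 6.0, 2.5)
)
val classes: Array[Double] = Array(0.0, 1.0, 2.0)

// Transpose so that each row holds one feature and each column one sample.
val featureMatrix: Array[Array[Double]] = samples.transpose

// Labels as a single row with one entry per sample.
val labelRow: Array[Array[Double]] = Array(classes)

println(s"${featureMatrix.length} / ${featureMatrix.head.length}") // features / vectors
println(s"${labelRow.length} / ${labelRow.head.length}")           // 1 / labels

// With jblas and Shogun on the classpath, these arrays could then be wrapped
// along the lines of (assumed usage, not verified here):
//   new RealFeatures(new DoubleMatrix(featureMatrix))
//   new MulticlassLabels(new DoubleMatrix(labelRow))
```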

We double-check the structure of the data in Shogun:

In [50]:
println(shogunFeaturesTrain.get_num_features+" / " +shogunFeaturesTrain.get_num_vectors)
println("1 / "+shogunLabelsTrain.get_num_labels)
println("--------")
println(shogunFeaturesTest.get_num_features +" / " +shogunFeaturesTest.get_num_vectors)
println("1 / "+shogunLabelsTest.get_num_labels)

4 / 127
1 / 127
--------
4 / 23
1 / 23


In [51]:
shogunFeaturesTest.get_feature_vector(0)

[5.400000, 3.400000, 1.500000, 0.400000]

## Classify and Predict ##


We train the GaussianNaiveBayes classifier:

In [52]:
var classifier = new GaussianNaiveBayes(shogunFeaturesTrain, shogunLabelsTrain)

classifier.train()

true
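
To give an idea of what such a classifier does, here is a small Gaussian naive Bayes sketch in plain Scala. It is not Shogun's implementation: the names fit, logScore and predict, the toy data, and the variance floor are all made up for illustration. Each class gets a prior plus a mean and variance per feature, and a vector is assigned to the class with the highest log score.

```scala
// Toy training data, one inner array per sample (illustrative values).
val trainX: Array[Array[Double]] = Array(
  Array(5.1, 1.4), Array(4.9, 1.5), Array(5.0, 1.3), // class 0
  Array(6.4, 4.5), Array(6.0, 4.7), Array(6.2, 4.4)  // class 1
)
val trainY: Array[Int] = Array(0, 0, 0, 1, 1, 1)

// Per-class statistics: prior probability, per-feature mean and variance.
case class ClassStats(prior: Double, mean: Array[Double], variance: Array[Double])

def fit(x: Array[Array[Double]], y: Array[Int]): Map[Int, ClassStats] =
  x.zip(y).groupBy(_._2).map { case (label, pairs) =>
    val rows = pairs.map(_._1)
    val n = rows.length.toDouble
    val columns = rows.transpose
    val mean = columns.map(_.sum / n)
    val variance = columns.zip(mean).map { case (col, m) =>
      col.map(v => (v - m) * (v - m)).sum / n + 1e-9 // small floor avoids division by zero
    }
    label -> ClassStats(n / x.length, mean, variance)
  }

// Log prior plus the sum of per-feature Gaussian log densities.
def logScore(s: ClassStats, v: Array[Double]): Double =
  math.log(s.prior) + v.indices.map { i =>
    val d = v(i) - s.mean(i)
    -0.5 * math.log(2 * math.Pi * s.variance(i)) - d * d / (2 * s.variance(i))
  }.sum

// Pick the class with the highest score.
def predict(model: Map[Int, ClassStats], v: Array[Double]): Int =
  model.maxBy { case (_, s) => logScore(s, v) }._1

val model = fit(trainX, trainY)
println(predict(model, Array(5.0, 1.4))) // close to the class 0 samples
println(predict(model, Array(6.1, 4.6))) // close to the class 1 samples
```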

In [53]:
var predictedLabels = classifier.apply_multiclass(shogunFeaturesTest)

predictedLabels.get_labels()

[0.000000, 0.000000, 0.000000, 1.000000, 0.000000, 1.000000, 1.000000, 0.000000, 1.000000, 0.000000, 2.000000, 2.000000, 0.000000, 2.000000, 2.000000, 1.000000, 0.000000, 2.000000, 1.000000, 2.000000, 1.000000, 1.000000, 2.000000]

In [54]:
shogunLabelsTest.get_labels()

[0.000000, 0.000000, 0.000000, 1.000000, 0.000000, 1.000000, 1.000000, 0.000000, 1.000000, 0.000000, 2.000000, 2.000000, 0.000000, 2.000000, 2.000000, 1.000000, 0.000000, 2.000000, 1.000000, 2.000000, 1.000000, 1.000000, 2.000000]
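
Comparing two long arrays by eye is error-prone, so we can also count the matches per class. The sketch below builds a confusion matrix in plain Scala from the label values printed above. The names truthValues and predictedValues are ours, and the values are copied by hand from the two outputs.

```scala
// Label values copied from the two notebook outputs above (23 test vectors).
val truthValues: Array[Double] = Array(
  0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 2.0, 2.0,
  0.0, 2.0, 2.0, 1.0, 0.0, 2.0, 1.0, 2.0, 1.0, 1.0, 2.0)
val predictedValues: Array[Double] = Array(
  0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 2.0, 2.0,
  0.0, 2.0, 2.0, 1.0, 0.0, 2.0, 1.0, 2.0, 1.0, 1.0, 2.0)

// Rows are the true classes, columns the predicted classes.
val numClasses = 3
val confusion: Array[Array[Int]] = Array.fill(numClasses, numClasses)(0)
for ((t, p) <- truthValues.zip(predictedValues))
  confusion(t.toInt)(p.toInt) += 1

confusion.foreach(row => println(row.mkString(" ")))
```

All counts land on the diagonal (8, 8 and 7), which confirms that every test vector was assigned its true class.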

Finally, we can calculate the accuracy of our prediction:

In [55]:
var eval = new MulticlassAccuracy()
var accuracy = eval.evaluate(predictedLabels, shogunLabelsTest)


1.0
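
As a cross-check, and assuming that MulticlassAccuracy reports the fraction of correctly predicted labels, the same measure can be written in a few lines of plain Scala. The accuracy helper and the sample arrays below are hypothetical and only illustrate the calculation.

```scala
// Fraction of positions where the predicted label equals the true label.
def accuracy(predicted: Array[Double], actual: Array[Double]): Double = {
  require(predicted.nonEmpty && predicted.length == actual.length,
    "label arrays must be non-empty and of equal length")
  predicted.zip(actual).count { case (p, a) => p == a }.toDouble / actual.length
}

// Hypothetical labels with one mistake in four predictions.
val truth = Array(0.0, 1.0, 1.0, 2.0)
val guess = Array(0.0, 1.0, 2.0, 2.0)

println(accuracy(guess, truth)) // 0.75
println(accuracy(truth, truth)) // 1.0
```

In the notebook, the two arrays would come from the label objects. The printed output suggests that get_labels() returns a jblas DoubleMatrix, whose toArray method should yield the values, but we have not verified this against the binding.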