Skip to content
This repository has been archived by the owner on Nov 15, 2020. It is now read-only.

Latest commit

 

History

History
116 lines (74 loc) · 3.4 KB

README.md

File metadata and controls

116 lines (74 loc) · 3.4 KB

Spark SMILE

License Build Status codecov Maven Central

Deprecated repository, all features have been upstreamed to the official SMILE repository.

Repository for better integration of Spark MLLib Pipelines and SMILE library.

Setup

Download the dependency from Maven Central

SBT

libraryDependencies += "com.github.pierrenodet" %% "spark-smile" % "0.0.2"

Maven

<dependency>
  <groupId>com.github.pierrenodet</groupId>
  <artifactId>spark-smile_2.12</artifactId>
  <version>0.0.2</version>
</dependency>

What's inside

This repository contains :

  • Distributed GridSearch of SMILE trainer with Spark
  • Integration of SMILE with Spark MLLib Pipelines
  • Seamless interoperability between SMILE and Spark DataFrames

How to use

Distributed GridSearch

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val mushrooms = read.arff("data/mushrooms.arff")

val x = mushrooms.select(1,22).toArray
val y = mushrooms("class").toIntArray

sparkgscv(spark)(5, x, y, Seq(new Accuracy()): _*) { (x, y) => knn(x, y, 3) }

From Spark DataFrame to SMILE DataFrame

import org.apache.spark.smile.implicits._

val mushrooms = spark.read.format("libsvm").load("data/mushrooms.svm")

val x = mushrooms.toSmileDF().select("features").map(t=>t.getArray[AnyRef](0).map(_.asInstanceOf[Double])).toArray
val y = mushrooms.toSmileDF().apply("label").toDoubleArray.map(_.toInt-1)

val res = classification(5, x, y, Seq(new Accuracy()): _*) { (x, y) => knn(x, y, 3) }

println(res(0))

From SMILE DataFrame to Spark DataFrame

import org.apache.spark.smile.implicits._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val mushrooms = read.arff("data/mushrooms.arff").omitNullRows().toSparkDF(spark)

mushrooms.show()

Use SMILE Classifier (or Regressor) in Spark MLLib Pipeline

val raw = spark.read.format("libsvm").load("data/mushrooms.svm")

val scl = new SmileClassifier()
  .setTrainer({ (x, y) => knn(x, y, 3) })

val bce = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")

val model = scl.fit(data)

println(bce.evaluate(model.transform(data)))

model.write.overwrite().save("/tmp/bonjour")
val loaded = SmileClassificationModel.load("/tmp/bonjour")
println(bce.evaluate(loaded.transform(data)))

Contributing

Feel free to open an issue or make a pull request to contribute to the repository.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the Apache License Version 2.0 - see the LICENSE file for details.