forked from Fokko/spark-stochastic-outlier-selection
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
removing Apache Spark dependencies from SOS implementation
... and some other minor adaptations
gni committed on Aug 2, 2019 (1 parent: bd9f6d0; commit 504001e)
Showing 6 changed files with 93 additions and 101 deletions.
```diff
@@ -1,6 +1,6 @@
 language: scala
 scala:
-  - 2.11.7
+  - 2.11.8
 script:
   - sbt clean coverage test
 after_success:
```
```diff
@@ -1,18 +1,28 @@
-Stochastic Outlier Selection on Apache Spark
+Stochastic Outlier Selection in Scala
 ============================
 
-[![Codacy Badge](https://www.codacy.com/project/badge/9069624e46ac4d97bb19a34705f95965)](https://www.codacy.com)
-[![Build Status](https://travis-ci.org/Fokko/spark-stochastic-outlier-selection.svg?branch=master)](https://travis-ci.org/Fokko/spark-stochastic-outlier-selection)
-[![Coverage Status](https://coveralls.io/repos/Fokko/spark-stochastic-outlier-selection/badge.svg?branch=master&service=github)](https://coveralls.io/github/Fokko/spark-stochastic-outlier-selection?branch=master)
+[![Codacy Badge](https://api.codacy.com/project/badge/Grade/df5fc23eb5b74795b62d0daa52436a0d)](https://www.codacy.com/app/Gnni/scala-stochastic-outlier-selection?utm_source=github.com&utm_medium=referral&utm_content=Gnni/scala-stochastic-outlier-selection&utm_campaign=Badge_Grade)
+[![Build Status](https://travis-ci.org/Gnni/scala-stochastic-outlier-selection.svg?branch=master)](https://travis-ci.org/Gnni/scala-stochastic-outlier-selection)
+[![Coverage Status](https://coveralls.io/repos/github/Gnni/scala-stochastic-outlier-selection/badge.svg?branch=master)](https://coveralls.io/github/Gnni/scala-stochastic-outlier-selection?branch=master)
 [![Maven Central](https://maven-badges.herokuapp.com/maven-central/frl.driesprong/spark-stochastic-outlier-selection_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/frl.driesprong/spark-stochastic-outlier-selection_2.11)
 
-Stochastic Outlier Selection (SOS) is an unsupervised outlier selection algorithm. It uses the concept of affinity to compute an outlier probability for each data point.
+Adapted version of the implementation for Apache Spark. This version
+aims to perform Stochastic Outlier Selection (SOS) using Scala only,
+i.e., without the need for any Apache Spark resources.
 
-For more information about SOS, see the technical report: J.H.M. Janssens, F. Huszar, E.O. Postma, and H.J. van den Herik. [Stochastic Outlier Selection](https://github.com/jeroenjanssens/sos/blob/master/doc/sos-ticc-tr-2012-001.pdf?raw=true). Technical Report TiCC TR 2012-001, Tilburg University, Tilburg, the Netherlands, 2012.
+SOS is an unsupervised outlier selection algorithm. It uses the concept of affinity to compute an outlier probability for each data point.
 
+For more information about SOS, see the technical report: J.H.M.
+Janssens, F. Huszar, E.O. Postma, and H.J. van den Herik.
+[Stochastic Outlier Selection](https://github.com/jeroenjanssens/sos/blob/master/doc/sos-ticc-tr-2012-001.pdf?raw=true).
+Technical Report TiCC TR 2012-001, Tilburg University, Tilburg, the
+Netherlands, 2012.
 
 Selecting outliers from data
 ----------------------------------------
 
-The current implementation accepts RDD's of the type `Array[Double]` and returns the indexes of the vector with it's degree of outlierness.
+The current implementation accepts an Array with elements of the type
+`Array[Double]` and returns the index of each vector together with its
+degree of outlierness.
 
-Current implementation only works with Euclidean distance, but this will be extended in the foreseeable future.
+The current implementation only works with Euclidean distance.
```
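The README sums up SOS in a single sentence: affinities between points are turned into an outlier probability per point. The plain-Scala sketch below spells out that pipeline for Euclidean distance, following our reading of the technical report: Gaussian affinities a(i)(j) = exp(-beta(i) * d(i,j)^2) with a(i)(i) = 0, row-normalised binding probabilities b(i)(j), and p(j) = product over i != j of (1 - b(i)(j)). It is not the code from this commit. `SosSketch`, `sqDist` and `outlierProbabilities` are hypothetical names, and the per-point precisions `beta` are assumed to be supplied by the caller, whereas the actual algorithm calibrates them from a perplexity.

```scala
object SosSketch {
  /** Squared Euclidean distance between two points of the same dimension. */
  def sqDist(x: Array[Double], y: Array[Double]): Double =
    x.indices.map(k => (x(k) - y(k)) * (x(k) - y(k))).sum

  /**
   * Outlier probability for every point, given one precision beta(i) per point
   * (assumed already calibrated; here it is simply passed in).
   *   affinity  a(i)(j) = exp(-beta(i) * d(i,j)^2) for j != i, 0 on the diagonal
   *   binding   b(i)(j) = a(i)(j) / sum_k a(i)(k)
   *   outlier   p(j)    = product over i != j of (1 - b(i)(j))
   */
  def outlierProbabilities(data: Array[Array[Double]], beta: Array[Double]): Array[Double] = {
    val n = data.length
    require(n >= 2 && beta.length == n, "need at least two points and one beta per point")

    // Row i holds the probability that point i "binds" to each other point j.
    val binding: Array[Array[Double]] = Array.tabulate(n) { i =>
      val affinity = Array.tabulate(n) { j =>
        if (i == j) 0.0 else math.exp(-beta(i) * sqDist(data(i), data(j)))
      }
      val total = affinity.sum // NaN results below if every affinity underflows to 0
      affinity.map(_ / total)
    }

    // A point is an outlier when no other point binds to it.
    Array.tabulate(n) { j =>
      (0 until n).filter(_ != j).map(i => 1.0 - binding(i)(j)).product
    }
  }

  def main(args: Array[String]): Unit = {
    val testDataset = Array(
      Array(1.00, 1.00), Array(2.00, 1.00), Array(1.00, 2.00),
      Array(2.00, 2.00), Array(5.00, 8.00))
    val probs = outlierProbabilities(testDataset, Array.fill(testDataset.length)(1.0))
    probs.zipWithIndex.foreach { case (p, i) => println(s"$i : $p") }
  }
}
```

With every beta fixed at 1.0, the isolated point (5.00, 8.00) receives almost no binding probability from the other four points, so its outlier probability comes out close to 1 while the clustered points stay well below it.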
src/main/scala/frl/driesprong/outlierdectection/EvaluateOutlierDetection.scala (24 changes: 7 additions & 17 deletions)
```diff
@@ -1,29 +1,19 @@
 package frl.driesprong.outlierdectection
 
-import org.apache.spark.{SparkConf, SparkContext}
-
 object EvaluateOutlierDetection {
 
   def main(args: Array[String]) {
-    val conf = new SparkConf()
-      .setMaster("local[*]")
-      .setAppName("Stochastic Outlier Selection")
-
-    val sc = new SparkContext(conf)
-
-    val toyDataset = Array(
-      (0L, Array(1.00, 1.00)),
-      (1L, Array(3.00, 1.25)),
-      (2L, Array(3.00, 3.00)),
-      (3L, Array(1.00, 3.00)),
-      (4L, Array(2.25, 2.25)),
-      (5L, Array(8.00, 2.00))
+    val testDataset = Array(
+      (Array(1.00, 1.00)),
+      (Array(2.00, 1.00)),
+      (Array(1.00, 2.00)),
+      (Array(2.00, 2.00)),
+      (Array(5.00, 8.00))
     )
 
-    StochasticOutlierDetection.performOutlierDetection( sc.parallelize(toyDataset) ).foreach( x =>
+    StochasticOutlierDetection.performOutlierDetection(testDataset, perplexity = 3.0).foreach( x =>
       System.out.println(x._1 + " : " + x._2)
     )
 
-    sc.stop()
   }
 }
```
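The new `perplexity = 3.0` argument controls how wide each point's neighbourhood is. As we understand the technical report, each point's precision beta is tuned so that the perplexity (the exponential of the entropy) of its binding distribution matches this value, so it behaves roughly like an effective number of neighbours. Below is a minimal sketch of that calibration for a single point, assuming bisection on beta; this is a common choice and not necessarily what `StochasticOutlierDetection` does internally. `PerplexitySearch`, `entropy` and `calibrateBeta` are hypothetical names.

```scala
object PerplexitySearch {
  /**
   * Entropy (in nats) of one point's binding distribution, given the squared
   * distances d2 to all *other* points and a precision beta.
   * Distances are shifted by their minimum first: this leaves the binding
   * probabilities unchanged but keeps exp() from underflowing to zero.
   */
  def entropy(d2: Array[Double], beta: Double): Double = {
    val dMin = d2.min
    val a = d2.map(d => math.exp(-beta * (d - dMin)))
    val sumA = a.sum
    val weighted = d2.indices.map(k => (d2(k) - dMin) * a(k)).sum
    math.log(sumA) + beta * weighted / sumA
  }

  /** Bisection on beta until exp(entropy) matches the requested perplexity. */
  def calibrateBeta(d2: Array[Double], perplexity: Double,
                    tol: Double = 1e-5, maxIter: Int = 200): Double = {
    val targetH = math.log(perplexity)
    var beta = 1.0
    var lo = 0.0
    var hi = Double.PositiveInfinity
    var h = entropy(d2, beta)
    var iter = 0
    while (math.abs(h - targetH) > tol && iter < maxIter) {
      if (h > targetH) {
        // Binding distribution too flat: sharpen it by raising beta.
        lo = beta
        beta = if (hi == Double.PositiveInfinity) beta * 2.0 else (beta + hi) / 2.0
      } else {
        // Too peaked: lower beta towards the current lower bound.
        hi = beta
        beta = (beta + lo) / 2.0
      }
      h = entropy(d2, beta)
      iter += 1
    }
    beta
  }

  def main(args: Array[String]): Unit = {
    // Squared Euclidean distances from (1.00, 1.00) to the other four points
    // of testDataset in EvaluateOutlierDetection.
    val d2 = Array(1.0, 1.0, 2.0, 65.0)
    val beta = calibrateBeta(d2, perplexity = 3.0)
    println(s"beta = $beta, perplexity = ${math.exp(entropy(d2, beta))}")
  }
}
```

Because the entropy only decreases as beta grows, bisection converges whenever the requested perplexity is attainable. With the five-point test dataset each point has four neighbours, so a perplexity of 3.0 lies below the maximum of 4.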