CSUEB - CSR Project - Fall 2017 - Gui Larange
Outliers in Scala. Import the Breeze statistics library.

In [12]:
import breeze.linalg._                  // Vector, sum
import breeze.stats._                   // DescriptiveStats.percentile
import breeze.stats.distributions._     // Gaussian
import scala.collection.parallel.ParSeq

In [4]:
val gau = Gaussian(0.0, 1.0) // standard normal distribution N(0, 1)
val y = gau.sample(10)       // draw 10 values
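Breeze's `gau.sample(10)` draws a fresh sample on every run, so the numbers printed later in this notebook will not repeat exactly. As a dependency-free alternative (an assumption, not what the notebook uses), a standard normal sample can be drawn from `scala.util.Random` with a fixed seed. The helper name `standardNormalSample` is hypothetical.

```scala
import scala.util.Random

// Hypothetical helper (not part of the notebook): draw n standard normal values
// from a seeded scala.util.Random, so the same seed always yields the same sample.
def standardNormalSample(n: Int, seed: Long): Vector[Double] = {
  val rng = new Random(seed)
  Vector.fill(n)(rng.nextGaussian())
}

val y = standardNormalSample(10, seed = 42L)
println(y.mkString(", "))
```

Fixing the seed trades independence between runs for reproducibility, which is useful when comparing outputs across code changes.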

getOutliers takes a sequence seq and returns a parallel sequence containing its outliers under Tukey's 1.5 × IQR rule, i.e. the values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. Q1 and Q3 are the lower and upper quartiles, and IQR = Q3 - Q1 is the interquartile range.

In [7]:
def getOutliers(seq: Seq[Double]): ParSeq[Double] = {
    val Q1 = DescriptiveStats.percentile(seq, 0.25) // lower quartile
    val Q3 = DescriptiveStats.percentile(seq, 0.75) // upper quartile
    val IQR = Q3 - Q1                               // interquartile range
    val d = 1.5 * IQR                               // Tukey fence distance
    // Keep x when it lies more than d below Q1 or more than d above Q3.
    seq.par.filter(x => Q1 - x > d || x - Q3 > d)
}
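To see the rule on numbers that can be checked by hand, here is a plain-Scala sketch of the same 1.5 × IQR test. The helpers `quantile` and `tukeyOutliers` are hypothetical. `quantile` interpolates linearly between order statistics, which is one common convention and may not match how Breeze's `DescriptiveStats.percentile` computes quartiles, so small samples can give slightly different fences.

```scala
// Hypothetical helper: p-th quantile by linear interpolation between order statistics.
def quantile(xs: Seq[Double], p: Double): Double = {
  require(xs.nonEmpty, "quantile of an empty sequence")
  val s = xs.sorted
  val h = (s.length - 1) * p       // fractional index into the sorted data
  val lo = math.floor(h).toInt
  val hi = math.ceil(h).toInt
  s(lo) + (h - lo) * (s(hi) - s(lo))
}

// Hypothetical helper: Tukey's fences, flagging points more than k * IQR outside [Q1, Q3].
def tukeyOutliers(xs: Seq[Double], k: Double = 1.5): Seq[Double] = {
  val q1 = quantile(xs, 0.25)
  val q3 = quantile(xs, 0.75)
  val d = k * (q3 - q1)
  xs.filter(x => x < q1 - d || x > q3 + d)
}

val data = Seq(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0)
// Q1 = 3.25, Q3 = 7.75, IQR = 4.5, so the fences are -3.5 and 14.5.
println(tukeyOutliers(data)) // prints List(30.0)
```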

nOutliers draws a sample of size n from the standard normal distribution gau and returns the number of outliers it contains.

In [8]:
def nOutliers(n: Int): Int = getOutliers(gau.sample(n)).length

genOutliers takes m, the number of iterations, and xi, the sample size, and returns the average number of outliers over m independent samples of size xi.

In [14]:
def genOutliers(m: Int, xi: Int): Double =
    sum(Vector.fill(m)(nOutliers(xi))) / m.toDouble // Vector.fill re-runs nOutliers(xi) for each of the m entries
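Because genOutliers averages m random counts, each value it returns carries Monte Carlo noise of roughly sd / sqrt(m), which is why neighbouring entries in the output below are not monotone (for example 0.316 followed by 0.279). A minimal sketch of reporting that standard error alongside the mean, assuming a hypothetical helper `meanWithStdErr` written in plain Scala rather than Breeze:

```scala
import scala.util.Random

// Hypothetical helper: run `trial` m times and return the sample mean together with
// its standard error sqrt(s^2 / m), where s^2 is the (m - 1)-denominator sample variance.
def meanWithStdErr(m: Int)(trial: => Int): (Double, Double) = {
  require(m >= 2, "need at least two trials to estimate a standard error")
  val xs = Vector.fill(m)(trial.toDouble) // `trial` is by-name, so it is re-run for every entry
  val mean = xs.sum / m
  val variance = xs.map(x => (x - mean) * (x - mean)).sum / (m - 1)
  (mean, math.sqrt(variance / m))
}

// Usage with a stand-in trial (a fair coin flip counted as 0 or 1), not the notebook's nOutliers.
val rng = new Random(1L)
val (mean, se) = meanWithStdErr(1000)(rng.nextInt(2))
println(f"mean = $mean%.3f, standard error = $se%.4f")
```

Passing `nOutliers(xi)` as the trial would give each averaged value together with an estimate of how much of its run-to-run variation is simulation noise.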

mapOutliers applies genOutliers to every sample size from mn to mx inclusive, with m iterations per size, and returns the averages in order of sample size.

In [15]:
def mapOutliers(mn: Int, mx: Int, m: Int): Seq[Double] =
    (mn to mx).toList.map(xi => genOutliers(m, xi)) // one average per sample size
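mapOutliers evaluates one sample size at a time, and the only parallelism is `.par` inside getOutliers, applied to sequences of just 20 to 100 elements, where scheduling overhead may cost more than it saves. One alternative worth trying is to run the independent sample sizes concurrently instead. A sketch using standard-library Futures, where `sweepConcurrently` and `avgTailCount` are hypothetical names and each size gets its own seeded `Random`, so the results do not depend on thread scheduling:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.util.Random

// Hypothetical helper: evaluate f for every size concurrently and return the
// results in the same order as `sizes` (Future.sequence preserves order).
def sweepConcurrently(sizes: Range)(f: Int => Double): Seq[Double] = {
  val tasks = sizes.map(n => Future(f(n)))
  Await.result(Future.sequence(tasks), Duration.Inf)
}

// Stand-in per-size computation (not the notebook's genOutliers): average, over 1000
// trials, of how many of n standard normal draws fall outside +/-2.698, the Tukey
// fences of the N(0, 1) population. Seeding by n makes each result reproducible.
def avgTailCount(n: Int): Double = {
  val rng = new Random(n.toLong)
  val trials = 1000
  val total = (1 to trials).map(_ => Seq.fill(n)(rng.nextGaussian()).count(x => math.abs(x) > 2.698)).sum
  total.toDouble / trials
}

val results = sweepConcurrently(20 to 30)(avgTailCount)
println(results.mkString(", "))
```

Giving each task its own generator avoids sharing one `Random` across threads, which would make the output depend on the order in which tasks happen to run.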

time evaluates an expression, prints the elapsed wall-clock time in milliseconds, and returns the expression's result.

In [17]:
def time[A](f: => A): A = {
    val s = System.nanoTime
    val ret = f // f is by-name, so it is evaluated here, inside the timed region
    println("time: " + (System.nanoTime - s) / 1e6 + "ms")
    ret
}
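Because `f` is a by-name parameter (`f: => A`), the argument is not evaluated at the call site but at `val ret = f`, inside the timed region, and exactly once. A small check of that behaviour, using a hypothetical variant `timed` that returns the elapsed milliseconds instead of printing them. A single measurement on the JVM also includes JIT warm-up, so the figure reported below is best read as a rough one.

```scala
// Hypothetical variant of the notebook's `time`: returns (result, elapsed milliseconds).
def timed[A](f: => A): (A, Double) = {
  val start = System.nanoTime
  val result = f // by-name argument evaluated here, exactly once
  (result, (System.nanoTime - start) / 1e6)
}

var evaluations = 0
val (total, elapsedMs) = timed {
  evaluations += 1
  (1L to 1000000L).sum
}
// total is 500000500000 and evaluations is 1; elapsedMs varies from run to run.
println(s"total = $total, evaluations = $evaluations, elapsed = $elapsedMs ms")
```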

In [20]:
time(println(mapOutliers(20,100,1000)))

List(0.2, 0.243, 0.261, 0.301, 0.279, 0.291, 0.314, 0.334, 0.316, 0.279, 0.341, 0.341, 0.384, 0.381, 0.374, 0.383, 0.378, 0.372, 0.379, 0.4, 0.428, 0.408, 0.419, 0.46, 0.462, 0.457, 0.421, 0.46, 0.492, 0.494, 0.473, 0.506, 0.485, 0.495, 0.563, 0.552, 0.535, 0.545, 0.544, 0.578, 0.597, 0.59, 0.566, 0.585, 0.566, 0.627, 0.615, 0.593, 0.6, 0.641, 0.677, 0.631, 0.625, 0.636, 0.664, 0.677, 0.694, 0.734, 0.746, 0.724, 0.769, 0.737, 0.727, 0.687, 0.735, 0.712, 0.724, 0.809, 0.739, 0.799, 0.861, 0.796, 0.769, 0.743, 0.745, 0.815, 0.836, 0.84, 0.841, 0.836, 0.803)
time: 12879.233103ms
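As a rough reference for these numbers, the population version of the rule can be worked out for a standard normal distribution, using the quantile $z_{0.75} \approx 0.6745$ and the standard normal CDF $\Phi$:

```latex
\begin{aligned}
Q_1 &\approx -0.6745, \quad Q_3 \approx 0.6745, \quad \mathrm{IQR} = Q_3 - Q_1 \approx 1.349,\\
Q_3 + 1.5\,\mathrm{IQR} &\approx 0.6745 + 2.0235 = 2.698 \quad (\text{symmetrically } Q_1 - 1.5\,\mathrm{IQR} \approx -2.698),\\
P(\text{outlier}) &= P(|Z| > 2.698) = 2\bigl(1 - \Phi(2.698)\bigr) \approx 0.0070,\\
E[\#\text{outliers} \mid n] &\approx 0.0070\,n.
\end{aligned}
```

This gives about 0.14 for n = 20 and 0.70 for n = 100, while the simulation reports 0.2 and 0.803. The simulated averages are somewhat higher, which is plausible because quartiles estimated from only 20 to 100 points are themselves noisy; the population figure is only a large-n reference.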
