In [1]:
%AddDeps org.vegas-viz vegas_2.11 0.3.11 --transitive

Marking org.vegas-viz:vegas_2.11:0.3.11 for download
Obtained 42 files


# Data Exploration with Functional Programming using Jupyter Notebook, Scala and Vegas
## A Statistical Analysis of the Titanic Dataset

The Titanic survivor dataset records details about the passengers of the shipwreck and whether each of them survived. Using this data, we want to build a model that predicts the probability of a passenger's survival. This is a classification problem: the model maps attributes such as sex, fare and age to the most probable outcome, survived or not survived.

![Titanic](Titanic.jpg)
(Source: https://commons.wikimedia.org/wiki/RMS_Titanic)


The dataset contains the following attributes (for more information, see Kaggle):

| **Variable** | **Definition**                             | **Key**                                        |
|--------------|--------------------------------------------|------------------------------------------------|
| survival     | Survival                                   | 1 = Yes, 0 = No                                |
| pclass       | Ticket class                               | 1 = 1st, 2 = 2nd, 3 = 3rd                      |
| sex          | Sex                                        |                                                |
| age          | Age                                        | Age in years                                   |
| sibsp        | # of siblings / spouses aboard the Titanic |                                                |
| parch        | # of parents / children aboard the Titanic |                                                |
| ticket       | Ticket number                              |                                                |
| fare         | Passenger fare                             |                                                |
| cabin        | Cabin number                               |                                                |
| embarked     | Port of embarkation                        | C = Cherbourg, Q = Queenstown, S = Southampton |
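The notebook stores every passenger as a `Map[String, Any]`. A possible alternative, not used in the rest of this notebook, is a typed record that mirrors the table above and marks attributes that can be missing with `Option`. `Passenger` and `fromMap` are hypothetical names; the keys (`passengerID`, `survived`, ...) and value types (`Int`, `Float`, `String`, `Char`) are assumed to follow the parser defined in the next code cell, which stores `survived = -1` for test rows.

```scala
// Hypothetical typed record mirroring the attribute table.
// Option marks attributes that can be missing in train.csv / test.csv.
case class Passenger(
  passengerId: Int,
  survived: Option[Int],   // None for test rows (the parser stores -1 there)
  pclass: Int,
  name: String,
  sex: String,
  age: Option[Float],
  sibsp: Int,
  parch: Int,
  ticket: String,
  fare: Option[Float],
  cabin: Option[String],
  embarked: Option[Char])

// Converts one Map record (keys as used by the notebook's parser) into a Passenger.
// Assumes the mandatory keys are present; a missing one throws NoSuchElementException.
def fromMap(m: Map[String, Any]): Passenger = Passenger(
  passengerId = m("passengerID").asInstanceOf[Int],
  survived    = m.get("survived").collect { case s: Int if s >= 0 => s },
  pclass      = m("pclass").asInstanceOf[Int],
  name        = m("name").toString,
  sex         = m("sex").toString,
  age         = m.get("age").collect { case a: Float => a },
  sibsp       = m("sibsp").asInstanceOf[Int],
  parch       = m("parch").asInstanceOf[Int],
  ticket      = m("ticket").toString,
  fare        = m.get("fare").collect { case f: Float => f },
  cabin       = m.get("cabin").map(_.toString),
  embarked    = m.get("embarked").collect { case c: Char => c })

// The first training passenger as printed further down (no cabin recorded)
val braund = fromMap(Map(
  "passengerID" -> 1, "survived" -> 0, "pclass" -> 3,
  "name" -> "Braund, Mr. Owen Harris", "sex" -> "male", "age" -> 22.0f,
  "sibsp" -> 1, "parch" -> 0, "ticket" -> "A/5 21171", "fare" -> 7.25f,
  "embarked" -> 'S'))
```

With a case class, a missing cabin becomes `None` in the type instead of an absent key, so the compiler forces every consumer to handle it.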


The dataset is split into three files:
* A training set (train.csv)
* A test set (test.csv)
* A file with sample data for the submission (gender_submission.csv)

First, we load the data, storing each passenger record as a map from property name (key) to value.
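The loader in the next cell relies on Scala `Regex` values used as extractors inside `match`: a row that matches the whole pattern binds one variable per capture group, and any other row falls through to the next case. A minimal sketch with the simplest of the three patterns (the one for `gender_submission.csv`); the helper name `parseSubmissionRow` is hypothetical.

```scala
// Same pattern as DATA_ACCESS_PATTERN_surv: "<passengerId>,<survived>"
val SurvPattern = """(\d+),(\d)""".r

// Hypothetical helper: turns one row into a Map, or None if the row does not match
def parseSubmissionRow(line: String): Option[Map[String, Any]] = line match {
  // the pattern must match the entire line; each group binds to one variable
  case SurvPattern(id, s) => Some(Map("passengerID" -> id.toInt, "survived" -> s.toInt))
  case _                  => None // e.g. the header line "PassengerId,Survived"
}

// Options of the non-matching rows disappear when flattened
val rows = List("PassengerId,Survived", "892,0", "893,1").map(parseSubmissionRow).flatten
```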

In [2]:
import vegas._
import vegas.data.External._
implicit val render = vegas.render.ShowHTML(kernel.display.content("text/html", _))
import java.io.PrintWriter

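// Regular expressions for one CSV row of test.csv, train.csv and gender_submission.csv
// (test rows lack the "survived" column; gender_submission rows only hold id and survival)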
val DATA_ACCESS_PATTERN_test = """(\d+),(\d),"(.+)",(male|female),([0-9]*\.[0-9]+|[0-9]+|d*),(\d*),(\d*),(.*),([0-9]*\.[0-9]+|[0-9]+|d*),(.*),(\w*)""".r
val DATA_ACCESS_PATTERN_train=  """(\d+),(\d),(\d),"(.+)",(male|female),([0-9]*\.[0-9]+|[0-9]+|d*),(\d*),(\d*),(.*),([0-9]*\.[0-9]+|[0-9]+|d*),(.*),(\w*)""".r
val DATA_ACCESS_PATTERN_surv= """(\d+),(\d)""".r

// Reads a CSV file and stores every parsable row as a map of
// property name (key) -> value; rows that cannot be parsed are dropped
def loadDataCSV(filename:String):List[Map[String,Any]]= {
  val src = scala.io.Source.fromFile(filename)
  try {
    src.getLines()
      .drop(1)        // skip the header line
      .map(readData)  // Option[Map[String,Any]] per row
      .toList
      .flatten        // keep only the successfully parsed rows
  } finally src.close()  // release the file even if parsing fails
}
  

// Extracting all information storing it into a Map[String,Any]
def readData(line:String):Option[Map[String,Any]]={
    
    // Parsing helpers: Some((key, value)) for a valid field, None for an empty or invalid one
    def toInteger(key:String, s:String):Option[(String,Int)] =
      try Some((key, s.toInt)) catch { case _: NumberFormatException => None }
    
    def toFloat(key:String, s:String):Option[(String,Float)] =
      try Some((key, s.toFloat)) catch { case _: NumberFormatException => None }
    
    def toStr(key:String, s:String):Option[(String,String)]=
        if (s!="") Some((key,s)) else None

    def createPassengerMap(t1:String,t2:String,t3:String,t4:String,t5:String,t6:String,t7:String,
                           t8:String,t9:String,t10:String,t11:String,t12:String):Option[Map[String,Any]]={
        
        val l=List(
            toInteger("passengerID",t1),
            toInteger("survived",t2),
            toInteger("pclass",t3),
            toStr("name",t4),
            toStr("sex",t5),
            toFloat("age",t6),
            toInteger("sibsp",t7),
            toInteger("parch",t8),
            toStr("ticket",t9),
            toFloat("fare",t10),
            toStr("cabin",t11),
            {if (t12.length>0) Some(("embarked",t12(0))) else None})
         Some(l.flatten.toMap)  // drop the missing fields
    }
    
    val result = line match{
       case DATA_ACCESS_PATTERN_test(t1,t3,t4,t5,t6,t7,t8,t9,t10,t11,t12) => 
                 createPassengerMap(t1,"-1",t3,t4,t5,t6,t7,t8,t9,t10,t11,t12)
       
       case DATA_ACCESS_PATTERN_train(t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11,t12) => {
                  createPassengerMap(t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11,t12)
       }
       
       case DATA_ACCESS_PATTERN_surv (t1,t2) => {
            val t= (toInteger("passengerID",t1),toInteger("survived",t2))
            t match {
                case (Some(p),Some(s)) => Some(List(p,s).toMap)
                case _ => None
            }
       }
       case _ => println("None:"+line);None
     }
     result
}

// Method for printing a passenger in a readable manner
def printPassenger(p:Map[String,Any]):Unit={
    
    println("\n---------------------------------------------------------------------")
    println("passengerID:"+p.getOrElse("passengerID",-1))
    println("survived:"+p.getOrElse("survived",-1))
    println("pclass:"+p.getOrElse("pclass",-1))
    println("name:"+p.getOrElse("name","-"))
    println("sex:"+p.getOrElse("sex","-"))
    println("age:"+p.getOrElse("age",-1))
    println("sibsp:"+p.getOrElse("sibsp",-1))
    println("parch:"+p.getOrElse("parch",-1))
    println("ticket:"+p.getOrElse("ticket","-"))
    println("fare:"+p.getOrElse("fare",-1))
    println("cabin:"+p.getOrElse("cabin",-1))
    println("embarked:"+p.getOrElse("embarked",'-'))
    println("---------------------------------------------------------------------\n")
}

// Applies a model to every test record; the model maps (record, idKey) to (id, predicted class).
// Note: this definition sometimes produces a "missing argument list" error in the cell output; it can be ignored.
def applyModel[CLASS,ID](model:(Map[String,Any],String)=> (ID,CLASS), 
            testdata: Seq[Map[String,Any]], idKey:String):Seq[(ID,CLASS)]= {
    
    testdata.map(d => model(d,idKey))
}  

// Writes a submission file: the header line, then one "id,class" line per prediction
def createSubmitFile[ID,CLASS](filename:String, data:Seq[(ID,CLASS)],header:String):Unit= {
    
    val pw = new PrintWriter(filename)
    pw.println(header)
    data.foreach(e=>pw.println(e._1.toString+","+e._2.toString))
    pw.close()
}


render = <function1>
DATA_ACCESS_PATTERN_test = (\d+),(\d),"(.+)",(male|female),([0-9]*\.[0-9]+|[0-9]+|d*),(\d*),(\d*),(.*),([0-9]*\.[0-9]+|[0-9]+|d*),(.*),(\w*)
DATA_ACCESS_PATTERN_train = (\d+),(\d),(\d),"(.+)",(male|female),([0-9]*\.[0-9]+|[0-9]+|d*),(\d*),(\d*),(.*),([0-9]*\.[0-9]+|[0-9]+|d*),(.*),(\w*)
DATA_ACCESS_PATTERN_surv = (\d+),(\d)


loadDataCSV: (filename: String)List[Map[String,Any]]
readData: (line: String)Option[Map[String,Any]]
printPassenger: (p: Map[String,Any])Unit
applyModel: [CLASS, ID](model: (Map[String,Any], String) => (ID, CLASS), testdata: Seq[Map[String,Any]], idKey: String)Seq[(ID, CLASS...
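`applyModel` and `createSubmitFile` are defined above but not used in the cells that follow. A minimal sketch of how they might be combined: a model is any function `(record, idKey) => (id, predicted class)`. `genderModel` is a hypothetical baseline that predicts survival for women only, and the header `PassengerId,Survived` is an assumption about the expected submission format. Both helpers are repeated so that the block runs on its own.

```scala
import java.io.{File, PrintWriter}

// Copies of the two helpers defined above
def applyModel[CLASS, ID](model: (Map[String, Any], String) => (ID, CLASS),
                          testdata: Seq[Map[String, Any]], idKey: String): Seq[(ID, CLASS)] =
  testdata.map(d => model(d, idKey))

def createSubmitFile[ID, CLASS](filename: String, data: Seq[(ID, CLASS)], header: String): Unit = {
  val pw = new PrintWriter(filename)
  pw.println(header)
  data.foreach(e => pw.println(e._1.toString + "," + e._2.toString))
  pw.close()
}

// Hypothetical baseline model: predict 1 (survived) for women, 0 for everyone else
val genderModel: (Map[String, Any], String) => (Int, Int) = (p, idKey) =>
  (p(idKey).asInstanceOf[Int], if (p.get("sex").contains("female")) 1 else 0)

// Two hypothetical test records
val testRecords = Seq(
  Map[String, Any]("passengerID" -> 892, "sex" -> "male"),
  Map[String, Any]("passengerID" -> 893, "sex" -> "female"))

val predictions = applyModel(genderModel, testRecords, "passengerID")

// Write the submission to a temporary file and read it back
val out = File.createTempFile("submission", ".csv")
createSubmitFile(out.getPath, predictions, "PassengerId,Survived")
val src = scala.io.Source.fromFile(out)
val lines = try src.getLines().toList finally src.close()
```

Declaring the model as a function value avoids relying on eta-expansion of a method, which is what the "missing argument list" message above complains about.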



In [3]:
val train= loadDataCSV("train.csv")
val test= loadDataCSV("test.csv")
val all= train ++ test
  
println("Train Dataset:"+ train.size+" Elements")
println("Test Dataset:"+ test.size+" Elements")
println("whole Dataset:"+ all.size+" Elements")


Train Dataset:891 Elements
Test Dataset:418 Elements
whole Dataset:1309 Elements


train = List(Map(name -> Braund, Mr. Owen Harris, fare -> 7.25, parch -> 0, age -> 22.0, ticket -> A/5 21171, sex -> male, passengerID -> 1, pclass -> 3, sibsp -> 1, embarked -> S, survived -> 0), Map(name -> Cumings, Mrs. John Bradley (Florence Briggs Thayer), fare -> 71.2833, parch -> 0, age -> 38.0, ticket -> PC 17599, cabin -> C85, sex -> female, passengerID -> 2, pclass -> 1, sibsp -> 1, embarked -> C, survived -> 1), Map(name -> Heikkinen, Miss. Laina, fare -> 7.925, parch -> 0, age -> 26.0, ticket -> STON/O2. 3101282, sex -> female, passengerID -> 3, pclass -> 3, sibsp -> 0, embarked -> S, survived -> 1), Map(name -> Futrelle, Mrs. Jacques Heath (Lily May Peel), fare -> 53.1, parch -> 0, age -> 35.0, ticket -> 113803, cabin -> C123, sex -> female, passenger...



Now we can examine a small sample of the dataset, the first two passengers:

In [4]:
all.take(2).foreach(printPassenger)


---------------------------------------------------------------------
passengerID:1
survived:0
pclass:3
name:Braund, Mr. Owen Harris
sex:male
age:22.0
sibsp:1
parch:0
ticket:A/5 21171
fare:7.25
cabin:-1
embarked:S
---------------------------------------------------------------------


---------------------------------------------------------------------
passengerID:2
survived:1
pclass:1
name:Cumings, Mrs. John Bradley (Florence Briggs Thayer)
sex:female
age:38.0
sibsp:1
parch:0
ticket:PC 17599
fare:71.2833
cabin:C85
embarked:C
---------------------------------------------------------------------



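Next, we count the missing values of each attribute in the training and the test set. A missing value is simply a key that is absent from a passenger's map.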
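Because missing values are absent keys, the same idea also gives the share of missing values per attribute, which is easier to compare across sets of different size. A minimal sketch on a hypothetical three-passenger sample; `missingRatio` is an illustrative helper, not part of the notebook. For the real training set, the counts checked in the next cell (687 missing cabins and 177 missing ages out of 891 passengers) correspond to roughly 77% and 20%.

```scala
// Hypothetical sample: unknown attributes are simply absent from the map
val sparse: List[Map[String, Any]] = List(
  Map("passengerID" -> 1, "age" -> 22.0f, "cabin" -> "C85"),
  Map("passengerID" -> 2, "age" -> 38.0f),
  Map("passengerID" -> 3))

// Share of records that lack the key `att`
def missingRatio(data: List[Map[String, Any]], att: String): Double =
  data.count(p => !p.contains(att)).toDouble / data.size

val ratios: Map[String, Double] =
  List("passengerID", "age", "cabin").map(att => (att, missingRatio(sparse, att))).toMap
```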

In [5]:
val attList= List("passengerID","pclass","survived","name","sex","age","sibsp","parch",
        "ticket","fare","cabin","embarked")

  
// For every attribute, count the passengers whose map lacks that key
def countAllMissingValues(data:List[Map[String,Any]],attList:List[String]):Map[String,Int]=
    attList.map(att => (att, data.count(p => !p.contains(att)))).toMap

val train_mv= countAllMissingValues(train,attList)
val test_mv= countAllMissingValues(test,attList)
assert(train_mv("cabin")== 687 && train_mv("age")==177 && train_mv("embarked")== 2)
assert(test_mv("cabin")== 327 && test_mv("age")==86 && test_mv("fare")== 1)

attList = List(passengerID, pclass, survived, name, sex, age, sibsp, parch, ticket, fare, cabin, embarked)
train_mv = Map(name -> 0, fare -> 0, parch -> 0, age -> 177, ticket -> 0, cabin -> 687, sex -> 0, passengerID -> 0, pclass -> 0, sibsp -> 0, embarked -> 2, survived -> 0)
test_mv = Map(name -> 0, fare -> 1, parch -> 0, age -> 86, ticket -> 0, cabin -> 327, sex -> 0, passengerID -> 0, pclass -> 0, sibsp -> 0, embarked -> 0, survived -> 0)


countAllMissingValues: (data: List[Map[String,Any]], attList: List[String])Map[String,Int]



In [6]:
// Survival per family, approximated by the surname:
// the part of "name" before the first ", " (e.g. "Braund, Mr. Owen Harris" -> "Braund")
val train_fun = train.map { x =>
    val surname = x("name").toString.split(", ")(0)
    x.updated("name", surname)
}


Vegas("Passengers split by family").
    withData(train_fun).
    mark(Bar).
    addTransform("survival", "datum.survived == 0 ? \"No\" : \"Yes\"").
    encodeY("passengerID", Quantitative,AggOps.Count,axis=Axis(title="Passengers")).
    encodeX("name", Ordinal, sortField=Sort("survival", AggOps.Count, SortOrder.Desc)).
    encodeColor("survival", Nominal, scale=Scale(rangeNominals=List("#EA98D2", "#659CCA"))).
    show



train_fun = List(Map(name -> Braund, fare -> 7.25, parch -> 0, age -> 22.0, ticket -> A/5 21171, sex -> male, passengerID -> 1, pclass -> 3, sibsp -> 1, embarked -> S, survived -> 0), Map(name -> Cumings, fare -> 71.2833, parch -> 0, age -> 38.0, ticket -> PC 17599, cabin -> C85, sex -> female, passengerID -> 2, pclass -> 1, sibsp -> 1, embarked -> C, survived -> 1), Map(name -> Heikkinen, fare -> 7.925, parch -> 0, age -> 26.0, ticket -> STON/O2. 3101282, sex -> female, passengerID -> 3, pclass -> 3, sibsp -> 0, embarked -> S, survived -> 1), Map(name -> Futrelle, fare -> 53.1, parch -> 0, age -> 35.0, ticket -> 113803, cabin -> C123, sex -> female, passengerID -> 4, pclass -> 1, sibsp -> 1, embarked -> S, survived -> 1), Map(name -> Al...
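The chart counts survivors per surname. The same aggregation can be computed directly with `groupBy`, which is handy for checking the chart or for ranking families by survival. A sketch on hypothetical records (the names are invented); `surname` applies the same split on `", "` as the cell above.

```scala
// Hypothetical records in the notebook's Map representation (names are invented)
val familySample: List[Map[String, Any]] = List(
  Map("name" -> "Smith, Mr. John", "survived" -> 0),
  Map("name" -> "Smith, Mrs. Jane", "survived" -> 1),
  Map("name" -> "Doe, Mr. Richard", "survived" -> 0))

// Surname = everything before the first ", " of the name field
def surname(p: Map[String, Any]): String = p("name").toString.split(", ")(0)

// (passengers, survivors) per surname
val perFamily: Map[String, (Int, Int)] =
  familySample.groupBy(surname).map { case (family, members) =>
    (family, (members.size, members.count(_("survived") == 1)))
  }
```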



In [7]:
val passengers= train.size
val survivedPass = train.count(_("survived") == 1)
val rate = survivedPass.toDouble / passengers
println("probability of surviving:" + rate)

Vegas("Passengers classified by survival" ).
    withData(train).
    mark(Bar).
    addTransform("survival", "datum.survived == 0 ? \"Dead\" : \"Alive\"").
    encodeX("survival", Ordinal,axis=Axis(title="Survival")).
    encodeY("passengerID", Quantitative,AggOps.Count,axis=Axis(title="Passengers")).show

probability of surviving:0.3838383838383838


passengers = 891
survivedPass = 342
rate = 0.3838383838383838



In [8]:
Vegas("Survival split by sex").
      withData(train).
      mark(Bar).
      addTransform("survival", "datum.survived == 0 ? \"No\" : \"Yes\"").
      encodeY("passengerID",Quantitative, AggOps.Count, axis=Axis(title="Passengers")).
      encodeX("sex", Ord).
      encodeColor("survival", Nominal, scale=Scale(rangeNominals=List("#EA98D2", "#659CCA"))).
      show

In [9]:
Vegas("Survival split by sex (normalized)").
      withData(train).
      mark(Bar).
      addTransform("survival", "datum.survived == 0 ? \"No\" : \"Yes\"").
      encodeY("passengerID",Quantitative, AggOps.Count, axis=Axis(title="Passengers")).
      encodeX("sex", Ord).
      encodeColor("survival", Nominal, scale=Scale(rangeNominals=List("#EA98D2", "#659CCA"))).
      configMark(stacked = StackOffset.Normalize).
      show
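The normalized chart shows, for each sex, the share of passengers who survived. The same proportions can be computed numerically with `groupBy`. A sketch with a hypothetical helper `survivalRateBy` and made-up records; on the real data it would be called as `survivalRateBy(train, "sex")`.

```scala
// Share of survivors (survived == 1) within each value of attribute `key`;
// records lacking the attribute are grouped under "unknown"
def survivalRateBy(data: List[Map[String, Any]], key: String): Map[Any, Double] =
  data.groupBy(_.getOrElse(key, "unknown")).map { case (group, members) =>
    (group, members.count(_("survived") == 1).toDouble / members.size)
  }

// Hypothetical records
val rateSample: List[Map[String, Any]] = List(
  Map("sex" -> "female", "survived" -> 1),
  Map("sex" -> "female", "survived" -> 1),
  Map("sex" -> "female", "survived" -> 0),
  Map("sex" -> "male", "survived" -> 0))

val bySex = survivalRateBy(rateSample, "sex")
```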

In [151]:

val train_delay = List[Map[String,Any]](
    
Map("day" -> "weekday", "season" -> "spring", "wind" -> "none", "rain" -> "none", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "winter", "wind" -> "none", "rain" -> "slight", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "winter", "wind" -> "none", "rain" -> "slight", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "winter", "wind" -> "high", "rain" -> "heavy", "class" -> "late"),
Map("day" -> "saturday", "season" -> "summer", "wind" -> "normal", "rain" -> "none", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "autumn", "wind" -> "normal", "rain" -> "none", "class" -> "very late"),
Map("day" -> "holiday", "season" -> "summer", "wind" -> "high", "rain" -> "slight", "class" -> "on time"),
Map("day" -> "sunday", "season" -> "summer", "wind" -> "normal", "rain" -> "none", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "winter", "wind" -> "high", "rain" -> "heavy", "class" -> "very late"),
Map("day" -> "weekday", "season" -> "summer", "wind" -> "none", "rain" -> "slight", "class" -> "on time"),
Map("day" -> "saturday", "season" -> "spring", "wind" -> "high", "rain" -> "heavy", "class" -> "cancelled"),
Map("day" -> "weekday", "season" -> "summer", "wind" -> "high", "rain" -> "slight", "class" -> "on time"),
Map("day" -> "saturday", "season" -> "winter", "wind" -> "normal", "rain" -> "none", "class" -> "late"),
Map("day" -> "weekday", "season" -> "summer", "wind" -> "high", "rain" -> "none", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "winter", "wind" -> "normal", "rain" -> "heavy", "class" -> "very late"),
Map("day" -> "saturday", "season" -> "autumn", "wind" -> "high", "rain" -> "slight", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "autumn", "wind" -> "none", "rain" -> "heavy", "class" -> "on time"),
Map("day" -> "holiday", "season" -> "spring", "wind" -> "normal", "rain" -> "slight", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "spring", "wind" -> "normal", "rain" -> "none", "class" -> "on time"),
Map("day" -> "weekday", "season" -> "spring", "wind" -> "normal", "rain" -> "slight", "class" -> "on time")
    
)



train_delay = List(Map(season -> spring, rain -> none, wind -> none, class -> on time, day -> weekday), Map(season -> winter, rain -> slight, wind -> none, class -> on time, day -> weekday), Map(season -> winter, rain -> slight, wind -> none, class -> on time, day -> weekday), Map(season -> winter, rain -> heavy, wind -> high, class -> late, day -> weekday), Map(season -> summer, rain -> none, wind -> normal, class -> on time, day -> saturday), Map(season -> autumn, rain -> none, wind -> normal, class -> very late, day -> weekday), Map(season -> summer, rain -> slight, wind -> high, class -> on time, day -> holiday), Map(season -> summer, rain -> none, wind -> normal, class -> on time, day -> sunday), Map(season -> winter, rain -> heavy, wind -> high, class -> ver...
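The following cells implement a naive Bayes classifier on this small dataset of train journeys. As a reference for the code, the decision rule picks the class with the largest product of the prior and the per-attribute conditional probabilities, each estimated by relative frequency from the counts returned by `instance_occ` and `attribute_occ` ($N$ is `totalNumberOfInstances`):

$$
\hat{c} = \arg\max_{c}\; P(c)\prod_{i=1}^{n} P(a_i = v_i \mid c),
\qquad
P(c) = \frac{\mathrm{count}(c)}{N},
\qquad
P(a_i = v_i \mid c) = \frac{\mathrm{count}(a_i = v_i,\, c)}{\mathrm{count}(c)}
$$

As a worked check for the query used later (day = weekday, season = winter, wind = high, rain = heavy) and the class *late*, which covers 2 of the 20 journeys (a weekday in winter with high wind and heavy rain, and a Saturday in winter with normal wind and no rain):

$$
P(\text{late})\prod_{i} P(a_i = v_i \mid \text{late}) = \frac{2}{20}\cdot\frac{1}{2}\cdot\frac{2}{2}\cdot\frac{1}{2}\cdot\frac{1}{2} = 0.0125
$$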



In [493]:
// Implementation of the naive Bayes algorithm

// name of the class attribute
val className = "class"

// the distinct class labels (a list of labels, not a count, despite the name)
val numberOfClasses = train_delay.map(x => x(className)).distinct

// the total size of the dataset
val totalNumberOfInstances = train_delay.size.toFloat

// number of instances whose class is c
def instance_occ(c: Any): Float = train_delay.count(x => x(className) == c).toFloat

// number of instances whose attribute `inst` has the value a and whose class is c
def attribute_occ(inst: String, a: Any, c: Any): Float =
    train_delay.count(x => x(inst) == a && x(className) == c).toFloat

// all attribute names except the class, e.g. List(season, rain, wind, day)
val atts: List[String] = train_delay.head.keys.toList.filter(_ != className)

// every attribute with its distinct values, e.g. List((season,List(spring, winter, summer, autumn)), ...)
val allAtts: List[(String, List[Any])] =
    atts.map(att => (att, train_delay.map(y => y(att)).distinct))



className = class
numberOfClasses = List(on time, late, very late, cancelled)
totalNumberOfInstances = 20.0
atts = List(season, rain, wind, day)
allAtts = List((season,List(spring, winter, summer, autumn)), (rain,List(none, slight, heavy)), (wind,List(none, high, normal)), (day,List(weekday, saturday, holiday, sunday)))


instance_occ: (c: Any)Float
attribute_occ: (inst: String, a: Any, c: Any)Float



In [537]:
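// Training: for every class c, estimate the prior P(c) and, per attribute,
// the conditional probabilities P(attribute = value | c) by relative frequency
val train_dataset = numberOfClasses.map { c =>
    val classCount = instance_occ(c)
    val prior = classCount / totalNumberOfInstances
    val likelihoods = allAtts.map { case (att, values) =>
        (att, values.map(v => (v, attribute_occ(att, v, c) / classCount)).toMap)
    }
    ((c, prior), likelihoods)
}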





train_dataset = List(((on time,0.7),List((season,Map(spring -> 0.2857143, winter -> 0.14285715, summer -> 0.42857143, autumn -> 0.14285715)), (rain,Map(none -> 0.35714287, slight -> 0.5714286, heavy -> 0.071428575)), (wind,Map(none -> 0.35714287, high -> 0.2857143, normal -> 0.35714287)), (day,Map(weekday -> 0.64285713, saturday -> 0.14285715, holiday -> 0.14285715, sunday -> 0.071428575)))), ((late,0.1),List((season,Map(spring -> 0.0, winter -> 1.0, summer -> 0.0, autumn -> 0.0)), (rain,Map(none -> 0.5, slight -> 0.0, heavy -> 0.5)), (wind,Map(none -> 0.0, high -> 0.5, normal -> 0.5)), (day,Map(weekday -> 0.5, saturday -> 0.5, holiday -> 0.0, sunday -> 0.0)))), ((very late,0.15),List((season,Map(spring -> ...
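A quick sanity check for a trained model of this shape: the class priors should sum to 1, and for every class each attribute's conditional distribution should sum to 1 as well. A sketch with hypothetical helpers `priorsSumToOne` and `conditionalsSumToOne`; the `late` entry is copied by hand from the output above, and the cancelled prior (1/20) is derived from the single cancelled journey in the dataset.

```scala
// Same shape as train_dataset: ((class, prior), List((attribute, Map(value -> P(value | class)))))
type NBModel = List[((Any, Float), List[(String, Map[Any, Float])])]

val tolerance = 1e-5f

// Priors P(c) over all classes should sum to 1
def priorsSumToOne(model: NBModel): Boolean =
  math.abs(model.map(_._1._2).sum - 1.0f) < tolerance

// For every class and attribute, P(value | c) over all values should sum to 1
def conditionalsSumToOne(model: NBModel): Boolean =
  model.forall { case (_, likelihoods) =>
    likelihoods.forall { case (_, probs) => math.abs(probs.values.sum - 1.0f) < tolerance }
  }

// The "late" entry as printed by the training cell
val lateEntry: ((Any, Float), List[(String, Map[Any, Float])]) =
  (("late", 0.1f), List(
    ("season", Map[Any, Float]("spring" -> 0.0f, "winter" -> 1.0f, "summer" -> 0.0f, "autumn" -> 0.0f)),
    ("rain",   Map[Any, Float]("none" -> 0.5f, "slight" -> 0.0f, "heavy" -> 0.5f)),
    ("wind",   Map[Any, Float]("none" -> 0.0f, "high" -> 0.5f, "normal" -> 0.5f)),
    ("day",    Map[Any, Float]("weekday" -> 0.5f, "saturday" -> 0.5f, "holiday" -> 0.0f, "sunday" -> 0.0f))))

// Priors only (likelihoods omitted): 0.7, 0.1 and 0.15 as printed, cancelled = 1/20
val priorsOnly: NBModel = List(
  (("on time", 0.7f), Nil), (("late", 0.1f), Nil),
  (("very late", 0.15f), Nil), (("cancelled", 0.05f), Nil))

// A deliberately broken entry whose only distribution sums to 0.3
val unbalanced: NBModel = List((("x", 1.0f), List(("a", Map[Any, Float]("u" -> 0.3f)))))
```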



In [802]:
// Classification: score(c) = P(c) * product of P(value | c) over the queried values;
// the class with the highest score is the prediction

// the unseen instance: day = weekday, season = winter, wind = high, rain = heavy
val prsPro = List("weekday", "winter", "high", "heavy")

val res = train_dataset.map { case ((c, prior), likelihoods) =>
    val score = likelihoods.map { case (_, probs) =>
        // values are looked up by name only; this is unambiguous for this query
        // because none of the queried values occurs under two attributes
        probs.collect { case (value, p) if prsPro.contains(value) => p }.product
    }.product
    (c, prior * score)
}

prsPro = List(weekday, winter, high, heavy)
res = List((on time,0.0013119535), (late,0.0125), (very late,0.022222225), (cancelled,0.0))
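The scores in `res` are unnormalized: they are proportional to the posterior probabilities but do not sum to 1. The predicted class is the one with the largest score, and dividing each score by their sum yields the posteriors. A sketch using the four scores printed above, copied by hand so the block runs on its own. Note that `cancelled` scores exactly 0 because its only training journey is a Saturday in spring, so both P(day = weekday | cancelled) and P(season = winter | cancelled) are 0; a single zero count wipes out the whole product, which is why naive Bayes is often combined with Laplace smoothing.

```scala
// Unnormalized naive Bayes scores as printed by the classification cell
val scores: List[(Any, Float)] = List(
  ("on time", 0.0013119535f), ("late", 0.0125f),
  ("very late", 0.022222225f), ("cancelled", 0.0f))

// Predicted class: the one with the highest score
val predicted: Any = scores.maxBy(_._2)._1

// Posterior P(c | instance): each score divided by the sum of all scores
val total = scores.map(_._2).sum
val posteriors: Map[Any, Float] = scores.map { case (c, s) => (c, s / total) }.toMap
```

For the query (weekday, winter, high, heavy) this predicts "very late" with a posterior of roughly 0.62, followed by "late" at about 0.35.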