# Scala's Collection Library

In [1]:
import $ivy.`org.scalatest::scalatest:3.0.8`
import _root_.org.scalatest._

import $ivy.$
import _root_.org.scalatest._

### References

A gentle introduction to Scala's collection library can be found on the Scala site:

https://docs.scala-lang.org/overviews/collections-2.13/introduction.html

[__Programming in Scala: A Comprehensive Step-by-Step Guide__](https://www.artima.com/shop/programming_in_scala_3ed), Third Edition, by Martin Odersky, Lex Spoon, and Bill Venners.

- Chapter 11. Scala's Hierarchy 
- Chapter 17. Working with Other Collections
- Chapter 24. Collections in Depth
- Chapter 25. The Architecture of Scala Collections

[__Functional programming in Scala__](https://www.manning.com/books/functional-programming-in-scala), by Paul Chiusano and Runar Bjarnason.

- Chapter 3. Functional data structures

[__Functional programming simplified__](https://alvinalexander.com/downloads/fpsimplified-free-preview.pdf), by Alvin Alexander.

- Chapters 29-36. Recursion.

## Collection types

Collection types can be classified along two major dimensions:
- Mutable/Immutable: mutable collections can be modified in place; operations that "update" an immutable collection return a new instance and leave the original untouched.
- Sequences/Sets/Maps: ordered collections, unordered collections of unique elements, and key-value collections, respectively.
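
As a minimal sketch of the first distinction (the value names are illustrative and are not used elsewhere in this notebook), compare how an immutable and a mutable set react when an element is added:

```scala
import scala.collection.mutable

// Immutable: `+` returns a new set; the original is left untouched.
val frozen: Set[Int] = Set(1, 2)
val extended: Set[Int] = frozen + 3

// Mutable: `+=` modifies the very same instance in place.
val buffer: mutable.Set[Int] = mutable.Set(1, 2)
buffer += 3
```

After running this, `frozen` still holds two elements, whereas `buffer` holds three.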

There are many different implementations of sequences, sets and maps, as summarized by these figures (cf. Scala's [guide](https://docs.scala-lang.org/overviews/collections-2.13/overview.html))

![](images/collections-hierarchy.svg)

![](images/collections-hierarchy-inmutable.svg)

![](images/collections-mutable.svg)

How do we choose from this huge number of collection types and implementations? The collection type is chosen according to the functionality that we demand from our collection: should it be _ordered_? Can there be repeated elements? Are elements associated with keys? To choose the right implementation of that collection type (e.g. should I use a `Vector` or a `List`?), we must take into account the complexity of the operations that we need most, as documented here:

https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html
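
To give a feeling for what that table means in practice, here is a small sketch (with illustrative names) contrasting two operations on `List` and `Vector`. According to the performance page above, prepending to a `List` takes constant time while indexed access is linear; a `Vector` offers effectively constant time for indexed access and for appending.

```scala
val xs: List[Int] = List(1, 2, 3)
val ys: Vector[Int] = Vector(1, 2, 3)

// Prepending to a List is constant time: the new cell simply points to `xs`.
val xs2: List[Int] = 0 :: xs

// Indexed access and append are effectively constant time on a Vector;
// on a List, both would require traversing the list.
val last: Int = ys(2)
val ys2: Vector[Int] = ys :+ 4
```

So a `List` is a natural fit when we mostly add and remove at the head, and a `Vector` when we need random access or appends.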

From now on, we will work with immutable collection types only.

## `Map`s of key-value pairs

`Map`s are key-value collections (i.e. they are like sets of key-value pairs indexed by their keys).

In [None]:
val m: Map[String, Int] = 
    Map(("a", 5), ("b", 10), ("c", 11))

For key-value pairs in particular, we commonly write `Tuple2` values with the `->` syntax:

In [2]:
val m: Map[String, Int] = 
    Map("a" -> 5, "b" -> 10, "c" -> 11)

m: Map[String, Int] = Map("a" -> 5, "b" -> 10, "c" -> 11)

Common operations on maps: 

In [3]:
// retrieving values of existing and non-existing keys

m("a"): Int
m.get("a"): Option[Int]
try{ m("d") } catch { case _: NoSuchElementException => "exception raised" }
m.get("d")

res3_0: Int = 5
res3_1: Option[Int] = Some(value = 5)
res3_2: Any = "exception raised"
res3_3: Option[Int] = None

In [4]:
// retrieving all keys and values

m.keys: Iterable[String]
m.values: Iterable[Int]

res4_0: Iterable[String] = Set("a", "b", "c")
res4_1: Iterable[Int] = Iterable(5, 10, 11)

By default, these `Iterable`s are implemented by a `collection.immutable.Set` (the keys) and by a general, `View`-like type (the values). Views are _lazy_ collections: transformations such as `map` or `filter` are not executed until the view is traversed or converted, which avoids building intermediate collections. For more information on views consult the Scala [guide](https://docs.scala-lang.org/overviews/collections-2.13/views.html). The only thing that we need from views and the general `Iterable` collection type is that they can be converted to a concrete collection type with conversion methods such as `toList`, `toSet`, `toMap`, etc.
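
The laziness of views can be observed with a small sketch. The counter `calls` and the helper `twice` are hypothetical names introduced only to record how often the mapping function runs:

```scala
// Counts how many times the mapping function is actually invoked.
var calls = 0
def twice(n: Int): Int = { calls += 1; 2 * n }

// Strict: `map` transforms the whole list before `take` runs.
val strict: List[Int] = List(1, 2, 3, 4, 5).map(twice).take(2)
val strictCalls: Int = calls

// Lazy: the view only computes what `take(2)` demands once `toList` forces it.
calls = 0
val viaView: List[Int] = List(1, 2, 3, 4, 5).view.map(twice).take(2).toList
val lazyCalls: Int = calls
```

Both pipelines produce the same list, but the strict one calls `twice` for every element, while the view only calls it for the two elements that are actually needed.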

In [5]:
// Converting to proper implementation types
m.keys.toList
m.keys.toSet
m.values.toList
m.values.toSet

res5_0: List[String] = List("a", "b", "c")
res5_1: Set[String] = Set("a", "b", "c")
res5_2: List[Int] = List(5, 10, 11)
res5_3: Set[Int] = Set(5, 10, 11)

We can also convert optional values to collection types: 

In [6]:
Some(1).toSet
Some(0).toList
None.toSet
None.toList

res6_0: Set[Int] = Set(1)
res6_1: List[Int] = List(0)
res6_2: Set[Nothing] = Set()
res6_3: List[Nothing] = List()

In [9]:
// Mapping values (`view` yields a lazy `MapView`; `toMap` converts it back to a `Map`).
// Calling `mapValues` directly on a `Map` is deprecated since 2.13.

val m2: Map[String, Boolean] = 
    m.view.mapValues((value: Int) => value % 2 == 0).toMap

m2: Map[String, Boolean] = Map("a" -> false, "b" -> true, "c" -> false)

In [10]:
// Mapping whole entries, not only values

m.map((entry: (String, Int)) => entry._2 % 2 == 0)
m.map{ case (key: String, value: Int) => value % 2 == 0 }

res10_0: collection.immutable.Iterable[Boolean] = List(false, true, false)
res10_1: collection.immutable.Iterable[Boolean] = List(false, true, false)

In [15]:
// Unlike `map` above, `filter` preserves the collection type: the result is still a `Map`
m.filter{ case (key: String, value: Int) => value > 10 }

res15: Map[String, Int] = Map("c" -> 11)

We can obtain a `Map` from a list of pairs with `toMap`:

In [16]:
val l: List[(Int, String)] =  
    List((1, "a"), (2, "b"), (1, "c"), (2, "d"), (3, "a"))

l.toMap

l: List[(Int, String)] = List((1, "a"), (2, "b"), (1, "c"), (2, "d"), (3, "a"))
res16_1: Map[Int, String] = Map(1 -> "c", 2 -> "d", 3 -> "a")

Note that only one value is kept per key: when a key is repeated, the last pair wins. If we want all values, we can use `groupBy`:

In [72]:
l.groupBy(entry => entry._1)

res72: Map[Int, List[(Int, String)]] = HashMap(
  1 -> List((1, "a"), (1, "c")),
  2 -> List((2, "b"), (2, "d")),
  3 -> List((3, "a"))
)
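
Note that `groupBy` keeps whole pairs in each group. When only the values are of interest, one possible alternative is `groupMap` (available since Scala 2.13), which groups by a key and transforms each element in one step. A minimal sketch, where `pairs` and `valuesByKey` are illustrative names:

```scala
val pairs: List[(Int, String)] =
  List((1, "a"), (2, "b"), (1, "c"), (2, "d"), (3, "a"))

// groupMap(key)(value): group by the first component, keep only the second.
val valuesByKey: Map[Int, List[String]] =
  pairs.groupMap(_._1)(_._2)
```

Here `valuesByKey(1)` is `List("a", "c")`, i.e. both values associated with key `1`, in their original order.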

## Set collections

`Set`s are unordered collections of unique elements. The following two sets are equal: 

In [20]:
Set(1,2,2,3) == Set(3,1,2)
// compare with lists:
List(1,2,2,3) == List(3,1,2)

res20_0: Boolean = true
res20_1: Boolean = false

Common operations on sets: 

In [21]:
// Filtering elements
Set(-1, 0, 1, 3, -5, 2, 4).filter(e => e > 0)

res21: Set[Int] = HashSet(1, 2, 3, 4)

In [22]:
// Mapping elements
Set(-1,-4,-3,5).map(e => e.abs)

res22: Set[Int] = Set(1, 4, 3, 5)

In [24]:
// Flatmapping elements
Set(1,2,3).map(e => Set(e,-e))
Set(1,2,3).flatMap(e => Set(e,-e))

res24_0: Set[Set[Int]] = Set(Set(1, -1), Set(2, -2), Set(3, -3))
res24_1: Set[Int] = HashSet(-3, 1, 2, 3, -1, -2)

In [25]:
// Common set operations
Set(1,2,3).subsetOf(Set(1,2,3,4))
Set(1,2,3) subsetOf Set(1,2)
Set(1,2,3) diff Set(1,2)
Set(1) diff Set(3,4)
Set(1,2,3) union Set(1,2,3,4,5)

res25_0: Boolean = true
res25_1: Boolean = false
res25_2: Set[Int] = Set(3)
res25_3: Set[Int] = Set(1)
res25_4: Set[Int] = HashSet(5, 1, 2, 3, 4)

## Lists as multisets

Multisets (or bags) are unordered collections like sets, but each element may occur multiple times. The collections library does not have a class to represent multisets directly, but we can approximate them with lists (or with any sequence, actually).
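
One way to read a list as a multiset is through the _multiplicity_ of each element. The sketch below (with illustrative names `bag`, `multiplicities` and `remaining`) computes multiplicities with `groupMapReduce`, and shows that `diff` removes one occurrence for each matching element of its argument:

```scala
val bag: List[Int] = List(1, 1, 3, 3, 3, 2)

// Multiplicity of each element: how many times it occurs in the bag.
val multiplicities: Map[Int, Int] =
  bag.groupMapReduce(identity)(_ => 1)(_ + _)

// `diff` removes a single occurrence per element of the argument,
// so one 1, one 3 and the only 2 disappear (4 is not in the bag).
val remaining: List[Int] = bag.diff(List(3, 4, 1, 2))
```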

In [27]:
// multiset operations
assert(List(1,1,3,3,3,2).diff(List(3,4,1,2)) == List(1,3,3))
// `concat` replaces `union` on sequences, which is deprecated since 2.13
assert((List(1,3,3) concat List(0,0,1,1,3)) == List(1,3,3,0,0,1,1,3))
assert((List(1,3,3) intersect List(0,0,1,1,3)) == List(1, 3))


We can remove repeated occurrences with `distinct`:

In [28]:
List(1,1,1,3,2,2,3).distinct 

res28: List[Int] = List(1, 3, 2)

## Implementing & querying data models

The Scala collections greatly facilitate the implementation and querying of data models. For instance, the following classes model the structure of an organization which consists of departments, employees and tasks that employees can perform. 

In [29]:
// departments

case class Department(id: Department.Id)
object Department{
    type Id = String
}

// tasks 

case class Task(id: Task.Id, hours: Int)
object Task{
    type Id = String
}

// Employees

case class Employee(id: Employee.Id, dpt: Department.Id)
object Employee{
    type Id = String
}

// The whole organization

case class Organization(
    departments: Map[Department.Id, Department], 
    tasks: Map[Task.Id, Task],
    employees: Map[Employee.Id, Employee], 
    knows: List[(Employee.Id, Task.Id)])

defined class Department
defined object Department
defined class Task
defined object Task
defined class Employee
defined object Employee
defined class Organization

This implementation is an example of a _flat_ data model. The key feature of this kind of model is that the different entities (employees, departments and tasks, in this case) refer to each other by using _keys_. This is a possible instance of the organization data model:

In [58]:
val org: Organization = Organization(
    Map(
        "Product"  -> Department("Product"),
        "Quality"  -> Department("Quality"),
        "Research" -> Department("Research"),
        "Sales"    -> Department("Sales")),
    
    Map("build"    -> Task("build", 3), 
        "abstract" -> Task("abstract", 5), 
        "design"   -> Task("design", 2),
        "call"     -> Task("call", 1),
        "program"  -> Task("program", 3)),
    
    Map("Alex"     -> Employee("Alex", "Product"), 
        "Bert"     -> Employee("Bert", "Product"), 
        "Cora"     -> Employee("Cora", "Research"), 
        "Drew"     -> Employee("Drew", "Research"), 
        "Edna"     -> Employee("Edna", "Research"), 
        "Fred"     -> Employee("Fred", "Sales")),
    
    List(
        ("Alex", "build"),
        ("Bert", "build"),
        ("Cora", "abstract"),
        ("Cora", "build"),
        ("Cora", "design"),
        ("Drew", "abstract"),
        ("Drew", "design"),
        ("Edna", "abstract"),
        ("Edna", "call"),
        ("Edna", "design"),
        ("Fred", "call")))

org: Organization = Organization(
  departments = Map(
    "Product" -> Department(id = "Product"),
    "Quality" -> Department(id = "Quality"),
    "Research" -> Department(id = "Research"),
    "Sales" -> Department(id = "Sales")
  ),
  tasks = HashMap(
    "program" -> Task(id = "program", hours = 3),
    "design" -> Task(id = "design", hours = 2),
    "abstract" -> Task(id = "abstract", hours = 5),
    "build" -> Task(id = "build", hours = 3),
    "call" -> Task(id = "call", hours = 1)
  ),
  employees = HashMap(
    "Alex" -> Employee(id = "Alex", dpt = "Product"),
...

Flat data models are actually very close to the common _relational_ data models used in SQL persistent stores. This is the equivalent relational model of the organization database: 

![](images/relational-model.png)

According to this mapping: 
- The `Organization` class represents the whole relational _database_.
- Members of this class correspond to the different _tables_ of the database, represented as `Map`s or as plain collections of tuples (a `List` here; a `Set` would also do). We have four tables: departments, employees, tasks, and a table which stores which tasks each employee can perform.
- The key type of a `Map` can be understood as the primary key of the relational table. The value type specifies the columns of the table. By convention, the identifier type is defined by the `Id` type alias in the companion object of the value type. For instance, the `employees` table is indexed by the employee identifier (a string value), and stores the department to which the employee belongs.
- If the primary key consists of several columns, as in the `knows` table, we use tuples.
- If the table consists of just the key (simple or composite), a collection of keys suffices instead of a `Map`: conceptually a `Set`, although `knows` is declared as a `List` of pairs in the code above.



### Basic queries

Complex queries typically build upon basic queries which are directly related to the structure of the data model. In particular, they are derived from the primary-key and foreign-key relations in the relational model. In the organizational database we can identify the following queries:

In [35]:

object BasicQueries{

    // Entities
    
    def departments(org: Organization): List[Department] = 
        org.departments.values.toList

    def departmentIds(org: Organization): List[Department.Id] = 
        org.departments.keys.toList
    
    def getDepartment(id: Department.Id)(org: Organization): List[Department] = 
        org.departments.get(id).toList

    def employees(org: Organization): List[Employee] = 
        org.employees.values.toList

    def employeeIds(org: Organization): List[Employee.Id] = 
        org.employees.keys.toList

    def getEmployee(id: Employee.Id)(org: Organization): List[Employee] = 
        org.employees.get(id).toList

    def tasks(org: Organization): List[Task] = 
        org.tasks.values.toList

    def taskIds(org: Organization): List[Task.Id] = 
        org.tasks.keys.toList

    def getTask(id: Task.Id)(org: Organization): List[Task] = 
        org.tasks.get(id).toList
    
    // 1-N relationships
    
    def employeeIds(dpt: Department.Id)(org: Organization): List[Employee.Id] = 
        org.employees.filter(_._2.dpt == dpt).map(_._1).toList
    
    // N-M relationships
    
    def performerIds(tsk: Task.Id)(org: Organization): List[Employee.Id] = 
        org.knows.filter(_._2 == tsk).map(_._1)

    def capabilities(emp: Employee.Id)(org: Organization): List[Task.Id] = 
        org.knows.filter(_._1 == emp).map(_._2)
}

import BasicQueries._

defined object BasicQueries
import BasicQueries._

In [39]:
employeeIds("Product")(org)
org.employees.filter(_._2.dpt == "Product").map(_._1)

res39_0: List[Employee.Id] = List("Alex", "Bert")
res39_1: collection.immutable.Iterable[Employee.Id] = List("Alex", "Bert")

### Sample queries

__Which are the tasks of the organization which can't be performed by any employee?__

In [40]:
class TestImpossibleTasks(
    impossibleTasks: Organization => List[Task.Id]
) extends FlatSpec with Matchers{
    
    "impossibleTasks" should "work" in {
        impossibleTasks(org) shouldBe 
            List("program")
    }
}

defined class TestImpossibleTasks

This is a conventional imperative implementation, which relies on mutable state (a `ListBuffer` that is updated in place):

In [41]:
import collection.mutable.ListBuffer

def impossibleTasks(org: Organization): List[Task.Id] = {
    // start with every task and remove those that somebody knows
    val impTasks: ListBuffer[Task.Id] = 
        ListBuffer.from(taskIds(org))
    for (entry <- org.knows)
        impTasks -= entry._2
    impTasks.toList
}

import collection.mutable.ListBuffer
defined function impossibleTasks

In [42]:
run(new TestImpossibleTasks(impossibleTasks))

cell40$Helper$TestImpossibleTasks:
impossibleTasks
- should work


This works but it is not the _functional_ style. The following version is closer to what we are looking for:

In [None]:
def impossibleTasks(org: Organization): List[Task.Id] =
    org.knows.foldLeft(ListBuffer.from(taskIds(org)))(
        (impTasks, entry) => impTasks subtractOne entry._2
    ).toList

or with pattern matching syntax:

In [None]:
def impossibleTasks(org: Organization): List[Task.Id] =
    org.knows.foldLeft(ListBuffer.from(taskIds(org))){
        case (impTasks, (_, task)) => impTasks subtractOne task
    }.toList

In [None]:
run(new TestImpossibleTasks(impossibleTasks))

But we can do even better. We prefer the following implementation, which uses high-level, set-like operations (`diff`) and HOFs (`map`):

In [43]:
def impossibleTasks(org: Organization): List[Task.Id] = {
    val possibleTasks = org.knows.map(entry => entry._2)
    taskIds(org) diff possibleTasks
}

defined function impossibleTasks

In [44]:
val possibleTasks = org.knows.map(entry => entry._2)

possibleTasks: List[Task.Id] = List(
  "build",
  "build",
  "abstract",
  "build",
  "design",
  "abstract",
  "design",
  "abstract",
  "call",
  "design",
  "call"
)

In [None]:
run(new TestImpossibleTasks(impossibleTasks))

Arguably, this implementation conveys the intent of the function more clearly: it is more _declarative_. Moreover, it is more reliable, since it builds upon standard methods of the Scala library (`diff` and `map`) rather than on hand-written loops. Admittedly, the imperative version is also easy to read, but only because the function is so simple. We will see more complex examples later on, where the advantage of the functional solution is more apparent.

__Which tasks can be performed by the employees of a given department?__

In [61]:
class TestAllTasks(
    allTasks: Department.Id => Organization => List[Task.Id]
) extends FlatSpec with Matchers{
    
    "allTasks" should "work" in {
        allTasks("Product")(org).toSet shouldBe 
            Set("build")
        allTasks("Quality")(org).toSet shouldBe 
            Set()
        allTasks("Sales")(org).toSet shouldBe 
            Set("call")
        allTasks("Research")(org).toSet shouldBe 
            Set("abstract", "build", "design", "call")
    }
}

defined class TestAllTasks

The basic queries of the data model allow us to obtain all the employees of a department, and the tasks that each of them can perform. So, this is a first step towards the solution:

In [46]:
// def allTasks(dpt: Department.Id)(org: Organization): List[Task.Id]
def allTasks(dpt: Department.Id)(org: Organization): List[List[Task.Id]] = 
    employeeIds(dpt)(org) map (
        emp => capabilities(emp)(org)
    )

defined function allTasks

However, this is not the signature that we need to implement, since we are returning a list of lists of tasks, not a list of tasks. To get it right we also need to _flatten_ the result, i.e. concatenate the individual lists of tasks of each employee, and then remove repeated tasks with `distinct`. In sum, we need the `flatMap` HOF:
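
A toy sketch of the difference, using hypothetical data (`known` and `team` are not part of the organization model):

```scala
// Hypothetical toy data: the tasks known by each of two employees.
val known: Map[String, List[String]] =
  Map("ann" -> List("build", "design"), "bob" -> List("build"))
val team: List[String] = List("ann", "bob")

// `map` keeps one inner list per employee.
val nested: List[List[String]] = team.map(name => known(name))
// `flatMap` concatenates the inner lists; repeated tasks survive.
val flat: List[String] = team.flatMap(name => known(name))
// `distinct` keeps the first occurrence of each task.
val unique: List[String] = flat.distinct
```

Here `flat` still mentions `"build"` twice, which is why `distinct` is applied at the end. In the organization model, the same combination reads as follows.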

In [65]:
def allTasks2(dpt: Department.Id)(org: Organization): List[Task.Id] = 
    employeeIds(dpt)(org).flatMap(
        emp => capabilities(emp)(org)
    ).distinct

defined function allTasks2

In [63]:
run(new TestAllTasks(allTasks2))

cell61$Helper$TestAllTasks:
allTasks
- should work


__Compute the list of departments of an organization together with the number of tasks their employees can perform, sorted by the number of tasks__

In [47]:
class TestSortedDeps(
    sortedDeps: Organization => List[(Department.Id, Int)]
) extends FlatSpec with Matchers{
    
    "sortedDeps" should "work" in {
        sortedDeps(org).toSet shouldBe 
            Set(("Research",4), ("Product",1), ("Sales",1), ("Quality",0))
    }
}

defined class TestSortedDeps

In [48]:
def sortedDeps(org: Organization): List[(Department.Id, Int)] = 
    departmentIds(org).map( dep => 
        // allTasks2 returns the flattened, duplicate-free list of tasks
        (dep, allTasks2(dep)(org).size)
    ).sortWith( (tuple1, tuple2) => 
        tuple1._2 > tuple2._2
    )

defined function sortedDeps

or with pattern matching syntax:

In [66]:
def sortedDeps2(org: Organization): List[(Department.Id, Int)] = 
    departmentIds(org).map( dep => 
        (dep, allTasks2(dep)(org).size)
    ).sortWith{ case ((_, n1), (_, n2)) => 
        n1 > n2
    }

defined function sortedDeps2

In [67]:
run(new TestSortedDeps(sortedDeps2))

cell47$Helper$TestSortedDeps:
sortedDeps
- should work


__Which are the employees who can perform some task that takes more than a given number of hours?__

In [None]:
class TestPersistentEmps(
    persistentEmps: Int => Organization => List[Employee.Id]
) extends FlatSpec with Matchers{
    
    "persistentEmps" should "work" in {
        persistentEmps(3)(org).toSet shouldBe 
            Set("Cora", "Drew", "Edna")
    }
}

In [None]:
def persistentEmps(min: Int)(org: Organization): List[Employee.Id] = 
    org.knows.flatMap{ case (emp, task) => 
        getTask(task)(org) filter { task =>
            task.hours > min
        } map { _ => 
            emp
        }
    }.distinct

Alternatively, we can also get by with `filter` and `exists`:

In [None]:
def persistentEmps(min: Int)(org: Organization): List[Employee.Id] = 
    employeeIds(org).filter( empId =>  
        capabilities(empId)(org).flatMap(
            taskId => getTask(taskId)(org)
        ).exists(task => task.hours > min)
    )

In [None]:
run(new TestPersistentEmps(persistentEmps))

__Which are the departments whose employees, as a team, know how to perform a given set of tasks?__

In [68]:
class TestDptsThatKnowHowTo(
    dptsThatKnowHowTo: Set[Task.Id] => Organization => List[Department.Id]
) extends FlatSpec with Matchers{
    
    "dptsThatKnowHowTo" should "work" in {
        dptsThatKnowHowTo(Set())(org).toSet shouldBe 
            Set("Sales", "Product", "Quality", "Research")
        dptsThatKnowHowTo(Set("call"))(org).toSet shouldBe 
            Set("Sales", "Research")
        dptsThatKnowHowTo(Set("call", "abstract"))(org).toSet shouldBe 
            Set("Research")
    }
}

defined class TestDptsThatKnowHowTo

We can build upon the previous function `allTasks2`:

In [69]:
def dptsThatKnowHowTo(tasks: Set[Task.Id])(org: Organization): List[Department.Id] = 
    departmentIds(org) filter {
        dpt: Department.Id => tasks subsetOf allTasks2(dpt)(org).toSet
    }

defined function dptsThatKnowHowTo

In [70]:
run(new TestDptsThatKnowHowTo(dptsThatKnowHowTo))

cell68$Helper$TestDptsThatKnowHowTo:
dptsThatKnowHowTo
- should work


__Obtain a list of employees sorted by the number of tasks that they can perform__

In [77]:
class TestSortedEmployees(
    sortedEmployees: Organization => List[(Employee.Id, Int)]
) extends FlatSpec with Matchers{
    
    "sortedEmployees" should "work" in {
        sortedEmployees(org) shouldBe 
            List(
              ("Alex", 1),
              ("Fred", 1),
              ("Bert", 1),
              ("Drew", 2),
              ("Cora", 3),
              ("Edna", 3))
    }
}

defined class TestSortedEmployees

We may attempt the following:

In [80]:
def sortedEmployees(org: Organization): List[(Employee.Id, Int)] = 
    org.knows
        .groupBy(_._1)     // Map[Employee.Id, List[(Employee.Id, Task.Id)]]
        .view              // lazy MapView; `mapValues` on a `Map` is deprecated
        .mapValues(_.size) // MapView[Employee.Id, Int]
        .toList            // List[(Employee.Id, Int)]
        .sortBy(_._2) 

defined function sortedEmployees

In [81]:
    org.knows
        .groupBy(_._1)     // Map[Employee.Id, List[(Employee.Id, Task.Id)]]
        .view              // lazy MapView; `mapValues` on a `Map` is deprecated
        .mapValues(_.size) // MapView[Employee.Id, Int]
        .toList            // List[(Employee.Id, Int)]
        .sortBy(_._2)

res81: List[(Employee.Id, Int)] = List(
  ("Alex", 1),
  ("Fred", 1),
  ("Bert", 1),
  ("Drew", 2),
  ("Cora", 3),
  ("Edna", 3)
)

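This is almost right, but employees who can't perform any task never appear in `knows`, so `groupBy` silently drops them. Our sample data has no such employee, which is why the output above looks complete. The correct version starts from the full list of employee identifiers: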
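
A minimal sketch of the gap, using hypothetical data in which `"Gina"` is an employee with no recorded task (`staff` and `skills` are illustrative names, not part of the organization model):

```scala
// Hypothetical data: "Gina" is an employee, but no task is recorded for her.
val staff: List[String] = List("Alex", "Cora", "Gina")
val skills: List[(String, String)] =
  List(("Alex", "build"), ("Cora", "abstract"), ("Cora", "design"))

// Grouping the relation only sees employees that occur in it.
val fromRelation: Map[String, Int] =
  skills.groupMapReduce(_._1)(_ => 1)(_ + _)

// Starting from the full staff list keeps employees with zero tasks.
val fromStaff: List[(String, Int)] =
  staff.map(emp => (emp, skills.count(_._1 == emp))).sortBy(_._2)
```

`fromRelation` has no entry for `"Gina"`, whereas `fromStaff` lists her first with a count of `0`. The definition that follows applies the same idea to the organization model.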

In [78]:
def sortedEmployees(org: Organization): List[(Employee.Id, Int)] = 
    employeeIds(org).map{ empId: Employee.Id => 
        (empId, capabilities(empId)(org).size)
    }.sortBy(_._2)

defined function sortedEmployees

In [79]:
run(new TestSortedEmployees(sortedEmployees))

cell77$Helper$TestSortedEmployees:
sortedEmployees
- should work


__Which are the departments whose employees are all able to perform a given task?__

In [82]:
class TestExpertDepsIn(
    expertDpts: Task.Id => Organization => List[Department.Id]
) extends FlatSpec with Matchers{
    
    "expertDpts" should "work" in {
        expertDpts("abstract")(org).toSet shouldBe 
            Set("Quality", "Research")
    }
}

defined class TestExpertDepsIn

The conventional imperative solution is quite complex: 

In [83]:
def expertDepsIn(task: Task.Id)(org: Organization): List[Department.Id] = {
    var out: ListBuffer[Department.Id] = ListBuffer()
    for (dep <- departmentIds(org)){
        var employeesThatCantPerform: ListBuffer[Employee.Id] = 
            ListBuffer.from(employeeIds(dep)(org))
        for (emp <- employeeIds(dep)(org)){
            for (someTask <- capabilities(emp)(org))
                if (someTask == task)
                    employeesThatCantPerform -= emp
        }
        if (employeesThatCantPerform.isEmpty) 
            out += dep
    }
    out.toList
}

defined function expertDepsIn

In [84]:
run(new TestExpertDepsIn(expertDepsIn))

cell82$Helper$TestExpertDepsIn:
expertDpts
- should work


This is not only harder to understand, but also error-prone. To obtain a simpler (and functional) solution, we first state the intended query in plain natural language:

In [None]:
def expertDepsIn(tsk: Task.Id)(org: Organization): List[Department.Id] = 
    // From all the departments of the organization, choose
    // those such that, for all of their employees,
    // the specified task is included in the employee's capabilities
    ???

Then, we can formalize the natural language specification by relying on standard HOFs (`filter`, `forall`) and collection operations (`contains`):
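
One subtlety is worth a quick sketch before reading the definition: `forall` returns `true` on an empty collection, so a department without employees trivially satisfies the condition. The data and the helper `allKnow` below are hypothetical and only serve as illustration:

```scala
// Hypothetical toy data: department -> members, member -> known tasks.
val members: Map[String, List[String]] =
  Map("Research" -> List("Cora", "Drew"), "Quality" -> Nil)
val knownTasks: Map[String, List[String]] =
  Map("Cora" -> List("abstract", "build"), "Drew" -> List("abstract"))

// True when every member of the department knows the task.
def allKnow(dpt: String, task: String): Boolean =
  members(dpt).forall(emp => knownTasks(emp).contains(task))

val researchAbstract: Boolean = allKnow("Research", "abstract") // both members know it
val researchBuild: Boolean    = allKnow("Research", "build")    // Drew does not
val qualityAbstract: Boolean  = allKnow("Quality", "abstract")  // no members: vacuously true
```

The definition below relies on this behaviour of `forall`, which is why the test above expects `Quality` (a department without employees) in the result.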

In [85]:
def expertDepsIn(tsk: Task.Id)(org: Organization): List[Department.Id] = 
    departmentIds(org).filter(
        dpt => employeeIds(dpt)(org).forall(
            emp => capabilities(emp)(org).contains(tsk)
        )
    )

defined function expertDepsIn

In [86]:
run(new TestExpertDepsIn(expertDepsIn))

cell82$Helper$TestExpertDepsIn:
expertDpts
- should work
