# Scala's Collection Library

### References

A gentle guide to Scala's collection library can be found on the Scala site:

https://docs.scala-lang.org/overviews/collections-2.13/introduction.html

In [None]:
import $ivy.`org.scalatest::scalatest:3.0.8`
import _root_.org.scalatest._

## Collection types

Collection types can be classified along two major dimensions:
- Mutable/Immutable: mutable collections can be modified in-place; immutable collections return a new instance when they are updated.
- Sequences/Sets/Maps: these are ordered collections, unordered collections and key-value collections, respectively.

There are many different implementations of sequences, sets and maps, as summarized by these figures (cf. Scala's [guide](https://docs.scala-lang.org/overviews/collections-2.13/overview.html)):

![](images/collections-hierarchy.svg)

![](images/collections-hierarchy-inmutable.svg)

![](images/collections-mutable.svg)

How do we choose from this huge number of collection types and implementations? The collection type is chosen according to the functionality that we demand from our collection: should it be _ordered_? Can there be repeated elements? Are elements associated with keys? To choose the right implementation of the collection type (e.g. should I use a `Vector` or a `List`?), we must take into account the performance characteristics of each implementation:

https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html

From now on, we will work with immutable collection types only.

## `Map`s of key-value pairs

`Map`s are key-value collections (i.e. they are like sets of key-value pairs indexed by their keys).

For key-value pairs in particular, we commonly write `Tuple2` values with the arrow syntax `key -> value`, which is equivalent to `(key, value)`.
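As a small illustration of the pair syntax, the sketch below builds the same `Tuple2` value in three ways and then uses the arrow form to build a `Map`. The object and value names are made up for this example.

```scala
object PairSyntaxSketch {
  // Three spellings of the same Tuple2 value
  val explicit: Tuple2[String, Int] = Tuple2("one", 1)
  val literal: (String, Int) = ("one", 1)
  val arrow: (String, Int) = "one" -> 1

  // The arrow form reads naturally when listing the entries of a Map
  val numbers: Map[String, Int] = Map("one" -> 1, "two" -> 2, "three" -> 3)
}
```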

Common operations on maps: 
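The following is a minimal sketch of some of these operations on a hypothetical `numbers` map (the names are illustrative). It assumes Scala 2.13, where `view.mapValues` returns a lazy `MapView`.

```scala
object MapOperationsSketch {
  val numbers: Map[String, Int] = Map("one" -> 1, "two" -> 2, "three" -> 3)

  // Lookup: `apply` throws NoSuchElementException for a missing key,
  // `get` returns an Option, `getOrElse` falls back to a default
  val two: Int = numbers("two")
  val maybeFour: Option[Int] = numbers.get("four")
  val fourOrZero: Int = numbers.getOrElse("four", 0)

  // Mapping values only: the MapView is lazy, hence the final `toMap`
  val doubled: Map[String, Int] = numbers.view.mapValues(_ * 2).toMap

  // Mapping whole entries: the function receives and returns a pair
  val swapped: Map[Int, String] = numbers.map { case (k, v) => v -> k }
}
```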

In [None]:
// retrieving values of existing and non-existing keys



In [None]:
// retrieving all keys and values



The default implementation types of these `Iterable`s are `collection.immutable.Set` and the general `View` type. Views are _lazy_ collections: they compute their elements on demand, which can avoid building intermediate collections. For more information on views, consult the Scala [guide](https://docs.scala-lang.org/overviews/collections-2.13/views.html). All we need from views and the general `Iterable` collection type is that they can be converted to a concrete collection type with conversion methods such as `toList`, `toSet`, `toMap`, etc.
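A short sketch of these conversions, again on an illustrative `numbers` map: `keys` and `values` are only typed as `Iterable`, and the conversion methods turn them into concrete collections.

```scala
object KeysAndValuesSketch {
  val numbers: Map[String, Int] = Map("one" -> 1, "two" -> 2, "three" -> 3)

  // Both results are typed as general Iterables
  val ks: Iterable[String] = numbers.keys
  val vs: Iterable[Int] = numbers.values

  // Converting to concrete collection types
  val keyList: List[String] = ks.toList
  val valueSet: Set[Int] = vs.toSet

  // A lazy view of pairs can be turned back into a Map with `toMap`
  val lengths: Map[String, Int] = ks.view.map(k => k -> k.length).toMap
}
```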

In [None]:
// Converting to proper implementation types



In [None]:
// Mapping values (toMap is required to convert the view type `MapView` to `Map`)



In [None]:
// Mapping whole entries, not only values



In [None]:
// filter also returns a general iterable (a list, by default)



We can obtain a `Map` from a list of pairs with `toMap`:

Note that only one value is kept for a single key. If we want all values, we can use `groupBy`:
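A minimal sketch of both behaviours on a made-up list of pairs with a repeated key. `groupMap` is included as an optional extra; it assumes Scala 2.13 or later.

```scala
object PairsToMapSketch {
  val pairs: List[(String, Int)] = List("a" -> 1, "b" -> 2, "a" -> 3)

  // toMap keeps a single value per key: the last one in the list
  val lastWins: Map[String, Int] = pairs.toMap

  // groupBy keeps every pair, grouped by its key
  val grouped: Map[String, List[(String, Int)]] = pairs.groupBy(_._1)

  // groupMap additionally projects each pair to its value
  val valuesByKey: Map[String, List[Int]] = pairs.groupMap(_._1)(_._2)
}
```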

## Set collections

`Set`s are unordered collections of unique elements. The following two sets are equal: 
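For instance, in the sketch below (with illustrative names) neither the order nor the repetition of elements affects set equality, whereas both matter for lists.

```scala
object SetEqualitySketch {
  // Order and repetition are irrelevant for sets ...
  val sameSets: Boolean = Set(1, 2, 3) == Set(3, 2, 1, 1)

  // ... but both matter for lists
  val sameLists: Boolean = List(1, 2, 3) == List(3, 2, 1, 1)
}
```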

In [None]:

// compare with lists:



Common operations on sets: 
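The sketch below runs through filtering, mapping, flat-mapping and the usual set-theoretic operations on two small made-up sets. Note how `map` may shrink a set, since duplicate results collapse.

```scala
object SetOperationsSketch {
  val s: Set[Int] = Set(1, 2, 3, 4)
  val other: Set[Int] = Set(3, 4, 5)

  // Filtering elements
  val evens: Set[Int] = s.filter(_ % 2 == 0)

  // Mapping elements: four inputs, but only two distinct results
  val parities: Set[Int] = s.map(_ % 2)

  // Flatmapping elements: each element contributes a whole set
  val withNext: Set[Int] = s.flatMap(n => Set(n, n + 1))

  // Common set operations
  val union: Set[Int] = s union other
  val common: Set[Int] = s intersect other
  val onlyInS: Set[Int] = s diff other
  val contained: Boolean = Set(1, 2).subsetOf(s)
}
```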

In [None]:
// Filtering elements


In [None]:
// Mapping elements


In [None]:
// Flatmapping elements


In [None]:
// Common set operations


## Implementing & querying data models

The Scala collections greatly facilitate the implementation and querying of data models. For instance, the following classes model the structure of an organization which consists of departments, employees and tasks that employees can perform. 

In [None]:
// departments


// tasks 


// Employees


// The whole organization



This implementation is an example of a _flat_ data model. The key feature of these kinds of models is that the different entities (employees, departments and tasks, in this case) refer to each other by using _keys_. This is a possible instance of the organization data model:

In [None]:
/*
val org: Organization = Organization(
    Map(
        "Product"  -> Department("Product"),
        "Quality"  -> Department("Quality"),
        "Research" -> Department("Research"),
        "Sales"    -> Department("Sales")),
    
    Map("build"    -> Task("build", 3), 
        "abstract" -> Task("abstract", 5), 
        "design"   -> Task("design", 2),
        "call"     -> Task("call", 1),
        "program"  -> Task("program", 3)),
    
    Map("Alex"     -> Employee("Alex", "Product"), 
        "Bert"     -> Employee("Bert", "Product"), 
        "Cora"     -> Employee("Cora", "Research"), 
        "Drew"     -> Employee("Drew", "Research"), 
        "Edna"     -> Employee("Edna", "Research"), 
        "Fred"     -> Employee("Fred", "Sales")),
    
    Set(
        ("Alex", "build"),
        ("Bert", "build"),
        ("Cora", "abstract"),
        ("Cora", "build"),
        ("Cora", "design"),
        ("Drew", "abstract"),
        ("Drew", "design"),
        ("Edna", "abstract"),
        ("Edna", "call"),
        ("Edna", "design"),
        ("Fred", "call")))
*/

Flat data models are actually very close to the common _relational_ data models used in SQL persistent stores. This is the equivalent relational model of the organization database: 

![](images/relational-model.png)

According to this mapping: 
- The `Organization` class represents the whole relational _database_.
- Members of this class correspond to the different _tables_ of the database, represented as `Map`s or simple `Set`s. We have four tables: the table of departments, employees, tasks, and a table which stores which tasks employees can perform.
- The key type of `Map` can be understood as the primary key of the relational table. The value type specifies the columns of the table. By convention, the identifier type is defined by the `Id` type alias in the companion object of the value type. For instance, the `employees` table is indexed by the employee identifier (a string value), and stores the department to which the employee belongs.
- If the primary key consists of several keys, as in the `knows` table, we use tuples. 
- If the table just consists of the key (simple or composed) we use `Set` instead of `Map` (as the `knows` table also illustrates).
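
The conventions listed above could be encoded roughly as follows. This is only a sketch: the class shapes are inferred from the sample instance and from the `employees` and `knows` tables named in the text, while every other field name is an assumption. In particular, `effort` is a guess at what the `Int` attached to each task stands for.

```scala
object OrgModelSketch {
  // Identifier aliases live in the companion object of each value type
  case class Department(name: Department.Id)
  object Department { type Id = String }

  case class Task(name: Task.Id, effort: Int) // `effort` is an assumed name
  object Task { type Id = String }

  case class Employee(name: Employee.Id, department: Department.Id)
  object Employee { type Id = String }

  // One member per table: three Maps indexed by primary key, and one Set of
  // compound keys for the table that has no further columns
  case class Organization(
      departments: Map[Department.Id, Department],
      tasks: Map[Task.Id, Task],
      employees: Map[Employee.Id, Employee],
      knows: Set[(Employee.Id, Task.Id)])

  // A tiny excerpt of the sample organization
  val tiny: Organization = Organization(
    Map("Research" -> Department("Research")),
    Map("abstract" -> Task("abstract", 5)),
    Map("Cora" -> Employee("Cora", "Research")),
    Set(("Cora", "abstract")))
}
```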



### Basic queries

Complex queries typically build upon basic queries which are directly related to the structure of the data model. In particular, they are identified from the primary key and foreign-key relations in the relational model. In the organizational database we can identify the following queries:
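One possible shape for such basic queries is sketched below. The query names (`getEmployee`, `employeesOf`, `tasksOf`, `expertsIn`, ...) and the model's field names are assumptions made for this example, not the notebook's own definitions.

```scala
object BasicQueriesSketch {
  // Assumed model; see the hedges in the text above
  case class Department(name: Department.Id)
  object Department { type Id = String }
  case class Task(name: Task.Id, effort: Int)
  object Task { type Id = String }
  case class Employee(name: Employee.Id, department: Department.Id)
  object Employee { type Id = String }
  case class Organization(
      departments: Map[Department.Id, Department],
      tasks: Map[Task.Id, Task],
      employees: Map[Employee.Id, Employee],
      knows: Set[(Employee.Id, Task.Id)])

  // Queries from single primary keys: a plain lookup in the table
  def getDepartment(id: Department.Id)(org: Organization): Option[Department] =
    org.departments.get(id)
  def getTask(id: Task.Id)(org: Organization): Option[Task] =
    org.tasks.get(id)
  def getEmployee(id: Employee.Id)(org: Organization): Option[Employee] =
    org.employees.get(id)

  // Queries from foreign keys: follow employee -> department backwards
  def employeesOf(dpt: Department.Id)(org: Organization): Set[Employee.Id] =
    org.employees.filter { case (_, emp) => emp.department == dpt }.keySet

  // Queries from compound primary keys: fix one component, collect the other
  def tasksOf(emp: Employee.Id)(org: Organization): Set[Task.Id] =
    org.knows.collect { case (`emp`, tsk) => tsk }
  def expertsIn(tsk: Task.Id)(org: Organization): Set[Employee.Id] =
    org.knows.collect { case (emp, `tsk`) => emp }

  // A reduced excerpt of the sample organization
  val org: Organization = Organization(
    Map("Product" -> Department("Product"), "Research" -> Department("Research")),
    Map("build" -> Task("build", 3), "abstract" -> Task("abstract", 5)),
    Map("Alex" -> Employee("Alex", "Product"),
        "Cora" -> Employee("Cora", "Research"),
        "Drew" -> Employee("Drew", "Research")),
    Set(("Alex", "build"), ("Cora", "abstract"), ("Cora", "build"), ("Drew", "abstract")))
}
```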

In [None]:

object BasicQueries{

    // Queries from single primary keys
    
    // Queries from foreign-keys
    
    // Queries from compound primary keys
    
}

import BasicQueries._

### Sample queries

__Which are the tasks of the organization which can't be performed by any employee?__

In [None]:
class TestImpossibleTasks(
    impossibleTasks: Organization => Set[Task.Id]
) extends FlatSpec with Matchers{
    
    "impossibleTasks" should "work" in {
        impossibleTasks(org) shouldBe 
            ???
    }
}

This is a conventional imperative implementation, using mutable variables:
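A sketch of what such an imperative version might look like is given below. It uses a cut-down stand-in for `Organization` that holds only the two tables this query reads; the field names and the `effort` attribute are assumptions.

```scala
object ImpossibleTasksImperative {
  case class Task(name: Task.Id, effort: Int) // `effort` is an assumed name
  object Task { type Id = String }
  object Employee { type Id = String }

  // Cut-down stand-in: only the tables needed by this query
  case class Organization(
      tasks: Map[Task.Id, Task],
      knows: Set[(Employee.Id, Task.Id)])

  def impossibleTasks(org: Organization): Set[Task.Id] = {
    var result = Set.empty[Task.Id] // accumulator, reassigned in the loop
    for (task <- org.tasks.keys) {
      var known = false
      for (pair <- org.knows) {
        if (pair._2 == task) known = true
      }
      if (!known) result = result + task
    }
    result
  }

  // Excerpt of the sample data: nobody knows how to "program"
  val org: Organization = Organization(
    Map("build" -> Task("build", 3), "call" -> Task("call", 1), "program" -> Task("program", 3)),
    Set(("Alex", "build"), ("Edna", "call"), ("Fred", "call")))
}
```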

In [None]:
def impossibleTasks(org: Organization): Set[Task.Id] = 
    ???

In [None]:
run(new TestImpossibleTasks(impossibleTasks))

This works but it is not the _functional_ style. The following version is closer to what we are looking for:
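One way to remove the mutable variables is to express the query with `filter` and `exists`, as in the sketch below. The model is the same assumed, cut-down stand-in as before, and the second definition spells the same predicate with a pattern-matching lambda.

```scala
object ImpossibleTasksFilter {
  case class Task(name: Task.Id, effort: Int) // `effort` is an assumed name
  object Task { type Id = String }
  object Employee { type Id = String }
  case class Organization(
      tasks: Map[Task.Id, Task],
      knows: Set[(Employee.Id, Task.Id)])

  // Keep the tasks for which no (employee, task) pair exists
  def impossibleTasks(org: Organization): Set[Task.Id] =
    org.tasks.keySet.filter(task => !org.knows.exists(_._2 == task))

  // Same query, destructuring each pair with a pattern
  def impossibleTasksPM(org: Organization): Set[Task.Id] =
    org.tasks.keySet.filter { task =>
      !org.knows.exists { case (_, known) => known == task }
    }

  val org: Organization = Organization(
    Map("build" -> Task("build", 3), "call" -> Task("call", 1), "program" -> Task("program", 3)),
    Set(("Alex", "build"), ("Edna", "call"), ("Fred", "call")))
}
```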

In [None]:
def impossibleTasks(org: Organization): Set[Task.Id] =
    ???

or with pattern matching syntax:

In [None]:
def impossibleTasks(org: Organization): Set[Task.Id] =
    ???

In [None]:
run(new TestImpossibleTasks(impossibleTasks))

But we can do it even better. We will endorse the following implementation that uses high-level set operations (`diff`) and HOFs (`map`):
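Under the same assumed, cut-down model, a `diff`-based version might read as follows: all task identifiers, minus those that appear in the `knows` table.

```scala
object ImpossibleTasksDiff {
  case class Task(name: Task.Id, effort: Int) // `effort` is an assumed name
  object Task { type Id = String }
  object Employee { type Id = String }
  case class Organization(
      tasks: Map[Task.Id, Task],
      knows: Set[(Employee.Id, Task.Id)])

  // All tasks, minus the tasks that somebody knows how to perform
  def impossibleTasks(org: Organization): Set[Task.Id] =
    org.tasks.keySet diff org.knows.map(_._2)

  val org: Organization = Organization(
    Map("build" -> Task("build", 3), "call" -> Task("call", 1), "program" -> Task("program", 3)),
    Set(("Alex", "build"), ("Edna", "call"), ("Fred", "call")))
}
```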

In [None]:
def impossibleTasks(org: Organization): Set[Task.Id] = 
    ???

In [None]:
run(new TestImpossibleTasks(impossibleTasks))

Arguably, this implementation conveys the intent of the function more clearly. It's more _declarative_. Moreover, it is more reliable since it builds upon standard methods of the Scala library (`diff` and `map`). It's true that the imperative version is also easy to read, but this is only because this is such a very simple function. We will see later on more complex examples where the functional solution shines brighter. 

__Which tasks can be performed by the employees of a given department?__

In [None]:
class TestAllTasks(
    allTasks: Department.Id => Organization => Set[Task.Id]
) extends FlatSpec with Matchers{
    
    "allTasks" should "work" in {
        ???
    }
}

The basic queries of the data model allow us to obtain all the employees of an organization, and the tasks that they can perform. So, this is a first step towards the solution:

In [None]:
def allTasks(dpt: Department.Id)(org: Organization): Set[Task.Id] = 
    ???

However, this is not the signature that we need to implement, since we are returning a set of sets of tasks, not a set of tasks. In order to do it right we also need to _flatten_ the result, i.e. concatenate all the individual sets of tasks for each employee. In sum, we need the `flatMap` HOF:
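The difference between the two steps can be sketched as follows. The helper queries `employeesOf` and `tasksOf`, the name `tasksPerEmployee` and the reduced `Organization` are assumptions made for this example.

```scala
object AllTasksSketch {
  object Department { type Id = String }
  object Task { type Id = String }
  case class Employee(name: Employee.Id, department: Department.Id)
  object Employee { type Id = String }

  // Cut-down stand-in: only the tables needed by this query
  case class Organization(
      employees: Map[Employee.Id, Employee],
      knows: Set[(Employee.Id, Task.Id)])

  // Assumed basic queries
  def employeesOf(dpt: Department.Id)(org: Organization): Set[Employee.Id] =
    org.employees.filter { case (_, emp) => emp.department == dpt }.keySet
  def tasksOf(emp: Employee.Id)(org: Organization): Set[Task.Id] =
    org.knows.collect { case (`emp`, tsk) => tsk }

  // First step: `map` yields one set of tasks per employee
  def tasksPerEmployee(dpt: Department.Id)(org: Organization): Set[Set[Task.Id]] =
    employeesOf(dpt)(org).map(emp => tasksOf(emp)(org))

  // `flatMap` merges those sets into a single one
  def allTasks(dpt: Department.Id)(org: Organization): Set[Task.Id] =
    employeesOf(dpt)(org).flatMap(emp => tasksOf(emp)(org))

  // Excerpt of the sample organization
  val org: Organization = Organization(
    Map("Alex" -> Employee("Alex", "Product"),
        "Cora" -> Employee("Cora", "Research"),
        "Drew" -> Employee("Drew", "Research")),
    Set(("Alex", "build"),
        ("Cora", "abstract"), ("Cora", "build"), ("Cora", "design"),
        ("Drew", "abstract"), ("Drew", "design")))
}
```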

In [None]:
def allTasks(dpt: Department.Id)(org: Organization): Set[Task.Id] = 
    ???

In [None]:
run(new TestAllTasks(allTasks))

__Which are the departments whose employees, as a team, know how to perform a given set of tasks?__

In [None]:
class TestDptsThatKnowHowTo(
    dptsThatKnowHowTo: Set[Task.Id] => Organization => Set[Department.Id]
) extends FlatSpec with Matchers{
    
    "dptsThatKnowHowTo" should "work" in {
        ???
    }
}

We can build upon the previous function `allTasks`:
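A possible sketch, assuming the same hypothetical field names as before: a department qualifies when the requested tasks are a subset of everything its employees can do together. The data is the sample organization from above, without the task attributes that this query does not read.

```scala
object DptsThatKnowHowToSketch {
  case class Department(name: Department.Id)
  object Department { type Id = String }
  object Task { type Id = String }
  case class Employee(name: Employee.Id, department: Department.Id)
  object Employee { type Id = String }
  case class Organization(
      departments: Map[Department.Id, Department],
      employees: Map[Employee.Id, Employee],
      knows: Set[(Employee.Id, Task.Id)])

  // Tasks that the employees of a department can perform as a team
  def allTasks(dpt: Department.Id)(org: Organization): Set[Task.Id] =
    org.employees
      .filter { case (_, emp) => emp.department == dpt }
      .keySet
      .flatMap(emp => org.knows.collect { case (`emp`, tsk) => tsk })

  def dptsThatKnowHowTo(tasks: Set[Task.Id])(org: Organization): Set[Department.Id] =
    org.departments.keySet.filter(dpt => tasks.subsetOf(allTasks(dpt)(org)))

  val org: Organization = Organization(
    Set("Product", "Quality", "Research", "Sales").map(n => n -> Department(n)).toMap,
    Map("Alex" -> "Product", "Bert" -> "Product", "Cora" -> "Research",
        "Drew" -> "Research", "Edna" -> "Research", "Fred" -> "Sales")
      .map { case (name, dpt) => name -> Employee(name, dpt) },
    Set(("Alex", "build"), ("Bert", "build"),
        ("Cora", "abstract"), ("Cora", "build"), ("Cora", "design"),
        ("Drew", "abstract"), ("Drew", "design"),
        ("Edna", "abstract"), ("Edna", "call"), ("Edna", "design"),
        ("Fred", "call")))
}
```

Note that with this definition an empty set of tasks is known by every department, including those without employees.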

In [None]:
def dptsThatKnowHowTo(tasks: Set[Task.Id])(org: Organization): Set[Department.Id] = 
    ???

In [None]:
run(new TestDptsThatKnowHowTo(dptsThatKnowHowTo))

__Obtain a list of employees sorted by the number of tasks that they can perform__
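
The question does not fix the sort direction or how ties are broken, so the sketch below makes two assumptions: employees with more tasks come first, and ties are ordered by name. The model is again a reduced stand-in with assumed field names.

```scala
object SortedEmployeesSketch {
  object Department { type Id = String }
  object Task { type Id = String }
  case class Employee(name: Employee.Id, department: Department.Id)
  object Employee { type Id = String }
  case class Organization(
      employees: Map[Employee.Id, Employee],
      knows: Set[(Employee.Id, Task.Id)])

  def sortedEmployees(org: Organization): List[(Employee.Id, Int)] =
    org.employees.keys.toList
      // Start from the employees table so that employees without tasks count as 0
      .map(emp => emp -> org.knows.count(_._1 == emp))
      // Assumption: descending by number of tasks, then ascending by name
      .sortBy { case (emp, n) => (-n, emp) }

  val org: Organization = Organization(
    Map("Alex" -> "Product", "Bert" -> "Product", "Cora" -> "Research",
        "Drew" -> "Research", "Edna" -> "Research", "Fred" -> "Sales")
      .map { case (name, dpt) => name -> Employee(name, dpt) },
    Set(("Alex", "build"), ("Bert", "build"),
        ("Cora", "abstract"), ("Cora", "build"), ("Cora", "design"),
        ("Drew", "abstract"), ("Drew", "design"),
        ("Edna", "abstract"), ("Edna", "call"), ("Edna", "design"),
        ("Fred", "call")))
}
```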

In [None]:
class TestSortedEmployees(
    sortedEmployees: Organization => List[(Employee.Id, Int)]
) extends FlatSpec with Matchers{
    
    "sortedEmployees" should "work" in {
        sortedEmployees(org) shouldBe 
            ???
    }
}

In [None]:
def sortedEmployees(org: Organization): List[(Employee.Id, Int)] = 
    ???

In [None]:
run(new TestSortedEmployees(sortedEmployees))

__Which are the departments whose employees are all able to perform a given task?__

In [None]:
class TestExpertDepsIn(
    expertDpts: Task.Id => Organization => Set[Department.Id]
) extends FlatSpec with Matchers{
    
    "expertDpts" should "work" in {
        expertDpts("abstract")(org) shouldBe 
            ???
    }
}

The conventional imperative solution is quite complex: 

In [None]:
def expertDepsIn(task: Task.Id)(org: Organization): Set[Department.Id] = {
    ???
}

In [None]:
run(new TestExpertDepsIn(expertDepsIn))

This is not only more complex to understand, but also prone to error. We can obtain a simpler (and functional) solution by first declaring the intended query in plain natural language:

In [None]:
def expertDepsIn(tsk: Task.Id)(org: Organization): Set[Department.Id] = 
    ???

Then, we can formalize the natural language specification by relying on standard HOFs (`filter`, `forall`) and collection operations (`contains`):
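A sketch of how the specification might translate, with the natural-language reading kept as comments. The helper `employeesOf` and the field names are assumptions made for this example.

```scala
object ExpertDepsSketch {
  case class Department(name: Department.Id)
  object Department { type Id = String }
  object Task { type Id = String }
  case class Employee(name: Employee.Id, department: Department.Id)
  object Employee { type Id = String }
  case class Organization(
      departments: Map[Department.Id, Department],
      employees: Map[Employee.Id, Employee],
      knows: Set[(Employee.Id, Task.Id)])

  // Assumed basic query
  def employeesOf(dpt: Department.Id)(org: Organization): Set[Employee.Id] =
    org.employees.filter { case (_, emp) => emp.department == dpt }.keySet

  def expertDepsIn(tsk: Task.Id)(org: Organization): Set[Department.Id] =
    // "Those departments of the organization ...
    org.departments.keySet.filter { dpt =>
      // ... such that every one of their employees ...
      employeesOf(dpt)(org).forall { emp =>
        // ... knows how to perform the task"
        org.knows.contains((emp, tsk))
      }
    }

  val org: Organization = Organization(
    Set("Product", "Quality", "Research", "Sales").map(n => n -> Department(n)).toMap,
    Map("Alex" -> "Product", "Bert" -> "Product", "Cora" -> "Research",
        "Drew" -> "Research", "Edna" -> "Research", "Fred" -> "Sales")
      .map { case (name, dpt) => name -> Employee(name, dpt) },
    Set(("Alex", "build"), ("Bert", "build"),
        ("Cora", "abstract"), ("Cora", "build"), ("Cora", "design"),
        ("Drew", "abstract"), ("Drew", "design"),
        ("Edna", "abstract"), ("Edna", "call"), ("Edna", "design"),
        ("Fred", "call")))
}
```

With this literal reading, `forall` is vacuously true for a department without employees, so `Quality` is reported alongside `Research` for the task `"abstract"`. Whether empty departments should count is a design decision that the natural-language statement leaves open.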

In [None]:
def expertDepsIn(tsk: Task.Id)(org: Organization): Set[Department.Id] = 
    ???

In [None]:
run(new TestExpertDepsIn(expertDepsIn))