# Collections in Scala

* Iterable - collection that can yield an iterator.
* Iterator has methods like `hasNext` and `next`

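* Seq - ordered sequence of values. Ex: List, Vector (an Array can be used as a Seq through an implicit conversion)
* IndexedSeq - a Seq with efficient access by integer index. Ex: Vector, ArrayBuffer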
* Set - collection of unique items. No duplicates.
* Map - collection of key value pairs.

* All Scala collections have companion objects with an `apply` method.
* Conversion methods like `toSeq`, `toSet`, `toMap` are available.
* Collections of the same kind can be compared using `==`.
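
As a small sketch of the three points above (the value names are made up for illustration):

```scala
// Companion-object apply: List(1, 2, 2, 3) is shorthand for List.apply(1, 2, 2, 3)
val numbers = List(1, 2, 2, 3)

// Conversions between collection kinds
val asSet = numbers.toSet                    // duplicates are dropped
val asSeq = asSet.toSeq                      // back to a sequence (order not guaranteed)
val asMap = List("a" -> 1, "b" -> 2).toMap   // toMap needs a collection of pairs

// == compares contents for collections of the same kind
val sameSeq = List(1, 2, 3) == Vector(1, 2, 3)   // true: both are sequences
val sameSet = Set(1, 2, 3) == Set(3, 2, 1)       // true: order is irrelevant for sets
val crossKind = List(1, 2, 3) == Set(1, 2, 3)    // false: a sequence never equals a set
```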

## Mutable and Immutable collections

In [1]:
// scala.collection.Map is the supertype of both the mutable and immutable Map
import scala.collection.immutable.{Set, Map}
import scala.collection.mutable.{Set => MutableSet, Map => MutableMap}

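* In Scala, unqualified names such as `Set`, `Map` and `List` refer to the immutable
versions, so the `apply` methods of these companion objects return immutable collections by
default. The mutable versions must be imported explicitly.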
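
A minimal sketch of the difference, using made-up value names: the immutable set returns a new collection on every change, while the mutable map is updated in place.

```scala
import scala.collection.mutable

// The unqualified name Set resolves to the immutable version
val frozen = Set(1, 2, 3)
val grown = frozen + 4          // returns a new set; `frozen` is unchanged

// The mutable version is referenced explicitly through the package
val counts = mutable.Map("a" -> 1)
counts("b") = 2                 // updates the map in place
counts += ("c" -> 3)            // adds another entry in place
```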

## Sequences

Hierarchy for immutable sequences

* Seq is the supertype. Below it are the IndexedSeq trait and List, Stream, Stack and Queue.
* Vector and Range are IndexedSeq implementations.


Hierarchy for mutable sequences

* Seq is the supertype. Below it are IndexedSeq, Stack, Queue, ListBuffer and PriorityQueue.
* IndexedSeq is the supertype of ArrayBuffer.

Vector provides effectively constant-time random access and updates.
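
The relationships above can be checked with type ascriptions. This is only an illustrative sketch with made-up names; it compiles because Vector and Range are IndexedSeq implementations and List is a Seq.

```scala
// Type ascriptions compile only if the subtype relationship holds
val vec: IndexedSeq[Int] = Vector(10, 20, 30)
val range: IndexedSeq[Int] = 1 to 3
val list: Seq[Int] = List(1, 2, 3)

val third = vec(2)                 // indexed access: 30
val patched = vec.updated(0, 99)   // returns a new Vector(99, 20, 30); vec is unchanged
```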

## Lists

* Nil is the empty List. A non-empty List has a `head` and a `tail`.
* Lists can be built with the `::` (cons) operator, which is backed by the `::` case class.

In [4]:
// :: is a case class, and a case class with two parameters
// can be written in infix notation in patterns.
// The expression 1 :: Nil builds ::(1, Nil)
val list = 1 :: Nil
for (i <- list)
    println(i)

1


list: List[Int] = List(1)

In [7]:
import scala.annotation.tailrec

// The tail of a list is either another list or Nil
@tailrec
def printList(list: List[Int]): Unit = {
    if (list.nonEmpty) {
        println(list.head)
        printList(list.tail)
    }
}

printList((1 to 5).toList)

1
2
3
4
5


defined function printList

In [11]:
def printListUsingPatternMatch(list: List[Int]): Unit = {
    list match {
        case Nil => ()
        case head :: tail => {
            println(head)
            printListUsingPatternMatch(tail)
        }
    }
}

printListUsingPatternMatch(List((1 to 5): _*))

1
2
3
4
5


defined function printListUsingPatternMatch

## Sets

* A Set contains distinct elements only.
* A Set is unordered by default. SortedSet keeps its elements in sorted
order, and LinkedHashSet retains the order in which the items were added.
SortedSet has both immutable and mutable versions; LinkedHashSet is available
under the `collection.mutable` package.

* BitSet is a compact set for small nonnegative integers.

Useful methods on set are
* contains
* subsetOf
* union or `|` or `++`
* intersect or `&`
* diff or `--`
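
A short sketch of these methods and of the ordering behaviour described above (the value names are illustrative):

```scala
import scala.collection.immutable.{BitSet, SortedSet}
import scala.collection.mutable.LinkedHashSet

val a = Set(1, 2, 3)
val b = Set(3, 4)

val hasTwo = a.contains(2)             // true
val isSubset = Set(1, 2).subsetOf(a)   // true
val both = a | b                       // union: Set(1, 2, 3, 4)
val common = a & b                     // intersection: Set(3)
val onlyA = a -- b                     // difference: Set(1, 2)

// SortedSet and BitSet iterate in sorted order, LinkedHashSet in insertion order
val sorted = SortedSet(3, 1, 2).toList       // List(1, 2, 3)
val linked = LinkedHashSet(3, 1, 2).toList   // List(3, 1, 2)
val bits = BitSet(5, 0, 3).toList            // List(0, 3, 5)
```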

Operations on immutable collections

* `+` - add to unordered collection

Additions to ordered collections
* `+:` - prepends an element (the colon always faces the collection)
* `:+` - appends an element

* `-` - removes an element (sets and maps)
* `++`, `--` - bulk addition and removal

Operations on mutable collections
* `+=` - adds element to the collection
* `++=` and `--=` - bulk addition and removal
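
The operators above, sketched side by side with illustrative values. Immutable operators return a new collection; the mutable ones ending in `=` change the receiver.

```scala
import scala.collection.mutable.ArrayBuffer

// Immutable collections: every operator returns a new collection
val withThree = Set(1, 2) + 3                 // Set(1, 2, 3)
val withoutOne = withThree - 1                // Set(2, 3)
val prepended = 0 +: Vector(1, 2)             // Vector(0, 1, 2)
val appended = Vector(1, 2) :+ 3              // Vector(1, 2, 3)
val joined = Vector(1, 2) ++ Vector(3, 4)     // Vector(1, 2, 3, 4)
val removed = Set(1, 2, 3, 4) -- Set(1, 2)    // Set(3, 4)

// Mutable collections: the operators ending in = modify the receiver
val buffer = ArrayBuffer(1, 2)
buffer += 3                  // ArrayBuffer(1, 2, 3)
buffer ++= Seq(4, 5)         // ArrayBuffer(1, 2, 3, 4, 5)
buffer --= Seq(1, 2)         // ArrayBuffer(3, 4, 5)
```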

## Important operations

* map - 1 to 1 transformation
* transform - in-place equivalent of map that can be used on mutable collections
* flatMap - 1 to many transformation followed by flattening
* filter - keeps only the elements that satisfy a predicate
* foreach - runs a side effect for each item in the collection
* groupBy - builds a map from each key to the collection of elements with that key
* zip - combines two collections into a collection of pairs
* zipAll - like zip, but pads the shorter collection with supplied default values
* zipWithIndex - pairs elements with their index, like enumerate in Python
* sum, min, max, count

* reduceLeft, reduceRight
* foldLeft, foldRight - like reduce, but take an initial value
* scanLeft, scanRight - like fold, but return all the intermediate results
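
A few of these operations applied to a small, made-up list of words:

```scala
val words = List("apple", "avocado", "banana")

val lengths = words.map(_.length)                        // List(5, 7, 6)
val expanded = List(1, 2).flatMap(n => List(n, n * 10))  // List(1, 10, 2, 20)
val longOnes = words.filter(_.length > 5)                // List(avocado, banana)
val byInitial = words.groupBy(_.head)                    // Map(a -> List(apple, avocado), b -> List(banana))
val pairs = words.zip(lengths)                           // List((apple,5), (avocado,7), (banana,6))
val padded = List(1, 2, 3).zipAll(List("x"), 0, "-")     // List((1,x), (2,-), (3,-))
val indexed = words.zipWithIndex                         // List((apple,0), (avocado,1), (banana,2))
val sumOfLengths = lengths.sum                           // 18
val folded = lengths.foldLeft(100)(_ + _)                // 118
val longest = words.reduceLeft((x, y) => if (x.length >= y.length) x else y)  // avocado
```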

In [12]:
// returns all the intermediate results
// in addition to the final folded value
(1 to 10).scanLeft(0)(_ + _)

res11: IndexedSeq[Int] = Vector(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55)

## Iterators

* Calling `.iterator` on any collection returns an iterator that can
be traversed only once.

* Iterators can be used in a for loop, and they also support methods like map and filter.
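
A minimal sketch of the single-pass behaviour, with illustrative names:

```scala
val it = List(1, 2, 3).iterator

val doubled = it.map(_ * 2).toList   // consumes the iterator: List(2, 4, 6)
val exhausted = !it.hasNext          // true: nothing is left for a second pass

// Ask the collection for a fresh iterator to traverse it again
val odds = List(1, 2, 3).iterator.filter(_ % 2 == 1).toList   // List(1, 3)
```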

## Streams

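* Each call to `next` on an Iterator changes its internal state. Streams offer an
immutable alternative.
* The tail of a Stream is evaluated lazily, and elements that have been computed are cached.
* An iterator can be converted to a Stream with the iterator's `toStream`
method.
* Since Scala 2.13, `Stream` is deprecated in favor of `LazyList`, which also evaluates its head lazily.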
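
The sketch below converts an iterator into a lazy, cached sequence. It assumes Scala 2.13 or later and uses `LazyList`, the successor of `Stream`, rather than `toStream`; the counter `evaluated` is a hypothetical helper added only to make the evaluation visible.

```scala
// Assumption: Scala 2.13+, where LazyList replaces the deprecated Stream
var evaluated = 0
val source = Iterator(1, 2, 3).map { x => evaluated += 1; x * 10 }

val lazyInts = source.to(LazyList)   // nothing is pulled from the iterator yet
val first = lazyInts.head            // evaluates only the first element
val afterHead = evaluated            // 1

val all = lazyInts.toList            // List(10, 20, 30); evaluates the rest
val allAgain = lazyInts.toList       // served from the cache, the iterator is not consulted again
val afterAll = evaluated             // 3
```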

In [19]:
import scala.collection.immutable.Stream

def getIntegers(min: Int, max: Int): Stream[Int] = {
    if (min > max)
        Stream.empty[Int]
    else
        min #:: getIntegers(min + 1, max)
}

defined function getIntegers

In [24]:
val intStream = getIntegers(10, 10000)
// notice the stream's tail is lazily computed
println(intStream)

Stream(10, <not computed>)


intStream: Stream[Int] = Stream(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, ...

In [25]:
// get a subset of a stream
// take returns a stream
// to materialize the stream we need to call force method
val intStreamPart = intStream.take(10)
println(intStreamPart)

intStreamPart.force
println(intStreamPart) // materialized

Stream(10, <not computed>)
Stream(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)


intStreamPart: Stream[Int] = Stream(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
res24_2: Stream[Int] = Stream(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)

In [26]:
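// Access elements 1 to 100 of the stream, which forces them to be computed
for (i <- 1 to 100) {
    intStream(i)
}

// A second pass reuses the values computed by the first pass
for (i <- 1 to 100) {
    intStream(i)
}


// Once a part of the stream is computed it is stored,
// so it is not recomputed.
println(intStream)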

Stream(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, <not computed>)


## Lazy views

* Lazy views are comparable to Streams in Java 8.
* Calling the `view` method on a collection returns a lazy view.
* No elements are precomputed in a lazy view, which saves CPU and memory (similar to generators in Python).
* Lazy views do not cache values. The computation starts over on every traversal.
* To convert a view to a concrete collection, call the `.to` method on
the view, for example `.to(List)`.
* Avoid calling `force` on a lazy view that is backed by a very large collection. It materializes every element and defeats the purpose of using the view.
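
To make the lack of caching concrete, here is a small sketch that counts how often the mapping function runs. The counter `calls` is a hypothetical helper introduced only for this illustration.

```scala
var calls = 0
val squares = (1 to 5).view.map { x => calls += 1; x * x }
val beforeUse = calls               // 0: building the view computes nothing

val firstPass = squares.to(List)    // List(1, 4, 9, 16, 25)
val afterFirst = calls              // 5

val secondPass = squares.to(List)   // same result, computed from scratch
val afterSecond = calls             // 10: nothing was cached between traversals
```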

In [27]:
// Here the Range (1 to 10000) is the underlying collection
// and the view is a lazy wrapper over it.
// NOTE: Transformations like map and filter also return a view,
// so they run only when an action such as mkString, force or to
// is invoked.
val lazyViewColl = (1 to 10000).view
    .map(x => x * x)
    .filter(_ % 2 == 1)
// notice that even the first element is not computed
println(lazyViewColl)

View(<not computed>)


lazyViewColl: collection.View[Int] = View(1, 9, 25, 49, 81, 121, 169, 225, 289, 361, 441, 529, 625, 729, 841, 961, 1089, 1225, 1369, 1521, 1681, 1849, 2025, 2209, 2401, 2601, 2809, 3025, 3249, 3481, 3721, 3969, 4225, 4489, 4761, 5041, 5329, 5625, ...

In [31]:
val lazyViewSubset = lazyViewColl.take(10)
println(lazyViewSubset)

// actions like mkString cause the lazy view to be evaluated
println(lazyViewSubset.mkString("_"))

// values are not cached
println(lazyViewSubset)

// computation starts over
println(lazyViewSubset.mkString("_"))

View(<not computed>)
1_9_25_49_81_121_169_225_289_361
View(<not computed>)
1_9_25_49_81_121_169_225_289_361


lazyViewSubset: collection.View[Int] = View(1, 9, 25, 49, 81, 121, 169, 225, 289, 361)

In [32]:
// force is an action that materializes the view
// and returns the result as a concrete collection
println(lazyViewSubset.force)

// views don't cache values, so this still prints <not computed>
println(lazyViewSubset)

Vector(1, 9, 25, 49, 81, 121, 169, 225, 289, 361)
View(<not computed>)


## Java Interoperability

* `import scala.jdk.CollectionConverters._` and call `asScala` to convert a Java
collection to Scala, or `asJava` to convert a Scala collection to Java.
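
A minimal round trip, assuming Scala 2.13 or later (where `scala.jdk.CollectionConverters` is available); the value names are illustrative.

```scala
import scala.jdk.CollectionConverters._

val javaList = new java.util.ArrayList[String]()
javaList.add("a")
javaList.add("b")

// asScala wraps the Java list without copying; toList then makes an immutable copy
val scalaList = javaList.asScala.toList   // List(a, b)

// asJava exposes a Scala collection through the matching Java interface
val backToJava: java.util.List[Int] = List(1, 2, 3).asJava
val javaSize = backToJava.size()          // 3
```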

## Parallel collections

* From 2.13 onwards, parallel collections live in a separate module outside
the Scala standard library.

* To depend on scala-parallel-collections in sbt, 
add this to your **build.sbt**
```Scala
libraryDependencies +=
  "org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.1"
```

* Parallel collections parallelize computation on large collections.

* Parallel collections extend the `ParSeq`, `ParSet` and `ParMap` traits.

* [Parallel collections](https://alvinalexander.com/scala/how-to-use-parallel-collections-in-scala-performance/) are explained in this tutorial.

* [Parallel collections in Scala 2.13](http://allaboutscala.com/tutorials/chapter-8-beginner-tutorial-using-scala-collection-functions/par-function-2-13/)

In [13]:
//adds the maven dependency
import $ivy.`org.scala-lang.modules::scala-parallel-collections:1.0.1`

// We should import this to access par attribute on collections
import scala.collection.parallel.CollectionConverters._

val total = (1 to 10000).par.sum

total: Int = 50005000

In [15]:
// parallel for loop: the iterations run concurrently, so the output order can vary
for (i <- (1 to 10).par)
    println(i)

1
2
6
4
5
3
7
8
9
10


In [17]:
// aggregate takes an initial value and two functions:
// the first folds one element into a partial result,
// the second combines two partial results.
// In order to parallelize the operation using aggregate, the operation needs to be associative.
"Hello".toSeq.par.aggregate(Set[Char]())(
    // here we add the character to the partial set
    _ + _,
    // here we specify the logic to combine two partial sets
    _ ++ _
)

res16: Set[Char] = Set('H', 'e', 'l', 'o')