# GeoTrellis Scala Design Patterns

This Jupyter notebook walks through some of the design patterns and concepts that GeoTrellis uses.

It's broken up into the following subtopics:
- Performance
- Organization
- Type Constraints

# Performance

In [3]:
/** This method will be used in the performance section
  * to produce timing results.
  */
def time(msg: String)(f: => Unit): Unit = {
    val s = System.currentTimeMillis
    f
    println(s"[$msg] Took: ${System.currentTimeMillis - s} ms")
}

### Macros

Motivating example:

In [6]:
val len = 1000000000
time("Using foreach over a range") {
    var x = 0
    for(i <- 0 until len) {
        x += i
    }
}

time("using a while loop") {
    var x = 0
    var i = 0
    while(i < len) {
        x += i
        i += 1
    }
}

import spire.syntax.cfor._

time("spire's cfor macro") {
    var x = 0
    cfor(0)(_ < len, _ + 1) { i =>
        x += i
    }
}

[Using foreach over a range] Took: 2873 ms
[using a while loop] Took: 326 ms
[spire's cfor macro] Took: 321 ms


Notice that `cfor` performs as well as the `while` loop. It is a macro supplied by the [Spire](https://github.com/non/spire/) library.

`Macros`, specifically `Def Macros`, allow you to rewrite the abstract syntax tree of the code at the callsite of a `def`. For a more complete definition, [see the docs](http://docs.scala-lang.org/overviews/macros/overview.html).
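As a rough illustration of what that rewriting buys us, the sketch below writes out by hand the kind of `while` loop that a `cfor` call is meant to reduce to, next to the `for` comprehension it replaces. The object and method names are hypothetical, and this is not Spire's actual macro output, only the general shape of it.

```scala
object LoopShapes {
  // A `for` over a Range desugars to `(0 until n).foreach(i => ...)`,
  // so the loop body is passed along as a function value.
  def sumWithFor(n: Int): Long = {
    var total = 0L
    for (i <- 0 until n) total += i
    total
  }

  // Roughly the shape that `cfor(0)(_ < n, _ + 1) { i => total += i }`
  // is meant to reduce to: no Range, no function value, just a counter.
  def sumWithWhile(n: Int): Long = {
    var total = 0L
    var i = 0
    while (i < n) {
      total += i
      i += 1
    }
    total
  }
}
```

Both methods compute the same result; only the shape of the generated code differs.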

`cfor` lets us iterate over our core raster type (called `Tile`) very quickly, and we use it _everywhere_. Many of the performance-critical code paths and batch jobs that GeoTrellis is used for spend the majority of their CPU time iterating over raster cells. We always use either a `while` loop or a `cfor` for this type of iteration, e.g.:

```scala
cfor(0)(_ < rows, _ + 1) { row =>
  cfor(0)(_ < cols, _ + 1) { col =>
    var count = 0
    var sum = 0.0
    cfor(0)(_ < layerCount, _ + 1) { i =>
      val v = rs(i).getDouble(col, row)
      if(isData(v)) {
        count += 1
        sum += v
      }
    }

    if(count > 0) {
      tile.setDouble(col, row, sum/count)
    } else {
      tile.setDouble(col, row, Double.NaN)
    }
  }
}
```
[Source](https://github.com/locationtech/geotrellis/blob/a00c35b928e96083188d91734252d66574e3b4a7/raster/src/main/scala/geotrellis/raster/mapalgebra/local/Mean.scala#L44-L62)

As we can see, this is ugly, mutable, optimized code. We wrap this functionality and expose it via an immutable API:

In [None]:
def tileMeanScope = {
    import geotrellis.raster._
    
    val tiles: Seq[Tile] = ???
    val result: Tile = tiles.localMean
}

Unit

This is an example of a driving design principle used in GeoTrellis:

![geotrellis-values](http://i.imgur.com/HbF9yV3.jpg?1)
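The same principle can be sketched without GeoTrellis. The example below is a minimal, hypothetical stand-in (plain `Array[Double]` layers instead of `Tile`s, and a made-up `MeanSketch` object): the loops and counters are mutable, but the mutation never escapes the method, and the inputs are left untouched.

```scala
object MeanSketch {
  /** Cell-wise mean of equally sized layers, skipping NaN (NoData) cells.
    * The inputs are never modified; a freshly allocated array is returned.
    */
  def localMean(layers: Seq[Array[Double]]): Array[Double] = {
    require(layers.nonEmpty, "need at least one layer")
    val ls = layers.toArray                       // constant-time indexing in the hot loop
    val size = ls(0).length
    require(ls.forall(_.length == size), "layers must have equal length")

    val result = Array.ofDim[Double](size)        // the only thing we write to
    var cell = 0
    while (cell < size) {
      var count = 0
      var sum = 0.0
      var i = 0
      while (i < ls.length) {
        val v = ls(i)(cell)
        if (!java.lang.Double.isNaN(v)) {
          count += 1
          sum += v
        }
        i += 1
      }
      // A cell with no data in any layer stays NoData.
      result(cell) = if (count > 0) sum / count else Double.NaN
      cell += 1
    }
    result
  }
}
```

From the caller's point of view this is an ordinary function from layers to a new layer, even though the body is written in the mutable style shown earlier.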


#### Using macros to inline overloading for performance critical comparisons

Raster data can contain the concept of NoData, which is a designated value to mean the lack of data.
If a cell has that value, many algorithms will ignore that cell as not pertinent data.
While we are iterating over tiles, many times we are checking if a given value is NoData or not.
By default, GeoTrellis considers `Int.MinValue` as NoData for integer cell types, and `Double.NaN` as
NoData for double cell types.

```scala
val i: Int = ???
val d: Double = ???

if(i == NODATA) { /* This is NoData. NODATA is a val equal to Int.MinValue */ }
if(java.lang.Double.isNaN(d)) { /* This is NoData */ }
```


A common cause of bugs in earlier versions of GeoTrellis was developers checking a double value against `Int.MinValue` (or the `NODATA` constant we provide), or checking an integer value with `java.lang.Double.isNaN`:

```scala
val x: Double = ???
val y: Int = ???

if(x == NODATA) { /* WRONG! */ }
if(java.lang.Double.isNaN(y)) { /* WRONG! */ }
```

A bug would also occur if we checked a double value by comparing it to `Double.NaN` directly:
```scala
val d: Double = Double.NaN
if(d == Double.NaN) { /* Nope, never gonna happen */ }
```

since that equality check will always evaluate to `false`, even if `d` is `Double.NaN`.
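What makes these bugs easy to miss is that every one of the wrong checks compiles and simply evaluates to `false`. The sketch below, with a hypothetical `NoDataPitfalls` object and a local `NODATA` stand-in for the constant described above, evaluates each check on a value that really is NoData.

```scala
object NoDataPitfalls {
  // Local stand-in for the NODATA constant described above.
  val NODATA: Int = Int.MinValue

  val d: Double = Double.NaN   // a double cell holding NoData
  val i: Int = NODATA          // an integer cell holding NoData

  // Wrong: the Int constant is widened to a Double, and NaN never equals it.
  val doubleAgainstIntConstant: Boolean = d == NODATA

  // Wrong: the Int is widened to a Double, which is never NaN.
  val intAgainstIsNaN: Boolean = java.lang.Double.isNaN(i)

  // Wrong: NaN is not equal to anything, including itself.
  val doubleAgainstNaN: Boolean = d == Double.NaN

  // Right: each representation gets the check that matches it.
  val intIsNoData: Boolean = i == NODATA
  val doubleIsNoData: Boolean = java.lang.Double.isNaN(d)
}
```

The first three values are all `false` even though both cells hold NoData; only the last two detect it.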

One solution to this problem is to use simple overloading. But check out the difference in timings for doing the check against a constant or calling the `NaN` check directly:

In [10]:
object NoData {
    def isNoData(v: Int): Boolean = v == Int.MinValue
    def isNoData(v: Double): Boolean = java.lang.Double.isNaN(v)
}

time("Ints: constant check") { 
    var i = 0
    while(i < 10000000) {
        if(i != Int.MinValue) { i += 1 }
    }
}

time("Ints: overloaded method") { 
    var i = 0
    while(i < 10000000) {
        if(!NoData.isNoData(i)) { i += 1 }
    }
}

time("Doubles: java.lang.Double.isNaN") {
   var i = 0.0
    while(i < 10000000.0) {
        if(!java.lang.Double.isNaN(i)) { i += 1.0 }
    }
}

time("Doubles: overloaded method") {
   var i = 0.0
    while(i < 10000000.0) {
        if(!NoData.isNoData(i)) { i += 1.0 }
    }
}

[Ints: constant check] Took: 33 ms
[Ints: overloaded method] Took: 67 ms
[Doubles: java.lang.Double.isNaN] Took: 35 ms
[Doubles: overloaded method] Took: 74 ms


This is because the check against a constant, or the direct call to the method, executes far fewer bytecode instructions
on the JVM than calling an overloaded method that requires a VTable lookup.
To get around this, we've created a macro that inlines the check:

In [11]:
import geotrellis.raster._

time("Ints: constant check") { 
    var i = 0
    while(i < 10000000) {
        if(i != Int.MinValue) { i += 1 }
    }
}

time("Ints: isNoData macro") { 
    var i = 0
    while(i < 10000000) {
        if(!isNoData(i)) { i += 1 }
    }
}

time("Doubles: java.lang.Double.isNaN") {
   var i = 0.0
    while(i < 10000000.0) {
        if(!java.lang.Double.isNaN(i)) { i += 1.0 }
    }
}

time("Doubles: isNoData macro") {
   var i = 0.0
    while(i < 10000000.0) {
        if(!isNoData(i)) { i += 1.0 }
    }
}

[Ints: constant check] Took: 3 ms
[Ints: isNoData macro] Took: 3 ms
[Doubles: java.lang.Double.isNaN] Took: 35 ms
[Doubles: isNoData macro] Took: 34 ms


This macro quite simply inlines the check into the call site. If you want to learn more about how macros work, it's a good one to check out since it really does a dead simple thing, but for a very useful purpose.

In [macros/src/main/scala/geotrellis/macros/NoDataMacros.scala](https://github.com/locationtech/geotrellis/blob/a00c35b928e96083188d91734252d66574e3b4a7/macros/src/main/scala/geotrellis/macros/NoDataMacros.scala#L16-L19)
```scala
object NoDataMacros {
  // ...

  def isNoDataInt_impl(ct: Context)(i: ct.Expr[Int]): ct.Expr[Boolean] = {
    import ct.universe._
    ct.Expr(q"""$i == Int.MinValue""")
  }

  // ...
}
```

In [raster/src/main/scala/geotrellis/raster/package.scala](https://github.com/locationtech/geotrellis/blob/a00c35b928e96083188d91734252d66574e3b4a7/raster/src/main/scala/geotrellis/raster/package.scala#L162)
```scala
package object raster {
  // ...
  
  def isNoData(i: Int): Boolean = macro NoDataMacros.isNoDataInt_impl

  // ...
}
```


#### Using macros to get around FunctionN not being specialized where N > 2

Functions like `{ x: Int => x + 1 }` are _specialized_, in that they don't box certain primitives. Boxing happens because generics on the JVM are implemented with type erasure, so a primitive passed where a generic type is expected has to be wrapped in an object: if you call some `def foo[T](x: T)` with a `Double`, that primitive `Double` will be "boxed" into an `Object` before being passed in, and "unboxed" inside the function. This causes dramatic slowdowns when working with primitive types. You can read more about boxing [here](https://en.wikipedia.org/wiki/Object_type_%28object-oriented_programming%29#Boxing).

Functions like `{ (x: Int, y: Int, z: Int) => x + y + z }` are not specialized. So we see a dramatic slowdown.

For example, if we define a function `map` that maps the values of each raster cell into another raster (`Tile` type):


In [13]:
def map(tile: Tile)(f: Int => Int): Tile = {
    val result: MutableArrayTile = ArrayTile.alloc(IntCellType, tile.cols, tile.rows)
    cfor(0)(_ < tile.rows, _  + 1) { row =>
        cfor(0)(_ < tile.cols, _ + 1) { col =>
            result.set(col, row, f(tile.get(col, row)))
        }
    }
    result
}

and compare that to a definition that maps an `(Int, Int, Int) => Int` over the cell values, which also takes into consideration the column and row of the cell:

In [14]:
def map2(tile: Tile)(f: (Int, Int, Int) => Int): Tile = {
    val result: MutableArrayTile = ArrayTile.alloc(IntCellType, tile.cols, tile.rows)
    cfor(0)(_ < tile.rows, _  + 1) { row =>
        cfor(0)(_ < tile.cols, _ + 1) { col =>
            result.set(col, row, f(col, row, tile.get(col, row)))
        }
    }
    result
}

we can see that the `Function1` executes much faster than the `Function3`.

In [22]:
val tile = ArrayTile(Array.ofDim[Int](1500 * 1600).fill(scala.util.Random.nextInt), 1500, 1600)

time("Function1") { 
    map(tile) { z => z + 1 }
}

time("Function3") { 
    map2(tile) { (col, row, z) => col * row * z + 1 }
}

[Function1] Took: 16 ms
[Function3] Took: 55 ms


This is because `Function3` is not specialized, so it boxes its arguments. We can see in the definitions for those traits that this is in fact the case:
- https://github.com/scala/scala/blob/v2.12.0/src/library/scala/Function1.scala#L32
- https://github.com/scala/scala/blob/v2.12.0/src/library/scala/Function3.scala#L16

One way around this is to define a trait with concrete primitive parameters. Notice the performance of this version:

In [23]:
trait Mapper { def apply(col: Int, row: Int, z: Int): Int }
def map3(tile: Tile)(f: Mapper): Tile = {
    val result: MutableArrayTile = ArrayTile.alloc(IntCellType, tile.cols, tile.rows)
    cfor(0)(_ < tile.rows, _  + 1) { row =>
        cfor(0)(_ < tile.cols, _ + 1) { col =>
            result.set(col, row, f(col, row, tile.get(col, row)))
        }
    }
    result
}

In [26]:
time("Function1") { 
    map(tile) { z => z + 1 }
}

time("Function3") { 
    map2(tile) { (col, row, z) => col * row * z + 1 }
}

time("Mapper Trait") { 
    map3(tile)(new Mapper { def apply(col: Int, row: Int, z: Int): Int = col * row * z + 1 })
}

[Function1] Took: 23 ms
[Function3] Took: 101 ms
[Mapper Trait] Took: 21 ms


The `Mapper` trait performs similarly to the `Function1` case, because it avoids boxing primitives. However, the client call to that method is not pretty.
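As an aside, on Scala 2.12 and later a trait with a single abstract method can be targeted by lambda syntax (SAM conversion), which removes some of the noise at the call site. The sketch below is self-contained and hypothetical: `IntGrid`, `CellMapper` and `mapCells` are stand-ins for `Tile`, `Mapper` and `map3`, and this variant is not timed here.

```scala
object SamSketch {
  // Same shape as the `Mapper` trait above: one abstract method, primitive parameters.
  trait CellMapper { def apply(col: Int, row: Int, z: Int): Int }

  // Hypothetical stand-in for a tile: Int cells stored in row-major order.
  final class IntGrid(val cols: Int, val rows: Int, val cells: Array[Int]) {
    def get(col: Int, row: Int): Int = cells(row * cols + col)
  }

  def mapCells(grid: IntGrid)(f: CellMapper): IntGrid = {
    val out = Array.ofDim[Int](grid.cols * grid.rows)
    var row = 0
    while (row < grid.rows) {
      var col = 0
      while (col < grid.cols) {
        out(row * grid.cols + col) = f(col, row, grid.get(col, row))
        col += 1
      }
      row += 1
    }
    new IntGrid(grid.cols, grid.rows, out)
  }

  val grid = new IntGrid(2, 2, Array(1, 2, 3, 4))

  // Explicit anonymous class, as in the cell above.
  val verbose: IntGrid = mapCells(grid)(new CellMapper {
    def apply(col: Int, row: Int, z: Int): Int = col * row * z + 1
  })

  // Lambda syntax: the compiler converts the function literal to a CellMapper.
  val concise: IntGrid = mapCells(grid)((col, row, z) => col * row * z + 1)
}
```

This only tidies up the call site of a method that already takes the trait; it does not help a method whose parameter is declared as a `Function3`.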

GeoTrellis deals with this exact case in a way that solves both the performance and API problems. Using macros, GeoTrellis enables client code to avoid the performance hit of boxing while keeping the function calls intuitive. The `map` functions we built up above exist as methods on the `Tile` type:

In [29]:
time("Tile map with Function1") {
    tile.map { z => z + 1 }
}

time("Tile map with Function3") {
    tile.map { (col, row, z) => col * row * z + 1 }
}

[Tile map with Function1] Took: 21 ms
[Tile map with Function3] Took: 21 ms


To see how we pull this off, let's look at the code.

In [macros/src/main/scala/geotrellis/macros/TileMacros.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/macros/src/main/scala/geotrellis/macros/TileMacros.scala#L12-L19), we define a trait that declares necessary methods:
```scala
trait MacroMappableTile[T] {
  def mapIntMapper(mapper: IntTileMapper): T
  def mapDoubleMapper(mapper: DoubleTileMapper): T
}
```

and also define a trait that is similar to the `Mapper` trait above:

```scala
trait IntTileMapper {
  def apply(col: Int, row: Int, z: Int): Int
}
```

We then define a macro which inlines the Function3 into an implementation of the apply method of an anonymous instance of `IntTileMapper` ([source](https://github.com/locationtech/geotrellis/blob/v1.0.0/macros/src/main/scala/geotrellis/macros/TileMacros.scala#L34-L39)):

```scala
object TileMacros {
  def intMap_impl[T <: MacroMappableTile[T]](c: Context)(f: c.Expr[(Int, Int, Int) => Int]): c.Expr[T] = {
    import c.universe._
    val self = c.Expr[MacroMappableTile[T]](c.prefix.tree)
    val tree = q"""$self.mapIntMapper(new geotrellis.macros.IntTileMapper { def apply(col: Int, row: Int, z: Int): Int = $f(col, row, z) })"""
    new InlineUtil[c.type](c).inlineAndReset[T](tree)
  }
  
  // ...
}
```

This uses some utility functionality from the `spire` project that they lean on to create similar inlining macros.

The macro definition that uses this macro implementation can be found in [raster/src/main/scala/geotrellis/raster/MappableTile.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/raster/src/main/scala/geotrellis/raster/MappableTile.scala)

```scala

trait MappableTile[T <: MappableTile[T]] extends MacroMappableTile[T] {

  /**
    * Map over the tiles using a function which accepts the column,
    * row, and value at that position and returns an integer.
    */
  def map(f: (Int, Int, Int) => Int): T =
    macro TileMacros.intMap_impl[T]

  // ... 
}
```

which is a trait implemented by our `Tile` type. Because the tile type implements `MacroMappableTile`, it must provide an implementation of the trait-based map call, found in the subtype implementations of `Tile`, such as in [raster/src/main/scala/geotrellis/raster/ArrayTile.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/raster/src/main/scala/geotrellis/raster/ArrayTile.scala#L165-L173)

```scala
trait ArrayTile extends Tile with Serializable {
  // ...
  
  def mapIntMapper(mapper: IntTileMapper): Tile = {
    val tile = ArrayTile.alloc(cellType, cols, rows)
    cfor(0)(_ < rows, _ + 1) { row =>
      cfor(0)(_ < cols, _ + 1) { col =>
        tile.set(col, row, mapper(col, row, get(col, row)))
      }
    }
    tile
  }
  
  // ...
}
```


# Organization

### The Problem

You have core types, and a whole slew of functionality defined in which the core types are the subject. 
In GeoTrellis's instance, we have a `Tile` type, and we can perform a wide array of operations _on_ a tile.
There are map algebra operations, statistics, polygonal summary methods, etc. 

### Some solutions

#### Just throw 'em in

We could dump all of this functionality into the core types, as methods of those types. 
But that would lead to awfully bloated classes and code that was hard to navigate.

#### Organize by object

You could define your functionality as methods on objects that take the core type as the first parameter.
For instance, you can define some local operations on Tiles as:



In [30]:
import geotrellis.raster.Tile 

object LocalOperations {
    def add(tile1: Tile, tile2: Tile): Tile = ???
    def add(tile: Tile, c: Int): Tile = ???
    def subtract(tile1: Tile, tile2: Tile): Tile = ???
    def subtract(tile: Tile, c: Int): Tile = ???
    def multiply(tile1: Tile, tile2: Tile): Tile = ???
    def multiply(tile: Tile, c: Int): Tile = ???
    def divide(tile1: Tile, tile2: Tile): Tile = ???
    def divide(tile: Tile, c: Int): Tile = ???
}

def foo: Tile = {
    val tile1: Tile = ???
    val tile2: Tile = ???

    import LocalOperations._

    divide(subtract(tile1, tile2), add(tile1, tile2))
}

Unit

object scala.Unit

That's not very pretty. What we really want to do is to allow the addition of 
operators on our core types, but not really add them. Implicit classes to the rescue.

In [31]:
trait Tile2 extends geotrellis.raster.Tile

object LocalOperations {
    def add(tile1: Tile2, tile2: Tile2): Tile2 = ???
    def add(tile: Tile2, c: Int): Tile2 = ???
    def subtract(tile1: Tile2, tile2: Tile2): Tile2 = ???
    def subtract(tile: Tile2, c: Int): Tile2 = ???
    def multiply(tile1: Tile2, tile2: Tile2): Tile2 = ???
    def multiply(tile: Tile2, c: Int): Tile2 = ???
    def divide(tile1: Tile2, tile2: Tile2): Tile2 = ???
    def divide(tile: Tile2, c: Int): Tile2 = ???
}


implicit class withLocalOperationMethods(val self: Tile2) {
    import LocalOperations._
    
    def +(other: Tile2): Tile2 = add(self, other)
    def +(c: Int): Tile2 = add(self, c)
    def -(other: Tile2): Tile2 = subtract(self, other)
    def -(c: Int): Tile2 = subtract(self, c)
    def *(other: Tile2): Tile2 = multiply(self, other)
    def *(c: Int): Tile2 = multiply(self, c)
    def /(other: Tile2): Tile2 = divide(self, other)
    def /(c: Int): Tile2 = divide(self, c)
}

def foo: Tile2 = {
    val tile1: Tile2 = ???
    val tile2: Tile2 = ???

   (tile1 - tile2) /  (tile1 + tile2)
}

Unit

object scala.Unit

If there are many more local operations (and in reality, there are), 
we might want to organize things even further, so that `LocalOperations` extends traits that
hold common functionality.

In [40]:
trait Tile2 extends geotrellis.raster.Tile

object LocalAdd {
    def apply(tile1: Tile2, tile2: Tile2): Tile2 = ???
    def apply(tile: Tile2, c: Int): Tile2 = ???
}

trait LocalAddMethods {
    def self: Tile2
    
    def +(other: Tile2): Tile2 = LocalAdd(self, other)
    def +(c: Int): Tile2 = LocalAdd(self, c)
}

object LocalSubtract {
    def apply(tile1: Tile2, tile2: Tile2): Tile2 = ???
    def apply(tile: Tile2, c: Int): Tile2 = ???
}

trait LocalSubtractMethods {
    def self: Tile2
    
    def -(other: Tile2): Tile2 = LocalSubtract(self, other)
    def -(c: Int): Tile2 = LocalSubtract(self, c)   
}

object LocalMultiply {
    def apply(tile1: Tile2, tile2: Tile2): Tile2 = ???
    def apply(tile: Tile2, c: Int): Tile2 = ???
}

trait LocalMultiplyMethods {
    def self: Tile2

    def *(other: Tile2): Tile2 = LocalMultiply(self, other)
    def *(c: Int): Tile2 = LocalMultiply(self, c)
}

object LocalDivide {
    def apply(tile1: Tile2, tile2: Tile2): Tile2 = ???
    def apply(tile: Tile2, c: Int): Tile2 = ???
}

trait LocalDivideMethods {
    def self: Tile2

    def /(other: Tile2): Tile2 = LocalDivide(self, other)
    def /(c: Int): Tile2 = LocalDivide(self, c)
}

object LocalOperations {
    implicit class withLocalOperationMethods(val self: Tile2)
        extends LocalAddMethods
        with LocalSubtractMethods
        with LocalMultiplyMethods
        with LocalDivideMethods
}

def foo: Tile2 = {
    import LocalOperations._

    val tile1: Tile2 = ???
    val tile2: Tile2 = ???

   (tile1 - tile2) /  (tile1 + tile2)
}

Unit

object scala.Unit

The ability to stack the traits like that onto `withLocalOperationMethods` is dependent on all of the traits naming the target type `self`. It would be nice to codify that, and that's what we've done with `MethodExtensions`:

```scala
/**
  * The base-trait from which all implicit classes containing
  * extension methods are derived.
  */
trait MethodExtensions[+T] extends Serializable {
  def self: T
}
```
[Source](https://github.com/locationtech/geotrellis/blob/v1.0.0/util/src/main/scala/geotrellis/util/MethodExtensions.scala)

so our code can now implement that trait to codify the pattern:

In [None]:
import geotrellis.util.MethodExtensions

object LocalOperations {
    object LocalAdd {
        def apply(tile1: Tile2, tile2: Tile2): Tile2 = ???
        def apply(tile: Tile2, c: Int): Tile2 = ???
    }

    trait LocalAddMethods extends MethodExtensions[Tile2] {
        def +(other: Tile2): Tile2 = LocalAdd(self, other)
        def +(c: Int): Tile2 = LocalAdd(self, c)
    }

    object LocalSubtract {
        def apply(tile1: Tile2, tile2: Tile2): Tile2 = ???
        def apply(tile: Tile2, c: Int): Tile2 = ???
    }

    trait LocalSubtractMethods extends MethodExtensions[Tile2] {   
        def -(other: Tile2): Tile2 = LocalSubtract(self, other)
        def -(c: Int): Tile2 = LocalSubtract(self, c)   
    }

    object LocalMultiply {
        def apply(tile1: Tile2, tile2: Tile2): Tile2 = ???
        def apply(tile: Tile2, c: Int): Tile2 = ???
    }

    trait LocalMultiplyMethods extends MethodExtensions[Tile2] {
        def *(other: Tile2): Tile2 = LocalMultiply(self, other)
        def *(c: Int): Tile2 = LocalMultiply(self, c)
    }

    object LocalDivide {
        def apply(tile1: Tile2, tile2: Tile2): Tile2 = ???
        def apply(tile: Tile2, c: Int): Tile2 = ???
    }

    trait LocalDivideMethods extends MethodExtensions[Tile2] {
        def /(other: Tile2): Tile2 = LocalDivide(self, other)
        def /(c: Int): Tile2 = LocalDivide(self, c)
    }

    implicit class withLocalOperationMethods(val self: Tile2) extends MethodExtensions[Tile2]
        with LocalAddMethods
        with LocalSubtractMethods
        with LocalMultiplyMethods
        with LocalDivideMethods
}

def foo: Tile2 = {
    import LocalOperations._

    val tile1: Tile2 = ???
    val tile2: Tile2 = ???

   (tile1 - tile2) /  (tile1 + tile2)
}

Unit

This is the `MethodExtensions` pattern. Very simple, but very effective for code organization if combined with some other general rules of thumb.
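Because the cells above use `???` for every implementation, here is a minimal version of the pattern that actually runs. It is a sketch under stated assumptions: `Grid` is a hypothetical core type, the method traits are made up for the example, and `MethodExtensions` is redeclared locally so that no GeoTrellis import is needed.

```scala
object MethodExtensionsSketch {
  // Local copy of the base trait shown above.
  trait MethodExtensions[+T] extends Serializable { def self: T }

  // Hypothetical core type: a flat grid of Double cells with one primitive operation.
  final case class Grid(cells: Vector[Double]) {
    def combine(other: Grid)(f: (Double, Double) => Double): Grid =
      Grid(cells.zip(other.cells).map { case (a, b) => f(a, b) })
  }

  // Each group of operations lives in its own trait and only relies on `self`.
  trait AddMethods extends MethodExtensions[Grid] {
    def +(other: Grid): Grid = self.combine(other)(_ + _)
  }

  trait SubtractMethods extends MethodExtensions[Grid] {
    def -(other: Grid): Grid = self.combine(other)(_ - _)
  }

  trait DivideMethods extends MethodExtensions[Grid] {
    def /(other: Grid): Grid = self.combine(other)(_ / _)
  }

  object Implicits {
    // One implicit class stacks all of the method traits onto the core type.
    implicit class withGridMethods(val self: Grid) extends MethodExtensions[Grid]
      with AddMethods
      with SubtractMethods
      with DivideMethods
  }

  import Implicits._

  val a = Grid(Vector(3.0, 5.0))
  val b = Grid(Vector(1.0, 3.0))

  // Reads like the operators are defined on Grid itself.
  val ratio: Grid = (a - b) / (a + b)
}
```

`Grid` itself stays small: it knows nothing about `+`, `-` or `/`, and new groups of operations can be added by writing another trait and mixing it into the implicit class.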

We use this in our actual Local Operations, as can be seen here:
- The functionality in [raster/src/main/scala/geotrellis/raster/mapalgebra/local/Add.scala](https://github.com/locationtech/geotrellis/blob/a00c35b928e96083188d91734252d66574e3b4a7/raster/src/main/scala/geotrellis/raster/mapalgebra/local/Add.scala)...
- is combined with other local map algebra operations in [raster/src/main/scala/geotrellis/raster/mapalgebra/local/LocalMethods.scala](https://github.com/locationtech/geotrellis/blob/a00c35b928e96083188d91734252d66574e3b4a7/raster/src/main/scala/geotrellis/raster/mapalgebra/local/LocalMethods.scala#L24)...
- which gets exposed through this implicit in [raster/src/main/scala/geotrellis/raster/package.scala](https://github.com/locationtech/geotrellis/blob/a00c35b928e96083188d91734252d66574e3b4a7/raster/src/main/scala/geotrellis/raster/package.scala#L53-L78):

```scala
package object raster {
  // ...
  
  implicit class withTileMethods(val self: Tile) extends MethodExtensions[Tile]
      with DelayedConversionTileMethods
      with costdistance.CostDistanceMethods
      with crop.SinglebandTileCropMethods
      with equalization.SinglebandEqualizationMethods
      with hydrology.HydrologyMethods
      with mapalgebra.focal.FocalMethods
      with mapalgebra.focal.hillshade.HillshadeMethods
      with mapalgebra.local.LocalMethods
      with mapalgebra.zonal.ZonalMethods
      with mask.SinglebandTileMaskMethods
      with matching.SinglebandMatchingMethods
      with merge.SinglebandTileMergeMethods
      with prototype.SinglebandTilePrototypeMethods
      with regiongroup.RegionGroupMethods
      with render.ColorMethods
      with render.JpgRenderMethods
      with render.PngRenderMethods
      with reproject.SinglebandTileReprojectMethods
      with resample.SinglebandTileResampleMethods
      with sigmoidal.SinglebandSigmoidalMethods
      with split.SinglebandTileSplitMethods
      with summary.polygonal.PolygonalSummaryMethods
      with summary.SinglebandTileSummaryMethods
      with vectorize.VectorizeMethods
      with viewshed.ViewshedMethods
      
    // ...
}
```

`Tile` is a very core type, and has a ton of functionality built on top of it. 
A more general form of this pattern, which is the current way to add any functionality, can be found elsewhere. 
The `geotrellis-spark` subpackage gives good examples of this:

- The functionality defined in [spark/src/main/scala/geotrellis/spark/mask/Mask.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/spark/src/main/scala/geotrellis/spark/mask/Mask.scala)...
- Is exposed as extension methods in [spark/src/main/scala/geotrellis/spark/mask/TileRDDMaskMethods.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/spark/src/main/scala/geotrellis/spark/mask/TileRDDMaskMethods.scala)...
- Which is implemented by the implicit class in [spark/src/main/scala/geotrellis/spark/mask/Implicits.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/spark/src/main/scala/geotrellis/spark/mask/Implicits.scala)
- Which can be imported via `import geotrellis.spark.mask.Implicits._`, or simply by `geotrellis.spark._`, because the main package object extends the `Implicits` trait as seen in [spark/src/main/scala/geotrellis/spark/package.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/spark/src/main/scala/geotrellis/spark/package.scala#L51) 
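The four steps in that list can be sketched in a single self-contained file. Everything below is a hypothetical miniature (a `Layer` type instead of an RDD of tiles, a nested `mask` object instead of a package, and an `all` object standing in for the main package object); it only mirrors the layout described above, not the actual GeoTrellis sources.

```scala
object ImplicitsSketch {
  trait MethodExtensions[+T] extends Serializable { def self: T }

  // Hypothetical core type.
  final case class Layer(cells: Vector[Int])

  object mask {
    // 1. The functionality itself lives in a plain object.
    object Mask {
      def apply(layer: Layer, keep: Int => Boolean, noData: Int): Layer =
        Layer(layer.cells.map(c => if (keep(c)) c else noData))
    }

    // 2. A methods trait exposes it as extension methods on `self`.
    trait LayerMaskMethods extends MethodExtensions[Layer] {
      def mask(keep: Int => Boolean): Layer = Mask(self, keep, Int.MinValue)
    }

    // 3. An Implicits trait holds the implicit class...
    trait Implicits {
      implicit class withLayerMaskMethods(val self: Layer) extends LayerMaskMethods
    }
    // ...and an object of the same name allows `import mask.Implicits._`.
    object Implicits extends Implicits
  }

  // 4. One top-level object mixes in every Implicits trait,
  //    so a single import brings all extension methods into scope.
  object all extends mask.Implicits

  val masked: Layer = {
    import all._
    Layer(Vector(1, 5, 9)).mask(_ > 3)
  }
}
```

Keeping the implicit class in a trait, rather than directly in an object, is what lets the top-level object collect the implicits of many subpackages by mixing their `Implicits` traits together.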

### How to deal with options, a.k.a. Mixing overloads and default parameters in Scala

### The problem

There are some situations where you want default arguments in object `apply` methods, but you also want to overload the `apply`. For instance, in GeoTrellis, we have an `S3LayerWriter` which allows you to write an RDD of rasters out to Amazon's S3 storage backend. In order to operate, it needs an `AttributeStore`, which is the type responsible for reading and writing metadata. A simplified (not real) signature of the attribute store looks like

```scala
case class AttributeStore(bucket: String, prefix: String)
```

An `S3LayerWriter` also takes some options, like whether or not to clobber an existing layer with that id, or if the key index will be one to one with the elements of the RDD. A simplified version of an `S3LayerWriter` might be:

```scala
class S3LayerWriter(attributeStore: AttributeStore, clobber: Boolean, oneToOne: Boolean) {
  def write(): Unit = println(s"S3LayerWriter $attributeStore $clobber $oneToOne")
}
```

And the companion object's apply method might look like this:

```scala
object S3LayerWriter {
  def apply(attributeStore: AttributeStore, clobber: Boolean, oneToOne: Boolean): S3LayerWriter =
    new S3LayerWriter(attributeStore, clobber, oneToOne)
}
```

However, there are some sane defaults for `clobber` and `oneToOne`, so let's put those in:

```scala
object S3LayerWriter {
  def apply(
    attributeStore: AttributeStore, 
    clobber: Boolean = true, 
    oneToOne: Boolean = false
  ): S3LayerWriter =
    new S3LayerWriter(attributeStore, clobber, oneToOne)
}
```

In developing the API, you may find it useful to have the `S3LayerWriter` construct the attribute store itself, just passing in the `bucket` and `prefix` parameters. To do that, you might want to overload `apply` like this:

```scala
object S3LayerWriter {
  def apply(
    attributeStore: AttributeStore, 
    clobber: Boolean = true, 
    oneToOne: Boolean = false
  ): S3LayerWriter =
    new S3LayerWriter(attributeStore, clobber, oneToOne)

  def apply(
    bucket: String, 
    prefix: String, 
    clobber: Boolean = true, 
    oneToOne: Boolean = false
  ): S3LayerWriter =
    apply(AttributeStore(bucket, prefix), clobber, oneToOne)
}
```

Seems like a simple enough solution. However, this is not allowed.

### Overloads and Default Parameters

Your compiler might say that the above code is OK. I have had similar code compile, until I tried to package the code into a JAR, at which point I got the compiler message

```console
Multiple overloaded alternatives of method define default arguments
```

It turns out [you just can't do it](http://stackoverflow.com/questions/4652095/why-does-the-scala-compiler-disallow-overloaded-methods-with-default-arguments). It's baked into the language that you can't have overloaded methods that specify default arguments.

### What to do?

You could code _a lot_ of overloads:

```scala
object S3LayerWriter {
  def apply(attributeStore: AttributeStore, clobber: Boolean, oneToOne: Boolean): S3LayerWriter =
    new S3LayerWriter(attributeStore, clobber, oneToOne)

  def apply(attributeStore: AttributeStore, clobber: Boolean): S3LayerWriter =
    apply(attributeStore, clobber, false)

  def apply(attributeStore: AttributeStore): S3LayerWriter =
    apply(attributeStore, true)

  def apply(bucket: String, prefix: String, clobber: Boolean, oneToOne: Boolean): S3LayerWriter =
    apply(AttributeStore(bucket, prefix), clobber, oneToOne)

  def apply(bucket: String, prefix: String, clobber: Boolean): S3LayerWriter =
    apply(AttributeStore(bucket, prefix), clobber, false)

  def apply(bucket: String, prefix: String): S3LayerWriter =
    apply(AttributeStore(bucket, prefix), true, false)
}
```

We exploded out to 6 methods, and this doesn't even cover all the permutations of possible parameters.

We could create an Options class, with a default:

```scala
object S3LayerWriter {
  case class Options(clobber: Boolean = true, oneToOne: Boolean = false)
  object Options {
    def DEFAULT = Options()
  }

  def apply(attributeStore: AttributeStore, options: Options): S3LayerWriter =
    new S3LayerWriter(attributeStore, options.clobber, options.oneToOne)

  def apply(attributeStore: AttributeStore): S3LayerWriter =
    apply(attributeStore, Options.DEFAULT)

  def apply(bucket: String, prefix: String, options: Options): S3LayerWriter =
    apply(AttributeStore(bucket, prefix), options)

  def apply(bucket: String, prefix: String): S3LayerWriter =
    apply(bucket, prefix, Options.DEFAULT)
}
```

And that is what GeoTrellis does to solve that problem, as can be seen in [spark/src/main/scala/geotrellis/spark/mask/Mask.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/spark/src/main/scala/geotrellis/spark/mask/Mask.scala#L39-L47)
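To show how the `Options` approach reads from the calling side, here is a small runnable variant. It is a sketch with assumed names: `LayerWriter` is a stand-in for the simplified `S3LayerWriter` above, with `val` constructor parameters added only so the configured values can be inspected.

```scala
object OptionsSketch {
  final case class AttributeStore(bucket: String, prefix: String)

  // `val` parameters so the example can check what was configured.
  final class LayerWriter(
    val attributeStore: AttributeStore,
    val clobber: Boolean,
    val oneToOne: Boolean
  )

  object LayerWriter {
    // Defaults live on the case class, not on the overloaded `apply` methods.
    final case class Options(clobber: Boolean = true, oneToOne: Boolean = false)
    object Options {
      val DEFAULT: Options = Options()
    }

    def apply(attributeStore: AttributeStore, options: Options): LayerWriter =
      new LayerWriter(attributeStore, options.clobber, options.oneToOne)

    def apply(attributeStore: AttributeStore): LayerWriter =
      apply(attributeStore, Options.DEFAULT)

    def apply(bucket: String, prefix: String, options: Options): LayerWriter =
      apply(AttributeStore(bucket, prefix), options)

    def apply(bucket: String, prefix: String): LayerWriter =
      apply(bucket, prefix, Options.DEFAULT)
  }

  // All defaults.
  val defaults: LayerWriter = LayerWriter("my-bucket", "layers")

  // Override one option by name; the other keeps its default.
  val noClobber: LayerWriter =
    LayerWriter("my-bucket", "layers", LayerWriter.Options(clobber = false))

  // Or start from the defaults and copy.
  val tweaked: LayerWriter =
    LayerWriter(AttributeStore("my-bucket", "layers"), LayerWriter.Options.DEFAULT.copy(oneToOne = true))
}
```

The number of overloads now grows with the number of ways to identify the attribute store, not with the number of options, and adding a new option only touches the `Options` case class.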

# Type constraints

The patterns of __Type Classes__ and usage of __Context Bounds__ allow for the development of generic functionality that requires specific capabilities from its generic types without requiring those generic types to be part of the same type hierarchy.


### Context Bounds refresher

Context bounds use syntactic sugar to make requiring type classes on type parameters look nice.
For instance, if we want to have a sort function with a type parameter `T` that requires an `Ordering`, we can pass it in as an implicit parameter:

In [32]:
def sort[T](x: Seq[T])(implicit ord: Ordering[T]): Seq[T] =
    x.sorted

This version takes the type class as a _Context Bound_, which is stated in the type parameters.

In [34]:
def sort2[T: Ordering](x: Seq[T]): Seq[T] =
    x.sorted

These two methods compile down to pretty much the exact same thing.

In [35]:
case class Foo(x: Int)

implicit def ord: Ordering[Foo] = Ordering.by(_.x)

val foos = List(3, 1, 2).map(Foo.apply)
sort2(sort(foos) ++ foos)

List(Foo(1), Foo(1), Foo(2), Foo(2), Foo(3), Foo(3))
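One practical consequence of the two forms being equivalent is that a context bound is still an ordinary implicit parameter list: the instance can be summoned inside the method with `implicitly`, and a caller can pass a different instance by hand. The sketch below uses hypothetical method names to show both.

```scala
object ContextBoundSketch {
  // Context-bound form: the Ordering has no name, so it is summoned by type.
  def largest[T: Ordering](xs: Seq[T]): T = {
    val ord = implicitly[Ordering[T]]
    xs.reduce((a, b) => if (ord.gteq(a, b)) a else b)
  }

  // Explicit form: the same parameter, but with a name.
  def largestExplicit[T](xs: Seq[T])(implicit ord: Ordering[T]): T =
    xs.reduce((a, b) => if (ord.gteq(a, b)) a else b)

  val xs = Seq(3, 1, 2)

  // Both pick up the default Ordering[Int].
  val viaContextBound: Int = largest(xs)
  val viaExplicit: Int = largestExplicit(xs)

  // The context-bound version also accepts an instance passed explicitly.
  val viaReversed: Int = largest(xs)(Ordering[Int].reverse)
}
```

With the reversed ordering, "largest" means the smallest element, which shows that the instance supplied at the call site is the one the method uses.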

The usage of Context Bounds can clean up code. For instance, take a look at this method from 
[spark/src/main/scala/geotrellis/spark/io/LayerReader.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/spark/src/main/scala/geotrellis/spark/io/LayerReader.scala):

```scala
trait LayerReader[ID] {
    // ...
    
    def read[
        K: AvroRecordCodec: Boundable: JsonFormat: ClassTag,
        V: AvroRecordCodec: ClassTag,
        M: JsonFormat: GetComponent[?, Bounds[K]]
      ](id: ID, numPartitions: Int): RDD[(K, V)] with Metadata[M]
      
    // ...
}
```

If we were to write this out without context bounds, it would look something like this:
```scala
trait LayerReader[ID] {
    // ...
    
    def read[K, V, M](id: ID, numPartitions: Int)(
      implicit ev1: AvroRecordCodec[K], ev2: Boundable[K], ev3: JsonFormat[K], ev4: ClassTag[K],
      ev5: AvroRecordCodec[V], ev6: ClassTag[V],
      ev7: JsonFormat[M], ev8: GetComponent[M, Bounds[K]]
    ): RDD[(K, V)] with Metadata[M]
      
    // ...
}
```

To me, the Context Bound method is much more readable.

### Type Lambdas and Kind Projector

You may notice a strange `GetComponent[?, Bounds[K]]` in the above example of a method with context bounds.
This code is actually not valid Scala code, and requires the [Kind Projector](https://github.com/non/kind-projector)
compiler plugin to compile. There is a way to write this without the compiler plugin, and we'll see what that 
looks like. First, look at what a `GetComponent` is.

[util/src/main/scala/geotrellis/util/GetComponent.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/util/src/main/scala/geotrellis/util/GetComponent.scala)
```scala
trait GetComponent[T, C] extends Serializable {
  def get: T => C
}

object GetComponent {
  def apply[T, C](_get: T => C): GetComponent[T, C] =
    new GetComponent[T, C] {
      val get = _get
    }
}
```
A `GetComponent` allows us to define a type class that acts as a partial `Lens`, which is a well known pattern that is implemented for Scala by libraries like [Monocle](https://julien-truffaut.github.io/Monocle/). It allows us to get some value of type `C` out of a type `T`.

We can use the `GetComponent` as a type class to require that a generic type has to be able to provide some other type. In the instance above, we are guaranteeing that the metadata can produce some `Bounds[K]`. A simplified version of a function that might act like our example is as follows:

In [36]:
import geotrellis.util.GetComponent
import geotrellis.spark.Bounds

def foo[M, K](metadata: M)(implicit gc: GetComponent[M, Bounds[K]]): Bounds[K] = 
    gc.get(metadata)
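
To make the call site concrete, here is a self-contained sketch that does not depend on GeoTrellis. `GetComponent` is a local copy of the trait shown above, while `Bounds` and `LayerMeta` are simplified, hypothetical stand-ins and not the real GeoTrellis types.

```scala
object GetComponentSketch {
  // Local copy of the GetComponent shape shown above, so the sketch stands alone.
  trait GetComponent[T, C] extends Serializable {
    def get: T => C
  }

  object GetComponent {
    def apply[T, C](_get: T => C): GetComponent[T, C] =
      new GetComponent[T, C] {
        val get = _get
      }
  }

  // Hypothetical stand-ins; these are NOT the GeoTrellis types.
  case class Bounds[K](min: K, max: K)
  case class LayerMeta(name: String, bounds: Bounds[Int])

  object LayerMeta {
    // The instance says: "a LayerMeta can always produce a Bounds[Int]".
    implicit val boundsComponent: GetComponent[LayerMeta, Bounds[Int]] =
      GetComponent[LayerMeta, Bounds[Int]](_.bounds)
  }

  def foo[M, K](metadata: M)(implicit gc: GetComponent[M, Bounds[K]]): Bounds[K] =
    gc.get(metadata)

  // `foo` knows nothing about LayerMeta beyond the instance found in implicit scope.
  val result: Bounds[Int] = foo(LayerMeta("elevation", Bounds(0, 10)))
}
```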

The problem with using `GetComponent` as a context bound is that a context bound must have exactly one type parameter, or one "hole" that is filled with the type it is bounding. `GetComponent` has two type parameters. However, we only want to leave one of them open: the first one. One way to accomplish this is to create a type alias; however, the alias would have to mention `Bounds[K]`, and `K` is a type parameter of the method itself, so the alias cannot be declared ahead of time and this will not work.

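The solution comes with __type lambdas__. A type lambda is an anonymous type-level function; in Scala 2 it is encoded by declaring a type alias inside an anonymous structural type and projecting it out with `#`.
Without Kind Projector, the solution looks like this: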

In [37]:
def foo[M: ({ type B[X] = GetComponent[X, Bounds[K]] })#B, K](metadata: M): Bounds[K] = 
    implicitly[GetComponent[M, Bounds[K]]].get(metadata)

This is quite atrocious looking. Luckily, Kind Projector comes to the rescue.
It provides a way to state type lambdas in a way that resembles the use of the underscore
in anonymous functions, like `(_ + 1)`. Instead of an underscore, it uses a question mark:

```scala
def foo[M: GetComponent[?, Bounds[K]], K](metadata: M): Bounds[K] = 
    implicitly[GetComponent[M, Bounds[K]]].get(metadata)
```

(This will not compile in Toree, since Toree does not compile with the Kind Projector plugin.)
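
The difference between the type alias and the type lambda can be sketched without any plugin. The `Converter` type class and the method names below are hypothetical, chosen only to mirror the two-parameter shape of `GetComponent`.

```scala
object TypeLambdaSketch {
  // Hypothetical two-parameter type class, standing in for GetComponent.
  trait Converter[A, B] {
    def convert(a: A): B
  }

  object Converter {
    implicit val intToString: Converter[Int, String] =
      new Converter[Int, String] {
        def convert(a: Int): String = a.toString
      }
  }

  // When the second parameter is a fixed type, a plain alias leaves one hole.
  type ToStringConverter[X] = Converter[X, String]

  def viaAlias[A: ToStringConverter](a: A): String =
    implicitly[Converter[A, String]].convert(a)

  // When the second parameter is itself a type parameter of the method,
  // no alias can be declared in advance, so the alias is built inline.
  def viaLambda[A: ({ type L[X] = Converter[X, B] })#L, B](a: A): B =
    implicitly[Converter[A, B]].convert(a)

  val fromAlias: String = viaAlias(7)
  val fromLambda: String = viaLambda[Int, String](7)
}
```

The alias works only because `String` is known where the alias is declared; as soon as the second parameter depends on the method's own `B`, the inline form (or Kind Projector's shorthand for it) is needed.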

### Guiding Principle: Never ask for more than you need

#### or, "Don't over-concretize" - Michael Pilquist

Type classes and context bounds have mechanics that can be understood and utilized.
But the utility of these tools is limited by the purpose for which they are used.

A guiding principle that I try to stick to, and which is enabled by the use of type classes and context bounds, is to never ask more of the generic types than you actually need. To take a look at our IO example again:

```scala
trait LayerReader[ID] {
    // ...
    
    def read[
        K: AvroRecordCodec: Boundable: JsonFormat: ClassTag,
        V: AvroRecordCodec: ClassTag,
        M: JsonFormat: GetComponent[?, Bounds[K]]
      ](id: ID, numPartitions: Int): RDD[(K, V)] with Metadata[M]
      
    // ...
}
```

It seems like we are asking a lot here. But we are in fact asking the bare minimum of our types that enables us to do what the client code is asking us to do. Every type class is required, and no further restrictions are put onto the types. This pays off down the road when you want to use the functionality with types that you did not consider at the time you wrote the code. For instance, we were able to save RDDs of vector tiles and point clouds to all of our supported backends quite easily because the layer IO was architected in a way that required only certain type classes to be created and provided in the implicit scope.
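
As a toy illustration of the principle, compare a function tied to a concrete type with one that asks only for the capability it uses. Everything in this sketch (`Combine`, `Extent1D`, and the method names) is hypothetical and much simpler than the layer IO code.

```scala
object MinimalConstraintSketch {
  // Hypothetical type class: the only capability `combineAll` needs.
  trait Combine[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  object Combine {
    implicit val intCombine: Combine[Int] = new Combine[Int] {
      val empty: Int = 0
      def combine(x: Int, y: Int): Int = x + y
    }
  }

  // Over-concretized: only ever works for List[Int].
  def sumInts(xs: List[Int]): Int = xs.foldLeft(0)(_ + _)

  // Asks only for what it needs: some way to combine values of type A.
  def combineAll[A: Combine](xs: Seq[A]): A = {
    val c = implicitly[Combine[A]]
    xs.foldLeft(c.empty)(c.combine)
  }

  // A type the author of `combineAll` never had in mind.
  case class Extent1D(min: Double, max: Double)

  object Extent1D {
    implicit val extentCombine: Combine[Extent1D] = new Combine[Extent1D] {
      // An "empty" extent that any real extent will replace when combined.
      val empty: Extent1D = Extent1D(Double.PositiveInfinity, Double.NegativeInfinity)
      def combine(x: Extent1D, y: Extent1D): Extent1D =
        Extent1D(math.min(x.min, y.min), math.max(x.max, y.max))
    }
  }

  val total: Int = combineAll(Seq(1, 2, 3))
  val merged: Extent1D = combineAll(Seq(Extent1D(0.0, 1.0), Extent1D(2.0, 5.0)))
}
```

Supporting `Extent1D` required no change to `combineAll`, only a new instance in implicit scope, which is the same kind of win described above for vector tiles and point clouds.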

### Bringing it all together: Method Extensions and Context Bounds

![metaism comment](http://i.imgur.com/lDcKvYv.png)

We can use type lambdas on context bounds to require that our types have method extensions which give them certain capabilities.

For instance, in [spark/src/main/scala/geotrellis/spark/split/Split.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/spark/src/main/scala/geotrellis/spark/split/Split.scala):

```scala
object Split {
  /** Splits an RDD of tiles into tiles of size (tileCols x tileRows), and updates the ProjectedExtent component of the keys.
    */
  def apply[K: Component[?, ProjectedExtent], V <: CellGrid: (? => SplitMethods[V])](
      rdd: RDD[(K, V)], tileCols: Int, tileRows: Int
  ): RDD[(K, V)] =
    rdd
      .flatMap { case (key, tile) =>
        val splitLayout =
          TileLayout(
            math.ceil(tile.cols / tileCols.toDouble).toInt,
            math.ceil(tile.rows / tileRows.toDouble).toInt,
            tileCols,
            tileRows
          )

        if(!splitLayout.isTiled) {
          Array((key, tile))
        } else {
          val ProjectedExtent(extent, crs) = key.getComponent[ProjectedExtent]
          Raster(tile, extent).split(splitLayout, Options(extend = false, cropped = false))
            .map { raster => (key.setComponent(ProjectedExtent(raster.extent, crs)), raster.tile) }
        }
      }
}
```

The `(? => SplitMethods[V])` type lambda context bound means that given the `V`, we will have an implicit conversion from `V` to `SplitMethods[V]`. We are able to then call `split` on `Raster[V]`, since the split on a `Raster[T]` is defined for any `T` that has split methods, as seen in [raster/src/main/scala/geotrellis/raster/split/RasterSplitMethods.scala](https://github.com/locationtech/geotrellis/blob/v1.0.0/raster/src/main/scala/geotrellis/raster/split/RasterSplitMethods.scala):

```scala
abstract class RasterSplitMethods[T <: CellGrid: (? => SplitMethods[T])] extends SplitMethods[Raster[T]] {
  def split(tileLayout: TileLayout, options: Options): Array[Raster[T]] =
    self.rasterExtent.split(tileLayout, options)
      .zip(self.tile.split(tileLayout, options))
      .map { case (re, tile) => Raster(tile, re.extent) }
}
```

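Notice that you must use an `abstract class` instead of a `trait` if context bounds are required: a context bound desugars to an implicit constructor parameter, and Scala 2 traits cannot take constructor parameters.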
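
A small sketch of that restriction, using the standard library's `Ordering` as the type class. The class names are hypothetical, and the trait variant at the end is just one possible workaround, not something taken from GeoTrellis.

```scala
object AbstractClassBoundSketch {
  // Compiles: the context bound becomes an implicit constructor parameter.
  abstract class Sorter[T: Ordering] {
    def sort(xs: List[T]): List[T] = xs.sorted
  }

  // The same class with the implicit constructor parameter written out.
  abstract class ExplicitSorter[T](implicit ord: Ordering[T]) {
    def sort(xs: List[T]): List[T] = xs.sorted
  }

  // `trait TraitSorter[T: Ordering]` is rejected by the Scala 2 compiler,
  // because a trait has no constructor to carry the evidence.
  // One workaround is to ask for the evidence as an abstract member instead:
  trait MemberSorter[T] {
    implicit def ord: Ordering[T]
    def sort(xs: List[T]): List[T] = xs.sorted
  }

  val viaBound: List[Int] = (new Sorter[Int] {}).sort(List(3, 1, 2))
  val viaExplicit: List[Int] = (new ExplicitSorter[Int] {}).sort(List(3, 1, 2))
  val viaMember: List[Int] =
    (new MemberSorter[Int] { val ord: Ordering[Int] = Ordering.Int }).sort(List(3, 1, 2))
}
```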

Because we wrote the functionality not against any concrete class, but against anything for which a `SplitMethods` can be created, we were able to give our users the flexibility to use it by creating only the requisite type classes and `MethodExtensions`.
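
The whole pattern can be sketched in miniature without Kind Projector. The traits below are simplified, hypothetical stand-ins (`HalveMethods` plays the role of `SplitMethods`), and the implicit-conversion bound is spelled as an explicit type lambda, which is roughly what `V: (? => HalveMethods[V])` would stand for.

```scala
object MethodExtensionSketch {
  // Simplified, hypothetical stand-in for a method-extension base trait.
  trait MethodExtensions[T] {
    def self: T
  }

  // Hypothetical capability, playing the role of SplitMethods.
  trait HalveMethods[T] extends MethodExtensions[T] {
    def halve: (T, T)
  }

  // Method extension that gives Vector[Int] the `halve` capability.
  implicit class VectorHalveMethods(val self: Vector[Int])
      extends HalveMethods[Vector[Int]] {
    def halve: (Vector[Int], Vector[Int]) = self.splitAt(self.length / 2)
  }

  // The bound requires an implicit conversion V => HalveMethods[V],
  // so `halve` can be called on any `V` that has the extension.
  def halveAll[V: ({ type L[X] = X => HalveMethods[V] })#L](xs: Seq[V]): Seq[V] =
    xs.flatMap { v =>
      val (left, right) = v.halve
      Seq(left, right)
    }

  val result: Seq[Vector[Int]] =
    halveAll(Seq(Vector(1, 2, 3, 4), Vector(5, 6, 7)))
}
```

`halveAll` never mentions `Vector`; any other type becomes usable by supplying its own `HalveMethods` extension in implicit scope.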