New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break Map #1760

Merged
merged 14 commits into from Nov 4, 2016

Conversation

Projects
4 participants
@fosskers
Contributor

fosskers commented Oct 31, 2016

TODO

  • @specialized BreakMap class
  • Decide whether or not to use subclasses
  • Reconcile Tile.mapWith API
  • (optional) Make ColorMap use BreakMap
  • Benchmarks (ongoing)
  • Docstrings for BreakMap usage with map
  • Revert changes to the Tile API
  • Fix tests

Motivation

Addresses (but does not close) #1509 .

fosskers added some commits Oct 31, 2016

(ValueMap) `ValueMap` first pass
- We specialize on an abstract class to avoid duplicate code.
The choice of abstract class over trait is supposed to reduce
compile time.
(BreakMap) Subclasses may or may not prove useful
- Using subclasses like this, we can ask for a `BreakMap[Int,
Int]` in `Tile`, keep a clean implementation of `BreakMap`
(and the binary search logic), and only take a ~3%
performance hit. This 3% might be due to `Tile.map` or
because of the inheritance, I'm not sure yet.

- Implementing a sexier `foreach` which allows us more
fine-grained access when writing the target array might
overcome that 3%.
@fosskers

This comment has been minimized.

Contributor

fosskers commented Oct 31, 2016

So far, we are able to keep a @specialized BreakMap implementation/usage and only pay ~3% in speed. Recall the ~50% increase gained by BTree in the first place.

I only barely dipped into spire for this to work: Spire's Order typeclass is necessary to write the Binary Search branch predicate generically. Order provides generic comparison operators (< etc), which in vanilla Scala are built-in to each type and not typeclass-bound.

Otherwise, the rest of the generic coding is vanilla @specialized use.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Oct 31, 2016

Current thoughts: instead of trying to force some sort of inheritance structure between the monomorphic ColorMap and polymorphic BreakMap, ColorMaps should just use a BreakMap internally to reduce the duplicate code currently sitting in IntColorMap and DoubleColorMap.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Naming discussion:

The point of this data structure is to provide a mechanism for mapping over a Tile, where instead of passing a function, we pass what amounts to a Map[A, B] specialized for number types. Using class-break boundary conditions, we leverage an internal BTree to quickly find what bin the source value should be mapped to. This generalizes the current behaviour of ColorMap and Tile.color.

Question: What should this data structure be called?

Initially we had ValueMap, but thinking that didn't have enough semantic clarity I switched it to BreakMap.

@echeipesh

This comment has been minimized.

Contributor

echeipesh commented Nov 1, 2016

I think the GIS term for this is reclassify, which I really like. How about ValueReclassify or just Reclassify ?

You'd get signatures like Reclassify[Int, Int].

EDIT: But BreakMap is better than ValueMap because at least it has more information about what is going on.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

The use-case is:

val t: Tile = ...
val m: Reclassify = ...

Tile.mapWith(m)

Does the signature Tile.mapWith: Reclassify => Tile feel good semantically?

@lossyrob

This comment has been minimized.

Member

lossyrob commented Nov 1, 2016

I think Reclassify is a general term for what is going on, but also that could be said for our map.

+1 BreakMap

I'm into the method being named as an overload of map:

val t: Tile = ???
val m: BreakMap = ???

t.map(m)

@lossyrob lossyrob added this to the 1.0 milestone Nov 1, 2016

(BreakMap) Don't use subclasses
- Given a well-defined `MapStrategy`, we can use a
non-abstract `BreakMap` and avoid API proliferation from
unnecessary subclasses.
@echeipesh

This comment has been minimized.

Contributor

echeipesh commented Nov 1, 2016

Yes, Tile.mapWith: Reclassify => Tile does not jive, +1 BreakMap based on that.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

overload of map

I'm not the biggest fan of overloading, I think it increases complexity / makes it harder to grock APIs. An example from my personal past being Matlab work, where every function is so heavily overloaded that typos/oversights often still run and produce debugging misery.

fosskers added some commits Nov 1, 2016

@lossyrob

This comment has been minimized.

Member

lossyrob commented Nov 1, 2016

My thought on overloading map is

  • we already are doing it for map(f: Int => Int) and map(f: (Int, Int, Int) => Int)
  • it's pretty clear what the context of the map is based on the object you're passing in
  • I don't see a method name that would better explain what we're doing (mapWith doesn't help me understand what is going on, just seems like differentiating for the sake of differentiating)

What do other people think?

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Updated benchmark results with ColorMap backed by BreakMap:

 1280x1280 pixel Tile, 50 Breaks, 5 trials each, same JVM

 `Tile.color` ORIGINAL LINEAR
 ----------------------------
 Rendering => 176.53 ms
 Rendering => 180.56 ms
 Rendering => 170.98 ms
 Rendering => 170.94 ms
 Rendering => 170.47 ms

 Mean: 173.89 ms

 `Tile.color` WITH BTREE
 -----------------------
 Rendering => 120.25 ms
 Rendering => 117.32 ms
 Rendering => 121.19 ms
 Rendering => 117.35 ms
 Rendering => 116.86 ms

 Mean: 118.6 ms

 Speedup: ~47%

 (The above two are old results. The `WITH BTREE` test clocked a few ms slower today.)

 `Tile.color` BACKED BY `BREAKMAP`
 ---------------------------------
 Rendering => 133.06 ms
 Rendering => 123.85 ms
 Rendering => 120.52 ms
 Rendering => 122.59 ms
 Rendering => 119.76 ms

 Mean: 123.96 ms

 Slowdown: ~3.5%

 `Tile.mapWith` (uses `Tile.map`)
 --------------------------------
 Rendering => 128.96 ms
 Rendering => 126.02 ms
 Rendering => 125.55 ms
 Rendering => 123.38 ms
 Rendering => 121.14 ms

 Mean: 125.01 ms
@echeipesh

This comment has been minimized.

Contributor

echeipesh commented Nov 1, 2016

Re: tile.mapWith

Conceptually BreakMap is a function, in fact if BreakMap.map was renamed to BreakMap.apply it should be able to extend Function1 without losing specialization. At that point using tile.map seems pretty natural.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Would target cell types be set properly? If so that's genius.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Clarification @echeipesh : do you mean if BreakMap extended Function, we'd be able to use the preexisting Tile.map?

@lossyrob

This comment has been minimized.

Member

lossyrob commented Nov 1, 2016

We wouldn't be able to, due to [A, B] stuff in the map...

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Why not class BreakMap[... A, ... B] extends Function1[A, B]?

@lossyrob

This comment has been minimized.

Member

lossyrob commented Nov 1, 2016

You could do that, just rename the method map to apply. Doesn't solve the overloads in Tile question though, meaning having to have

def map(m: BreakMap[Int, Int], targetCellType: CellType): Tile = ???
def map(m: BreakMap[Int, Double], targetCellType: CellType): Tile = ???
def map(m: BreakMap[Double, Int], targetCellType: CellType): Tile = ???
def map(m: BreakMap[Double, Double], targetCellType: CellType): Tile = ???

(we talked about it a bit on Gitter in case you didn't notice the ping)

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Right, since Tile.map: Int => Int. Hmm...

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Our hands might be forced by the monomorphic Tile:

class BreakMap[... A, ... B] extends (A => B) { ... }

trait Tile extends ... {
  def mapd2i(f: Double => Int): Tile

  def mapi2d(f: Int => Double): Tile
}
@lossyrob

This comment has been minimized.

Member

lossyrob commented Nov 1, 2016

We don't need the separate names, since the behavior of the map function depends on the values of the type params.

If there's "same type after erasue" issues, just use dummy implicits:

def map(m: BreakMap[Int, Int], targetCellType: CellType): Tile = ???
def map(m: BreakMap[Int, Double], targetCellType: CellType)(implicit d: DummyImplicit): Tile = ???
def map(m: BreakMap[Double, Int], targetCellType: CellType)(implicit d: DummyImplicit, d2: DummyImplicit): Tile = ???
def map(m: BreakMap[Double, Double], targetCellType: CellType)(implicit d: DummyImplicit, d2: DummyImplicit, d3: DummyImplicit): Tile = ???
(BreakMap) `BreakMap` extends `A => B`
- This allows us raw use of `Tile.map` and `Tile.mapDouble`
for free.
@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Exposed API:

trait Tile extends ... {
  def map(f: Int => Int): Tile  /* unchanged */

  def map(f: Int => Double, tct: CellType): Tile /* new */

  def map(f: Double => Int, tct: CellType)(implicit ev: DummyImplicit): Tile /* new */

  def mapDouble(f: Double => Double): Tile /* unchanged */
}
@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 1, 2016

Needs docstrings to explain the speedup you can get via BreakMap.

@fosskers fosskers referenced this pull request Nov 2, 2016

Closed

Add target CellType overload to Tile #1766

0 of 6 tasks complete
@lossyrob

This comment has been minimized.

Member

lossyrob commented Nov 2, 2016

Overloading map like that will confuse the compiler when you try to pass things in. My preference is to keep map and mapDouble the same (and avoid conflict with the other PR) and just create maps that specifically take BreakMap

fosskers added some commits Nov 2, 2016

(BreakMap) Clean up
- Made the `BreakMap` construction helper methods a bit more
user friendly.

- Cleaned out some unused imports.
val noDataValue: A,
val fallbackValue: A,
val strict: Boolean
)

This comment has been minimized.

@fosskers

fosskers Nov 2, 2016

Contributor

MapStrategy might deserve its own file, although it's strictly subordinate to BreakMap.

/** Helper methods for constructing a [[MapStrategy]]. */
object MapStrategy {
def int: MapStrategy[Int] = new MapStrategy(LessThanOrEqualTo, 0x00000000, 0x00000000, false)

This comment has been minimized.

@fosskers

fosskers Nov 2, 2016

Contributor

Is zero correct, or Int.MinValue? Zero is the old default from ColorMap.

Sorting.quickSort(a)
BTree.fromSortedSeq(a.toIndexedSeq).get

This comment has been minimized.

@fosskers

fosskers Nov 2, 2016

Contributor

quickSort needs an Array, but we could probably go faster if we avoided toIndexedSeq (makes it a Vector) and had another BTree constructor that took Array.

Within fromSortedSeq, we need the ability to slice and have O(1) access to elements.

This comment has been minimized.

@fosskers

fosskers Nov 2, 2016

Contributor

On the other hand, this might be negligible since ColorMap / BreakMap instantiation is usually a one-time cost.

options.strategy.strict
))
)
}

This comment has been minimized.

@fosskers

fosskers Nov 2, 2016

Contributor

These are ugly. It would be much cleaner with MapStrategy.copy, except that would require it to be a case class. Case classes generate more bytecode, but if their instantiation cost isn't terrible, it might be worth it to reduce code smell here.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Nov 2, 2016

I'm also feeling like we should rip out Options in favour of using straight MapStrategy.

echeipesh and others added some commits Nov 2, 2016

Merge pull request #4 from echeipesh/feature/eac/value-map-options1
Preserve ColorMap.Options binary compatibility
object Options {
def DEFAULT = Options()
def DEFAULT: Options = new Options(LessThanOrEqualTo, 0x00000000, 0x00000000, false)

This comment has been minimized.

@lossyrob

lossyrob Nov 3, 2016

Member

Why do we restate the defaults here? We already state the defaults in the default param values in the case class. We shouldn't have them in two places, it makes it possible to get them out of sync.

implicit def classBoundaryTypeToOptions(classBoundaryType: ClassBoundaryType): Options =
Options(classBoundaryType)
implicit def classBoundaryTypeToOptions(cbt: ClassBoundaryType): Options = {
new Options(cbt, 0x00000000, 0x00000000, false)

This comment has been minimized.

@lossyrob

lossyrob Nov 3, 2016

Member

again, shouldn't restate defaults in multiple places

/** Helper methods for constructing BreakMaps. */
object BreakMap {
def i2i(m: Map[Int, Int], s: MapStrategy[Int] = MapStrategy.int): BreakMap[Int, Int] =

This comment has been minimized.

@lossyrob

lossyrob Nov 3, 2016

Member

Not into these shortened names - it's hard to grok if people come to this not knowing about our (very internal) functions that look like this.

Why not just use apply overloads? (implicit d: DummyImplicit) will probably be necessary here.

This comment has been minimized.

@fosskers

fosskers Nov 3, 2016

Contributor

The purpose of these functions is pretty clear from their type signatures, and I'd argue that having overloaded applys with DummyImplicit is actually harder to grok. Does the average Scala dev know about the type erasure problems, and that this is a hack to get around them?

This comment has been minimized.

@lossyrob

lossyrob Nov 3, 2016

Member

I think the average Scala dev knows that object applies create the thing from the passed in the params. The dummy implicits will be pretty invisible to him or her. So I would say the overloaded applies are more intuitive.

With applies, this just works:

val m: Map[Int, Int] = ???
val m2: Map[Double, Double] = ???
val m3: Map[Int, Double] = ???
val m4: Map[Double, Int] = ???

val breakMap = BreakMap(m)
val breakMap2 = BreakMap(m2)
val breakMap3 = BreakMap(m3)
val breakMap4 = BreakMap(m4)

Opinions, anyone?

This comment has been minimized.

@pomadchin

pomadchin Nov 3, 2016

Member

Basically I don't like c-styled function names (i2i, i2d, etc), but I remember that somewhere in our codebase we have smth similar (i believe it was related to function call performance);
So +1 for overloads using DummyImplicits, it's a usual Scala way to resolve such overloads. ¯\_(ツ)_/¯

At least we need one more opinion (I think).

This comment has been minimized.

@echeipesh

echeipesh Nov 4, 2016

Contributor

+1 to DummyImplicits, that gives the least surprising application.

This comment has been minimized.

@fosskers

fosskers Nov 4, 2016

Contributor

Although I'm opposed to overloads in general, after thinking about it, overloaded apply isn't that terrible so long as the return type matches the parent class, which it would here.

I'll switch to apply.

fosskers added some commits Nov 3, 2016

@lossyrob lossyrob merged commit aef6c13 into locationtech:master Nov 4, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment