RasterizeRDD for Geometry #2266

echeipesh · 2017-06-28T17:40:56Z

Resolves: #2228
Resolves: #2168

jbouffard · 2017-07-06T15:38:24Z

spark/src/main/scala/geotrellis/spark/rasterize/RasterizeRDD.scala

+  ): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
+    val intValue = implicitly[Numeric[T]].toInt(value)
+    val dblValue = implicitly[Numeric[T]].toDouble(value)
+    val options = Options(includePartial=true, sampleType=PixelIsArea)


Is there a certain reason why you defined options in this method when it's already being passed in as a parameter (line 44)?

Nope, thank you for the catch.

jbouffard · 2017-07-06T15:48:20Z

spark/src/main/scala/geotrellis/spark/rasterize/RasterizeRDD.scala

+import org.apache.spark.rdd._
+import scala.collection.immutable.VectorBuilder
+
+object RasterizeRDD {


Do you think it'd be worth it to have another overload method or two that doesn't require options?

def fromGeometry[G <: Geometry, T: Numeric](
  geoms: RDD[G],
  layout: LayoutDefinition,
  ct: CellType,
  value: T
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
  fromGeometry(geoms, layout, ct, value, new HashPartitioner(geoms.sparkContext.defaultParallelism), Options.DEFAULT)
}

def fromGeometry[G <: Geometry, T: Numeric](
  geoms: RDD[G],
  layout: LayoutDefinition,
  ct: CellType,
  value: T,
  numPartitions: Int
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
  fromGeometry(geoms, layout, ct, value, new HashPartitioner(numPartitions), Options.DEFAULT)
}

That's a good point. Thank you for taking the time to type them up.

echeipesh · 2017-07-07T18:09:13Z

RasterizeRDD Method Extensions

When creating method extensions for `RasterizeRDD.fromGeometry` we have to consider optional arguments.

def fromGeometry[G <: Geometry, T: Numeric](
  geoms: RDD[G],
  value: T,
  layout: LayoutDefinition,
  ct: CellType,
  partitioner: Partitioner,
  options: Rasterizer.Options
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = ???

There are two or three optional arguments:

def fromGeometry[G <: Geometry, T: Numeric](
  geoms: RDD[G],
  value: T,
  layout: LayoutDefinition,
  ct: CellType = IntConstantNoDataCellType,
  partitioner: Partitioner = new HashPartitioner(geoms.getNumPartitions),
  options: Rasterizer.Options = Rasterizer.Options.DEFAULT
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = ???

An existing pattern is to create RasterizeRDD.Options as follows:

object RasterizeRDD {
  case class Options(
    rasterizerOptions: Rasterizer.Options = Rasterizer.Options.DEFAULT,
    partitioner: Option[Partitioner] = None
  )
  // DEFAULT lives in the companion so that RasterizeRDD.Options.DEFAULT resolves
  object Options {
    val DEFAULT = Options()
  }
}

trait GeometryRDDRasterizeMethods[G <: Geometry] extends MethodExtensions[RDD[G]] {
  def rasterizeWithValue[T: Numeric](
    value: T,
    cellType: CellType,
    options: RasterizeRDD.Options = RasterizeRDD.Options.DEFAULT
  ): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = ???
}

val geoms: RDD[Polygon] = ???

geoms.rasterizeWithValue(1, IntConstantNoDataCellType,
  RasterizeRDD.Options(
    partitioner = Some(new HashPartitioner(geoms.getNumPartitions)),
    rasterizerOptions = Rasterizer.Options(true, PixelIsArea)
  )
)

This is all well and good but it has the following drawbacks:

- Specifying the options for an RDD of geoms gets very nested
- Something somewhere has to match the None case for Partitioner
- Moving the more common option of CellType into Options makes the ugly call more frequent

The main problem is that the true defaults in this case depend on other parameters. Scala handles that case with multiple parameter lists: a default in a later parameter list may refer to parameters from an earlier list.

Proposal

object RasterizeRDD {
  case class Options(
    rasterizerOptions: Rasterizer.Options = Rasterizer.Options.DEFAULT,
    partitioner: Partitioner
  )

}

trait GeometryRDDRasterizeMethods[G <: Geometry] extends MethodExtensions[RDD[G]] {
  def rasterizeWithValue[T: Numeric](
    value: T
  )(
    cellType: CellType = narrowestCellType(value),
    includePartial: Boolean = true,
    sampleType: PixelSampleType = PixelIsPoint,
    partitioner: Partitioner = new HashPartitioner(self.getNumPartitions)
  ): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
    val rasterizerOptions = Rasterizer.Options(includePartial, sampleType)
    val options = RasterizeRDD.Options(rasterizerOptions, partitioner)
    ??? // make the call
  }
}

val geoms: RDD[Polygon] = ???

geoms.rasterizeWithValue(1)() // the second parameter list still needs empty parentheses
geoms.rasterizeWithValue(4)(cellType = ByteCellType)
geoms.rasterizeWithValue(4)(
  cellType = ByteCellType,
  sampleType = PixelIsArea)

What we get for the trouble:

- We flattened out the nested Options object at the call site
- The method call is now tab-completable
- Options objects are preserved as a composition pattern
- Default values are figured out at the method definition site
- We can clearly use input-dependent defaults like CellType
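
To try the multiple-parameter-list idea in isolation, here is a minimal sketch that does not depend on Spark or GeoTrellis. `CellKind`, `narrowest` and `rasterize` are hypothetical stand-ins invented for this illustration, not part of the proposed API; the only point is that a default in the second list can be computed from `value` in the first.

```scala
// Hypothetical stand-ins; none of these are GeoTrellis types.
sealed trait CellKind
case object ByteKind extends CellKind
case object IntKind extends CellKind

object DependentDefaults {
  // Picks the smallest illustrative cell kind that can hold `value`.
  def narrowest(value: Int): CellKind =
    if (value >= Byte.MinValue && value <= Byte.MaxValue) ByteKind else IntKind

  // `value` sits in the first parameter list, so defaults in the
  // second list are allowed to refer to it.
  def rasterize(value: Int)(
    cellKind: CellKind = narrowest(value),
    includePartial: Boolean = true
  ): String =
    s"value=$value cellKind=$cellKind includePartial=$includePartial"
}
```

A call that relies on every default still has to supply the empty second list, as in `DependentDefaults.rasterize(1)()`, while `DependentDefaults.rasterize(300)(includePartial = false)` overrides a single argument by name.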

echeipesh · 2017-07-07T18:46:50Z

This is what the scaladoc looks like for the method:

jbouffard · 2017-07-07T19:17:24Z

I think the proposal looks good. I never liked how nested defining options could become in GeoTrellis. Do you think it'd be worth having more overloads just in case someone already has Options defined?

echeipesh · 2017-07-07T20:32:18Z

Not sure that is possible with this pattern. You may only have one instance of a method name with default arguments, which means you couldn't define def rasterizeWithValue(value: Double, options: RasterizeRDD.Options = RasterizeRDD.Options.DEFAULT).

And it wouldn't make sense to put it into the same method signature because you'd have to choose whether the remaining parameters trump the contents of the options or not.

I suppose with this pattern if somebody needed to use the Options instance they would be required to use the RasterizeRDD.fromGeometry directly.

jbouffard · 2017-07-07T20:36:22Z

Ah, I wasn't aware of that. In that case I think the current implementation is fine. It might be worth noting somewhere in the docs that RasterizeRDD.fromGeometry should be used if the user has an Options instance ready to go.

aklink

Tested with ShapeFileReader.readSimpleFeatures(...) using MultiPolygons; works fine for me.

Edited:
I am using
val rasterizedRDD : RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = RasterizeRDD.fromGeometry( polyRdd, 1, IntConstantNoDataCellType, ld, Options.DEFAULT)

What is currently not working on my side:
val rasterizedRdd = polyRdd.rasterizeWithValue(1, IntConstantNoDataCellType, ld)

Error:(112, 33) value rasterizeWithValue is not a member of org.apache.spark.rdd.RDD[geotrellis.vector.MultiPolygon]
val rasterizedRdd = polyRdd.rasterizeWithValue(1, IntConstantNoDataCellType, ld)

Maybe a missing import statement on my side, or a wrong usage of types in combination with:
ShapeFileReader.readSimpleFeatures(shapefileName)

I also could not pass different rasterization values when the geometries in the RDD do not all belong to the same class (using the raster value as an object class taken from a field/column in the shapefile); only one value per RDD is allowed. Maybe this is not intended to be supported by RasterizeRDD. If all geometries have the same value, it works fine.

aklink · 2017-07-10T12:02:55Z

I have one suggestion for how RasterizeRDD could be extended to support one value per geometry instead of one value per RDD:

package geotrellis.spark.rasterize

import geotrellis.raster._
import geotrellis.raster.rasterize._
import geotrellis.spark._
import geotrellis.spark.tiling._
import geotrellis.vector._
import org.apache.spark.rdd._
import org.apache.spark.{HashPartitioner, Partitioner}

object RasterizeFeaturesRDD {

  /**
   * Rasterize an RDD of Geometry objects into a tiled raster RDD.
   * Cells not intersecting any geometry will be left as NODATA.
   * Value will be converted to type matching specified [[CellType]].
   *
   * @param features Cell values for cells intersecting a feature consisting of (geometry,value)
   * @param layout Raster layer layout for the result of rasterization
   * @param cellType [[CellType]] for creating raster tiles
   * @param options Rasterizer options for cell intersection rules
   * @param partitioner Partitioner for result RDD
   */
  def fromFeature[G <: Geometry, D <: Double](
    features: RDD[Feature[G,D]],
    cellType: CellType,
    layout: LayoutDefinition,
    options: Rasterizer.Options = Rasterizer.Options.DEFAULT,
    partitioner: Option[Partitioner] = None
  ): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
    val layoutRasterExtent = RasterExtent(layout.extent, layout.layoutCols, layout.layoutRows)
    val layoutRasterizerOptions = Rasterizer.Options(includePartial=true, sampleType=PixelIsArea)

    /** Key geometry by spatial keys of intersecting tiles */
    def keyGeom(feature: (Geometry, Double)): Iterator[(SpatialKey, ((Geometry, Double), SpatialKey))] = {
      var keySet = Set.empty[SpatialKey]
      feature._1.foreach(layoutRasterExtent, layoutRasterizerOptions){ (col, row) =>
        keySet = keySet + SpatialKey(col, row)
      }
      keySet.toIterator.map { key => (key, (feature, key)) }
    }

    // key the geometry to intersecting tiles so it can be rasterized in the map-side combine
    val keyed: RDD[(SpatialKey, ((Geometry, Double), SpatialKey))] =
      features.flatMap { feature => keyGeom((feature.geom, feature.data)) }

    val createTile = (tup: ((Geometry, Double), SpatialKey)) => {
      val ((geom,value), key) = tup
      val tile = ArrayTile.empty(cellType, layout.tileCols, layout.tileRows)
      val re = RasterExtent(layout.mapTransform(key), layout.tileCols, layout.tileRows)
      geom.foreach(re, options){ tile.setDouble(_, _, value) }
      tile: MutableArrayTile
    }

    val updateTile = (tile: MutableArrayTile, tup: ((Geometry, Double), SpatialKey)) => {
      val ((geom,value), key) = tup
      val re = RasterExtent(layout.mapTransform(key), layout.tileCols, layout.tileRows)
      geom.foreach(re, options){ tile.setDouble(_, _, value) }
      tile: MutableArrayTile
    }

    val mergeTiles = (t1: MutableArrayTile, t2: MutableArrayTile) => {
      t1.merge(t2).mutable
    }

    val tiles = keyed.combineByKeyWithClassTag[MutableArrayTile](
      createCombiner = createTile,
      mergeValue = updateTile,
      mergeCombiners = mergeTiles,
      partitioner.getOrElse(new HashPartitioner(features.getNumPartitions))
    )

    ContextRDD(tiles.asInstanceOf[RDD[(SpatialKey, Tile)]], layout)
  }
}

This works perfectly for me when using:

val shapefileName = ???
val attribName = ???
val layout = ???
val extent = ???
val ld = LayoutDefinition(extent, layout)
val multipolygonsWithValue = ShapeFileReader.readSimpleFeatures(shapefileName)
      // keep only MultiPolygon features; the next step reads the geometry as jts.MultiPolygon
      .filter { feat => "MultiPolygon" == feat.getFeatureType.getGeometryDescriptor.getType.getName }
      .map { feat => (MultiPolygon.jts2MultiPolygon(feat.geom[jts.MultiPolygon].get), feat.attribute(attribName)) }
      .map { case (mp: MultiPolygon, value: Long)  => Feature(mp, value.toDouble) }
val multipolygonsRdd = sc.parallelize(multipolygonsWithValue)
val rasterizedRDD : RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = RasterizeFeaturesRDD
      .fromFeature( multipolygonsRdd, IntConstantNoDataCellType, ld, Options.DEFAULT)

aklink · 2017-07-10T12:38:50Z

I should have taken #2168 into account to distinguish between fromGeometry and fromFeature, but with some modifications the proposal above could be adjusted to fit this, I hope. Edit: applied change above

aklink · 2017-07-10T15:59:25Z

Another option would be to use

def fromFeature[G <: Geometry, T : Numeric](
    features: RDD[(G,T)],
    cellType: CellType,
    layout: LayoutDefinition,
    options: Rasterizer.Options = Rasterizer.Options.DEFAULT,
    partitioner: Option[Partitioner] = None
  ): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = ???

and then use (Geometry, T) instead of (Geometry, Double). But then value can't be used directly; a value.toString.toDouble conversion would be needed in createTile and updateTile, i.e. geom.foreach(re, options){ tile.setDouble(_, _, value.toString.toDouble) }
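
As a side note on the conversion step, the `Numeric` context bound already provides a direct route to `Double`; the first diff hunk in this thread uses `implicitly[Numeric[T]].toDouble(value)` for the same purpose. The sketch below shows only that conversion, under the assumption that a plain helper is acceptable here; `asCellValue` is a hypothetical name chosen for illustration.

```scala
object NumericToDouble {
  // Converts any value that has a Numeric instance to Double through the
  // type class, without a round trip through String.
  def asCellValue[T: Numeric](value: T): Double =
    implicitly[Numeric[T]].toDouble(value)
}
```

With such a helper, `tile.setDouble(_, _, asCellValue(value))` would read the same for `Int`, `Long`, `Float` or `Double` inputs.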

echeipesh · 2017-07-11T00:00:07Z

@aklink thank you for testing and poking at this!

The fromFeature should totally work like that, but it should be around Feature[G, Double] type since we have it and it will be generated by reading GeoJSON and such.

The other point is that, after looking at the code, there doesn't seem to be any point in using the Numeric[T] type class. Following the same logic as CellType.withNoData, Double is wide enough to represent any possible cell value, and using MutableTile.setDouble makes sure that the assignment is consistent with the CellType of the generated tiles.

From that perspective Numeric[T] doesn't provide any real flexibility, just a useless type parameter. A map step to get from a Feature[G, D] to a Feature[G, Double] is a cheap, explicit and unsurprising way to use this feature. By the same token RasterizeRDD.fromGeometry can actually delegate to RasterizerRDD.fromFeature by mapping geoms into features with the constant value.
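
The delegation idea can be sketched without Spark. Everything below is a hypothetical stand-in (`Geom`, `Feat`, and `Seq` in place of `RDD`); it only shows the shape of the two map steps, not the real rasterizer.

```scala
object DelegationSketch {
  // Hypothetical stand-ins for a geometry and a feature carrying data.
  final case class Geom(name: String)
  final case class Feat[D](geom: Geom, data: D)

  // Stand-in for the per-feature rasterizer: it only pairs each geometry
  // with the value that would be burned into its cells.
  def fromFeature(features: Seq[Feat[Double]]): Seq[(String, Double)] =
    features.map(f => (f.geom.name, f.data))

  // Constant-value rasterization expressed as a map into features.
  def fromGeometry(geoms: Seq[Geom], value: Double): Seq[(String, Double)] =
    fromFeature(geoms.map(g => Feat(g, value)))

  // The explicit map step from some other data type (here Long) to Double.
  def widen(features: Seq[Feat[Long]]): Seq[Feat[Double]] =
    features.map(f => Feat(f.geom, f.data.toDouble))
}
```

Keeping a single code path in `fromFeature` means any later change to how values are merged only has to be made in one place.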

What are your thoughts on z-order when rasterizing features with different values?
There is no way to control the order in which reduction happens in combineByKeyWithClassTag, so I'm sure some really weird results are possible where the apparent z-order flips on tile boundaries when features with differing values overlap.
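
To make the concern concrete, here is a small sketch on single cells rather than tiles. It assumes a simplified model in which a cell is an `Option[Double]`; `overwrite`, `Prioritized` and `mergeByPriority` are hypothetical names, and this is not the z-buffer code referenced by the commit titles later in this thread. It contrasts a "last writer wins" merge, whose result depends on argument order, with a merge keyed on an explicit priority, which gives the same answer in either order.

```scala
object ZOrderSketch {
  // One cell; None stands for NODATA. Simplified model, not the Tile API.
  type Cell = Option[Double]

  // "Last writer wins": the second argument overwrites the first wherever
  // it has data, so the result depends on the order of the merge.
  def overwrite(a: Cell, b: Cell): Cell = b.orElse(a)

  // Hypothetical cell that carries an explicit priority with its value.
  final case class Prioritized(value: Double, priority: Int)

  // Keeps the higher priority; ties fall back to the larger value so the
  // merge gives the same result regardless of argument order.
  def mergeByPriority(a: Option[Prioritized], b: Option[Prioritized]): Option[Prioritized] =
    (a, b) match {
      case (Some(x), Some(y)) =>
        if (x.priority != y.priority) Some(if (x.priority > y.priority) x else y)
        else Some(if (x.value >= y.value) x else y)
      case _ => a.orElse(b)
    }
}
```

Because a combine step may merge partial results in any order, only a merge of the second kind yields a stable picture where features with different values overlap.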

aklink · 2017-07-11T08:49:10Z

I have changed to features: RDD[Feature[G,Double]]
Regarding z-order: I didn't think that far. Since the shapefile used to get the geometries has no overlapping geometries (overlaps are technically possible in a shapefile but not considered valid), this case cannot happen on my side. My Scala skills are also not sufficient to handle this.

echeipesh · 2017-07-11T18:03:56Z

It's valuable to know that it's not an issue for a pretty significant use case. Perhaps z-order functionality can be part of a different PR.

aklink

aa880d1 missing Signed-off-by, but identical to commit a8c4203 with Signed-off-by
may be causing ip-validation failure (maybe removing this commit aa880d1 from history could solve issue, it was accidentally pushed)

aklink

84d45b1 missing Signed-off-by, but no changes

We expect that for large layers a geometry will intersect a small fraction of the tiles. In such a case keeping a set of keys is more efficient than a bitmap. For small layers it doesn't matter which choice we make. If this ever fails the next step is a BloomFilter instead of a Set.


echeipesh added this to the 1.2 milestone Jun 28, 2017

echeipesh added in progress needs review and removed in progress labels Jun 28, 2017

aklink mentioned this pull request Jul 6, 2017

TODO: Convert shapefile to raster biggis-project/biggis-landuse#4

Closed

jbouffard suggested changes Jul 6, 2017

jbouffard reviewed Jul 6, 2017

aklink reviewed Jul 7, 2017

aklink mentioned this pull request Jul 10, 2017

Rasterize RDD[Geometry] or RDD[Feature[G,D]] #2168

Closed

aklink added a commit to biggis-project/biggis-landuse that referenced this pull request Jul 10, 2017

add experimental RasterizeFeaturesRDD.fromFeature see: locationtech/g…

dd18520

…eotrellis#2266

aklink mentioned this pull request Jul 10, 2017

WIP: RasterizeFeaturesRDD for Features echeipesh/geotrellis#10

Merged

aklink reviewed Aug 8, 2017

jamesmcclain pushed a commit to jamesmcclain/geotrellis that referenced this pull request Aug 17, 2017

add geotrellis.spark.rasterize.RasterizeFeaturesRDD.fromFeature based…

242d575

… on RasterizeRDD PR locationtech#2266 Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>

This was referenced Aug 18, 2017

4-Connected Line Drawing #2336

Merged

Convert Lines to Polygons, Add Z-Buffering echeipesh/geotrellis#11

Merged

echeipesh removed the needs review label Aug 31, 2017

echeipesh added 2 commits August 30, 2017 22:49

Add RasterizeRDD forGeometry

273e6b4

RasterizeRDD.fromGeometry is generic on Geometry

d9631c8

This compensates for RDD being invariant on its type parameter

echeipesh and others added 24 commits August 30, 2017 22:49

RasterizeRDD.fromGeometry lines test

096d52e

RasterizeRDD.fromGeometry polygon test

812b5c5

rasterizeWithValue method extension for RDD[G<: Geometry]

fda3f99

Optimize RasterizeRDD API and implemintation (+1 squashed commit)

3aed6de

Squashed commits: [f449338] Add scaladocs for rdd.rasterizeWithValue (+1 squashed commit) Squashed commits: [a42724a] fix: Pass through Rasterizer.Options correctly

add geotrellis.spark.rasterize.RasterizeFeaturesRDD.fromFeature based…

5e30290

… on RasterizeRDD PR locationtech#2266 Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>

WIP: change features RDD[(G,D)] to RDD[Features[G,D]]

908b116

Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>

WIP: change features RDD[Feature[G,D]] to RDD[Feature[G,Double]]

4a27ed7

Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>

add FeatureRDDRasterizeMethods extends MethodExtensions[RDD[Feature[G…

67645b8

…,Double]]] Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>

add Implicits for FeatureRDDRasterizeMethods[G]

065996a

Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>

Revert "add Implicits for FeatureRDDRasterizeMethods[G]"

9e38157

This reverts commit ea5f593.

Unify Feature and Geometry Rasterizers

c845101

Convert Lines and MultiLines to Polygons

72d9208

The line rasterizer converts edge endpoints to integers, whereas the polygon rasterizer preserves the precision of vertices. The loss of precision caused by conversion to integers makes the line rasterizer inappropriate for this application.

Carry Priority Informaton

fa06544

Add Z-Buffer Capability

c5eb313

Add Z-Buffer Unit Tests

ed068da

Rename FeatureInfo to CellValue and move it to raster package

243a50e

This will allow it to be re-used in the per-tile rasterizer

Use iterator when rasterizing lines

6875a33

Factor out keyGeomToLayout

bf6e22d

This can be re-used and tested when outside of line function

Refactor: remove usePriority argument

769a73d

Update RasterizeRDD Methods

3698a39

Update RasterizeRDD docs

2a374f4

cell treat priority as an integeger

9e04bd5

Update RasterizeRDD tests

4d70f62

Signed-off-by: Eugene Cheipesh <echeipesh@gmail.com>

echeipesh force-pushed the feature/rasterize-rdd branch from c678999 to 4d70f62 Compare August 31, 2017 02:56

echeipesh merged commit 5c5ebed into locationtech:master Aug 31, 2017
