RasterizeRDD for Geometry #2266
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
  val intValue = implicitly[Numeric[T]].toInt(value)
  val dblValue = implicitly[Numeric[T]].toDouble(value)
  val options = Options(includePartial=true, sampleType=PixelIsArea)
Is there a certain reason why you defined `options` in this method when it's already being passed in as a parameter (line 44)?
Nope, thank you for the catch.
import org.apache.spark.rdd._
import scala.collection.immutable.VectorBuilder

object RasterizeRDD {
Do you think it'd be worth it to have another overload method or two that doesn't require `options`?
def fromGeometry[G <: Geometry, T: Numeric](
  geoms: RDD[G],
  layout: LayoutDefinition,
  ct: CellType,
  value: T
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
  fromGeometry(geoms, layout, ct, value, new HashPartitioner(geoms.sparkContext.defaultParallelism), Options.DEFAULT)
}

def fromGeometry[G <: Geometry, T: Numeric](
  geoms: RDD[G],
  layout: LayoutDefinition,
  ct: CellType,
  value: T,
  numPartitions: Int
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
  fromGeometry(geoms, layout, ct, value, new HashPartitioner(numPartitions), Options.DEFAULT)
}
That's a good point. Thank you for taking the time to type them up.
RasterizeRDD Method Extensions
When creating method extensions for `RasterizeRDD.fromGeometry` we have to consider optional arguments.
def fromGeometry[G <: Geometry, T: Numeric](
  geoms: RDD[G],
  value: T,
  layout: LayoutDefinition,
  ct: CellType,
  partitioner: Partitioner,
  options: Rasterizer.Options
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = ???
There are two or three optional arguments:
def fromGeometry[G <: Geometry, T: Numeric](
  geoms: RDD[G],
  value: T,
  layout: LayoutDefinition,
  ct: CellType = IntConstantNoDataCellType,
  partitioner: Partitioner = new HashPartitioner(geoms.getNumPartitions),
  options: Rasterizer.Options = Rasterizer.Options.DEFAULT
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = ???
An existing pattern is to create an options object:
object RasterizeRDD {
  case class Options(
    rasterizerOptions: Rasterizer.Options = Rasterizer.Options.DEFAULT,
    partitioner: Option[Partitioner] = None
  )
  object Options {
    val DEFAULT = Options()
  }
}
trait GeometryRDDRasterizeMethods[G <: Geometry] extends MethodExtensions[RDD[G]] {
  def rasterizeWithValue[T: Numeric](
    value: T,
    cellType: CellType,
    options: RasterizeRDD.Options = RasterizeRDD.Options.DEFAULT
  ): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = ???
}
val geoms: RDD[Polygon] = ???
geoms.rasterizeWithValue(1, IntConstantNoDataCellType,
  RasterizeRDD.Options(
    partitioner = Some(new HashPartitioner(geoms.getNumPartitions)),
    rasterizerOptions = Rasterizer.Options(true, PixelIsArea)
  )
)
This is all well and good but it has the following two drawbacks:
The main problem is that the true defaults in this case depend on parameters. Scala handles that case with multiple parameter lists.
Proposal
object RasterizeRDD {
  case class Options(
    rasterizerOptions: Rasterizer.Options = Rasterizer.Options.DEFAULT,
    partitioner: Partitioner
  )
}
trait GeometryRDDRasterizeMethods[G <: Geometry] extends MethodExtensions[RDD[G]] {
  def rasterizeWithValue[T: Numeric](
    value: T
  )(
    cellType: CellType = narrowestCellType(value),
    includePartial: Boolean = true,
    sampleType: PixelSampleType = PixelIsPoint,
    partitioner: Partitioner = new HashPartitioner(self.getNumPartitions)
  ): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
    val rasterizerOptions = Rasterizer.Options(includePartial, sampleType)
    val options = RasterizeRDD.Options(rasterizerOptions, partitioner)
    ??? // make the call
  }
}
val geoms: RDD[Polygon] = ???
// the second parameter list still needs an (empty) argument list when all defaults are used
geoms.rasterizeWithValue(1)()
geoms.rasterizeWithValue(4)(cellType = ByteCellType)
geoms.rasterizeWithValue(4)(
  cellType = ByteCellType,
  sampleType = PixelIsArea)
What we get for the trouble:
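The mechanism the proposal relies on, a default computed from an argument in an earlier parameter list, can be tried in plain Scala without Spark or GeoTrellis. The following is only a sketch under assumptions: `CellKind`, `narrowestKind`, `Request` and the `Int` partition count are hypothetical stand-ins, not GeoTrellis API.

```scala
// Hypothetical stand-ins only; none of these names exist in GeoTrellis.
object DependentDefaultsSketch {
  sealed trait CellKind
  case object ByteKind extends CellKind
  case object IntKind extends CellKind

  // Assumed helper: picks the smallest kind that can hold the value.
  def narrowestKind(value: Int): CellKind =
    if (value >= Byte.MinValue && value <= Byte.MaxValue) ByteKind else IntKind

  final case class Request(value: Int, kind: CellKind, includePartial: Boolean, partitions: Int)

  // The default for `kind` is an expression over `value`. That is legal because
  // `value` sits in an earlier parameter list; a default cannot refer to a
  // parameter declared in the same list.
  def rasterizeWithValue(value: Int)(
    kind: CellKind = narrowestKind(value),
    includePartial: Boolean = true,
    partitions: Int = 4
  ): Request = Request(value, kind, includePartial, partitions)

  // All defaults: the second list is still applied, with empty parentheses.
  val allDefaults: Request = rasterizeWithValue(1)()
  // The default follows the argument: 1000 does not fit in a byte.
  val bigValue: Request = rasterizeWithValue(1000)()
  // Named arguments override only what the caller cares about.
  val overridden: Request = rasterizeWithValue(4)(kind = IntKind, partitions = 8)
}
```

The flat named arguments replace the nested options object at the call site, at the cost of the trailing `()` when nothing is overridden.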
I think the proposal looks good. I never liked how nested defining options could become in GeoTrellis. Do you think it'd be worth having more overloads just in case someone already has
Not sure that is possible with this pattern. You may only have one instance of a method name with default arguments, which means you couldn't define And it wouldn't make sense to put it into the same method signature because you'd have to choose if the rest parameters trump the contents of the I suppose with this pattern if somebody needed to use the
Ah, I wasn't aware of that. I think the current implementation is fine then. It might be worth noting somewhere in the docs that
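The restriction mentioned above (only one overloaded alternative of a method name may declare default arguments) can be illustrated with a small, self-contained sketch. The names `Options`, `Call` and `rasterize` here are hypothetical and only mirror the shape of the discussion; this is not the code from the PR.

```scala
// Hypothetical sketch; not the GeoTrellis API.
object OverloadSketch {
  final case class Options(includePartial: Boolean = true, partitions: Int = 4)
  final case class Call(value: Int, options: Options)

  // The single alternative that is allowed to carry default arguments.
  def rasterize(value: Int)(includePartial: Boolean = true, partitions: Int = 4): Call =
    rasterize(value, Options(includePartial, partitions))

  // A second overload compiles as long as it declares no defaults of its own.
  // Writing `options: Options = Options()` here would be rejected, because two
  // overloaded alternatives of `rasterize` would then define default arguments.
  def rasterize(value: Int, options: Options): Call = Call(value, options)

  val viaDefaults: Call = rasterize(1)()
  val viaOptions: Call = rasterize(1, Options(includePartial = false, partitions = 8))
}
```

So an overload that accepts a ready-made options object is possible, but it has to take that object as a required argument.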
tested with ShapeFileReader.readSimpleFeatures(...) using MultiPolygons, works fine for me
Edited:
I am using
val rasterizedRDD : RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = RasterizeRDD.fromGeometry( polyRdd, 1, IntConstantNoDataCellType, ld, Options.DEFAULT)
What is currently not working on my side:
val rasterizedRdd = polyRdd.rasterizeWithValue(1, IntConstantNoDataCellType, ld)
Error:(112, 33) value rasterizeWithValue is not a member of org.apache.spark.rdd.RDD[geotrellis.vector.MultiPolygon]
val rasterizedRdd = polyRdd.rasterizeWithValue(1, IntConstantNoDataCellType, ld)
Maybe a missing import statement on my side, or a wrong usage of types in combination with:
ShapeFileReader.readSimpleFeatures(shapefileName)
I also didn't manage to pass different rasterization values when the geometries in the RDD do not all belong to the same "class" (using the raster value as the object class, specified by a field/column in the shapefile); only one value per RDD is allowed. But maybe this is not intended to be supported by RasterizeRDD. If all geometries have the same value, it works fine.
I have one suggestion for how RasterizeRDD could be extended to support one value per geometry instead of one value per RDD:
package geotrellis.spark.rasterize
import geotrellis.raster._
import geotrellis.raster.rasterize._
import geotrellis.spark._
import geotrellis.spark.tiling._
import geotrellis.vector._
import org.apache.spark.rdd._
import org.apache.spark.{HashPartitioner, Partitioner}
object RasterizeFeaturesRDD {
  /**
   * Rasterize an RDD of Geometry objects into a tiled raster RDD.
   * Cells not intersecting any geometry will be left as NODATA.
   * Value will be converted to type matching specified [[CellType]].
   *
   * @param features Cell values for cells intersecting a feature consisting of (geometry,value)
   * @param layout Raster layer layout for the result of rasterization
   * @param cellType [[CellType]] for creating raster tiles
   * @param options Rasterizer options for cell intersection rules
   * @param partitioner Partitioner for result RDD
   */
  def fromFeature[G <: Geometry, D <: Double](
    features: RDD[(G, D)],
    cellType: CellType,
    layout: LayoutDefinition,
    options: Rasterizer.Options = Rasterizer.Options.DEFAULT,
    partitioner: Option[Partitioner] = None
  ): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
    val layoutRasterExtent = RasterExtent(layout.extent, layout.layoutCols, layout.layoutRows)
    val layoutRasterizerOptions = Rasterizer.Options(includePartial=true, sampleType=PixelIsArea)

    /** Key geometry by spatial keys of intersecting tiles */
    def keyGeom(feature: (Geometry, Double)): Iterator[(SpatialKey, ((Geometry, Double), SpatialKey))] = {
      var keySet = Set.empty[SpatialKey]
      feature._1.foreach(layoutRasterExtent, layoutRasterizerOptions){ (col, row) =>
        keySet = keySet + SpatialKey(col, row)
      }
      keySet.toIterator.map { key => (key, (feature, key)) }
    }

    // key the geometry to intersecting tiles so it can be rasterized in the map-side combine
    val keyed: RDD[(SpatialKey, ((Geometry, Double), SpatialKey))] =
      features.flatMap { feature => keyGeom(feature) }

    // start a new tile for a key and burn the first geometry into it
    val createTile = (tup: ((Geometry, Double), SpatialKey)) => {
      val ((geom,value), key) = tup
      val tile = ArrayTile.empty(cellType, layout.tileCols, layout.tileRows)
      val re = RasterExtent(layout.mapTransform(key), layout.tileCols, layout.tileRows)
      geom.foreach(re, options){ tile.setDouble(_, _, value) }
      tile: MutableArrayTile
    }

    // burn a further geometry into a tile that already exists for the key
    val updateTile = (tile: MutableArrayTile, tup: ((Geometry, Double), SpatialKey)) => {
      val ((geom,value), key) = tup
      val re = RasterExtent(layout.mapTransform(key), layout.tileCols, layout.tileRows)
      geom.foreach(re, options){ tile.setDouble(_, _, value) }
      tile: MutableArrayTile
    }

    // combine tiles for the same key that were built on different partitions
    val mergeTiles = (t1: MutableArrayTile, t2: MutableArrayTile) => {
      t1.merge(t2).mutable
    }

    val tiles = keyed.combineByKeyWithClassTag[MutableArrayTile](
      createCombiner = createTile,
      mergeValue = updateTile,
      mergeCombiners = mergeTiles,
      partitioner.getOrElse(new HashPartitioner(features.getNumPartitions))
    )

    ContextRDD(tiles.asInstanceOf[RDD[(SpatialKey, Tile)]], layout)
  }
}
This works perfectly for me when using:
val shapefileName = ???
val attribName = ???
val layout = ???
val extent = ???
val ld = LayoutDefinition(extent, layout)
val multipolygonsWithValue = ShapeFileReader.readSimpleFeatures(shapefileName)
  .filter { feat => "MultiPolygon" != feat.getFeatureType.getGeometryDescriptor.getType.getName }
  .map { feat => (MultiPolygon.jts2MultiPolygon(feat.geom[jts.MultiPolygon].get), feat.attribute(attribName)) }
  .map { case (mp: MultiPolygon, value: Long) => Feature(mp, value.toDouble) }
val multipolygonsRdd = sc.parallelize(multipolygonsWithValue)
val rasterizedRDD: RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = RasterizeFeaturesRDD
  .fromFeature(multipolygonsRdd, IntConstantNoDataCellType, ld, Options.DEFAULT)
I should have taken #2168 into account to distinguish between fromGeometry and fromFeature, but with some modifications the above proposal could be adjusted to fit this, I hope. Edit: applied change above
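The three functions handed to `combineByKeyWithClassTag` in the suggestion above (create a tile, burn into an existing tile, merge two tiles) can be followed on a local collection. The sketch below is an assumption-laden stand-in, not Spark or GeoTrellis code: a "tile" is a four-cell `Vector[Option[Double]]`, a "geometry" is reduced to the single cell index it covers, partitions are plain `Seq`s, and `mergeTiles` is assumed to keep the first tile's data cells and fill its empty cells from the second.

```scala
// Local simulation with hypothetical stand-in types; no Spark or GeoTrellis involved.
object CombineSketch {
  type Key = (Int, Int)            // stand-in for SpatialKey
  type Tile = Vector[Option[Double]] // None plays the role of NODATA
  type Burn = (Int, Double)        // (cell index covered, value to write)

  val emptyTile: Tile = Vector.fill(4)(None)

  // createCombiner: first record seen for a key on a partition
  val createTile: Burn => Tile = b => emptyTile.updated(b._1, Some(b._2))
  // mergeValue: further records for that key on the same partition
  val updateTile: (Tile, Burn) => Tile = (t, b) => t.updated(b._1, Some(b._2))
  // mergeCombiners: tiles for one key built on different partitions
  val mergeTiles: (Tile, Tile) => Tile =
    (t1, t2) => t1.zip(t2).map { case (a, b) => a.orElse(b) }

  // Minimal local model of combineByKey: fold each partition, then merge the results.
  def combineByKey[K, V, C](
    partitions: Seq[Seq[(K, V)]],
    createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C
  ): Map[K, C] = {
    val perPartition = partitions.map { part =>
      part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.updated(k, acc.get(k).map(c => mergeValue(c, v)).getOrElse(createCombiner(v)))
      }
    }
    perPartition.foldLeft(Map.empty[K, C]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, c)) =>
        a.updated(k, a.get(k).map(prev => mergeCombiners(prev, c)).getOrElse(c))
      }
    }
  }

  val partitions: Seq[Seq[(Key, Burn)]] = Seq(
    Seq(((0, 0), (0, 1.0)), ((0, 0), (1, 2.0)), ((1, 0), (3, 5.0))),
    Seq(((0, 0), (1, 9.0)), ((0, 0), (2, 3.0)))
  )

  val tiles: Map[Key, Tile] = combineByKey(partitions, createTile, updateTile, mergeTiles)
}
```

In this toy run cell 1 of key `(0, 0)` is written with different values on the two partitions, and which one survives depends on the merge order, which is the kind of ordering question that comes up once features carry different values.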
Another option would be using
def fromFeature[G <: Geometry, T : Numeric](
  features: RDD[(G,T)],
  cellType: CellType,
  layout: LayoutDefinition,
  options: Rasterizer.Options = Rasterizer.Options.DEFAULT,
  partitioner: Option[Partitioner] = None
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = ???
then use
@aklink thank you for testing and poking at this! The The other point is that after looking at the code there doesn't seem to be any point in using From that perspective What are your thoughts on z-order when rasterizing features with different values?
I have changed to features: RDD[Feature[G,Double]]
It's valuable to know that it's not an issue for a pretty significant use case. Perhaps z-order functionality can be part of a different PR.
84d45b1 missing Signed-off-by, but no changes
… on RasterizeRDD PR locationtech#2266 Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>
This compensates for RDD being invariant on its type parameter
Squashed commits: [f449338] Add scaladocs for rdd.rasterizeWithValue (+1 squashed commit) Squashed commits: [a42724a] fix: Pass through Rasterizer.Options correctly
Expect that for large layers a geometry will intersect a small fraction of the tiles. In such a case keeping a set of keys is more efficient than a bitmap. For small layers it doesn't matter which choice we make. If this ever fails the next step is a BloomFilter instead of a Set.
… on RasterizeRDD PR locationtech#2266 Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>
Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>
Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>
…,Double]]] Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>
Signed-off-by: Adrian Klink <Adrian.Klink@eftas.com>
This reverts commit ea5f593.
The line rasterizer converts edge endpoints to integers, whereas the polygon rasterizer preserves the precision of vertices. The loss of precision caused by conversion to integers makes the line rasterizer inappropriate for this application.
This will allow it to be re-used in the per-tile rasterizer
This can be re-used and tested when outside of line function
Signed-off-by: Eugene Cheipesh <echeipesh@gmail.com>
c678999
to
4d70f62
Resolves: #2228
Resolves: #2168