
Spatial Filtering and Predicate Pushdown via JTS #46

Merged Feb 20, 2018: 45 commits merged into develop from feature/jts

Conversation

@metasim (Contributor) commented Jan 12, 2018

This branch adds spatial filtering based on JTS types via the upcoming geomesa-spark-jts library. It currently depends on a custom build of an in-progress branch of GeoMesa, so release of this feature should probably wait until that work stabilizes.
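
For context, the kind of query this enables looks roughly like the following. This is an illustrative sketch based on the DSL exercised later in this thread, not the final API; the catalog/layer setup is elided, and Layer, EXTENT_COLUMN, and the intersects predicate are assumed from the geotrellis datasource and the JTS support added in this branch.

import astraea.spark.rasterframes._
import astraea.spark.rasterframes.datasource.geotrellis._
import geotrellis.vector.Point

// Layer reference into a GeoTrellis catalog; details elided here.
val layer: Layer = ???

// A spatial predicate against the tile extent column. With pushdown, the
// filter is translated into the GeoTrellis layer query so only intersecting
// tiles are read, rather than filtering after a full scan.
val rf = spark.read.geotrellis
    .loadRF(layer)
    .where(EXTENT_COLUMN intersects Point(-75.0, 35.0))

rf.count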

codecov bot commented Jan 12, 2018

Codecov Report

Merging #46 into develop will decrease coverage by 3.11%.
The diff coverage is 78%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #46      +/-   ##
===========================================
- Coverage    86.74%   83.63%   -3.12%     
===========================================
  Files           46       68      +22     
  Lines          981     1332     +351     
  Branches        58       70      +12     
===========================================
+ Hits           851     1114     +263     
- Misses         130      218      +88
Impacted Files Coverage Δ
...rames/functions/LocalTileOpAggregateFunction.scala 100% <ø> (ø) ⬆️
...frames/functions/LocalStatsAggregateFunction.scala 100% <ø> (ø) ⬆️
.../scala/org/apache/spark/sql/rf/KryoBackedUDT.scala 100% <ø> (ø)
...rframes/functions/HistogramAggregateFunction.scala 100% <ø> (ø) ⬆️
...rframes/functions/CellStatsAggregateFunction.scala 100% <ø> (ø) ⬆️
...a/org/apache/spark/sql/gt/types/HistogramUDT.scala 80% <ø> (ø) ⬆️
...frames/functions/LocalCountAggregateFunction.scala 100% <ø> (ø) ⬆️
...park/rasterframes/extensions/MetadataMethods.scala 75% <ø> (ø)
...rframes/functions/CellCountAggregateFunction.scala 93.75% <ø> (-6.25%) ⬇️
.../astraea/spark/rasterframes/encoders/package.scala 100% <ø> (+14.28%) ⬆️
... and 90 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3abae6e...47369c1.

@vpipkt (Contributor) commented Feb 9, 2018

Runtime error found when chaining a spatial filter with a temporal filter:

Code:

import java.time.ZonedDateTime
import astraea.spark.rasterframes._
import astraea.spark.rasterframes.datasource.geotrellis._

val layer: Layer = ???
val startTime: ZonedDateTime = ???
val endTime: ZonedDateTime = ???

// Spatial filter, then temporal filter, on the same layer read.
val rf_GeoThenTime = spark.read.geotrellis
    .loadRF(layer)
    .where(EXTENT_COLUMN intersects geotrellis.vector.Point(-75.0, 35.0))
    .where(TIMESTAMP_COLUMN betweenTimes(startTime, endTime))

rf_GeoThenTime.count

Stack trace:

scala.MatchError: GreaterThanOrEqual(timestamp,2017-07-01 00:00:00.0) (of class org.apache.spark.sql.sources.GreaterThanOrEqual)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation.applyFilter(GeoTrellisRelation.scala:185)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation.applyFilterTemporal(GeoTrellisRelation.scala:208)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation$$anonfun$query$2$$anonfun$10.apply(GeoTrellisRelation.scala:274)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation$$anonfun$query$2$$anonfun$10.apply(GeoTrellisRelation.scala:274)
  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation$$anonfun$query$2.apply(GeoTrellisRelation.scala:274)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation$$anonfun$query$2.apply(GeoTrellisRelation.scala:269)
  at scala.util.Either.fold(Either.scala:99)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation.query(GeoTrellisRelation.scala:239)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation.buildScan(GeoTrellisRelation.scala:233)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$12.apply(DataSourceStrategy.scala:293)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$12.apply(DataSourceStrategy.scala:293)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:330)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:329)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:421)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:325)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
  at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
  at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2837)
  at org.apache.spark.sql.Dataset.count(Dataset.scala:2434)
  ... 102 elided
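
The MatchError points at the temporal half of the pushdown: GeoTrellisRelation.applyFilter apparently has no case for the GreaterThanOrEqual / LessThanOrEqual filters Spark generates when betweenTimes is split into two bounds. Below is a minimal sketch of the kind of handling that seems to be missing; it is illustrative only, not the actual RasterFrames code, and toZDT / narrow are hypothetical helpers.

import java.sql.Timestamp
import java.time.{ZoneOffset, ZonedDateTime}
import org.apache.spark.sql.sources._

// Spark hands timestamp filter values over as java.sql.Timestamp;
// the GeoTrellis temporal query wants ZonedDateTime.
def toZDT(value: Any): ZonedDateTime = value match {
  case ts: Timestamp      => ts.toInstant.atZone(ZoneOffset.UTC)
  case zdt: ZonedDateTime => zdt
}

// Narrow an initial (start, end) interval with each pushed-down temporal filter.
// Strict inequalities are treated as inclusive bounds here for simplicity.
def narrow(interval: (ZonedDateTime, ZonedDateTime), f: Filter): (ZonedDateTime, ZonedDateTime) = {
  val (start, end) = interval
  f match {
    case GreaterThanOrEqual(_, v) => (toZDT(v), end)
    case GreaterThan(_, v)        => (toZDT(v), end)
    case LessThanOrEqual(_, v)    => (start, toZDT(v))
    case LessThan(_, v)           => (start, toZDT(v))
    case EqualTo(_, v)            => (toZDT(v), toZDT(v))
    case _                        => interval // leave anything unrecognized to Spark
  }
}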

metasim and others added 18 commits February 20, 2018 15:51
NB: Currently drops the data component until a generic way of capturing
the data without a pre-defined schema is determined.
Signed-off-by: Simeon H.K. Fitch <fitch@astraea.io>
Fixes #53.

Signed-off-by: Simeon H.K. Fitch <fitch@astraea.io>
@metasim metasim changed the title [WIP] Spatial Filtering and Predicate Pushdown via JTS Spatial Filtering and Predicate Pushdown via JTS Feb 20, 2018
@metasim metasim merged commit ee7e134 into develop Feb 20, 2018
@metasim metasim deleted the feature/jts branch March 16, 2018 00:52