
Spatial Filtering and Predicate Pushdown via JTS #46

Merged Feb 20, 2018: 45 commits merged into develop from feature/jts

Conversation

@metasim (Contributor) commented Jan 12, 2018

This branch adds spatial filtering based on JTS types via the upcoming geomesa-spark-jts library. It currently depends on a custom build of an in-progress branch of GeoMesa, so release of this feature should probably wait until that work stabilizes.
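
For context, the kind of query this enables looks roughly like the following. This is an illustrative sketch based on the DSL exercised later in this thread, not the final API; the catalog/layer setup is elided, and Layer, EXTENT_COLUMN, and the intersects predicate are assumed from the geotrellis datasource and the JTS support added in this branch.

import astraea.spark.rasterframes._
import astraea.spark.rasterframes.datasource.geotrellis._
import geotrellis.vector.Point

// Layer reference into a GeoTrellis catalog; details elided here.
val layer: Layer = ???

// A spatial predicate against the tile extent column. With pushdown, the
// filter is translated into the GeoTrellis layer query so only intersecting
// tiles are read, rather than filtering after a full scan.
val rf = spark.read.geotrellis
    .loadRF(layer)
    .where(EXTENT_COLUMN intersects Point(-75.0, 35.0))

rf.count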

codecov bot commented Jan 12, 2018

Codecov Report

Merging #46 into develop will decrease coverage by 3.11%.
The diff coverage is 78%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #46      +/-   ##
===========================================
- Coverage    86.74%   83.63%   -3.12%     
===========================================
  Files           46       68      +22     
  Lines          981     1332     +351     
  Branches        58       70      +12     
===========================================
+ Hits           851     1114     +263     
- Misses         130      218      +88
Impacted Files Coverage Δ
...rames/functions/LocalTileOpAggregateFunction.scala 100% <ø> (ø) ⬆️
...frames/functions/LocalStatsAggregateFunction.scala 100% <ø> (ø) ⬆️
.../scala/org/apache/spark/sql/rf/KryoBackedUDT.scala 100% <ø> (ø)
...rframes/functions/HistogramAggregateFunction.scala 100% <ø> (ø) ⬆️
...rframes/functions/CellStatsAggregateFunction.scala 100% <ø> (ø) ⬆️
...a/org/apache/spark/sql/gt/types/HistogramUDT.scala 80% <ø> (ø) ⬆️
...frames/functions/LocalCountAggregateFunction.scala 100% <ø> (ø) ⬆️
...park/rasterframes/extensions/MetadataMethods.scala 75% <ø> (ø)
...rframes/functions/CellCountAggregateFunction.scala 93.75% <ø> (-6.25%) ⬇️
.../astraea/spark/rasterframes/encoders/package.scala 100% <ø> (+14.28%) ⬆️
... and 90 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3abae6e...47369c1.

@vpipkt (Contributor) commented Feb 9, 2018

Runtime error found when chaining a spatial filter with a temporal filter:

Code:

import java.time.ZonedDateTime
import astraea.spark.rasterframes._
import astraea.spark.rasterframes.datasource.geotrellis._

val layer: Layer = ???
val startTime: ZonedDateTime = ???
val endTime: ZonedDateTime = ???

// Spatial filter, then temporal filter, on the same layer read.
val rf_GeoThenTime = spark.read.geotrellis
    .loadRF(layer)
    .where(EXTENT_COLUMN intersects geotrellis.vector.Point(-75.0, 35.0))
    .where(TIMESTAMP_COLUMN betweenTimes(startTime, endTime))

rf_GeoThenTime.count

Stack trace:

scala.MatchError: GreaterThanOrEqual(timestamp,2017-07-01 00:00:00.0) (of class org.apache.spark.sql.sources.GreaterThanOrEqual)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation.applyFilter(GeoTrellisRelation.scala:185)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation.applyFilterTemporal(GeoTrellisRelation.scala:208)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation$$anonfun$query$2$$anonfun$10.apply(GeoTrellisRelation.scala:274)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation$$anonfun$query$2$$anonfun$10.apply(GeoTrellisRelation.scala:274)
  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation$$anonfun$query$2.apply(GeoTrellisRelation.scala:274)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation$$anonfun$query$2.apply(GeoTrellisRelation.scala:269)
  at scala.util.Either.fold(Either.scala:99)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation.query(GeoTrellisRelation.scala:239)
  at astraea.spark.rasterframes.datasource.geotrellis.GeoTrellisRelation.buildScan(GeoTrellisRelation.scala:233)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$12.apply(DataSourceStrategy.scala:293)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$12.apply(DataSourceStrategy.scala:293)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:330)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:329)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:421)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:325)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
  at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
  at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2837)
  at org.apache.spark.sql.Dataset.count(Dataset.scala:2434)
  ... 102 elided
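
The MatchError points at the temporal half of the pushdown: GeoTrellisRelation.applyFilter apparently has no case for the GreaterThanOrEqual / LessThanOrEqual filters Spark generates when betweenTimes is split into two bounds. Below is a minimal sketch of the kind of handling that seems to be missing; it is illustrative only, not the actual RasterFrames code, and toZDT / narrow are hypothetical helpers.

import java.sql.Timestamp
import java.time.{ZoneOffset, ZonedDateTime}
import org.apache.spark.sql.sources._

// Spark hands timestamp filter values over as java.sql.Timestamp;
// the GeoTrellis temporal query wants ZonedDateTime.
def toZDT(value: Any): ZonedDateTime = value match {
  case ts: Timestamp      => ts.toInstant.atZone(ZoneOffset.UTC)
  case zdt: ZonedDateTime => zdt
}

// Narrow an initial (start, end) interval with each pushed-down temporal filter.
// Strict inequalities are treated as inclusive bounds here for simplicity.
def narrow(interval: (ZonedDateTime, ZonedDateTime), f: Filter): (ZonedDateTime, ZonedDateTime) = {
  val (start, end) = interval
  f match {
    case GreaterThanOrEqual(_, v) => (toZDT(v), end)
    case GreaterThan(_, v)        => (toZDT(v), end)
    case LessThanOrEqual(_, v)    => (start, toZDT(v))
    case LessThan(_, v)           => (start, toZDT(v))
    case EqualTo(_, v)            => (toZDT(v), toZDT(v))
    case _                        => interval // leave anything unrecognized to Spark
  }
}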

metasim and others added 18 commits February 20, 2018 15:51
NB: Currently drops the data component until a generic way of capturing
the data without a pre-defined schema is determined.
Signed-off-by: Simeon H.K. Fitch <fitch@astraea.io>
Fixes #53.

Signed-off-by: Simeon H.K. Fitch <fitch@astraea.io>
@metasim metasim changed the title [WIP] Spatial Filtering and Predicate Pushdown via JTS Spatial Filtering and Predicate Pushdown via JTS Feb 20, 2018
@metasim metasim merged commit ee7e134 into develop Feb 20, 2018
@metasim metasim deleted the feature/jts branch March 16, 2018 00:52