Repartition in ETL when re-tiling increases layer resolution #2135

Merged
lossyrob merged 8 commits into locationtech:master from echeipesh:fix/upsampling-etl-retile on Apr 20, 2017

3 participants
@echeipesh
Contributor

echeipesh commented Apr 11, 2017

During the ETL process there are two ways in which the job specification can cause an increase in resolution:

  • When the maxZoom parameter forces a higher-than-necessary zoom level with ZoomedLayoutScheme
  • When a LayoutDefinition is provided that has a higher resolution than the source imagery
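To see how the first case arises, here is a minimal sketch of the cell size at each level of a Web Mercator zoom pyramid. It is a simplified model, not ZoomedLayoutScheme's actual code: it assumes 256-pixel tiles and the standard EPSG:3857 world width, and the names `cellSizeAt` and `upsamples` are hypothetical. Any requested maxZoom whose cell size is smaller than the source cell size means the layer is being upsampled.

```scala
object ZoomResolution {
  // Width of the Web Mercator (EPSG:3857) world extent, in meters.
  val WorldWidth: Double = 2 * 20037508.342789244
  // Assumed tile size in pixels (hypothetical default for this sketch).
  val TileSize: Int = 256

  /** Cell (pixel) width in meters at zoom level `zoom`: the world is 2^zoom tiles wide. */
  def cellSizeAt(zoom: Int): Double =
    WorldWidth / (TileSize.toDouble * math.pow(2, zoom))

  /** True when tiling to `maxZoom` yields finer pixels than the source imagery. */
  def upsamples(sourceCellSize: Double, maxZoom: Int): Boolean =
    cellSizeAt(maxZoom) < sourceCellSize
}
```

For example, 30 m imagery tiled to zoom 13 (about 19.1 m per pixel) is upsampled, while zoom 12 (about 38.2 m per pixel) is not.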

This is possible with both the per-tile reproject and buffered reproject methods.

If the increase in resolution is significant, it will dramatically increase the size of the partitions, which were originally sized to the source imagery. Past a certain point the partitions become too large to be processed.
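As a rough worked estimate, assuming pixel count is what drives partition size: if the source cell size is c_s and the target cell size is c_t < c_s, the number of pixels N in a partition grows with the square of the resolution ratio.

```latex
\frac{N_t}{N_s} = \left(\frac{c_s}{c_t}\right)^{2}
```

For instance, tiling 30 m imagery to a Web Mercator zoom 13 cell size of about 19.1 m multiplies the pixel count by roughly (30 / 19.1)^2 ≈ 2.5, and every further zoom level multiplies it by another factor of 4.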

For per-tile reproject, this PR handles the logic in the geotrellis.spark.etl.Etl class by comparing the resolution of the pre-tiled metadata with that of the post-tiled metadata.
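A minimal sketch of the per-tile decision, assuming it comes down to comparing cell sizes from the metadata before and after tiling. `CellSize` here is a simplified stand-in case class and `retilePartitions` is a hypothetical helper, not the `Etl` API. The partition count is scaled by the ratio of pixel areas and is never decreased.

```scala
final case class CellSize(width: Double, height: Double)

object RetilePartitions {
  /**
   * Partition count to use after tiling. When the target cell is smaller than the
   * source cell (upsampling), scale the count by the ratio of pixel areas;
   * otherwise keep the original count.
   */
  def retilePartitions(partitions: Int, source: CellSize, target: CellSize): Int = {
    val areaRatio = (source.width * source.height) / (target.width * target.height)
    if (areaRatio > 1.0) math.ceil(partitions * areaRatio).toInt
    else partitions
  }
}
```

In Spark, the resulting count could then be passed to `rdd.repartition(n)` before the tiling step.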

However, in buffered reproject the check must happen after the reprojection has taken place. Resolutions in different CRSs are not comparable, so we must instead compare the number of tiles covered by the KeyBounds, assuming that the tiles themselves remain roughly constant in size.
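The buffered-reproject comparison can be sketched by counting the tiles covered by the key bounds before and after reprojection, under the same assumption that individual tiles stay roughly the same size. `SpatialKey` and `KeyBounds` below are simplified stand-ins for the GeoTrellis types of the same names, and `bufferedPartitions` is hypothetical.

```scala
final case class SpatialKey(col: Int, row: Int)

final case class KeyBounds(minKey: SpatialKey, maxKey: SpatialKey) {
  /** Number of tiles in the inclusive grid rectangle. */
  def tileCount: Long =
    (maxKey.col - minKey.col + 1).toLong * (maxKey.row - minKey.row + 1).toLong
}

object BufferedRetile {
  /**
   * Scale partitions by the growth in tile count. Counting tiles sidesteps the fact
   * that cell sizes in different CRSs cannot be compared directly.
   */
  def bufferedPartitions(partitions: Int, before: KeyBounds, after: KeyBounds): Int = {
    val growth = after.tileCount.toDouble / before.tileCount.toDouble
    if (growth > 1.0) math.ceil(partitions * growth).toInt else partitions
  }
}
```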

Incidental Fixes

  • Render module was not callable due to an incorrect name specification
  • Logging added to CutTiles for cases where a single resample may run out of memory (OOM)
  • Etl must return maxZoom as the zoom level when it is specified
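The last fix amounts to preferring the requested zoom over the computed one. A minimal sketch, where `resultZoom` and its parameters are hypothetical names rather than the actual `Etl` signature:

```scala
object EtlZoom {
  /**
   * Zoom level reported for the ingested layer: the user's maxZoom when given,
   * otherwise the zoom the layout scheme selected from the source resolution.
   */
  def resultZoom(maxZoom: Option[Int], computedZoom: Int): Int =
    maxZoom.getOrElse(computedZoom)
}
```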

Resolves: #2129

@echeipesh

Contributor

echeipesh commented Apr 11, 2017

Buffered reproject ingest [image: buffered-ingest]

echeipesh added some commits Apr 11, 2017

Bounds convertible to Option[KeyBounds[K]]
Signed-off-by: Eugene Cheipesh <echeipesh@gmail.com>
TiledRDDReproject will increase the number of partitions if upsampling
Signed-off-by: Eugene Cheipesh <echeipesh@gmail.com>
ETL Output enforces maxZoom exclusivity with FloatingLayoutScheme
Signed-off-by: Eugene Cheipesh <echeipesh@gmail.com>
ETL tile function adjusts partition count when upsampling is happening
Signed-off-by: Eugene Cheipesh <echeipesh@gmail.com>
Fix: SpatialRender is callable using hadoop output module
Signed-off-by: Eugene Cheipesh <echeipesh@gmail.com>
fix: use maxZoom as zoom level when specified
Signed-off-by: Eugene Cheipesh <echeipesh@gmail.com>
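The first commit makes `Bounds` convertible to `Option[KeyBounds[K]]`. Assuming a simplified version of the `Bounds` hierarchy (an empty case plus a key range), the conversion can be sketched as follows; `toOption` and `boundsToOption` are illustrative names, not necessarily how the commit implements it.

```scala
import scala.language.implicitConversions

sealed trait Bounds[+K]
case object EmptyBounds extends Bounds[Nothing]
final case class KeyBounds[+K](minKey: K, maxKey: K) extends Bounds[K]

object Bounds {
  /** Some(keyBounds) when the bounds are non-empty, None otherwise. */
  def toOption[K](b: Bounds[K]): Option[KeyBounds[K]] = b match {
    case KeyBounds(mn, mx) => Some(KeyBounds(mn, mx))
    case EmptyBounds       => None
  }

  /** Implicit view so a Bounds can be passed wherever an Option[KeyBounds] is expected. */
  implicit def boundsToOption[K](b: Bounds[K]): Option[KeyBounds[K]] = toOption(b)
}
```

Modeling the empty case as `None` lets callers use the usual `Option` combinators (`map`, `getOrElse`) instead of matching on `EmptyBounds` everywhere.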

echeipesh force-pushed the echeipesh:fix/upsampling-etl-retile branch from c07af03 to 63a68c4 on Apr 11, 2017

import org.apache.spark.rdd._
import scala.reflect.ClassTag
object CutTiles {
  @transient private lazy val logger = LazyLogging(this)

@pomadchin

pomadchin Apr 11, 2017

Member

Why don't you want to extend LazyLogging? API compatibility? It looks ugly, and it seems like something we could forget about ):

@echeipesh

echeipesh Apr 19, 2017

Contributor

Yeah, the intent was to avoid adding types to classes when all we want is this private field. I think this should be the preferred approach.

@@ -29,3 +29,9 @@ trait LazyLogging {
  @transient protected lazy val logger: Logger =
    Logger(LoggerFactory.getLogger(getClass.getName))

@pomadchin

pomadchin Apr 11, 2017

Member

If you are sure about this approach, LazyLogging(this)
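The pattern discussed in this thread, a private logger field obtained through `LazyLogging(this)` rather than by mixing in the trait, can be sketched with a companion `apply`. To stay dependency-free, this sketch swaps in `java.util.logging.Logger` for the slf4j-backed `Logger` used in the diff, and `CutTilesSketch` is a hypothetical stand-in for `CutTiles`.

```scala
import java.util.logging.Logger

trait LazyLogging {
  // Mix-in style: every class extending the trait gains a `logger` member.
  @transient protected lazy val logger: Logger = Logger.getLogger(getClass.getName)
}

object LazyLogging {
  // Field style: build a logger named after any object without changing its type.
  def apply(x: AnyRef): Logger = Logger.getLogger(x.getClass.getName)
}

object CutTilesSketch {
  @transient private lazy val logger = LazyLogging(this)

  def loggerName: String = logger.getName
}
```

The field style leaves the object's public type unchanged, which was the stated intent; the trade-off raised in review is that each object has to remember to declare the field itself.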

lossyrob added this to the 1.1 milestone Apr 19, 2017

echeipesh removed the in progress label Apr 19, 2017

lossyrob merged commit 0799a69 into locationtech:master Apr 20, 2017

1 of 2 checks passed

ip-validation The pull request did not pass Eclipse validation. The following users have invalid Signed-off-by footers: echeipesh@gmail.com
continuous-integration/travis-ci/pr The Travis CI build passed