Streaming/Windowed GeoTiff Reading #1905

Merged
merged 90 commits into from Mar 10, 2017

4 participants
@echeipesh
Contributor

echeipesh commented Dec 12, 2016

The main culprit here is StreamingSegmentBytes. To do its job it needs to know ahead of time which segments you will read. It can then use the TiffTags to group them into contiguous chunks that can be fetched in on-disk order. This requires a slight inversion of control between segmentBytes and GeoTiffTile: we let the segment bytes decide the order in which we see the segments. Overall it looks like this:

cfor(0)(_ < segmentCount, _ + 1) { segmentIndex =>
   val segment = getSegment(segmentIndex).bytes
   val segmentSize = segment.size
   val bandSegmentCount = segmentSize / bandCount
   val bandSegment = Array.ofDim[Byte](bandSegmentCount)
...

is replaced by

getSegments(0 until segmentCount).foreach { case (segmentIndex, geoTiffSegment) =>
    val segment = geoTiffSegment.bytes
    val segmentSize = segment.size
    val bandSegmentCount = segmentSize / bandCount
    val bandSegment = Array.ofDim[Byte](bandSegmentCount)
...

Because getSegments returns an iterator, we can zip two of them and use the result for combine as well:

getSegments(0 until segmentCount)
  .zip(otherGeoTiff.getSegments(0 until segmentCount))
  .foreach { case ((segmentIndex, segment), (otherIndex, otherSegment)) =>
    require(segmentIndex == otherIndex, s"Segment index mismatch: $segmentIndex != $otherIndex")
    val newBytes = segment.mapWithIndex { (i, z) =>
      f(z, otherSegment.getInt(i))
    }
    arr(segmentIndex) = compressor.compress(newBytes, segmentIndex)
  }
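
The chunking step described above (grouping the requested segments into contiguous byte ranges so they can be fetched in on-disk order) could be sketched roughly as follows. This is only an illustration under stated assumptions: `SegmentRange`, `Chunk`, `chunkSegments` and `maxGap` are hypothetical names, not the types or methods used in this PR, and the offsets and lengths are assumed to have already been read from the TIFF offset and byte-count tags.

```scala
// Hypothetical sketch: none of these names are GeoTrellis API.
object ChunkSketch {
  /** Byte range of one segment, as recorded in the TIFF offset and byte-count tags. */
  final case class SegmentRange(index: Int, offset: Long, length: Long) {
    def end: Long = offset + length
  }

  /** A run of segments that can be fetched with a single ranged read. */
  final case class Chunk(offset: Long, length: Long, segments: Vector[SegmentRange]) {
    def end: Long = offset + length
  }

  /**
   * Sort the requested segments by file offset, then merge neighbours whose
   * gap is at most `maxGap` bytes so that each chunk is one contiguous read.
   */
  def chunkSegments(requested: Seq[SegmentRange], maxGap: Long = 0L): Vector[Chunk] =
    requested.sortBy(_.offset).foldLeft(Vector.empty[Chunk]) { (chunks, seg) =>
      chunks.lastOption match {
        case Some(last) if seg.offset - last.end <= maxGap =>
          // Extend the last chunk to cover this segment as well.
          val newEnd = math.max(last.end, seg.end)
          chunks.init :+ Chunk(last.offset, newEnd - last.offset, last.segments :+ seg)
        case _ =>
          // Gap too large (or first segment): start a new chunk.
          chunks :+ Chunk(seg.offset, seg.length, Vector(seg))
      }
    }
}
```

Each chunk keeps the segments it covers, so after one ranged read the caller can slice the fetched bytes back into individual segments and hand them out as `(segmentIndex, segment)` pairs in on-disk order.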

Things that need to be done before this is finished:

  • Test on S3 and verify performance on reading fully and windowed
  • Test on S3 file reading multiple crops from the same GeoTiff (verify that byteReader doesn't Reset)
  • Replace all usages of getSegment with getSegments in GeoTiffMultiBandTile (some are tricksy)
  • Update the StreamingSegmentBytesSpec for current code (restored LazySegmentBytes)

jbouffard and others added some commits Dec 5, 2016

Added StreamingSegmentBytes
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes in SinglebandGeoTiffs
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes in MultibandGeoTiffs
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes in GeoTiffReader
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes into Int16GeoTiffTile
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes into UByteGeoTiffTile
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes into Int16GeoTiffTile
Signed-off-by: jbouffard <jbouffard@azavea.com>
Added StreamingSegmentBytesSpec
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes into ByteGeoTiffTile
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes into UInt16GeoTiffTile
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes into UInt32GeoTiffTile
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes into Float32GeoTiffTile
Signed-off-by: jbouffard <jbouffard@azavea.com>
Integrated StreamingSegmentBytes into Float64GeoTiffTile
Signed-off-by: jbouffard <jbouffard@azavea.com>
Added intersectingSegments to ArraySegmentBytes
Signed-off-by: jbouffard <jbouffard@azavea.com>
Removed LazySegmentBytes
Signed-off-by: jbouffard <jbouffard@azavea.com>
Added intersectingSegments to SegmentBytes
Signed-off-by: jbouffard <jbouffard@azavea.com>
Continuing debugging work
Signed-off-by: jbouffard <jbouffard@azavea.com>
Continued to debug StreamingSegmentBytes
Signed-off-by: jbouffard <jbouffard@azavea.com>
Removed LazySegmentBytes from SegmentBytesSpec
Signed-off-by: jbouffard <jbouffard@azavea.com>
Removed some debugging code and extraneous methods from StreamingSegmentBytes

Signed-off-by: jbouffard <jbouffard@azavea.com>
Updated StreamingSegmentBytesSpec to reflect the changes to StreamingSegmentBytes

Signed-off-by: jbouffard <jbouffard@azavea.com>
Moved MergeQueue from the Spark to Util
Signed-off-by: jbouffard <jbouffard@azavea.com>
Updated the import path for MergeQueue in all files that use it
Signed-off-by: jbouffard <jbouffard@azavea.com>
Began work on reading consecutive segments as one chunk
Signed-off-by: jbouffard <jbouffard@azavea.com>
Removed bug that created duplicate chunks
Signed-off-by: jbouffard <jbouffard@azavea.com>
Switched parameters around in ByteReaderExtensions
Signed-off-by: jbouffard <jbouffard@azavea.com>
Updated TiffTagsReader to reflect the changes in ByteReaderExtensions
Signed-off-by: jbouffard <jbouffard@azavea.com>
Continued to bug fix
Signed-off-by: jbouffard <jbouffard@azavea.com>
Updated the various GeoTiffTiles crop methods to reflect the changes made in StreamingSegmentBytes

Signed-off-by: jbouffard <jbouffard@azavea.com>

@pomadchin pomadchin force-pushed the echeipesh:chunky-streaming branch from 4b0a995 to 2e53e9c Mar 2, 2017

pomadchin added some commits Mar 2, 2017

fix s3 unit tests
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>
deinterleave functions fix
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>

@pomadchin pomadchin force-pushed the echeipesh:chunky-streaming branch from e19c4fd to 0d48955 Mar 2, 2017

pomadchin added some commits Mar 2, 2017

improve bands and subset functions
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>
_combine(initValueHolder: ...) function refactor
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>
combiners arity 2 refactor
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>
use getSegments in GeoTiffWriter
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>
multiband combiners improvements, moved boilerplate to use getSegments function

Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>
fix multiband combiners
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>

@pomadchin pomadchin force-pushed the echeipesh:chunky-streaming branch from 8acb1f6 to ff0ec4b Mar 2, 2017

@pomadchin

Member

pomadchin commented Mar 2, 2017

Took into account all comments; StreamingSegmentBytesSpec still needs to be done.
@echeipesh @lossyrob

restore LazySegmentBytes tests
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>

@pomadchin pomadchin force-pushed the echeipesh:chunky-streaming branch from ab15c26 to e1258ec Mar 3, 2017

@@ -266,7 +275,7 @@ object GeoTiffReader {
 }
 }
-private def readGeoTiffInfo(byteReader: ByteReader, decompress: Boolean, streaming: Boolean): GeoTiffInfo = {
+private def readGeoTiffInfo(byteReader: ByteReader, decompress: Boolean, streaming: Boolean, extent: Option[Extent]): GeoTiffInfo = {

@lossyrob

lossyrob Mar 8, 2017

Member

Where is the Option[Extent] used? Seems unused.

) extends SegmentBytes with LazyLogging {
import LazySegmentBytes.Segment
// TODO: verify this is correct

@lossyrob

lossyrob Mar 8, 2017

Member

Have we addressed this TODO?

val createSegment: Int => BitGeoTiffSegment = { i =>
val (segmentCols, segmentRows) = segmentLayout.getSegmentDimensions(i)
// val size = segmentCols * segmentRows
val decompressGeoTiffSegment = { (i: Int, bytes: Array[Byte]) =>

@echeipesh

echeipesh Mar 8, 2017

Contributor

@lossyrob
This breaks the "public" API by removing createSegment. Aside from having a bad name, the signature of createSegment forces it to use getSegment, which breaks streaming.

Question: should getSegment be added back for this PR with deprecation and a warning, or is this a forgivable sin?

@echeipesh echeipesh force-pushed the echeipesh:chunky-streaming branch from 0bdcbb8 to 95c6a27 Mar 9, 2017

* The base trait of SegmentBytes. It can be implemented either as
* an Array[Array[Byte]] or as a ByteBuffer that is lazily read in.
*/
trait SegmentBytes extends Seq[Array[Byte]] {

@echeipesh

echeipesh Mar 9, 2017

Contributor

This had to change from Traversable to Seq because the only way the former can tell its length is to iterate through the collection, which is obviously not great for streaming.
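
To illustrate the point, here is a minimal sketch of a lazily loaded `Seq` whose `length` is answered from a count known up front instead of by traversal. `LazySegments` and its `load` parameter are hypothetical names chosen for this example, not the actual SegmentBytes implementation; the segment count is assumed to come from the TIFF tags.

```scala
// Hypothetical sketch: LazySegments and `load` are illustrative, not GeoTrellis API.
class LazySegments(segmentCount: Int, load: Int => Array[Byte]) extends Seq[Array[Byte]] {
  // The count is known ahead of time, so no segment is read to answer it.
  def length: Int = segmentCount

  // Reads exactly one segment on demand.
  def apply(i: Int): Array[Byte] = load(i)

  // Reads segments one at a time, only as the iterator is consumed.
  def iterator: Iterator[Array[Byte]] = Iterator.range(0, segmentCount).map(load)
}
```

With a definition like this, asking for `length` triggers no reads at all, whereas a collection that only provides `foreach` has to visit (and therefore load) every element to count them.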

@echeipesh

Contributor

echeipesh commented Mar 9, 2017

Here is a handy test I'm using:

import geotrellis.spark.io.s3._
import geotrellis.spark.io._
import geotrellis.spark._
import geotrellis.raster._
import geotrellis.raster.io._
import geotrellis.raster.io.geotiff._
import geotrellis.proj4._
import geotrellis.vector._

val client = S3Client.DEFAULT

import geotrellis.spark.io.s3.util._
val rr = S3RangeReader("geotrellis-test", "rf-test/356f564e3a0dc9d15553c17cf4583f21-6.tif", client)
val tiff = MultibandGeoTiff.streaming(rr)
val sube = tiff.extent.center.buffer(tiff.extent.width * 0.01).envelope
tiff.crop(sube)

@echeipesh echeipesh removed the in progress label Mar 9, 2017

@echeipesh echeipesh referenced this pull request Mar 9, 2017

Merged

Windowed GeoTiff Ingest #1224

4 of 4 tasks complete

@echeipesh echeipesh merged commit 796f651 into locationtech:master Mar 10, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@lossyrob lossyrob modified the milestones: 1.1, 1.0.1 Mar 12, 2017