ADR: HDFS Raster Layers #1582

Merged
merged 2 commits into from Aug 9, 2016

2 participants
@echeipesh
Contributor

echeipesh commented Jul 11, 2016

This ADR describes the current implementation of the HDFS layers and the key design decisions that were considered. It is intended as both documentation and a starting point for further improvements.

In a single value query we are given an instance of `K` and we must produce a corresponding `V` or an error. The first step is to locate the `MapFile` which potentially contains the `(K, V)` record. Because the layer records are indexed by their SFC index, we map `K` to `i: Long` and determine which file contains a potential match by examining the file listing and finding the file with the maximum starting index that is less than or equal to `i`. At this point the `MapFile` must be opened and queried for the key.
Because both listing all the files in the layer directory and opening a single MapFile we cache these steps. The file listing needs to be done only once per `HadoopValueReader` instance and we maintain an LRU cache of open `MapFile`s. Because an SFC preserves some spatial locality of the records, geographically close records are likely to be close in SFC index. We also expect key/value queries to be geographically grouped, for instance requests from a map viewer, so the `MapFile` LRU cache can be expected to have a very high hit rate.
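
The lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `HadoopValueReader` code: `OpenFile`, `LruCache`, and `SketchValueReader` are hypothetical names, an in-memory `Map` stands in for an open `MapFile`, values are plain strings, and a missing record is returned as `None` rather than raised as an error.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an open MapFile; real code would wrap a Hadoop reader.
final case class OpenFile(path: String, records: Map[Long, String])

// Small LRU cache: the least recently used entry is evicted once maxSize is exceeded.
final class LruCache[A, B](maxSize: Int) {
  private val entries = mutable.LinkedHashMap.empty[A, B]

  def getOrElseUpdate(key: A)(make: => B): B = {
    // Remove and re-insert so the key moves to the most-recent end of the map.
    val value = entries.remove(key).getOrElse(make)
    entries.put(key, value)
    if (entries.size > maxSize) entries -= entries.head._1
    value
  }
}

// listing: (starting SFC index, file path) pairs, assumed to come from one directory listing.
// open: assumed to be the expensive step of opening a file by path.
final class SketchValueReader(
  listing: Seq[(Long, String)],
  open: String => OpenFile,
  cacheSize: Int
) {
  // The listing is sorted once per reader instance and then reused.
  private val files: Vector[(Long, String)] = listing.sortBy(_._1).toVector
  private val cache = new LruCache[String, OpenFile](cacheSize)

  // Path of the file with the greatest starting index <= i, if any.
  // A linear scan keeps the sketch short; a binary search would suit large listings.
  def fileFor(i: Long): Option[String] = {
    val pos = files.lastIndexWhere(_._1 <= i)
    if (pos >= 0) Some(files(pos)._2) else None
  }

  // Look up the record for SFC index i, opening the file only on a cache miss.
  def read(i: Long): Option[String] =
    fileFor(i).flatMap { path =>
      cache.getOrElseUpdate(path)(open(path)).records.get(i)
    }
}
```

With files starting at indices 0 and 10, a query for index 7 is routed to the file starting at 0. Repeated queries that fall in the same file reuse the cached handle instead of reopening it.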

@lossyrob

lossyrob Jul 25, 2016

Member

> Because both listing all the files in the layer directory and opening a single MapFile we cache these steps.

I feel like this is missing some words.

## Context
A raster layer is a regular grid of raster tiles, represented as an `RDD[(K, V)]` where `K` contains the column, row, and/or time. The raster layer storage scheme must support two forms of queries with different requirements:

@lossyrob

lossyrob Jul 25, 2016

Member

Should have Markdown style one-line-per-sentence structure.

@lossyrob

Member

lossyrob commented Jul 25, 2016

+1 after changes.

@echeipesh echeipesh merged commit a7c8b5d into locationtech:master Aug 9, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@lossyrob lossyrob added this to the 1.0 milestone Oct 18, 2016
