New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HadoopGeoTiffRDD and S3GeoTiffRDD add overloads to use file names in keys #2050

Merged
merged 6 commits into from Mar 14, 2017

Conversation

Projects
None yet
4 participants
@pomadchin
Member

pomadchin commented Mar 10, 2017

Fixes #1958 without a significant API change. This PR introduces a keyTransform: (URI, I) => K function to have ability to read necessary metadata from files paths, and to modify Input key.

  • Tests

@pomadchin pomadchin changed the title from Collect metadata from file names to [WIP] Collect metadata from file names Mar 10, 2017

add keyTransform function
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>

@pomadchin pomadchin force-pushed the pomadchin:feature/input-path-handle branch from 50f42c3 to 80ffe05 Mar 10, 2017

@pomadchin pomadchin added this to the 1.1 milestone Mar 12, 2017

@lossyrob lossyrob modified the milestones: 1.2, 1.1 Mar 12, 2017

@lossyrob

This comment has been minimized.

Member

lossyrob commented Mar 12, 2017

This breaks API - I'd want to think through this in a way that won't break the API if possible... Is this the only way to accomplish this?

@lossyrob

This comment has been minimized.

Member

lossyrob commented Mar 12, 2017

Maybe if we just create the necessary overloads of public methods to maintain the signatures that existed before, this will work.

@lossyrob

Needs overloads to not break public API.

* @tparam K
* @return
*/
def keyTransformId[K] = (_: URI, key: K) => key

This comment has been minimized.

@lossyrob

lossyrob Mar 12, 2017

Member

Unclear that this is necessary, if the overloads are properly aligned.

@pomadchin

This comment has been minimized.

Member

pomadchin commented Mar 12, 2017

@lossyrob sure: first approach to create smth like spark did: ~ NewHadoopGeoTiffRDD and to deprecate the old one, or I can create functions with keyTransform different and still to keep them. P.S. all public signatures are here, except old apply

P.P.S. I wrote it before comments above, didn't notice;

@lossyrob

This comment has been minimized.

Member

lossyrob commented Mar 12, 2017

We can just overload apply to pass through the same sort of default keyTransform as the other methods.

code cleanup according to Robs comments
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>

@pomadchin pomadchin changed the title from [WIP] Collect metadata from file names to Collect metadata from file names Mar 13, 2017

add tests to demonstrate how key transform function can be used
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>

@pomadchin pomadchin force-pushed the pomadchin:feature/input-path-handle branch from ece8450 to fb01b0b Mar 13, 2017

code cleanup
Signed-off-by: Grigory Pomadchin <gr.pomadchin@gmail.com>
@pomadchin

This comment has been minimized.

Member

pomadchin commented Mar 13, 2017

Usage example are in tests and here.

@pomadchin pomadchin removed the in progress label Mar 13, 2017

@lossyrob

This comment has been minimized.

Member

lossyrob commented Mar 13, 2017

Is there a good spot in docs for noting this?

@pomadchin pomadchin modified the milestones: 1.2, 1.1 Mar 13, 2017

@pomadchin

This comment has been minimized.

Member

pomadchin commented Mar 13, 2017

@lossyrob eh we don't have a section for it, @fosskers can you navigate me where it would be better to write this information? Looks like it should be done as a separate PR, as it would include the whole new section with this part of API description.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Mar 13, 2017

What needs to be noted, exactly? If there was a deprecation, it should be explained in a @deprecated annotation on the old, overloaded method. The note will appear at compile time for the user, so that they'll know they have to switch eventually.

I don't think deprecation notices need to go into ReadTheDocs, other than in the CHANGELOG.

rr.readWindow(reader, pixelWindow, options)
val (k, v) = rr.readWindow(reader, pixelWindow, options)
keyTransform(new URI(objectRequest.getKey), k) -> v

This comment has been minimized.

@echeipesh

echeipesh Mar 13, 2017

Contributor

getKey will only give you the path without the s3://bucket/ prefix

* @param keyTransform function to transform input key basing on the URI information.
*/
def singleband[I, K](bucket: String, prefix: String, keyTransform: (URI, I) => K, options: Options)(implicit sc: SparkContext, rr: RasterReader[Options, (I, Tile)]): RDD[(K, Tile)] =
apply[I, K, Tile](bucket, prefix, keyTransform, options)

This comment has been minimized.

@echeipesh

echeipesh Mar 13, 2017

Contributor

keyTransform is a little overloaded with the way we use it for SFC. This should be renamed to uriToKey or something similar.

* @param options An instance of [[Options]] that contains any user defined or default settings.
*/
def apply[K, V](path: Path, options: Options = Options.DEFAULT)(implicit sc: SparkContext, rr: RasterReader[Options, (K, V)]): RDD[(K, V)] = {
def apply[I, K, V](path: Path, keyTransform: (URI, I) => K, options: Options)(implicit sc: SparkContext, rr: RasterReader[Options, (I, V)]): RDD[(K, V)] = {

This comment has been minimized.

@echeipesh

echeipesh Mar 13, 2017

Contributor

Not certain if use of URI here provides opportunity to use a more general function. We should probably just use Path here.

@echeipesh echeipesh force-pushed the pomadchin:feature/input-path-handle branch from ec71d1f to 004d984 Mar 14, 2017

@echeipesh echeipesh changed the title from Collect metadata from file names to HadoopGeoTiffRDD and S3GeoTiffRDD add overloads to use file names in keys Mar 14, 2017

@echeipesh echeipesh merged commit 580fa28 into locationtech:master Mar 14, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment