Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python raster reader argument refactor #329

Conversation

@vpipkt
Copy link
Member

commented Sep 2, 2019

Initial argument for spark.read.raster is called source and can now be a str, list of str, list of list of str, pandas DF catalog or spark DF catalog.

Add docstring to this important entry point for users.

Simplifies documentation pages to remove unneeded catalog named argument.

Expands unit test in python to all claimed use cases of source.

vpipkt added 10 commits Aug 22, 2019
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
…on-raster-reader-arg-refactor

Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
@vpipkt vpipkt requested a review from metasim Sep 2, 2019
Signed-off-by: Jason T. Brown <jason@astraea.earth>
@vpipkt vpipkt added this to the 0.8.2 milestone Sep 2, 2019
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Copy link
Member

left a comment

Awesome step forward in usability. Thank you for taking this on.

I suggest going back through the Scala API and thinking about how we can synchronize the names so we can maintain parity in naming (and functionality). As much as we can get away with, I'd like there to be a more or less mechanical translation between the Python API and Scala API, which starts with consistent names.

def fromCatalog(catalog: DataFrame, bandColumnNames: String*): RasterSourceDataFrameReader =
tag[RasterSourceDataFrameReaderTag][DataFrameReader] {
val tmpName = tmpTableName()
catalog.createOrReplaceTempView(tmpName)
reader
.option(RasterSourceDataSource.CATALOG_TABLE_PARAM, tmpName)
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(",")): DataFrameReader
}
def fromCatalog(tableName: String, bandColumnNames: String*): RasterSourceDataFrameReader =
tag[RasterSourceDataFrameReaderTag][DataFrameReader](
reader.option(RasterSourceDataSource.CATALOG_TABLE_PARAM, tableName)
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(","))
)
def fromCSV(catalogCSV: String, bandColumnNames: String*): RasterSourceDataFrameReader =
tag[RasterSourceDataFrameReaderTag][DataFrameReader](
reader.option(RasterSourceDataSource.CATALOG_CSV_PARAM, catalogCSV)
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(","))
)
def from(newlineDelimPaths: String): RasterSourceDataFrameReader =
tag[RasterSourceDataFrameReaderTag][DataFrameReader](
reader.option(RasterSourceDataSource.PATHS_PARAM, newlineDelimPaths)
)
def from(paths: Seq[String]): RasterSourceDataFrameReader =
from(paths.mkString("\n"))
def from(uris: Seq[URI])(implicit d: DummyImplicit): RasterSourceDataFrameReader =

Where in Python we have to do type inspection, in Scala we can use method overloading.

Also consider revising option key names, as appropraite:

final val SHORT_NAME = "raster"
final val PATH_PARAM = "path"
final val PATHS_PARAM = "paths"
final val BAND_INDEXES_PARAM = "bandIndexes"
final val TILE_DIMS_PARAM = "tileDimensions"
final val CATALOG_TABLE_PARAM = "catalogTable"
final val CATALOG_TABLE_COLS_PARAM = "catalogColumns"
final val CATALOG_CSV_PARAM = "catalogCSV"
final val LAZY_TILES_PARAM = "lazyTiles"

class RasterSourceTest(TestEnvironment):

@staticmethod
def path(scene, band):

This comment has been minimized.

Copy link
@metasim

metasim Sep 3, 2019

Member

Wondering if this should be promoted to TestEnvironment...

vpipkt added 4 commits Sep 3, 2019
…on-raster-reader-arg-refactor-and-docs
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
@vpipkt

This comment has been minimized.

Copy link
Member Author

commented Sep 6, 2019

@metasim I made some changes to data source parameters, which hopefully simplifies things in the direction you were thinking.

I am less clear on the vision for synchronizing naming between scala and python. Happy to discuss

@metasim
metasim approved these changes Sep 6, 2019
@metasim metasim merged commit 61d6b6e into locationtech:develop Sep 9, 2019
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
eclipsefdn/eca The author(s) of the pull request is covered by necessary legal agreements in order to proceed!
Details
@vpipkt vpipkt deleted the s22s:feature/python-raster-reader-arg-refactor-and-docs branch Oct 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.