New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create GeoMesa suproject #1621

Merged
merged 15 commits into from Oct 6, 2016

Conversation

Projects
None yet
4 participants
@pomadchin
Member

pomadchin commented Sep 5, 2016

  • layer reader
  • layer writer
  • tests coverage
  • GeoMesa 1.2.7
* http://www.opensource.org/licenses/apache2.0.php.
*************************************************************************/
package org.locationtech.geomesa.jobs.mapreduce

This comment has been minimized.

@pomadchin

pomadchin Sep 7, 2016

Member

should be removed after moving to geomesa 1.2.7 locationtech/geomesa#1077

This comment has been minimized.

@lossyrob

lossyrob Sep 23, 2016

Member

Can you please create an issue for this?

pomadchin added some commits Sep 7, 2016

@pomadchin pomadchin changed the title from [WIP] GeoMesa suproject to Create GeoMesa suproject Sep 7, 2016

if(temporal) sftb.add(whenField, classOf[java.util.Date])
val sft = sftb.buildFeatureType
if(temporal) sft.getUserData.put(Constants.SF_PROPERTY_START_TIME, whenField) // when field is date
sft.getUserData.put("geomesa.mixed.geometries", java.lang.Boolean.TRUE) // allow GeoMesa to index points and extents together

This comment has been minimized.

@echeipesh

echeipesh Sep 8, 2016

Contributor

Looks like these tags are specific to Geomesa, so the object should be in geotrellis.geomsea package.

val whenField = "when"
val whereField = "where"
def apply[G <: Geometry: ClassTag](featureName: String, crs: Option[GCRS] = Some(WebMercator), temporal: Boolean = false) = {

This comment has been minimized.

@echeipesh

echeipesh Sep 8, 2016

Contributor

need return type for API readability

val sftb = (new SimpleFeatureTypeBuilder).minOccurs(1).maxOccurs(1).nillable(false)
sftb.setName(featureName)
crs.foreach { crs => sftb.setSRS(s"EPSG:${crs.epsgCode.get}") }

This comment has been minimized.

@echeipesh

echeipesh Sep 8, 2016

Contributor

Is there a fallback if the CRS is not EPSG code? Not sure how common that is with vectors but in rasters we've seen that happen.

This comment has been minimized.

@pomadchin

pomadchin Sep 8, 2016

Member

James sets crs using org.geotools.referencing.CRS, we can try it that way; potentially we can fix these lines as well

sftb.setName(featureName)
crs.foreach { crs => sftb.setSRS(s"EPSG:${crs.epsgCode.get}") }
classTag[G].toString.split("\\.").last match {

This comment has been minimized.

@echeipesh

echeipesh Sep 8, 2016

Contributor

classTag[G].runtimeClass.getName

This comment has been minimized.

@echeipesh

echeipesh Sep 8, 2016

Contributor

Shame you can't easily match on the ClassTag instance.

import scala.collection.JavaConverters._
case class GeoMesaAttributeStore(instanceName: String, zookeepers: String, user: String, password: String, useMock: Boolean = false) extends Serializable {

This comment has been minimized.

@echeipesh

echeipesh Sep 8, 2016

Contributor

This is not an AttributeStore and doesn't really do anything like AttributeStore interface. lets just make it an apply. So maybe there is a need of an object that is like GeoMesaConnector that keeps the map as fields?

This comment has been minimized.

@pomadchin

pomadchin Sep 8, 2016

Member

Yes thought about it, mb GeoMesaInstance? and yes, probably we need to have just a Map of fields, to get connector in runtime and to abstract over backend types

import scala.reflect.ClassTag
class GeoMesaLayerReader(val attributeStore: GeoMesaAttributeStore, table: String)(implicit sc: SparkContext) extends Serializable {

This comment has been minimized.

@echeipesh

echeipesh Sep 8, 2016

Contributor

This should just take DataStore as constructor argument and provide apply overloads to help with construction. Also layer writer is generally for rasters. I would call this GeoWaveFeatureReader, @lossyrob thoughts?

This comment has been minimized.

@pomadchin

pomadchin Sep 8, 2016

Member

@echeipesh @lossyrob +1 ; are we omitting word Layer?

  • GeoMesaFeatureReader
  • GeoMesaFeatureValueReader // or we don't need it at all, as it can be covered by collections api, in fact geomesa api not provides to query data by "key" / "Point coords(?)"?
  • GeoMesaFeatureWriter
rdd
.zipWithIndex // it is necessary to provide unique id for ingest batch
.map { case (f, index) => val sf = f.toSimpleFeature(s"${attributeStore.layerIdString(layerId)}_${index}", layerId.name); sf.getFeatureType -> sf }.groupByKey

This comment has been minimized.

@echeipesh

echeipesh Sep 8, 2016

Contributor

Could you explain what is going on here? Where is this requirement for unique id coming from and how is it reflected in what is stored?

This comment has been minimized.

@pomadchin

pomadchin Sep 8, 2016

Member

@echeipesh the practise during one ingest batch features should have different ids; but they could be same during different batches; would ask this question in a geomesa channel to clarify it

if(temporal) sft.getUserData.put(Constants.SF_PROPERTY_START_TIME, whenField) // when field is date
sft.getUserData.put(SimpleFeatureTypes.MIXED_GEOMETRIES, java.lang.Boolean.TRUE) // allow GeoMesa to index points and extents together
sft.getUserData.put(Hints.USE_PROVIDED_FID, java.lang.Boolean.FALSE) // generate feature ids
sft

This comment has been minimized.

@lossyrob

lossyrob Sep 24, 2016

Member

Then again we don't really have those concepts in GeoTrellis layers, so this might be a good thing to make a note/issue out of and add once we put those concepts into GeoTrellis (perhaps through sfcurve usage?)

@elahrvivaz

Looks reasonable to me!

import org.opengis.feature.simple.SimpleFeature
trait FeatureToGeoMesaSimpleFeatureMethods[G <: Geometry, T] extends MethodExtensions[Feature[G, T]] {
def toSimpleFeature(featureName: String, featureId: Option[String] = Some(null), crs: Option[CRS] = None)(implicit transmute: T => Seq[(String, Any)]): SimpleFeature =

This comment has been minimized.

@elahrvivaz

elahrvivaz Sep 26, 2016

doesn't Some(null) kind of break the Option contract?

This comment has been minimized.

@pomadchin

This comment has been minimized.

@lossyrob

lossyrob Oct 6, 2016

Member

Weird, is this PR diff just not updating properly?

This comment has been minimized.

@pomadchin

pomadchin Oct 6, 2016

Member

._. looks like these comments attached to a certain sha; anyway it's perfect that this pr was not merged in (due to 8f962f3)

This comment has been minimized.

@lossyrob

lossyrob Oct 6, 2016

Member

Oh. Yeah you can comment commits rather than prs, might of been what happened here.

This comment has been minimized.

@lossyrob

lossyrob Oct 6, 2016

Member

Looks like it's just the new review functionality for github - it no longer removes comments when a new change comes in that addresses the issue. Which makes it harder to see if issues are addressed, which is pretty crappy of GitHub.

val sft = sftb.buildFeatureType
if(data.map(_._1).contains(whenField)) sft.getUserData.put(Constants.SF_PROPERTY_START_TIME, whenField) // when field is date
sft.getUserData.put(SimpleFeatureTypes.MIXED_GEOMETRIES, java.lang.Boolean.TRUE) // allow GeoMesa to index points and extents together

This comment has been minimized.

@elahrvivaz

elahrvivaz Sep 26, 2016

FYI you actually don't need this (although it won't hurt) - it's only if you specify the geometry binding as Geometry, which you don't do in your match above.

val sft = sftb.buildFeatureType
if(data.map(_._1).contains(whenField)) sft.getUserData.put(Constants.SF_PROPERTY_START_TIME, whenField) // when field is date
sft.getUserData.put(SimpleFeatureTypes.MIXED_GEOMETRIES, java.lang.Boolean.TRUE) // allow GeoMesa to index points and extents together
sft.getUserData.put(Hints.USE_PROVIDED_FID, java.lang.Boolean.FALSE) // generate feature ids

This comment has been minimized.

@elahrvivaz

elahrvivaz Sep 26, 2016

This won't get used actually - you have to set (or not set) the hint on a per-feature basis. If you don't explicitly set the hint then generating IDs is the default behavior.

This comment has been minimized.

@pomadchin

pomadchin Oct 6, 2016

Member

Thx, lines removed

}
data.foreach({ case (key, value) => sfb.add(value) })
sfb.buildFeature(featureId.orNull)

This comment has been minimized.

@elahrvivaz

elahrvivaz Sep 26, 2016

as an optimization, you might want to cache the simple feature types - it's probably going to be slow to generate it for each simple feature you create.

This comment has been minimized.

@pomadchin

This comment has been minimized.

@pomadchin
numPartitions: Option[Int] = None
): RDD[SimpleFeature] = {
val dataStore = instance.accumuloDataStore
dataStore.createSchema(simpleFeatureType)

This comment has been minimized.

@elahrvivaz

elahrvivaz Sep 26, 2016

this is a no-op if the schema already exists - but if the schema doesn't exist then there's not going to be any data in it... maybe that is ok here.

This comment has been minimized.

@pomadchin

pomadchin Oct 6, 2016

Member

added an existence check

try {
sf.foreach { rawFeature =>
val newFeature = featureWriter.next()
attrNames.foreach(an => newFeature.setAttribute(an, rawFeature.getAttribute(an)))

This comment has been minimized.

@elahrvivaz

elahrvivaz Sep 26, 2016

as a minor optimization, you could do

(0 until sft.getAttributeCount).foreach(i => newFeature.setAttribute(i, rawFeature.getAttribute(i))

lookup by index is always faster than lookup by attribute name

This comment has been minimized.

@pomadchin

pomadchin Oct 6, 2016

Member

thx! implemented there

val sft = sftb.buildFeatureType
if(temporal) sft.getUserData.put(Constants.SF_PROPERTY_START_TIME, whenField) // when field is date
sft.getUserData.put(SimpleFeatureTypes.MIXED_GEOMETRIES, java.lang.Boolean.TRUE) // allow GeoMesa to index points and extents together
sft.getUserData.put(Hints.USE_PROVIDED_FID, java.lang.Boolean.FALSE) // generate feature ids

This comment has been minimized.

@elahrvivaz

elahrvivaz Sep 26, 2016

same as noted above - these 2 options won't really have any effect here.

This comment has been minimized.

@pomadchin

pomadchin Oct 6, 2016

Member

lines removed and created an issue #1654

@lossyrob

This comment has been minimized.

Member

lossyrob commented Oct 5, 2016

Thanks for the review @elahrvivaz. @pomadchin are all the review comments addressed?

@pomadchin

This comment has been minimized.

Member

pomadchin commented Oct 5, 2016

@lossyrob yes, walked through all comments

@lossyrob

This comment has been minimized.

Member

lossyrob commented Oct 6, 2016

Looks like there's some optimizations and other comments that still need to be addressed...need to either make the changes or reply to comments about why the suggested change shouldn't happen.

@lossyrob lossyrob merged commit ed0fac8 into locationtech:master Oct 6, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@lossyrob lossyrob added this to the 1.0 milestone Oct 18, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment