
EsHadoopIllegalStateException reading Geo-Shape into DataFrame - SparkSQL #607

Closed
randallwhitman opened this issue Nov 12, 2015 · 46 comments


@randallwhitman

  1. Create an index type with a mapping consisting of a field of type geo_shape.
  2. Create an RDD[String] containing a polygon as GeoJSON, as the value of a field whose name matches the mapping:
    """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
  3. Write to an index type in Elasticsearch:
    rdd1.saveJsonToEs(indexName+"/"+indexType, connectorConfig)
  4. Read into SparkSQL DataFrame with either esDF or read-format-load:
    • sqlContext.esDF(indexName+"/"+indexType, connectorConfig)
    • sqlContext.read.format("org.elasticsearch.spark.sql").options(connectorConfig).load(indexName+"/"+indexType)

Result is:
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'rect' not found; typically this occurs with arrays which are not mapped as single value
Full stack trace in gist. Elasticsearch Hadoop v2.1.2
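
For reference, the steps above condensed into a minimal, self-contained sketch (the SparkContext/SQLContext setup and the contents of connectorConfig are assumed; index and type names are placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext
    import org.elasticsearch.spark._        // saveJsonToEs on RDD[String]
    import org.elasticsearch.spark.sql._    // esDF on SQLContext

    def reproduce(sc: SparkContext, sqlContext: SQLContext,
                  indexName: String, indexType: String,
                  connectorConfig: Map[String, String]): Unit = {
      // Steps 1-3: index a single GeoJSON polygon document as raw JSON
      val rawShape = List(
        """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}""")
      sc.parallelize(rawShape, 1).saveJsonToEs(indexName + "/" + indexType, connectorConfig)

      // Step 4: reading back as a DataFrame is where the exception is thrown
      val df = sqlContext.esDF(indexName + "/" + indexType, connectorConfig)
      df.show()
    }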

@costin
Member

costin commented Dec 3, 2015

The issue here is that the connector doesn't know what to translate the geo_shape into (there's no such type in Spark) and thus skips it. A potential solution (configurable) would be to map all unknown/not-mapped types into a generic String and let the user deal with further mapping.

@randallwhitman
Author

I think a String containing JSON would work for us, thanks. That would be much better than a fatal exception.

@costin
Member

costin commented Jan 8, 2016

@randallwhitman Hi,

I've taken a closer look at this and it's a bit more complicated. It's fixable, but not as easy as I thought.
The major issue with Spark SQL is that it requires a strict schema before loading any data, so the connector can only rely on the mapping to provide it. However, the underlying data (due to the flexibility of JSON) can be quite... loose, which trips up Spark and/or the connector when it doesn't fit exactly into the schema.

First off, the field "crs" is null, meaning it is not mapped - there's no type information associated with it and thus no mapping. So the connector doesn't even see it when looking at the mapping, and when it encounters it in the _source it doesn't know what to do with it. This needs to be fixed - for now I've added a better exception message and raised #648.
Second, the mapping information is incomplete for Spark SQL's requirements. For example, coordinates is a field of type long. Is it a primitive or an array? We don't know beforehand. One can indicate that it's an array through the newly introduced es.read.field.as.array.include/exclude settings (ES 2.2 only). However this is not enough, as the array depth is unknown: the connector is told that this field is an array, but is it [long], [[long]], [[[long]]], and so on?
I've raised yet another issue for this, namely #650.
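
For illustration, marking the field as an array at this stage would look something like the following (a hypothetical config snippet; it only declares "this is an array", not its depth):

    val arrayAwareConfig = connectorConfig +
      ("es.read.field.as.array.include" -> "rect.coordinates")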

@costin costin removed the v2.1.2 label Jan 8, 2016
@randallwhitman
Author

Interesting, thanks for the update.

Is there any merit to an option to treat the field as a String while reading, so that the result is a raw String of JSON? We'd be able to post-process the JSON with a method such as GeometryEngine.geometryFromGeoJson.

@costin
Member

costin commented Jan 8, 2016

You mean reading the field in raw JSON format instead of parsing it? You could do such a thing by plugging in a customized ValueReader and basically ignoring the given type, simply concatenating the results. Note that ES-Hadoop already does the parsing, and it would still not fix the issue.
It's not the JSON parsing that's the problem but rather the schema in Spark SQL, which needs to be known beforehand.
If you were to read the same information as an RDD, for example, things are easier. Note that currently there's a workaround where one can create the DataFrame programmatically instead of relying on ES-Hadoop to infer it.

@randallwhitman
Author

I'll take a look at the link you provided, thanks.

@randallwhitman
Author

I roughed out some code that reads an RDD and then creates a DataFrame, essentially:

val base: RDD[(String, Map[...])] = EsSpark.esRDD(...)
val rtmp: RDD[Row] = base.map(... case geo_shape => convertMapToString ...)
val schema = ...  // application-specific interpretation of the mapping
val df = sqlContext.createDataFrame(rtmp, schema)

The workaround as I have it now converts JSON to a Map and back to JSON again before parsing an object. Perhaps I could avoid that with es.output.json.

But since this was referred to as a workaround, I understand it is not the recommended approach, but rather a temporary measure until this issue is resolved.
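
A slightly fuller sketch of this workaround, under the assumption of a single geo_shape field that is simply re-serialized into a string column (the schema and field handling here are hypothetical and application-specific):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import org.elasticsearch.spark.rdd.EsSpark

    def loadAsDataFrame(sc: SparkContext, sqlContext: SQLContext,
                        resource: String, cfg: Map[String, String]) = {
      // Read documents as (id, Map[String, AnyRef]) pairs, bypassing Spark SQL schema inference.
      val base = EsSpark.esRDD(sc, resource, cfg)
      // Re-serialize the geo_shape Map to a String so it fits a StringType column
      // (placeholder conversion; a real application would map all fields explicitly).
      val rows = base.map { case (_, doc) => Row(doc.get("rect").map(_.toString).orNull) }
      val schema = StructType(Seq(StructField("rect", StringType, nullable = true)))
      sqlContext.createDataFrame(rows, schema)
    }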

@costin costin added v2.2.0 and removed v2.2.0-rc1 labels Jan 8, 2016
@costin
Member

costin commented Jan 8, 2016

Thanks for the update. The double JSON conversion is wasteful (not to mention the connector can/already does it).
And yet, I do consider it a temporary solution since the connector should provide ways for the user to declare the schema, not code around it.

@costin costin added the bug label Jan 8, 2016
@costin
Member

costin commented Jan 15, 2016

@randallwhitman Hi, this has been fixed in master - can you please try the latest dev build?
Basically, with geo_shape, specify that the coordinates field is an array with a depth of two (ideally we would be able to detect this ourselves, however the mapping has nothing geo about it):
es.read.field.as.array.include=rect.coordinates:2 means rect.coordinates is a [[<whatever type>]].

Please try it out and let me know if it works for you.
And one more thing: ES-Hadoop allows its documents to be returned in JSON format directly. Set es.output.json to true in your configuration and read away.
In fact, I should add a dedicated method in Spark for this to make it more obvious. Oh, and docs as well...
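
A short sketch of both suggestions, reusing the connectorConfig and index names from the earlier comments:

    // a) declare the depth of the coordinates array so the DataFrame schema can be built
    val geoConfig = connectorConfig + ("es.read.field.as.array.include" -> "rect.coordinates:2")
    val df = sqlContext.esDF(indexName + "/" + indexType, geoConfig)

    // b) or ask for the raw documents back as JSON and handle the parsing yourself
    val jsonConfig = connectorConfig + ("es.output.json" -> "true")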

@costin
Member

costin commented Jan 15, 2016

In fact, I should add a dedicated method in Spark for this to make it more obvious.

It's already in there - esJsonRDD - it returns each document in raw JSON format (and it is actually quite efficient, as it does not reinterpret the data; it only parses and chunks it in one go and serves the docs directly from the incoming buffer). Unfortunately it's not truly zero-copy, since the raw bytes have to be converted into Strings (which are immutable and copy the data themselves), however we avoid a significant amount of parsing and charset conversions.
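
A minimal usage sketch of esJsonRDD (resource and config names as in the earlier comments); each element is a (documentId, rawJson) pair, so the GeoJSON can be post-processed with any JSON or geometry library:

    import org.elasticsearch.spark._

    val jsonDocs = sc.esJsonRDD(indexName + "/" + indexType, connectorConfig)
    jsonDocs.take(1).foreach { case (id, json) => println(s"$id -> $json") }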

@randallwhitman
Author

Today I am trying this out. I am trying it with polygon geometries.

    val testConfig = connectorConfig + ("es.read.field.as.array.include" -> "rect.coordinates:3")
    println(testConfig)
    val df = sqlContext.esDF(shapeResource, testConfig)

I tried setting the value both to 2 (copy-paste) and to 3, which should be correct for a polygon as GeoJSON.

Either way I am still seeing the exception.

16/01/19 13:02:53 INFO Version: Elasticsearch Hadoop v2.2.0.BUILD-SNAPSHOT [6066f849b4]
[...]
Map(es.net.http.auth.user -> els_6y6yicd, es.batch.size.entries -> 0, es.net.http.auth.pass -> ea6kou54p1, es.read.field.as.array.include -> rect.coordinates:3, es.nodes -> RANDALL-WORKSTATION.ESRI.COM:9220, es.cluster.name -> ds_sgmmq9q8)
16/01/19 13:02:53 INFO ScalaEsRowRDD: Reading from [tests1453237331981/tests1453237331981]
16/01/19 13:02:53 INFO ScalaEsRowRDD: Discovered mapping {tests1453237331981=[mappings=[tests1453237331981=[rect=GEO_SHAPE]]]} for [tests1453237331981/tests1453237331981]
16/01/19 13:02:53 ERROR Executor: Exception in task 3.0 in stage 2.0 (TID 5)
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'rect' not found; typically this occurs with arrays which are not mapped as single value
    at org.elasticsearch.spark.sql.RowValueReader$class.rowColumns(RowValueReader.scala:33)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.rowColumns(ScalaEsRowValueReader.scala:14)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.createMap(ScalaEsRowValueReader.scala:42)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:672)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:610)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:691)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:610)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:391)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:321)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:216)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:189)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:438)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:885)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:885)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

@costin
Member

costin commented Jan 19, 2016

What version of ES are you using?

@costin
Member

costin commented Jan 19, 2016

I think I found the culprit - the automatic array detection is not plugged into the dedicated geo types. If you have some code / doc samples, that would be great (I have some of my own, but it's always good to have extra).

Cheers,

@randallwhitman
Author

The server is running 1.6.2, though I'm pulling the 1.7.1 API through Maven when testing - I can make them match.

@costin
Member

costin commented Jan 19, 2016

Why not use 1.7.4 instead of 1.7.1?

@randallwhitman
Author

With both API 1.6.2 and API 1.7.4, I get the same exception.

costin added a commit that referenced this issue Jan 24, 2016
Add dedicated parsing and handling of Geo types and inferring of data
based on 'sampling' of data.
As Geo types are not properly described into their mappings (ES provides
only `geo_shape` and `geo_point` but there's no information about the
geo type used), ES-Hadoop now detects a geo field and will parse it in an
ad-hoc manner.
However for strongly-typed environments (such as Spark SQL), it will
'sample' the data, by asking for one document so the actual content will
be parsed in order to determine the format and use that for the inferred
data set.

relates #607
@costin
Member

costin commented Jan 24, 2016

@randallwhitman I've just pushed a fix for your issue in master. It is a fairly substantial change, especially on the Spark side, so please try it out.
There's not much you need to do or configure - after going through various variations, I realized that describing the mapping of the geo format (there are 4 formats for geo_point and 9 for geo_shape) is not only cumbersome but also somewhat unintuitive.
The challenge for ES-Hadoop is that the mapping is fairly abstract - it's a point or a shape - but it provides no information on the actual format used. That is actually good for the user, as ES allows a variety of formats, but when dealing with strongly-typed APIs like Spark SQL, things fall apart.

To get around this, ES-Hadoop now detects when a field is of a geo type and, in the case of Spark SQL, will sample the data (get one random doc that contains all the geo fields), parse it, determine the format, and in turn generate the schema.

tl;dr - you should just point the latest ES-Hadoop dev snapshot to your data set and that's it - the schema should be inferred automatically.
If possible, please try it out ASAP and report back - the release is approaching fast (1-2 days) and while there are a number of tests for it, there can never be too many.

Cheers,
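
In other words, with the sampling-based inference the read should need no geo-specific settings; a sketch of what that looks like (names reused from the original report):

    val df = sqlContext.read
      .format("org.elasticsearch.spark.sql")
      .options(connectorConfig)
      .load(indexName + "/" + indexType)
    df.printSchema()   // 'rect' should now come back as an inferred struct instead of failing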

@randallwhitman
Author

The first time I got the test to run today, I got a different error, but I had left in the es.read.field.as.array.include setting - I'll take that out and see if the error goes away.

  org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[yPIG6fkDTWaKtRbCDYxVbA][tests1453750973109][0]: SearchParseException[[tests1453750973109][0]: from[-1],size[1]: Parse Failure [Failed to parse source [{ "terminate_after":1, "size":1,
"_source": ["rect"],
"query":{ "bool": { "must":[
{ "exists":{ "field":"rect"} }
]}}}]]]; nested: QueryParsingException[[tests1453750973109] No query registered for [exists]]; }{[yPIG6fkDTWaKtRbCDYxVbA][tests1453750973109][1]: SearchParseException[[tests1453750973109][1]: from[-1],size[1]: Parse Failure [Failed to parse source [{ "terminate_after":1, "size":1,

@costin
Member

costin commented Jan 25, 2016

Looks like you are running ES pre 2.0 - will look into adding a compatibility fix for that.

@randallwhitman
Author

Yes, the server is running version 1.6.2.

@costin
Member

costin commented Jan 25, 2016

Pushed a dev build that should address the issue - can you please try it out?

@costin
Member

costin commented Jan 25, 2016

It's the date that is important, more than the commit SHA, which only gets updated if the change is committed. If the code is not committed (for whatever reason), the commit signature will remain the same. In cases like these, I ended up publishing the build before committing, which is why the git SHA is the same.

costin added a commit that referenced this issue Jan 25, 2016
[SPARK] Perform ES discovery before mapping discovery

relates #607
@costin
Member

costin commented Jan 25, 2016

Published another snapshot, which should have the git SHA updated. Note the geo functionality is available only in the integration for Spark 1.3 or higher (if you are using DataFrames, you're fine).

@randallwhitman
Author

16/01/25 14:46:31 INFO Version: Elasticsearch Hadoop v2.2.0.BUILD-SNAPSHOT [830dff9847]

  org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unknown GeoShape [{coordinates=[[[50, 32], [69, 32], [69, 50], [50, 50], [50, 32]]], type=Polygon, crs=null}]
  at org.elasticsearch.hadoop.serialization.dto.mapping.MappingUtils.doParseGeoShapeInfo(MappingUtils.java:245)
  at org.elasticsearch.hadoop.serialization.dto.mapping.MappingUtils.parseGeoInfo(MappingUtils.java:210)
  at org.elasticsearch.hadoop.rest.RestRepository.sampleGeoFields(RestRepository.java:447)
  at org.elasticsearch.spark.sql.SchemaUtils$.discoverMappingAsField(SchemaUtils.scala:82)
  at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:65)
  at org.elasticsearch.spark.sql.EsSparkSQL$.esDF(EsSparkSQL.scala:27)
  org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Unknown GeoShape [{coordinates=[[[50, 32], [69, 32], [69, 50], [50, 50], [50, 32]]], type=Polygon, crs=null}]
  at org.elasticsearch.hadoop.serialization.dto.mapping.MappingUtils.doParseGeoShapeInfo(MappingUtils.java:245)
  at org.elasticsearch.hadoop.serialization.dto.mapping.MappingUtils.parseGeoInfo(MappingUtils.java:210)
  at org.elasticsearch.hadoop.rest.RestRepository.sampleGeoFields(RestRepository.java:447)
  at org.elasticsearch.spark.sql.SchemaUtils$.discoverMappingAsField(SchemaUtils.scala:82)
  at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:65)
  at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:104)
  at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:104)
  at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:108)
  at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:108)

costin added a commit that referenced this issue Jan 25, 2016
@costin
Member

costin commented Jan 25, 2016

The bug is caused by the shape name (which is expected to be lower case, not mixed case). Pushed a fix to master and just uploaded a new dev version. Please try it out.

@costin costin closed this as completed Jan 25, 2016
@costin costin reopened this Jan 25, 2016
@randallwhitman
Author

I hit snags re-running tests - I will look again tomorrow.

@randallwhitman
Author

I am consistently seeing a NoSuchMethodError on an index containing a geo-shape, with both esDF and sqlContext.read.format("org.elasticsearch.spark.sql").

16/01/26 11:37:32 INFO Version: Elasticsearch Hadoop v2.2.0.BUILD-SNAPSHOT [ec94fe5ee9]

*** RUN ABORTED ***
  java.lang.NoSuchMethodError: org.apache.spark.sql.types.StructType.add(Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;)Lorg/apache/spark/sql/types/StructType;
  at org.elasticsearch.spark.sql.SchemaUtils$.org$elasticsearch$spark$sql$SchemaUtils$$convertField(SchemaUtils.scala:160)
  at org.elasticsearch.spark.sql.SchemaUtils$$anonfun$1.apply(SchemaUtils.scala:106)
  at org.elasticsearch.spark.sql.SchemaUtils$$anonfun$1.apply(SchemaUtils.scala:106)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
  at org.elasticsearch.spark.sql.SchemaUtils$.convertToStruct(SchemaUtils.scala:106)
  ...

@costin
Member

costin commented Jan 26, 2016

What version of Spark are you using?

@randallwhitman
Author

I thought I was using Spark 1.4, but I will double-check by re-running with an explicit -P profile passed to Maven.
(I have profiles for the 1.4, 1.5, and 1.6 versions of Spark.)

costin added a commit that referenced this issue Jan 26, 2016
@costin
Member

costin commented Jan 26, 2016

Should be fixed in master; also pushed a new dev build - can you please try it out?

Thanks,

@randallwhitman
Author

With Spark-1.4:
16/01/26 16:23:47 INFO Version: Elasticsearch Hadoop v2.2.0.BUILD-SNAPSHOT [16f42acc9f]

16/01/26 16:27:37 INFO ScalaEsRowRDD: Reading from [tests1453854216791/tests1453854216791]
16/01/26 16:27:37 INFO ScalaEsRowRDD: Discovered mapping {tests1453854216791=[rect=GEO_SHAPE]} for [tests1453854216791/tests1453854216791]
16/01/26 16:27:37 ERROR Executor: Exception in task 4.0 in stage 3.0 (TID 11)
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'rect.crs' not found in row; typically this is caused by a mapping inconsistency
    at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:40)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:14)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:84)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:791)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:692)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:791)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:692)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:457)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:382)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:277)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:250)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:456)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)

@costin
Member

costin commented Jan 27, 2016

Can you please post your mapping and a sample data set, along with a gist of the logs (TRACE level on the REST and Spark packages, please)?
Something is clearly off. See the integration tests that run nightly.

Also note that the data is expected to have the same format (since that's what Spark SQL expects). If your geo_shapes are of different types, I'm afraid there's not much we can do - not if you want to use DataFrames.
You'll have to resort to RDDs in that case...
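
As a fallback for mixed shape types, a raw-JSON read per document is the likely route; a sketch (using esJsonRDD and the names from the earlier test code):

    // Each document comes back as a raw JSON string; branch on the shape type yourself,
    // since a single DataFrame schema cannot cover mixed geo formats.
    val shapesJson = sc.esJsonRDD(shapeResource, connectorConfig).values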

@randallwhitman
Author

In the geo-shape test, the test data is a single polygon.

    val rawShape = List(
      """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}""")
    val rdd1 = sc.parallelize(rawShape, 1)
    rdd1.saveJsonToEs(shapeResource, connectorConfig)

@costin
Member

costin commented Jan 27, 2016

I realize that, but that info is not very helpful - the log and the mapping, however, are.
Cheers,

@randallwhitman
Author

I won't be able to get to that right away.

costin added a commit that referenced this issue Jan 27, 2016
ES allows extra fields to be specified for geo types.

relates #607
@costin
Member

costin commented Jan 27, 2016

Found out what the issue was - geo types for some reason accept custom fields (like crs in your example); they are ignored, so there's no mapping for them, nor are they expected. When the Spark integration encounters them, it doesn't know what to do with them, hence the error and the exception.

I've pushed a fix for this and published a dev build - can you please try it out? (the usual drill :) ).

@randallwhitman
Author

Right, GeoJSON can contain "crs" and/or "bbox".

With that patch, my test now passes, thanks!

@costin
Member

costin commented Jan 27, 2016

And there was much rejoicing. Let's give this some extra days to see whether it passes all your tests and then I'll close it down.

Cheers,

@randallwhitman
Author

OK. When I println the result, I see:
[[Polygon,ArrayBuffer(ArrayBuffer(ArrayBuffer(50, 32), ArrayBuffer(69, 32), ArrayBuffer(69, 50), ArrayBuffer(50, 50), ArrayBuffer(50, 32))),null]]
Is the order arbitrary, or is it guaranteed that type comes first and coordinates second?

@costin
Member

costin commented Jan 27, 2016

There are no guarantees (we currently control the schema, but that might change). However, it should be irrelevant, as one can access the items by name.
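
For example, accessing the inferred geo struct by field name rather than position might look like this (a sketch assuming Spark 1.4+, where Row.getAs accepts a field name; the exact numeric types depend on the inferred schema):

    import org.apache.spark.sql.Row

    val rect = df.first().getAs[Row]("rect")
    val shapeType = rect.getAs[String]("type")        // e.g. "Polygon"
    val coords = rect.getAs[Seq[_]]("coordinates")    // nested sequence of coordinates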

@costin
Member

costin commented Jan 29, 2016

Closing the issue.
Thanks @randallwhitman for the issue and your patience. Cheers!

@Bomb281993

Hi @costin, via Spark (Java) we are also facing issues when pushing geo-shapes to an Elasticsearch index.
It is giving the error message "Failed to parse".

@jbaiera
Member

jbaiera commented Oct 31, 2018

@Bomb281993 Please refrain from mentioning users on old issues like this. If you are seeing errors with geo-shape indexing, please post those errors and a description of the problem on the forum or in a new issue.
