What kind of an issue is this?
[ ] Bug report. If you've found a bug, please provide a code snippet or test to reproduce it below. The easier it is to track down the bug, the faster it is solved.
[x] Feature Request. Start by telling us what problem you're trying to solve. Often a solution already exists! Don't send pull requests to implement new features without first getting our support. Sometimes we leave features out on purpose to keep the project small.
Issue description
I have an index with many long field names. When I query it through Spark SQL via the Spark Thrift Server (create a temporary view, then query it), I get an authorization error.
Through trial and error I found that removing fields from the end of the projection makes the query work. This points to a limit on the size of the URL being sent to ES, hit because of the large number of fields and their long names.
My question: is there a workaround, assuming I can't reduce the number of fields or shorten their names? For example, is there a parameter that makes the connector send the underlying ES query as JSON query DSL instead of packing it into the query string? Or is there a better way to formulate the query to avoid this problem?
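One partial avenue worth noting (a sketch, not a confirmed fix, and view2 is a placeholder name): the connector's documented es.query setting accepts query DSL as a JSON body rather than a URI query. It only controls the query itself, though, so the projected field list may still travel in the URL.

```sql
-- Sketch: pass the query as DSL JSON via es.query instead of a URI query.
-- This does not necessarily move the field projection out of the URL.
CREATE GLOBAL TEMPORARY VIEW view2
USING org.elasticsearch.spark.sql
OPTIONS (resource 'es-index', 'es.query' '{"query": {"match_all": {}}}');
```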
Description
Steps to reproduce
Code:
CREATE GLOBAL TEMPORARY VIEW view1 USING org.elasticsearch.spark.sql OPTIONS (resource 'es-index');
SELECT * FROM view1 LIMIT 1;
Stack trace:
failed; server[docker-prod.west.usermind.com:8200] returned [401|Unauthorized:]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:488)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:446)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:363)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I observe that the ES query is of the form es-index/_search?sort=doc&scroll=5m&size=50&source=field1,field2,field3,...&preference=shards%3A0%7Clocal
When the length of the query string exceeds roughly 4200 characters, the error appears.
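For a rough sense of the limit, the observed request can be modeled as a fixed prefix and suffix plus a comma-separated field list. A short sketch (the 30-character average field name and the exact 4200-character threshold are assumptions) estimates how many fields fit before the URL crosses that mark:

```python
# Back-of-the-envelope estimate of how many field names fit in the
# scroll URL before it crosses the ~4200-character threshold observed
# above. The 30-character average field name is an assumed example.
PREFIX = "es-index/_search?sort=doc&scroll=5m&size=50&source="
SUFFIX = "&preference=shards%3A0%7Clocal"
LIMIT = 4200  # approximate length at which the 401 appears

def max_fields(avg_name_len: int) -> int:
    # n comma-separated names of length L occupy n*L + (n - 1) characters
    budget = LIMIT - len(PREFIX) - len(SUFFIX)
    return (budget + 1) // (avg_name_len + 1)

print(max_fields(30))
```

With 30-character names the budget runs out after roughly 130 fields, which matches the report that dropping fields from the end of the projection makes the query succeed.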
Version Info
OS: macOS 10.12.3 (16D32)
JVM : 1.8.0_102
Hadoop/Spark: 2.6/2.1.0
ES-Hadoop : 5.0.2
ES : 5.0.2
Right now there's no switch to move the query parameters from the end of the scroll request into query DSL. I think the most sensible fix is to submit the search request as an HTTP body using the DSL instead of loading everything into the URL, which is also more resilient. I'll mark this as a bug until that gets resolved.
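For illustration, the same scroll request could be carried in a request body instead of the URL. This is only a sketch of what that might look like (the exact body the connector would emit is an assumption, and field1..field3 are placeholders):

```
POST es-index/_search?scroll=5m&preference=shards%3A0%7Clocal
{
  "size": 50,
  "sort": ["_doc"],
  "_source": ["field1", "field2", "field3"]
}
```

Moving the sort, size, and source filtering into the body leaves only the short scroll and preference parameters in the URL, so the request length no longer grows with the number of fields.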