
Large number of fields in ES index exceeds URL query parameter length #942

Closed
cvjones17 opened this issue Feb 27, 2017 · 3 comments

@cvjones17
What kind of issue is this?

  • [ ] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • [x] Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

I have an index with many long field names. When using Spark SQL via Spark Thrift Server, if I create a temporary view and then query it, I get an authorization error.

Through trial and error, I found that the query works if I remove fields from the end of the field list. This points to a limit on the size of the URL being sent to ES, hit because the index has many fields with long names.

My question: is there a workaround, assuming that I can't reduce the number of fields or the size of the field names? For example, is there a parameter that can be specified to not use the query string for the underlying query to ES, but to use the query DSL as JSON? Or is there a better way to formulate the query to avoid this problem?

Steps to reproduce

Code:

CREATE GLOBAL TEMPORARY VIEW view1 USING org.elasticsearch.spark.sql OPTIONS (resource 'es-index');
SELECT * from view1 limit 1;

Stack trace:

failed; server[docker-prod.west.usermind.com:8200] returned [401|Unauthorized:]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:488)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:446)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:363)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I observe that the ES query takes the form es-index/_search?sort=doc&scroll=5m&size=50&source=field1,field2,field3,...&preference=shards%3A0%7Clocal

When the length of the query string exceeds roughly 4200 characters, the error occurs.
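To illustrate the failure mode, here is a minimal sketch (the field names are made up, and the ~4200-character threshold is only what I observed, not a documented limit) of how a mapping with many long field names inflates the query string of the scroll request:

```python
# Sketch with made-up field names: the per-field "source" parameter makes
# the scroll-request URL grow linearly with the number and length of fields.
fields = [f"some_rather_long_field_name_{i}" for i in range(150)]

url = ("es-index/_search?sort=doc&scroll=5m&size=50"
       "&source=" + ",".join(fields) +
       "&preference=shards%3A0%7Clocal")

print(len(url))  # well over the ~4200-character threshold observed above
```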

Version Info

OS:           Mac OS X 10.12.3 (16D32)
JVM:          1.8.0_102
Hadoop/Spark: 2.6 / 2.1.0
ES-Hadoop:    5.0.2
ES:           5.0.2

@jbaiera
Member

jbaiera commented Feb 27, 2017

Right now there's no switch to convert the query parameters on the end of the scroll request to query DSL. I think it makes most sense to transition to submitting the search request as an HTTP body using the DSL instead of loading it all into the URL for resiliency purposes. I'll mark this as a bug until that gets resolved.
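For illustration of the shape of that change (a sketch against the standard Elasticsearch search API, with made-up field names — not es-hadoop internals): when the field list travels in a JSON request body instead of the URL, the URL length no longer depends on the mapping at all.

```python
import json

# Made-up field names. Instead of packing them into ...?source=f1,f2,...,
# send them as a "_source" filter in the request body of the search.
fields = [f"some_rather_long_field_name_{i}" for i in range(150)]

url = "es-index/_search?sort=doc&scroll=5m&size=50"  # stays short and fixed
body = json.dumps({"_source": fields})               # fields go in the body

print(len(url))  # 43
```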

@jbaiera
Member

jbaiera commented Feb 27, 2017

I opened #943

@jbaiera
Member

jbaiera commented Dec 12, 2018

This issue was missed when #1154 was merged. It has been fixed as of 6.3.1 and 6.4.0.
