
Large number of fields in ES index exceeds URL query parameter length #942

Closed
cvjones17 opened this issue Feb 27, 2017 · 3 comments

@cvjones17
What kind of issue is this?

  • [ ] Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • [x] Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

I have an index with many long field names. When using Spark SQL via Spark Thrift Server, if I create a temporary view and then query it, I get an authorization error.

Through trial and error, I found that the query works if I remove fields from the end of the field list. This points to a limit on the size of the URL being sent to ES, hit because the index has many fields with long names.

My question: is there a workaround, assuming that I can't reduce the number of fields or the size of the field names? For example, is there a parameter that can be specified to not use the query string for the underlying query to ES, but to use the query DSL as JSON? Or is there a better way to formulate the query to avoid this problem?

Steps to reproduce

Code:

CREATE GLOBAL TEMPORARY VIEW view1 USING org.elasticsearch.spark.sql OPTIONS (resource 'es-index');
SELECT * from view1 limit 1;

Stack trace:

failed; server[docker-prod.west.usermind.com:8200] returned [401|Unauthorized:]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:488)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:446)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:363)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I observe that the ES query takes the form es-index/_search?sort=doc&scroll=5m&size=50&source=field1,field2,field3,...&preference=shards%3A0%7Clocal

When the length of the query string exceeds roughly 4200 characters, the error occurs.
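To illustrate the failure mode, here is a minimal sketch (the field names are made up, and the ~4200-character threshold is only what I observed, not a documented limit) of how a mapping with many long field names inflates the query string of the scroll request:

```python
# Sketch with made-up field names: the per-field "source" parameter makes
# the scroll-request URL grow linearly with the number and length of fields.
fields = [f"some_rather_long_field_name_{i}" for i in range(150)]

url = ("es-index/_search?sort=doc&scroll=5m&size=50"
       "&source=" + ",".join(fields) +
       "&preference=shards%3A0%7Clocal")

print(len(url))  # well over the ~4200-character threshold observed above
```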

Version Info

OS:           Mac OS X 10.12.3 (16D32)
JVM:          1.8.0_102
Hadoop/Spark: 2.6 / 2.1.0
ES-Hadoop:    5.0.2
ES:           5.0.2

@jbaiera
Member

jbaiera commented Feb 27, 2017

Right now there's no switch to convert the query parameters on the end of the scroll request to query DSL. I think it makes most sense to transition to submitting the search request as an HTTP body using the DSL instead of loading it all into the URL for resiliency purposes. I'll mark this as a bug until that gets resolved.
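For illustration of the shape of that change (a sketch against the standard Elasticsearch search API, with made-up field names — not es-hadoop internals): when the field list travels in a JSON request body instead of the URL, the URL length no longer depends on the mapping at all.

```python
import json

# Made-up field names. Instead of packing them into ...?source=f1,f2,...,
# send them as a "_source" filter in the request body of the search.
fields = [f"some_rather_long_field_name_{i}" for i in range(150)]

url = "es-index/_search?sort=doc&scroll=5m&size=50"  # stays short and fixed
body = json.dumps({"_source": fields})               # fields go in the body

print(len(url))  # 43
```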

@jbaiera
Member

jbaiera commented Feb 27, 2017

I opened #943

@jbaiera
Member

jbaiera commented Dec 12, 2018

This issue was missed when #1154 was merged. It has been fixed as of 6.3.1 and 6.4.0.
