SELECT * FROM tabletest WHERE col1 IN (0,10,5,27) #615

Closed
hdominguez-stratio opened this Issue Nov 25, 2015 · 6 comments

@hdominguez-stratio

hdominguez-stratio commented Nov 25, 2015

When I execute a query with an IN operator on a column of type LONG, the datasource throws the following exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.util.TaskCompletionListenerException: SearchPhaseExecutionException[Failed to execute phase [init_scan], all shards failed; shardFailures {[W61yWz-NRv6PzH1kPzisiA][databasetest][0]: SearchParseException[[databasetest][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"filtered":{ "query":{"match_all":{}},"filter": { "and" : [ {"query":{"match":{"ident":"0 10 5 27"}}} ] } }}}]]]; nested: NumberFormatException[For input string: "0 10 5 27"]; }]
        at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:90)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:909)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:908)
        at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:177)
        at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
        at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
        at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1903)
        at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1384)

The column is of type LONG.

The executed test is the following (Cucumber format):

  Scenario: SELECT * FROM tabletest WHERE ident IN (0,10,5,27);
    When I execute 'SELECT * FROM tabletest WHERE ident IN (0,10,5,27)'
    Then The result has to have '2' rows ignoring the order:
      | ident-long | name-string   | money-double  |  new-boolean  | date-date  |
      |    0       | name_0        | 10.2          |  true         | 1999-11-30 |
      |    5       | name_5        | 15.2          |  true         | 2005-05-05 |
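
For context, a minimal reproduction sketch of the failing call (assumptions: a local Elasticsearch node, Spark 1.x, and the es-hadoop connector on the classpath; index and table names taken from the test above):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val conf = new SparkConf()
      .setAppName("in-filter-repro")
      .setMaster("local[*]")
      .set("es.nodes", "localhost") // assumption: Elasticsearch running locally
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Load the index/type as a DataFrame and register it for SQL.
    val df = sqlContext.read
      .format("org.elasticsearch.spark.sql")
      .load("databasetest/tabletest")
    df.registerTempTable("tabletest")

    // The IN filter is pushed down to Elasticsearch; it arrives as the
    // match query {"match":{"ident":"0 10 5 27"}} shown in the exception
    // above and fails with a NumberFormatException on the LONG field.
    sqlContext.sql("SELECT * FROM tabletest WHERE ident IN (0,10,5,27)").collect()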
@hdominguez-stratio

hdominguez-stratio commented Nov 25, 2015

When I try the same query with a column of type DOUBLE or DATE, a similar exception is thrown.

@costin

Member

costin commented Nov 25, 2015

What version of es-hadoop are you using?

@hdominguez-stratio

hdominguez-stratio commented Nov 26, 2015

I'm using es-hadoop 2.1.1 with Elasticsearch 1.7.3.

@costin

Member

costin commented Nov 26, 2015

Please use 2.1.2 as it likely fixed your issue.

@hdominguez-stratio

hdominguez-stratio commented Nov 26, 2015

Thanks, it is fixed in version 2.1.2, but it does not work with the DATE type.

ES MAPPING:

{"databasetest":{"mappings":{"tabletest":{"properties":{"date":{"type":"date","format":"dateOptionalTime"},"ident":{"type":"long"},"money":{"type":"double"},"name":{"type":"string"},"new":{"type":"boolean"}}}}}}

CUCUMBER TEST:

 Scenario: [ES] SELECT date FROM tabletest WHERE date IN ('1999-11-30','1998-12-25','2005-05-05','2008-2-27');
    When I execute 'SELECT date FROM tabletest WHERE date IN ('1999-11-30','1998-12-25','2005-05-05','2008-2-27')'
    Then The result has to have '2' rows ignoring the order:
       | date-date  |
       | 1999-11-30 |
       | 2005-05-05 |
EXCEPTION:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 26.0 failed 1 times, most recent failure: Lost task 0.0 in stage 26.0 (TID 85, localhost): org.apache.spark.util.TaskCompletionListenerException: SearchPhaseExecutionException[Failed to execute phase [init_scan], all shards failed; shardFailures {[OPZ3P8qmTFSGSYiqg_Z5VA][databasetest][0]: SearchParseException[[databasetest][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"filtered":{ "query":{"match_all":{}},"filter": { "and" : [ {"or":{"filters":[{"query":{"match":{"date":""1999-11-30T00:00:00+01:00" "1998-12-25T00:00:00+01:00" "2005-05-05T00:00:00+02:00" "2008-02-27T00:00:00+01:00""}}}]}} ] } }}}]]]; nested: QueryParsingException[[databasetest] Failed to parse]; nested: JsonParseException[Unexpected character ('1' (code 49)): was expecting comma to separate OBJECT entries
 at [Source: [B@3a91aa51; line: 1, column: 118]]; }]
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207)
    at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
    at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1903)
    at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1384)

I do not know whether this could be a Spark issue.
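
For reference, a hypothetical reconstruction of how the connector could end up with the failing match value for dates (plain Scala sketch; the assumption is that date values are serialized with their own JSON quotes before being joined, so wrapping them again yields invalid JSON):

    // Each date value already carries its own quotes:
    val dates = Seq("\"1999-11-30T00:00:00+01:00\"", "\"1998-12-25T00:00:00+01:00\"")
    val matchValue = dates.mkString("\"", " ", "\"")
    // => ""1999-11-30T00:00:00+01:00" "1998-12-25T00:00:00+01:00""
    // Embedded in {"match":{"date": ... }} this is not well-formed JSON,
    // which matches the JsonParseException ("was expecting comma to
    // separate OBJECT entries") in the stack trace above.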

@ebuildy

Contributor

ebuildy commented Dec 28, 2015

Maybe a comma is missing here:

https://github.com/elastic/elasticsearch-hadoop/blob/master/spark/sql-13/src/main/scala/org/elasticsearch/spark/sql/DefaultSource.scala#L261

strings.mkString("\"", " ", "\"") ==> strings.mkString("\"", ",", "\"")

?
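
As a quick illustration of what each separator produces (plain Scala REPL sketch; strings stands in for the stringified IN values):

    val strings = Seq("0", "10", "5", "27")
    strings.mkString("\"", " ", "\"")   // "0 10 5 27"  — the value in the first exception
    strings.mkString("\"", ",", "\"")   // "0,10,5,27"  — still a single match string

Either way a match query receives one flattened string; the commit below sidesteps this for dates by emitting a terms query with one entry per value.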

costin added a commit that referenced this issue Dec 29, 2015

[SPARK] Use terms query for date types
When using Date types with Spark IN filter, apply a terms query
 instead of match

relates #615
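
A sketch of the query shape this commit moves to (hypothetical helper for illustration only; the actual change lives in DefaultSource.scala):

    // Builds {"terms":{"<field>":["v1","v2",...]}} — one array entry per IN
    // value, so date strings are no longer flattened into one match string.
    def termsQuery(field: String, values: Seq[Any]): String = {
      val quoted = values.map(v => "\"" + v + "\"")
      s"""{"terms":{"$field":[${quoted.mkString(",")}]}}"""
    }

    termsQuery("date", Seq("1999-11-30T00:00:00+01:00", "2005-05-05T00:00:00+02:00"))
    // => {"terms":{"date":["1999-11-30T00:00:00+01:00","2005-05-05T00:00:00+02:00"]}}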

@costin costin closed this Jan 8, 2016

costin added a commit that referenced this issue Jan 16, 2016

[SPARK] Use terms query for date types
When using Date types with Spark IN filter, apply a terms query
 instead of match

relates #615

(cherry picked from commit 6b212e7)