Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
The easier it is to track down the bug, the faster it is solved.
Issue description
The following exception is thrown when calling saveToEs on a DataFrame if the write operation is upsert, an update script and script params are provided, and one of the DataFrame's columns is a timestamp.
java.lang.ClassCastException: java.sql.Timestamp cannot be cast to scala.Tuple2
Steps to reproduce
import java.sql.Timestamp
import java.util.Calendar
import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrames
import spark.implicits._ // enables toDF(); `spark` is the SparkSession provided by spark-shell
case class TestDoc(id: Int, SomeString: String, Date: java.sql.Timestamp)
val cal = Calendar.getInstance() // current date and time
val ts1 = new Timestamp(cal.getTime.getTime)
val ds = List(TestDoc(1234567, "SECOND", ts1)).toDF()
val script = "if (ctx._source.new_UpdateDate > new_UpdateDate) {ctx.op = 'none'} else {ctx._source.id = new_id;ctx._source.SomeString= new_SomeString;ctx._source.Date = new_Date}"
val params = "new_id:id,new_SomeString:SomeString,new_Date:Date"
ds.saveToEs("ipg4/doc", Map("es.mapping.id" -> "id", "es.write.operation" -> "upsert", "es.update.script" -> script, "es.update.script.params" -> params))
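The params string above maps each script parameter name to a DataFrame column, so new_Date is filled from the Date (timestamp) column. A minimal sketch of that mapping, assuming only the param:field form used in this report; ScriptParams and parse are hypothetical names, not ES-Hadoop's actual parser:

```scala
// Hypothetical helper, NOT ES-Hadoop's parser: splits a
// "param:field,param:field" string into (paramName, fieldName) pairs.
object ScriptParams {
  def parse(spec: String): Seq[(String, String)] =
    spec.split(",").toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { entry =>
        val idx = entry.indexOf(':')
        require(idx > 0, s"malformed entry: $entry")
        // left of ':' is the script param name, right of ':' is the column name
        (entry.substring(0, idx).trim, entry.substring(idx + 1).trim)
      }

  def main(args: Array[String]): Unit = {
    val pairs = parse("new_id:id,new_SomeString:SomeString,new_Date:Date")
    pairs.foreach { case (param, field) => println(s"$param <- $field") }
  }
}
```

For the params in this report it yields new_id <- id, new_SomeString <- SomeString and new_Date <- Date; the last pair is the one that routes a java.sql.Timestamp value into the script parameters.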
Stack trace:
java.lang.ClassCastException: java.sql.Timestamp cannot be cast to scala.Tuple2
at org.elasticsearch.spark.sql.DataFrameValueWriter.write(DataFrameValueWriter.scala:55)
at org.elasticsearch.hadoop.serialization.bulk.AbstractBulkFactory$FieldWriter.doWrite(AbstractBulkFactory.java:130)
at org.elasticsearch.hadoop.serialization.bulk.AbstractBulkFactory$FieldWriter.write(AbstractBulkFactory.java:102)
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.writeTemplate(TemplatedBulk.java:80)
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:56)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:159)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:67)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Version Info
OS: Windows 8
JVM: 1.8.0_111
Hadoop/Spark: 2.1.0
ES-Hadoop: 5.0.0
ES: 2.3.5
It seems that the field extraction code, which pulls values out of the document and into the parameter string for the bulk update line, is not handling the Timestamp object correctly. It looks like it passes the Timestamp to the value writer to be serialized, and that writer, in this case, expects a Tuple2 of Row and StructType. Definitely a bug! Thanks for the easy-to-reproduce steps!
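To illustrate the failure mode described above, here is a minimal sketch using hypothetical stand-ins (CastFailureSketch, writeRowPair and writeParam are not ES-Hadoop classes). A (Seq, Seq) pair plays the role of (Row, StructType): a writer that unconditionally casts its input to a pair works for real pairs, but a raw Timestamp forwarded by the extraction step fails the cast with a ClassCastException, as in the stack trace:

```scala
import java.sql.Timestamp

// Hypothetical stand-ins, NOT ES-Hadoop classes.
object CastFailureSketch {
  // Mimics a writer that assumes every value is a (row, schema) pair.
  def writeRowPair(value: Any): String = {
    val pair = value.asInstanceOf[(Seq[Any], Seq[String])] // checkcast to scala.Tuple2
    pair._2.zip(pair._1).map { case (name, v) => s"$name=$v" }.mkString(",")
  }

  // Mimics field extraction that forwards a raw column value to that writer.
  def writeParam(value: Any): Either[String, String] =
    try Right(writeRowPair(value))
    catch { case e: ClassCastException => Left(e.getClass.getSimpleName) }

  def main(args: Array[String]): Unit = {
    println(writeParam((Seq(1, "a"), Seq("id", "s")))) // Right(id=1,s=a)
    println(writeParam(new Timestamp(0L)))             // Left(ClassCastException)
  }
}
```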
I submitted a PR that adds support for Timestamp and Date, matching the behavior of DataFrameValueWriter, which converts them to milliseconds. I tested it locally and it works, but you will probably want to review it as well :)
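The approach described in the PR comment can be sketched as a normalization step applied to each extracted script-param value before serialization. This is a hypothetical illustration (ScriptParamValues and normalize are made-up names), not the code from the actual PR: java.sql.Timestamp and java.sql.Date become epoch milliseconds via getTime, and every other value passes through unchanged.

```scala
import java.sql.{Date, Timestamp}

// Hypothetical helper, NOT the code from the actual PR.
object ScriptParamValues {
  // Temporal values become epoch milliseconds, matching the millisecond
  // conversion the comment attributes to DataFrameValueWriter.
  def normalize(value: Any): Any = value match {
    case ts: Timestamp => ts.getTime // includes the millisecond part
    case d: Date       => d.getTime
    case other         => other
  }

  def main(args: Array[String]): Unit = {
    println(normalize(new Timestamp(1500000000123L))) // 1500000000123
    println(normalize(new Date(86400000L)))           // 86400000
    println(normalize("SECOND"))                      // SECOND
  }
}
```

Matching on the concrete java.sql types keeps the change local to the parameter path, so document bodies still go through the existing writer unchanged.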