ISO8601 dates with timezone offset considered invalid #458

@bobrik

I'm using elasticsearch-hadoop 2.1.0Beta4 and Spark 1.3.1:

JavaEsSparkSQL.esDF().saveAsParquetFile()

Fails with error:

15/05/26 14:55:49 ERROR Executor: Exception in task 4.0 in stage 0.0 (TID 4)
java.lang.IllegalArgumentException: 2015-05-25T22:30:00+0300
    at com.sun.org.apache.xerces.internal.jaxp.datatype.XMLGregorianCalendarImpl$Parser.skip(XMLGregorianCalendarImpl.java:2844)
    at com.sun.org.apache.xerces.internal.jaxp.datatype.XMLGregorianCalendarImpl$Parser.parse(XMLGregorianCalendarImpl.java:2808)
    at com.sun.org.apache.xerces.internal.jaxp.datatype.XMLGregorianCalendarImpl.<init>(XMLGregorianCalendarImpl.java:435)
    at com.sun.org.apache.xerces.internal.jaxp.datatype.DatatypeFactoryImpl.newXMLGregorianCalendar(DatatypeFactoryImpl.java:536)
    at javax.xml.bind.DatatypeConverterImpl._parseDateTime(DatatypeConverterImpl.java:422)
    at javax.xml.bind.DatatypeConverterImpl.parseDateTime(DatatypeConverterImpl.java:417)
    at javax.xml.bind.DatatypeConverter.parseDateTime(DatatypeConverter.java:327)
    at org.elasticsearch.spark.serialization.ScalaValueReader.parseDate(ScalaValueReader.scala:113)
    at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:106)
    at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:106)
    at org.elasticsearch.spark.serialization.ScalaValueReader.checkNull(ScalaValueReader.scala:58)
    at org.elasticsearch.spark.serialization.ScalaValueReader.date(ScalaValueReader.scala:106)
    at org.elasticsearch.spark.serialization.ScalaValueReader.readValue(ScalaValueReader.scala:46)
    at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:540)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:528)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:596)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:519)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:339)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:290)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:185)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:164)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:403)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:669)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
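
The trace ends in `DatatypeFactoryImpl.newXMLGregorianCalendar`, which expects the XML Schema `dateTime` form, where the offset is written with a colon (`+03:00`). The value in the exception uses the ISO 8601 basic offset form (`+0300`), which is likely why it is rejected. Below is a minimal sketch that reproduces the rejection with the JDK alone and shows one possible normalization. The class `OffsetColonDemo` and its helpers are hypothetical names for illustration and are not part of elasticsearch-hadoop.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

final class OffsetColonDemo {

    // Hypothetical helper: rewrites a trailing "+hhmm" / "-hhmm" offset that
    // follows the time part into "+hh:mm" / "-hh:mm". Other values are returned unchanged.
    static String normalizeOffset(String value) {
        return value.replaceFirst("(T.*[+-]\\d{2})(\\d{2})$", "$1:$2");
    }

    // Parses with the same JDK factory method that appears in the stack trace.
    // Throws IllegalArgumentException when the lexical form is not accepted.
    static XMLGregorianCalendar parseXsdDateTime(String value) {
        try {
            return DatatypeFactory.newInstance().newXMLGregorianCalendar(value);
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("No DatatypeFactory available", e);
        }
    }

    // Reports whether the JDK parser accepts the value.
    static boolean isValidXsdDateTime(String value) {
        try {
            parseXsdDateTime(value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "2015-05-25T22:30:00+0300";   // value from the exception message
        String fixed = normalizeOffset(raw);
        System.out.println(raw + " accepted: " + isValidXsdDateTime(raw));
        System.out.println(fixed + " accepted: " + isValidXsdDateTime(fixed));
    }
}
```

This only illustrates the format mismatch. Whether the connector should normalize such values or use a more lenient ISO 8601 parser is a separate design question.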
