# 05. Date and time in Spark data frames

Date and time columns are essential for time series data. In Spark, we can express date and time with **four different data types**:
- String: the most common format, with no compatibility issues. However, strings must be parsed to long or timestamp types before we can do arithmetic operations on them.
- Long: in Spark, this is the **unix_timestamp** representation: the number of seconds since the UNIX epoch (1970-01-01 00:00:00 UTC).
- timestamp: Spark's own timestamp type. It represents values comprising the fields year, month, day, hour, minute, and second, with the **session local time zone**. A timestamp value represents an absolute point in time.
- date: represents values comprising the fields year, month and day, without a time zone.
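To make the four representations concrete, here is a minimal sketch in plain Python (standard library only, no Spark) that expresses one instant from the sample data as a string, a long, a timestamp and a date. It assumes a UTC session time zone, which is consistent with the epoch values printed later in this notebook; the variable names are illustrative.

```python
from datetime import datetime, timezone

# Assumption: UTC session time zone (consistent with the epoch values printed in this notebook)
as_string = "2019-07-01 12:01:19"                       # String
as_timestamp = datetime.strptime(as_string, "%Y-%m-%d %H:%M:%S").replace(
    tzinfo=timezone.utc
)                                                       # timestamp: an absolute point in time
as_long = int(as_timestamp.timestamp())                 # Long: seconds since the UNIX epoch
as_date = as_timestamp.date()                           # date: year, month and day only

# Arithmetic is straightforward on the long form, e.g. the gap to the 2018 sample in days
earlier_long = int(datetime(2018, 7, 1, 12, 1, 19, tzinfo=timezone.utc).timestamp())
gap_days = (as_long - earlier_long) // 86400

print(as_string, as_long, as_timestamp.isoformat(), as_date, gap_days)
```

The long value matches the one Spark prints for timestamp_1 in section 5.1, and the 365-day gap is simple integer arithmetic once both values are longs.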

Below, we will show how to convert between these types.

In [1]:
import os  # needed for the Kubernetes configuration below

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import lit, unix_timestamp, col, from_unixtime, to_timestamp, date_format, to_date

In [2]:
local = True

if local:
    spark=SparkSession.builder.master("local[4]").appName("DateAndTime").getOrCreate()
else:
    spark = SparkSession \
    .builder.master("k8s://https://kubernetes.default.svc:443") \
    .appName("DateAndTime") \
    .config("spark.kubernetes.container.image", "inseefrlab/jupyter-datascience:master") \
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", os.environ['KUBERNETES_SERVICE_ACCOUNT']) \
    .config("spark.executor.instances", "4") \
    .config("spark.executor.memory","8g") \
    .config("spark.kubernetes.namespace", os.environ['KUBERNETES_NAMESPACE']) \
    .getOrCreate()

data = [("2019-07-01 12:01:19", "07-01-2019 12:01:19.888", "07-01-2019"),
            ("2018-07-01 12:01:19", "07-01-2018 12:01:19.666", "07-01-2018"),
            ("2017-07-01 12:01:19", "07-01-2017 12:01:19.111", "07-01-2017")]
columns = ["timestamp_1", "timestamp_2", "timestamp_3"]

In [5]:
df=spark.createDataFrame(data=data,schema=columns)
df.show()
df.printSchema()

+-------------------+--------------------+-----------+
|        timestamp_1|         timestamp_2|timestamp_3|
+-------------------+--------------------+-----------+
|2019-07-01 12:01:19|07-01-2019 12:01:...| 07-01-2019|
|2018-07-01 12:01:19|07-01-2018 12:01:...| 07-01-2018|
|2017-07-01 12:01:19|07-01-2017 12:01:...| 07-01-2017|
+-------------------+--------------------+-----------+

root
 |-- timestamp_1: string (nullable = true)
 |-- timestamp_2: string (nullable = true)
 |-- timestamp_3: string (nullable = true)



## 5.1 Convert string date, timestamp to long (seconds since the UNIX epoch)

To convert a string to long, we can use the unix_timestamp() function. It takes a string date and a date format; if the format is not provided, the default format (yyyy-MM-dd HH:mm:ss) is used. The string date is then converted to a Unix timestamp (in seconds) using the session time zone, which defaults to the system time zone of the JVM.

Note that you can set the time zone of your Spark session with **spark.conf.set("spark.sql.session.timeZone", "<time_zone>")**, for example spark.conf.set("spark.sql.session.timeZone", "US/Pacific").

1) def unix_timestamp(): returns the current time in seconds (LongType).
2) def unix_timestamp(date: Column): takes a string date column as input and returns it in seconds, parsing the string with the default date format (yyyy-MM-dd HH:mm:ss).
3) def unix_timestamp(date: Column, format: String): lets us specify the format of the date string explicitly when the default format does not fit.

These are the Scala signatures; in PySpark the three forms are covered by a single function, unix_timestamp(timestamp=None, format='yyyy-MM-dd HH:mm:ss').
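The three forms above can be mimicked in plain Python to see which inputs parse and which become null. This is a hypothetical helper (unix_timestamp_model is not a Spark API), assuming a UTC session time zone and using strftime codes in place of Spark's patterns (%Y-%m-%d %H:%M:%S for yyyy-MM-dd HH:mm:ss, %f for SSS).

```python
import time
from datetime import datetime, timezone
from typing import Optional

# strftime spelling of Spark's default pattern yyyy-MM-dd HH:mm:ss
DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"


def unix_timestamp_model(value: Optional[str] = None, fmt: str = DEFAULT_FMT) -> Optional[int]:
    """Hypothetical stand-in for Spark's unix_timestamp(), assuming a UTC session time zone."""
    if value is None:
        # Form 1: no argument, return the current time in seconds
        return int(time.time())
    try:
        # Forms 2 and 3: parse with the default or the explicit format
        parsed = datetime.strptime(value, fmt)
    except ValueError:
        # Spark yields null here; Spark 3.x may raise SparkUpgradeException instead
        # when only its pre-3.0 (legacy) parser would have accepted the string
        return None
    # Interpret the wall-clock value in UTC and truncate to whole seconds
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(unix_timestamp_model("2019-07-01 12:01:19"))                               # default format fits
print(unix_timestamp_model("07-01-2019"))                                        # default format does not fit
print(unix_timestamp_model("07-01-2019", "%m-%d-%Y"))                            # MM-dd-yyyy
print(unix_timestamp_model("07-01-2019 12:01:19.888", "%m-%d-%Y %H:%M:%S.%f"))  # MM-dd-yyyy HH:mm:ss.SSS
```

The printed values mirror the first row of the Spark outputs in this section: a parsed timestamp_1, a null timestamp_3 without a format, and the milliseconds of timestamp_2 dropped.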

In [7]:
# Add a new column filled with the current time in seconds; note the column type (long).
# unix_timestamp() is evaluated once per query, so every row gets the same value.
df1 = df.withColumn("timestamp_4", lit(unix_timestamp()))
df1.printSchema()
df1.show()


root
 |-- timestamp_1: string (nullable = true)
 |-- timestamp_2: string (nullable = true)
 |-- timestamp_3: string (nullable = true)
 |-- timestamp_4: long (nullable = true)

+-------------------+--------------------+-----------+-----------+
|        timestamp_1|         timestamp_2|timestamp_3|timestamp_4|
+-------------------+--------------------+-----------+-----------+
|2019-07-01 12:01:19|07-01-2019 12:01:...| 07-01-2019| 1632594547|
|2018-07-01 12:01:19|07-01-2018 12:01:...| 07-01-2018| 1632594547|
|2017-07-01 12:01:19|07-01-2017 12:01:...| 07-01-2017| 1632594547|
+-------------------+--------------------+-----------+-----------+



In [8]:
# Note that timestamp_1 and timestamp_2 are converted correctly, but timestamp_3 is not.
# We give no format for timestamp_1 and timestamp_3. timestamp_1 works because it already uses the default format yyyy-MM-dd HH:mm:ss.
# timestamp_3 returns null because it does not match the default date format.
# For timestamp_2 we specify a date format; without it, the result would also be null, because the string
# does not match the default date format.
# If we give a format that only partially matches, the Spark job fails instead (with the default
# spark.sql.legacy.timeParserPolicy of Spark 3.x). For example, try removing SSS from the timestamp_2 format and see what happens.

df2 = df.withColumn("timestamp_1", unix_timestamp("timestamp_1")) \
        .withColumn("timestamp_2", unix_timestamp("timestamp_2", "MM-dd-yyyy HH:mm:ss.SSS")) \
        .withColumn("timestamp_3", unix_timestamp("timestamp_3"))
print("Exp1: convert date string to seconds with some errors")
df2.printSchema()
df2.show()

Exp1: convert date string to seconds with some errors
root
 |-- timestamp_1: long (nullable = true)
 |-- timestamp_2: long (nullable = true)
 |-- timestamp_3: long (nullable = true)

+-----------+-----------+-----------+
|timestamp_1|timestamp_2|timestamp_3|
+-----------+-----------+-----------+
| 1561982479| 1561982479|       null|
| 1530446479| 1530446479|       null|
| 1498910479| 1498910479|       null|
+-----------+-----------+-----------+



In [10]:
# Below, we give each column a matching date format.
df3 = df.withColumn("timestamp_1", unix_timestamp("timestamp_1")) \
        .withColumn("timestamp_2", unix_timestamp("timestamp_2", "MM-dd-yyyy HH:mm:ss.SSS")) \
        .withColumn("timestamp_3", unix_timestamp("timestamp_3", "MM-dd-yyyy")) \
        .withColumn("timestamp_4", lit(unix_timestamp()))
print("Exp2: convert date string to seconds successfully")
df3.printSchema()
df3.show()

Exp2: convert date string to seconds successfully
root
 |-- timestamp_1: long (nullable = true)
 |-- timestamp_2: long (nullable = true)
 |-- timestamp_3: long (nullable = true)
 |-- timestamp_4: long (nullable = true)

+-----------+-----------+-----------+-----------+
|timestamp_1|timestamp_2|timestamp_3|timestamp_4|
+-----------+-----------+-----------+-----------+
| 1561982479| 1561982479| 1561939200| 1632594849|
| 1530446479| 1530446479| 1530403200| 1632594849|
| 1498910479| 1498910479| 1498867200| 1632594849|
+-----------+-----------+-----------+-----------+



## 5.2 Convert long (seconds since the UNIX epoch) back to a string date

We can use from_unixtime(unix_time: Column, format: String) to convert a long back to a string. If the format is omitted, the default format yyyy-MM-dd HH:mm:ss is used.
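As a rough model of from_unixtime(), the sketch below formats epoch seconds in plain Python, assuming a UTC session time zone. from_unixtime_model is a hypothetical name, and strftime codes stand in for Spark's patterns.

```python
from datetime import datetime, timezone


def from_unixtime_model(secs: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Hypothetical stand-in for Spark's from_unixtime(), assuming a UTC session time zone."""
    return datetime.fromtimestamp(secs, tz=timezone.utc).strftime(fmt)


print(from_unixtime_model(1561982479))                          # default yyyy-MM-dd HH:mm:ss
print(from_unixtime_model(1561982479, "%m-%d-%Y %H:%M:%S"))     # MM-dd-yyyy HH:mm:ss
print(from_unixtime_model(1561939200, "%m-%d-%Y"))              # MM-dd-yyyy
```

Because a long column only holds whole seconds, the .888 milliseconds of timestamp_2 cannot come back; keep a timestamp column (section 5.3) if sub-second precision matters.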


In [13]:
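# Convert unix timestamps in seconds back to string dates.
# The milliseconds of timestamp_2 are gone: unix_timestamp() kept only whole seconds.
# timestamp_4 differs from the value shown for df3 because DataFrames are lazy:
# unix_timestamp() is evaluated again when df4 is computed.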
df4 = df3.select(
        from_unixtime(col("timestamp_1")).alias("timestamp_1"),
        from_unixtime(col("timestamp_2"), "MM-dd-yyyy HH:mm:ss").alias("timestamp_2"),
        from_unixtime(col("timestamp_3"), "MM-dd-yyyy").alias("timestamp_3"),
        from_unixtime(col("timestamp_4")).alias("timestamp_4")
    )
df4.printSchema()
df4.show(truncate=False)

root
 |-- timestamp_1: string (nullable = true)
 |-- timestamp_2: string (nullable = true)
 |-- timestamp_3: string (nullable = true)
 |-- timestamp_4: string (nullable = true)

+-------------------+-------------------+-----------+-------------------+
|timestamp_1        |timestamp_2        |timestamp_3|timestamp_4        |
+-------------------+-------------------+-----------+-------------------+
|2019-07-01 12:01:19|07-01-2019 12:01:19|07-01-2019 |2021-09-25 18:39:03|
|2018-07-01 12:01:19|07-01-2018 12:01:19|07-01-2018 |2021-09-25 18:39:03|
|2017-07-01 12:01:19|07-01-2017 12:01:19|07-01-2017 |2021-09-25 18:39:03|
+-------------------+-------------------+-----------+-------------------+



## 5.3 Convert string to spark timestamp

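Spark provides its own timestamp type. Instead of a long, we can also use a Spark timestamp to store time data.

To convert a string date to a Spark timestamp, we can use:
- to_timestamp(col, format): parses the string with the given format and returns null if the string does not match (a partially matching format can instead make the job fail, as noted in section 5.1). If the format is omitted, it follows the casting rules to timestamp (equivalent to col.cast("timestamp")), which accept strings such as yyyy-MM-dd HH:mm:ss.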


In [15]:
# to_timestamp() works like unix_timestamp(). timestamp_1 already has the default format, so no format is needed.
# For timestamp_2 and timestamp_3, we must specify the date format; otherwise the result is null.

# Note the output column type.

df5 = df.withColumn("timestamp_1", to_timestamp("timestamp_1")) \
        .withColumn("timestamp_2", to_timestamp("timestamp_2", "MM-dd-yyyy HH:mm:ss.SSS")) \
        .withColumn("timestamp_3", to_timestamp("timestamp_3", "MM-dd-yyyy"))

df5.printSchema()
df5.show(truncate=False)

root
 |-- timestamp_1: timestamp (nullable = true)
 |-- timestamp_2: timestamp (nullable = true)
 |-- timestamp_3: timestamp (nullable = true)

+-------------------+-----------------------+-------------------+
|timestamp_1        |timestamp_2            |timestamp_3        |
+-------------------+-----------------------+-------------------+
|2019-07-01 12:01:19|2019-07-01 12:01:19.888|2019-07-01 00:00:00|
|2018-07-01 12:01:19|2018-07-01 12:01:19.666|2018-07-01 00:00:00|
|2017-07-01 12:01:19|2017-07-01 12:01:19.111|2017-07-01 00:00:00|
+-------------------+-----------------------+-------------------+



In [20]:
# We can also use to_timestamp in a SQL query (call .show() on the result to display the value)
spark.sql("select to_timestamp('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSS') as timestamp")

DataFrame[timestamp: timestamp]

## 5.4 Convert a Spark timestamp back to string

To convert a timestamp back to a string, we have two options:
- cast("string"): converts a timestamp column to the default format yyyy-MM-dd HH:mm:ss (fractional seconds are kept when they are non-zero).
- date_format(column, format): converts a timestamp column to any pattern supported by Spark's datetime patterns, which are based on Java's DateTimeFormatter.
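For readers coming from Python, the pattern strings used with date_format() below roughly correspond to these strftime codes. This is an approximate mapping for the patterns in this notebook only (Spark's own reference is its datetime pattern documentation), and it assumes English month and day names, which Python produces in its default C locale.

```python
from datetime import datetime

# Approximate strftime equivalents of the Spark patterns used in this notebook
PATTERN_EQUIVALENTS = {
    "yyyy MM dd": "%Y %m %d",
    "MM/dd/yyyy hh:mm": "%m/%d/%Y %I:%M",  # hh = clock hour 01-12, no AM/PM marker
    "yyyy MMM dd": "%Y %b %d",             # MMM = short month name
    "yyyy MMMM dd E": "%Y %B %d %a",       # MMMM = full month name, E = short day name
}

ts = datetime(2019, 7, 1, 12, 1, 19, 888000)
for spark_pattern, strftime_pattern in PATTERN_EQUIVALENTS.items():
    print(f"{spark_pattern!r:22} -> {ts.strftime(strftime_pattern)}")
```

The results line up with the first row of the date_format() output in section 5.4.2.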

### 5.4.1 Use cast("String")

In [18]:
# Convert Spark timestamps back to strings by using cast().
# Note that they all use the default format yyyy-MM-dd HH:mm:ss; fractional seconds appear only when non-zero.
df6 = df5.select(col("timestamp_1").cast("string"),
                 col("timestamp_2").cast("string"),
                 col("timestamp_3").cast("string"))
df6.printSchema()
df6.show(truncate=False)

root
 |-- timestamp_1: string (nullable = true)
 |-- timestamp_2: string (nullable = true)
 |-- timestamp_3: string (nullable = true)

+-------------------+-----------------------+-------------------+
|timestamp_1        |timestamp_2            |timestamp_3        |
+-------------------+-----------------------+-------------------+
|2019-07-01 12:01:19|2019-07-01 12:01:19.888|2019-07-01 00:00:00|
|2018-07-01 12:01:19|2018-07-01 12:01:19.666|2018-07-01 00:00:00|
|2017-07-01 12:01:19|2017-07-01 12:01:19.111|2017-07-01 00:00:00|
+-------------------+-----------------------+-------------------+



### 5.4.2 Use date_format()

In [19]:
# Use date_format() to convert a Spark timestamp to various string formats.
# Note: hh is the 12-hour clock (01-12); use HH for the 24-hour clock.
df7 = df5.withColumn("str_date_yyyy_MM_dd", date_format(col("timestamp_2"), "yyyy MM dd")) \
        .withColumn("str_date_MM/dd/yyyy_hh:mm", date_format(col("timestamp_2"), "MM/dd/yyyy hh:mm")) \
        .withColumn("str_date_yyyy_MMM_dd", date_format(col("timestamp_2"), "yyyy MMM dd")) \
        .withColumn("str_date_yyyy_MMMM_dd_E", date_format(col("timestamp_2"), "yyyy MMMM dd E"))
    
df7.printSchema()
df7.show(truncate=False)

root
 |-- timestamp_1: timestamp (nullable = true)
 |-- timestamp_2: timestamp (nullable = true)
 |-- timestamp_3: timestamp (nullable = true)
 |-- str_date_yyyy_MM_dd: string (nullable = true)
 |-- str_date_MM/dd/yyyy_hh:mm: string (nullable = true)
 |-- str_date_yyyy_MMM_dd: string (nullable = true)
 |-- str_date_yyyy_MMMM_dd_E: string (nullable = true)

+-------------------+-----------------------+-------------------+-------------------+-------------------------+--------------------+-----------------------+
|timestamp_1        |timestamp_2            |timestamp_3        |str_date_yyyy_MM_dd|str_date_MM/dd/yyyy_hh:mm|str_date_yyyy_MMM_dd|str_date_yyyy_MMMM_dd_E|
+-------------------+-----------------------+-------------------+-------------------+-------------------------+--------------------+-----------------------+
|2019-07-01 12:01:19|2019-07-01 12:01:19.888|2019-07-01 00:00:00|2019 07 01         |07/01/2019 12:01         |2019 Jul 01         |2019 July 01 Mon       |
|2018-07-01 1

## 5.5 Use to_date() to convert a string to date type

In [24]:
data1=[["02-03-2013"], 
       ["05-06-2023"]]
columns1=["input"]
df8 = spark.createDataFrame(data=data1,schema=columns1 )
df9 = df8.withColumn("date", to_date("input", "MM-dd-yyyy"))

df9.printSchema()
df9.show()

root
 |-- input: string (nullable = true)
 |-- date: date (nullable = true)



Py4JJavaError: An error occurred while calling o264.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 37.0 failed 1 times, most recent failure: Lost task 0.0 in stage 37.0 (TID 73) (jupyter.jupyter-831257.user-pengfei.svc.cluster.local executor driver): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '02-03-2013 12:01:19' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
Caused by: java.time.format.DateTimeParseException: Text '02-03-2013 12:01:19' could not be parsed, unparsed text found at index 10
