#### **to_date()**

**How to convert a string to a date?**

✅ The **to_date()** function converts a **date string / timestamp string column** into a **DateType column** using a **specified format**.

**Syntax:**

     to_date(column,format)
     to_date(col("string_column"),"MM-dd-yyyy") 

#### **to_date() Vs date_format()**

**1) to_date()**

✅ to_date() converts a **string (StringType)** column to a **date (DateType)** column.

✅ If the **format is not provided**, to_date() falls back to the **default pattern 'yyyy-MM-dd'** (equivalent to casting the column to date).

✅ The first argument can be a string in **any date layout**, as long as the second argument describes that layout.

✅ If the input column values **do not match** the specified format (second argument), to_date() populates the new column with **null**.

✅ Extracts only the **date** portion **(removes the time part if present)**.

**2) date_format()**

✅ date_format() converts a **date (DateType / TimestampType / StringType)** column to a **string (StringType)** in the specified format.

✅ If the format is **not provided**, date_format() raises a **TypeError**, because the format argument is required.

✅ When the first argument is a string, date_format() requires it to be in the default **'yyyy-MM-dd'** format; otherwise it populates the **new column with null**.

✅ The second argument describes the **output** format, not the input: the input values do **not** need to match it, and date_format() renders them in the specified **format**.

In [0]:
%fs ls /FileStore/tables 

path,name,size,modificationTime
dbfs:/FileStore/tables/Flatten Nested Array.json,Flatten Nested Array.json,3756,1718618620000
dbfs:/FileStore/tables/MarketPrice-1.csv,MarketPrice-1.csv,19528,1719656512000
dbfs:/FileStore/tables/MarketPrice.csv,MarketPrice.csv,19528,1719656208000
dbfs:/FileStore/tables/MultiLineJSON.json/,MultiLineJSON.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON01.json/,MultiLineJSON01.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON1.json/,MultiLineJSON1.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON123.json/,MultiLineJSON123.json/,0,0
dbfs:/FileStore/tables/MultiLineJSON2.json/,MultiLineJSON2.json/,0,0
dbfs:/FileStore/tables/Question7.csv,Question7.csv,154,1725816645000
dbfs:/FileStore/tables/RunningData_Rev02.csv,RunningData_Rev02.csv,1222,1719810609000


In [0]:
df = spark.read.csv("dbfs:/FileStore/tables/to_date.csv", header=True, inferSchema=True)
display(df.limit(10))

input_timestamp,Sensex_Category,Label_Type,Last_transaction_date,Effective_Date,last_timestamp,pymt_timestamp
25/04/2023 2:00,Top,average,2024-02-26,6-Feb-23,25/04/2023 24:56:18,25/04/2023 2
26/04/2023 6:01,Top,average,2023-12-21,6-Feb-23,25/04/2002 21:12:00,26/04/2023 6
20/01/2020 4:01,Top,average,2025-03-27,8-Jan-24,25/04/2021 12:34:01,20/01/2020 4
26/04/2023 2:02,Top,average,2023-12-27,8-Jan-24,25/04/1957 20:12:01,26/04/2023 2
25/04/2023 5:02,Top,average,2024-04-29,6-Mar-23,25/04/2023 23:45:22,25/04/2023 5
25/04/2023 9:03,Forward,medium,2024-12-27,6-Mar-23,25/04/2024 14:12:02,25/04/2023 9
25/04/2023 7:03,Forward,medium,2024-03-26,6-Jan-25,25/04/2023 20:00:03,25/04/2023 7
26/03/2023 8:04,Forward,medium,2024-11-28,6-Jan-25,25/04/2024 14:12:03,26/03/2023 8
25/01/2022 4:04,Forward,medium,2023-12-27,6-Apr-23,25/05/2021 23:45:04,25/01/2022 4
26/03/2023 8:05,Forward,medium,2023-05-15,6-Apr-23,25/04/2024 14:12:04,26/03/2023 8


In [0]:
from pyspark.sql.functions import lit, col, to_date, current_timestamp

#### **1) Convert date string / timestamp string to date**

In [0]:
# withColumn() already names the result column, so .alias() is not needed
df_date = df.withColumn("input_timestamp", to_date(col("input_timestamp"), 'dd/MM/yyyy H:mm'))\
            .withColumn("Effective_Date", to_date(col("Effective_Date"), 'd-MMM-yy'))\
            .withColumn("pymt_timestamp", to_date(col("pymt_timestamp"), 'dd/MM/yyyy H'))
display(df_date.limit(10))

input_timestamp,Sensex_Category,Label_Type,Last_transaction_date,Effective_Date,last_timestamp,pymt_timestamp
2023-04-25,Top,average,2024-02-26,2023-02-06,25/04/2023 24:56:18,2023-04-25
2023-04-26,Top,average,2023-12-21,2023-02-06,25/04/2002 21:12:00,2023-04-26
2020-01-20,Top,average,2025-03-27,2024-01-08,25/04/2021 12:34:01,2020-01-20
2023-04-26,Top,average,2023-12-27,2024-01-08,25/04/1957 20:12:01,2023-04-26
2023-04-25,Top,average,2024-04-29,2023-03-06,25/04/2023 23:45:22,2023-04-25
2023-04-25,Forward,medium,2024-12-27,2023-03-06,25/04/2024 14:12:02,2023-04-25
2023-04-25,Forward,medium,2024-03-26,2025-01-06,25/04/2023 20:00:03,2023-04-25
2023-03-26,Forward,medium,2024-11-28,2025-01-06,25/04/2024 14:12:03,2023-03-26
2022-01-25,Forward,medium,2023-12-27,2023-04-06,25/05/2021 23:45:04,2022-01-25
2023-03-26,Forward,medium,2023-05-15,2023-04-06,25/04/2024 14:12:04,2023-03-26


In [0]:
# Custom Timestamp format to DateType
df_date = df_date.withColumn("custom_timestamp", to_date(lit('06-24-2019 12:01:19.000'),'MM-dd-yyyy HH:mm:ss.SSSS'))
display(df_date.limit(10))

input_timestamp,Sensex_Category,Label_Type,Last_transaction_date,Effective_Date,last_timestamp,pymt_timestamp,custom_timestamp
2023-04-25,Top,average,2024-02-26,2023-02-06,25/04/2023 24:56:18,2023-04-25,2019-06-24
2023-04-26,Top,average,2023-12-21,2023-02-06,25/04/2002 21:12:00,2023-04-26,2019-06-24
2020-01-20,Top,average,2025-03-27,2024-01-08,25/04/2021 12:34:01,2020-01-20,2019-06-24
2023-04-26,Top,average,2023-12-27,2024-01-08,25/04/1957 20:12:01,2023-04-26,2019-06-24
2023-04-25,Top,average,2024-04-29,2023-03-06,25/04/2023 23:45:22,2023-04-25,2019-06-24
2023-04-25,Forward,medium,2024-12-27,2023-03-06,25/04/2024 14:12:02,2023-04-25,2019-06-24
2023-04-25,Forward,medium,2024-03-26,2025-01-06,25/04/2023 20:00:03,2023-04-25,2019-06-24
2023-03-26,Forward,medium,2024-11-28,2025-01-06,25/04/2024 14:12:03,2023-03-26,2019-06-24
2022-01-25,Forward,medium,2023-12-27,2023-04-06,25/05/2021 23:45:04,2022-01-25,2019-06-24
2023-03-26,Forward,medium,2023-05-15,2023-04-06,25/04/2024 14:12:04,2023-03-26,2019-06-24


#### **2) How to convert timestamp string to date**

In [0]:
data = [(1,"HSR","2021-07-24 12:01:19.335"),
        (2,"BDA","2019-07-22 13:02:20.220"),
        (3,"BMRDS","2021-07-25 03:03:13.098"),
        (4,"APSRTC","2023-09-25 15:33:43.054"),
        (5,"SDARC","2024-05-25 23:53:53.023")]

schema = ["id","Name","input_timestamp"]

df_ex = spark.createDataFrame(data, schema)
display(df_ex)

id,Name,input_timestamp
1,HSR,2021-07-24 12:01:19.335
2,BDA,2019-07-22 13:02:20.220
3,BMRDS,2021-07-25 03:03:13.098
4,APSRTC,2023-09-25 15:33:43.054
5,SDARC,2024-05-25 23:53:53.023


In [0]:
# Timestamp String to DateType
df_ex = df_ex.withColumn("input_timestamp",to_date("input_timestamp"))\
             .withColumn("current_date",to_date(current_timestamp()))
display(df_ex)

id,Name,input_timestamp,current_date
1,HSR,2021-07-24,2024-09-13
2,BDA,2019-07-22,2024-09-13
3,BMRDS,2021-07-25,2024-09-13
4,APSRTC,2023-09-25,2024-09-13
5,SDARC,2024-05-25,2024-09-13


**Spark SQL**

In [0]:
# SQL TimestampType to DateType
spark.sql("select to_date(current_timestamp) as date_type").show()

+----------+
| date_type|
+----------+
|2024-09-13|
+----------+



In [0]:
# SQL custom date format to DateType
spark.sql("select to_date('02-03-2013','MM-dd-yyyy') date").show()

+----------+
|      date|
+----------+
|2013-02-03|
+----------+



In [0]:
# SQL CAST TimestampType to DateType
spark.sql("select date(to_timestamp('2019-06-24 12:01:19.000')) as date_type").show()

+----------+
| date_type|
+----------+
|2019-06-24|
+----------+



In [0]:
# SQL CAST timestamp string to DateType
spark.sql("select date('2019-06-24 12:01:19.000') as date_type").show()

+----------+
| date_type|
+----------+
|2019-06-24|
+----------+



In [0]:
# SQL Timestamp String (default format) to DateType
spark.sql("select to_date('2019-06-24 12:01:19.000') as date_type").show()

+----------+
| date_type|
+----------+
|2019-06-24|
+----------+



In [0]:
# SQL custom timestamp format to DateType
spark.sql("select to_date('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSSS') as date_type").show()

+----------+
| date_type|
+----------+
|2019-06-24|
+----------+

