#### to_date()

✅ The **to_date()** function converts a **"date string" (or) "timestamp string" column** into a **"Date" Type column**, parsing the string with a **specified format**.

✅ If the **format is not provided**, to_date() follows the rules of a cast to date and expects the **default layout 'yyyy-MM-dd'** (a trailing time part is allowed and dropped).

✅ Extracts only the **date** portion **(removes time part if present)**.

✅ Returns **NULL** if the string does **not match** the format (when ANSI mode is enabled, Spark raises an error instead).

- **to_date():** extracts **only the date** part (ignores time).
- **to_timestamp():** parses **both date and time**.

**Syntax:**

     to_date(column,format)

**Example:**

1) Converting a **String** Column with **Default Format**.
   - If your **date strings** follow the **default** format **"yyyy-MM-dd"**, you can simply apply **to_date without specifying a format**.

2) Converting a **String** Column with a **Custom Format**

   - If your **date strings** are in a **different** format (e.g., **"MM/dd/yyyy"**), you **must specify the format** in the **to_date** function.

| col_name value        | Call without format              | Matches default yyyy-MM-dd?          | Result without format       | Call with the correct format |
|-----------------------|----------------------------------|--------------------------------------|-----------------------------|------------------------------|
| "2024-03-06"          | to_date("2024-03-06")            | Matching                             | 2024-03-06 (Date)           | to_date("2024-03-06") |
| "06-03-2024"          | to_date("06-03-2024")            | Not Matching                         | NULL (Format mismatch)      | to_date("06-03-2024", "dd-MM-yyyy") |
| "2024-03-06 12:30:00" | to_date("2024-03-06 12:30:00")   | Matching (date part, trailing time)  | 2024-03-06 (Time removed)   | to_date("2024-03-06 12:30:00", "yyyy-MM-dd HH:mm:ss") |

#### 1) Converting a "String" Column with "Default Format"

- If your **date strings** follow the **default** format **"yyyy-MM-dd"**, you can simply apply **to_date**, **without specifying a format**.

In [0]:
from pyspark.sql.functions import to_date, col

In [0]:
# Sample data with date in default format "yyyy-MM-dd"
data = [("2021-12-01",),
        ("2022-01-15",),
        ("2023-03-20",),
        ("2024-06-28",),
        ("2025-09-12",),
        ("2025-03-22",)]

columns = ["date_string"]

# Create DataFrame
df_default = spark.createDataFrame(data, columns)

# Convert string column to date type
df_with_date = df_default.withColumn("date_parsed", to_date(col("date_string")))
display(df_with_date)

date_string,date_parsed
2021-12-01,2021-12-01
2022-01-15,2022-01-15
2023-03-20,2023-03-20
2024-06-28,2024-06-28
2025-09-12,2025-09-12
2025-03-22,2025-03-22


#### 2) Converting a "String" Column with a "Custom Format"

- If your **date strings** are in a **different** format (e.g., **"MM/dd/yyyy"**), you **must specify the format** in the **to_date** function.
- Suppose you have **multiple columns** with **dates in different formats**.
- You can **convert** each one separately by applying **to_date** with the **corresponding format**.

**Ex 01**

In [0]:
# Sample data: four string columns, each in a different date/timestamp layout
data = [("12/01/2021", "25/04/2023 2:00", "6-Feb-23", "2021-07-24 12:01:19.335"),
        ("01/15/2022", "26/04/2023 6:01", "1-Mar-22", "2019-07-22 13:02:20.220"),
        ("03/20/2023", "20/01/2020 4:01", "9-Apr-24", "2021-07-25 03:03:13.098"),
        ("05/25/2024", "26/04/2023 2:02", "8-May-20", "2023-09-25 15:33:43.054"),
        ("07/20/2025", "25/04/2023 5:02", "7-Jun-21", "2024-05-25 23:53:53.023"),
        ("09/29/2020", "25/04/2023 9:03", "5-Jul-23", "2024-04-12 13:33:53.323")]

columns = ["ts_format_01", "ts_format_02", "ts_format_03", "ts_format_04"]

# Create DataFrame
df_custom = spark.createDataFrame(data, columns)

# Convert string column to date type using custom format
df_with_date = df_custom\
    .withColumn("date_parsed_01", to_date(col("ts_format_01"), "MM/dd/yyyy")) \
    .withColumn("date_parsed_02", to_date(col("ts_format_02"), "dd/MM/yyyy H:mm")) \
    .withColumn("date_parsed_03", to_date(col("ts_format_03"), "d-MMM-yy")) \
    .withColumn("date_parsed_04", to_date(col("ts_format_04"))) # no format given: cast to date, time part dropped
display(df_with_date)

ts_format_01,ts_format_02,ts_format_03,ts_format_04,date_parsed_01,date_parsed_02,date_parsed_03,date_parsed_04
12/01/2021,25/04/2023 2:00,6-Feb-23,2021-07-24 12:01:19.335,2021-12-01,2023-04-25,2023-02-06,2021-07-24
01/15/2022,26/04/2023 6:01,1-Mar-22,2019-07-22 13:02:20.220,2022-01-15,2023-04-26,2022-03-01,2019-07-22
03/20/2023,20/01/2020 4:01,9-Apr-24,2021-07-25 03:03:13.098,2023-03-20,2020-01-20,2024-04-09,2021-07-25
05/25/2024,26/04/2023 2:02,8-May-20,2023-09-25 15:33:43.054,2024-05-25,2023-04-26,2020-05-08,2023-09-25
07/20/2025,25/04/2023 5:02,7-Jun-21,2024-05-25 23:53:53.023,2025-07-20,2023-04-25,2021-06-07,2024-05-25
09/29/2020,25/04/2023 9:03,5-Jul-23,2024-04-12 13:33:53.323,2020-09-29,2023-04-25,2023-07-05,2024-04-12


**Ex 02**

In [0]:
data1 = [("2025-03-26", "26-03-2025", "02/12/2022", "6-Feb-23", "invalid_date", "06-03-2019"),
         ("2024-12-31", "31-12-2024", "04/15/2023", "8-Jan-24", "invalid_date", "16-04-2020"),
         ("2023-07-15", "15-07-2023", "06/18/2024", "6-Mar-23", "invalid_date", "26-05-2021"),
         ("2022-05-25", "25-03-2020", "07/20/2025", "7-Jan-25", "invalid_date", "14-06-2022"),
         ("1998-08-05", "08-05-1982", "08/22/2020", "8-Apr-23", None, "19-11-2023"),
         ("1995-09-29", "29-09-2021", "09/29/2010", "9-Feb-25", None, "29-12-2024"),
         ("1995-09-29", "29-09-2021", "09/29/2010", "9-Feb-25", "2025-03-25", "29-12-2024"),
         ("1995-09-29", "29-09-2021", "09/29/2010", "9-Feb-25", "2024-09-12", "29-12-2024"),
         ("1995-09-29", "29-09-2021", "09/29/2010", "9-Feb-25", "2020-11-29", "29-12-2024")
        ]
columns1 = ["d1", "d2", "d3", "d4", "d5", "d6"]

df_custom_01 = spark.createDataFrame(data1, columns1)
display(df_custom_01)

d1,d2,d3,d4,d5,d6
2025-03-26,26-03-2025,02/12/2022,6-Feb-23,invalid_date,06-03-2019
2024-12-31,31-12-2024,04/15/2023,8-Jan-24,invalid_date,16-04-2020
2023-07-15,15-07-2023,06/18/2024,6-Mar-23,invalid_date,26-05-2021
2022-05-25,25-03-2020,07/20/2025,7-Jan-25,invalid_date,14-06-2022
1998-08-05,08-05-1982,08/22/2020,8-Apr-23,,19-11-2023
1995-09-29,29-09-2021,09/29/2010,9-Feb-25,,29-12-2024
1995-09-29,29-09-2021,09/29/2010,9-Feb-25,2025-03-25,29-12-2024
1995-09-29,29-09-2021,09/29/2010,9-Feb-25,2024-09-12,29-12-2024
1995-09-29,29-09-2021,09/29/2010,9-Feb-25,2020-11-29,29-12-2024


In [0]:
# Convert each string column to date type with appropriate format
from pyspark.sql.functions import col, to_date, when

df1_with_date = (
    df_custom_01
    .withColumn("d1_custom", to_date(col("d1")))  # default yyyy-MM-dd
    .withColumn("d2_custom", to_date(col("d2"), "dd-MM-yyyy"))
    .withColumn("d3_custom", to_date(col("d3"), "MM/dd/yyyy"))
    .withColumn("d4_custom", to_date(col("d4"), "d-MMM-yy"))
    .withColumn(
        "d5_custom",
        when(col("d5").isNotNull() & (col("d5") != "invalid_date"),
             to_date(col("d5"), "yyyy-MM-dd"))
        .otherwise(None)
    )
    .withColumn("d6_custom", to_date(col("d6"), "dd-MM-yyyy"))
)

display(df1_with_date)

d1,d2,d3,d4,d5,d6,d1_custom,d2_custom,d3_custom,d4_custom,d5_custom,d6_custom
2025-03-26,26-03-2025,02/12/2022,6-Feb-23,invalid_date,06-03-2019,2025-03-26,2025-03-26,2022-02-12,2023-02-06,,2019-03-06
2024-12-31,31-12-2024,04/15/2023,8-Jan-24,invalid_date,16-04-2020,2024-12-31,2024-12-31,2023-04-15,2024-01-08,,2020-04-16
2023-07-15,15-07-2023,06/18/2024,6-Mar-23,invalid_date,26-05-2021,2023-07-15,2023-07-15,2024-06-18,2023-03-06,,2021-05-26
2022-05-25,25-03-2020,07/20/2025,7-Jan-25,invalid_date,14-06-2022,2022-05-25,2020-03-25,2025-07-20,2025-01-07,,2022-06-14
1998-08-05,08-05-1982,08/22/2020,8-Apr-23,,19-11-2023,1998-08-05,1982-05-08,2020-08-22,2023-04-08,,2023-11-19
1995-09-29,29-09-2021,09/29/2010,9-Feb-25,,29-12-2024,1995-09-29,2021-09-29,2010-09-29,2025-02-09,,2024-12-29
1995-09-29,29-09-2021,09/29/2010,9-Feb-25,2025-03-25,29-12-2024,1995-09-29,2021-09-29,2010-09-29,2025-02-09,2025-03-25,2024-12-29
1995-09-29,29-09-2021,09/29/2010,9-Feb-25,2024-09-12,29-12-2024,1995-09-29,2021-09-29,2010-09-29,2025-02-09,2024-09-12,2024-12-29
1995-09-29,29-09-2021,09/29/2010,9-Feb-25,2020-11-29,29-12-2024,1995-09-29,2021-09-29,2010-09-29,2025-02-09,2020-11-29,2024-12-29


**Ex 03**

In [0]:
data2 = [("25/04/2023 2:00", "25/04/2023 2", "25/04/2023 22:56:18", "2021-07-24 12:01:19.335"),
         ("26/04/2023 6:01", "26/04/2023 6", "25/04/2002 21:12:00", "2019-07-22 13:02:20.220"),
         ("20/01/2020 4:01", "20/01/2020 4", "25/04/1957 20:12:01", "2021-07-25 03:03:13.098"),
         ("26/04/2023 2:02", "26/04/2023 2", "25/04/2023 23:45:22", "2023-09-25 15:33:43.054"),
         ("25/04/2023 5:02", "25/04/2023 5", "25/04/2024 14:12:03", "2024-05-25 23:53:53.023"),
         ("26/03/2023 8:04", "26/03/2023 8", "25/05/2021 23:45:04", "2025-03-25 22:43:33.323")
        ]
columns2 = ["t1", "t2", "t3", "t4"]

df_custom_02 = spark.createDataFrame(data2, columns2)
display(df_custom_02)

t1,t2,t3,t4
25/04/2023 2:00,25/04/2023 2,25/04/2023 22:56:18,2021-07-24 12:01:19.335
26/04/2023 6:01,26/04/2023 6,25/04/2002 21:12:00,2019-07-22 13:02:20.220
20/01/2020 4:01,20/01/2020 4,25/04/1957 20:12:01,2021-07-25 03:03:13.098
26/04/2023 2:02,26/04/2023 2,25/04/2023 23:45:22,2023-09-25 15:33:43.054
25/04/2023 5:02,25/04/2023 5,25/04/2024 14:12:03,2024-05-25 23:53:53.023
26/03/2023 8:04,26/03/2023 8,25/05/2021 23:45:04,2025-03-25 22:43:33.323


In [0]:
df2_with_date = (df_custom_02
    .withColumn("t1_custom", to_date(col("t1"), 'dd/MM/yyyy H:mm'))
    .withColumn('t2_custom', to_date(col("t2"), 'dd/MM/yyyy H'))
    .withColumn("t3_custom", to_date(col("t3"), 'dd/MM/yyyy HH:mm:ss')) # extracts only date portion (removes time)
    .withColumn("t4_custom", to_date(col("t4"))) # extracts only date portion (removes time)
)
    
display(df2_with_date)

t1,t2,t3,t4,t1_custom,t2_custom,t3_custom,t4_custom
25/04/2023 2:00,25/04/2023 2,25/04/2023 22:56:18,2021-07-24 12:01:19.335,2023-04-25,2023-04-25,2023-04-25,2021-07-24
26/04/2023 6:01,26/04/2023 6,25/04/2002 21:12:00,2019-07-22 13:02:20.220,2023-04-26,2023-04-26,2002-04-25,2019-07-22
20/01/2020 4:01,20/01/2020 4,25/04/1957 20:12:01,2021-07-25 03:03:13.098,2020-01-20,2020-01-20,1957-04-25,2021-07-25
26/04/2023 2:02,26/04/2023 2,25/04/2023 23:45:22,2023-09-25 15:33:43.054,2023-04-26,2023-04-26,2023-04-25,2023-09-25
25/04/2023 5:02,25/04/2023 5,25/04/2024 14:12:03,2024-05-25 23:53:53.023,2023-04-25,2023-04-25,2024-04-25,2024-05-25
26/03/2023 8:04,26/03/2023 8,25/05/2021 23:45:04,2025-03-25 22:43:33.323,2023-03-26,2023-03-26,2021-05-25,2025-03-25


#### 3) Using to_date() with selectExpr()

- **selectExpr()** accepts SQL expressions, so **to_date()** can be called inside it to convert the **datetime string** to a **date**, **ignoring the time** part.



In [0]:
# Sample data: datetime strings and date strings, both in the default layout
data = [("2021-12-01 12:30:00", "2021-12-01"),
        ("2022-01-15 10:00:00", "2025-06-15"),
        ("2023-03-20 14:45:00", "2019-09-25"),
        ("2024-09-25 19:55:45", "2024-12-30"),
        ("2025-02-15 11:45:25", "2025-04-28"),
        ("2020-12-25 16:25:55", "2023-10-18")]

columns = ["datetime_string", "input_date"]

# Create DataFrame
df_custom_03 = spark.createDataFrame(data, columns)

# Convert string with datetime to date type using selectExpr
df_custom_expr = df_custom_03.selectExpr(
  "to_date(datetime_string) as date", # extracts only the date from the timestamp.
  "to_date(datetime_string, 'yyyy-MM-dd HH:mm:ss') as date_time", # same result, with the format stated explicitly.
  "to_date(input_date) as input_date" # ensures input_date is recognized as a date type.
)

display(df_custom_expr)

date,date_time,input_date
2021-12-01,2021-12-01,2021-12-01
2022-01-15,2022-01-15,2025-06-15
2023-03-20,2023-03-20,2019-09-25
2024-09-25,2024-09-25,2024-12-30
2025-02-15,2025-02-15,2025-04-28
2020-12-25,2020-12-25,2023-10-18


#### 4) Spark SQL

**a) Convert String to Date (Default Format)**

In [0]:
%sql
SELECT to_date('2025-03-29') AS converted_date;

converted_date
2025-03-29


**b) Convert String with Custom Format**
- If the **date** is in a **different format**, pass the **format** as the second argument of **to_date()**.

In [0]:
%sql
SELECT to_date('29-03-2025', 'dd-MM-yyyy') AS converted_date;

converted_date
2025-03-29


In [0]:
spark.sql("select to_date('02-03-2013','MM-dd-yyyy') date").show()

+----------+
|      date|
+----------+
|2013-02-03|
+----------+



**c) Convert Timestamp to Date**

In [0]:
%sql
SELECT to_date('2025-03-29 14:30:00') AS converted_date;

converted_date
2025-03-29


In [0]:
# SQL TimestampType to DateType
spark.sql("select to_date(current_timestamp) as date_type").show()

+----------+
| date_type|
+----------+
|2025-08-25|
+----------+



- **current_timestamp Function**

  - This function returns the current timestamp (**including date and time**).
  - Example output: **2025-03-29 14:30:45.123**

- **to_date(current_timestamp) Function**

  - The **to_date()** function **extracts** only the **date part** from the **timestamp**.
  - It **removes** the **time portion** and **returns** a **DateType** value, displayed as **yyyy-MM-dd**.
  - Example output: **2025-03-29**

In [0]:
# SQL CAST "TimestampType string" to "DateType"
spark.sql("select date(to_timestamp('2019-06-24 12:01:19.000')) as date_type").display()

date_type
2019-06-24


- **to_timestamp('2019-06-24 12:01:19.000')**

  - This function **converts** the input **string '2019-06-24 12:01:19.000'** into a **TimestampType** value.
  - Output: 2019-06-24 12:01:19.000 (**TimestampType**)

- **date(to_timestamp(...))**

  - The **date()** function **extracts** only the **date** portion **(yyyy-MM-dd)** from the **TimestampType** value.
  - It effectively **truncates** the **time part** and **converts** it into a **DateType** value.
  - Output: 2019-06-24 (**DateType**)

In [0]:
# SQL CAST "timestamp string" to "DateType"
spark.sql("select date('2019-06-24 12:01:19.000') as date_type").display()

date_type
2019-06-24


**'2019-06-24 12:01:19.000' (String Input)**
  - The input is a **timestamp** in **string** format **(yyyy-MM-dd HH:mm:ss.SSS)**.

**date('2019-06-24 12:01:19.000')**
  - The **date()** function **casts** the **timestamp string** to **DateType**, keeping only the **date** portion **(yyyy-MM-dd)**.
  - The **time** part (12:01:19.000) is **discarded**.
  - **Equivalent** to **CAST('2019-06-24 12:01:19.000' AS DATE)**.

      # SQL Timestamp String (default format) to DateType
      spark.sql("select to_date('2019-06-24 12:01:19.000') as date_type").show()
                                         (or)
      spark.sql("select CAST('2019-06-24 12:01:19.000' AS DATE) as date_type").show()

In [0]:
# SQL Timestamp String (default format) to DateType
spark.sql("select to_date('2019-06-24 12:01:19.000') as date_type").display()

date_type
2019-06-24


- Input String: **'2019-06-24 12:01:19.000'**
  - This is a **timestamp string** in the default layout: **yyyy-MM-dd HH:mm:ss.SSS**.

- **to_date('2019-06-24 12:01:19.000') Function**
  - The **to_date()** function **extracts** only the **date (yyyy-MM-dd) part** from the input.
  - The **time** portion (**12:01:19.000**) is **discarded**.
  - Implicitly **converts** the **string to DateType**.

      # SQL Custom Timeformat to DateType
      spark.sql("select to_date('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSSS') as date_type").show()
      # CAST(... AS DATE) is an alternative only when the string is already in the default layout:
      spark.sql("select CAST('2019-06-24 12:01:19.000' AS DATE) as date_type").show()

In [0]:
# SQL Custom Timeformat to DateType
spark.sql("select to_date('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSSS') as date_type").display()

date_type
2019-06-24


**Input String: '06-24-2019 12:01:19.000'**
  - The **date and time** are provided in a **custom format**: MM-dd-yyyy HH:mm:ss.SSSS
  - **"06-24-2019"** represents **June 24, 2019**.
  - **"12:01:19.000"** is the **time** portion, which will be **ignored**.

**to_date('06-24-2019 12:01:19.000', 'MM-dd-yyyy HH:mm:ss.SSSS') Function**
  - **First argument:** The timestamp string to convert.
  - **Second argument:** The format of the input string.
  - The function **parses the string according to the provided format** and extracts only the **date part**.
  - The **time** portion (12:01:19.000) is **discarded**.
  - The result is a **DateType** value, displayed as **yyyy-MM-dd**.

**d) Use to_date() on a Column**
- If you have a table with a **string** column containing **dates**, apply **to_date()** to that column in the **SELECT** list.

In [0]:
%sql
CREATE TABLE EmpOrder (order_id INT, order_date STRING);
INSERT INTO EmpOrder VALUES (1, '29-03-2025'), (2, '30-03-2025');

SELECT order_id, order_date, to_date(order_date, 'dd-MM-yyyy') AS formatted_date FROM EmpOrder;

order_id,order_date,formatted_date
1,29-03-2025,2025-03-29
2,30-03-2025,2025-03-30
