#### **unix_timestamp (EPOCH)**

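- `unix_timestamp` returns the **current time** when called without arguments. Otherwise it converts a time string (default format **yyyy-MM-dd HH:mm:ss**) to a Unix timestamp in **seconds**, interpreting the string in the **current time zone** (the Spark session time zone, which defaults to the system time zone).

- The result is an **integer**: the number of seconds elapsed since **midnight UTC on January 1st, 1970** (the Unix epoch).

- It also converts **Date and Timestamp** columns to **Unix time**.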

- https://www.epochconverter.com/
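As a rough cross-check outside Spark, the same idea can be sketched in plain Python. This is a minimal illustration, not Spark's implementation: the helper names `to_unix_seconds` and `from_unix_seconds` are hypothetical, and the strings are assumed to be in UTC, whereas Spark uses the session time zone.

```python
from datetime import datetime, timezone

# Assumption: every input string is interpreted as UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_seconds(text: str) -> int:
    """Parse 'yyyy-MM-dd HH:mm:ss' (Python: '%Y-%m-%d %H:%M:%S') and count seconds since the epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int((parsed - EPOCH).total_seconds())


def from_unix_seconds(seconds: int) -> str:
    """Format epoch seconds back into a 'yyyy-MM-dd HH:mm:ss' string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


epoch_start = to_unix_seconds("1970-01-01 00:00:00")  # the epoch itself is second 0
sample = to_unix_seconds("2024-09-29 13:45:55")
round_trip = from_unix_seconds(sample)
print(epoch_start, sample, round_trip)
```

The value for `2024-09-29 13:45:55` matches the `lit_timestamp` column produced later in this notebook, which is consistent with that run using a UTC session time zone.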

Here are some common use cases:

**Converting Date/Time to Unix Timestamp:**
- Use it when you need to convert a date or time string into a Unix timestamp (the number of seconds since '1970-01-01 00:00:00' UTC). This is useful for time-based calculations and for storing or transmitting date/time data in a compact, numeric format.

**Time Difference Calculations:**
- You can calculate the difference between two dates or times by converting both to Unix timestamps first. Since Unix timestamps are in seconds, subtracting one from the other gives the difference in seconds, which can then be converted to other units of time.
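The arithmetic can be sketched with two values taken from the `start_timestamp` column shown later in this notebook. The variable names are illustrative only.

```python
# Epoch seconds for two rows of the start_timestamp column in this notebook.
earlier = 1489055238  # 2017-03-09 10:27:18
later = 1489159638    # 2017-03-10 15:27:18

# Subtracting epoch seconds gives the elapsed time in seconds.
diff_seconds = later - earlier

# Convert to larger units with integer division.
diff_hours, remainder = divmod(diff_seconds, 3600)
diff_minutes = remainder // 60
print(diff_seconds, diff_hours, diff_minutes)
```

The two inputs are one day and five hours apart, so the difference works out to 104400 seconds, or 29 hours.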

**Data Filtering Based on Time Range:**
- When querying data within a specific time range, converting dates to Unix timestamps can simplify the query. You can filter rows based on Unix timestamp values to retrieve records within a desired timeframe.
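A range filter on epoch values reduces to two integer comparisons. The sketch below uses plain Python rather than a Spark query; the helper `utc_epoch` is hypothetical and the range bounds are assumed to be midnight UTC.

```python
from datetime import datetime, timezone


def utc_epoch(year: int, month: int, day: int) -> int:
    """Epoch seconds for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# start_timestamp values from the example later in this notebook.
events = [1489055238, 1489159638, 1489408038, 1489580838, 1489544838, 1489836438]

# Half-open range: 2017-03-10 (inclusive) to 2017-03-15 (exclusive).
range_start = utc_epoch(2017, 3, 10)
range_end = utc_epoch(2017, 3, 15)

in_range = [ts for ts in events if range_start <= ts < range_end]
print(in_range)
```

A half-open interval is used here so that adjacent ranges do not count a boundary record twice.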

**Sorting or Grouping by Time:**
- Unix timestamps are numeric, which makes them straightforward to use in sorting or grouping operations. This is particularly useful when you need to organize records in large datasets by their timestamp values.
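One way to see why numeric timestamps group easily: integer division by 86400 yields a day number (days since the epoch, in UTC). This is a plain-Python sketch with illustrative names, not a Spark `groupBy`.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400

# A few epoch values from the ISO-8601 example later in this notebook.
events = [1705303057, 1705400913, 1705303041, 1705399969]

# Sort numerically, then bucket by UTC day number (days since 1970-01-01).
by_day = defaultdict(list)
for ts in sorted(events):
    by_day[ts // SECONDS_PER_DAY].append(ts)

grouped = dict(by_day)
print(grouped)
```

Day 19737 corresponds to 2024-01-15 and day 19738 to 2024-01-16, so the four events fall into two buckets of two.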

**Interoperability with Other Systems:**
- Many systems and applications represent date/time values as Unix timestamps. Converting to and from Unix timestamps when exchanging data with them keeps the formats compatible and simplifies data integration tasks.

**Efficiency in Storage and Computation:**
- Unix timestamps are stored as integers, which can be more storage-efficient than storing full datetime strings. Computations with integers (Unix timestamps) can also be faster than parsing and operating on string representations of dates and times.
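The size difference can be illustrated with raw byte counts. This is only a sketch: actual on-disk size in Spark depends on the file format, encoding, and compression, none of which are modeled here.

```python
import struct

text_form = "2017-03-09 10:27:18"  # datetime stored as a string
epoch_form = 1489055238            # the same instant as epoch seconds

# UTF-8 bytes needed for the string representation.
text_bytes = len(text_form.encode("utf-8"))

# A BIGINT is a signed 64-bit integer, packed here as 8 little-endian bytes.
int_bytes = len(struct.pack("<q", epoch_form))
print(text_bytes, int_bytes)
```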

#### **Syntax**

     unix_timestamp(date_time_column, pattern)

**Arguments**

- **date_time_column:** An optional **DATE, TIMESTAMP, or STRING** expression in a valid datetime format. If omitted, the current timestamp is used.

- **pattern:** An optional STRING expression specifying the format when **date_time_column** is a STRING.

- The default **pattern** value is **'yyyy-MM-dd HH:mm:ss'**.

**Returns:** BIGINT
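The pattern matters because the same string can be valid under more than one layout. The sketch below uses Python's `strptime` as a stand-in for Spark's parser: the format codes `%m-%d-%Y` and `%d-%m-%Y` are assumed here to play the role of the Spark patterns `MM-dd-yyyy` and `dd-MM-yyyy`, the helper `parse_utc` is hypothetical, and UTC is assumed.

```python
from datetime import datetime, timezone


def parse_utc(text: str, fmt: str) -> int:
    """Parse text with the given strptime format, as UTC, and return epoch seconds."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=timezone.utc).timestamp())


text = "01-07-2019 12:01:19"

# Month first (like 'MM-dd-yyyy HH:mm:ss'): 7 January 2019.
as_month_first = parse_utc(text, "%m-%d-%Y %H:%M:%S")

# Day first (like 'dd-MM-yyyy HH:mm:ss'): 1 July 2019.
as_day_first = parse_utc(text, "%d-%m-%Y %H:%M:%S")

print(as_month_first, as_day_first, (as_day_first - as_month_first) // 86400)
```

Both parses succeed, but they land 175 days apart, so a wrong pattern can silently produce a wrong timestamp rather than an error.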

![test image](files/syntax-1.png)

In [0]:
%sql
-- Unix timestamp (epoch time)
-- If no argument is provided, the default is the current timestamp.
-- The result is the number of seconds that have elapsed from 01-JAN-1970 (UTC) to this moment.
SELECT unix_timestamp() AS default;

default
1726385352


In [0]:
%sql
-- Convert epoch time to a human-readable format
SELECT from_unixtime(unix_timestamp()) AS default;

default
2024-09-15 07:34:47


In [0]:
%sql
SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd') AS unixtimestamp;

unixtimestamp
1460073600
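The value 1460073600 is midnight UTC on 2016-04-08, which suggests this cell ran with a UTC session time zone. A different time zone would give a different result for the same string. The sketch below illustrates this in plain Python with a fixed +05:30 offset; the offset and the helper `midnight_epoch` are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def midnight_epoch(date_text: str, tz: timezone) -> int:
    """Epoch seconds for local midnight of a 'yyyy-MM-dd' date in the given time zone."""
    local_midnight = datetime.strptime(date_text, "%Y-%m-%d").replace(tzinfo=tz)
    return int(local_midnight.timestamp())


utc_value = midnight_epoch("2016-04-08", timezone.utc)

# Hypothetical zone at UTC+05:30: local midnight happens 19800 seconds earlier.
plus_0530 = timezone(timedelta(hours=5, minutes=30))
offset_value = midnight_epoch("2016-04-08", plus_0530)

print(utc_value, offset_value, utc_value - offset_value)
```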


In [0]:
import pyspark.sql.functions as F
from pyspark.sql.functions import lit, col, unix_timestamp, current_timestamp

#### **1) Convert timestamp string to Unix time**

In [0]:
df_ts_ut = spark.createDataFrame([
    (20140228, "2017-03-09 10:27:18", "17-01-2019 12:01:19", "01-07-2019 12:01:19", "08-18-2022", "2022-08-18"),
    (20160229, "2017-03-10 15:27:18", "13-05-2022 15:05:36", "05-13-2022 15:05:36", "04-29-2022", "2022-04-29"),
    (20171031, "2017-03-13 12:27:18", "18-08-2023 17:22:45", "08-18-2023 17:22:45", "08-22-2022", "2022-08-22"),
    (20191130, "2017-03-15 12:27:18", "22-04-2002 18:15:34", "04-22-2002 18:15:34", "03-28-2021", "2021-09-28"),
    (20221130, "2017-03-15 02:27:18", "29-06-2005 22:55:29", "06-29-2005 22:55:29", "02-13-2022", "2022-02-13"),
    (20321130, "2017-03-18 11:27:18", "20-10-2019 23:45:56", "10-20-2019 23:45:56", "07-23-2024", "2024-07-23")],
    ["dateid", "start_timestamp", "input_timestamp", "last_timestamp", "start_date", "end_date"])
display(df_ts_ut)

dateid,start_timestamp,input_timestamp,last_timestamp,start_date,end_date
20140228,2017-03-09 10:27:18,17-01-2019 12:01:19,01-07-2019 12:01:19,08-18-2022,2022-08-18
20160229,2017-03-10 15:27:18,13-05-2022 15:05:36,05-13-2022 15:05:36,04-29-2022,2022-04-29
20171031,2017-03-13 12:27:18,18-08-2023 17:22:45,08-18-2023 17:22:45,08-22-2022,2022-08-22
20191130,2017-03-15 12:27:18,22-04-2002 18:15:34,04-22-2002 18:15:34,03-28-2021,2021-09-28
20221130,2017-03-15 02:27:18,29-06-2005 22:55:29,06-29-2005 22:55:29,02-13-2022,2022-02-13
20321130,2017-03-18 11:27:18,20-10-2019 23:45:56,10-20-2019 23:45:56,07-23-2024,2024-07-23


In [0]:
df_ts_ut = df_ts_ut\
         .withColumn('dateid', F.unix_timestamp(F.col('dateid').cast('string'),'yyyyMMdd'))\
         .withColumn('start_timestamp', F.unix_timestamp(F.col('start_timestamp')))\
         .withColumn('input_timestamp', F.unix_timestamp(F.col('input_timestamp'), 'dd-MM-yyyy HH:mm:ss'))\
         .withColumn('last_timestamp', F.unix_timestamp(F.col('last_timestamp'), "MM-dd-yyyy HH:mm:ss"))\
         .withColumn('start_date', F.unix_timestamp(F.col('start_date'), 'MM-dd-yyyy'))\
         .withColumn('end_date', F.unix_timestamp(F.col('end_date'), "yyyy-MM-dd"))
display(df_ts_ut)

dateid,start_timestamp,input_timestamp,last_timestamp,start_date,end_date
1393545600,1489055238,1547726479,1546862479,1660780800,1660780800
1456704000,1489159638,1652454336,1652454336,1651190400,1651190400
1509408000,1489408038,1692379365,1692379365,1661126400,1661126400
1575072000,1489580838,1019499334,1019499334,1616889600,1632787200
1669766400,1489544838,1120085729,1120085729,1644710400,1644710400
1985385600,1489836438,1571615156,1571615156,1721692800,1721692800
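The `dateid` column is an integer, so it is cast to a string before parsing with `yyyyMMdd`. The same two steps can be sketched in plain Python, assuming UTC and using `%Y%m%d` as the counterpart of the Spark pattern.

```python
from datetime import datetime, timezone

date_id = 20140228  # integer date key, as in the dateid column

# Step 1: cast the integer to a string. Step 2: parse it as a compact date.
parsed = datetime.strptime(str(date_id), "%Y%m%d").replace(tzinfo=timezone.utc)
date_id_epoch = int(parsed.timestamp())
print(date_id_epoch)
```

The result agrees with the first `dateid` value in the output above.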


In [0]:
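df_ts_ut = df_ts_ut.select("*",
                           # String literal, parsed with the default pattern 'yyyy-MM-dd HH:mm:ss'
                           unix_timestamp(lit("2024-09-29 13:45:55")).alias('lit_timestamp'),
                           # The input is already a TIMESTAMP, so the pattern is ignored
                           # and both columns below return the same value
                           unix_timestamp(current_timestamp(),'yyyy-MM-dd HH:mm:ss').alias('current_time'),
                           unix_timestamp(current_timestamp(),'yyyy-MM-dd').alias('current_date'),
                           # No arguments: current time in seconds
                           unix_timestamp().alias('default_time')
                           )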
display(df_ts_ut)

dateid,start_timestamp,input_timestamp,last_timestamp,start_date,end_date,lit_timestamp,current_time,current_date,default_time
1393545600,1489055238,1547726479,1546862479,1660780800,1660780800,1727617555,1726043127,1726043127,1726043127
1456704000,1489159638,1652454336,1652454336,1651190400,1651190400,1727617555,1726043127,1726043127,1726043127
1509408000,1489408038,1692379365,1692379365,1661126400,1661126400,1727617555,1726043127,1726043127,1726043127
1575072000,1489580838,1019499334,1019499334,1616889600,1632787200,1727617555,1726043127,1726043127,1726043127
1669766400,1489544838,1120085729,1120085729,1644710400,1644710400,1727617555,1726043127,1726043127,1726043127
1985385600,1489836438,1571615156,1571615156,1721692800,1721692800,1727617555,1726043127,1726043127,1726043127


#### **2) Convert string to timestamp, then to Unix time**

**Convert string to timestamp with cast()**

In [0]:
# Create a DataFrame from the timestamps list
timestamps = ["2024-01-15T07:17:37Z", "2024-01-16T10:28:33Z", "2024-01-15T07:17:21Z", "2024-01-16T10:12:49Z",
              "2024-01-16T10:36:48Z", "2024-01-16T11:44:29Z", "2024-01-15T07:58:03Z", "2024-01-15T07:16:18Z",
              "2024-01-16T10:27:13Z", "2024-01-16T10:10:34Z", "2024-01-16T10:39:04Z", "2024-01-20T23:39:04Z",
              "2024-01-21T07:39:04Z", "2024-01-16T11:44:29Z", "2024-01-15T07:17:21Z", "2024-01-16T10:36:48Z"
              ]

timestamps_df = spark.createDataFrame([(ts,) for ts in timestamps], ["time_zone"])
display(timestamps_df)

time_zone
2024-01-15T07:17:37Z
2024-01-16T10:28:33Z
2024-01-15T07:17:21Z
2024-01-16T10:12:49Z
2024-01-16T10:36:48Z
2024-01-16T11:44:29Z
2024-01-15T07:58:03Z
2024-01-15T07:16:18Z
2024-01-16T10:27:13Z
2024-01-16T10:10:34Z


In [0]:
# Convert timestamp string to timestamp type
timestamps_df = timestamps_df.withColumn("time_zone", col("time_zone").cast("timestamp"))
display(timestamps_df)

time_zone
2024-01-15T07:17:37Z
2024-01-16T10:28:33Z
2024-01-15T07:17:21Z
2024-01-16T10:12:49Z
2024-01-16T10:36:48Z
2024-01-16T11:44:29Z
2024-01-15T07:58:03Z
2024-01-15T07:16:18Z
2024-01-16T10:27:13Z
2024-01-16T10:10:34Z


In [0]:
# The column is already a TIMESTAMP here, so the pattern argument is ignored
timestamps_df = timestamps_df.withColumn('time_zone', F.unix_timestamp(F.col('time_zone'), "yyyy-MM-dd HH:mm:ss"))
display(timestamps_df)

time_zone
1705303057
1705400913
1705303041
1705399969
1705401408
1705405469
1705305483
1705302978
1705400833
1705399834
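The ISO-8601 strings above end in `Z`, which marks them as UTC. A plain-Python sketch of the same conversion is shown below; the helper `iso_z_to_epoch` is hypothetical, and the literal `Z` is matched in the format string and the result tagged as UTC explicitly.

```python
from datetime import datetime, timezone


def iso_z_to_epoch(text: str) -> int:
    """Parse 'yyyy-MM-ddTHH:mm:ssZ' as UTC and return epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


samples = ["2024-01-15T07:17:37Z", "2024-01-16T10:28:33Z"]
epochs = [iso_z_to_epoch(s) for s in samples]
print(epochs)
```

These agree with the first two rows of the Spark output above.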


**Convert string to timestamp with to_timestamp()**

In [0]:
data = [
    ("2022-08-08","2022-04-17 17:16:20","19-04-2022 23:02:32"),
    ("2022-04-29","2022-11-07 04:03:11","27-07-2022 18:09:39"),
    ("2022-08-22","2022-02-07 09:15:31","08-11-2022 09:58:34"),
    ("2021-12-28","2022-02-28 02:47:25","03-01-2022 01:59:22"),
    ("2022-02-13","2022-05-22 11:25:29","25-02-2022 04:46:47")
    ]
 
df_stt = spark.createDataFrame(data, schema=["start_date","input_timestamp","update_timestamp"])
display(df_stt)

start_date,input_timestamp,update_timestamp
2022-08-08,2022-04-17 17:16:20,19-04-2022 23:02:32
2022-04-29,2022-11-07 04:03:11,27-07-2022 18:09:39
2022-08-22,2022-02-07 09:15:31,08-11-2022 09:58:34
2021-12-28,2022-02-28 02:47:25,03-01-2022 01:59:22
2022-02-13,2022-05-22 11:25:29,25-02-2022 04:46:47


In [0]:
from pyspark.sql.functions import to_timestamp
df_stt = df_stt.withColumn("input_timestamp", to_timestamp("input_timestamp", 'yyyy-MM-dd HH:mm:ss'))\
               .withColumn("update_timestamp", to_timestamp("update_timestamp", 'dd-MM-yyyy HH:mm:ss'))
display(df_stt)

start_date,input_timestamp,update_timestamp
2022-08-08,2022-04-17T17:16:20Z,2022-04-19T23:02:32Z
2022-04-29,2022-11-07T04:03:11Z,2022-07-27T18:09:39Z
2022-08-22,2022-02-07T09:15:31Z,2022-11-08T09:58:34Z
2021-12-28,2022-02-28T02:47:25Z,2022-01-03T01:59:22Z
2022-02-13,2022-05-22T11:25:29Z,2022-02-25T04:46:47Z


In [0]:
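# start_date is still a STRING, so its pattern is used for parsing.
# input_timestamp and update_timestamp are already TIMESTAMP columns, so their pattern is ignored.
df_stt = df_stt.withColumn('start_date', F.unix_timestamp(F.col('start_date'), "yyyy-MM-dd"))\
               .withColumn('input_timestamp', F.unix_timestamp(F.col('input_timestamp'), "yyyy-MM-dd HH:mm:ss"))\
               .withColumn('update_timestamp', F.unix_timestamp(F.col('update_timestamp'), "yyyy-MM-dd HH:mm:ss"))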
display(df_stt)

start_date,input_timestamp,update_timestamp
1659916800,1650215780,1650409352
1651190400,1667793791,1658945379
1661126400,1644225331,1667901514
1640649600,1646016445,1641175162
1644710400,1653218729,1645764407
