## Timestamp implementation in different frameworks:

**Arrow timestamps** have three parts:
1. a **64-bit integer**,
2. **metadata** that specifies a **time unit** (e.g. milliseconds, microseconds, or nanoseconds),
3. an **optional time zone**.

**Pandas (Timestamp)** has two parts:
1. a **64-bit integer** representing **nanoseconds**
2. an **optional time zone**.

Python/Pandas timestamp types without an associated time zone are referred to as “Time Zone Naive”.
Python/Pandas timestamp types with an associated time zone are referred to as “Time Zone Aware”.
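
As a small illustration of the naive/aware distinction, the sketch below uses only the standard library. The helper name `is_aware` is made up for this example; the check itself follows the usual Python definition (a `tzinfo` that yields a UTC offset).

```python
from datetime import datetime, timedelta, timezone


def is_aware(dt: datetime) -> bool:
    """Return True when dt carries a usable UTC offset."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


naive = datetime(2046, 1, 1)
aware = datetime(2046, 1, 1, tzinfo=timezone(timedelta(hours=-8)))

# Python refuses to order a naive value against an aware one.
try:
    naive < aware
    comparable = True
except TypeError:
    comparable = False

print(is_aware(naive), is_aware(aware), comparable)
```

The `TypeError` is a useful reminder that the two kinds of timestamp should not be mixed silently.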

**Spark timestamps** have one part:
1. a **64-bit integer** representing **microseconds since the UNIX epoch**.

Note: do not confuse the `long` returned by `unix_timestamp()` (seconds since the UNIX epoch) with Spark's `timestamp` column type (microseconds since the UNIX epoch). They are two different data types.

Note that Spark does not store any time zone metadata with its timestamps. Spark interprets timestamps in
the session-local time zone (i.e. `spark.sql.session.timeZone`). If that setting is undefined, Spark falls back to
the default system time zone.
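
To make the unit differences concrete, here is a minimal sketch that expresses one instant (midnight at UTC-8 on 2046-01-01 plus 500 ns, the value used in this notebook) as the integers each representation would hold. It is plain arithmetic with the standard library, not a call into any of the frameworks; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Midnight at UTC-8; datetime only resolves microseconds,
# so the extra 500 nanoseconds are tracked separately.
aware = datetime(2046, 1, 1, tzinfo=timezone(timedelta(hours=-8)))
extra_ns = 500

epoch_seconds = (aware - EPOCH) // timedelta(seconds=1)

pandas_ns = epoch_seconds * 1_000_000_000 + extra_ns  # nanoseconds (pandas)
spark_us = pandas_ns // 1_000                         # microseconds (Spark timestamp)
unix_s = pandas_ns // 1_000_000_000                   # seconds (unix_timestamp long)

print(pandas_ns, spark_us, unix_s)
```

The 500 ns only survive in the nanosecond integer; both coarser units drop them.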

This article covers dates and timestamps in Spark in detail:
https://databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html 

## Consequences of the different timestamp implementations

- Timezone information is lost (all timestamps that result from converting from spark to arrow/pandas are “time zone naive”).

- Timestamps are truncated to microseconds.

- The session time zone might have unintuitive impacts on translation of timestamp values.
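
The first two effects can be imitated in pandas alone. This is only a sketch of the observable result under a UTC session, not what Spark or Arrow do internally.

```python
from datetime import timedelta, timezone

import pandas as pd

original = pd.Timestamp(year=2046, month=1, day=1, nanosecond=500,
                        tz=timezone(timedelta(hours=-8)))

# 1. Normalise to UTC and drop the zone: the value becomes "time zone naive".
as_naive_utc = original.tz_convert(timezone.utc).tz_localize(None)

# 2. Truncate to microsecond resolution: the 500 ns are lost.
truncated = as_naive_utc.floor("us")

print(original, as_naive_utc, truncated, sep="\n")
```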


In [1]:
from datetime import datetime, timezone, timedelta

import pandas as pd
from pandas import Timestamp
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import col, from_unixtime,lit, unix_timestamp, to_timestamp
import pyarrow as pa
import pyarrow.parquet as pq
import os
import s3fs

# Check system timezone
First, we need to check the system time zone, because Spark falls back to it if we don't specify a session time zone.

Second, your system may report a different time zone abbreviation depending on the season.
UTC+1:00 = CET (Central European Time)
UTC+2:00 = CEST (Central European Summer Time) 

CET and CEST cover the same area: countries that use CET switch to CEST during summer. 

In [11]:
! date +"%Z %z"

UTC +0000
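
If a shell is not available, roughly the same information can be read from Python. This is a small standard-library sketch; the variable names are illustrative.

```python
import time
from datetime import datetime

# Same information as `date +"%Z %z"`: zone abbreviation and UTC offset.
zone_label = time.strftime("%Z %z")

# The local UTC offset as a timedelta, taken from an aware "now".
local_offset = datetime.now().astimezone().utcoffset()

print(zone_label, local_offset)
```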


In [2]:
spark = SparkSession.builder \
    .master("local[2]") \
    .appName("PandasSparkTimeStamp") \
    .getOrCreate()

pdf = pd.DataFrame({'naive': [datetime(2046, 1, 1, 0)],
                    'aware': [Timestamp(year=2046, month=1, day=1,
                                        nanosecond=500, tz=timezone(timedelta(hours=-8)))]})
# pandas data frame print the datetime
print(pdf.head())
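# Enable Arrow-based columnar data transfers
# (Spark 3.x renamed this key to spark.sql.execution.arrow.pyspark.enabled; the old key is deprecated)
spark.conf.set("spark.sql.execution.arrow.enabled", "true")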


       naive                               aware
0 2046-01-01 2046-01-01 00:00:00.000000500-08:00


## 1. In memory pandas, spark datetime conversion

### 1.1 convert pandas dataframe to spark dataframe

Because Spark uses the session time zone for the conversion, we need to set the session time zone before converting.

In [3]:
# set up spark session time zone
spark.conf.set("spark.sql.session.timeZone", "UTC")

# spark convert the datetime with UTC timezone
utc_df = spark.createDataFrame(pdf)
print("UTC converted datetime in UTC timezone")
utc_df.show()

# now change the Spark session time zone and convert the same pandas dataframe again
spark.conf.set("spark.sql.session.timeZone", "US/Pacific")
# Spark converts the datetime using the US/Pacific time zone
pst_df = spark.createDataFrame(pdf)
print("US/Pacific converted datetime in US/Pacific timezone")
pst_df.show()
print("UTC converted datetime in US/Pacific timezone")
utc_df.show()

UTC converted datetime in UTC timezone
+-------------------+-------------------+
|              naive|              aware|
+-------------------+-------------------+
|2046-01-01 00:00:00|2046-01-01 08:00:00|
+-------------------+-------------------+

US/Pacific converted datetime in US/Pacific timezone
+-------------------+-------------------+
|              naive|              aware|
+-------------------+-------------------+
|2046-01-01 00:00:00|2046-01-01 00:00:00|
+-------------------+-------------------+

UTC converted datetime in US/Pacific timezone
+-------------------+-------------------+
|              naive|              aware|
+-------------------+-------------------+
|2045-12-31 16:00:00|2046-01-01 00:00:00|
+-------------------+-------------------+
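
The three outputs above follow from one rule: the stored integer never changes, only the zone used to display it. The sketch below reproduces the displayed strings with the standard library. `PACIFIC_WINTER` is an assumption, a fixed UTC-8 offset standing in for US/Pacific in January, and `render` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UTC = timezone.utc
PACIFIC_WINTER = timezone(timedelta(hours=-8))  # assumption: US/Pacific in January


def render(micros: int, session_tz: timezone) -> str:
    """Display a stored microsecond count the way a session in session_tz would."""
    instant = EPOCH + timedelta(microseconds=micros)
    return instant.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S")


# Values stored when the dataframe was created under a UTC session.
naive_us = 2_398_377_600_000_000
aware_us = 2_398_406_400_000_000

print(render(naive_us, UTC), render(aware_us, UTC))
print(render(naive_us, PACIFIC_WINTER), render(aware_us, PACIFIC_WINTER))
```

The second printed line matches the "UTC converted datetime in US/Pacific timezone" table: the naive column shifts by eight hours because it was interpreted as UTC when it was stored.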



### 1.2 Convert spark dataframe to pandas 

In the first block, the session time zone is US/Pacific.

In the second block, the session time zone is UTC.

In [4]:
# set timezone to US/Pacific
spark.conf.set("spark.sql.session.timeZone", "US/Pacific")
# convert the Spark dataframe back to a pandas dataframe
# Spark timestamps carry no time zone, so the resulting pandas timestamps cannot have one either
ppst_df1 = pst_df.toPandas()
print(ppst_df1.head())

# compare the datetime of the original pandas dataframe with the one generated by Spark
print(f"spark converted pandas data frame {ppst_df1['aware'][0]}")
print(f"pandas origin data frame{pdf['aware'][0]}")

# the result should be 0, but because the Spark-converted dataframe lost the time zone info, we get an 8 hour difference
print(f"time zone hours {(ppst_df1['aware'][0].timestamp() - pdf['aware'][0].timestamp()) / 3600}")

       naive      aware
0 2046-01-01 2046-01-01
spark converted pandas data frame 2046-01-01 00:00:00
pandas origin data frame2046-01-01 00:00:00.000000500-08:00
time zone hours -8.0


Note that the surprising shift for aware does not happen when the session time zone is UTC (but the timestamps still become “time zone naive”):


In [5]:
# set the session timezone to UTC again
spark.conf.set("spark.sql.session.timeZone", "UTC")

ppst_df2 = pst_df.toPandas()
print(ppst_df2.head())

# compare the datetime of the original pandas dataframe with the one generated by Spark
print(f"spark converted pandas data frame {ppst_df2['aware'][0]}")
print(f"pandas origin data frame{pdf['aware'][0]}")

# with the session time zone set to UTC the difference is 0, even though the time zone info is still lost
print(f"time zone hours {(ppst_df2['aware'][0].timestamp() - pdf['aware'][0].timestamp()) / 3600}")

                naive               aware
0 2046-01-01 08:00:00 2046-01-01 08:00:00
spark converted pandas data frame 2046-01-01 08:00:00
pandas origin data frame2046-01-01 00:00:00.000000500-08:00
time zone hours 0.0
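
The -8.0 and 0.0 results can be reproduced without Spark. The sketch below hard-codes the wall-clock values returned by `toPandas()` in the two cells above and uses a fixed UTC-8 offset as a stand-in for US/Pacific (an assumption that holds for January). The helper `hours_off` is illustrative.

```python
from datetime import timedelta, timezone

import pandas as pd

PACIFIC_WINTER = timezone(timedelta(hours=-8))  # assumption: US/Pacific in January

original = pd.Timestamp(year=2046, month=1, day=1, tz=PACIFIC_WINTER)

# Naive wall-clock values as returned by toPandas() under each session zone.
from_pacific_session = pd.Timestamp("2046-01-01 00:00:00")
from_utc_session = pd.Timestamp("2046-01-01 08:00:00")


def hours_off(converted: pd.Timestamp) -> float:
    # pandas reads a naive Timestamp as UTC when computing .timestamp()
    return (converted.timestamp() - original.timestamp()) / 3600


# Re-attaching the session zone recovers the original instant.
restored = from_pacific_session.tz_localize(PACIFIC_WINTER)

print(hours_off(from_pacific_session), hours_off(from_utc_session), restored)
```

So the shift is not a computation error: the naive value is simply read as UTC, and localizing it with the zone that produced it restores the instant.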


# 2. pandas, spark datetime conversion via Parquet file 


In the tests above, we converted data through the frameworks' in-memory converters.

Now, if we write the data to a Parquet file with pyarrow and read it with Spark, and vice versa, is it still compatible?

## 2.1 Pyarrow pandas parquet file read by spark
In this test, we first use pyarrow to write a pandas dataframe to Parquet files with Parquet format versions 1.0 and 2.0 (these are file format versions, not Arrow framework versions). Then we use Spark to read these Parquet files.



### 2.1.1 Write parquet by using pyarrow

In [6]:
# 1. We create a pandas data frame and write it to a parquet file
pdf = pd.DataFrame({'naive': [datetime(2046, 1, 1, 0)],
                    'aware': [Timestamp(year=2046, month=1, day=1,
                                        nanosecond=500, tz=timezone(timedelta(hours=-8)))]})
# pandas data frame print the datetime
print(pdf.head())

       naive                               aware
0 2046-01-01 2046-01-01 00:00:00.000000500-08:00


In [19]:
# 2. write it as a parquet file
def write_parquet_as_partitioned_dataset(table, endpoint, bucket_name, path, partition_cols=None,
                                         compression="SNAPPY", version="1.0"):
    url = f"https://{endpoint}"
    fs = s3fs.S3FileSystem(client_kwargs={'endpoint_url': url})
    file_uri = f"{bucket_name}/{path}"
    if version == "1.0":
        # Without coerce_timestamps='ms' the write fails, because the nanosecond
        # values cannot be converted automatically without losing precision.
        # allow_truncated_timestamps=True accepts the loss of time precision.
        pq.write_to_dataset(table, root_path=file_uri, partition_cols=partition_cols, filesystem=fs,
                            compression=compression, version=version,
                            coerce_timestamps='ms', allow_truncated_timestamps=True)
    elif version == "2.0":
        pq.write_to_dataset(table, root_path=file_uri, partition_cols=partition_cols, filesystem=fs,
                            compression=compression, version=version)
    else:
        raise ValueError("The parquet version must be 1.0 or 2.0")

# omit the index by using preserve_index=False
table = pa.Table.from_pandas(pdf, preserve_index=False)

In [22]:
# Writing to parquet format version 1.0 loses data in the timestamp cast between pandas and arrow:
# Casting from timestamp[ns, tz=-08:00] to timestamp[us] would lose data: 2398406400000000500
# With format version 2.0 this message no longer appears.

endpoint=os.environ['AWS_S3_ENDPOINT']
bucket_name="pengfei"
path_v1="diffusion/data_format/timestamp_compability/arrow_time_v1.0"
path_v2="diffusion/data_format/timestamp_compability/arrow_time_v2.0"
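
The "would lose data" message comes down to integer division with a non-zero remainder. The sketch below imitates that check in plain Python. It is an illustration of the idea, not pyarrow's implementation; `coerce_ns` and `UNITS_PER_NS` are hypothetical names.

```python
UNITS_PER_NS = {"us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def coerce_ns(value_ns: int, unit: str, allow_truncated: bool = False) -> int:
    """Convert integer nanoseconds to a coarser unit, refusing silent data loss."""
    quotient, remainder = divmod(value_ns, UNITS_PER_NS[unit])
    if remainder and not allow_truncated:
        raise ValueError(f"casting to {unit} would lose data: {value_ns}")
    return quotient


naive_ns = 2398377600000000000  # no sub-second part, converts cleanly
aware_ns = 2398406400000000500  # carries 500 ns that cannot be kept

print(coerce_ns(naive_ns, "us"))
print(coerce_ns(aware_ns, "us", allow_truncated=True))
print(coerce_ns(aware_ns, "ms", allow_truncated=True))
```

Calling `coerce_ns(aware_ns, "us")` without `allow_truncated=True` raises, which mirrors why the version 1.0 write needs the truncation flag.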


In [None]:
# write parquet with format version 1.0
write_parquet_as_partitioned_dataset(table, endpoint, bucket_name, path_v1,version="1.0")

In [21]:
# write parquet with format version 2.0
write_parquet_as_partitioned_dataset(table, endpoint, bucket_name, path_v2,version="2.0")

In [26]:
# 3. Arrow read it back to pandas df
# This function reads a parquet data set (partitioned parquet files) from s3, and returns an arrow table
def read_parquet_from_s3(endpoint: str, bucket_name, path):
    url = f"https://{endpoint}"
    fs = s3fs.S3FileSystem(client_kwargs={'endpoint_url': url})
    file_uri = f"{bucket_name}/{path}"
    str_info = fs.info(file_uri)
    print(f"input file metadata: {str_info}")
    dataset = pq.ParquetDataset(file_uri, filesystem=fs, metadata_nthreads=8)
    table = dataset.read()
    return table

### 2.1.2 Spark reads the different parquet files

Note: when pyarrow writes Parquet format v1.0, the nanosecond timestamps of the pandas dataframe are coerced to a coarser unit (milliseconds here, because the writer above sets `coerce_timestamps='ms'`). Spark reads such a column as **timestamp (Spark column type)**.

Check the schema of the dataframe read from the Parquet file.

In [4]:
spath_v1=f"s3a://pengfei/{path_v1}"
spath_v2=f"s3a://pengfei/{path_v2}"

In [5]:
# check the compatibility of arrow parquet v1
df_v1=spark.read.parquet(spath_v1)
df_v1=df_v1.withColumn("now", lit(unix_timestamp()))
# in the dataframe schema, both the naive and the aware column are recognized as type timestamp automatically,
# because the pandas/arrow conversion coerced the nanoseconds to a unit that Spark maps to its timestamp column type
df_v1.printSchema()
df_v1.show()

root
 |-- naive: timestamp (nullable = true)
 |-- aware: timestamp (nullable = true)
 |-- now: long (nullable = true)

+-------------------+-------------------+----------+
|              naive|              aware|       now|
+-------------------+-------------------+----------+
|2046-01-01 00:00:00|2046-01-01 08:00:00|1633435859|
+-------------------+-------------------+----------+



In [15]:
spark.conf.set("spark.sql.session.timeZone", "US/Pacific")
df_v1.show()

+-------------------+-------------------+
|              naive|              aware|
+-------------------+-------------------+
|2045-12-31 16:00:00|2046-01-01 00:00:00|
+-------------------+-------------------+



With Parquet format v2.0, pyarrow keeps the nanosecond precision of the pandas timestamps. When Spark reads the file, it treats these columns as `long`. If we apply a Spark timestamp conversion function such as `from_unixtime()` directly, we get a wrong result, because `from_unixtime()` interprets a `long` as seconds since the Unix epoch. To get the right timestamp, we first need to convert nanoseconds to seconds.

In [31]:
# check the compatibility of arrow parquet v2
df_v2=spark.read.parquet(spath_v2)
df_v2=df_v2.withColumn("now", lit(unix_timestamp()))
df_v2.printSchema()
df_v2.show()

root
 |-- naive: long (nullable = true)
 |-- aware: long (nullable = true)
 |-- now: long (nullable = true)

+-------------------+-------------------+----------+
|              naive|              aware|       now|
+-------------------+-------------------+----------+
|2398377600000000000|2398406400000000500|1632499078|
+-------------------+-------------------+----------+



In [34]:

df_v2_convert = df_v2.select( \
        from_unixtime(col("naive"), "MM-dd-yyyy HH:mm:ss").alias("naive_convert"), \
        from_unixtime(col("aware"), "MM-dd-yyyy HH:mm:ss").alias("aware_convert"), \
        from_unixtime(col("now"), "MM-dd-yyyy HH:mm:ss").alias("now_convert"))

df_v2_convert.show(truncate=False)

+----------------------+---------------------+-------------------+
|naive_convert         |aware_convert        |now_convert        |
+----------------------+---------------------+-------------------+
|03-16-+183309 11:28:57|10-10--73164 03:33:37|09-24-2021 16:01:22|
+----------------------+---------------------+-------------------+



In [50]:
# convert nanoseconds to seconds first, because from_unixtime() expects seconds
df_v2_nano_to_micro=df_v2.withColumn("sec_naive",col("naive")/1000000000) \
                         .withColumn("sec_aware",col("aware")/1000000000) \
                         .withColumn("convert_naive", from_unixtime(col("sec_naive"), "yyyy-MM-dd HH:mm:ss")) \
                         .withColumn("convert_aware", from_unixtime(col("sec_aware"), "yyyy-MM-dd HH:mm:ss")) \
                         .withColumn("convert_now", from_unixtime(col("now"), "yyyy-MM-dd HH:mm:ss"))

In [52]:
df_v2_nano_to_micro.select("convert_naive","convert_aware","convert_now").show()

+-------------------+-------------------+-------------------+
|      convert_naive|      convert_aware|        convert_now|
+-------------------+-------------------+-------------------+
|2046-01-01 00:00:00|2046-01-01 08:00:00|2021-09-24 16:17:41|
+-------------------+-------------------+-------------------+
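
The unit mismatch can also be checked outside Spark. In this standard-library sketch, reading the nanosecond count as seconds is so far out of range that Python refuses it (Spark printed nonsensical years instead), while dividing by 10^9 first gives the expected date. `seconds_to_utc_string` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_to_utc_string(seconds: int) -> Optional[str]:
    """Format epoch seconds as a UTC string, or None if the value is out of range."""
    try:
        return (EPOCH + timedelta(seconds=seconds)).strftime("%Y-%m-%d %H:%M:%S")
    except OverflowError:
        return None


naive_ns = 2398377600000000000
aware_ns = 2398406400000000500

misread = seconds_to_utc_string(naive_ns)                   # nanoseconds read as seconds
correct = seconds_to_utc_string(naive_ns // 1_000_000_000)  # converted to seconds first

print(misread, correct, seconds_to_utc_string(aware_ns // 1_000_000_000))
```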



## 2.2 Spark write parquet with timestamp and use arrow to read it

https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriter.html

According to the Spark documentation, when writing Parquet through `DataFrameWriter` we can only specify the compression type; there is no write option there that controls how timestamps are stored.

### 2.2.1 Spark write timestamp

When Spark writes a timestamp to a Parquet file, it first checks the session time zone. If the session time zone is not UTC (for example US/Pacific), Spark converts the timestamp from the session time zone to UTC before writing.
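
The conversion described above amounts to reading a wall-clock string in the session zone and storing the resulting UTC-based integer. A minimal sketch with the standard library follows; `PACIFIC_WINTER` is a fixed UTC-8 offset assumed to stand in for US/Pacific in January, and `wall_clock_to_epoch` is a hypothetical helper, not a Spark function.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PACIFIC_WINTER = timezone(timedelta(hours=-8))  # assumption: US/Pacific in January


def wall_clock_to_epoch(text: str, session_tz: timezone) -> int:
    """Interpret a zone-less string in session_tz and return epoch seconds (UTC-based)."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=session_tz)
    return int(local.timestamp())


print(wall_clock_to_epoch("2046-01-01 00:00:00", PACIFIC_WINTER))
print(wall_clock_to_epoch("2046-01-01 00:00:00", UTC))
```

The same string yields integers 28800 seconds (eight hours) apart, which is why the session zone at write time matters.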

In [16]:
data = [("2046-01-01 00:00:00", "2046-01-01 08:00:00")]

columns = ["t1", "t2"]

# Set the session time zone. It has no effect on the string columns; the long and timestamp columns are converted to UTC from this time zone.
spark.conf.set("spark.sql.session.timeZone", "US/Pacific")

df_v1=spark.createDataFrame(data=data,schema=columns)

df_spark_time=df_v1.withColumn("t1_long", unix_timestamp("t1"))\
                   .withColumn("t1_timestamp", to_timestamp("t1")) \
                   .withColumn("t2_long", unix_timestamp("t2"))\
                   .withColumn("t2_timestamp", to_timestamp("t2"))\
                   .select("t1","t1_long","t1_timestamp","t2","t2_long","t2_timestamp")



In [17]:
df_spark_time.printSchema()
df_spark_time.show(truncate=False)

root
 |-- t1: string (nullable = true)
 |-- t1_long: long (nullable = true)
 |-- t1_timestamp: timestamp (nullable = true)
 |-- t2: string (nullable = true)
 |-- t2_long: long (nullable = true)
 |-- t2_timestamp: timestamp (nullable = true)

+-------------------+----------+-------------------+-------------------+----------+-------------------+
|t1                 |t1_long   |t1_timestamp       |t2                 |t2_long   |t2_timestamp       |
+-------------------+----------+-------------------+-------------------+----------+-------------------+
|2046-01-01 00:00:00|2398406400|2046-01-01 00:00:00|2046-01-01 08:00:00|2398435200|2046-01-01 08:00:00|
+-------------------+----------+-------------------+-------------------+----------+-------------------+



In [22]:
spark_parquet_utc_path="s3a://pengfei/diffusion/data_format/timestamp_compability/spark_time_v1.0"
spark_parquet_upc_path="s3a://pengfei/diffusion/data_format/timestamp_compability/spark_time_upc_v1.0"

# df_spark_time.coalesce(1).write.parquet(spark_parquet_utc_path)
# df_spark_time.coalesce(1).write.parquet(spark_parquet_upc_path)

In [24]:
spark.conf.set("spark.sql.session.timeZone", "UTC")

df_read_utc_parquet=spark.read.parquet(spark_parquet_utc_path)
df_read_utc_parquet.show()

+-------------------+----------+-------------------+-------------------+----------+-------------------+
|                 t1|   t1_long|       t1_timestamp|                 t2|   t2_long|       t2_timestamp|
+-------------------+----------+-------------------+-------------------+----------+-------------------+
|2046-01-01 00:00:00|2398377600|2046-01-01 00:00:00|2046-01-01 08:00:00|2398406400|2046-01-01 08:00:00|
+-------------------+----------+-------------------+-------------------+----------+-------------------+



In [25]:
df_read_upc_parquet=spark.read.parquet(spark_parquet_upc_path)
df_read_upc_parquet.show()

+-------------------+----------+-------------------+-------------------+----------+-------------------+
|                 t1|   t1_long|       t1_timestamp|                 t2|   t2_long|       t2_timestamp|
+-------------------+----------+-------------------+-------------------+----------+-------------------+
|2046-01-01 00:00:00|2398406400|2046-01-01 08:00:00|2046-01-01 08:00:00|2398435200|2046-01-01 16:00:00|
+-------------------+----------+-------------------+-------------------+----------+-------------------+



### 2.2.2 Arrow read the parquet file

Arrow reads the Parquet file and then converts it to pandas.



In [11]:
path="diffusion/data_format/timestamp_compability/spark_time_v1.0"

pandas_df=read_parquet_from_s3(endpoint, bucket_name, path).to_pandas()


input file metadata: {'Key': 'pengfei/diffusion/data_format/timestamp_compability/spark_time_v1.0', 'name': 'pengfei/diffusion/data_format/timestamp_compability/spark_time_v1.0', 'type': 'directory', 'Size': 0, 'size': 0, 'StorageClass': 'DIRECTORY'}


All columns of the pandas dataframe have the right type. In particular, the t1_timestamp and t2_timestamp columns have the datetime64 type.

In [12]:
pandas_df.head()

|   | t1 | t1_long | t1_timestamp | t2 | t2_long | t2_timestamp |
|---|---|---|---|---|---|---|
| 0 | 2046-01-01 00:00:00 | 2398377600 | 2046-01-01 | 2046-01-01 08:00:00 | 2398406400 | 2046-01-01 08:00:00 |


In [13]:
pandas_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   t1            1 non-null      object        
 1   t1_long       1 non-null      int64         
 2   t1_timestamp  1 non-null      datetime64[ns]
 3   t2            1 non-null      object        
 4   t2_long       1 non-null      int64         
 5   t2_timestamp  1 non-null      datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(2)
memory usage: 176.0+ bytes


To convert the long to a pandas timestamp, we can use the function `to_datetime`. The important point is the unit: the long column written by Spark holds seconds, so any other unit produces an incorrect timestamp. Try changing the unit from `s` to `ns` or `ms` to see the effect.

In [14]:
pandas_df['t1_pandas_UNIXTIME'] = pd.to_datetime(pandas_df['t1_long'], unit='s')

In [15]:
pandas_df.head()

|   | t1 | t1_long | t1_timestamp | t2 | t2_long | t2_timestamp | t1_pandas_UNIXTIME |
|---|---|---|---|---|---|---|---|
| 0 | 2046-01-01 00:00:00 | 2398377600 | 2046-01-01 | 2046-01-01 08:00:00 | 2398406400 | 2046-01-01 08:00:00 | 2046-01-01 |
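
To see how much the `unit` argument matters, the short sketch below converts the same integer (the `t1_long` value from the table) with three different units.

```python
import pandas as pd

t1_long = 2398377600  # epoch seconds, as in the t1_long column

as_seconds = pd.to_datetime(t1_long, unit="s")   # the intended reading
as_millis = pd.to_datetime(t1_long, unit="ms")   # wrong unit: lands in January 1970
as_nanos = pd.to_datetime(t1_long, unit="ns")    # wrong unit: about 2.4 s after the epoch

print(as_seconds, as_millis, as_nanos, sep="\n")
```

No error is raised for the wrong units, so the mistake has to be caught by checking the resulting dates.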


In [51]:
pandas_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   t1                  1 non-null      object        
 1   t1_long             1 non-null      int64         
 2   t1_timestamp        1 non-null      datetime64[ns]
 3   t2                  1 non-null      object        
 4   t2_long             1 non-null      int64         
 5   t2_timestamp        1 non-null      datetime64[ns]
 6   UNIXTIME            1 non-null      datetime64[ns]
 7   t1_pandas_UNIXTIME  1 non-null      datetime64[ns]
dtypes: datetime64[ns](4), int64(2), object(2)
memory usage: 192.0+ bytes


# 3. Timestamp String with timezone 

The best way to avoid these conversion pitfalls is to store timestamps as strings with an explicit time zone offset. For each use case, the data analyst can then convert them to numeric timestamps as needed.
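
A string with an explicit offset identifies one instant regardless of any session or system time zone. The standard-library sketch below shows this for the two example strings used in this section; `offset_string_to_epoch` is an illustrative helper name.

```python
from datetime import datetime


def offset_string_to_epoch(text: str) -> int:
    """Parse an ISO-like string with a UTC offset and return epoch seconds."""
    return int(datetime.fromisoformat(text).timestamp())


t1_epoch = offset_string_to_epoch("2046-01-01 00:15:00+01:00")
t2_epoch = offset_string_to_epoch("2046-01-01 00:15:00-01:00")

print(t1_epoch, t2_epoch, (t2_epoch - t1_epoch) / 3600)
```

The two strings share the same wall-clock time but are two hours apart as instants, which is information a zone-less timestamp cannot carry.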

## 3.1 Spark Write timestamp to parquet 

In [13]:
data=[("2046-01-01 00:15:00+01:00","2046-01-01 00:15:00-01:00")]
schema=["t1","t2"]

df_tz_raw=spark.createDataFrame(data=data,schema=schema)

df_tz_raw.show(truncate=False)

+-------------------------+-------------------------+
|t1                       |t2                       |
+-------------------------+-------------------------+
|2046-01-01 00:15:00+01:00|2046-01-01 00:15:00-01:00|
+-------------------------+-------------------------+



In [14]:
# Set session timezone to system default
spark.conf.set("spark.sql.session.timeZone", "UTC")

# Change the session time zone to one that differs from the system time zone,
# and pandas will return a wrong timestamp after reading the parquet file
# spark.conf.set("spark.sql.session.timeZone", "US/Pacific")


df_tz=df_tz_raw.withColumn("t1_unix",unix_timestamp("t1","yyyy-MM-dd HH:mm:ssXXX")) \
     .withColumn("t2_unix",unix_timestamp("t2","yyyy-MM-dd HH:mm:ssXXX")) \
     .withColumn("t1_ts",to_timestamp("t1","yyyy-MM-dd HH:mm:ssXXX")) \
     .withColumn("t2_ts",to_timestamp("t2","yyyy-MM-dd HH:mm:ssXXX"))

df_tz.show(truncate=False)

+-------------------------+-------------------------+----------+----------+-------------------+-------------------+
|t1                       |t2                       |t1_unix   |t2_unix   |t1_ts              |t2_ts              |
+-------------------------+-------------------------+----------+----------+-------------------+-------------------+
|2046-01-01 00:15:00+01:00|2046-01-01 00:15:00-01:00|2398374900|2398382100|2045-12-31 23:15:00|2046-01-01 01:15:00|
+-------------------------+-------------------------+----------+----------+-------------------+-------------------+



In [16]:
spark_parquet_tz="s3a://pengfei/diffusion/data_format/timestamp_compability/spark_timestamp_with_timezone_v1.0"
df_tz.coalesce(1).write.parquet(spark_parquet_tz)

In [19]:
df_read_tz_parquet=spark.read.parquet(spark_parquet_tz)
df_read_tz_parquet.show(truncate=False)

+-------------------------+-------------------------+----------+----------+-------------------+-------------------+
|t1                       |t2                       |t1_unix   |t2_unix   |t1_ts              |t2_ts              |
+-------------------------+-------------------------+----------+----------+-------------------+-------------------+
|2046-01-01 00:15:00+01:00|2046-01-01 00:15:00-01:00|2398374900|2398382100|2045-12-31 23:15:00|2046-01-01 01:15:00|
+-------------------------+-------------------------+----------+----------+-------------------+-------------------+



## 3.2 Arrow/pandas reads parquet with time-zone-aware timestamps 

The results for all three kinds of column (string, long, and timestamp) are correct, because we wrote the timestamps using the system time zone and read them back with the same system time zone.

If we set the Spark session time zone to one that differs from the system time zone and then read the file with Arrow using the system time zone, the results no longer match.


Finally, we show how to convert a timestamp string.

In [23]:
path="diffusion/data_format/timestamp_compability/spark_timestamp_with_timezone_v1.0"

pandas_df=read_parquet_from_s3(endpoint, bucket_name, path).to_pandas()

input file metadata: {'Key': 'pengfei/diffusion/data_format/timestamp_compability/spark_timestamp_with_timezone_v1.0', 'name': 'pengfei/diffusion/data_format/timestamp_compability/spark_timestamp_with_timezone_v1.0', 'type': 'directory', 'Size': 0, 'size': 0, 'StorageClass': 'DIRECTORY'}


In [25]:
pandas_df.head()

|   | t1 | t2 | t1_unix | t2_unix | t1_ts | t2_ts |
|---|---|---|---|---|---|---|
| 0 | 2046-01-01 00:15:00+01:00 | 2046-01-01 00:15:00-01:00 | 2398374900 | 2398382100 | 2045-12-31 23:15:00 | 2046-01-01 01:15:00 |


In [29]:
# pandas provides the to_datetime() function, which converts a string with a time zone offset to a time-zone-aware timestamp.
t1=pd.to_datetime('2046-01-01 00:15:00+01:00', utc=True)
t2=pd.to_datetime('2046-01-01 00:15:00-01:00', utc=True)
print(f"t1 type is : {type(t1)}, t1 value is: {t1}")
print(f"t2 type is : {type(t2)}, t2 value is: {t2}")

t1 type is : <class 'pandas._libs.tslibs.timestamps.Timestamp'>, t1 value is: 2045-12-31 23:15:00+00:00
t2 type is : <class 'pandas._libs.tslibs.timestamps.Timestamp'>, t2 value is: 2046-01-01 01:15:00+00:00
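
The same call also works on whole columns. The sketch below builds a small stand-in dataframe with the two string columns (rather than reading the Parquet file) and adds UTC-normalized, time-zone-aware columns; the `_utc` column names are made up for this example.

```python
import pandas as pd

# Stand-in for the t1/t2 string columns read from the parquet file.
frame = pd.DataFrame({"t1": ["2046-01-01 00:15:00+01:00"],
                      "t2": ["2046-01-01 00:15:00-01:00"]})

# utc=True normalizes every parsed value to UTC and keeps the column time zone aware.
for name in ("t1", "t2"):
    frame[f"{name}_utc"] = pd.to_datetime(frame[name], utc=True)

print(frame[["t1_utc", "t2_utc"]])
```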
