Use each of these functions:
* `unix_timestamp()` 
* `date_format()`
* `to_unix_timestamp()`
* `from_unixtime()`
* `to_date()` 
* `to_timestamp()` 
* `from_utc_timestamp()` 
* `to_utc_timestamp()`

In [3]:
from pyspark.sql.functions import current_date, current_timestamp
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Notatnik Daty") \
    .getOrCreate()

kolumny = ["timestamp", "unix", "Date"]
dane = [("2015-03-22T14:13:34", 1646641525847, "May, 2021"),
        ("2015-03-22T15:03:18", 1646641557555, "Mar, 2021"),
        ("2015-03-22T14:38:39", 1646641578622, "Jan, 2021")]

dataFrame = spark.createDataFrame(dane, kolumny) \
    .withColumn("current_date", current_date()) \
    .withColumn("current_timestamp", current_timestamp())

display(dataFrame)
dataFrame.show(truncate=False)



DataFrame[timestamp: string, unix: bigint, Date: string, current_date: date, current_timestamp: timestamp]

                                                                                

+-------------------+-------------+---------+------------+--------------------------+
|timestamp          |unix         |Date     |current_date|current_timestamp         |
+-------------------+-------------+---------+------------+--------------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|2025-03-12  |2025-03-12 13:57:28.120763|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|2025-03-12  |2025-03-12 13:57:28.120763|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|2025-03-12  |2025-03-12 13:57:28.120763|
+-------------------+-------------+---------+------------+--------------------------+



In [4]:
dataFrame.printSchema()

root
 |-- timestamp: string (nullable = true)
 |-- unix: long (nullable = true)
 |-- Date: string (nullable = true)
 |-- current_date: date (nullable = false)
 |-- current_timestamp: timestamp (nullable = false)



## unix_timestamp(..) & cast(..)

Converting a **string** to a **timestamp**.

Where the functions are located:
* `pyspark.sql.functions` in the case of Python
* `org.apache.spark.sql.functions` in the case of Scala & Java

## 1. Changing the format of the timestamp value yyyy-MM-dd'T'HH:mm:ss
`unix_timestamp(..)`

API documentation for `unix_timestamp(..)`:
> Convert time string with given pattern (see <a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html" target="_blank">SimpleDateFormat</a>) to Unix time stamp (in seconds), return null if fail.

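`SimpleDateFormat` is part of the Java API and provides support for parsing and formatting date and time values. Spark 3.0 and later use Spark's own datetime patterns instead, but the patterns used in this notebook are written the same way in both.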

In [5]:
from pyspark.sql.functions import unix_timestamp, from_unixtime
dataFrame = dataFrame.withColumn("unix_timestamp", unix_timestamp("timestamp", "yyyy-MM-dd'T'HH:mm:ss"))
dataFrame.show()

+-------------------+-------------+---------+------------+--------------------+--------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|
+-------------------+-------------+---------+------------+--------------------+--------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-12|2025-03-12 13:57:...|    1427030014|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-12|2025-03-12 13:57:...|    1427032998|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-12|2025-03-12 13:57:...|    1427031519|
+-------------------+-------------+---------+------------+--------------------+--------------+
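The epoch values above depend on the time zone in which Spark interprets the strings (the session time zone, `spark.sql.session.timeZone`, which defaults to the JVM's local zone). The sketch below uses only the Python standard library to cross-check the first row. It assumes the notebook ran with Europe/Warsaw as the session time zone; that is inferred from the numbers, not stated anywhere in the notebook. The helper `epoch_seconds` is a hypothetical name introduced for illustration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def epoch_seconds(text: str, zone: str) -> int:
    """Read `text` as wall-clock time in `zone` and return Unix seconds."""
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
    return int(naive.replace(tzinfo=ZoneInfo(zone)).timestamp())


# Assumption: the session time zone was Europe/Warsaw (UTC+1 on this date).
warsaw = epoch_seconds("2015-03-22T14:13:34", "Europe/Warsaw")
utc = epoch_seconds("2015-03-22T14:13:34", "UTC")

print(warsaw)  # 1427030014, the value shown in the unix_timestamp column
print(utc)     # 1427033614, one hour later
```

If the same notebook is run on a machine in a different time zone, the `unix_timestamp` column will most likely differ by the corresponding offset.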



2. Change the format according to the `SimpleDateFormat` class: **yyyy-MM-dd HH:mm:ss**
  * a. Display the schema and the data to check whether the values changed

In [6]:
dataFrame = dataFrame.withColumn("timestamp_formatted", from_unixtime("unix_timestamp", "yyyy-MM-dd HH:mm:ss"))
dataFrame.show()

+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|timestamp_formatted|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-12|2025-03-12 13:57:...|    1427030014|2015-03-22 14:13:34|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-12|2025-03-12 13:57:...|    1427032998|2015-03-22 15:03:18|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-12|2025-03-12 13:57:...|    1427031519|2015-03-22 14:38:39|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+



In [58]:
display(dataFrame)

DataFrame[timestamp: string, unix: bigint, Date: string, current_date: date, current_timestamp: timestamp, unix_timestamp: bigint, timestamp_formatted: string, year: int, month: int, dayofyear: int, timestamp_unix: timestamp]

## Add new columns to the DataFrame with the values of year(..), month(..), dayofyear(..)

In [59]:
from pyspark.sql.functions import year, month, dayofyear
#date_format
dataFrame = dataFrame.withColumn("year", year("timestamp")) \
                     .withColumn("month", month("timestamp")) \
                     .withColumn("dayofyear", dayofyear("timestamp"))

dataFrame.show()

+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|timestamp_formatted|year|month|dayofyear|     timestamp_unix|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-12|2025-03-12 14:08:...|    1427030014|2015-03-22 14:13:34|2015|    3|       81|2015-03-22 14:13:34|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-12|2025-03-12 14:08:...|    1427032998|2015-03-22 15:03:18|2015|    3|       81|2015-03-22 15:03:18|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-12|2025-03-12 14:08:...|    1427031519|2015-03-22 14:38:39|2015|    3|       81|2015-03-22 14:38:39|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+

In [60]:
#to_date()
from pyspark.sql.functions import to_timestamp, to_date

dataFrame = dataFrame.withColumn("timestamp_unix", to_timestamp("timestamp", "yyyy-MM-dd'T'HH:mm:ss"))
toDate = dataFrame.withColumn("date_only", to_date("timestamp_unix"))

toDate.show()

+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+----------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|timestamp_formatted|year|month|dayofyear|     timestamp_unix| date_only|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+----------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-12|2025-03-12 14:08:...|    1427030014|2015-03-22 14:13:34|2015|    3|       81|2015-03-22 14:13:34|2015-03-22|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-12|2025-03-12 14:08:...|    1427032998|2015-03-22 15:03:18|2015|    3|       81|2015-03-22 15:03:18|2015-03-22|
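|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-12|2025-03-12 14:08:...|    1427031519|2015-03-22 14:38:39|2015|    3|       81|2015-03-22 14:38:39|2015-03-22|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+----------+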

In [61]:
#from_unixtime()
from pyspark.sql.functions import from_unixtime

fromUnix = dataFrame.withColumn("from_unixtime", from_unixtime("unix_timestamp"))
fromUnix.show()

+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+-------------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|timestamp_formatted|year|month|dayofyear|     timestamp_unix|      from_unixtime|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+-------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-12|2025-03-12 14:08:...|    1427030014|2015-03-22 14:13:34|2015|    3|       81|2015-03-22 14:13:34|2015-03-22 14:13:34|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-12|2025-03-12 14:08:...|    1427032998|2015-03-22 15:03:18|2015|    3|       81|2015-03-22 15:03:18|2015-03-22 15:03:18|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-12|2025-03-12 14:08:...|    1427031519|2015-03-22 14:38:39|2015| 

In [62]:
# to_timestamp(): timestamp_unix is already a timestamp, so the values stay the same
from pyspark.sql.functions import to_timestamp

toTimestamp = dataFrame.withColumn("to_timestamp", to_timestamp("timestamp_unix"))
toTimestamp.show()
display(toTimestamp)


+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+-------------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|timestamp_formatted|year|month|dayofyear|     timestamp_unix|       to_timestamp|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+-------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-12|2025-03-12 14:08:...|    1427030014|2015-03-22 14:13:34|2015|    3|       81|2015-03-22 14:13:34|2015-03-22 14:13:34|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-12|2025-03-12 14:08:...|    1427032998|2015-03-22 15:03:18|2015|    3|       81|2015-03-22 15:03:18|2015-03-22 15:03:18|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-12|2025-03-12 14:08:...|    1427031519|2015-03-22 14:38:39|2015| 

DataFrame[timestamp: string, unix: bigint, Date: string, current_date: date, current_timestamp: timestamp, unix_timestamp: bigint, timestamp_formatted: string, year: int, month: int, dayofyear: int, timestamp_unix: timestamp, to_timestamp: timestamp]

In [63]:
# to_utc_timestamp(): treat timestamp_unix as Europe/Warsaw local time and convert it to UTC
from pyspark.sql.functions import to_utc_timestamp

toUtcTimestamp = dataFrame.withColumn("utc", to_utc_timestamp("timestamp_unix", "Europe/Warsaw"))
toUtcTimestamp.show()
display(toUtcTimestamp)

+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+-------------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|timestamp_formatted|year|month|dayofyear|     timestamp_unix|                utc|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+-------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-12|2025-03-12 14:08:...|    1427030014|2015-03-22 14:13:34|2015|    3|       81|2015-03-22 14:13:34|2015-03-22 13:13:34|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-12|2025-03-12 14:08:...|    1427032998|2015-03-22 15:03:18|2015|    3|       81|2015-03-22 15:03:18|2015-03-22 14:03:18|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-12|2025-03-12 14:08:...|    1427031519|2015-03-22 14:38:39|2015| 

DataFrame[timestamp: string, unix: bigint, Date: string, current_date: date, current_timestamp: timestamp, unix_timestamp: bigint, timestamp_formatted: string, year: int, month: int, dayofyear: int, timestamp_unix: timestamp, utc: timestamp]

In [64]:
# from_utc_timestamp(): treat timestamp_unix as UTC and convert it to US/Alaska local time
from pyspark.sql.functions import from_utc_timestamp

fromUtcTimestamp = dataFrame.withColumn("from_utc", from_utc_timestamp("timestamp_unix", "US/Alaska"))
fromUtcTimestamp.show()

+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+-------------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|timestamp_formatted|year|month|dayofyear|     timestamp_unix|           from_utc|
+-------------------+-------------+---------+------------+--------------------+--------------+-------------------+----+-----+---------+-------------------+-------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-12|2025-03-12 14:08:...|    1427030014|2015-03-22 14:13:34|2015|    3|       81|2015-03-22 14:13:34|2015-03-22 06:13:34|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-12|2025-03-12 14:08:...|    1427032998|2015-03-22 15:03:18|2015|    3|       81|2015-03-22 15:03:18|2015-03-22 07:03:18|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-12|2025-03-12 14:08:...|    1427031519|2015-03-22 14:38:39|2015| 
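The two UTC functions are easy to mix up: `to_utc_timestamp(..)` reads its input as local time in the given zone and shifts it to UTC, while `from_utc_timestamp(..)` reads its input as UTC and shifts it to the given zone. The sketch below reproduces the first row of both outputs with the Python standard library only. The helper names `to_utc` and `from_utc` are hypothetical, and this is a cross-check of the arithmetic, not a description of how Spark implements the functions.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

FMT = "%Y-%m-%d %H:%M:%S"


def to_utc(wall_clock: str, zone: str) -> str:
    """Treat `wall_clock` as local time in `zone`; return the UTC wall-clock time."""
    local = datetime.strptime(wall_clock, FMT).replace(tzinfo=ZoneInfo(zone))
    return local.astimezone(timezone.utc).strftime(FMT)


def from_utc(wall_clock: str, zone: str) -> str:
    """Treat `wall_clock` as UTC; return the wall-clock time in `zone`."""
    utc = datetime.strptime(wall_clock, FMT).replace(tzinfo=timezone.utc)
    return utc.astimezone(ZoneInfo(zone)).strftime(FMT)


utc_value = to_utc("2015-03-22 14:13:34", "Europe/Warsaw")
alaska_value = from_utc("2015-03-22 14:13:34", "US/Alaska")

print(utc_value)     # 2015-03-22 13:13:34, matches the utc column
print(alaska_value)  # 2015-03-22 06:13:34, matches the from_utc column
```

Warsaw was at UTC+1 on that date and Alaska was already on daylight saving time at UTC-8, which explains the one-hour and eight-hour shifts.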
