## Date and Time - Extracting Information

Let us understand how to extract information from dates or times using functions.

* We can use date_format to extract the required information in a desired format from a date or timestamp.
* There are also specific functions to extract the year, month, day within a week, day within a month, day within a year, etc.

### Tasks

Let us perform a few tasks to extract the information we need from a date or timestamp.

* Create a DataFrame named datetimesDF with columns date and time.

Let us start the Spark context for this notebook so that we can execute the code provided. You can sign up for our [10 node state-of-the-art cluster/labs](https://labs.itversity.com/plans) to learn Spark SQL using our unique integrated LMS.

In [None]:
from pyspark.sql import SparkSession

import getpass
username = getpass.getuser()

spark = SparkSession. \
    builder. \
    config('spark.ui.port', '0'). \
    config("spark.sql.warehouse.dir", f"/user/{username}/warehouse"). \
    enableHiveSupport(). \
    appName(f'{username} | Python - Processing Column Data'). \
    master('yarn'). \
    getOrCreate()

If you are going to use CLIs, you can use Spark SQL using one of the following 3 approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

In [None]:
datetimes = [("2014-02-28", "2014-02-28 10:00:00.123"),
             ("2016-02-29", "2016-02-29 08:08:08.999"),
             ("2017-10-31", "2017-12-31 11:59:59.123"),
             ("2019-11-30", "2019-08-31 00:00:00.000")]

In [None]:
datetimesDF = spark.createDataFrame(datetimes, schema="date STRING, time STRING")

In [None]:
datetimesDF.show(truncate=False)
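Both columns are created as STRING, so the extraction functions used in the tasks below depend on the values following the usual yyyy-MM-dd and yyyy-MM-dd HH:mm:ss.SSS layouts. As a minimal sketch outside Spark, the sample values can be checked with Python's standard datetime module. The helper name parse_row is hypothetical and not part of the original notebook.

```python
from datetime import datetime

# Same sample values as in the notebook cell above.
datetimes = [("2014-02-28", "2014-02-28 10:00:00.123"),
             ("2016-02-29", "2016-02-29 08:08:08.999"),
             ("2017-10-31", "2017-12-31 11:59:59.123"),
             ("2019-11-30", "2019-08-31 00:00:00.000")]


def parse_row(date_str, time_str):
    """Hypothetical helper: parse one (date, time) pair or raise ValueError."""
    d = datetime.strptime(date_str, "%Y-%m-%d").date()
    t = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S.%f")
    return d, t


# A malformed value would raise ValueError here instead of passing silently.
parsed = [parse_row(d, t) for d, t in datetimes]
for d, t in parsed:
    print(d.isoformat(), t.isoformat(sep=" ", timespec="milliseconds"))
```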

* Get the year from the fields date and time.

In [None]:
from pyspark.sql.functions import year

In [None]:
datetimesDF. \
    withColumn("date_year", year("date")). \
    withColumn("time_year", year("time")). \
    show(truncate=False)

* Get the one- or two-digit month from the fields date and time.

In [None]:
from pyspark.sql.functions import month

In [None]:
datetimesDF. \
    withColumn("date_month", month("date")). \
    withColumn("time_month", month("time")). \
    show(truncate=False)
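The month function returns an integer, which is why the result has one or two digits and no leading zero. A zero-padded month needs a format pattern such as MM instead. The same distinction can be sketched with plain Python, assuming only the standard datetime module and one of the sample dates:

```python
from datetime import date

d = date.fromisoformat("2014-02-28")

month_number = d.month           # integer, comparable to month("date")
month_padded = d.strftime("%m")  # two-character string, comparable to the MM pattern

print(month_number, month_padded)  # 2 02
```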

* Get year and month in yyyyMM format from date and time.

In [None]:
from pyspark.sql.functions import date_format

In [None]:
datetimesDF. \
    withColumn("date_ym", date_format("date", "yyyyMM")). \
    withColumn("time_ym", date_format("time", "yyyyMM")). \
    show(truncate=False)

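# Commonly used date_format patterns:
# yyyy - four-digit year
# MM   - two-digit month
# dd   - two-digit day of the month
# DD   - day of the year
# HH   - hour on a 24-hour clock
# hh   - hour on a 12-hour clock
# mm   - minute
# ss   - second
# SSS  - milliseconds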
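For readers more familiar with Python's strftime, the sketch below pairs several of the patterns listed above with rough strftime counterparts and rebuilds the yyyyMM value for one sample timestamp. The mapping SPARK_TO_STRFTIME is an illustration made up for this example, not part of Spark's API, and it leaves out patterns without a direct one-to-one counterpart.

```python
from datetime import datetime

# Illustrative mapping only: rough strftime counterparts of Spark patterns.
SPARK_TO_STRFTIME = {
    "yyyy": "%Y",  # four-digit year
    "MM": "%m",    # two-digit month
    "dd": "%d",    # two-digit day of the month
    "HH": "%H",    # hour on a 24-hour clock
    "hh": "%I",    # hour on a 12-hour clock
    "mm": "%M",    # minute
    "ss": "%S",    # second
}

t = datetime.strptime("2017-12-31 11:59:59.123", "%Y-%m-%d %H:%M:%S.%f")

# Counterpart of date_format("time", "yyyyMM") for this one value.
year_month = t.strftime(SPARK_TO_STRFTIME["yyyy"] + SPARK_TO_STRFTIME["MM"])
print(year_month)  # 201712
```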

* Get the day within a week, the day within a month and the day within a year from date and time.

In [None]:
from pyspark.sql.functions import dayofweek, dayofmonth, dayofyear

In [None]:
datetimesDF. \
    withColumn("date_dow", dayofweek("date")). \
    withColumn("time_dow", dayofweek("time")). \
    withColumn("date_dom", dayofmonth("date")). \
    withColumn("time_dom", dayofmonth("time")). \
    withColumn("date_doy", dayofyear("date")). \
    withColumn("time_doy", dayofyear("time")). \
    show(truncate=False)
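Spark's dayofweek numbers the days from 1 (Sunday) to 7 (Saturday), which differs from Python's Monday-based conventions. The sketch below is a plain Python cross-check of the three values for the sample dates, assuming only the standard datetime module. The helper name spark_style_day_fields is hypothetical.

```python
from datetime import date


def spark_style_day_fields(date_str):
    """Hypothetical helper returning (day of week, day of month, day of year)."""
    d = date.fromisoformat(date_str)
    # isoweekday() is Monday=1 ... Sunday=7; shift it to Sunday=1 ... Saturday=7.
    day_of_week = d.isoweekday() % 7 + 1
    day_of_month = d.day
    day_of_year = d.timetuple().tm_yday
    return day_of_week, day_of_month, day_of_year


for s in ["2014-02-28", "2016-02-29", "2017-10-31", "2019-11-30"]:
    print(s, spark_style_day_fields(s))
```

For example, 2016-02-29 was a Monday, so the first value printed for it is 2.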

* Get the information from time in yyyyMMddHHmmss format.

In [None]:
from pyspark.sql.functions import date_format

In [None]:
datetimesDF. \
    withColumn("time_ts", date_format("time", "yyyyMMddHHmmss")). \
    show(truncate=False)
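The yyyyMMddHHmmss pattern contains no fraction-of-second field, so the milliseconds in the time column are dropped rather than rounded. A minimal plain Python sketch of the same formatting, assuming the standard datetime module and one of the sample timestamps:

```python
from datetime import datetime

t = datetime.strptime("2016-02-29 08:08:08.999", "%Y-%m-%d %H:%M:%S.%f")

# strftime counterpart of the yyyyMMddHHmmss pattern.
compact = t.strftime("%Y%m%d%H%M%S")

print(compact)       # 20160229080808
print(int(compact))  # same digits as an integer, handy for sorting or comparing
```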