## Date Manipulation Functions

Let us go through some of Spark's predefined functions for manipulating dates and timestamps.

### Starting Spark Context

Let us start the Spark session (which wraps the Spark context) for this Notebook so that we can execute the code provided.

In [None]:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    appName("Processing Column Data").
    master("yarn").
    getOrCreate

In [None]:
spark

### Date and Time - Overview
Let us get an overview about Date and Time using available functions.
* We can use `current_date` to get today’s server date.
  * The date is returned in **yyyy-MM-dd** format.
* We can use `current_timestamp` to get the current server time.
  * The timestamp is returned in **yyyy-MM-dd HH:mm:ss.SSS** format.
  * Hours are in 24-hour format by default.


In [None]:
val l = List("X")

In [None]:
import spark.implicits._ // provides toDF on local collections
val df = l.toDF("dummy")
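
The one-row DataFrame can then carry the two expressions described above. The following is a minimal sketch rather than a definitive recipe: the name `nowDF` and the `local[*]` master are assumptions made so that the cell also runs outside a cluster.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{current_date, current_timestamp}

// Assumption: a local master for a standalone run; getOrCreate reuses
// the Notebook's session when one is already active.
val spark = SparkSession.
    builder.
    appName("Date and Time Overview").
    master("local[*]").
    getOrCreate()
import spark.implicits._

// One-row DataFrame, used only as a carrier for the two expressions.
val df = List("X").toDF("dummy")

// nowDF is a hypothetical name used for illustration.
val nowDF = df.select(
    current_date().alias("current_date"),
    current_timestamp().alias("current_timestamp")
)

nowDF.printSchema()          // one column of type date, one of type timestamp
nowDF.show(truncate = false) // values depend on when the cell is run
```

`printSchema` shows that the two functions return different types, which matters for the arithmetic functions covered next.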

### Date and Time Arithmetic
Let us perform Date and Time Arithmetic using relevant functions.
* Adding days to a date or timestamp - `date_add`
* Subtracting days from a date or timestamp - `date_sub`
* Getting the difference in days between 2 dates or timestamps - `datediff`
* Getting the number of months between 2 dates or timestamps - `months_between`
* Adding months to a date or timestamp - `add_months`
* Getting the first date after a given date that falls on a given day of the week - `next_day`
* Most of these functions are self-explanatory. We can apply them to standard dates or timestamps. `date_add`, `date_sub`, `add_months` and `next_day` return a date even when applied to a timestamp field, whereas `datediff` returns an integer number of days and `months_between` returns a fractional number of months.
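
To make the return types concrete, here is a small sketch that applies each function to a single fixed row. The names `sample` and `arithmeticDF`, the literal dates and the `local[*]` master are assumptions chosen for illustration; the columns are cast explicitly so that `time` is a real timestamp.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{add_months, date_add, date_sub, datediff, lit, months_between, next_day}

// Assumption: local master for a standalone run; an active session is reused.
val spark = SparkSession.
    builder.
    appName("Date Arithmetic Sketch").
    master("local[*]").
    getOrCreate()
import spark.implicits._

// Hypothetical one-row sample with a proper date and timestamp column.
val sample = List(("2016-02-29", "2016-02-29 08:08:08.999")).
    toDF("date", "time").
    select($"date".cast("date").alias("date"),
           $"time".cast("timestamp").alias("time"))

val arithmeticDF = sample.select(
    date_add($"time", 10).alias("date_add_time"),   // 2016-03-10, time part dropped
    date_sub($"date", 10).alias("date_sub_date"),   // 2016-02-19
    datediff(lit("2016-03-10").cast("date"), $"date").
        alias("days_between"),                      // 10 (end date first, start date second)
    months_between(lit("2016-03-29").cast("date"), $"date").
        alias("months_between"),                    // 1.0 (same day of the month)
    add_months($"date", 12).alias("add_months_date"), // 2017-02-28
    next_day($"date", "Mon").alias("next_monday")   // 2016-03-07
)

arithmeticDF.printSchema()
arithmeticDF.show(truncate = false)
```

Note that 2016-02-29 is itself a Monday, so `next_day` returning 2016-03-07 shows that it looks strictly after the given date.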

#### Tasks

Let us perform some tasks related to date arithmetic.
* Get help on each function first and understand which arguments need to be passed.
* Create a DataFrame named `datetimesDF` with columns `date` and `time`.

In [None]:
val datetimes = List(("2014-02-28", "2014-02-28 10:00:00.123"),
                     ("2016-02-29", "2016-02-29 08:08:08.999"),
                     ("2017-10-31", "2017-12-31 11:59:59.123"),
                     ("2019-11-30", "2019-08-31 00:00:00.000")
                    )
val datetimesDF = datetimes.toDF("date", "time")

* Add 10 days to both date and time values.
* Subtract 10 days from both date and time values.
* Get the difference between current_date and date values as well as current_timestamp and time values.
* Get the number of months between current_date and date values as well as current_timestamp and time values.
* Add 3 months to both date values as well as time values.
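
One possible way to work through the tasks above is sketched below. The intermediate names (`shiftedDF`, `elapsedDF`, `plus3DF`), the output column names, the rounding to 2 decimals and the `local[*]` master are assumptions, not part of the original exercise.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{add_months, current_date, current_timestamp, date_add, date_sub, datediff, months_between, round}

// Assumption: local master for a standalone run; an active session is reused.
val spark = SparkSession.
    builder.
    appName("Date Arithmetic Tasks").
    master("local[*]").
    getOrCreate()
import spark.implicits._

val datetimesDF = List(("2014-02-28", "2014-02-28 10:00:00.123"),
                       ("2016-02-29", "2016-02-29 08:08:08.999"),
                       ("2017-10-31", "2017-12-31 11:59:59.123"),
                       ("2019-11-30", "2019-08-31 00:00:00.000")
                      ).toDF("date", "time")

// Add and subtract 10 days; the results are dates for both columns.
val shiftedDF = datetimesDF.
    withColumn("date_add_date", date_add($"date", 10)).
    withColumn("date_add_time", date_add($"time", 10)).
    withColumn("date_sub_date", date_sub($"date", 10)).
    withColumn("date_sub_time", date_sub($"time", 10))
shiftedDF.show(truncate = false)

// Days and months elapsed up to now; these values change with the run date.
val elapsedDF = datetimesDF.
    withColumn("datediff_date", datediff(current_date(), $"date")).
    withColumn("datediff_time", datediff(current_timestamp(), $"time")).
    withColumn("months_between_date",
               round(months_between(current_date(), $"date"), 2)).
    withColumn("months_between_time",
               round(months_between(current_timestamp(), $"time"), 2))
elapsedDF.show(truncate = false)

// Add 3 months to both columns.
val plus3DF = datetimesDF.
    withColumn("add_months_date", add_months($"date", 3)).
    withColumn("add_months_time", add_months($"time", 3))
plus3DF.show(truncate = false)
```

The month-end rows such as 2016-02-29 are worth inspecting in the `add_months` output, since the handling of the last day of a month has differed between Spark versions.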


### Using trunc and date_trunc
In data warehousing, we quite often run to-date reports such as week-to-date, month-to-date and year-to-date.
* We can use `trunc` or `date_trunc` for this, to get the beginning of a period such as the week, month or year by passing a date or timestamp to it. Note that the argument order differs: `trunc` takes the column first and the format second, whereas `date_trunc` takes the format first.
* We can use `trunc` to get the beginning date of the month or year by passing a date or timestamp to it. It returns a date - for example, `trunc(current_date(), "MM")` gives the first of the current month.
* We can use `date_trunc` to get the beginning of the month or year as well as the beginning of the day or hour by passing a timestamp to it. It returns a timestamp.
  * Get beginning date based on month - `date_trunc("MM", current_timestamp())`
  * Get beginning time based on day - `date_trunc("DAY", current_timestamp())`
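
The difference between the two functions is easier to see on a fixed timestamp than on the current time. The sketch below assumes a hypothetical one-row `sample` DataFrame and a `local[*]` master; the column aliases are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{date_trunc, trunc}

// Assumption: local master for a standalone run; an active session is reused.
val spark = SparkSession.
    builder.
    appName("Trunc Sketch").
    master("local[*]").
    getOrCreate()
import spark.implicits._

// Hypothetical sample: a single timestamp value.
val sample = List("2016-02-29 08:08:08.999").
    toDF("time").
    select($"time".cast("timestamp").alias("time"))

val truncatedDF = sample.select(
    trunc($"time", "MM").alias("month_start"),          // date 2016-02-01
    trunc($"time", "yyyy").alias("year_start"),         // date 2016-01-01
    date_trunc("MM", $"time").alias("month_start_ts"),  // timestamp, midnight on 2016-02-01
    date_trunc("DAY", $"time").alias("day_start_ts"),   // timestamp, midnight on 2016-02-29
    date_trunc("HOUR", $"time").alias("hour_start_ts")  // timestamp, 08:00:00 on 2016-02-29
)

truncatedDF.printSchema() // trunc columns are dates, date_trunc columns are timestamps
truncatedDF.show(truncate = false)
```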

#### Tasks

Let us perform a few tasks to understand `trunc` and `date_trunc` in detail.
* Create a DataFrame named `datetimesDF` with columns `date` and `time`.

In [None]:
val datetimes = List(("2014-02-28", "2014-02-28 10:00:00.123"),
                     ("2016-02-29", "2016-02-29 08:08:08.999"),
                     ("2017-10-31", "2017-12-31 11:59:59.123"),
                     ("2019-11-30", "2019-08-31 00:00:00.000")
                    )

In [None]:
val datetimesDF = datetimes.toDF("date", "time")

In [None]:
datetimesDF.show(truncate=false)

* Get the first date of the month using the date field and the first date of the year using the time field.

* Get the beginning of the hour using both the date and time fields.
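
A possible solution to these two tasks is sketched below. The names `datesTruncDF` and `hoursTruncDF`, the output column names and the `local[*]` master are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{date_trunc, trunc}

// Assumption: local master for a standalone run; an active session is reused.
val spark = SparkSession.
    builder.
    appName("Trunc Tasks").
    master("local[*]").
    getOrCreate()
import spark.implicits._

val datetimesDF = List(("2014-02-28", "2014-02-28 10:00:00.123"),
                       ("2016-02-29", "2016-02-29 08:08:08.999"),
                       ("2017-10-31", "2017-12-31 11:59:59.123"),
                       ("2019-11-30", "2019-08-31 00:00:00.000")
                      ).toDF("date", "time")

// First date of the month from date, first date of the year from time.
val datesTruncDF = datetimesDF.
    withColumn("date_month_start", trunc($"date", "MM")).
    withColumn("time_year_start", trunc($"time", "yy"))
datesTruncDF.show(truncate = false)

// Beginning of the hour for both fields; date has no time part,
// so it truncates to midnight of that day.
val hoursTruncDF = datetimesDF.
    withColumn("date_hour_start", date_trunc("HOUR", $"date")).
    withColumn("time_hour_start", date_trunc("HOUR", $"time"))
hoursTruncDF.show(truncate = false)
```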