# Pre-Defined Functions

Let us go through the pre-defined functions available in PostgreSQL.

* Overview of Pre-Defined Functions
* String Manipulation Functions
* Date Manipulation Functions
* Overview of Numeric Functions
* Data Type Conversion
* Handling Null Values
* Using CASE and WHEN
* Exercises - Pre-Defined Functions

## Overview of Pre-Defined Functions

Like any RDBMS, Postgres provides a robust set of pre-defined functions that help us build solutions quickly as per the business requirements. There are many functions, but we will cover the most common ones here.

* The following categories of functions are the most commonly used.
  * String Manipulation
  * Date Manipulation
  * Numeric Functions
  * Type Conversion Functions
  * CASE and WHEN
  * and more

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%sql SELECT * FROM information_schema.routines LIMIT 10
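
As a rough illustration of how the catalog can be narrowed down, the query below looks up a few of the functions covered in this section by name. It assumes the connected user can read the entries for built-in functions in `information_schema.routines`, which is normally the case. Overloaded functions may show up more than once.

```sql
-- List selected built-in functions along with their return types.
SELECT routine_schema, routine_name, data_type
FROM information_schema.routines
WHERE routine_name IN ('initcap', 'split_part', 'date_trunc')
ORDER BY routine_name
```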

## String Manipulation Functions

We use string manipulation functions quite extensively. Here are some of the important functions which we typically use.
* Case Conversion - `lower`, `upper`, `initcap`
* Getting size of the column value - `length`
* Extracting Data - `substr` and `split_part`
* Trimming and Padding functions - `trim`, `rtrim`, `ltrim`, `rpad` and `lpad`
* Reversing strings - `reverse`
* Concatenating multiple strings - `concat` and `concat_ws`

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

### Case Conversion and Length
Let us understand how to convert the case of a string and how to get the length of a string.

* Case Conversion Functions - `lower`, `upper`, `initcap`

In [None]:
%%sql

SELECT lower('hEllo wOrlD') AS lower_result,
    upper('hEllo wOrlD') AS upper_result,
    initcap('hEllo wOrlD') AS initcap_result

* Getting length - `length`

In [None]:
%%sql

SELECT length('hEllo wOrlD') AS result

Let us see how to use these functions on top of a table. We will use the orders table which was loaded as part of the last section.

* We will convert order_status to lower case and also get its length.

In [None]:
%%sql

SELECT * FROM orders LIMIT 10

In [None]:
%%sql

SELECT order_id, order_date, order_customer_id,
    lower(order_status) AS order_status,
    length(order_status) AS order_status_length
FROM orders LIMIT 10
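
Case conversion is also handy in filters. The sketch below, which assumes the same `orders` table, matches a status value regardless of how it is cased in the data.

```sql
-- Compare on the upper-cased value so 'complete', 'Complete' and 'COMPLETE' all match.
SELECT order_id, order_status
FROM orders
WHERE upper(order_status) = 'COMPLETE'
LIMIT 10
```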

### Extracting Data - substr and split_part
Let us understand how to extract data from strings using `substr`/`substring` and `split_part`.

* We can get the syntax and semantics of the functions using `\df` in `psql`, for example `\df substr`.
* We can extract the first four characters from a string using `substr` or `substring`.

In [None]:
%%sql

SELECT substr('2013-07-25 00:00:00.0', 1, 4) AS result

In [None]:
%%sql

SELECT substr('2013-07-25 00:00:00.0', 6, 2) AS result

In [None]:
%%sql

SELECT substr('2013-07-25 00:00:00.0', 9, 2) AS result

In [None]:
%%sql

SELECT substr('2013-07-25 00:00:00.0', 12) AS result

* Let us see how we can extract date part from order_date of orders.

In [None]:
%%sql

SELECT * FROM orders LIMIT 10

In [None]:
%%sql

SELECT order_id,
  substr(order_date::varchar, 1, 10) AS order_date,
  order_customer_id,
  order_status
FROM orders
LIMIT 10

Let us understand how to extract information from a string that contains a delimiter.
* `split_part` splits the string on the delimiter and returns the field at the given position (counting from 1).

In [None]:
%%sql

SELECT split_part('2013-07-25', '-', 1) AS result
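
If all the fields are needed at once, `string_to_array` can be used to convert the delimited string into an array. Here is a small sketch comparing it with `split_part`; note that array positions in Postgres start from 1.

```sql
SELECT split_part('2013-07-25', '-', 2) AS month_part,           -- returns '07'
    string_to_array('2013-07-25', '-') AS all_parts,             -- returns {2013,07,25}
    (string_to_array('2013-07-25', '-'))[3] AS day_part          -- returns '25'
```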

### Trimming and Padding Functions

Let us understand how to trim or remove leading and/or trailing spaces in a string.

* `ltrim` is used to remove the spaces on the left side of the string.
* `rtrim` is used to remove the spaces on the right side of the string.
* `trim` is used to remove the spaces on both sides of the string.

In [None]:
%%sql

SELECT ltrim('     Hello World') AS result

In [None]:
%%sql

SELECT rtrim('     Hello World       ') AS result

In [None]:
%%sql

SELECT length(trim('     Hello World       ')) AS result

In [None]:
%%sql

SELECT ltrim('----Hello World----', '-') AS result

In [None]:
%%sql

SELECT rtrim('----Hello World----', '-') AS result

In [None]:
%%sql

SELECT trim('----Hello World----', '-') AS result
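
The same results can also be obtained with the SQL standard form of `trim`, which spells out the side and the characters to remove. The example below is only a sketch of that alternative syntax, along with `btrim`, which removes any of the listed characters from both ends.

```sql
SELECT trim(BOTH '-' FROM '----Hello World----') AS both_result,
    trim(LEADING '-' FROM '----Hello World----') AS leading_result,
    trim(TRAILING '-' FROM '----Hello World----') AS trailing_result,
    btrim('xyHello Worldyx', 'xy') AS btrim_result                -- returns 'Hello World'
```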

Let us understand how to use padding to pad characters to a string.

* Let us assume that there are 3 fields - year, month and date (day of the month), which are of type integer.
* If we have to concatenate all 3 fields to create a date, we might have to pad month and date with 0.
* `lpad` is used more often than `rpad`, especially when we build a date from separate columns.

In [None]:
%%sql

SELECT 2013 AS year, 7 AS month, 25 AS myDate

In [None]:
%%sql

SELECT lpad(7::varchar, 2, '0') AS result

In [None]:
%%sql

SELECT lpad(10::varchar, 2, '0') AS result

In [None]:
%%sql

SELECT lpad(100::varchar, 2, '0') AS result
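
The last query is worth a second look: when the input is already longer than the target length, `lpad` truncates it instead of leaving it alone. The sketch below shows that behavior next to `rpad`, and `to_char` with a numeric format as one possible alternative for zero padding.

```sql
SELECT lpad('100', 2, '0') AS truncated,          -- returns '10'
    rpad('7', 3, '0') AS right_padded,            -- returns '700'
    to_char(7, 'FM00') AS formatted               -- returns '07'
```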

### Reverse and Concatenating multiple strings

Let us understand how to reverse a string as well as concatenate multiple strings.
* We can use `reverse` to reverse a string.
* We can concatenate multiple strings using `concat` and `concat_ws`.
* `concat_ws` is typically used if we want to have the same string between all the strings that are being concatenated.

In [None]:
%%sql

SELECT reverse('Hello World') AS result

In [None]:
%%sql

SELECT concat('Hello ', 'World') AS result

In [None]:
%%sql

SELECT concat('Order Status is ', order_status) AS result
FROM orders LIMIT 10

In [None]:
%%sql

SELECT * FROM (SELECT 2013 AS year, 7 AS month, 25 AS myDate) q

In [None]:
%%sql

SELECT concat(year, '-', lpad(month::varchar, 2, '0'), '-',
              lpad(myDate::varchar, 2, '0')) AS order_date
FROM
    (SELECT 2013 AS year, 7 AS month, 25 AS myDate) q

In [None]:
%%sql

SELECT concat_ws('-', year, lpad(month::varchar, 2, '0'),
              lpad(myDate::varchar, 2, '0')) AS order_date
FROM
    (SELECT 2013 AS year, 7 AS month, 25 AS myDate) q
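
Two details are easy to miss here. `concat` and `concat_ws` ignore null arguments, whereas the `||` operator returns null if either side is null. Also, the concatenated value is still a string until it is cast. The following sketch illustrates both points, and shows `make_date` as a way to build the date directly from integers.

```sql
SELECT concat('Hello', NULL, ' World') AS concat_result,       -- NULL argument is ignored
    'Hello'::text || NULL AS operator_result,                  -- returns NULL
    concat_ws('-', 2013, '07', '25')::date AS order_date,      -- string cast to date
    make_date(2013, 7, 25) AS made_date                        -- date built from integers
```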

## Date Manipulation Functions

Let us go through some of the important date manipulation functions.
* Getting Current Date and Timestamp
* Date Arithmetic using `INTERVAL` and `-` operator
* Getting beginning date or time using `date_trunc`
* Extracting information using `to_char` as well as calendar functions.
* Dealing with Unix timestamp using `extract(epoch FROM ...)` and `to_timestamp`

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

### Getting Current Date and Timestamp

Let us understand how to get the details about current or today's date as well as current timestamp.

* `current_date` returns today's date.
* `current_timestamp` returns the current date and time (with time zone) with up to microsecond precision.
* These are not like other functions and do not use **()** at the end.
* There is a default display format associated with date and timestamp.
  * Date - `yyyy-MM-dd`
  * Timestamp - `yyyy-MM-dd HH24:MI:SS.US` followed by the time zone offset
* Keep in mind that date and timestamp in Postgres are proper data types and not strings. To apply string manipulation functions on a date or timestamp, we have to cast it to `varchar` first (for example `order_date::varchar`).

In [None]:
%%sql

SELECT current_date AS current_date

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp
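
To confirm the data types involved, we can use `pg_typeof`. The sketch below also shows the cast that is needed before applying a string function such as `substr` to a date.

```sql
SELECT pg_typeof(current_date) AS date_type,                 -- date
    pg_typeof(current_timestamp) AS timestamp_type,          -- timestamp with time zone
    substr(current_date::varchar, 1, 4) AS year_as_string    -- cast first, then substr
```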

### Date Arithmetic
Let us understand how to perform arithmetic on dates or timestamps.

* We can add or subtract days, months or years from a date or timestamp by using an `INTERVAL` value.
* We can also add or subtract hours, minutes, seconds etc. from a date or timestamp using `INTERVAL`.
* We can combine multiple units in one operation using `INTERVAL`.
* We can get the difference between 2 dates or timestamps using the minus (`-`) operator. Subtracting two dates returns the number of days as an integer, whereas subtracting timestamps returns an interval.

In [None]:
%%sql

SELECT current_date + INTERVAL '32 DAYS' AS result

In [None]:
%%sql

SELECT current_date + INTERVAL '730 DAYS' AS result

In [None]:
%%sql

SELECT current_date + INTERVAL '-730 DAYS' AS result

In [None]:
%%sql

SELECT current_date - INTERVAL '730 DAYS' AS result

In [None]:
%%sql

SELECT current_date + INTERVAL '3 MONTHS' AS result

In [None]:
%%sql

SELECT '2019-01-31'::date + INTERVAL '3 MONTHS' AS result

In [None]:
%%sql

SELECT '2019-01-31'::date + INTERVAL '3 MONTHS 3 DAYS 3 HOURS' AS result

In [None]:
%%sql

SELECT current_timestamp + INTERVAL '3 MONTHS' AS result

In [None]:
%%sql

SELECT current_timestamp + INTERVAL '10 HOURS' AS result

In [None]:
%%sql

SELECT current_timestamp + INTERVAL '10 MINUTES' AS result

In [None]:
%%sql

SELECT current_timestamp + INTERVAL '10 HOURS 10 MINUTES' AS result

In [None]:
%%sql

SELECT '2019-03-30'::date - '2017-12-31'::date AS result

In [None]:
%%sql

SELECT '2017-12-31'::date - '2019-03-30'::date AS result

In [None]:
%%sql

SELECT current_date - '2019-03-30'::date AS result

In [None]:
%%sql

SELECT current_timestamp - '2019-03-30'::date AS result
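
The result types of these operations differ, which matters when the result is used in further calculations. Here is a small sketch using `pg_typeof`, along with an example of how adding a month to a month-end date is adjusted to the last valid day of the target month.

```sql
SELECT pg_typeof('2019-03-30'::date - '2017-12-31'::date) AS date_diff_type,   -- integer
    pg_typeof(current_timestamp - '2019-03-30'::date) AS ts_diff_type,         -- interval
    ('2019-01-31'::date + INTERVAL '1 MONTH')::date AS adjusted_month_end      -- 2019-02-28
```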

### Beginning Date or Time - date_trunc
Let us understand how to use `date_trunc` on dates or timestamps and get beginning date or time.

* We can use **MONTH** to get the beginning date of the month.
* **YEAR** can be used to get the beginning date of the year.

In [None]:
%%sql

SELECT date_trunc('YEAR', current_date) AS year_beginning

In [None]:
%%sql

SELECT date_trunc('MONTH', current_date) AS month_beginning

In [None]:
%%sql

SELECT date_trunc('WEEK', current_date) AS week_beginning

In [None]:
%%sql

SELECT date_trunc('DAY', current_date) AS day_beginning

In [None]:
%%sql

SELECT date_trunc('HOUR', current_timestamp) AS hour_beginning
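
`date_trunc` returns a timestamp even when a date is passed, so a cast is needed to get a date back. Combining it with `INTERVAL` also gives a common way to derive the last day of the month. The query below is a sketch of that idiom.

```sql
SELECT date_trunc('MONTH', current_date)::date AS month_beginning,
    -- First day of next month minus one day is the last day of the current month.
    (date_trunc('MONTH', current_date) + INTERVAL '1 MONTH' - INTERVAL '1 DAY')::date AS month_end
```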

### Extracting information using to_char

Let us understand how to use `to_char` to extract information from date or timestamp.

Here is how we can get date related information such as year, month, day etc from date or timestamp.

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'yyyy') AS year

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'yy') AS year

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'MM') AS month

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'dd') AS day_of_month

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'DD') AS day_of_month

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'DDD') AS day_of_year

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'Mon') AS month_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'mon') AS month_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'Month') AS month_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'month') AS month_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'day') AS day_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'DY') AS day_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'Day') AS dayname

* Here is how we can get time related information such as hour, minute, seconds, milliseconds etc from timestamp.

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'HH24') AS hour24

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'hh') AS hour12

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'MI') AS minutes

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'ss') AS seconds

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'MS') AS millis

* Here is how we can get the information from date or timestamp in the format we require.

In [None]:
%%sql

SELECT to_char(current_timestamp, 'yyyyMM') AS current_month

In [None]:
%%sql

SELECT to_char(current_timestamp, 'yyyyMMdd') AS current_date

In [None]:
%%sql

SELECT to_char(current_timestamp, 'yyyy/MM/dd') AS current_date

### Extracting information - extract

We can get year, month, day etc from date or timestamp using `extract` function.
* Let us see the usage of `extract` such as year, quarter, month, week, day, hour etc.
* We can also use `date_part` in place of `extract`. However, there is a subtle difference in the syntax: `date_part` takes the field name as a string argument.

In [None]:
%%sql

SELECT extract(century FROM current_date) AS century

In [None]:
%%sql

SELECT date_part('century', current_date) AS century

In [None]:
%%sql

SELECT extract(decade FROM current_date) AS decade

In [None]:
%%sql

SELECT extract(year FROM current_date) AS year

In [None]:
%%sql

SELECT extract(quarter FROM current_date) AS quarter

In [None]:
%%sql

SELECT extract(month FROM current_date) AS month

In [None]:
%%sql

SELECT extract(week FROM current_date) AS week

In [None]:
%%sql

SELECT extract(day FROM current_date) AS day

In [None]:
%%sql

SELECT extract(doy FROM current_date) AS day_of_year

In [None]:
%%sql

SELECT extract(dow FROM current_date) AS day_of_week

In [None]:
%%sql

SELECT extract(hour FROM current_timestamp) AS hour

In [None]:
%%sql

SELECT extract(minute FROM current_timestamp) AS minute

In [None]:
%%sql

SELECT extract(second FROM current_timestamp) AS second

In [None]:
%%sql

SELECT extract(milliseconds FROM current_timestamp) AS millis
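
`extract` is typically used on table columns for grouping or filtering. As a sketch, and assuming `order_date` in the `orders` table is of type date or timestamp as in the earlier examples, here is how we can get the number of orders per month.

```sql
SELECT extract(year FROM order_date) AS order_year,
    extract(month FROM order_date) AS order_month,
    count(*) AS order_count
FROM orders
GROUP BY 1, 2
ORDER BY 1, 2
```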

### Dealing with Unix Timestamp

Let us go through the functions that can be used to deal with Unix Timestamp.

* `to_timestamp` can be used to convert Unix epoch to a regular timestamp.
* `extract(epoch FROM ...)` can be used to convert a date or timestamp to Unix epoch.
* We can get the current Unix epoch or Unix timestamp by running `date '+%s'` in a Unix/Linux terminal.
* We can use `\df` in `psql` (for example `\df to_timestamp`) to get details about these functions.

Let us see how we can use `extract` with `epoch` and `to_timestamp` to convert between timestamp and Unix timestamp or epoch.

In [None]:
%%sql

SELECT extract(epoch FROM current_date) AS date_epoch

In [None]:
%%sql

SELECT extract(epoch FROM '2019-04-30 18:18:51'::timestamp) AS unixtime

In [None]:
%%sql

SELECT to_timestamp(1556662731) AS time_from_epoch

In [None]:
%%sql

SELECT to_timestamp(1556662731)::date AS time_from_epoch

In [None]:
%%sql

SELECT to_char(to_timestamp(1556662731), 'yyyyMM')::int AS yyyyMM_from_epoch
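
`to_timestamp` returns a timestamp with time zone, so the displayed value depends on the session time zone. The following sketch pins the display to UTC and converts the timestamp back to epoch to show that the round trip is lossless.

```sql
SELECT to_timestamp(1556662731) AT TIME ZONE 'UTC' AS utc_time,            -- 2019-04-30 22:18:51
    extract(epoch FROM to_timestamp(1556662731))::bigint AS round_trip,    -- 1556662731
    extract(epoch FROM current_timestamp)::bigint AS current_epoch
```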

## Overview of Numeric Functions

Here are some of the numeric functions we might use quite often.

* `abs` - always returns a non-negative number
* `sum`, `avg`
* `round` - rounds off to specified precision
* `ceil`, `floor` - always return a whole number.
* `greatest`
* `min`, `max`
* `random`
* `pow`, `sqrt`
* `cume_dist`, `stddev`, `variance`

Some of the functions listed above are aggregate functions, e.g. `sum`, `avg`, `min`, `max` etc.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT abs(-10.5), abs(10)

In [None]:
%%sql

SELECT avg(order_item_subtotal) AS order_revenue_avg FROM order_items
WHERE order_item_order_id = 2

In [None]:
%%sql

SELECT order_item_order_id, 
    sum(order_item_subtotal) AS order_revenue 
FROM order_items
GROUP BY order_item_order_id
LIMIT 10

In [None]:
%%sql

SELECT
    round(10.58) rnd,
    floor(10.58) flr,
    ceil(10.58) cl

In [None]:
%%sql

SELECT
    round(10.48, 1) rnd,
    floor(10.48) flr,
    ceil(10.48) cl

In [None]:
%%sql

SELECT round(avg(order_item_subtotal::numeric), 2) AS order_revenue_avg 
FROM order_items
WHERE order_item_order_id = 2

In [None]:
%%sql

SELECT order_item_order_id, 
    round(sum(order_item_subtotal::numeric), 2) AS order_revenue 
FROM order_items
GROUP BY order_item_order_id
LIMIT 10

In [None]:
%%sql

SELECT greatest(10, 11)

In [None]:
%%sql

SELECT order_item_order_id, 
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue,
    min(order_item_subtotal) AS order_item_subtotal_min,
    max(order_item_subtotal) AS order_item_subtotal_max 
FROM order_items
GROUP BY order_item_order_id
LIMIT 10

In [None]:
%sql SELECT random()

In [None]:
%sql SELECT (random() * 100)::int

In [None]:
%sql SELECT pow(2, 2)::int, sqrt(4)
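
The statistical functions in the list have not been demonstrated yet. Here is a minimal sketch on a small inline data set (the values and the `amount` column are made up for illustration). `cume_dist` is a window function, while `stddev` and `variance` are aggregates that are used here as window functions so that everything fits in one query.

```sql
SELECT amount,
    cume_dist() OVER (ORDER BY amount) AS cume_dist,
    round(stddev(amount) OVER (), 2) AS stddev_all,
    round(variance(amount) OVER (), 2) AS variance_all
FROM (VALUES (100), (200), (200), (400)) AS t(amount)
```

The two-argument form of `round` is defined only for `numeric`, which is why the earlier examples cast float values using `::numeric` before rounding.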

## Data Type Conversion

Let us understand how to type cast an extracted value (for example, a month extracted as a string) to the data type we need. We can use either the `::` operator or `cast`.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT '09'::int

In [None]:
%%sql

SELECT current_date AS current_date

In [None]:
%%sql

SELECT split_part('2020-09-30', '-', 2) AS month

In [None]:
%%sql

SELECT split_part('2020-09-30', '-', 2)::int AS month

In [None]:
%%sql

SELECT cast('0.04' AS FLOAT) AS result

In [None]:
%%sql

SELECT cast('09' AS INT) AS result
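
Casting works in both directions and across several types. The sketch below casts strings to date and numeric, an integer to a string, and uses `pg_typeof` to confirm the type of the converted month.

```sql
SELECT '2020-09-30'::date AS as_date,
    cast('10.5' AS numeric) + 1 AS as_numeric,                          -- returns 11.5
    2020::varchar || '-09' AS as_text,                                  -- returns '2020-09'
    pg_typeof(split_part('2020-09-30', '-', 2)::int) AS month_type      -- integer
```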

## Handling NULL Values

Let us understand how to handle nulls.
* By default, if we try to add or concatenate null to another column, expression or literal using operators such as `+` or `||`, it will return null.
* If we want to replace null with some default value, we can use `coalesce`.
  * Replace commission_pct with 0 if it is null.
  * Postgres does not provide `nvl`, which is available in some other databases; `coalesce` serves the same purpose.
* Postgres does not provide `nvl2` either. We can use `CASE` and `WHEN` to perform one action when the value is not null and some other action when the value is null.
  * We want to increase commission_pct by 1 if it is not null and set commission_pct to 2 if it is null.
* `coalesce` returns the first not null value if we pass multiple arguments to it.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT 1 + NULL AS result

In [None]:
%%sql

SELECT coalesce(1, 0) AS result

In [None]:
%%sql

SELECT coalesce(NULL, NULL, 2, NULL, 3) AS result

In [None]:
%%sql

CREATE TABLE IF NOT EXISTS sales(
    sales_person_id INT,
    sales_amount FLOAT,
    commission_pct INT
)

In [None]:
%%sql

INSERT INTO sales VALUES
    (1, 1000, 10),
    (2, 1500, 8),
    (3, 500, NULL),
    (4, 800, 5),
    (5, 250, NULL)

In [None]:
%%sql

SELECT * FROM sales

In [None]:
%%sql

SELECT s.*, 
    round((sales_amount * commission_pct / 100)::numeric, 2) AS incorrect_commission_amount
FROM sales AS s

In [None]:
%%sql

SELECT s.*, 
    CASE WHEN commission_pct IS NULL THEN 0 ELSE commission_pct END AS commission_pct
FROM sales AS s

In [None]:
%%sql

SELECT s.*, 
    coalesce(commission_pct, 0) AS commission_pct
FROM sales AS s

In [None]:
%%sql

SELECT s.*, 
    round((sales_amount * coalesce(commission_pct, 0) / 100)::numeric, 2) AS commission_amount
FROM sales AS s
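
For the scenario where one action is needed when the value is not null and another when it is null, a `CASE` expression does the job in Postgres. As a sketch against the `sales` table created above, the query below increases commission_pct by 1 when it is present and uses 2 when it is null (the alias `revised_commission_pct` is chosen only for illustration).

```sql
SELECT s.*,
    CASE
        WHEN commission_pct IS NOT NULL THEN commission_pct + 1
        ELSE 2
    END AS revised_commission_pct
FROM sales AS s
```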

## Using CASE and WHEN
At times we might have to derive values conditionally, based on the values in one or more columns.
* We can use `CASE` and `WHEN` for that.
* Let us implement this conditional logic to come up with derived order_status.
  * If order_status is COMPLETE or CLOSED, set COMPLETED
  * If order_status has PENDING in it, then we will say PENDING
  * If order_status is PROCESSING or PAYMENT_REVIEW, then we will also say PENDING
  * We will set all others as OTHER
* We can also have `ELSE` as part of `CASE` and `WHEN`. If there is no `ELSE` and no condition matches, the result is null.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT DISTINCT order_status FROM orders

In [None]:
%%sql

SELECT o.*,
    CASE WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
    END AS updated_order_status
FROM orders o
LIMIT 10

In [None]:
%%sql

SELECT o.*,
    CASE WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
    ELSE order_status
    END AS updated_order_status
FROM orders o
LIMIT 10

In [None]:
%%sql

SELECT o.*,
    CASE 
        WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
        WHEN order_status LIKE '%PENDING%' THEN 'PENDING'
        ELSE 'OTHER'
    END AS updated_order_status
FROM orders o
LIMIT 10

In [None]:
%%sql

SELECT o.*,
    CASE 
        WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
        WHEN order_status LIKE '%PENDING%' OR order_status IN ('PROCESSING', 'PAYMENT_REVIEW')
            THEN 'PENDING'
        ELSE 'OTHER'
    END AS updated_order_status
FROM orders o
LIMIT 10

In [None]:
%%sql

SELECT DISTINCT order_status,
    CASE 
        WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
        WHEN order_status LIKE '%PENDING%' OR order_status IN ('PROCESSING', 'PAYMENT_REVIEW')
            THEN 'PENDING'
        ELSE 'OTHER'
    END AS updated_order_status
FROM orders
ORDER BY updated_order_status
LIMIT 10
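
A derived column like this is often used for aggregation. The following sketch reuses the same `CASE` expression to count orders per derived status; it assumes the `orders` table used throughout this section.

```sql
SELECT CASE
        WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
        WHEN order_status LIKE '%PENDING%' OR order_status IN ('PROCESSING', 'PAYMENT_REVIEW')
            THEN 'PENDING'
        ELSE 'OTHER'
    END AS updated_order_status,
    count(*) AS order_count
FROM orders
GROUP BY 1
ORDER BY 1
```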