# Pre-Defined Functions

Let us go through the pre-defined functions available in PostgreSQL.

* Overview of Pre-Defined Functions
* String Manipulation Functions
* Date Manipulation Functions
* Overview of Numeric Functions
* Data Type Conversion
* Handling Null Values
* Using CASE and WHEN
* Exercises - Pre-Defined Functions

Here are the key objectives of this section.
* How to use the official Postgres documentation to get the syntax and semantics of the pre-defined functions?
* Understand different categories of functions
* How to use functions effectively using real world examples?
* How to manipulate strings and dates?
* How to deal with nulls, convert data types, etc.?
* Self evaluate by solving the exercises using multiple functions in tandem.

## Overview of Pre-Defined Functions

Like any RDBMS, Postgres provides a robust set of pre-defined functions that help us come up with solutions quickly as per the business requirements. There are many functions, but we will see the most common ones here.

* Following are the categories of functions that are more commonly used.
  * String Manipulation
  * Date Manipulation
  * Numeric Functions
  * Type Conversion Functions
  * CASE and WHEN
  * and more
* The official documentation is available from the [Postgres website](https://www.postgresql.org/).

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

In [None]:
%sql SELECT * FROM information_schema.routines LIMIT 10

In [None]:
%%sql 

SELECT * FROM information_schema.routines 
WHERE routine_name ~ 'str'

In [None]:
%%sql

SELECT substring('Thomas' from 2 for 3)

In [None]:
%%sql

SELECT substring('Thomas', 2, 3)

## String Manipulation Functions

We use string manipulation functions quite extensively. Here are some of the important functions which we typically use.
* Case Conversion - `lower`, `upper`, `initcap`
* Getting size of the column value - `length`
* Extracting Data - `substr` and `split_part`
* Trimming and Padding functions - `trim`, `rtrim`, `ltrim`, `rpad` and `lpad`
* Reversing strings - `reverse`
* Concatenating multiple strings - `concat` and `concat_ws`

### Case Conversion and Length
Let us understand how to perform case conversion of a string and also get length of a string.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

* Case Conversion Functions - `lower`, `upper`, `initcap`

In [None]:
%%sql

SELECT lower('hEllo wOrlD') AS lower_result,
    upper('hEllo wOrlD') AS upper_result,
    initcap('hEllo wOrlD') AS initcap_result

* Getting length - `length`

In [None]:
%%sql

SELECT length('hEllo wOrlD') AS result

Let us see how to use these functions on a table. We will use the orders table, which was loaded as part of the last section.

* order_status for all the orders is in upper case and we will convert everything to lower case.

In [None]:
%%sql

SELECT * FROM orders LIMIT 10

In [None]:
%%sql

SELECT order_id, order_date, order_customer_id,
    lower(order_status) AS order_status,
    length(order_status) AS order_status_length
FROM orders LIMIT 10

### Extracting Data - substr and split_part
Let us understand how to extract data from strings using `substr`/`substring` as well as `split_part`.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

* We can extract a substring from a string using `substr` or `substring` by specifying the start position and the length.
* For example, get the first 4 characters from a date to get the year, or get the last 4 characters from a fixed length unique id.
* `substring` has broader options (such as regular expressions) and can also be used with a different style (using keywords such as `FROM` and `FOR`).
* Unlike in some other relational databases, we cannot pass negative integers to `substr` or `substring` to get the information from the right. We need to use functions like `right` instead.

In [None]:
%%sql

SELECT substr('2013-07-25 00:00:00.0', 1, 4) AS result

In [None]:
%%sql

SELECT substring('2013-07-25 00:00:00.0', 1, 4) AS result

In [None]:
%%sql

SELECT substring('2013-07-25 00:00:00.0' FROM 1 FOR 4) AS result

In [None]:
%%sql

SELECT substring('2013-07-25 00:00:00.0', 6, 2) AS result

In [None]:
%%sql

SELECT substring('2013-07-25 00:00:00.0', 9, 2) AS result

In [None]:
%%sql

SELECT substring('2013-07-25 00:00:00.0' from 12) AS result

In [None]:
%%sql

SELECT substr('2013-07-25 00:00:00.0', 12) AS result

In [None]:
%%sql

SELECT right('123 456 7890', 4) AS result

In [None]:
%%sql

SELECT left('123 456 7890', 3) AS result
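
As a small sketch of why `right` is needed, the query below contrasts `substr` with a negative start position against `right`. The literal value is only an illustration. In Postgres, positions before the first character are treated as empty, so a negative start position does not count from the end of the string.

```sql
-- A negative start position does not count from the right in Postgres,
-- so the first expression does not return the last 4 characters.
SELECT substr('123 456 7890', -4) AS substr_negative_start,
    right('123 456 7890', 4) AS last_4_characters   -- '7890'
```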

```{note}
We can also use a combination of `substring` and `length` like below to get the last 4 digits or characters from a string.
```

In [None]:
%%sql

SELECT substring('123 456 7890' FROM length('123 456 7890') - 3) AS result

In [None]:
%%sql

SELECT substring('123 456 7890' FROM '....$') AS result

```{note}
Here we get the first 3 characters or digits as well as the last 4 characters or digits using `substring`. However, this works reliably only when the strings are of fixed length.
```

In [None]:
%%sql

WITH unique_ids AS (
    SELECT '241-80-7115' AS unique_id UNION
    SELECT '694-30-6851' UNION
    SELECT '586-92-5361' UNION
    SELECT '884-65-284' UNION
    SELECT '876-99-585' UNION
    SELECT '831-59-5593' UNION
    SELECT '399-88-3617' UNION
    SELECT '733-17-4217' UNION
    SELECT '873-68-9778' UNION
    SELECT '48'
) SELECT unique_id,
    substring(unique_id FROM 1 FOR 3) AS unique_id_first3,
    substring(unique_id FROM '....$') AS unique_id_last4
FROM unique_ids
ORDER BY unique_id

* Let us see how we can extract date part from order_date of orders.

In [None]:
%%sql

SELECT * FROM orders LIMIT 10

In [None]:
%%sql

SELECT order_id,
    substr(order_date::varchar, 1, 10) AS order_date, 
    order_customer_id, 
    order_status
FROM orders
LIMIT 10

Let us understand how to extract information from a string that contains a delimiter.
* `split_part` can be used to split a string using a delimiter and extract the part at a given position.
* If there is no data in a given position after splitting, it will be represented as an empty string **''**.

In [None]:
%%sql

SELECT split_part('2013-07-25', '-', 1) AS result

In [None]:
%%sql

WITH addresses AS (
    SELECT '593 Fair Oaks Pass, Frankfort, Kentucky, 40618' AS address UNION
    SELECT ', Vancouver, Washington, 98687' UNION
    SELECT '83047 Glacier Hill Circle, Sacramento, California, 94237' UNION
    SELECT '935 Columbus Junction, Cincinnati, Ohio, 45213' UNION
    SELECT '03010 Nevada Crossing, El Paso, Texas, 88579' UNION
    SELECT '9 Dunning Circle, , Arizona, 85271' UNION
    SELECT '96 Fair Oaks Way, Decatur, Illinois, 62525' UNION
    SELECT '999 Caliangt Avenue, Greenville, South Carolina, 29615' UNION
    SELECT '2 Saint Paul Trail, Bridgeport, , 06673' UNION
    SELECT '3 Reindahl Center, Ogden, Utah'
) SELECT split_part(address, ', ', 1) street,
    split_part(address, ', ', 2) city,
    split_part(address, ', ', 3) state,
    split_part(address, ', ', 4) postal_code
FROM addresses
ORDER BY postal_code

In [None]:
%%sql

WITH addresses AS (
    SELECT '593 Fair Oaks Pass, Frankfort, Kentucky, 40618' AS address UNION
    SELECT ', Vancouver, Washington, 98687' UNION
    SELECT '83047 Glacier Hill Circle, Sacramento, California, 94237' UNION
    SELECT '935 Columbus Junction, Cincinnati, Ohio, 45213' UNION
    SELECT '03010 Nevada Crossing, El Paso, Texas, 88579' UNION
    SELECT '9 Dunning Circle, , Arizona, 85271' UNION
    SELECT '96 Fair Oaks Way, Decatur, Illinois, 62525' UNION
    SELECT '999 Caliangt Avenue, Greenville, South Carolina, 29615' UNION
    SELECT '2 Saint Paul Trail, Bridgeport, , 06673' UNION
    SELECT '3 Reindahl Center, Ogden, Utah'
) SELECT split_part(address, ', ', 1) street,
    split_part(address, ', ', 2) city,
    split_part(address, ', ', 3) state,
    split_part(address, ', ', 4) postal_code
FROM addresses
WHERE split_part(address, ', ', 1) = ''
ORDER BY postal_code

In [None]:
%%sql

WITH unique_ids AS (
    SELECT '241-80-7115' AS unique_id UNION
    SELECT '694-30-6851' UNION
    SELECT '586-92-5361' UNION
    SELECT '884-65-284' UNION
    SELECT '876-99-585' UNION
    SELECT '831-59-5593' UNION
    SELECT '399-88-3617' UNION
    SELECT '733-17-4217' UNION
    SELECT '873-68-9778' UNION
    SELECT '480-69-032'
) SELECT unique_id,
    substring(unique_id FROM 1 FOR 3) AS unique_id_first3,
    substring(unique_id FROM '....$') AS unique_id_last4,
    CASE WHEN length(split_part(unique_id, '-', 3)) = 4
        THEN split_part(unique_id, '-', 3)
        ELSE 'Invalid'
    END AS unique_id_last
FROM unique_ids
ORDER BY unique_id

### Using position or strpos

At times we might want to get the position of a substring in a main string. For example, we might want to check whether email ids have **@** in them. We can use functions such as `position` or `strpos`.

In [None]:
%%sql 

SELECT position('@' IN 'it@versity.com'),
    position('@' IN 'itversity.com')

In [None]:
%%sql 

SELECT strpos('it@versity.com', '@'),
    strpos('itversity.com', '@')

In [None]:
%%sql

WITH email_ids AS (
    SELECT 'bsellan0@yellowbook.com' AS email_id UNION
    SELECT 'rstelljes1@illinois.edu' UNION
    SELECT 'mmalarkey2@webeden.co.uk' UNION
    SELECT 'emussared3@redcross.org' UNION
    SELECT 'livashin4@bloglovin.com' UNION
    SELECT 'gkeach5@cbc.ca' UNION
    SELECT 'emasham6@xing.com' UNION
    SELECT 'rcobbald7@house.gov' UNION
    SELECT 'rdrohan8@washingtonpost.com' UNION
    SELECT 'aebben9@arstechnica.com'
) SELECT email_id,
    position('@' IN email_id) AS at_position
FROM email_ids
ORDER BY email_id

### Trimming and Padding Functions

Let us understand how to trim or remove leading and/or trailing spaces in a string.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

* `ltrim` is used to remove the spaces on the left side of the string.
* `rtrim` is used to remove the spaces on the right side of the string.
* `trim` is used to remove the spaces on both sides of the string.

In [None]:
%%sql

SELECT ltrim('     Hello World') AS result

In [None]:
%%sql

SELECT rtrim('     Hello World       ') AS result

In [None]:
%%sql

SELECT length(trim('     Hello World       ')) AS result

In [None]:
%%sql

SELECT ltrim('----Hello World----', '-') AS result

In [None]:
%%sql

SELECT rtrim('----Hello World----', '-') AS result

In [None]:
%%sql

SELECT trim('----Hello World----', '-') AS result

Let us understand how to use padding to pad characters to a string.

* Let us assume that there are 3 fields - year, month and date which are of type integer.
* If we have to concatenate all the 3 fields and create a date, we might have to pad month and date with 0.
* `lpad` is used more often than `rpad` especially when we try to build the date from separate columns.

In [None]:
%%sql

SELECT 2013 AS year, 7 AS month, 25 AS myDate

In [None]:
%%sql

SELECT lpad(7::varchar, 2, '0') AS result

In [None]:
%%sql

SELECT lpad(10::varchar, 2, '0') AS result

In [None]:
%%sql

SELECT lpad(100::varchar, 2, '0') AS result
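
The examples above only use `lpad`. As a minimal sketch with made-up literal values, here is how `rpad` compares: it adds the fill character on the right instead of the left.

```sql
-- lpad pads on the left, rpad pads on the right
SELECT lpad('7', 2, '0') AS lpad_result,       -- '07'
    rpad('7', 2, '0') AS rpad_result,          -- '70'
    rpad('Hello', 10, '-') AS rpad_hyphens     -- 'Hello-----'
```

Like `lpad`, `rpad` truncates the value when it is longer than the requested length.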

### Reverse and Concatenating multiple strings

Let us understand how to reverse a string as well as concatenate multiple strings.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

* We can use `reverse` to reverse a string.
* We can concatenate multiple strings using `concat` and `concat_ws`.
* `concat_ws` is typically used if we want to have the same string between all the strings that are being concatenated.

In [None]:
%%sql

SELECT reverse('Hello World') AS result

In [None]:
%%sql

SELECT concat('Hello ', 'World') AS result

In [None]:
%%sql

SELECT concat('Order Status is ', order_status) AS result
FROM orders LIMIT 10

In [None]:
%%sql

SELECT * FROM (SELECT 2013 AS year, 7 AS month, 25 AS myDate) q

In [None]:
%%sql

SELECT concat(year, '-', lpad(month::varchar, 2, '0'), '-',
              lpad(myDate::varchar, 2, '0')) AS order_date
FROM
    (SELECT 2013 AS year, 7 AS month, 25 AS myDate) q

In [None]:
%%sql

SELECT concat_ws('-', year, lpad(month::varchar, 2, '0'),
              lpad(myDate::varchar, 2, '0')) AS order_date
FROM
    (SELECT 2013 AS year, 7 AS month, 25 AS myDate) q
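
One difference worth keeping in mind is how nulls are treated. The sketch below uses illustrative literals and also the `||` concatenation operator, which is not covered above: `||` returns null if either side is null, whereas `concat` and `concat_ws` skip null arguments.

```sql
SELECT 'Hello' || NULL AS pipe_result,                        -- NULL
    concat('Hello', NULL, 'World') AS concat_result,          -- 'HelloWorld'
    concat_ws('-', '2013', NULL, '25') AS concat_ws_result    -- '2013-25'
```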

## Date Manipulation Functions

Let us go through some of the important date manipulation functions.
* Getting Current Date and Timestamp
* Date Arithmetic using `INTERVAL` and `-` operator
* Getting beginning date or time using `date_trunc`
* Extracting information using `to_char` as well as calendar functions.
* Dealing with Unix timestamp using `extract(epoch FROM ...)` and `to_timestamp`

### Getting Current Date and Timestamp

Let us understand how to get the details about current or today's date as well as current timestamp.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

* `current_date` returns today's date.
* `current_timestamp` returns the current date and time (of type `timestamp with time zone`) with up to microsecond precision.
* These are not like other functions and do not use **()** at the end.
* There is a default display format associated with date and timestamp.
  * Date - `YYYY-MM-DD`
  * Timestamp - `YYYY-MM-DD HH24:MI:SS.US` followed by the time zone offset
* We can apply all string manipulation functions on date or timestamp once they are cast to strings using `varchar`.

In [None]:
%%sql

SELECT current_date AS current_date

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp

```{note}
Here is an example of applying string manipulation functions on dates. However, it is not a good practice. Postgres provides functions on dates or timestamps for most of the common requirements.
```

In [None]:
%%sql

SELECT substring(current_date::varchar, 1, 4) AS current_date

### Date Arithmetic
Let us understand how to perform arithmetic on dates or timestamps.

* We can add or subtract days, months or years from a date or timestamp using `INTERVAL` values.
* We can also add or subtract hours, minutes, seconds etc. from a date or timestamp using `INTERVAL`.
* We can combine multiple units in one operation using `INTERVAL`.
* We can get the difference between 2 dates or timestamps using the minus (`-`) operator.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT current_date + INTERVAL '32 DAYS' AS result

In [None]:
%%sql

SELECT current_date + INTERVAL '730 DAYS' AS result

In [None]:
%%sql

SELECT current_date + INTERVAL '-730 DAYS' AS result

In [None]:
%%sql

SELECT current_date - INTERVAL '730 DAYS' AS result

In [None]:
%%sql

SELECT current_date + INTERVAL '3 MONTHS' AS result

In [None]:
%%sql

SELECT '2019-01-31'::date + INTERVAL '3 MONTHS' AS result

In [None]:
%%sql

SELECT '2019-01-31'::date + INTERVAL '3 MONTHS 3 DAYS 3 HOURS' AS result

In [None]:
%%sql

SELECT current_timestamp + INTERVAL '3 MONTHS' AS result

In [None]:
%%sql

SELECT current_timestamp + INTERVAL '10 HOURS' AS result

In [None]:
%%sql

SELECT current_timestamp + INTERVAL '10 MINUTES' AS result

In [None]:
%%sql

SELECT current_timestamp + INTERVAL '10 HOURS 10 MINUTES' AS result

In [None]:
%%sql

SELECT '2019-03-30'::date - '2017-12-31'::date AS result

In [None]:
%%sql

SELECT '2017-12-31'::date - '2019-03-30'::date AS result

In [None]:
%%sql

SELECT current_date - '2019-03-30'::date AS result

In [None]:
%%sql

SELECT current_timestamp - '2019-03-30'::date AS result
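
The result type depends on the operands, which is easy to miss when looking only at the values. The following sketch uses `pg_typeof` (and adding a plain integer to a date, neither of which is used above) to show the types returned by the different forms of date arithmetic.

```sql
SELECT pg_typeof(current_date + INTERVAL '3 DAYS') AS date_plus_interval,       -- timestamp without time zone
    pg_typeof(current_date + 3) AS date_plus_integer,                           -- date
    pg_typeof('2019-03-30'::date - '2017-12-31'::date) AS date_minus_date,      -- integer
    pg_typeof(current_timestamp - '2019-03-30'::date) AS timestamp_minus_date   -- interval
```

If a date is needed after adding an `INTERVAL`, the result can be cast back using `::date`.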

### Beginning Date or Time - date_trunc
Let us understand how to use `date_trunc` on dates or timestamps and get beginning date or time.

* We can use **MONTH** to get the beginning date of the month.
* **YEAR** can be used to get the beginning date of the year.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT date_trunc('YEAR', current_date) AS year_beginning

In [None]:
%%sql

SELECT date_trunc('MONTH', current_date) AS month_beginning

In [None]:
%%sql

SELECT date_trunc('WEEK', current_date) AS week_beginning

In [None]:
%%sql

SELECT date_trunc('DAY', current_date) AS day_beginning

In [None]:
%%sql

SELECT date_trunc('HOUR', current_timestamp) AS hour_beginning
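
`date_trunc` is also a convenient starting point for getting the last day of a month. Here is a minimal sketch that combines it with `INTERVAL` arithmetic; the date literal is just an example, chosen from a leap year.

```sql
SELECT date_trunc('MONTH', '2020-02-15'::date)::date AS month_beginning,   -- 2020-02-01
    (date_trunc('MONTH', '2020-02-15'::date)
        + INTERVAL '1 MONTH' - INTERVAL '1 DAY')::date AS month_end        -- 2020-02-29
```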

### Extracting information using to_char

Let us understand how to use `to_char` to extract information from date or timestamp.

Here is how we can get date related information such as year, month, day etc from date or timestamp.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'yyyy') AS year

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'yy') AS year

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'MM') AS month

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'dd') AS day_of_month

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'DD') AS day_of_month

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'DDD') AS day_of_year

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'Mon') AS month_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'mon') AS month_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'Month') AS month_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'month') AS month_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'day') AS day_name

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'DY') AS day_name

```{note}
When we use `Day` to get the complete name of a day, it will return 9 character string by padding with spaces.
```

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'Day') AS dayname

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char('2020-11-17'::date, 'Day') AS dayname,
    length(to_char('2020-11-17'::date, 'Day')) AS dayname_length,
    length(trim(to_char('2020-11-17'::date, 'Day'))) AS dayname_trimmed_length
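
Instead of wrapping `to_char` in `trim`, we can also use the `FM` (fill mode) prefix in the format string to suppress the padding. A small sketch using the same date as above:

```sql
SELECT to_char('2020-11-17'::date, 'Day') AS dayname_padded,
    length(to_char('2020-11-17'::date, 'Day')) AS padded_length,      -- 9
    to_char('2020-11-17'::date, 'FMDay') AS dayname,                  -- 'Tuesday'
    length(to_char('2020-11-17'::date, 'FMDay')) AS dayname_length    -- 7
```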

* Here is how we can get time related information such as hour, minute, seconds, milliseconds etc from timestamp.

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'HH24') AS hour24

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'hh') AS hour12

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'MI') AS minutes

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'ss') AS seconds

In [None]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    to_char(current_timestamp, 'MS') AS millis

* Here is how we can get the information from date or timestamp in the format we require.

In [None]:
%%sql

SELECT to_char(current_timestamp, 'yyyyMM') AS current_month

In [None]:
%%sql

SELECT to_char(current_timestamp, 'yyyyMMdd') AS current_date

In [None]:
%%sql

SELECT to_char(current_timestamp, 'yyyy/MM/dd') AS current_date

### Extracting information - extract

We can get year, month, day etc. from a date or timestamp using the `extract` function.
* Let us see the usage of `extract` with fields such as year, quarter, month, week, day, hour etc.
* We can also use `date_part` in place of `extract`. However, there is a subtle difference in the syntax.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT extract(century FROM current_date) AS century

In [None]:
%%sql

SELECT date_part('century', current_date) AS century

In [None]:
%%sql

SELECT extract(decade FROM current_date) AS decade

In [None]:
%%sql

SELECT extract(year FROM current_date) AS year

In [None]:
%%sql

SELECT extract(quarter FROM current_date) AS quarter

In [None]:
%%sql

SELECT extract(month FROM current_date) AS month

In [None]:
%%sql

SELECT extract(week FROM current_date) AS week

In [None]:
%%sql

SELECT extract(day FROM current_date) AS day

In [None]:
%%sql

SELECT extract(doy FROM current_date) AS day_of_year

In [None]:
%%sql

SELECT extract(dow FROM current_date) AS day_of_week
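
The numbering used by `dow` can be surprising: Sunday is 0 and Saturday is 6. The sketch below uses a fixed date that falls on a Sunday and also shows `isodow` (not used above), which numbers Monday as 1 and Sunday as 7.

```sql
-- 2020-11-15 is a Sunday
SELECT extract(dow FROM '2020-11-15'::date) AS day_of_week,         -- 0
    extract(isodow FROM '2020-11-15'::date) AS iso_day_of_week,     -- 7
    date_part('dow', '2020-11-15'::date) AS day_of_week_date_part   -- 0
```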

In [None]:
%%sql

SELECT extract(hour FROM current_timestamp) AS hour

In [None]:
%%sql

SELECT extract(minute FROM current_timestamp) AS minute

In [None]:
%%sql

SELECT extract(second FROM current_timestamp) AS second

In [None]:
%%sql

SELECT extract(milliseconds FROM current_timestamp) AS millis

### Dealing with Unix Timestamp

Let us go through the functions that can be used to deal with Unix Timestamp.

* `extract(epoch FROM ...)` can be used to convert a date or timestamp to Unix epoch.
* `to_timestamp` can be used to convert Unix epoch to a regular timestamp.
* We can get Unix epoch or Unix timestamp by running `date '+%s'` in a Unix/Linux terminal.
* In psql, we can run `\df to_timestamp` to get details about a function.

Let us see how we can use `extract` with `epoch` and `to_timestamp` to convert between timestamp and Unix timestamp or epoch.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT extract(epoch FROM current_date) AS date_epoch

In [None]:
%%sql

SELECT extract(epoch FROM '2019-04-30 18:18:51'::timestamp) AS unixtime

In [None]:
%%sql

SELECT to_timestamp(1556662731) AS time_from_epoch

In [None]:
%%sql

SELECT to_timestamp(1556662731)::date AS time_from_epoch

In [None]:
%%sql

SELECT to_char(to_timestamp(1556662731), 'yyyyMM')::int AS yyyyMM_from_epoch
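
The results of the conversions above depend on the time zone of the session. As a sketch of a round trip that does not depend on it, the query below states the time zone explicitly, using a `+00` offset in the literal and `AT TIME ZONE 'UTC'` on the way back.

```sql
-- The time zone is stated explicitly, so the values do not depend on the session settings
SELECT extract(epoch FROM '2019-04-30 22:18:51+00'::timestamptz) AS unixtime,   -- 1556662731 (may be shown with decimals)
    to_timestamp(1556662731) AT TIME ZONE 'UTC' AS utc_time_from_epoch          -- 2019-04-30 22:18:51
```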

## Overview of Numeric Functions

Here are some of the numeric functions we might use quite often.

* `abs` - returns the absolute (non-negative) value
* `sum`, `avg`
* `round` - rounds off to specified precision
* `ceil`, `floor` - return the nearest whole number greater than or less than the value respectively
* `greatest`
* `min`, `max`
* `random`
* `pow`, `sqrt`
* `cume_dist`, `stddev`, `variance`

Some of the functions listed are aggregate functions, e.g. `sum`, `avg`, `min`, `max` etc.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT abs(-10.5), abs(10)

In [None]:
%%sql

SELECT avg(order_item_subtotal) AS order_revenue_avg FROM order_items
WHERE order_item_order_id = 2

In [None]:
%%sql

SELECT order_item_order_id, 
    sum(order_item_subtotal) AS order_revenue
FROM order_items
GROUP BY order_item_order_id
LIMIT 10

In [None]:
%%sql

SELECT
    round(10.58) rnd,
    floor(10.58) flr,
    ceil(10.58) cl

In [None]:
%%sql

SELECT
    round(10.48, 1) rnd,
    floor(10.48) flr,
    ceil(10.48) cl

In [None]:
%%sql

SELECT round(avg(order_item_subtotal::numeric), 2) AS order_revenue_avg 
FROM order_items
WHERE order_item_order_id = 2

In [None]:
%%sql

SELECT order_item_order_id, 
    round(sum(order_item_subtotal::numeric), 2) AS order_revenue
FROM order_items
GROUP BY order_item_order_id
LIMIT 10

In [None]:
%%sql

SELECT greatest(10, 11)

In [None]:
%%sql

SELECT order_item_order_id, 
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue,
    min(order_item_subtotal) AS order_item_subtotal_min,
    max(order_item_subtotal) AS order_item_subtotal_max 
FROM order_items
GROUP BY order_item_order_id
LIMIT 10

In [None]:
%sql SELECT random()

In [None]:
%sql SELECT (random() * 100)::int

In [None]:
%sql SELECT pow(2, 2)::int, sqrt(4)
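
Two details are easy to get wrong with these functions: how `ceil` and `floor` behave for negative numbers, and how `greatest` treats nulls. The sketch below uses literal values only; `least` is the counterpart of `greatest` and is included for comparison even though it is not listed above.

```sql
SELECT ceil(10.2) AS ceil_positive,               -- 11
    floor(10.8) AS floor_positive,                -- 10
    ceil(-10.2) AS ceil_negative,                 -- -10
    floor(-10.8) AS floor_negative,               -- -11
    greatest(10, NULL, 11) AS greatest_result,    -- 11, nulls are ignored
    least(10, NULL, 11) AS least_result           -- 10, nulls are ignored
```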

## Data Type Conversion

Let us understand how we can use type casting to change the data type of an extracted value back to its original type.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT '09'::int

In [None]:
%%sql

SELECT current_date AS current_date

In [None]:
%%sql

SELECT split_part('2020-09-30', '-', 2) AS month

In [None]:
%%sql

SELECT split_part('2020-09-30', '-', 2)::int AS month

In [None]:
%%sql

SELECT cast('0.04' AS FLOAT) AS result

In [None]:
%%sql

SELECT cast('09' AS INT) AS result
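
Casting is not limited to integers and floats. Here is a short sketch with a few more conversions, using literal values; `to_date` and `pg_typeof` are not used above and are shown only as illustrations of what is available.

```sql
SELECT '2020-09-30'::date AS date_from_iso_string,
    to_date('20200930', 'YYYYMMDD') AS date_from_compact_string,   -- 2020-09-30
    cast('10.58' AS numeric(10, 1)) AS rounded_numeric,            -- 10.6
    pg_typeof('09'::int) AS int_type                               -- integer
```

`to_date` is useful when the string is not in the default `YYYY-MM-DD` format, as it accepts the format as its second argument.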

## Handling NULL Values

Let us understand how to handle nulls.
* By default, if we try to add or concatenate (using `||`) null to another column, expression or literal, it will return null.
* If we want to replace null with some default value, we can use `coalesce`.
  * Replace commission_pct with 0 if it is null.
  * Functions such as `nvl` and `nvl2` from other databases are not available in Postgres by default.
* A `CASE` expression can be used to perform one action when the value is not null and some other action when the value is null.
  * We want to increase commission_pct by 1 if it is not null and set commission_pct to 2 if it is null.
* `coalesce` returns the first non-null value if we pass multiple arguments to it.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT 1 + NULL AS result

In [None]:
%%sql

SELECT coalesce(1, 0) AS result

In [None]:
%%sql

SELECT coalesce(NULL, NULL, 2, NULL, 3) AS result

In [None]:
%%sql

CREATE TABLE IF NOT EXISTS sales(
    sales_person_id INT,
    sales_amount FLOAT,
    commission_pct INT
)

In [None]:
%%sql

INSERT INTO sales VALUES
    (1, 1000, 10),
    (2, 1500, 8),
    (3, 500, NULL),
    (4, 800, 5),
    (5, 250, NULL)

In [None]:
%%sql

SELECT * FROM sales

In [None]:
%%sql

SELECT s.*, 
    round((sales_amount * commission_pct / 100)::numeric, 2) AS incorrect_commission_amount
FROM sales AS s

In [None]:
%%sql

SELECT s.*, 
    CASE WHEN commission_pct IS NULL THEN 0 ELSE commission_pct END AS commission_pct
FROM sales AS s

In [None]:
%%sql

SELECT s.*, 
    coalesce(commission_pct, 0) AS commission_pct
FROM sales AS s

In [None]:
%%sql

SELECT s.*, 
    round((sales_amount * coalesce(commission_pct, 0) / 100)::numeric, 2) AS commission_amount
FROM sales AS s
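
The requirement of increasing commission_pct by 1 when it is not null and setting it to 2 when it is null can be sketched with a `CASE` expression. The inline `VALUES` list below is a stand-in for the sales table so that the query can be run on its own.

```sql
SELECT commission_pct,
    coalesce(commission_pct, 0) AS commission_pct_or_zero,
    CASE
        WHEN commission_pct IS NOT NULL THEN commission_pct + 1
        ELSE 2
    END AS adjusted_commission_pct
FROM (VALUES (10), (NULL), (5)) AS s (commission_pct)
```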

## Using CASE and WHEN
At times we might have to select values from multiple columns conditionally.
* We can use `CASE` and `WHEN` for that.
* Let us implement this conditional logic to come up with derived order_status.
  * If order_status is COMPLETE or CLOSED, set COMPLETED
  * If order_status has PENDING in it, then we will say PENDING
  * If order_status is PROCESSING or PAYMENT_REVIEW, then we will also say PENDING
  * We will set all others as OTHER
* We can also have `ELSE` as part of `CASE` and `WHEN`.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

In [None]:
%%sql

SELECT DISTINCT order_status FROM orders

In [None]:
%%sql

SELECT o.*,
    CASE WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
    END AS updated_order_status
FROM orders o
LIMIT 10

In [None]:
%%sql

SELECT o.*,
    CASE WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
    ELSE order_status
    END AS updated_order_status
FROM orders o
LIMIT 10

In [None]:
%%sql

SELECT o.*,
    CASE 
        WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
        WHEN order_status LIKE '%PENDING%' THEN 'PENDING'
        ELSE 'OTHER'
    END AS updated_order_status
FROM orders o
LIMIT 10

In [None]:
%%sql

SELECT o.*,
    CASE 
        WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
        WHEN order_status LIKE '%PENDING%' OR order_status IN ('PROCESSING', 'PAYMENT_REVIEW')
            THEN 'PENDING'
        ELSE 'OTHER'
    END AS updated_order_status
FROM orders o
LIMIT 10

In [None]:
%%sql

SELECT DISTINCT order_status,
    CASE 
        WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'COMPLETED'
        WHEN order_status LIKE '%PENDING%' OR order_status IN ('PROCESSING', 'PAYMENT_REVIEW')
            THEN 'PENDING'
        ELSE 'OTHER'
    END AS updated_order_status
FROM orders
ORDER BY updated_order_status
LIMIT 10
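
When every condition is an equality check on the same column, `CASE` also has a shorter form in which the column is named once after `CASE`. The sketch below uses an inline `VALUES` list with a few sample statuses in place of the orders table.

```sql
SELECT order_status,
    CASE order_status
        WHEN 'COMPLETE' THEN 'COMPLETED'
        WHEN 'CLOSED' THEN 'COMPLETED'
        ELSE 'OTHER'
    END AS updated_order_status
FROM (VALUES ('COMPLETE'), ('CLOSED'), ('PENDING_PAYMENT')) AS o (order_status)
```

This form cannot express conditions such as `LIKE '%PENDING%'`, which is why the queries above use the searched form with a full condition after each `WHEN`.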

## Exercises - Pre-Defined Functions

Here are the exercises to ensure our understanding related to Pre-Defined Functions.
* We will use **users** table as well as other tables we got as part of retail database.
* Information will be provided with each exercise.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

In [None]:
%%sql

DROP TABLE IF EXISTS users

In [None]:
%%sql

CREATE TABLE users(
    user_id SERIAL PRIMARY KEY,
    user_first_name VARCHAR(30),
    user_last_name VARCHAR(30),
    user_email_id VARCHAR(50),
    user_gender VARCHAR(1),
    user_unique_id VARCHAR(15),
    user_phone_no VARCHAR(20),
    user_dob DATE,
    created_ts TIMESTAMP
)

In [None]:
%%sql

INSERT INTO users (
    user_first_name, user_last_name, user_email_id, user_gender, 
    user_unique_id, user_phone_no, user_dob, created_ts
) VALUES
    ('Giuseppe', 'Bode', 'gbode0@imgur.com', 'M', '88833-8759', 
     '+86 (764) 443-1967', '1973-05-31', '2018-04-15 12:13:38'),
    ('Lexy', 'Gisbey', 'lgisbey1@mail.ru', 'F', '262501-029', 
     '+86 (751) 160-3742', '2003-05-31', '2020-12-29 06:44:09'),
    ('Karel', 'Claringbold', 'kclaringbold2@yale.edu', 'F', '391-33-2823', 
     '+62 (445) 471-2682', '1985-11-28', '2018-11-19 00:04:08'),
    ('Marv', 'Tanswill', 'mtanswill3@dedecms.com', 'F', '1195413-80', 
     '+62 (497) 736-6802', '1998-05-24', '2018-11-19 16:29:43'),
    ('Gertie', 'Espinoza', 'gespinoza4@nationalgeographic.com', 'M', '471-24-6869', 
     '+249 (687) 506-2960', '1997-10-30', '2020-01-25 21:31:10'),
    ('Saleem', 'Danneil', 'sdanneil5@guardian.co.uk', 'F', '192374-933', 
     '+63 (810) 321-0331', '1992-03-08', '2020-11-07 19:01:14'),
    ('Rickert', 'O''Shiels', 'roshiels6@wikispaces.com', 'M', '749-27-47-52', 
     '+86 (184) 759-3933', '1972-11-01', '2018-03-20 10:53:24'),
    ('Cybil', 'Lissimore', 'clissimore7@pinterest.com', 'M', '461-75-4198', 
     '+54 (613) 939-6976', '1978-03-03', '2019-12-09 14:08:30'),
    ('Melita', 'Rimington', 'mrimington8@mozilla.org', 'F', '892-36-676-2', 
     '+48 (322) 829-8638', '1995-12-15', '2018-04-03 04:21:33'),
    ('Benetta', 'Nana', 'bnana9@google.com', 'M', '197-54-1646', 
     '+420 (934) 611-0020', '1971-12-07', '2018-10-17 21:02:51'),
    ('Gregorius', 'Gullane', 'ggullanea@prnewswire.com', 'F', '232-55-52-58', 
     '+62 (780) 859-1578', '1973-09-18', '2020-01-14 23:38:53'),
    ('Una', 'Glayzer', 'uglayzerb@pinterest.com', 'M', '898-84-336-6', 
     '+380 (840) 437-3981', '1983-05-26', '2019-09-17 03:24:21'),
    ('Jamie', 'Vosper', 'jvosperc@umich.edu', 'M', '247-95-68-44', 
     '+81 (205) 723-1942', '1972-03-18', '2020-07-23 16:39:33'),
    ('Calley', 'Tilson', 'ctilsond@issuu.com', 'F', '415-48-894-3', 
     '+229 (698) 777-4904', '1987-06-12', '2020-06-05 12:10:50'),
    ('Peadar', 'Gregorowicz', 'pgregorowicze@omniture.com', 'M', '403-39-5-869', 
     '+7 (267) 853-3262', '1996-09-21', '2018-05-29 23:51:31'),
    ('Jeanie', 'Webling', 'jweblingf@booking.com', 'F', '399-83-05-03', 
     '+351 (684) 413-0550', '1994-12-27', '2018-02-09 01:31:11'),
    ('Yankee', 'Jelf', 'yjelfg@wufoo.com', 'F', '607-99-0411', 
     '+1 (864) 112-7432', '1988-11-13', '2019-09-16 16:09:12'),
    ('Blair', 'Aumerle', 'baumerleh@toplist.cz', 'F', '430-01-578-5', 
     '+7 (393) 232-1860', '1979-11-09', '2018-10-28 19:25:35'),
    ('Pavlov', 'Steljes', 'psteljesi@macromedia.com', 'F', '571-09-6181', 
     '+598 (877) 881-3236', '1991-06-24', '2020-09-18 05:34:31'),
    ('Darn', 'Hadeke', 'dhadekej@last.fm', 'M', '478-32-02-87', 
     '+370 (347) 110-4270', '1984-09-04', '2018-02-10 12:56:00'),
    ('Wendell', 'Spanton', 'wspantonk@de.vu', 'F', null, 
     '+84 (301) 762-1316', '1973-07-24', '2018-01-30 01:20:11'),
    ('Carlo', 'Yearby', 'cyearbyl@comcast.net', 'F', null, 
     '+55 (288) 623-4067', '1974-11-11', '2018-06-24 03:18:40'),
    ('Sheila', 'Evitts', 'sevittsm@webmd.com', null, '830-40-5287',
     null, '1977-03-01', '2020-07-20 09:59:41'),
    ('Sianna', 'Lowdham', 'slowdhamn@stanford.edu', null, '778-0845', 
     null, '1985-12-23', '2018-06-29 02:42:49'),
    ('Phylys', 'Aslie', 'paslieo@qq.com', 'M', '368-44-4478', 
     '+86 (765) 152-8654', '1984-03-22', '2019-10-01 01:34:28')

### Exercise 1

Get the number of users created per year.
* Use **users** table for this exercise.
* Output should contain 4 digit year and count.
* Use date specific functions to get the year using created_ts.
* Make sure you define aliases to the columns as **created_year** and **user_count** respectively.
* Data should be sorted in ascending order by **created_year**.
* When you run the query in a Jupyter environment, integer values might be displayed with decimal points (for example, **2018.0**). Results displayed that way are acceptable.
* Here is the sample output.

|created_year|user_count|
|----|--|
|2018|13|
|2019|4|
|2020|8|
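
As a hint, here is a minimal sketch of two ways to pull the year out of a timestamp. The literal timestamp is only an illustrative stand-in for **created_ts**, and the column aliases are hypothetical. `to_char` returns text, while `date_part` returns a number, which is why Jupyter may show values such as 2019.0.

```sql
-- Literal timestamp used for illustration only; it is not read from the users table.
SELECT to_char('2019-09-16 16:09:12'::timestamp, 'yyyy') AS year_as_text,   -- 2019
       date_part('year', '2019-09-16 16:09:12'::timestamp) AS year_as_number;
```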


### Exercise 2

Get the day name of the birth date for all the users born in the month of May.
* Use **users** table for this exercise.
* Output should contain user_id, user_dob, user_email_id and user_day_of_birth.
* Use date specific functions to get the month using user_dob.
* **user_day_of_birth** should be the full day name with the first character in upper case, such as **Tuesday**.
* Data should be sorted by day within the month of May.

|user_id|user_dob|user_email_id|user_day_of_birth|
|-|----------|----------------------|------|
|4|1998-05-24|mtanswill3@dedecms.com|Sunday|
|12|1983-05-26|uglayzerb@pinterest.com|Thursday|
|1|1973-05-31|gbode0@imgur.com|Thursday|
|2|2003-05-31|lgisbey1@mail.ru|Saturday|
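
As a hint, the sketch below shows how `to_char` format patterns behave on a single literal date, which stands in for **user_dob**. The aliases are hypothetical. Note that the plain `Day` pattern pads the name with blanks to 9 characters; the `FM` prefix suppresses that padding.

```sql
-- Literal date used for illustration only; it is not read from the users table.
SELECT to_char('1998-05-24'::date, 'MM') AS month_no,      -- 05
       to_char('1998-05-24'::date, 'DD') AS day_of_month,  -- 24
       to_char('1998-05-24'::date, 'FMDay') AS day_name;   -- Sunday
```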

### Exercise 3

Get the names and email ids of users added in year 2019.

* Use **users** table for this exercise.
* Output should contain user_id, user_name, user_email_id, created_ts, created_year.
* Use date specific functions to get the year using created_ts.
* **user_name** is a derived column by concatenating user_first_name and user_last_name with space in between.
* **user_name** should have values in upper case.
* Data should be sorted in ascending order by user_name.

|user_id|user_name|user_email_id|created_ts|created_year|
|-|---------|------|------|------|
|8|CYBIL LISSIMORE|clissimore7@pinterest.com|2019-12-09 14:08:30|2019.0|
|25|PHYLYS ASLIE|paslieo@qq.com|2019-10-01 01:34:28|2019.0|
|12|UNA GLAYZER|uglayzerb@pinterest.com|2019-09-17 03:24:21|2019.0|
|17|YANKEE JELF|yjelfg@wufoo.com|2019-09-16 16:09:12|2019.0|
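
As a hint, deriving **user_name** might look like the following sketch. The literal names stand in for user_first_name and user_last_name; this is one possible approach, not the only one.

```sql
-- Literal names are placeholders for user_first_name and user_last_name.
SELECT upper(concat('Una', ' ', 'Glayzer')) AS user_name;  -- UNA GLAYZER
```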


### Exercise 4

Get the number of users by gender.

* Use **users** table for this exercise.
* Output should contain user_gender and user_count.
* For males the output should display **Male** and for females the output should display **Female**.
* If gender is not specified, then it should display **Not Specified**.
* Data should be sorted in descending order by user_count.

|user_gender|user_count|
|----|--|
|Female|13|
|Male|10|
|Not Specified|2|
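
As a hint, a `CASE` expression can map coded values to labels. The sketch below runs against a hypothetical inline list of values (`t` and `g` are made-up names) rather than the **users** table, so the grouping and counting are left to you.

```sql
-- t(g) is a hypothetical inline table used only to demonstrate CASE.
SELECT g AS raw_value,
       CASE
           WHEN g = 'M' THEN 'Male'
           WHEN g = 'F' THEN 'Female'
           ELSE 'Not Specified'   -- also covers NULL, since neither comparison is true
       END AS label
FROM (VALUES ('M'), ('F'), (NULL)) AS t(g);
```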


### Exercise 5

Get last 4 digits of unique ids.

* Use **users** table for this exercise.
* Output should contain user_id, user_unique_id and user_unique_id_last4
* Unique ids are either null or not null.
* Unique ids contain numbers and hyphens and are of different length.
* We need to get the last 4 digits, discarding hyphens, only when the number of digits is at least 9.
* If the unique id is null, then you should display **Not Specified**.
* After discarding hyphens, if the unique id has fewer than 9 digits, then you should display **Invalid Unique Id**.
* Data should be sorted by user_id. You might see **None** or **null** in **user_unique_id** for those users who have no unique id.

|user_id|user_unique_id|user_unique_id_last4|
|-|----|----|
|1|88833-8759|8759|
|2|262501-029|1029|
|3|391-33-2823|2823|
|4|1195413-80|1380|
|5|471-24-6869|6869|
|6|192374-933|4933|
|7|749-27-47-52|4752|
|8|461-75-4198|4198|
|9|892-36-676-2|6762|
|10|197-54-1646|1646|
|11|232-55-52-58|5258|
|12|898-84-336-6|3366|
|13|247-95-68-44|6844|
|14|415-48-894-3|8943|
|15|403-39-5-869|5869|
|16|399-83-05-03|0503|
|17|607-99-0411|0411|
|18|430-01-578-5|5785|
|19|571-09-6181|6181|
|20|478-32-02-87|0287|
|21||Not Specified|
|22||Not Specified|
|23|830-40-5287|5287|
|24|778-0845|Invalid Unique Id|
|25|368-44-4478|4478|
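
As a hint, the string functions involved can be tried on a single literal first. The value below is copied from the sample data and the aliases are hypothetical; handling nulls and short ids is left to you.

```sql
-- Literal unique id used for illustration only.
SELECT replace('892-36-676-2', '-', '') AS digits_only,          -- 892366762
       length(replace('892-36-676-2', '-', '')) AS digit_count,  -- 9
       right(replace('892-36-676-2', '-', ''), 4) AS last4;      -- 6762
```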

### Exercise 6

Get the count of users based on country code.

* Use **users** table for this exercise.
* Output should contain country_code and user_count.
* There should be no `+` in the country code. It should only contain digits.
* Data should be sorted as numbers by country code.
* We should discard user_phone_no with null values.
* Here is the desired output:

|country_code|user_count|
|-|-|
|1|1|
|7|2|
|48|1|
|54|1|
|55|1|
|62|3|
|63|1|
|81|1|
|84|1|
|86|4|
|229|1|
|249|1|
|351|1|
|370|1|
|380|1|
|420|1|
|598|1|
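
As a hint, the country code can be isolated from a phone number with `split_part` and `replace`, and then cast to an integer so that it sorts as a number. The literal phone number is taken from the sample data and the aliases are hypothetical.

```sql
-- Literal phone number used for illustration only.
SELECT split_part('+86 (184) 759-3933', ' ', 1) AS with_plus,   -- +86
       replace(split_part('+86 (184) 759-3933', ' ', 1), '+', '')::int AS country_code;  -- 86
```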

### Exercise 7

Let us validate if we have invalid **order_item_subtotal** as part of **order_items** table.

* **order_items** table has 6 fields.
  * order_item_id
  * order_item_order_id
  * order_item_product_id
  * order_item_quantity
  * order_item_subtotal
  * order_item_product_price
* **order_item_subtotal** is the product of **order_item_quantity** and **order_item_product_price**. In other words, it is computed by multiplying order_item_quantity by order_item_product_price for each item.
* You need to get the count of order_items where **order_item_subtotal** is not equal to the product of **order_item_quantity** and **order_item_product_price**.
* There can be differences caused by rounding. Make sure they are handled using an appropriate function.
* Output should be 0 as there are no such records.

|count|
|-|
|0|
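
As a hint, `round` with a precision argument works on `numeric` values in Postgres, so floating-point inputs need a cast first. The literals below are placeholders, and whether your columns need the cast depends on how the table was defined, which is an assumption here.

```sql
-- Literal quantity and price are placeholders for the order_items columns.
-- The cast to numeric is only required when the input is a floating-point type.
SELECT round((2 * 199.99)::numeric, 2) AS rounded_subtotal;  -- 399.98
```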

### Exercise 8

Get number of orders placed on weekdays and weekends in the month of January 2014.

* **orders** table has 4 fields.
  * order_id
  * order_date
  * order_customer_id
  * order_status
* Use order date to determine the day on which orders are placed.
* Output should contain 2 columns - day_type and order_count.
* **day_type** should have 2 values **Week days** and **Weekend days**.
* Here is the desired output.

|day_type|order_count|
|-|-|
|Weekend days|1505|
|Week days|4403|
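
As a hint, `extract(dow FROM ...)` returns the day of the week as a number, with 0 for Sunday and 6 for Saturday. The sketch below uses a hypothetical inline list of dates (`t` and `d` are made-up names) instead of the **orders** table.

```sql
-- t(d) is a hypothetical inline table of date strings used only for illustration.
SELECT d::date AS order_date,
       extract(dow FROM d::date) AS dow
FROM (VALUES ('2014-01-04'), ('2014-01-05'), ('2014-01-06')) AS t(d);
-- 2014-01-04 (Saturday) -> 6, 2014-01-05 (Sunday) -> 0, 2014-01-06 (Monday) -> 1
```

A `CASE` expression on that number is one way to derive **day_type** before grouping.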