# Partitioning Tables and Indexes

In this section we will primarily talk about partitioning tables and managing indexes on partitioned tables.

* Overview of Partitioning
* List Partitioning
* Adding Partitions 
* Inserting Data
* Dropping Partitions
* Managing Indexes
* Range Partitioning
* Hash Partitioning
* Usage Scenarios

## Overview of Partitioning

Most modern database technologies support a wide variety of partitioning strategies. Here are the most commonly used ones, along with a few points to keep in mind.
* List Partitioning
* Range Partitioning
* Hash Partitioning
* List and Range Partitioning are more widely used than Hash Partitioning.
* We can also mix and match these strategies to get multi-level partitioning. This is known as sub-partitioning.
* A partitioned table can have a primary key only when every partition column is a prime attribute (one of the primary key columns).
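
The primary key rule can be illustrated with a minimal sketch. The table name `users_pk_demo` is hypothetical and used only for this illustration, and PostgreSQL 11 or later is assumed, since earlier versions do not support primary keys on partitioned tables. The statements are meant to be run one at a time, because the first one is expected to fail.

```sql
-- Hypothetical table, not part of the schema used in this section.
-- This statement is expected to be rejected: the primary key (user_id)
-- does not contain the partition column (user_role).
CREATE TABLE users_pk_demo (
    user_id SERIAL PRIMARY KEY,
    user_role VARCHAR(1) NOT NULL DEFAULT 'U'
) PARTITION BY LIST (user_role);

-- This version should be accepted: user_role is part of the primary key.
CREATE TABLE users_pk_demo (
    user_id SERIAL,
    user_role VARCHAR(1) NOT NULL DEFAULT 'U',
    PRIMARY KEY (user_role, user_id)
) PARTITION BY LIST (user_role);

-- Clean up the demo table.
DROP TABLE users_pk_demo;
```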

## List Partitioning

Let us understand how we can set up list partitioning for tables.
* It is primarily used to create partitions based on specific values of the partition column.



In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db

In [None]:
%sql DROP TABLE IF EXISTS users

In [None]:
%%sql

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    created_dt DATE DEFAULT CURRENT_DATE,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)

In [None]:
%sql DROP TABLE IF EXISTS users_part

In [None]:
%%sql

CREATE TABLE users_part (
    user_id SERIAL,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    created_dt DATE DEFAULT CURRENT_DATE,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_role, user_id)
) PARTITION BY LIST(user_role)

In [111]:
%sql DROP TABLE IF EXISTS users_mthly_part

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
Done.


[]

In [112]:
%%sql

CREATE TABLE users_mthly_part (
    user_id SERIAL,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    created_dt DATE DEFAULT CURRENT_DATE,
    created_mnth INT,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) PARTITION BY LIST(created_mnth)

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
Done.


[]

In [113]:
%sql ALTER TABLE users_mthly_part ADD PRIMARY KEY (created_mnth, user_id)

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
Done.


[]

In [114]:
%%sql

SELECT table_name, column_name, ordinal_position, is_nullable 
FROM information_schema.columns
WHERE table_name = 'users_mthly_part'

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
11 rows affected.


table_name,column_name,ordinal_position,is_nullable
users_mthly_part,user_id,1,NO
users_mthly_part,user_first_name,2,NO
users_mthly_part,user_last_name,3,NO
users_mthly_part,user_email_id,4,NO
users_mthly_part,user_email_validated,5,YES
users_mthly_part,user_password,6,YES
users_mthly_part,user_role,7,NO
users_mthly_part,is_active,8,YES
users_mthly_part,created_dt,9,YES
users_mthly_part,created_mnth,10,NO
users_mthly_part,last_updated_ts,11,YES


In [115]:
%%sql

INSERT INTO users_mthly_part (user_first_name, user_last_name, user_email_id)
VALUES 
    ('Scott', 'Tiger', 'scott@tiger.com'),
    ('Donald', 'Duck', 'donald@duck.com'),
    ('Mickey', 'Mouse', 'mickey@mouse.com')

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db


IntegrityError: (psycopg2.errors.CheckViolation) no partition of relation "users_mthly_part" found for row
DETAIL:  Partition key of the failing row contains (created_mnth) = (null).

[SQL: INSERT INTO users_mthly_part (user_first_name, user_last_name, user_email_id)
VALUES 
    ('Scott', 'Tiger', 'scott@tiger.com'),
    ('Donald', 'Duck', 'donald@duck.com'),
    ('Mickey', 'Mouse', 'mickey@mouse.com')]
(Background on this error at: http://sqlalche.me/e/13/gkpj)

In [116]:
%%sql

INSERT INTO users_mthly_part 
    (user_first_name, user_last_name, user_email_id, created_dt, created_mnth)
VALUES 
    ('Scott', 'Tiger', 'scott@tiger.com', '2018-10-01', 201801),
    ('Donald', 'Duck', 'donald@duck.com', '2019-02-10', 201902),
    ('Mickey', 'Mouse', 'mickey@mouse.com', '2017-06-22', 201706)

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db


IntegrityError: (psycopg2.errors.CheckViolation) no partition of relation "users_mthly_part" found for row
DETAIL:  Partition key of the failing row contains (created_mnth) = (201801).

[SQL: INSERT INTO users_mthly_part (user_first_name, user_last_name, user_email_id, created_dt, created_mnth)
VALUES 
    ('Scott', 'Tiger', 'scott@tiger.com', '2018-10-01', 201801),
    ('Donald', 'Duck', 'donald@duck.com', '2019-02-10', 201902),
    ('Mickey', 'Mouse', 'mickey@mouse.com', '2017-06-22', 201706)]
(Background on this error at: http://sqlalche.me/e/13/gkpj)

## Adding Partitions

Let us see how we can add partitions to an existing table that was created using the list partitioning strategy.
* We can create a partition using the `CREATE TABLE <partition_name> PARTITION OF <table_name>` syntax.
* We can have a default partition, so that all rows that do not match the values of any other partition are stored in it.
* We can have a partition for each value or for a set of values.
* Once partitions are added, we can insert data into the table.


In [117]:
%%sql

CREATE TABLE users_mthly_part_default
PARTITION OF users_mthly_part DEFAULT

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
Done.


[]

In [118]:
%%sql

CREATE TABLE users_mthly_part_201801 
PARTITION OF users_mthly_part  
FOR VALUES IN (201801)

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
Done.


[]

In [119]:
%%sql

CREATE TABLE users_mthly_part_201902
PARTITION OF users_mthly_part  
FOR VALUES IN (201902)

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
Done.


[]

In [120]:
%%sql

CREATE TABLE users_mthly_part_201706
PARTITION OF users_mthly_part  
FOR VALUES IN (201706)

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
Done.


[]
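
The same syntax applies to the `users_part` table created earlier, which is partitioned by `user_role`. Here is a minimal sketch; the partition names (`users_part_default`, `users_part_u`, `users_part_a`, `users_mthly_part_2018q2`) are assumptions chosen for this illustration. The last statement shows a single partition that holds a set of values.

```sql
-- Partitions for users_part, which is partitioned by LIST (user_role).
-- The partition names below are assumptions chosen for this sketch.
CREATE TABLE users_part_default
PARTITION OF users_part DEFAULT;

CREATE TABLE users_part_u
PARTITION OF users_part
FOR VALUES IN ('U');

CREATE TABLE users_part_a
PARTITION OF users_part
FOR VALUES IN ('A');

-- One partition can also hold a set of values, here three months.
CREATE TABLE users_mthly_part_2018q2
PARTITION OF users_mthly_part
FOR VALUES IN (201804, 201805, 201806);
```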

## Inserting Data

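Let us understand how we can insert data into a partitioned table.
* Every row needs a partition to go to, otherwise the insert fails with a `no partition of relation ... found for row` error. Adding a default partition ensures that every row has one.
* On top of the default partition, make sure to add the other desired partitions.
* We can insert into the partitioned table or directly into a partition. When inserting directly into a partition, the row must match the values defined for that partition.
* Rows are routed using only the value of the partition column. PostgreSQL does not check `created_mnth` against `created_dt`, so keeping the two consistent is up to the statement that inserts the row.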

In [121]:
%%sql

INSERT INTO users_mthly_part 
    (user_first_name, user_last_name, user_email_id, created_dt, created_mnth)
VALUES 
    ('Scott', 'Tiger', 'scott@tiger.com', '2018-10-01', 201801),
    ('Donald', 'Duck', 'donald@duck.com', '2019-02-10', 201902)

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
2 rows affected.


[]

In [122]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt, created_mnth
FROM users_mthly_part

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
2 rows affected.


user_first_name,user_last_name,user_email_id,created_dt,created_mnth
Scott,Tiger,scott@tiger.com,2018-10-01,201801
Donald,Duck,donald@duck.com,2019-02-10,201902


In [123]:
%%sql

INSERT INTO users_mthly_part_201706
    (user_first_name, user_last_name, user_email_id, created_dt, created_mnth)
VALUES 
    ('Mickey', 'Mouse', 'mickey@mouse.com', '2017-06-22', 201706)

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
1 rows affected.


[]

In [124]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt, created_mnth
FROM users_mthly_part

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
3 rows affected.


user_first_name,user_last_name,user_email_id,created_dt,created_mnth
Mickey,Mouse,mickey@mouse.com,2017-06-22,201706
Scott,Tiger,scott@tiger.com,2018-10-01,201801
Donald,Duck,donald@duck.com,2019-02-10,201902


In [125]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt, created_mnth
FROM users_mthly_part_201706

 * postgresql://itversity_sms_user:***@localhost:5432/itversity_sms_db
1 rows affected.


user_first_name,user_last_name,user_email_id,created_dt,created_mnth
Mickey,Mouse,mickey@mouse.com,2017-06-22,201706
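
To see the default partition at work, here is a minimal sketch that inserts a row whose `created_mnth` has no dedicated partition. The sample row is hypothetical, and the sketch assumes the `users_mthly_part_default` partition created earlier is still in place.

```sql
-- Hypothetical sample row: 202001 has no dedicated partition.
INSERT INTO users_mthly_part
    (user_first_name, user_last_name, user_email_id, created_dt, created_mnth)
VALUES
    ('Minnie', 'Mouse', 'minnie@mouse.com', '2020-01-15', 202001);

-- The row should be visible in the default partition.
SELECT user_first_name, user_last_name, user_email_id, created_dt, created_mnth
FROM users_mthly_part_default;
```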


## Dropping Partitions

Let us understand how we can drop partitions from a partitioned table.
* We drop a partition the same way we drop a table.
* Here is the syntax: `DROP TABLE <table_for_partition>`
* We can also detach a partition instead of dropping it. This is useful when we want to take a backup of the partition before dropping it.

In [None]:
%%sql

DROP TABLE users_mthly_part_201706

In [None]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt, created_mnth
FROM users_mthly_part
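
Detaching can be sketched as follows, using the `users_mthly_part_201902` partition created earlier. After detaching, the partition becomes a regular table that can be backed up, dropped, or attached again. The final `ATTACH PARTITION` step is optional and shown only for completeness.

```sql
-- Detach the partition; it becomes a regular standalone table
-- and its rows no longer appear in users_mthly_part.
ALTER TABLE users_mthly_part
DETACH PARTITION users_mthly_part_201902;

-- The detached table can still be queried (or backed up) on its own.
SELECT user_first_name, user_last_name, user_email_id, created_dt, created_mnth
FROM users_mthly_part_201902;

-- Optionally attach it again with the same list of values.
ALTER TABLE users_mthly_part
ATTACH PARTITION users_mthly_part_201902
FOR VALUES IN (201902);
```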

## Managing Indexes

In [None]:
%%sql

SELECT *
FROM pg_catalog.pg_indexes
WHERE tablename = 'users_mthly_part'

In [None]:
%%sql

CREATE INDEX users_mthly_part_idx1
ON users_mthly_part(created_dt)

In [None]:
%%sql

SELECT *
FROM pg_catalog.pg_indexes
WHERE tablename = 'users_mthly_part'
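
On PostgreSQL 11 or later, an index created on the partitioned table is also created on each of its existing partitions. A minimal sketch to check this, assuming all partition names start with `users_mthly_part` as in the earlier cells:

```sql
-- List indexes on the partitioned table and on all of its partitions.
-- Note that "_" in a LIKE pattern matches any single character,
-- which is harmless for this naming scheme.
SELECT tablename, indexname, indexdef
FROM pg_catalog.pg_indexes
WHERE tablename LIKE 'users_mthly_part%'
ORDER BY tablename, indexname;
```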

## Range Partitioning

In [None]:
%sql DROP TABLE IF EXISTS users_range_part

In [None]:
%%sql

CREATE TABLE users_range_part (
    user_id SERIAL,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    created_dt DATE DEFAULT CURRENT_DATE,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (created_dt, user_id)
) PARTITION BY RANGE(created_dt)

In [None]:
%%sql

CREATE TABLE users_range_part_default
PARTITION OF users_range_part DEFAULT

In [None]:
%%sql

CREATE TABLE users_range_part_2017
PARTITION OF users_range_part
FOR VALUES FROM ('2017-01-01') TO ('2018-01-01')

In [None]:
%%sql

CREATE TABLE users_range_part_2018
PARTITION OF users_range_part
FOR VALUES FROM ('2018-01-01') TO ('2019-01-01')

In [None]:
%%sql

CREATE TABLE users_range_part_2019
PARTITION OF users_range_part
FOR VALUES FROM ('2019-01-01') TO ('2020-01-01')

In [None]:
%%sql

CREATE TABLE users_range_part_2020
PARTITION OF users_range_part
FOR VALUES FROM ('2020-01-01') TO ('2021-01-01')

In [None]:
%%sql

INSERT INTO users_range_part 
    (user_first_name, user_last_name, user_email_id, created_dt)
VALUES 
    ('Scott', 'Tiger', 'scott@tiger.com', '2018-10-01'),
    ('Donald', 'Duck', 'donald@duck.com', '2019-02-10'),
    ('Mickey', 'Mouse', 'mickey@mouse.com', '2017-06-22')

In [None]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt
FROM users_range_part_default

In [None]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt
FROM users_range_part_2017

In [None]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt
FROM users_range_part_2018

In [None]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt
FROM users_range_part_2019

In [None]:
%%sql

SELECT user_first_name, user_last_name, user_email_id, created_dt
FROM users_range_part_2020
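
In PostgreSQL, the `FROM` bound of a range partition is inclusive and the `TO` bound is exclusive, so a partition meant to cover the whole of 2017 needs `TO ('2018-01-01')`. A minimal sketch to check where a boundary date lands; the sample row is hypothetical, and `tableoid::regclass` reports the partition that actually stores each row.

```sql
-- Hypothetical sample row dated on the last day of 2017.
INSERT INTO users_range_part
    (user_first_name, user_last_name, user_email_id, created_dt)
VALUES
    ('Boundary', 'Check', 'boundary@example.com', '2017-12-31');

-- Show which partition stores the boundary row.
SELECT tableoid::regclass AS partition_name, user_first_name, created_dt
FROM users_range_part
WHERE created_dt = '2017-12-31';
```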

## Hash Partitioning
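
As a minimal sketch of hash partitioning, rows can be spread across a fixed number of partitions using `MODULUS` and `REMAINDER`. The table `users_hash_part` and its partition names are hypothetical, and PostgreSQL 11 or later is assumed. A hash partitioned table cannot have a default partition, so every remainder needs its own partition.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE users_hash_part (
    user_id SERIAL,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    PRIMARY KEY (user_id)
) PARTITION BY HASH (user_id);

-- Four partitions: a row goes to the partition whose REMAINDER matches
-- the hash of user_id modulo MODULUS.
CREATE TABLE users_hash_part_0 PARTITION OF users_hash_part
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE users_hash_part_1 PARTITION OF users_hash_part
FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE users_hash_part_2 PARTITION OF users_hash_part
FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE users_hash_part_3 PARTITION OF users_hash_part
FOR VALUES WITH (MODULUS 4, REMAINDER 3);

INSERT INTO users_hash_part (user_first_name, user_last_name, user_email_id)
VALUES
    ('Scott', 'Tiger', 'scott@tiger.com'),
    ('Donald', 'Duck', 'donald@duck.com'),
    ('Mickey', 'Mouse', 'mickey@mouse.com');

-- Show which partition each row was routed to.
SELECT tableoid::regclass AS partition_name, user_id, user_first_name
FROM users_hash_part;
```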

## Usage Scenarios

Let us go through some of the usage scenarios with respect to partitioning.
* It is typically used to manage large tables so that no single table grows abnormally large over a period of time.
* Partitioning is quite often used on top of log tables, reporting tables, etc.
* If a log table is partitioned and we want to retain data for 7 years, partitions older than 7 years can be quickly dropped.
* Dropping partitions to clean up a huge chunk of data is much faster than running a delete command on a non-partitioned table.
* For tables like orders with a limited set of statuses, we often use list partitioning based on the status. It can be 2 partitions (CLOSED orders and ACTIVE orders) or a separate partition for each status.
  * As most of the operations will be on **Active Orders**, this approach can significantly improve the performance.
* In case of log tables, where we might want to retain data for several years, we tend to use range partitioning on a date column. If we use list partitioning, we need an additional column derived from the date (such as `created_mnth` in the earlier example), which duplicates data unnecessarily.
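
The orders scenario can be sketched as follows. The table `orders_part`, its columns, and the status values are assumptions made for this illustration and are not part of the schema used earlier.

```sql
-- Hypothetical orders table; names and statuses are assumptions.
CREATE TABLE orders_part (
    order_id SERIAL,
    order_date DATE NOT NULL,
    order_customer_id INT NOT NULL,
    order_status VARCHAR(30) NOT NULL,
    PRIMARY KEY (order_status, order_id)
) PARTITION BY LIST (order_status);

CREATE TABLE orders_part_closed
PARTITION OF orders_part
FOR VALUES IN ('CLOSED');

-- Every other status is treated as active and lands here.
CREATE TABLE orders_part_active
PARTITION OF orders_part DEFAULT;

-- With a filter on the partition column, the plan should only
-- scan orders_part_active and skip orders_part_closed.
EXPLAIN
SELECT * FROM orders_part WHERE order_status = 'PENDING';
```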