# Overview of Partitioning
Most modern database technologies support a wide variety of partitioning strategies. The most commonly used ones are listed below.

- List Partitioning
- Range Partitioning
- Hash Partitioning
- List and Range are more widely used than Hash Partitioning.
- We can also mix and match these strategies to get multi-level partitioning, known as sub-partitioning.
- We can partition a table that has no primary key, or a table with a primary key as long as the partition column is a prime attribute (one of the primary key columns).
- Indexes can be added to a partitioned table. An index created on the main (partitioned) table covers all of its partitions; in recent PostgreSQL versions, which this notebook uses, such an index is created on every partition automatically rather than as a single global index. An index can also be created on an individual partition.
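
To make the three strategies concrete, here is a small Python sketch of how a partition key value could be routed to a partition under each strategy. It is a toy model, not PostgreSQL's implementation: the partition names, the boundary dates, and the use of a plain modulo in place of the database's internal hash function are all assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical partition layouts, for illustration only.
LIST_PARTITIONS = {"U": "users_part_u", "A": "users_part_a"}
RANGE_BOUNDS = [date(2016, 1, 1), date(2017, 1, 1), date(2018, 1, 1)]  # sorted bounds
RANGE_NAMES = ["users_range_part_2016", "users_range_part_2017"]       # one per interval
HASH_MODULUS = 4


def route_list(value, default="users_part_default"):
    """List partitioning: exact match on the key value, else the default partition."""
    return LIST_PARTITIONS.get(value, default)


def route_range(value, default="users_range_part_default"):
    """Range partitioning: find the half-open interval [lower, upper) holding the value."""
    idx = bisect_right(RANGE_BOUNDS, value) - 1
    if 0 <= idx < len(RANGE_NAMES):
        return RANGE_NAMES[idx]
    return default


def route_hash(key):
    """Hash partitioning: pick the partition by remainder after division by the modulus.

    A real database hashes the key first; plain modulo is a stand-in here.
    """
    return f"users_hash_part_{key % HASH_MODULUS}"


print(route_list("A"))
print(route_range(date(2016, 12, 31)))
print(route_hash(10))
```

In this model, list and range routing fall back to a default partition when no partition matches, while hash routing always finds a partition because every remainder is covered.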

# Load SQL 

In [48]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


# Connect to the DB

In [63]:
%env DATABASE_URL = postgresql://shubham_sms_user:shubham@172.25.87.65:5432/shubham_sms_db

env: DATABASE_URL=postgresql://shubham_sms_user:shubham@172.25.87.65:5432/shubham_sms_db


# List Partitioning
Let us understand how to set up list partitioning for a table.

- It is primarily used to create partitions based on discrete values of a column.
- Here are the steps involved in creating a table using the list partitioning strategy.
    - Create the table using PARTITION BY LIST
    - Add default and value-specific partitions
    - Validate by inserting data into the table
- We can detach as well as drop partitions from the table.

## Create Partitioned Table
Let us create a partitioned table named **users_part**.

- It contains the same columns as <font color = 'pink'>users</font>.

- We will partition it based on the <font color = 'pink'>user_role</font> field.

In [6]:
%%sql 

Drop table if exists shubham.users

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [7]:
%%sql

   Create Table shubham.users(
        user_id SERIAL PRIMARY KEY,
        user_first_name VARCHAR(30) NOT NULL,
        user_last_name VARCHAR(30) NOT NULL,
        user_email_id VARCHAR(50) NOT NULL,
        user_email_validated BOOLEAN DEFAULT FALSE,
        user_password VARCHAR(200),
        user_role VARCHAR(1) NOT NULL DEFAULT 'U', -- U and A
        is_active BOOLEAN DEFAULT FALSE,
        created_dt DATE DEFAULT CURRENT_DATE,
        last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [8]:
%%sql 

drop table if exists shubham.users_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [10]:
%%sql

   Create Table shubham.users_part(
        user_id SERIAL ,
        user_first_name VARCHAR(30) NOT NULL,
        user_last_name VARCHAR(30) NOT NULL,
        user_email_id VARCHAR(50) NOT NULL,
        user_email_validated BOOLEAN DEFAULT FALSE,
        user_password VARCHAR(200),
        user_role VARCHAR(1) NOT NULL DEFAULT 'U', -- U and A
        is_active BOOLEAN DEFAULT FALSE,
        created_dt DATE DEFAULT CURRENT_DATE,
        last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        primary key (user_role, user_id)
    ) PARTITION BY LIST (user_role);

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

- Create an additional index on the <font color = 'pink'>user_email_id</font> column of the <font color = 'pink'>users_part</font> table.

In [13]:
%%sql 

create index users_part_email_id_idx
    on shubham.users_part(user_email_id)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

- If we try to insert data into the partitioned table at this point (before adding any partitions), the insert fails with an error. We need to add a partition first; only then can data be inserted.

In [16]:
%%sql 

insert into shubham.users_part(user_first_name, user_last_name,user_email_id)
    values
        ('Scott','Tiger','scott@tiger.com'),
        ('Donald', 'Duck','donald@duck.com'),
        ('Mickey','Mouse','mickey@mouse.com')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db


IntegrityError: (psycopg2.errors.CheckViolation) no partition of relation "users_part" found for row
DETAIL:  Partition key of the failing row contains (user_role) = (U).

[SQL: insert into shubham.users_part(user_first_name, user_last_name,user_email_id)
    values
        ('Scott','Tiger','scott@tiger.com'),
        ('Donald', 'Duck','donald@duck.com'),
        ('Mickey','Mouse','mickey@mouse.com')]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

## Managing Partitions - List

Let us understand how to manage partitions for a partitioned table using <font color = 'pink'>users_part</font>.

- All user data with <font color = 'pink'>user_role</font> **‘U’** should go to a partition named <font color = 'pink'>users_part_u</font>.
- All user data with <font color = 'pink'>user_role</font> **‘A’** should go to a partition named <font color= 'pink'>users_part_a</font>.
- We can add a partition to an existing partitioned table using <font color='pink'>CREATE TABLE partition_name PARTITION OF table_name</font>.
- We can have a default partition, so that any row that does not match another partition's condition is stored there.
- We can have a partition for each value or for a set of values.
    - We can have one partition for both <font color = 'pink'>U</font> and <font color ='pink'>A</font>, plus a default partition for all other values.
    - We can have individual partitions for <font color = 'pink'>U</font> and <font color = 'pink'>A</font>, plus a default partition for all other values.
    - We can use <font color = 'pink'>FOR VALUES IN (val1, val2)</font> as part of <font color= 'pink'>CREATE TABLE partition_name PARTITION OF table_name</font> to specify the values that belong to the partition being created.
- Once partitions are added, we can insert data into the partitioned table.
- We can detach a partition using <font color ='pink'>ALTER TABLE</font> and then drop it, or drop the partition directly. To drop a partition, we use the <font color = 'pink'>DROP TABLE</font> command.

- We will first create the partitions (which also attaches them to the table) and then insert the data.
- We also need to create a default partition, because any row that does not belong to one of the other partitions is stored in the default partition.
- To create the default partition, we use the keyword <font color = 'pink'>DEFAULT</font> at the end of the statement.
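
The partition statements described above all follow the same pattern, so they can be generated. The Python sketch below builds the DDL strings for a default partition and for value-specific list partitions; the helper name `list_partition_ddl` is hypothetical, and the sketch only produces text, leaving execution (for example through a `%%sql` cell or a database cursor) to the reader.

```python
def list_partition_ddl(parent, partition, values=None):
    """Return the CREATE TABLE ... PARTITION OF statement for a list partition.

    With values=None the statement creates the DEFAULT partition.
    """
    if values is None:
        return f"CREATE TABLE {partition} PARTITION OF {parent} DEFAULT"
    # Quote each literal; doubling single quotes keeps the literal valid SQL.
    quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in values)
    return f"CREATE TABLE {partition} PARTITION OF {parent} FOR VALUES IN ({quoted})"


statements = [
    list_partition_ddl("shubham.users_part", "shubham.users_part_default"),
    list_partition_ddl("shubham.users_part", "shubham.users_part_a", ["A"]),
    list_partition_ddl("shubham.users_part", "shubham.users_part_u", ["U"]),
]
for statement in statements:
    print(statement)
```

Creating every value-specific partition before any data is loaded avoids having to move rows out of the default partition later.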

In [22]:
%%sql 

create table shubham.users_part_default -- name of the partition table
    partition of shubham.users_part default

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

- We have now created a partition, and it is attached to the main table.
- We can insert data, and each row is stored in the partition that matches its partition column value.
- We can insert data directly into a specific partition, but it is recommended to insert into the main partitioned table, which routes each row to the correct partition automatically.

In [23]:
%%sql 

insert into shubham.users_part(user_first_name, user_last_name,user_email_id,user_role)
    values
        ('Scott','Tiger','scott@tiger.com','U'),
        ('Donald', 'Duck','donald@duck.com','U'),
        ('Mickey','Mouse','mickey@mouse.com','U')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


[]

- Check the data in the main table and in each partition.
- So far we have created only the default partition, so all the data is in the default partition.

In [24]:
%%sql
 
    select * from shubham.users_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
2,Scott,Tiger,scott@tiger.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259
3,Donald,Duck,donald@duck.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259
4,Mickey,Mouse,mickey@mouse.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259


In [50]:
%%sql 

    select * from shubham.users_part_default

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
3,Donald,Duck,donald@duck.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259
4,Mickey,Mouse,mickey@mouse.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259
5,Scott,Tiger,scott@tiger.com,False,,U,False,2023-02-16,2023-02-16 06:59:03.902444


- Now we will create the remaining partitions and insert data into them.

In [53]:
%%sql 

    create table shubham.users_part_A
        partition of shubham.users_part
            for values IN ('A') -- FOR VALUES IN lists the key values that belong to this partition

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [56]:
%%sql 

    update shubham.users_part
        set user_role = 'A'
        where user_email_id = 'scott@tiger.com'
        

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


[]

In [57]:
%%sql

    select * from shubham.users_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
5,Scott,Tiger,scott@tiger.com,False,,A,False,2023-02-16,2023-02-16 06:59:03.902444
3,Donald,Duck,donald@duck.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259
4,Mickey,Mouse,mickey@mouse.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259


In [58]:
%%sql 

select * from shubham.users_part_default;


 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
2 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
3,Donald,Duck,donald@duck.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259
4,Mickey,Mouse,mickey@mouse.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259


In [59]:
%%sql 

select * from shubham.users_part_A

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
5,Scott,Tiger,scott@tiger.com,False,,A,False,2023-02-16,2023-02-16 06:59:03.902444


- If we now try to create a partition for user role <font color = 'pink'>U</font>, the statement fails, because the default partition already contains rows with user role <font color = 'pink'>U</font> and those rows would belong to the new partition.


In [62]:
%%sql

create table shubham.users_part_U
    partition of shubham.users_part
     for values IN('U')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db


IntegrityError: (psycopg2.errors.CheckViolation) updated partition constraint for default partition "users_part_default" would be violated by some row

[SQL: create table shubham.users_part_U partition of shubham.users_part
     for values IN('U')]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

- To resolve this error, follow the steps below.
    1. Detach the **DEFAULT** partition from the main table.
    2. Create the new partition.
    3. Insert the data from the detached default table back into the main table.
    4. Drop the detached default table.
    5. Re-create the default partition.
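
The five steps can be written down as an ordered list of statements. The Python sketch below only generates that list; the function name `split_default_partition_sql` is hypothetical, and running the statements inside a single transaction is an assumption about how one might apply them safely rather than something the notebook does.

```python
def split_default_partition_sql(parent, default_part, new_part, values):
    """Return, in order, the statements that carve a new list partition out of
    rows currently held by the default partition."""
    quoted = ", ".join(f"'{v}'" for v in values)
    return [
        # 1. Detach the default partition so it becomes a standalone table.
        f"ALTER TABLE {parent} DETACH PARTITION {default_part}",
        # 2. Create the new value-specific partition.
        f"CREATE TABLE {new_part} PARTITION OF {parent} FOR VALUES IN ({quoted})",
        # 3. Reload the rows through the parent so they are routed again.
        #    While the default is detached, rows with no matching partition
        #    would be rejected at this step.
        f"INSERT INTO {parent} SELECT * FROM {default_part}",
        # 4. Drop the detached table once the rows are reloaded.
        f"DROP TABLE {default_part}",
        # 5. Re-create an empty default partition.
        f"CREATE TABLE {default_part} PARTITION OF {parent} DEFAULT",
    ]


steps = split_default_partition_sql(
    "shubham.users_part", "shubham.users_part_default", "shubham.users_part_u", ["U"]
)
for step in steps:
    print(step)
```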

In [64]:
%%sql 

Alter table shubham.users_part
    detach partition shubham.users_part_default

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [65]:
%%sql

create table shubham.users_part_U
    partition of shubham.users_part
     for values IN('U')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [66]:
%%sql 

insert into shubham.users_part
select * from shubham.users_part_default

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
2 rows affected.


[]

In [67]:
%%sql

select * from shubham.users_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
5,Scott,Tiger,scott@tiger.com,False,,A,False,2023-02-16,2023-02-16 06:59:03.902444
3,Donald,Duck,donald@duck.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259
4,Mickey,Mouse,mickey@mouse.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259


In [68]:
%%sql

select * from shubham.users_part_U

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
2 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
3,Donald,Duck,donald@duck.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259
4,Mickey,Mouse,mickey@mouse.com,False,,U,False,2023-02-16,2023-02-16 06:42:43.342259


In [70]:
%%sql 

drop table shubham.users_part_default

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [71]:
%%sql 

create table shubham.users_part_default
    partition of shubham.users_part default

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

## Manipulating the Data

- We can insert data using the partitioned table itself (the preferred way).
- Because each partition is defined as a table, we can also insert data using the table created for a specific partition.
- In the case of the users_part partitioned table, we can use either the table name users_part or the partition name users_part_u to insert records with user_role ‘U’.

``` SQL
CREATE TABLE users_part_u 
PARTITION OF users_part  
FOR VALUES IN ('U') 
```

- As part of an update, if we change the value of the partition column so that the row belongs to a different partition, the row is moved internally from one partition to the other.
- We can delete data using the partitioned table or the table created for each partition (either by using the table name users_part or partition names such as users_part_u, users_part_a, etc.).
- DML is the same irrespective of the partitioning strategy. This applies to all three partitioning strategies: list, range, and hash.
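
The row movement on update can be pictured with a toy in-memory model. The Python sketch below is only an analogy, assuming partitions behave like separate lists keyed by role; it says nothing about how PostgreSQL stores rows, and the names `partitions`, `insert`, and `update_role` are made up for this illustration.

```python
# Each partition is modelled as a plain list of row dictionaries.
partitions = {"U": [], "A": [], "default": []}


def partition_for(role):
    """Pick the partition for a role, falling back to the default partition."""
    return role if role in ("U", "A") else "default"


def insert(row):
    """Insert through the 'parent': the row is routed by its user_role."""
    partitions[partition_for(row["user_role"])].append(row)


def update_role(email, new_role):
    """Change user_role; the row is removed and re-routed, which moves it
    to another partition when the new role maps to a different one."""
    for rows in partitions.values():
        for row in list(rows):
            if row["user_email_id"] == email:
                rows.remove(row)
                row["user_role"] = new_role
                insert(row)
                return True
    return False


insert({"user_email_id": "scott@tiger.com", "user_role": "U"})
insert({"user_email_id": "donald@duck.com", "user_role": "U"})
update_role("donald@duck.com", "A")
print({name: [r["user_email_id"] for r in rows] for name, rows in partitions.items()})
```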

In [4]:
%%sql 

Truncate table shubham.users_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [6]:
%%sql 

insert into shubham.users_part (user_first_name, user_last_name,user_email_id,user_role)
values
        ('Scott', 'Tiger', 'scott@tiger.com', 'U'),
        ('Donald', 'Duck', 'donald@duck.com', 'U'),
        ('Mickey', 'Mouse', 'mickey@mouse.com', 'U')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


[]

In [7]:
%%sql 

select * from shubham.users_part_U

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
6,Scott,Tiger,scott@tiger.com,False,,U,False,2023-02-18,2023-02-18 15:43:53.949662
7,Donald,Duck,donald@duck.com,False,,U,False,2023-02-18,2023-02-18 15:43:53.949662
8,Mickey,Mouse,mickey@mouse.com,False,,U,False,2023-02-18,2023-02-18 15:43:53.949662


In [9]:
%%sql 

Insert into shubham.users_part_A (user_first_name, user_last_name, user_email_id,user_role)
values 
        ('Matt', 'Clarke', 'matt@clarke.com', 'A')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


[]

In [10]:
%%sql 

select * from shubham.users_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
4 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
9,Matt,Clarke,matt@clarke.com,False,,A,False,2023-02-18,2023-02-18 15:46:55.406824
6,Scott,Tiger,scott@tiger.com,False,,U,False,2023-02-18,2023-02-18 15:43:53.949662
7,Donald,Duck,donald@duck.com,False,,U,False,2023-02-18,2023-02-18 15:43:53.949662
8,Mickey,Mouse,mickey@mouse.com,False,,U,False,2023-02-18,2023-02-18 15:43:53.949662


In [11]:
%%sql

    update shubham.users_part 
        set user_role = 'A'
            where user_email_id = 'donald@duck.com'

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


[]

In [12]:
%%sql 

select * from shubham.users_part_A

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
2 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
9,Matt,Clarke,matt@clarke.com,False,,A,False,2023-02-18,2023-02-18 15:46:55.406824
7,Donald,Duck,donald@duck.com,False,,A,False,2023-02-18,2023-02-18 15:43:53.949662


# Range Partitioning

Let us understand how to set up range partitioning for a table.

- It is primarily used to create partitions based on a given range of values.
- Here are the steps involved in creating a table using the range partitioning strategy.
    - Create the table using <font color = 'pink'>PARTITION BY RANGE</font>
    - Add default and range-specific partitions
    - Validate by inserting data into the table
- We can detach as well as drop partitions from the table.

## Create Partitioned Table
Let us create a partitioned table named <font color = 'pink'>users_range_part</font>.

- It contains the same columns as <font color = 'pink'>users</font>.
- We will partition the table based on the <font color = 'pink'>created_dt</font> field.
- We will create one partition per year with the naming convention **users_range_part_yyyy** (for example, users_range_part_2016).

In [15]:
%%sql 

drop table if exists shubham.users_range_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

- In the CREATE TABLE statement below, created_dt has to be part of the primary key along with user_id, because the primary key of a partitioned table must include the partition column.

In [19]:
%%sql

CREATE TABLE shubham.users_range_part (
    user_id SERIAL,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    created_dt DATE DEFAULT CURRENT_DATE,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (created_dt, user_id)
) PARTITION BY RANGE(created_dt)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

- If we try to insert data into this table now, we will get an error, so we first need to add partitions to it.

## Managing Partitions - Range
Let us understand how to manage partitions for the table <font color = 'pink'> users_range_part</font>.

- All user data created in a specific year should go to the partition created for that year.
- For example, all user data created in 2016 should go to <font color = 'pink'>users_range_part_2016</font>.
- We can add a partition to an existing partitioned table using <font color = 'pink'>CREATE TABLE partition_name PARTITION OF table_name</font>.
- We can have a default partition, so that any row that does not fall into another partition's range is stored there.
- We can have a partition for a specific range of values using <font color = 'pink'>FOR VALUES FROM (from_value) TO (to_value)</font> as part of <font color = 'pink'>CREATE TABLE partition_name PARTITION OF table_name</font>. The lower bound is inclusive and the upper bound is exclusive.
- Once partitions are added, we can insert data into the partitioned table.
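
Because the upper bound of a PostgreSQL range partition is exclusive, a partition meant to hold a whole year should end at the first day of the following year. The Python sketch below generates such a statement; the helper name `yearly_partition_ddl` is hypothetical, and the sketch only builds the text of the statement.

```python
from datetime import date


def yearly_partition_ddl(parent, year):
    """Build the DDL for a partition covering one calendar year.

    The range is half-open: January 1 of `year` is included and
    January 1 of the next year is excluded.
    """
    lower = date(year, 1, 1)
    upper = date(year + 1, 1, 1)  # exclusive upper bound
    return (
        f"CREATE TABLE {parent}_{year} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{lower}') TO ('{upper}')"
    )


ddl_2016 = yearly_partition_ddl("shubham.users_range_part", 2016)
print(ddl_2016)
```

With bounds written this way, a row dated 2016-12-31 belongs to the 2016 partition, and consecutive yearly partitions share a boundary date without overlapping.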

In [24]:
%%sql 

create table shubham.users_range_part_default
partition of shubham.users_range_part default

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [25]:
%%sql 

create table shubham.users_range_part_2016
partition of shubham.users_range_part
for values from ('2016-01-01') to ('2017-01-01') -- the upper bound is exclusive

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

- If we now create a partition for 2017 whose range also covers dates from 2016, we get an error, because that range overlaps the existing 2016 partition. Partition ranges must not overlap.

In [30]:
%%sql 

create table shubham.users_range_part_2017
partition of shubham.users_range_part
for values from ('2016-01-01') TO ('2017-12-31')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
(psycopg2.errors.InvalidObjectDefinition) partition "users_range_part_2017" would overlap partition "users_range_part_2016"
LINE 2: for values from ('2016-01-01') TO ('2017-12-31')
                         ^

[SQL: create table shubham.users_range_part_2017 partition of shubham.users_range_part
for values from ('2016-01-01') TO ('2017-12-31')]
(Background on this error at: https://sqlalche.me/e/14/f405)


- The statement above failed because its range included 2016. Using a range that covers only 2017 works.
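
The overlap rule can be checked ahead of time. The Python sketch below tests whether two half-open date ranges share any value, which is the condition under which a new range partition would be rejected; the function name `ranges_overlap` and the example bounds are assumptions made for illustration.

```python
from datetime import date


def ranges_overlap(a_from, a_to, b_from, b_to):
    """True when half-open ranges [a_from, a_to) and [b_from, b_to) share a value."""
    return a_from < b_to and b_from < a_to


# Hypothetical existing partition for 2016.
existing = (date(2016, 1, 1), date(2017, 1, 1))

# A proposed range that starts in 2016 collides with it.
print(ranges_overlap(*existing, date(2016, 1, 1), date(2018, 1, 1)))

# A proposed range that starts exactly where the existing one ends does not.
print(ranges_overlap(*existing, date(2017, 1, 1), date(2018, 1, 1)))
```

Two ranges that merely touch at a boundary date do not overlap, because the upper bound of the earlier range is excluded from it.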

In [31]:
%%sql 

create table shubham.users_range_part_2017
partition of shubham.users_range_part
for values from ('2017-01-01') TO ('2018-01-01') -- the upper bound is exclusive

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

- We will create the partitions for 2018, 2019, and 2020 in the same way.

In [32]:
%%sql 

-- The upper bound of each range is exclusive, so each year ends at January 1 of the next year.
create table shubham.users_range_part_2018
partition of shubham.users_range_part
for values from ('2018-01-01') TO ('2019-01-01');


create table shubham.users_range_part_2019
partition of shubham.users_range_part
for values from ('2019-01-01') TO ('2020-01-01');



create table shubham.users_range_part_2020
partition of shubham.users_range_part
for values from ('2020-01-01') TO ('2021-01-01')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.
Done.
Done.


[]

## Inserting Data 

In [34]:
%%sql 

INSERT INTO shubham.users_range_part 
    (user_first_name, user_last_name, user_email_id, created_dt)
VALUES 
    ('Scott', 'Tiger', 'scott@tiger.com', '2018-10-01'),
    ('Donald', 'Duck', 'donald@duck.com', '2019-02-10'),
    ('Mickey', 'Mouse', 'mickey@mouse.com', '2017-06-22')


 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


[]

In [35]:
%%sql 

select * from shubham.users_range_part_default

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
0 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts


In [36]:
%%sql

select * from shubham.users_range_part_2017

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
3,Mickey,Mouse,mickey@mouse.com,False,,U,False,2017-06-22,2023-02-18 16:32:53.447217


In [37]:
%%sql

SELECT *
FROM shubham.users_range_part_2018

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
1,Scott,Tiger,scott@tiger.com,False,,U,False,2018-10-01,2023-02-18 16:32:53.447217


## Repartitioning - Range

Let us understand how we can repartition the existing partitioned table.

- We will use the **users_range_part** table. It was originally partitioned by year.
- Now we would like to partition it by month.
- Here are the steps involved in repartitioning from year to month.
    - Detach all yearly partitions from **users_range_part**.
    - Add new partitions for each month.
    - Load data from detached partitions into the table with new partitions for each month.
    - Validate to ensure that all the data is copied.
    - Drop all the detached partitions.
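
The "add new partitions for each month" step needs one statement per month. The Python sketch below generates those statements with only the standard library, using the first day of the following month as the exclusive upper bound. The helper names `month_bounds` and `monthly_partition_ddl` are hypothetical, and the sketch only builds the statement text.

```python
from datetime import date


def month_bounds(start_year, end_year):
    """Yield (yyyymm, first_day, first_day_of_next_month) for every month
    from January of start_year through December of end_year."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            lower = date(year, month, 1)
            upper = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
            yield f"{year}{month:02d}", lower, upper


def monthly_partition_ddl(parent, start_year, end_year):
    """Return one CREATE TABLE ... PARTITION OF statement per month."""
    return [
        f"CREATE TABLE {parent}_{yyyymm} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{lower}') TO ('{upper}')"
        for yyyymm, lower, upper in month_bounds(start_year, end_year)
    ]


statements = monthly_partition_ddl("shubham.users_range_part", 2016, 2020)
print(len(statements))
print(statements[1])
```

Because each upper bound is the next month's first day, rows dated on the last day of a month still fall inside that month's partition.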

In [26]:
%%sql 

Alter table shubham.users_range_part 
    detach partition shubham.users_range_part_201601

    

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
(psycopg2.errors.UndefinedTable) relation "shubham.users_range_part_201601" does not exist

[SQL: Alter table shubham.users_range_part detach partition shubham.users_range_part_201601]
(Background on this error at: https://sqlalche.me/e/14/f405)

- The statement above fails because no partition named <font color = 'pink'>users_range_part_201601</font> exists yet. The yearly partition for 2016 is named <font color = 'pink'>users_range_part_2016</font> and has to be detached under that name, in the same way as the partitions for the other years.


In [7]:
%%sql 

Alter table shubham.users_range_part 
    detach partition shubham.users_range_part_2017

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [8]:
%%sql 

Alter table shubham.users_range_part 
    detach partition shubham.users_range_part_2018

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [9]:
%%sql 

Alter table shubham.users_range_part 
    detach partition shubham.users_range_part_2019

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [10]:
%%sql 

Alter table shubham.users_range_part 
    detach partition shubham.users_range_part_2020

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [11]:
!pip install psycopg2

Defaulting to user installation because normal site-packages is not writeable


- The following snippet shows how to compute the first and last date of each month.

In [17]:
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

months = pd.date_range(start = '1/1/2016',end = '3/31/2016', freq = '1M')

for month in months:
    print(month)
    begin_date = month - MonthBegin(1)
    print (begin_date)
    end_date = month - MonthEnd(0)
    print(end_date)
    print(str(month)[:7].replace('-', ''), end=':')
    print(str(begin_date).split(' ')[0], end=':')
    print(str(end_date).split(' ')[0])

2016-01-31 00:00:00
2016-01-01 00:00:00
2016-01-31 00:00:00
201601:2016-01-01:2016-01-31
2016-02-29 00:00:00
2016-02-01 00:00:00
2016-02-29 00:00:00
201602:2016-02-01:2016-02-29
2016-03-31 00:00:00
2016-03-01 00:00:00
2016-03-31 00:00:00
201603:2016-03-01:2016-03-31


In [13]:
import psycopg2

In [29]:
import pandas as pd 
from pandas.tseries.offsets import MonthBegin,MonthEnd

months = pd.date_range(start = '1/1/2016',end = '12/31/2020', freq = '1M')
#postgresql://shubham_sms_user:shubham@172.25.87.65:5432/shubham_sms_db
connection = psycopg2.connect(
    host = '172.25.87.65',
    port = '5432',
    database = 'shubham_sms_db',
    user = 'shubham_sms_user',
    password = 'shubham'
)


cursor = connection.cursor()
table_name = 'shubham.users_range_part'
query = '''
CREATE TABLE {table_name}_{yyyymm}
PARTITION OF {table_name}
FOR VALUES FROM('{begin_date}') TO ('{end_date}')'''


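for month in months:
    print(month)
    begin_date = month - MonthBegin(1)
    end_date = month - MonthEnd(0)
    # The TO bound of a range partition is exclusive, so use the first day of
    # the next month; otherwise rows dated on the last day of a month would
    # not belong to the monthly partition.
    upper_bound = end_date + pd.Timedelta(days=1)
    print(f'Adding partition for {begin_date} and {end_date}')
    cursor.execute(
        query.format(
            table_name = table_name,
            yyyymm= str(month)[:7].replace('-',''),
            begin_date = str(begin_date).split(' ')[0],
            end_date = str(upper_bound).split(' ')[0]
        ),()
    )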
connection.commit()
cursor.close()
connection.close()

2016-01-31 00:00:00
Adding partition for 2016-01-01 00:00:00 and 2016-01-31 00:00:00
2016-02-29 00:00:00
Adding partition for 2016-02-01 00:00:00 and 2016-02-29 00:00:00
2016-03-31 00:00:00
Adding partition for 2016-03-01 00:00:00 and 2016-03-31 00:00:00
2016-04-30 00:00:00
Adding partition for 2016-04-01 00:00:00 and 2016-04-30 00:00:00
2016-05-31 00:00:00
Adding partition for 2016-05-01 00:00:00 and 2016-05-31 00:00:00
2016-06-30 00:00:00
Adding partition for 2016-06-01 00:00:00 and 2016-06-30 00:00:00
2016-07-31 00:00:00
Adding partition for 2016-07-01 00:00:00 and 2016-07-31 00:00:00
2016-08-31 00:00:00
Adding partition for 2016-08-01 00:00:00 and 2016-08-31 00:00:00
2016-09-30 00:00:00
Adding partition for 2016-09-01 00:00:00 and 2016-09-30 00:00:00
2016-10-31 00:00:00
Adding partition for 2016-10-01 00:00:00 and 2016-10-31 00:00:00
2016-11-30 00:00:00
Adding partition for 2016-11-01 00:00:00 and 2016-11-30 00:00:00
2016-12-31 00:00:00
Adding partition for 2016-12-01 00:00:00 and 

In [31]:
%%sql

INSERT INTO shubham.users_range_part
SELECT * FROM shubham.users_range_part_2016

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
0 rows affected.


[]

In [32]:
%%sql

INSERT INTO shubham.users_range_part
SELECT * FROM shubham.users_range_part_2017

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


[]

In [33]:
%%sql

INSERT INTO shubham.users_range_part
SELECT * FROM shubham.users_range_part_2018

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


[]

In [34]:
%%sql

INSERT INTO shubham.users_range_part
SELECT * FROM shubham.users_range_part_2019

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


[]

In [35]:
%%sql

INSERT INTO shubham.users_range_part
SELECT * FROM shubham.users_range_part_2020

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
0 rows affected.


[]

In [36]:
%%sql

SELECT * FROM shubham.users_range_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
3,Mickey,Mouse,mickey@mouse.com,False,,U,False,2017-06-22,2023-02-18 16:32:53.447217
1,Scott,Tiger,scott@tiger.com,False,,U,False,2018-10-01,2023-02-18 16:32:53.447217
2,Donald,Duck,donald@duck.com,False,,U,False,2019-02-10,2023-02-18 16:32:53.447217


In [37]:
%%sql

SELECT * FROM shubham.users_range_part_201706

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
3,Mickey,Mouse,mickey@mouse.com,False,,U,False,2017-06-22,2023-02-18 16:32:53.447217


In [38]:
%%sql

SELECT * FROM shubham.users_range_part_201810

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
1,Scott,Tiger,scott@tiger.com,False,,U,False,2018-10-01,2023-02-18 16:32:53.447217


In [39]:
%%sql

SELECT * FROM shubham.users_range_part_201902

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
2,Donald,Duck,donald@duck.com,False,,U,False,2019-02-10,2023-02-18 16:32:53.447217


- Since the data is now visible in the monthly partitions, we can drop the tables that were created earlier under the yearly partitioning strategy.

In [41]:
%%sql

DROP TABLE shubham.users_range_part_2016;
DROP TABLE shubham.users_range_part_2017;
DROP TABLE shubham.users_range_part_2018;
DROP TABLE shubham.users_range_part_2019;
DROP TABLE shubham.users_range_part_2020;


 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.
Done.
Done.
Done.
Done.


[]

In [42]:
%%sql

SELECT table_catalog, 
    table_schema, 
    table_name FROM information_schema.tables
WHERE table_name ~ 'users_range_part_'
ORDER BY table_name

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
61 rows affected.


table_catalog,table_schema,table_name
shubham_sms_db,shubham,users_range_part_201601
shubham_sms_db,shubham,users_range_part_201602
shubham_sms_db,shubham,users_range_part_201603
shubham_sms_db,shubham,users_range_part_201604
shubham_sms_db,shubham,users_range_part_201605
shubham_sms_db,shubham,users_range_part_201606
shubham_sms_db,shubham,users_range_part_201607
shubham_sms_db,shubham,users_range_part_201608
shubham_sms_db,shubham,users_range_part_201609
shubham_sms_db,shubham,users_range_part_201610


# Hash Partitioning

Let us understand how to hash partition a table.

- It is primarily used to create partitions based on modulus and remainder.
- Here are the steps involved in creating a table using the hash partitioning strategy.
    - Create the table using <font color= 'pink'>PARTITION BY HASH</font>
    - Add remainder-specific partitions based on the modulus. A hash partitioned table cannot have a default partition.
    - Validate by inserting data into the table
- We can detach as well as drop the partitions from the table.
- Hash partitioning is typically done on columns with many distinct values, such as <font color = 'pink' >user_id</font>, so that rows spread evenly across the partitions.
- If we want to use hash partitioning on more than one table with a common key, we typically partition all the tables using the same key.


## Create Partitioned Table

Let us create a partitioned table named <font color= 'pink'>users_hash_part</font>.

- It contains the same columns as <font color= 'pink'>users</font>.
- We will partition the table based on the <font color= 'pink'>user_id</font> field.
- We will create one partition for each remainder with modulus 8.

In [45]:
%sql DROP TABLE IF EXISTS shubham.users_hash_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [47]:
%%sql

CREATE TABLE shubham.users_hash_part (
    user_id SERIAL,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    created_dt DATE DEFAULT CURRENT_DATE,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id)
) PARTITION BY HASH(user_id)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

- We will not be able to insert the data until we add at least one partition.

## Managing Partitions - Hash

Let us understand how to manage partitions using the table <font color = 'pink'>users_hash_part</font>, which is partitioned using **hash**.

- We would like to divide our data into 8 hash buckets.
- While adding partitions to a **hash partitioned table**, we need to specify modulus and remainder.
- For every record inserted, the following happens for the column specified as the partition key.
    - A hash is computed. The hash is an integer.
    - The integer is divided by the value specified in **modulus**.
    - Based on the remainder, the record is inserted into the corresponding partition.

- We cannot have a default partition for a hash partitioned table.

In [53]:
%%sql

CREATE TABLE shubham.users_hash_part_default
PARTITION OF shubham.users_hash_part DEFAULT

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
(psycopg2.errors.InvalidTableDefinition) a hash-partitioned table may not have a default partition

[SQL: CREATE TABLE shubham.users_hash_part_default PARTITION OF shubham.users_hash_part DEFAULT]
(Background on this error at: https://sqlalche.me/e/14/f405)


- Let us add partitions using modulus 8. We need to add one partition for each remainder from 0 to 7.

In [55]:
%%sql 

create table shubham.users_hash_part_0_to_8
partition of shubham.users_hash_part
for values with (modulus 8, remainder 0)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [56]:
%%sql 

create table shubham.users_hash_part_1_to_8
partition of shubham.users_hash_part
for values with (modulus 8, remainder 1);


create table shubham.users_hash_part_2_to_8
partition of shubham.users_hash_part
for values with (modulus 8, remainder 2);

create table shubham.users_hash_part_3_to_8
partition of shubham.users_hash_part
for values with (modulus 8, remainder 3);

create table shubham.users_hash_part_4_to_8
partition of shubham.users_hash_part
for values with (modulus 8, remainder 4);

create table shubham.users_hash_part_5_to_8
partition of shubham.users_hash_part
for values with (modulus 8, remainder 5);

create table shubham.users_hash_part_6_to_8
partition of shubham.users_hash_part
for values with (modulus 8, remainder 6);

create table shubham.users_hash_part_7_to_8
partition of shubham.users_hash_part
for values with (modulus 8, remainder 7);



 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.
Done.
Done.
Done.
Done.
Done.
Done.


[]
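
The eight `CREATE TABLE` statements above differ only in the remainder, so they can be generated instead of typed by hand. The following is a minimal Python sketch, not part of the original notebook: the helper name `hash_partition_ddl` is hypothetical, and it only builds the SQL text using the same `<parent>_<remainder>_to_<modulus>` naming seen above. Running the statements against the database is left to the `%%sql` cells.

```python
from typing import List


def hash_partition_ddl(parent: str, modulus: int) -> List[str]:
    """Build one CREATE TABLE statement per remainder (0 .. modulus - 1)."""
    statements = []
    for remainder in range(modulus):
        # Child name follows the notebook's <parent>_<remainder>_to_<modulus> pattern.
        child = f"{parent}_{remainder}_to_{modulus}"
        statements.append(
            f"CREATE TABLE {child}\n"
            f"PARTITION OF {parent}\n"
            f"FOR VALUES WITH (MODULUS {modulus}, REMAINDER {remainder});"
        )
    return statements


statements = hash_partition_ddl("shubham.users_hash_part", 8)
print("\n\n".join(statements))
```

Generating the statements this way makes it harder to skip a remainder by accident, which matters because a row whose remainder has no partition cannot be inserted.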

In [57]:
%%sql

INSERT INTO shubham.users_hash_part
    (user_first_name, user_last_name, user_email_id, created_dt)
VALUES 
    ('Scott', 'Tiger', 'scott@tiger.com', '2018-10-01'),
    ('Donald', 'Duck', 'donald@duck.com', '2019-02-10'),
    ('Mickey', 'Mouse', 'mickey@mouse.com', '2017-06-22')

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


[]

- **user_id** is populated by a sequence. The hash of every sequence-generated integer is divided by the modulus (which is 8), and based on the remainder the row is inserted into the corresponding partition.
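
To make the hash, modulus and remainder steps concrete, here is a small Python sketch of the routing idea. It is an illustration under a loud assumption: `toy_hash` is a stand-in built on SHA-256 and is not the hash function PostgreSQL uses internally, so the remainders it produces will not match where PostgreSQL actually places a given `user_id`.

```python
import hashlib

MODULUS = 8


def toy_hash(key: int) -> int:
    """Stand-in hash: NOT the function PostgreSQL uses internally."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def remainder_for(key: int, modulus: int = MODULUS) -> int:
    """Remainder that decides which partition a key is routed to."""
    return toy_hash(key) % modulus


# Route user_id 1..1000 and count how many keys land in each bucket.
counts = {remainder: 0 for remainder in range(MODULUS)}
for user_id in range(1, 1001):
    counts[remainder_for(user_id)] += 1

for remainder, count in sorted(counts.items()):
    print(f"remainder {remainder}: {count} keys")
```

Because the key is hashed before the modulus is applied, consecutive ids do not necessarily land in consecutive partitions.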

In [58]:
%%sql

SELECT * FROM shubham.users_hash_part

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
1,Scott,Tiger,scott@tiger.com,False,,U,False,2018-10-01,2023-02-26 13:00:10.185976
3,Mickey,Mouse,mickey@mouse.com,False,,U,False,2017-06-22,2023-02-26 13:00:10.185976
2,Donald,Duck,donald@duck.com,False,,U,False,2019-02-10,2023-02-26 13:00:10.185976


In [61]:
%%sql

SELECT * FROM shubham.users_hash_part_0_to_8

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,created_dt,last_updated_ts
1,Scott,Tiger,scott@tiger.com,False,,U,False,2018-10-01,2023-02-26 13:00:10.185976


## Usage Scenarios

Let us go through some of the usage scenarios with respect to partitioning.

- It is typically used to manage large tables so that the tables do not grow abnormally over a period of time.
- Partitioning is quite often used on log tables, reporting tables etc.
- If a log table is partitioned and we want to retain data for 7 years, partitions older than 7 years can be quickly dropped.
- Dropping partitions to clean up a huge chunk of data is much faster than running a delete command on a non-partitioned table.
- For tables like orders with a limited set of statuses, we often use list partitioning based on the status. It can be 2 partitions (CLOSED orders and ACTIVE orders) or a separate partition for each status.
    - As most of the operations will be on **Active Orders**, this approach can significantly improve the performance.
- In case of log tables, where we might want to retain data for several years, we tend to use range partitioning on a date column. If we use list partitioning, then we might end up duplicating data unnecessarily (for example, by adding a derived year or month column only to serve as the partition key).

# Sub Partitioning

We can create sub partitions using different combinations of strategies. Sub partitioning is also known as nested partitioning.

- List - List
- List - Range and others.
- Try different sub-partitioning strategies based on your requirements.

### List - List Partitioning

Let us understand how we can create a table using list - list sub partitioning. We would like to have a main partition per year and then sub partitions per quarter.

   - Create table <font color = 'pink'>users_qtly</font> with <font color = 'pink'> PARTITION BY LIST</font> on <font color = 'pink' >created_year</font>.
   - Create tables for yearly partitions with <font color = 'pink'>PARTITION BY LIST</font> on <font color = 'pink'>created_mnth</font>.
   - Create tables for quarterly partitions with a list of values using <font color = 'pink'>FOR VALUES IN</font>.

In [64]:
%%sql

DROP TABLE IF EXISTS shubham.users_qtly

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [65]:
%%sql

CREATE TABLE shubham.users_qtly (
    user_id SERIAL,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    created_dt DATE DEFAULT CURRENT_DATE,
    created_year INT,
    created_mnth INT,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (created_year, created_mnth, user_id)
) PARTITION BY LIST(created_year)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [66]:
%%sql

CREATE TABLE shubham.users_qtly_2016
PARTITION OF shubham.users_qtly
FOR VALUES IN (2016)
PARTITION BY LIST (created_mnth)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [67]:
%%sql

CREATE TABLE shubham.users_qtly_2016q1
PARTITION OF shubham.users_qtly_2016
FOR VALUES IN (1, 2, 3)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [68]:
%%sql

CREATE TABLE shubham.users_qtly_2016q2
PARTITION OF shubham.users_qtly_2016
FOR VALUES IN (4, 5, 6)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]
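
Only the first two quarters of 2016 are created above. As a minimal sketch, assuming the same naming pattern (`<parent>_<year>q<quarter>`), the statements for all four quarters could be generated in Python. The helper name `quarterly_list_ddl` is hypothetical, and the sketch only builds the SQL text.

```python
from typing import List

# Months that belong to each quarter.
QUARTER_MONTHS = {1: (1, 2, 3), 2: (4, 5, 6), 3: (7, 8, 9), 4: (10, 11, 12)}


def quarterly_list_ddl(parent: str, year: int) -> List[str]:
    """Build CREATE TABLE statements for the four quarterly list sub partitions of a year."""
    yearly = f"{parent}_{year}"
    statements = []
    for quarter, months in QUARTER_MONTHS.items():
        month_list = ", ".join(str(month) for month in months)
        statements.append(
            f"CREATE TABLE {yearly}q{quarter}\n"
            f"PARTITION OF {yearly}\n"
            f"FOR VALUES IN ({month_list});"
        )
    return statements


quarterly = quarterly_list_ddl("shubham.users_qtly", 2016)
print("\n\n".join(quarterly))
```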

### List - Range Partitioning

Let us understand how we can create a table using list - range sub partitioning using the same example as before (partitioning by year and then by quarter).

   - Create table with <font color = 'pink'>PARTITION BY LIST</font> on <font color = 'pink'>created_year</font>
   - Create tables for yearly partitions with <font color = 'pink'>PARTITION BY RANGE</font> on <font color = 'pink'>created_mnth</font>.
   - Create tables for quarterly partitions with the range of values using <font color = 'pink'>FOR VALUES FROM (lower_bound) TO (upper_bound)</font>. The lower bound is inclusive and the upper bound is exclusive.



In [69]:
%%sql

DROP TABLE IF EXISTS shubham.users_qtly

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [70]:
%%sql

CREATE TABLE shubham.users_qtly (
    user_id SERIAL,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    created_dt DATE DEFAULT CURRENT_DATE,
    created_year INT,
    created_mnth INT,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (created_year, created_mnth, user_id)
) PARTITION BY LIST(created_year)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [71]:
%%sql

CREATE TABLE shubham.users_qtly_2016
PARTITION OF shubham.users_qtly
FOR VALUES IN (2016)
PARTITION BY RANGE (created_mnth)

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [72]:
%%sql

CREATE TABLE shubham.users_qtly_2016q1
PARTITION OF shubham.users_qtly_2016
FOR VALUES FROM (1) TO (4) -- upper bound is exclusive: covers months 1, 2 and 3

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]

In [73]:
%%sql

CREATE TABLE shubham.users_qtly_2016q2
PARTITION OF shubham.users_qtly_2016
FOR VALUES FROM (4) TO (7) -- upper bound is exclusive: covers months 4, 5 and 6

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
Done.


[]
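
PostgreSQL treats the lower bound of a range partition as inclusive and the upper bound as exclusive, so it is worth checking which `created_mnth` values a bound pair actually accepts. The Python sketch below models that rule; the helper names `months_covered` and `quarter_bounds` are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple


def months_covered(lower: int, upper: int) -> List[int]:
    """Months accepted by FOR VALUES FROM (lower) TO (upper): lower <= month < upper."""
    return [month for month in range(1, 13) if lower <= month < upper]


def quarter_bounds(quarter: int) -> Tuple[int, int]:
    """Half-open (lower, upper) month bounds for quarter 1..4."""
    lower = 3 * (quarter - 1) + 1
    return lower, lower + 3


for quarter in range(1, 5):
    lower, upper = quarter_bounds(quarter)
    print(f"q{quarter}: FROM ({lower}) TO ({upper}) covers {months_covered(lower, upper)}")
```

Under this rule, a bound pair such as `(1, 3)` accepts only months 1 and 2, which is why each quarter needs an upper bound one past its last month.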