Advanced SQL for Query Tuning and Performance Optimization

How to SQL executes queries

From declarative SQL to a procedural execution plan

Declarative: specify what you want, not how to get it
Declarative statements create execution plans

Scanning tables and indexes

Cost: number of scanned rows
Index reduces scanned rows
Index are ordered

Index types

B-tree: used for equality and range queries
Hash: used for equality
Bitmap: used for inclusion
Specialized: geo-spatial or custom

Joinng tables

Nested loop join: compare all rows in both tables
Hash join: compute hash value of key and join
Sort merge join: sort both table and join

Partitioning data

Sort data in multiple sub-tables, known as partitions
Improve queries

Tools for tuning

Usign PostgreSQL

select * from staff;

Explain and analyze

SELECT * from staff

EXPLAIN SELECT * FROM STAFF
EXPLAIN ANALYZE select * from staff
EXPLAIN ANALYZE select last_name from staff

Where clause

SELECT * FROM staff WHERE salary > 75000

EXPLAIN SELECT * FROM staff where salary > 75000
EXPLAIN SELECT * FROM staff

EXPLAIN ANALYZE SELECT * FROM staff where salary > 75000

Indexes

CREATE INDEX idx_staff_salary ON staff(salary)

EXPLAIN SELECT * FROM staff
EXPLAIN ANALYZE SELECT * FROM staff where salary > 75000
EXPLAIN ANALYZE SELECT * FROM staff where salary > 150000

Types of index

Indexing

Speed up data access
Enforce constraints
Ordered
Smaller than tables
Reduce table scans
Duplicates data
Different organization than table

Index type

B-tree
Bitmap
Hash
Special purpose

B-tree index

Balanced tree
Most common
High cardinality
Time based on tree depth

B-tree index plan

select count(*) from staff
select * from staff where email = 'bphillips5@time.com'

explain select * from staff where email = 'bphillips5@time.com'
create index idx_staff_email on staff(email);
explain select * from staff where email = 'bphillips5@time.com'

drop index idx_staff_email;

Bitmap index

Boolean operations
Small number of possible values in a column
Time based on bitwise operation to perform

Bitmap index execution plan

select distinct job_title from staff order by job_title
select * from staff where job_title = 'operator'
create index idx_staff_job_title on staff(job_title);
explain select * from staff where job_title = 'operator'

Hash index

Maps data length to fixed string
Virtually unique
Input changes produce new hash
Only for equality
Smaller than B-tree
As fast as B-tree

Hash index execution plan

create index idx_staff_email on staff using hash (email)
explain select * from staff
where email = 'bphillips5@time.com'

Bloom filter indexes

Probabilistic and space efficient
Lossy reppresentation
False positive
Arbitrary combinations
B-tree is faster but bigger

Specilized indexes

GIS: generalize search tree
SP_GIS: space partitioned gis
GIN: text indexing
BRIN: block range index

Tuning joins

Types of joins

Inner: all matching rows from both tables
Left outer: all rows from left table and matching ones from right
Right outer: all rows from right table and matching ones from left
Full outer: all rows from both tables

Nested loops

Works for all types of join
2 loops
Outer loop: driver (it runs once)
Inner loop: join (it runs for each row)

Nested loop plan

SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

EXPLAIN SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

set enable_nestloop=true;
set enable_hashjoin=false;
set enable_mergejoin=false;

SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

EXPLAIN SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

Hash joins

Function that creates data for mapping data
Can act as an index for fetching that data
Virtually unique
It uses the smaller table and it stores its values
Equality only
Time based on table size
Fast lookup

Hash join plan

set enable_nestloop=false;
set enable_hashjoin=true;
set enable_mergejoin=false;

EXPLAIN SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

Merge join

Sort merge
First step i sorting both tables
Sort reduces the number of checks
Equality only
Time based on table size
Large tables join

Merge join plan

set enable_nestloop=false;
set enable_hashjoin=false;
set enable_mergejoin=true;

EXPLAIN SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

Sub-queries vs joins

SELECT s.id, s.last_name, s.department,
 (SELECT 
 company_regions
 FROM
 company_regions cr)
 WHERE
 cr.region_id = s.region_staff s

Same logical outcome
More than one way to express same thing
Join more efficient
Choose more clarity one in code

Partitioning Data

Horizontal vs Verical

Horizontal partitioning

Large tables could lead to bad query performance
Split tables by rows into partitions
Treat each partition like a table
Limit scan on subsets
Local index for each partition
Efficient add and delete operations
Data warehouse
Timeseries
Naturally driven

Vertical patitioning

Separates columns into multiple tables
Keep frequently queried columns togheter
Same primary key
More rows for data block
Global indexes
Reduce I/O
Data analytics
Tech data

Range partition

Horizontal partitioning
Partition on non overlapping keys
Partition by dates
Numeric range
Alphabetic range
Partition key determines which partition
Min and max value for each partition
Each partition has its own constraints
Queries latest data
Comparative queries
Report with ranges

Range partition example

CREATE TABLE iot_measurement
( location_id int not null,
measure_date timestamp not null,
temp_celcius int,
rel_humidity_pct int)
PARTITION BY RANGE (measure_date);


CREATE TABLE iot_measurement_wk1_2024 PARTITION OF iot_measurement
FOR VALUES FROM ('2024-01-01') TO ('2024-01-08');


CREATE TABLE iot_measurement_wk2_2024 PARTITION OF iot_measurement
FOR VALUES FROM ('2024-01-08') TO ('2024-01-15');


CREATE TABLE iot_measurement_wk3_2024 PARTITION OF iot_measurement
FOR VALUES FROM ('2024-01-15') TO ('2024-01-22');

List partitioning

Horizontal partitioning
On non overlapping keys
On list of values
Partition key determines which partition
Partition bounds determines the portion on values
Each partition has its own constraints

List partitioning example

create table products
 (prod_id int not null,
  prod_name text not null,
  prod_descr text not null,
  prod_category text)
partition by list (prod_category);

create table product_clothing partition of products
 for values in (‘casual_clothing’, ‘business_attire’, ‘formal_clothing);

create table product_electronics partition of products
 for values in (‘mobile_phones’, ‘tablets’, ‘laptop_computers’);

create table product_kitches partition of products
 for values in (‘food_processor, ‘cutlery’, ‘blenders’);

Partition by hash

Horizontal partitioning type
Hash as partition key

Partition by hash example

create table customer_interaction
 (ci_id int not null,
  ci_url text not null,
  time_at_url int not null,
  click_sequence int not null)
partition by hash(ci_id);

create table customer_interactions_1 partition of customer_interaction
   for values with (modulus 5 remainder 0);

create table customer_interactions_2 partition of customer_interaction
   for values with (modulus 5 remainder 1);

create table customer_interactions_3 partition of customer_interaction
   for values with (modulus 5 remainder 2);

create table customer_interactions_4 partition of customer_interaction
   for values with (modulus 5 remainder 3);

create table customer_interactions_5 partition of customer_interaction
   for values with (modulus 5 remainder 4);

Materialized views

Materialized view

Pre computed queries
Join and store results
Apply other operations
Duplicates data
Potential inconsistency

Materialized view example

create materialized view mv_staff as
   select
      s.last_name, s.department, s.job_title,
      cr.company_regions
 from 
   staff s
inner join
  company_regioins cr
on
   s.region_id = cr.region_id

Refreshing materialized views

select * from mv_staff
refresh materialized view mv_staff;

Database statistics

Collect statistics

REINDEX INDEX INDEX_NAME
REINDEX TABLE TABLE_NAME
REINDEX SCHEMA SCHEMA_NAME

Analyzing statistics

pg_stat_statements

Review statistics

auto_explain.min_duration
auto_explain.log_nested_statements

Additional analysis

SELECT * from PG_STATS;

Optimization technoques

Common Table Expressions

Computed once (default)
Muliple teimes in line

with cte items AS (...)
select ... FROM items WHERE ...

with cte items AS MATERIALIZED (...)
select ... FROM items WHERE ...

Hints to query optimizer

Suggestion to quesry builder

SET command
SET enabled_nestloop=on

Parallel queries

Optimizer detects parallel queries
Executes them in parallel
Gathers results
May be less efficient
B-trees only

GATHER
GATHER MERGE

Improving cache

Stores results in order to reuse them
Explain plan: hits from cache, read from disk
Shared memoru buffer: 128Mb by default

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
2_postgres		2_postgres
3_indexes		3_indexes
4_joins		4_joins
5_partitioning		5_partitioning
6_materialized_views		6_materialized_views
CertificateOfCompletion_Advanced SQL for Query Tuning and Performance Optimization.pdf		CertificateOfCompletion_Advanced SQL for Query Tuning and Performance Optimization.pdf
README.md		README.md

mariolazzari/adv-postgres-tuning

Folders and files

Latest commit

History

Repository files navigation

Advanced SQL for Query Tuning and Performance Optimization

How to SQL executes queries

From declarative SQL to a procedural execution plan

Scanning tables and indexes

Index types

Joinng tables

Partitioning data

Tools for tuning

Usign PostgreSQL

Explain and analyze

Where clause

Indexes

Types of index

Indexing

Index type

B-tree index

B-tree index plan

Bitmap index

Bitmap index execution plan

Hash index

Hash index execution plan

Bloom filter indexes

Specilized indexes

Tuning joins

Types of joins

Nested loops

Nested loop plan

Hash joins

Hash join plan

Merge join

Merge join plan

Sub-queries vs joins

Partitioning Data

Horizontal vs Verical

Horizontal partitioning

Vertical patitioning

Range partition

Range partition example

List partitioning

List partitioning example

Partition by hash

Partition by hash example

Materialized views

Materialized view

Materialized view example

Refreshing materialized views

Database statistics

Collect statistics

Analyzing statistics

Review statistics

Additional analysis

Optimization technoques

Common Table Expressions

Hints to query optimizer

Parallel queries

Improving cache

Helpful tips

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages