Skip to content

mariolazzari/adv-postgres-tuning

Repository files navigation

Advanced SQL for Query Tuning and Performance Optimization

How to SQL executes queries

From declarative SQL to a procedural execution plan

  • Declarative: specify what you want, not how to get it
  • Declarative statements create execution plans

Scanning tables and indexes

  • Cost: number of scanned rows
  • Index reduces scanned rows
  • Index are ordered

Index types

  • B-tree: used for equality and range queries
  • Hash: used for equality
  • Bitmap: used for inclusion
  • Specialized: geo-spatial or custom

Joinng tables

  • Nested loop join: compare all rows in both tables
  • Hash join: compute hash value of key and join
  • Sort merge join: sort both table and join

Partitioning data

  • Sort data in multiple sub-tables, known as partitions
  • Improve queries

Tools for tuning

Usign PostgreSQL

select * from staff;

Explain and analyze

SELECT * from staff

EXPLAIN SELECT * FROM STAFF
EXPLAIN ANALYZE select * from staff
EXPLAIN ANALYZE select last_name from staff

Where clause

SELECT * FROM staff WHERE salary > 75000

EXPLAIN SELECT * FROM staff where salary > 75000
EXPLAIN SELECT * FROM staff

EXPLAIN ANALYZE SELECT * FROM staff where salary > 75000

Indexes

CREATE INDEX idx_staff_salary ON staff(salary)

EXPLAIN SELECT * FROM staff
EXPLAIN ANALYZE SELECT * FROM staff where salary > 75000
EXPLAIN ANALYZE SELECT * FROM staff where salary > 150000

Types of index

Indexing

  • Speed up data access
  • Enforce constraints
  • Ordered
  • Smaller than tables
  • Reduce table scans
  • Duplicates data
  • Different organization than table

Index type

  • B-tree
  • Bitmap
  • Hash
  • Special purpose

B-tree index

  • Balanced tree
  • Most common
  • High cardinality
  • Time based on tree depth

B-tree index plan

select count(*) from staff
select * from staff where email = 'bphillips5@time.com'

explain select * from staff where email = 'bphillips5@time.com'
create index idx_staff_email on staff(email);
explain select * from staff where email = 'bphillips5@time.com'

drop index idx_staff_email;

Bitmap index

  • Boolean operations
  • Small number of possible values in a column
  • Time based on bitwise operation to perform

Bitmap index execution plan

select distinct job_title from staff order by job_title
select * from staff where job_title = 'operator'
create index idx_staff_job_title on staff(job_title);
explain select * from staff where job_title = 'operator'

Hash index

  • Maps data length to fixed string
  • Virtually unique
  • Input changes produce new hash
  • Only for equality
  • Smaller than B-tree
  • As fast as B-tree

Hash index execution plan

create index idx_staff_email on staff using hash (email)
explain select * from staff
where email = 'bphillips5@time.com'

Bloom filter indexes

  • Probabilistic and space efficient
  • Lossy reppresentation
  • False positive
  • Arbitrary combinations
  • B-tree is faster but bigger

Specilized indexes

  • GIS: generalize search tree
  • SP_GIS: space partitioned gis
  • GIN: text indexing
  • BRIN: block range index

Tuning joins

Types of joins

  • Inner: all matching rows from both tables
  • Left outer: all rows from left table and matching ones from right
  • Right outer: all rows from right table and matching ones from left
  • Full outer: all rows from both tables

Nested loops

  • Works for all types of join
  • 2 loops
  • Outer loop: driver (it runs once)
  • Inner loop: join (it runs for each row)

Nested loop plan

SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

EXPLAIN SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

set enable_nestloop=true;
set enable_hashjoin=false;
set enable_mergejoin=false;

SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

EXPLAIN SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

Hash joins

  • Function that creates data for mapping data
  • Can act as an index for fetching that data
  • Virtually unique
  • It uses the smaller table and it stores its values
  • Equality only
  • Time based on table size
  • Fast lookup

Hash join plan

set enable_nestloop=false;
set enable_hashjoin=true;
set enable_mergejoin=false;

EXPLAIN SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

Merge join

  • Sort merge
  • First step i sorting both tables
  • Sort reduces the number of checks
  • Equality only
  • Time based on table size
  • Large tables join

Merge join plan

set enable_nestloop=false;
set enable_hashjoin=false;
set enable_mergejoin=true;

EXPLAIN SELECT
  s.id, s.last_name, s.job_title, cr.country
FROM
   staff s
INNER JOIN
   company_regions cr
ON
   s.region_id = cr.region_id

Sub-queries vs joins

SELECT s.id, s.last_name, s.department,
 (SELECT 
 company_regions
 FROM
 company_regions cr)
 WHERE
 cr.region_id = s.region_staff s
  • Same logical outcome
  • More than one way to express same thing
  • Join more efficient
  • Choose more clarity one in code

Partitioning Data

Horizontal vs Verical

Horizontal partitioning

  • Large tables could lead to bad query performance
  • Split tables by rows into partitions
  • Treat each partition like a table
  • Limit scan on subsets
  • Local index for each partition
  • Efficient add and delete operations
  • Data warehouse
  • Timeseries
  • Naturally driven

Vertical patitioning

  • Separates columns into multiple tables
  • Keep frequently queried columns togheter
  • Same primary key
  • More rows for data block
  • Global indexes
  • Reduce I/O
  • Data analytics
  • Tech data

Range partition

  • Horizontal partitioning
  • Partition on non overlapping keys
  • Partition by dates
  • Numeric range
  • Alphabetic range
  • Partition key determines which partition
  • Min and max value for each partition
  • Each partition has its own constraints
  • Queries latest data
  • Comparative queries
  • Report with ranges

Range partition example

CREATE TABLE iot_measurement
( location_id int not null,
measure_date timestamp not null,
temp_celcius int,
rel_humidity_pct int)
PARTITION BY RANGE (measure_date);


CREATE TABLE iot_measurement_wk1_2024 PARTITION OF iot_measurement
FOR VALUES FROM ('2024-01-01') TO ('2024-01-08');


CREATE TABLE iot_measurement_wk2_2024 PARTITION OF iot_measurement
FOR VALUES FROM ('2024-01-08') TO ('2024-01-15');


CREATE TABLE iot_measurement_wk3_2024 PARTITION OF iot_measurement
FOR VALUES FROM ('2024-01-15') TO ('2024-01-22');

List partitioning

  • Horizontal partitioning
  • On non overlapping keys
  • On list of values
  • Partition key determines which partition
  • Partition bounds determines the portion on values
  • Each partition has its own constraints

List partitioning example

create table products
 (prod_id int not null,
  prod_name text not null,
  prod_descr text not null,
  prod_category text)
partition by list (prod_category);

create table product_clothing partition of products
 for values in (‘casual_clothing’, ‘business_attire’, ‘formal_clothing);

create table product_electronics partition of products
 for values in (‘mobile_phones’, ‘tablets’, ‘laptop_computers’);

create table product_kitches partition of products
 for values in (‘food_processor, ‘cutlery’, ‘blenders’);

Partition by hash

  • Horizontal partitioning type
  • Hash as partition key

Partition by hash example

create table customer_interaction
 (ci_id int not null,
  ci_url text not null,
  time_at_url int not null,
  click_sequence int not null)
partition by hash(ci_id);

create table customer_interactions_1 partition of customer_interaction
   for values with (modulus 5 remainder 0);

create table customer_interactions_2 partition of customer_interaction
   for values with (modulus 5 remainder 1);

create table customer_interactions_3 partition of customer_interaction
   for values with (modulus 5 remainder 2);

create table customer_interactions_4 partition of customer_interaction
   for values with (modulus 5 remainder 3);

create table customer_interactions_5 partition of customer_interaction
   for values with (modulus 5 remainder 4);

Materialized views

Materialized view

  • Pre computed queries
  • Join and store results
  • Apply other operations
  • Duplicates data
  • Potential inconsistency

Materialized view example

create materialized view mv_staff as
   select
      s.last_name, s.department, s.job_title,
      cr.company_regions
 from 
   staff s
inner join
  company_regioins cr
on
   s.region_id = cr.region_id

Refreshing materialized views

select * from mv_staff
refresh materialized view mv_staff;

Database statistics

Collect statistics

REINDEX INDEX INDEX_NAME
REINDEX TABLE TABLE_NAME
REINDEX SCHEMA SCHEMA_NAME

Analyzing statistics

pg_stat_statements

Review statistics

auto_explain.min_duration
auto_explain.log_nested_statements

Additional analysis

SELECT * from PG_STATS;

Optimization technoques

Common Table Expressions

  • Computed once (default)
  • Muliple teimes in line
with cte items AS (...)
select ... FROM items WHERE ...

with cte items AS MATERIALIZED (...)
select ... FROM items WHERE ...

Hints to query optimizer

  • Suggestion to quesry builder
SET command
SET enabled_nestloop=on

Parallel queries

  • Optimizer detects parallel queries
  • Executes them in parallel
  • Gathers results
  • May be less efficient
  • B-trees only
GATHER
GATHER MERGE

Improving cache

  • Stores results in order to reuse them
  • Explain plan: hits from cache, read from disk
  • Shared memoru buffer: 128Mb by default

Helpful tips

About

Advanced SQL for Query Tuning and Performance Optimization

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published