Range types are data types representing a range of values of some element type (called the range's subtype
). For example, ranges of timestamp might be used to represent the ranges of time that a meeting room is reserved. In this case the data type is tsrange
(short for “timestamp range”), and timestamp is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values.
The aim of this project is to improve the overall scheme of statistics collection and cardinality estimation for range types in PostgreSQL.
So, project can be divided up into three parts:
- Statistics
- Selectivity Estimations
- Join Estimations
Note: We will be working with PostgreSQL 13 stable version for the development purpose.
In the current implementation, you have range_typeanalyze
function which is called whenever you do vacuum analyze
on some relation/table having a range type. You can find the function definition for range_typeanalyze
here.
This function sets few configurations/settings and sets a handle for computing range type stats
stats->compute_stats = compute_range_stats;
Here, we are setting compute_range_stats
function as a handle to be called later for computing stats.
When you will look into compute_range_stats
function (see here), you will notice that three different kinds of histograms are being calculated.
- Lower Bound Histogram
- Upper Bound Histogram
- Length based Histogram
And we store all these into the pg_statistics
for later usage.
Note: These stats are usually calculated once per million or so inserts. Or when you do
vacuum analyze
statement.
The first goal of our project is to determine which statistics should we calculate during the analysis phase. Do we need something extra for doing proper/better estimations for selectivity and joins for range type.
The second goal of our project is to implement selectivity estimation functions for the overlaps &&
and the strictly left of <<
predicates/operators.
The final and main goal of our project is to implement join cardinality estimation for the overlaps &&
predicate/operator for the range type.
Please refer to debugging guide for more details.
To run the benchmarks on different range_type
, follow the below mentioned steps:
- Clone this repo
git clone https://github.com/mohammadzainabbas/database-system-architecture-project.git
cd database-system-architecture-project
- Run the benchmark script
sh scripts/run_benchmark.sh -d test
Note: Replace
test
with the name of your database
Note: If you see an error
Binary 'psql' not found'
, run the following command and re-try:
echo 'export PATH="/usr/local/pgsql/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc