-
Notifications
You must be signed in to change notification settings - Fork 2
evandrewry/main-memory-selection-query-optimizer
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
README - COMS W4112, Project 2 (Phase II)
In this phase of project 2, we implement a query optimizer for selection
conditions in main memory as described in algorithm 4.11 of Kenneth Ross's
Selection Conditions in Main Memory [Columbia University, 2004].
----------------------------SUMMARY-------------------------------
The optimization is based on avoiding branch mispredictions by optimizing
both the order of and the type of &-conjunction between the elements in a
set of input selection conditions. The decisions made by the optimizer are
based on the selectivities of each input condition as well as several
(user-specified) properties of the machine on which the query will be run
on. The types of &-conjunctions we choose between are:
Branching-And: This implementation uses the "&&" (boolean) operator
between two selection conditions and creates a branch in the compiled
code, which leaves us vulnerable to branch misprediction penalties
but saves work when the first term is very selective.
for(i=0; i < number_of_records; i++) {
if(f1(r1[i]) && ... && fk(rk[i])) {
answer[j++] = i;
}
}
Logical-And: This implementation uses the "&" (bitwise) operator
between two selection conditions and does not create a branch in
the compiled code. Because there is no branch created in the resulting
compiled code, there is no possibility of branch misprediction.
However, it can still perform poorly when the first term is very
selective, as we must consider the right operand even when the left
operand evaluates to false.
for(i = 0; i < number_of_records; i++) {
if(f1(r1[i]) & ... & fk(rk[i])) {
answer[j++] = i;
}
}
No-Branch: This implementation pulls the selection condition out of
the if-statement altogether, instead placing a series of "Logical-And"
("&") conjunctions directly in the body of the for-loop. Because there
is neither an if-statement nor any branching operators, there is no
branching at all in the resulting compiled code.
for(i = 0; i < number_of_records; i++) {
answer[j] = i;
j += (f1(r1[i]) & ... & fk(rk[i]));
}
----------------------------USAGE-------------------------------
The program can be compiled with the included Makefile.
$ make
Once compiled, the program can be run with the included shell script.
$ ./stage2.sh query.txt config.txt
The query file should consist of space-delimited lists of selectivities
separated by newlines, as follows:
0.8 0.5 0.3 0.2
0.2 0.1 0.9
0.6 0.75 0.8 1 0.9
0.8 0.8 0.9 0.7 0.7 0.7
In this example, the first line represents that f1's selectivity
p1 = 0.8, f2's selectivity p2 = 0.5, f3's selectivity p3 = 0.3,
and f4's selectivity p4 = 0.2. The second line is another case
wherein f1's selectivity p1 = 0.2, f2's selectivity p2 = 0.1, and
f3's selectivity p3 = 0.9. (fi is a selection function for a basic
term.) Each line is separated by newline, and each selectivity
value is separated by single white space. All values in this file
are 0 <= x <= 1.
The configuration file includes values of estimated costs. Those values
are estimated from CPU specification. The file format follows standard
java property convention.
r = 1
t = 2
l = 1
m = 16
a = 2
f = 4
All values shown in this file represents parameters for the cost
estimation explained in section 4.2. We assume that the costs of
applying function are equal for all functions fi, therefore "f"
represents all of them. "m" means the cost of a branch misprediction.
It corresponds to the length of stage pipeline.
The program will print results to STDOUT in the following format.
==================================================================
0.7 0.4 0.2 0.3 0.6
------------------------------------------------------------------
if((t1[o1[i]] & t2[o2[i]]) && t3[o3[i]]) {
answer[j] = i;
j += (t4[o4[i]] & t5[o5[i]]);
}
------------------------------------------------------------------
cost: 10.5
==================================================================
0.7 0.8 0.8 0.9
------------------------------------------------------------------
if(t1[o1[i]] && (t2[o2[i]] && (t3[o3[i]] && t4[o4[i]]))) {
answer[j++] = i;
}
------------------------------------------------------------------
cost: 28.3
==================================================================
-------------------------IMPLEMENTATION----------------------------
We chose to use Java to implement the optimization algorithm. We used a class
"QueryOptimizer" to represent an optimizer for a single query containing only
simple selection conditions. Multiple queries can be optimized concurrently by
instantiating multiple QueryOptimizers.
Another class, "QueryPlan" is an abstraction of the preliminary plans (or plan
subtrees) that we consider during the optimization. We use a bitmap to keep track
of the conditions present in each plan, and this format also allows us to
easily and efficiently calculate power sets, unions, and intersections (all of
which are required by the optimization algorithm.
Third, we refactored some of the static code into a utility class (stuff like
bitmap operations, output formatting, and configuration)
About
A query optimizer for selection conditions in main memory as described in algorithm 4.11 of Kenneth Ross's Selection Conditions in Main Memory [Columbia University, 2004].
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published