
Commit 597fbab

Follow-up to SQL Hints rewrite (#279)
1 parent 855ae6a commit 597fbab

1 file changed (+25, -35 lines changed)

documentation/concept/sql-optimizer-hints.md

Lines changed: 25 additions & 35 deletions

```diff
@@ -48,23 +48,23 @@ past from the row that matches by timestamp. For this, we need a more
 sophisticated algorithm.
 
 Our optimized algorithms assume the JOIN condition matches additional columns by
-equality. Basically, we have a join key that must match on both sides. An even
-narrower common case we optimize for is matching on a _symbol column_ on both
-sides, but many optimizations work for other key combinations as well.
+equality. Basically, there's a join key that must match on both sides. An even
+narrower common case we optimize more aggressively for is matching on a _symbol
+column_ on both sides.
 
 We distinguish these two cases:
 
 ### 1. Localized matching
 
 In this case, when scanning the right-hand table backward from the timestamp of
 the left-hand row, we find a match much sooner than reaching the timestamp of
-the previous left-hand row. We end up scanning only a small subset of right-hand
-rows. In the diagram, we show the scanned portions of the right-hand dataset in
-red.
+the previous left-hand row. We end up scanning only a small subset of the
+right-hand rows. In the diagram, we show the scanned portions of the right-hand
+dataset in red.
 
-The best way to perform this join is to first locate the right-hand row that
-matches by timestamp (marked with the dotted line), then scan backward to find
-the row satisfying additional join conditions.
+The best way to perform this join is the straightforward one: first locate the
+right-hand row that matches by timestamp (marked with the dotted line), then
+scan backward to find the row satisfying additional join conditions.
 
 <Screenshot
 alt="Diagram showing localized row matching"
```
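
The localized-matching strategy in this hunk (locate the right-hand row that matches by timestamp, then scan backward for the join key) can be modeled in a few lines of Python. This is a sketch over in-memory, timestamp-sorted tuples, not QuestDB's implementation; `last_at_or_before` and `asof_lookup_fast` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical row layout: (timestamp, join key, value), sorted by timestamp.
RightRow = Tuple[int, str, float]


def last_at_or_before(rhs: Sequence[RightRow], ts: int) -> int:
    """Binary search: index of the last row with timestamp <= ts, or -1 if none."""
    lo, hi = 0, len(rhs)
    while lo < hi:
        mid = (lo + hi) // 2
        if rhs[mid][0] <= ts:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1


def asof_lookup_fast(rhs: Sequence[RightRow], ts: int, key: str) -> Optional[RightRow]:
    """Find the timestamp match, then scan backward until the join key matches."""
    pos = last_at_or_before(rhs, ts)  # the "dotted line" in the diagram
    while pos >= 0:
        if rhs[pos][1] == key:
            return rhs[pos]
        pos -= 1
    return None  # no right-hand row with this key at or before ts
```

For example, with `md = [(1, "A", 10.0), (2, "B", 20.0), (3, "A", 11.0), (5, "B", 21.0), (6, "A", 12.0)]`, `asof_lookup_fast(md, 6, "B")` lands on the row at timestamp 6, steps back once, and returns `(5, "B", 21.0)`. The backward scan stays short only while matches sit close to the timestamp match, which is the localized case.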

```diff
@@ -75,13 +75,13 @@ width={300}
 ### 2. Distant matching
 
 In this case, the matching row is in the more distant past, earlier than the
-previous left-hand row. Now we must scan almost the entire right-hand dataset.
-If we do a separate scan for each left-hand row, we'll end up going over the
-same rows many times. In the diagram, this shows up as more intensely red
-regions in the right-hand table.
+previous left-hand row. The scanning ranges now overlap, and we end up scanning
+almost the entire right-hand dataset. If we do a separate scan for each
+left-hand row, we'll end up going over the same rows many times. In the diagram,
+this shows up as more intensely red regions in the right-hand table.
 
-The best way in this case is to just scan the entire red region once, collect
-the join keys in a hashtable, and match up with the left-hand rows as needed.
+The best way in this case is to scan the entire red region once, collect the
+join keys in a hashtable, and match up with the left-hand rows as needed.
 
 <Screenshot
 alt="Diagram showing distant row matching"
```
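
The cost of overlapping scan ranges can be made concrete by counting right-hand rows visited. The sketch below assumes both sides are sorted by timestamp and uses a made-up dataset where the only `B` row sits at the very start of the right-hand table; `backward_scan_visits` and `single_scan_visits` are hypothetical names, and neither function reflects QuestDB internals.

```python
from typing import Sequence, Tuple

Row = Tuple[int, str]  # hypothetical (timestamp, join key); both sides sorted by timestamp


def backward_scan_visits(lhs: Sequence[Row], rhs: Sequence[Row]) -> int:
    """Count right-hand rows touched when every left-hand row scans backward on its own."""
    visits = 0
    start = -1  # index of the last right-hand row with timestamp <= current left timestamp
    for ts, key in lhs:
        # Left-hand rows are sorted, so the starting point only moves forward.
        while start + 1 < len(rhs) and rhs[start + 1][0] <= ts:
            start += 1
        pos = start
        while pos >= 0:
            visits += 1
            if rhs[pos][1] == key:
                break
            pos -= 1
    return visits


def single_scan_visits(lhs: Sequence[Row], rhs: Sequence[Row]) -> int:
    """Rows touched by one pass over the right-hand side up to the latest left timestamp."""
    if not lhs:
        return 0
    last_ts = lhs[-1][0]
    return sum(1 for ts, _ in rhs if ts <= last_ts)


if __name__ == "__main__":
    rhs = [(0, "B")] + [(t, "A") for t in range(1, 101)]  # "B" only appears at ts 0
    lhs = [(t, "B") for t in range(10, 101, 10)]          # ten left-hand rows looking for "B"
    print(backward_scan_visits(lhs, rhs))  # 560
    print(single_scan_visits(lhs, rhs))    # 101
```

Ten left-hand rows cost 560 row visits with separate backward scans versus 101 for a single pass, and the gap widens with every additional left-hand row because each new scan re-covers the ranges of the earlier ones.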

```diff
@@ -107,14 +107,13 @@ be the best. It is the only one that allows QuestDB to use its parallelized
 filtering to quickly identify the filtered subset.
 
 The default algorithm is _Fast_, and you can enable others through query hints.
-For a quick orientation, here's the decision tree:
 
 ### List of hints
 
 ### `asof_dense(l r)`
 
 This hint enables the [Dense](#dense-algo) algorithm, the best choice (when it's
-available) in a variety of cases.
+available) for the case of distant row matching.
 
 ```questdb-sql title="Applying the query hint for the Dense algorithm"
 SELECT /*+ asof_dense(orders md) */
```

```diff
@@ -132,13 +131,14 @@ This hint applies to `LT` joins as well.
 :::
 
 This enables the [Light](#light-algo) algorithm, similar to Dense but simpler.
-It has the pitfall of searching through all the history in the RHS table, but is
-more generic and available in some queries where the Dense algo isn't.
+It is more generic and selected automatically in queries where the Dense algo
+isn't applicable. Its downside is that it must scan the entire history in
+the RHS table, up to the most recent LHS timestamp.
 
-Particularly, the light algo is at an advantage when the right-hand side is a
-subquery with a WHERE clause that is highly selective, passing through a small
-number of rows. QuestDB has parallelized filtering support, which cannot be used
-with the other algorithms.
+There's a case where the Light algo is at an advantage even when the Dense algo
+is also available: when the right-hand side is a subquery with a WHERE clause
+that is highly selective, passing through a small number of rows. QuestDB has
+parallelized filtering support, which cannot be used with the other algorithms.
 
 ```questdb-sql title="Applying the query hint for the Light algorithm"
 SELECT /*+ asof_linear(orders md) */
```
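
Here is a minimal Python sketch of the Light approach as this hunk describes it: one forward pass over the right-hand rows up to the latest left-hand timestamp, keeping the most recent row per join key in a dict. The optional `keep` predicate is a hypothetical stand-in for a selective WHERE clause on the right-hand subquery; in this sketch it runs sequentially, whereas the text says QuestDB can parallelize that filtering. `asof_join_light` is likewise a hypothetical name.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

LeftRow = Tuple[int, str]          # hypothetical (timestamp, key)
RightRow = Tuple[int, str, float]  # hypothetical (timestamp, key, value)


def asof_join_light(
    lhs: Sequence[LeftRow],
    rhs: Sequence[RightRow],
    keep: Callable[[RightRow], bool] = lambda row: True,
) -> List[Optional[RightRow]]:
    """Single forward pass with a per-key hashtable; both inputs sorted by timestamp."""
    # Stand-in for the WHERE clause; this sketch filters the whole input for simplicity.
    filtered = [row for row in rhs if keep(row)]
    latest: Dict[str, RightRow] = {}  # most recent right-hand row seen for each key
    out: List[Optional[RightRow]] = []
    i = 0
    for ts, key in lhs:
        # Consume every right-hand row up to this left-hand timestamp, never going back.
        while i < len(filtered) and filtered[i][0] <= ts:
            latest[filtered[i][1]] = filtered[i]
            i += 1
        out.append(latest.get(key))
    return out
```

Each right-hand row is consumed at most once, no matter how far back the matches lie. A selective `keep` shrinks the input before that pass, which is the situation where the text says Light can beat Dense.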

```diff
@@ -154,13 +154,8 @@ ASOF JOIN (
 
 This hint enables [Memoized](#memoized-algo), a variant of the
 [Fast](#fast-algo) algorithm. It works for queries that join on a symbol column,
-as in `left ASOF JOIN right ON (symbol)`. It uses additional RAM to remember
-where it last saw a symbol in the right-hand table.
-
-This hint will help you if many left-hand rows use a symbol that occurs rarely
-in the right-hand table, so that the same right-hand row matches several
-left-hand rows. It is especially helpful if some symbols occur way in the past,
-because it will search for each such symbol only once.
+as in `left ASOF JOIN right ON (symbol)`. It helps when there's a mix of
+localized and distant matches by reusing the results of earlier backward scans.
 
 ```questdb-sql title="Applying the query hint for the Memoized algorithm"
 SELECT /*+ asof_memoized(orders md) */
```

```diff
@@ -308,11 +303,6 @@ way, scanning backward to row 4. But when it encounters the same symbol A in row
 15, it scans backward only until reaching row 6, and then directly uses the
 remembered result of the previous scan, and matches up with row 4.
 
-With Drive-By caching enabled, Memoized algo will memorize not just the symbol
-it's looking for, but also any other symbol. However, it can only memorize it on
-the first encounter. This is valuable for rare symbols that occur deep in the
-past, but otherwise it just introduces more overhead.
-
 #### Dense algo
 
 The Dense algo starts like the Fast algo, performing a binary search to zero in
```
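
The Memoized walkthrough in this hunk (a later scan for symbol A stops at row 6 and reuses the earlier match at row 4) suggests a per-symbol memo holding where the previous scan started and what it found. The Python sketch below is one way to model that, assuming left-hand rows arrive in timestamp order. It is an illustration, not QuestDB's code; `asof_join_memoized` is a hypothetical name, and it returns matched right-hand indexes plus a count of rows visited.

```python
from typing import Dict, List, Optional, Sequence, Tuple

LeftRow = Tuple[int, str]   # hypothetical (timestamp, symbol)
RightRow = Tuple[int, str]  # hypothetical (timestamp, symbol)


def asof_join_memoized(
    lhs: Sequence[LeftRow], rhs: Sequence[RightRow]
) -> Tuple[List[Optional[int]], int]:
    """Return (matched right-hand index per left row, total right-hand rows visited)."""
    # memo[symbol] = (start index of the previous scan for symbol, index it matched).
    # Rows at or below that start index were already examined by that scan.
    memo: Dict[str, Tuple[int, Optional[int]]] = {}
    out: List[Optional[int]] = []
    visits = 0
    start = -1  # last right-hand index with timestamp <= current left timestamp
    for ts, sym in lhs:
        while start + 1 < len(rhs) and rhs[start + 1][0] <= ts:
            start += 1
        prev_start, prev_match = memo.get(sym, (-1, None))
        pos, match = start, None
        while pos > prev_start:
            visits += 1
            if rhs[pos][1] == sym:
                match = pos
                break
            pos -= 1
        else:
            # Loop ended without a hit: we reached already-scanned rows, reuse that result.
            match = prev_match
        memo[sym] = (start, match)
        out.append(match)
    return out, visits
```

For instance, with `rhs = [(0, "A"), (1, "B"), (2, "B"), (3, "B"), (4, "B"), (5, "B")]` and `lhs = [(3, "A"), (5, "A")]`, the second scan visits only rows 5 and 4 before reusing the earlier match, so the call returns `([0, 0], 6)` rather than the 10 visits two independent backward scans would need.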
