@@ -48,23 +48,23 @@ past from the row that matches by timestamp. For this, we need a more
4848sophisticated algorithm.
4949
5050Our optimized algorithms assume the JOIN condition matches additional columns by
51- equality. Basically, we have a join key that must match on both sides. An even
52- narrower common case we optimize for is matching on a _ symbol column _ on both
53- sides, but many optimizations work for other key combinations as well .
51+ equality. Basically, there's a join key that must match on both sides. An even
52+ narrower common case we optimize more aggressively for is matching on a _ symbol
53+ column _ on both sides .
5454
5555We distinguish these two cases:
5656
5757### 1. Localized matching
5858
5959In this case, when scanning the right-hand table backward from the timestamp of
6060the left-hand row, we find a match much sooner than reaching the timestamp of
61- the previous left-hand row. We end up scanning only a small subset of right-hand
62- rows. In the diagram, we show the scanned portions of the right-hand dataset in
63- red.
61+ the previous left-hand row. We end up scanning only a small subset of the
62+ right-hand rows. In the diagram, we show the scanned portions of the right-hand
63+ dataset in red.
6464
65- The best way to perform this join is to first locate the right-hand row that
66- matches by timestamp (marked with the dotted line), then scan backward to find
67- the row satisfying additional join conditions.
65+ The best way to perform this join is the straightforward one: first locate the
66+ right-hand row that matches by timestamp (marked with the dotted line), then
67+ scan backward to find the row satisfying additional join conditions.
6868
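As a rough illustration of the localized case, here is a minimal Python sketch of "locate by timestamp, then scan backward". It is not QuestDB's implementation. The function name `asof_scan_backward` and the column lists `right_ts` and `right_keys` are hypothetical, and we assume the right-hand table is sorted by timestamp and that a right-hand row qualifies when its timestamp is less than or equal to the left-hand one.

```python
from bisect import bisect_right


def asof_scan_backward(right_ts, right_keys, left_ts, left_key):
    """Return the index of the matching right-hand row, or None.

    right_ts   -- right-hand timestamps, sorted ascending (assumed)
    right_keys -- join key of each right-hand row, same length as right_ts
    """
    # Binary search: last right-hand row with timestamp <= left_ts.
    i = bisect_right(right_ts, left_ts) - 1
    # Scan backward until a row with the same join key shows up.
    while i >= 0:
        if right_keys[i] == left_key:
            return i
        i -= 1
    return None
```

When matches are localized, the `while` loop exits after a handful of steps, so the cost per left-hand row is dominated by the binary search.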
6969<Screenshot
7070alt="Diagram showing localized row matching"
@@ -75,13 +75,13 @@ width={300}
7575### 2. Distant matching
7676
7777In this case, the matching row is in the more distant past, earlier than the
78- previous left-hand row. Now we must scan almost the entire right-hand dataset.
79- If we do a separate scan for each left-hand row, we'll end up going over the
80- same rows many times. In the diagram, this shows up as more intensely red
81- regions in the right-hand table.
78+ previous left-hand row. The scanning ranges now overlap, and we end up scanning
79+ almost the entire right-hand dataset. If we do a separate scan for each
80+ left-hand row, we'll end up going over the same rows many times. In the diagram,
81+ this shows up as more intensely red regions in the right-hand table.
8282
83- The best way in this case is to just scan the entire red region once, collect
84- the join keys in a hashtable, and match up with the left-hand rows as needed.
83+ The best way in this case is to scan the entire red region once, collect the
84+ join keys in a hashtable, and match up with the left-hand rows as needed.
8585
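One simple way to get the "scan once, remember keys in a hashtable" behavior is a single forward pass over both tables. The Python sketch below shows the idea under stated assumptions; it is not the code of any QuestDB algorithm. The name `asof_join_single_pass` and the `(timestamp, key)` row layout are hypothetical, and both inputs are assumed to be sorted by timestamp.

```python
def asof_join_single_pass(left_rows, right_rows):
    """For each left-hand row, return the index of its matching right-hand
    row (or None). Rows are (timestamp, key) tuples sorted by timestamp."""
    latest_by_key = {}  # join key -> index of the latest right-hand row seen
    r = 0               # cursor into right_rows; it only moves forward
    matches = []
    for left_ts, left_key in left_rows:
        # Consume right-hand rows up to the current left-hand timestamp.
        while r < len(right_rows) and right_rows[r][0] <= left_ts:
            latest_by_key[right_rows[r][1]] = r
            r += 1
        # The hashtable now holds the most recent row for every key.
        matches.append(latest_by_key.get(left_key))
    return matches
```

Each right-hand row is visited exactly once, regardless of how far in the past the matches are. The price is that the pass touches every right-hand row up to the last left-hand timestamp, even the ones no left-hand row needs.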
8686<Screenshot
8787alt="Diagram showing distant row matching"
@@ -107,14 +107,13 @@ be the best. It is the only one that allows QuestDB to use its parallelized
107107filtering to quickly identify the filtered subset.
108108
109109The default algorithm is _ Fast_ , and you can enable others through query hints.
110- For a quick orientation, here's the decision tree:
111110
112111### List of hints
113112
114113### ` asof_dense(l r) `
115114
116115This hint enables the [ Dense] ( #dense-algo ) algorithm, the best choice (when it's
117- available) in a variety of cases .
116+ available) for the case of distant row matching .
118117
119118``` questdb-sql title="Applying the query hint for the Dense algorithm"
120119SELECT /*+ asof_dense(orders md) */
@@ -132,13 +131,14 @@ This hint applies to `LT` joins as well.
132131:::
133132
134133This enables the [ Light] ( #light-algo ) algorithm, similar to Dense but simpler.
135- It has the pitfall of searching through all the history in the RHS table, but is
136- more generic and available in some queries where the Dense algo isn't.
134+ It is more generic and selected automatically in queries where the Dense algo
135+ isn't applicable. Its downside is that it must scan the entire history in
136+ the RHS table, up to the most recent LHS timestamp.
137137
138- Particularly, the light algo is at an advantage when the right-hand side is a
139- subquery with a WHERE clause that is highly selective, passing through a small
140- number of rows. QuestDB has parallelized filtering support, which cannot be used
141- with the other algorithms.
138+ There's a case where the Light algo is at an advantage even when the Dense algo
139+ is also available: when the right-hand side is a subquery with a WHERE clause
140+ that is highly selective, passing through a small number of rows. QuestDB has
141+ parallelized filtering support, which cannot be used with the other algorithms.
142142
143143``` questdb-sql title="Applying the query hint for the Light algorithm"
144144SELECT /*+ asof_linear(orders md) */
@@ -154,13 +154,8 @@ ASOF JOIN (
154154
155155This hint enables [ Memoized] ( #memoized-algo ) , a variant of the
156156[ Fast] ( #fast-algo ) algorithm. It works for queries that join on a symbol column,
157- as in ` left ASOF JOIN right ON (symbol) ` . It uses additional RAM to remember
158- where it last saw a symbol in the right-hand table.
159-
160- This hint will help you if many left-hand rows use a symbol that occurs rarely
161- in the right-hand table, so that the same right-hand row matches several
162- left-hand rows. It is especially helpful if some symbols occur way in the past,
163- because it will search for each such symbol only once.
157+ as in ` left ASOF JOIN right ON (symbol) ` . It helps when there's a mix of
158+ localized and distant matches by reusing the results of earlier backward scans.
164159
165160``` questdb-sql title="Applying the query hint for the Memoized algorithm"
166161SELECT /*+ asof_memoized(orders md) */
@@ -308,11 +303,6 @@ way, scanning backward to row 4. But when it encounters the same symbol A in row
30830315, it scans backward only until reaching row 6, and then directly uses the
309304remembered result of the previous scan, and matches up with row 4.
310305
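The reuse of an earlier backward scan can be sketched in Python as follows. This is a simplified illustration, not QuestDB's Memoized implementation. The name `asof_join_memoized` and the data layout are hypothetical, and we assume both tables are sorted by timestamp, so each new scan for a symbol starts at or after the previous one.

```python
from bisect import bisect_right


def asof_join_memoized(left_rows, right_ts, right_keys):
    """left_rows: (timestamp, key) tuples sorted by timestamp.
    Returns the matching right-hand row index (or None) per left-hand row."""
    memo = {}  # key -> (start index of the previous scan, its result)
    matches = []
    for left_ts, left_key in left_rows:
        start = bisect_right(right_ts, left_ts) - 1
        prev_start, prev_match = memo.get(left_key, (-1, None))
        match = prev_match  # fall back to the remembered result
        i = start
        # Scan backward only over rows the previous scan did not cover.
        while i > prev_start:
            if right_keys[i] == left_key:
                match = i
                break
            i -= 1
        memo[left_key] = (start, match)
        matches.append(match)
    return matches
```

The extra memory is one small entry per distinct symbol. In return, a symbol whose latest right-hand row lies far in the past is searched for all the way back only once.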
311- With Drive-By caching enabled, Memoized algo will memorize not just the symbol
312- it's looking for, but also any other symbol. However, it can only memorize it on
313- the first encounter. This is valuable for rare symbols that occur deep in the
314- past, but otherwise it just introduces more overhead.
315-
316306#### Dense algo
317307
318308The Dense algo starts like the Fast algo, performing a binary search to zero in