Index Intersection #1751

Open
Sysvision opened this Issue Oct 18, 2013 · 2 comments

Comments

Projects
None yet
4 participants

The basic idea behind index intersection/merge is that for certain types of queries which contain WHERE clauses with columns that have single-column indexes on them, the DBMS could make use of the multiple indexes. It's part of the query optimizer.

For example the following statement "SELECT val FROM table WHERE indexed_col1 = X AND/OR indexed_col2 = Y" could make use of both indexes and do a set-theoretic union (OR) or set-theoretic intersection (AND) on the results. This can significantly speed up some queries.

But this can also have some downsides. Let's say the table has 1m rows and we use the query above with "AND" and let's assume the index for "indexed_col1 = X" returns 800k rows but the index for "indexed_col2 = Y" returns only 1k rows. The index merge algorithm would need to intersect 800k rows with the other 1k rows, so basically drop 799k rows from the resultset, where it maybe would have been a lot faster using only indexed_col2 and do a "full scan" on the remaining 1k rows for "indexed_col1 = X".

Of course one would have created a composite index on (indexed_col1, indexed_col2) in this case in the first place. But let's assume we have a more complex query "WHERE (col1 = val1 OR col2 = val2) AND (col3 = val3 OR col4 = val4)" where all 4 columns are indexed and "(col1 = val1 OR col2 = val2)" is the one that returns 800k rows and "(col3 = val3 OR col4 = val4)" is the one that returns 1k rows. In this case it would be the best option to use two indexes (col3 and col4) and do a "full scan" on the remaining 1k rows for "(col1 = val1 OR col2 = val2)"

So the optimizer would need to have some advanced logic to not use some of the available indexes. I do not know if that's even possible. In MySQL there is something called index hints. For example adding "USE INDEX(index_name_for_col3, index_name_for_col4)" would tell the optimzer to only use the indexes for "col3" and "col4"

Owner

lvca commented Oct 19, 2013

@enisher we already partial support such optimization. What is missed so far ?

enisher was assigned Oct 19, 2013

@lvca lvca assigned luigidellaquila and unassigned enisher Aug 28, 2014

@lvca lvca modified the milestone: 2.0rc1, 2.1 Aug 28, 2014

Contributor

luigidellaquila commented Mar 12, 2015

we should add a concept like "query weight on index".
Eg. you have a query that could involve two indexes, the executor doesn't fetch all the results from both, but just asks every index for an estimate of how many records will be returned given a WHERE condition. With this information, the engine can choose if it's convenient using only one (and which one) or both.

@luigidellaquila luigidellaquila modified the milestone: 2.2, 2.1 Mar 12, 2015

@lvca lvca modified the milestone: 3.0, 2.2 Jul 10, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment