Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize "where 1=1" #2676

Open
numeralnathan opened this issue Jun 1, 2020 · 1 comment
Open

Optimize "where 1=1" #2676

numeralnathan opened this issue Jun 1, 2020 · 1 comment

Comments

@numeralnathan
Copy link

numeralnathan commented Jun 1, 2020

The following query runs very fast (2 milliseconds) because there is an index on START_DATE:

select
   min(TIME_CREATED)
from
   EMPLOYEE

The following query runs very slow (72 seconds) even though there is an index on START_DATE. The table has 75 million rows. I assume H2 is going through all of the rows. Please optimize this query to use the index and run in 2 milliseconds.

select
   min(START_DATE)
from
   EMPLOYEE
where
   1=1

The following query runs very slow as well even though there is an index on START_DATE. Please optimize this query to use the index and run in 2 milliseconds. H2 should be able to use the index and start working through rows that have the minimum START_DATE. H2 should then be able to execute the predicate and find a row very quickly (i.e. the first row examined in this case).

select
   min(START_DATE)
from
   EMPLOYEE
where
   END_DATE < date '2020-01-01'
@katzyn
Copy link
Contributor

katzyn commented Jun 2, 2020

  1. WHERE 1 = 1 and other TRUE conditions are removed from queries by current H2. You need to build H2 from its current sources if you want to have this optimization right now.

  2. SELECT MIN(COLUMN1) … WHERE COLUMN2 < something can't be optimized that easy. There is no guarantee that scanning over COLUMN1 will be faster, actually it can be slower. Optimizer should take a decision based on some properties of these conditions. It requires a lot of new code in very complex area of H2. If START_DATE can't be NULL you can use something like this instead:

SELECT START_DATE FROM EMPLOYEE
    USE INDEX ("name of ascending index on START_DATE column")
    WHERE END_DATE < DATE '2020-01-01'
    ORDER BY START_DATE FETCH FIRST ROW ONLY;
  1. Hypothetically we can try to automatically optimize only queries like that
SELECT MIN(START_DATE) FILTER (WHERE END_DATE < DATE '2020-01-01')
    FROM EMPLOYEE;

because in such query filter condition is expected to be less significant than other conditions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants