Optimize join ordering in N-table joins where N > 2 #41

arthurhsu · 2015-05-12T00:08:12Z

Join ordering is a pretty heavy concept in query optimization. The available approaches can get very involved and may be outside the scope of Lovefield.

As an example of the different levels at which the issue can be addressed, consider the case of a 3-table vs. a 4-table join.
A 3-table join graph always has the same chain structure, A -> B -> C, where each edge in the graph represents a join condition between (leftTable, rightTable). In the 4-table case, on the other hand, the graph can have different structures (chain, cycle, star, clique, etc.), so optimizing such graphs is a harder problem.
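To make the graph shapes concrete, here is a minimal sketch of classifying a join graph by its structure. The function name `classify_join_graph` and the input format (a list of table names plus a list of `(leftTable, rightTable)` pairs) are assumptions for illustration, not anything Lovefield provides. It also assumes no self-joins.

```python
def classify_join_graph(tables, edges):
    """Classify an undirected join graph as chain, star, cycle, clique or other.

    tables: list of table names.
    edges: list of (left_table, right_table) join conditions.
    Shapes overlap for small inputs (a 3-table chain is also a star, a
    3-table cycle is also a clique); the first matching label wins.
    """
    n = len(tables)
    neighbors = {t: set() for t in tables}
    for left, right in edges:
        neighbors[left].add(right)
        neighbors[right].add(left)
    degrees = [len(neighbors[t]) for t in tables]
    edge_count = sum(degrees) // 2  # each edge is counted from both ends

    # Depth-first search to check that every table is reachable.
    seen, stack = {tables[0]}, [tables[0]]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    if len(seen) != n:
        return "disconnected"

    if edge_count == n - 1 and max(degrees) <= 2:
        return "chain"
    if edge_count == n - 1 and max(degrees) == n - 1:
        return "star"
    if all(d == 2 for d in degrees):
        return "cycle"
    if edge_count == n * (n - 1) // 2:
        return "clique"
    return "other"
```

For example, `classify_join_graph(["A", "B", "C"], [("A", "B"), ("B", "C")])` returns `"chain"`, while four tables that all join against a single hub table are reported as `"star"`.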

Need to find the extent to which Lovefield should address the join ordering optimization issue. There are simplifications that can be made, for example considering only left-deep trees.

At the very least, for the 3-table join case, there should be no unnecessary cross-product operations. Our current approach does not guarantee this (it depends on the order of the tables as supplied in the query).
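One way to picture the "no unnecessary cross-product" requirement is a greedy left-deep ordering that always picks, as the next table, one that shares a join condition with a table already placed. This is only a sketch under assumptions: `order_without_cross_products` and `count_cross_products` are hypothetical helpers, not Lovefield code, and the join graph is given as plain `(leftTable, rightTable)` pairs.

```python
def _neighbors(tables, edges):
    """Build an adjacency map from the list of join conditions."""
    adj = {t: set() for t in tables}
    for left, right in edges:
        adj[left].add(right)
        adj[right].add(left)
    return adj


def count_cross_products(order, edges):
    """Count left-deep steps where the new table joins nothing placed so far."""
    adj = _neighbors(order, edges)
    return sum(
        1 for i in range(1, len(order)) if not adj[order[i]] & set(order[:i])
    )


def order_without_cross_products(tables, edges):
    """Greedily reorder tables so each step has a join condition if possible."""
    adj = _neighbors(tables, edges)
    ordered = [tables[0]]
    remaining = list(tables[1:])
    while remaining:
        placed = set(ordered)
        # Prefer a table connected to the placed set; fall back to the next
        # supplied table when the graph is disconnected (cross product is
        # unavoidable in that case).
        nxt = next((t for t in remaining if adj[t] & placed), remaining[0])
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered
```

With join conditions A-B and B-C, the supplied order `["A", "C", "B"]` needs one cross product (A x C), while the reordered `["A", "B", "C"]` needs none. The greedy pick ignores table sizes entirely; it only addresses the cross-product guarantee.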

agershun · 2015-05-12T18:54:56Z

I think you can start with a cost function that depends on the estimated number of records in each table and the sequence of these tables (and maybe other parameters). One approach: simply model it with multiple sample runs (with different orders of the joined tables), and then process the results with a neural network or any other machine learning method to estimate the coefficients of this function.
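As a rough sketch of what such a cost function could look like before any coefficients are fitted: the cost of a left-deep order is taken here as the sum of the estimated intermediate result sizes. The names `plan_cost` and `cheapest_order`, and the per-join selectivity map, are assumptions for illustration; this swaps in a textbook-style estimate rather than the learned model described above.

```python
from itertools import permutations


def plan_cost(order, row_counts, selectivity):
    """Sum of estimated intermediate result sizes for a left-deep join order.

    row_counts: {table: estimated number of records}.
    selectivity: {frozenset((t1, t2)): fraction of row pairs that match};
    a missing pair means no join condition, i.e. a cross product (1.0).
    """
    joined = [order[0]]
    size = row_counts[order[0]]
    cost = 0.0
    for table in order[1:]:
        size *= row_counts[table]
        for prev in joined:
            size *= selectivity.get(frozenset((prev, table)), 1.0)
        joined.append(table)
        cost += size
    return cost


def cheapest_order(row_counts, selectivity):
    """Exhaustively try every order; only feasible for a handful of tables."""
    return min(
        permutations(row_counts),
        key=lambda order: plan_cost(order, row_counts, selectivity),
    )
```

With made-up estimates such as `{"one": 1000, "two": 100, "three": 10}` and selectivities of 0.01 for one-two and 0.1 for two-three, orders that start with the two small tables come out cheapest, and any order that puts `one` next to `three` is penalized for the cross product. Exhaustive enumeration grows factorially, which is one reason restrictions such as left-deep-only trees are attractive.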

I suppose that this function will be different for Lovefield and other databases, because the cost of joining differs between engines. If you would like, we can compare database engine coefficients on different sets of data.

The problem is interesting. I will try to model this task and send you the results.

agershun · 2015-05-12T19:04:57Z

BTW You are talking about INNER JOINs only, right? Because other JOINs are non-commutative, AFAIK.
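A tiny illustration of the point, using LEFT OUTER JOIN as the non-commutative case. This is a sketch with plain Python sets standing in for tables; it assumes unique keys, and `inner_join`, `left_outer_join` and `flip` are made-up helpers.

```python
def inner_join(left, right):
    """Pairs (l, r) of equal keys taken from both inputs."""
    return {(l, r) for l in left for r in right if l == r}


def left_outer_join(left, right):
    """Like inner_join, but unmatched left keys are kept and padded with None."""
    rows = inner_join(left, right)
    matched = {l for l, _ in rows}
    return rows | {(l, None) for l in left if l not in matched}


def flip(rows):
    """Swap the two columns so results of swapped inputs can be compared."""
    return {(r, l) for l, r in rows}


a, b = [1, 2], [2, 3]
# Swapping the inputs of an inner join gives the same rows (columns aside).
inner_commutes = inner_join(a, b) == flip(inner_join(b, a))
# Swapping the inputs of a left outer join changes which side is preserved.
outer_commutes = left_outer_join(a, b) == flip(left_outer_join(b, a))
```

Here `inner_commutes` is `True` and `outer_commutes` is `False`: `a LEFT JOIN b` keeps the unmatched key 1, while `b LEFT JOIN a` keeps the unmatched key 3.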

agershun · 2015-05-12T20:20:32Z

I tried to model the situation with four joined tables and compared the results in direct and reversed order.
Please see this test file:

    SELECT *
      FROM one
      INNER JOIN two ON one.b = two.b
      INNER JOIN three ON two.c = three.c
      INNER JOIN four ON three.d = four.d;

    SELECT *
      FROM four
      INNER JOIN three ON three.d = four.d
      INNER JOIN two ON two.c = three.c
      INNER JOIN one ON one.b = two.b;

The hypothesis was: direct order is faster if the number of records in the first two tables is greater than in the next two tables. The hypothesis held in only about 60% of the test runs, so it is not worth a special optimization (at least for this particular kind of join).

Of course, many factors affect the result, and pre-indexing is the first one.
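For anyone who wants to repeat this kind of comparison, a harness could be shaped roughly like the sketch below. Everything here is an assumption: the original test file is not reproduced, the table schemas are guessed from the join conditions, `COUNT(*)` replaces `SELECT *` so both orders return comparable values, and SQLite is used only because it ships with Python. SQLite's planner may reorder joins on its own, so timings from this sketch say nothing about Lovefield or the engine used above.

```python
import sqlite3
import time

DIRECT = """SELECT COUNT(*) FROM one
  INNER JOIN two ON one.b = two.b
  INNER JOIN three ON two.c = three.c
  INNER JOIN four ON three.d = four.d"""

REVERSED = """SELECT COUNT(*) FROM four
  INNER JOIN three ON three.d = four.d
  INNER JOIN two ON two.c = three.c
  INNER JOIN one ON one.b = two.b"""


def build_db(sizes):
    """Create four in-memory tables with the given row counts (no indexes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE one (b INTEGER)")
    conn.execute("CREATE TABLE two (b INTEGER, c INTEGER)")
    conn.execute("CREATE TABLE three (c INTEGER, d INTEGER)")
    conn.execute("CREATE TABLE four (d INTEGER)")
    conn.executemany("INSERT INTO one VALUES (?)", [(i,) for i in range(sizes[0])])
    conn.executemany("INSERT INTO two VALUES (?, ?)", [(i, i) for i in range(sizes[1])])
    conn.executemany("INSERT INTO three VALUES (?, ?)", [(i, i) for i in range(sizes[2])])
    conn.executemany("INSERT INTO four VALUES (?)", [(i,) for i in range(sizes[3])])
    return conn


def run_timed(conn, sql):
    """Return (row count reported by the query, elapsed seconds)."""
    start = time.perf_counter()
    (count,) = conn.execute(sql).fetchone()
    return count, time.perf_counter() - start


conn = build_db((50, 40, 30, 20))
direct_count, direct_seconds = run_timed(conn, DIRECT)
reversed_count, reversed_seconds = run_timed(conn, REVERSED)
```

Both orders must report the same count (the join is the same, only the order differs); the two elapsed times are what a real experiment would collect over many table-size combinations.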

arthurhsu added the performance label May 12, 2015

freshp86 mentioned this issue Oct 6, 2015

Optimize join ordering in N-table joins where N > 2. #4

Closed
