This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

Analyzer rules should be split in groups, phases or priorities #141

Closed
erizocosmico opened this issue Mar 28, 2018 · 2 comments


erizocosmico commented Mar 28, 2018

Right now, all analyzer rules have the same priority and are executed in the order they were added to the analyzer. This makes it difficult to insert foreign rules (as gitquery will do) at a specific point so that they run before certain other rules.

For example, let's imagine a rule to squash inner joins. It needs the tables resolved, so it must run after "resolve_tables", but before the pushdown rule, because after pushdown the tree won't be transformed again.

Instead of knowing the order by heart and relying on a number that may change to insert the rule in a specific place, we should implement a few phases.

For example:

type AnalyzerPhase int

const (
  // ResolutionPhase is the phase in which all unresolved nodes are resolved.
  ResolutionPhase AnalyzerPhase = iota
  // PostResolutionPhase is the phase in which all nodes are already resolved and they can 
  // be changed and rearranged with the certainty everything is resolved and before the
  // tree is optimized.
  PostResolutionPhase
  // OptimizationPhase is the phase in which optimizations to improve query performance
  // are applied.
  OptimizationPhase
)

Then we could insert rules into any of these phases, knowing what state the tree is in.
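To make the idea concrete, here is a minimal sketch of how an analyzer could keep one ordered rule list per phase. The `Analyzer`, `AddRule` and `RulesInOrder` names are hypothetical, not the actual go-mysql-server API; the phase constants are the ones proposed above.

```go
package main

import "fmt"

type AnalyzerPhase int

const (
	ResolutionPhase AnalyzerPhase = iota
	PostResolutionPhase
	OptimizationPhase
)

// Rule is a simplified stand-in for an analyzer rule; the real Apply
// function over the plan tree is omitted for brevity.
type Rule struct {
	Name string
}

// Analyzer keeps one ordered rule list per phase (hypothetical API).
type Analyzer struct {
	rules map[AnalyzerPhase][]Rule
}

func NewAnalyzer() *Analyzer {
	return &Analyzer{rules: make(map[AnalyzerPhase][]Rule)}
}

// AddRule appends a rule to the given phase, so callers only choose a
// phase instead of an absolute position in one global list.
func (a *Analyzer) AddRule(phase AnalyzerPhase, r Rule) {
	a.rules[phase] = append(a.rules[phase], r)
}

// RulesInOrder returns all rules, phase by phase, in execution order.
func (a *Analyzer) RulesInOrder() []Rule {
	var out []Rule
	for _, p := range []AnalyzerPhase{ResolutionPhase, PostResolutionPhase, OptimizationPhase} {
		out = append(out, a.rules[p]...)
	}
	return out
}

func main() {
	a := NewAnalyzer()
	// Registration order no longer matters across phases:
	a.AddRule(OptimizationPhase, Rule{Name: "pushdown"})
	a.AddRule(ResolutionPhase, Rule{Name: "resolve_tables"})
	// The squash rule from the example: after resolution, before pushdown.
	a.AddRule(PostResolutionPhase, Rule{Name: "squash_joins"})

	for _, r := range a.RulesInOrder() {
		fmt.Println(r.Name)
	}
}
```

With this, the squash rule is guaranteed to run after "resolve_tables" and before "pushdown" regardless of when it was registered.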

Thoughts? @smola @jfontan @ajnavarro

erizocosmico added the "proposal" label (proposal for new additions or changes) Mar 28, 2018

smola commented Jun 8, 2018

I agree. We should separate PreAnalysis, Analysis, PostAnalysis, Optimization, PostOptimization, Planning and PostPlanning phases (or some subset of them). That would better match the theory and practice of how SQL engines are implemented.

The awkward thing about the squash rule is that it is physical planning (so it would supposedly run after optimization), yet it interacts with logical optimization. Fortunately, since we don't really have decoupled logical and physical plans, it could just run in PostAnalysis.
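This longer phase list could be encoded the same way as the earlier snippet. The names below are taken from this comment; the type, ordering and `String` method are an assumed sketch, not an agreed-upon API.

```go
package main

import "fmt"

// AnalyzerPhase enumerates the phases proposed in this comment; the
// iota ordering doubles as the execution order (an assumption).
type AnalyzerPhase int

const (
	PreAnalysis AnalyzerPhase = iota
	Analysis
	PostAnalysis
	Optimization
	PostOptimization
	Planning
	PostPlanning
)

// String makes phases readable in logs and errors.
func (p AnalyzerPhase) String() string {
	return [...]string{
		"PreAnalysis", "Analysis", "PostAnalysis",
		"Optimization", "PostOptimization", "Planning", "PostPlanning",
	}[p]
}

func main() {
	// The squash rule would be registered in PostAnalysis, as suggested
	// above, which places it before Optimization in the ordering.
	fmt.Println(PostAnalysis, PostAnalysis < Optimization)
}
```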

@ajnavarro

Maybe we can do it like Spark: have a Batch type that executes a group of rules up to n times. This would give us more options for creating different phases, even for implementors of the library.
