This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

Analyzer rules should be split in groups, phases or priorities #141

Closed
erizocosmico opened this issue Mar 28, 2018 · 2 comments


erizocosmico commented Mar 28, 2018

Right now, all analyzer rules have the same priority and are executed in the order they were added to the analyzer. This makes it difficult to insert foreign rules (as gitquery will do) at a specific point so that they run before certain other rules.

For example, let's imagine a rule to squash inner joins. It needs the tables resolved, so it must run after "resolve_tables", but before the pushdown rule, because after pushdown the tree won't be transformed again.

Instead of knowing the order by heart and relying on a number that may change to insert the rule in a specific place, we should implement a few phases.

For example:

type AnalyzerPhase int

const (
  // ResolutionPhase is the phase in which all unresolved nodes are resolved.
  ResolutionPhase AnalyzerPhase = iota
  // PostResolutionPhase is the phase in which all nodes are already resolved and they can 
  // be changed and rearranged with the certainty everything is resolved and before the
  // tree is optimized.
  PostResolutionPhase
  // OptimizationPhase is the phase in which optimizations to improve query performance
  // are applied.
  OptimizationPhase
)

Then we could insert rules into any of these phases, knowing what state the tree is in.
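To make the idea concrete, here is a minimal sketch of how an analyzer could keep one ordered rule list per phase. The `Analyzer`, `AddRule` and `RulesInOrder` names are hypothetical, not the actual go-mysql-server API; the phase constants are the ones proposed above.

```go
package main

import "fmt"

type AnalyzerPhase int

const (
	ResolutionPhase AnalyzerPhase = iota
	PostResolutionPhase
	OptimizationPhase
)

// Rule is a simplified stand-in for an analyzer rule; the real Apply
// function over the plan tree is omitted for brevity.
type Rule struct {
	Name string
}

// Analyzer keeps one ordered rule list per phase (hypothetical API).
type Analyzer struct {
	rules map[AnalyzerPhase][]Rule
}

func NewAnalyzer() *Analyzer {
	return &Analyzer{rules: make(map[AnalyzerPhase][]Rule)}
}

// AddRule appends a rule to the given phase, so callers only choose a
// phase instead of an absolute position in one global list.
func (a *Analyzer) AddRule(phase AnalyzerPhase, r Rule) {
	a.rules[phase] = append(a.rules[phase], r)
}

// RulesInOrder returns all rules, phase by phase, in execution order.
func (a *Analyzer) RulesInOrder() []Rule {
	var out []Rule
	for _, p := range []AnalyzerPhase{ResolutionPhase, PostResolutionPhase, OptimizationPhase} {
		out = append(out, a.rules[p]...)
	}
	return out
}

func main() {
	a := NewAnalyzer()
	// Registration order no longer matters across phases:
	a.AddRule(OptimizationPhase, Rule{Name: "pushdown"})
	a.AddRule(ResolutionPhase, Rule{Name: "resolve_tables"})
	// The squash rule from the example: after resolution, before pushdown.
	a.AddRule(PostResolutionPhase, Rule{Name: "squash_joins"})

	for _, r := range a.RulesInOrder() {
		fmt.Println(r.Name)
	}
}
```

With this, the squash rule is guaranteed to run after "resolve_tables" and before "pushdown" regardless of when it was registered.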

Thoughts? @smola @jfontan @ajnavarro

erizocosmico added the "proposal" label (proposal for new additions or changes) Mar 28, 2018

smola commented Jun 8, 2018

I agree. We should separate PreAnalysis, Analysis, PostAnalysis, Optimization, PostOptimization, Planning and PostPlanning phases (or some subset of them). That would better match the theory and practice of how SQL engines are implemented.

The awkward thing about the squash rule is that it is physical planning (so it would supposedly run after optimization), yet it interacts with logical optimization. Fortunately, since we don't really have decoupled logical and physical plans, it could just run in PostAnalysis.
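This longer phase list could be encoded the same way as the earlier snippet. The names below are taken from this comment; the type, ordering and `String` method are an assumed sketch, not an agreed-upon API.

```go
package main

import "fmt"

// AnalyzerPhase enumerates the phases proposed in this comment; the
// iota ordering doubles as the execution order (an assumption).
type AnalyzerPhase int

const (
	PreAnalysis AnalyzerPhase = iota
	Analysis
	PostAnalysis
	Optimization
	PostOptimization
	Planning
	PostPlanning
)

// String makes phases readable in logs and errors.
func (p AnalyzerPhase) String() string {
	return [...]string{
		"PreAnalysis", "Analysis", "PostAnalysis",
		"Optimization", "PostOptimization", "Planning", "PostPlanning",
	}[p]
}

func main() {
	// The squash rule would be registered in PostAnalysis, as suggested
	// above, which places it before Optimization in the ordering.
	fmt.Println(PostAnalysis, PostAnalysis < Optimization)
}
```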

@ajnavarro

Maybe we can do it like Spark: have a Batch type that executes a group of rules up to n times. This would give us more options for creating different phases, even for implementors of the library.
