Feature: Operator Printing #3664

Closed
andyfengHKU opened this issue Jun 18, 2024 · 2 comments · Fixed by #3973
Labels
feature New features or missing components of existing features

@andyfengHKU
Contributor

andyfengHKU commented Jun 18, 2024

Description

As we approach the benchmark stage, plan printing is being used much more frequently. We need to make the following changes to plan printing.

Stage 1: Print static information

The first step is to print each operator's static information correctly and thoroughly, e.g. which table is being scanned and which columns are being scanned ...

The complete list is as follows:

  • SimpleAggregate
    • aggregate expressions
  • SimpleAggregateScan
    • N/A
  • HashAggregate
    • hash keys, hash payloads, aggregate expressions
  • HashAggregateScan
    • N/A
  • Alter
    • alter information, e.g. rename, add column, ...
  • Attach
    • database name
  • CopyFrom
    • table name and source, e.g. file name, subquery, ...
  • CopyTo
    • table name to file name
  • CreateMacro
    • macro name
  • CreateSequence
    • sequence name
  • CreateTable
    • table name and possible config
  • CreateType
    • type name
  • CrossProduct
    • N/A
  • Delete
    • expressions and possible config
  • Detach
    • database name
  • Distinct
    • expressions
  • DropSequence
    • sequence name
  • DropTable
    • table name
  • DummyScan
    • N/A
  • EmptyResult
    • N/A
  • Explain
    • Profile/Explain
  • ExpressionsScan
    • expressions
  • Extension
    • action and extension name
  • ExportDatabase
    • N/A
  • Flatten
    • ideally we should print one expression to indicate which data chunk we are flattening
  • Filter
    • expression name
  • GDSCall
    • algorithm name
  • HashJoinBuild
    • key & payloads
  • HashJoinProbe
    • key
  • ImportDatabase
    • N/A
  • IndexLookup
    • index name (maybe table name)
  • Insert
    • expressions & action (e.g. ON CONFLICT DO NOTHING)
  • IntersectBuild
    • same as hash join build
  • Intersect
    • same as hash join probe
  • InstallExtension
    • extension name
  • Limit
    • number
  • LoadExtension
    • extension name
  • Merge
    • the pattern to merge
  • MultiplicityReducer
    • N/A
  • OffsetScanNodeTable
    • N/A
  • Partitioner
  • PathPropertyProbe
    • path expression
  • PrimaryKeyScanNodeTable
    • path expression
  • Projection
    • expression names
  • Profile
    • N/A
  • RecursiveJoin
    • N/A
  • RenameProperty
    • from name to new name
  • RenameTable
    • same as above
  • ResultCollector
    • done
  • ScanNodeTable
    • table name, property names, zonemaps
  • ScanRelTable
    • table name, property names, direction, zonemaps
  • SemiMasker
    • scan operators that this semi mask is applying to
  • SetProperty
    • left = right
  • Skip
    • number
  • StandaloneCall
    • print cmd (ideally we want this to be a function name)
  • TableFunctionCall
    • function name
  • TopK
    • order by keys and payloads and K
  • TopKScan
    • N/A
  • Transaction
    • action
  • OrderBy
    • same as top K except for K
  • OrderByMerge
    • N/A
  • OrderByScan
    • N/A
  • UnionAllScan
    • N/A or expressions
  • Unwind
    • unwind expression as expression
  • UseDatabase
    • database name
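
One way to picture the per-operator static information above is as a small print-info object attached to each operator. The following is only a minimal Python sketch for illustration; the class names, fields, and the `describe` helper are hypothetical and are not taken from the codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScanNodeTablePrintInfo:
    """Hypothetical print info for ScanNodeTable: table name and property names."""
    table_name: str
    properties: List[str] = field(default_factory=list)

    def to_string(self) -> str:
        return f"Table: {self.table_name}, Properties: {', '.join(self.properties)}"


@dataclass
class LimitPrintInfo:
    """Hypothetical print info for Limit: the limit number."""
    number: int

    def to_string(self) -> str:
        return f"Limit: {self.number}"


@dataclass
class EmptyPrintInfo:
    """For operators listed as N/A, which have nothing static to print."""

    def to_string(self) -> str:
        return ""


def describe(operator_name: str, info) -> str:
    """Combine the operator name with its static information, if any."""
    details = info.to_string()
    return f"{operator_name} [{details}]" if details else operator_name


print(describe("ScanNodeTable", ScanNodeTablePrintInfo("person", ["name", "age"])))
print(describe("CrossProduct", EmptyPrintInfo()))
```

Keeping the formatting inside each print-info object means the printer itself never needs to know which operator it is looking at.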

Stage 2: Rendering

Rendering a plan in the shell is tricky when the plan becomes large, and rendering only in the shell is not sufficient anyway. We need a mechanism to render large plans on the web and in the explorer too. I don't have a concrete roadmap for stage 2, so @mewim should edit this part. One thing I'm fairly certain of is that we first need to print the plan in JSON format.

  • print plan to JSON (try to reuse our JSON feature instead of relying on a third party)
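
As a rough illustration of printing a plan to JSON, the sketch below walks a plan tree recursively and serializes it. It uses Python's standard `json` module purely as a stand-in; the `PlanNode` type and the key names (`name`, `info`, `children`) are assumptions made for this example, not a proposed schema.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan node: operator name, static info, and child operators."""
    name: str
    info: str = ""
    children: List["PlanNode"] = field(default_factory=list)


def plan_to_dict(node: PlanNode) -> dict:
    """Recursively convert a plan tree into nested dictionaries."""
    return {
        "name": node.name,
        "info": node.info,
        "children": [plan_to_dict(child) for child in node.children],
    }


def plan_to_json(root: PlanNode) -> str:
    """Serialize the whole plan; the key names are illustrative only."""
    return json.dumps(plan_to_dict(root), indent=2)


plan = PlanNode("Projection", "a.name", [
    PlanNode("Filter", "a.age > 30", [
        PlanNode("ScanNodeTable", "person"),
    ]),
])
print(plan_to_json(plan))
```

A nested structure like this can be consumed by a web page or the explorer without re-parsing shell output.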

Stage 3: Print logical plan

Since we already print the physical plan, nothing prevents us from printing the logical plan as well.

We want the logical plan to be printed with a Cypher command, Explain Logical. It will be processed and printed much like the physical plan, so we can adapt the existing plan printer code to print logical plans as well. The logical operators will need print info structs to hold the information listed below.

Also, since both physical and logical operators will have their own printing structs, we no longer need getExpressionsForPrinting().

  • remove this function from every logical operator

A complete list for logical plan printing is as follows:

  • Accumulate
    • N/A
  • Aggregate
    • Keys, Aggregates
  • Alter
    • alter information, e.g. rename, add column, ...
  • Attach Database
    • database name
  • Copy From
    • table name and source, e.g. file name, subquery, ...
  • Copy To
    • table name to file name
  • Create Macro
    • macro name
  • Create Sequence
    • sequence name
  • Create Table
    • table name and possible config
  • Create Type
    • type name
  • Cross Product
    • N/A
  • Delete
    • expressions and possible config
  • Detach Database
    • database name
  • Distinct
    • expressions
  • Drop
    • drop type, sequence name/table name
  • Dummy Scan
    • N/A
  • Empty Result
    • N/A
  • Explain
    • N/A
  • Expressions Scan
    • expressions
  • Extend
    • table name, property names, direction, zonemaps
  • Extension
    • action and extension name
  • Export Database
    • N/A
  • Filter
    • expression name
  • Flatten
    • ideally we should print one expression to indicate which data chunk we are flattening
  • GDS Call
    • algorithm name
  • Hash Join
    • join conditions, join type
  • Import Database
    • N/A
  • Index Look Up
    • index name (maybe table name)
  • Intersect
    • Keys, payload
  • Insert
    • expressions & action (e.g. ON CONFLICT DO NOTHING)
  • Limit
    • limit and skip number
  • Mark Accumulate
    • keys, mark
  • Merge
    • the pattern to merge
  • Multiplicity Reducer
    • N/A
  • Node Label Filter
    • N/A
  • Order By
    • order by keys and payloads and K
  • Partitioner
  • Path Property Probe
    • path expression
  • Projection
    • expression names
  • Recursive Extend
    • Same as extend
  • Scan Node Table
    • table name, property names, zonemaps
  • Semi Masker
    • scan operators that this semi mask is applying to
  • Set Property
    • left = right
  • Standalone Call
    • print cmd (ideally we want this to be a function name)
  • Table Function Call
    • function name
  • Transaction
    • action
  • Union All
    • N/A or expressions
  • Unwind
    • unwind expression as expression
  • Use Database
    • database name
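
The idea of reusing one printer for both plan kinds can be sketched as follows: if logical and physical operators each expose a name, their own print info, and their children, a single function can render either tree. This is an illustrative Python sketch only; `LogicalOp`, `PhysicalOp`, and `render_plan` are hypothetical names and the output format is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicalOp:
    """Hypothetical logical operator carrying its own printable info."""
    name: str
    print_info: str = ""
    children: List["LogicalOp"] = field(default_factory=list)


@dataclass
class PhysicalOp:
    """Hypothetical physical operator with the same printable shape."""
    name: str
    print_info: str = ""
    children: List["PhysicalOp"] = field(default_factory=list)


def render_plan(op, depth: int = 0) -> List[str]:
    """Render any operator tree that exposes name, print_info, and children."""
    label = f"{op.name}({op.print_info})" if op.print_info else op.name
    lines = ["  " * depth + label]
    for child in op.children:
        lines.extend(render_plan(child, depth + 1))
    return lines


logical = LogicalOp("Limit", "limit 5", [LogicalOp("Scan Node Table", "person")])
physical = PhysicalOp("Limit", "5", [PhysicalOp("ScanNodeTable", "person")])

print("\n".join(render_plan(logical)))
print("\n".join(render_plan(physical)))
```

Because the printer depends only on the shared shape, a separate helper such as getExpressionsForPrinting() on each operator is not needed in this sketch.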

Stage 4: Print statistics and cardinality

  • Print estimated cardinality for logical plan
  • Print actual cardinality for physical plan
  • Propagate and print estimated cardinality to physical plan to see how off we are
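
To make "how off we are" concrete, one common measure (an assumption here, not something this issue prescribes) is the q-error: the larger of estimated/actual and actual/estimated. The Python sketch below prints both cardinalities with that ratio; `OperatorStats`, `q_error`, and `format_stats` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class OperatorStats:
    """Hypothetical per-operator statistics gathered for printing."""
    name: str
    estimated_cardinality: int
    actual_cardinality: int


def q_error(estimated: int, actual: int) -> float:
    """Ratio of the larger to the smaller value; 1.0 means a perfect estimate."""
    est, act = max(estimated, 1), max(actual, 1)  # clamp to avoid dividing by zero
    return max(est / act, act / est)


def format_stats(stats: OperatorStats) -> str:
    """Format estimated and actual cardinality together with the q-error."""
    error = q_error(stats.estimated_cardinality, stats.actual_cardinality)
    return (f"{stats.name}: estimated={stats.estimated_cardinality}, "
            f"actual={stats.actual_cardinality}, q-error={error:.2f}")


print(format_stats(OperatorStats("Filter", 1000, 250)))
# prints: Filter: estimated=1000, actual=250, q-error=4.00
```

The ratio is symmetric, so an overestimate and an underestimate by the same factor are reported as equally wrong.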

Stage 5: Advanced statistics printing

I haven't decided if we should go this far, but printing disk I/O for scan operators makes sense to me.

@ray6080
Contributor

ray6080 commented Sep 10, 2024

Not sure when we can get to stage 2 and be able to visualize the plan in a web page. Alternatively, we can provide a more succinct way of printing the plan. One example is what Postgres does here; it should work better in more cases, though readability decreases a lot.

@andyfengHKU
Contributor Author

The printing of logical & physical plans is done. @ray6080 will add statistics incrementally, so I'm considering this issue mostly done.
