This repository has been archived by the owner on May 6, 2024. It is now read-only.

v0.9.0

@xieqi xieqi released this 28 Oct 06:51
· 303 commits to main since this release
e699dc1
    Release Notes - BDTK - Version 0.9.0

Improvement

  • [Doc] One fix for developer doc
  • [Code refactor] Remove CiderExecutionKernel from cider module
  • [M2] [Integration] VeloxToSubstrait- support variadic function lookup
  • Support partial avg check in Velox with old Cider data format
  • Support complex data print in CiderBatch

Bug

  • DISABLE_FAILED_UT option doesn't take effect
  • [VP_Cider_Op_Fil_Proj_Integration] fix values node issue
  • [VP_Cider_Op_Fil_Proj_Integration] Fix CiderBatch issue to allow multi-row results in filter/project cases
  • [VP_Velox_plan_2_Substrait]Change the emit order of projectNode
  • [Cider_Substrait_2_EU] Fix EU generator issue in filter case
  • [ Cider_runtime_proj_fil_module]Fix CiderRuntimeModule bugs to get correct outBatch in filter case
  • [ VP_Cider_data_format_convertor ] Fix bugs caused by changes from Velox, e.g., "cdvi::EMPTY_METADATA"
  • Build fails with macro error in eu-mocker
  • [VP_Cider_Op_Integration] Remove CiderPlan constructor with VeloxPlan as parameter
  • [Bug] return wrong number of columns after processNextBatch
  • Don't generate groupby skip mask code for project only case
  • velox-plugin can't properly handle the representation of null in BDTK
  • Set Column range bug for project case
  • Different inner join behavior for JOIN ON clause and where clause
  • Bug for query with multiple agg functions.
  • [Velox2Substrait] fix convertor problems in project AS
  • incorrect col index update
  • Groupby result is empty when key is "CASE...WHEN.."
  • CiderOperatorBenchmark change to use CiderVeloxPluginCtx transform plan
  • Incorrect CiderOperator initialize
  • Init folly in CiderOperatorTest
  • Fix issues in String support
  • Unsupported tinyint and smallint simple qual filter
  • Mathematical Op divide zero and overflow
  • Mathematical Op decimal constant support
  • CiderBatch data lost after a turn around in simple method
  • Fix the error about return value of partial avg
  • Fix avg with null value
  • Fix bug by TPC-H Q21
  • fix bug 'Clone velox plannode:Not supported velox plannode type.'
  • Not properly handled when fetchResults Return Empty CiderBatch
  • Use join substrait plan converter instead of hardcode in cider op
  • between...and... scalar function date type support
  • enable some TPC-H queries
  • verify_function_ir bugfix
  • Incorrect non-encoding string conversion
  • core dump when string op is used to filter
  • fix bug of "not a scalar type! kind: ARRAY"
  • fix bug in CiderAggHashTable::fillColsInfo, where the group_target_offset_ calculation block is not called
  • incorrect filter result because of null varchar and len 0 varchar
  • Fix bug of PlanPattern write race under multi-threading
  • Fix bug of "Remove folly::split of function look up"
  • Fix Old CiderBatch Memory Allocation for Group-By Result Fetching
  • fix bug of construct col name in substrait to eu
  • call loadedVector if DictionaryVector is a lazy vector

Epic

  • Separate prototype code into different repos
  • Substrait Plan Builder
  • The convertor between velox and substrait
  • RelAlgExecutionUnit generator based on substrait
  • Data convertor
  • Join - Velox plugin integration (including operator fusing)
  • Modular_SQL test framework
  • [Infrastructure] Support Join End 2 End in Cider (from presto_cpp to cider)
  • [Infrastructure] Support Agg End 2 End in Cider (from presto_cpp to cider)
  • [Infrastructure] Support filter/project End 2 End in Cider (from presto_cpp to cider)
  • Function/type system support using Velox alpha release (TPC-H)
  • [Functionality] Primary Data Type Support: integer,bigint,decimal,real,double,boolean
  • [Functionality] Operation&Function Support: Logical Ops, Compare Ops etc.
  • [Functionality] RelAlg Op Support: Filter, Project covering all TPC-H cases
  • [Test] Testcase Development for MileStone1 Features
  • [Infrastructure] Consolidate jit-engine branch with Cider
  • [Customer] Integrate native functions provided by ADB into Codegen framework (e.g., expr eval)
  • [Functionality] Primary Data Type Support: String, DateTime
  • [Functionality] Operation&Function Support:Compare Op( between…and) etc.
  • [Functionality] RelAlg Op Support:Agg with having etc.
  • [ Upstream ] Velox-2-Substrait util to Velox repo
  • [Infrastructure] Replace BDTK internal format representation with Velox format
  • [ Infrastructure ] Implement an expr evaluation module
  • [ Documentation ] Implement a API user doc
  • Code Refactor
  • [M2][Functionality][Expr-AggFunctions] Expr - Aggregate Functions
  • [M2][Functionality][JoinOp] Join Ops
  • [M2][Functionality][AggOp] Aggregate Ops
  • [M2][Functionality][Expr-ScalarFunctions] Expr - Scalar Functions
  • [M3] [Code Refactor][Placeholder] Product Improvement

Story

  • BDTK code gen module refactor design
  • Arrow Data convertor impl of varchar/nested data type
  • Port code from presto-java to presto-cpp
  • BDTK Codegen deep dive
  • Investigate agg in velox and BDTK
  • CompileWorkUnit API impl
  • Test Framework about Velox-Arrow-BDTK data type converter
  • Parse the Presto query json log
  • Presto performance projection per query w/ ops break down
  • investigate/analyzing mapping info between velox Plan Node and substrait relations
  • Support Q6 in Execution Unit generator
  • Velox BDTK data transfer micro benchmark
  • Integrate cider into velox UT
  • Impl the IR converter between substrait and velox in SubstraitIRConverter.cpp
  • abstract hash table from BDTK
  • Investigate the velox-substrait plan node translation
  • Support fallback in execution unit generator
  • Data convertor(velox::RowVectorPtr -> BDTK )
  • integration - Agg(blocking op) work flow
  • CICD system setup
  • Data Conversion Utils Implement for Velox and BDTK
  • noisepage investigation
  • Debug failing unit tests in velox and BDTK
  • fix results of query/ data with Null value
  • UTs pass but the cleanup stage has memory-related issues
  • create getting started guidance for BDTK+velox
  • create dockerfile for BDTK + velox
  • rebase velox change to upstream latest
  • presto tpch queries plan fragment pattern
  • velox driver workflow
  • Setup pre-commit check
  • enable github actions for BDTK
  • Separate modules into different repos
  • rebase the substrait code to align with upstream
  • pre-commit check migration to github action for velox plugin
  • check title and code style
  • update substraitRel to meet our requirements
  • Refine Cider module API
  • check license header
  • E2E workflow integration
  • Design for substrait plan builder
  • Support substrait FilterRel in RelAlgExecutionUnit generator passing "select l_suppkey, l_quantity from lineitem where l_orderkey > 10"
  • fix bugs in pr-title-checker and license-header-checker
  • Velox Plan Transformer: A Pattern Match - Rewriter Framework
  • Code refactor in Substrait to Velox plan util: separate type, function, expr + operator convertors
  • update dockerfile
  • update velox URL for ci
  • BDTK JOIN investigation
  • Generate explanation document for the investigation result of Jacques’s code
  • Arrow data convertor utils for timestamp support
  • Detail design for velox BDTK hash join
  • Rebase to substrait latest version and using index based
  • Data convertor round trip string support Velox type to BDTK type (non-encoding)
  • Group by prototype aggregation support (BDTK ResultSet provider, Velox hybrid operator integration)
  • Complete TPC-H Q6 prototype to product (veloxPlugin->BDTK codegen API refactor and pass UTs )
  • Test plan + execution plan + design for Modular_SQL (Presto Java + Presto_cpp + Velox_plugin + Velox)
  • Generate a CiderOperator prototype with Hash Join with Velox Plan Node contains JoinNode validated by plan node mocking join bridge
  • BDTK Join API
  • BDTK col_buffer internal layout investigation
  • Data convertor code/API refactor with CiderBatch (new API adapter)
  • BDTK Join API implements including code refactor to support build hash table with one use case
  • Agg support in substrait2EU covering (agg + group by) without count distinct
  • update submodule in ci
  • Integration test with velox-substrait and substrait-eu: filter + project
  • Special semantics support in substrait2EU (between/and, count(*))
  • Upstream Velox-To-Substrait code to velox repo (filter+ project)
  • Using Yaml File to do the Function Convertor in velox-substrait utils
  • CI maintain
  • Presto-cpp/velox-plugin/BDTK Integration TPC-H Q6
  • Column Range(Expression range) case fix
  • [Velox2Substrait] Json format for integration test
  • [Velox2Substrait] Move Substrait utils from Velox repo to VeloxPlugin with mocked function parameters
  • [CiderBatch] CiderBatch convertor in VeloxPlugin for primary data with substrait type alignment and gap fulfillment (+decimal)
  • [CiderCompiler] Group By Spill Support (Group By Agg Runtime State Update)
  • [Substrait 2 EU] Code refactor to reuse more code in EU context update
  • Cider Implementation for Velox Plan Transformer framework
  • [Substrait2Eu] Decimal type support in expression translation
  • [PlanDispatcher] Dispatch codegen op-sequence using supported data type to Cider
  • [CiderRuntime] Verify decimal support as Agg Hash table key
  • [Velox2Substrait] Support agg count + count(*) + avg
  • [Velox2Substrait] function convertor using yaml format
  • [CiderRuntimeModule] refactor for agg based on new API partial phase
  • [Substrait2Eu] %, avg, count(col), not support
  • [Dev Integration-Velox2Substrait/Substrait2Eu/AggHashTable]sum, count(col), min, max, avg integrate test with veloxToSubstrait, AggHashTable
  • [Substrait2Eu]Join plan translation(simple inner join case)
  • [Dispatcher] Dispatch codegen op-sequence with milestone1 supported expressions to Cider (and filter unsupported out)
  • [Velox2Substrait] Agg w/o groupby and w/ on one/multi cols(partial)
  • [CiderOperator] Refactor VeloxPlugin Operator (addinput/getoutput...) in VeloxPlugin based on new API to support agg
  • [DevIntegration] VeloxToSubstrait and substraitToEu on groupby
  • [CiderOperator] Placeholder- final need add
  • [Cider Operator] Refactor to support customized merge join bridge passing MergeJoinTest
  • [Cider Operator] Enable Phase I equivalence tests in VeloxPlugin
  • [Cider Operator] Code refactor to move operator, plan and bridge into VeloxPlugin for join
  • [Velox Upstream] Upstream Velox Join related changes to Meta Velox
  • [Cider Operator] Integrate with other components (e.g., data convertor, cider join API, without Velox to Substrait)
  • [Cider Operator] Support Cider Join operator fusion with other operators Design Basic workflow
  • [Velox2Substrait] Join plan translation(simple inner join case)
  • [CiderCompiler] Group By Spill Support (SpillFile and SpillBuffer Abstraction)
  • CiderTableSchema generator for primitive types
  • [Velox2Substrait] Respect engine difference in function mapping in Substrait to Velox utils
  • [Placeholder] Provide column infos in CiderTableSchema
  • Enable existing BDTK ExecuteTest under no-catalog test setting (phase 1 agg/project/filter)
  • Remove catalog and dataMgr
  • Add cider code into the branch
  • [Placeholder] Move RelAlg ResultSet/RelAlgExecutor to test namespace as test utility
  • [Placeholder] Do code cleaning up for jit engine branch rebase
  • [Integration][Prototype] function support crossing substrait, EU, function call, codegen, etc
  • Enable existing BDTK ExecuteTest under no-catalog test setting (phase 2 join case)
  • [SubstraitToEU] Support "Select *" case
  • Comparison Test: investigation JIT engine in ClickHouse
  • Initial perf collection for agg with groupby
  • PlanTransformer/velox->substrait convertor integration
  • Add an API in VeloxToSubstraitPlanConvertor to accept velox plan fragment
  • Upstream Substrait-To-Velox code to velox repo.
  • CiderBatch based DataProvider
  • RelAlgExecutionUnit mocker for Cider expression evaluation
  • Prepare test case list for cider based on velox existing cases
  • Test development for SimpleArithmetic
  • Test development for ComparisonConjunct
  • [Cider Compiler] Cider Join compile path fix and verify
  • [Cider Runtime] Cider join runtime path fix and verify
  • [Cider Runtime Module]integration test, verify runtime module functionality
  • Enable Project/Filter Integration Test for Cider module (a.k.a. BDTK)
  • [Code refactor] Unified blocking (e.g., group-by) and non-blocking (e.g., filter) result data retriever
  • skip_mask support with utils and codegen procedure
  • Hasher class supports perfect hash and normal hash
  • Integrate hash probe procedure to CiderAggHashtable
  • Cider Test framework
  • [Cider] Add more test to cover join case
  • Implement a benchmark tool to compare Modular SQL against Velox at plan fragment Level similar to TPC-H
  • Implement benchmark tool kit to evaluate Modular SQL at operator level against Velox.
  • proposed change for expression evaluation module
  • substrait expression maker and evaluate API
  • support presto function "between" as an example
  • Different behavior between CiderBatch Converter and Construct CiderBatch Directly for CiderModuleCompile
  • Consolidate Cider test framework
  • Filter Op test case design
  • Set Up CPU Information LLVM engine
  • MIP8 document structure improvement
  • [M2][Functionality][DataTypes] [02000] Complex type: DATE
  • [M2][Functionality][DataTypes] [01000] Primitive type including :Boolean, TINYINT, SMALLINT, INTEGER,BIGINT,REAL,DOUBLE
  • [M2][Functionality][JoinOp] [11001] Explicit InnerJoin-ON clause -equal
  • [M2][Functionality][JoinOp] [11004] Explicit InnerJoin-WHERE clause -equal
  • [M2][Functionality][JoinOp] [12001] Explicit Left(outer) Join -ON clause-equal
  • [M2][Functionality][JoinOp] [12009] Full (Outer) Join- USING clause- equal
  • [M2][Functionality][FilterProjectOp] [22004] Project Op- *
  • [M2][Functionality][FilterProjectOp] [21005] Filter Op- IN
  • [M2][Functionality][FilterProjectOp] [21011][53000] Filter-Expr-CASE ... WHEN ...
  • [M2][Functionality][FilterProjectOp] [21002] Filter Op-BETWEEN ... AND ...
  • [M2][Functionality][FilterProjectOp] [21000] Filter OP-Comparison Op
  • [M2][Functionality][AggOp] [33013] having-on expr-with conditional expr
  • [M2][Functionality][AggOp] [33000] Agg Op-group by limits-on col-name
  • [M2][Functionality][AggOp] [33002] group by limits-on col-with conditional expr
  • [M2][Functionality][AggOp] [33003] group by limits-on col- multi cols
  • [M2][Functionality][AggOp] [33014] having-subqueries
  • [M2][Functionality][Expr-AggFunctions] [41010] count(distinct)
  • [M2][Functionality][Expr-AggFunctions] [41003] avg on col
  • [M2][Functionality][Expr-AggFunctions] [41002] general agg-sum on expr-without conditional expr
  • [M2][Functionality][Expr-AggFunctions] [41005] avg on expr without conditional expr
  • [M2][Functionality][Expr-AggFunctions] [41006][41008] count(*) and count on col
  • [M2][Functionality][Expr-AggFunctions] [41000] general Agg-max-min-sum on col
  • [M2][Functionality][Expr-ScalarFunctions] [52005] [56001] [21007] SCALAR-Func-LIKE
  • [M2][Functionality][Expr-ScalarFunctions] [52002] IS NOT NULL
  • [M2][Functionality][Expr-ScalarFunctions] [55000] Mathematical Op
  • [M2][Functionality][Expr-ScalarFunctions] [51000]Scalar-Func-logical op and some compare op
  • [M2][Functionality][Expr-ScalarFunctions] [57008] Date -8 extract functions
  • [M2][Functionality][Expr-ScalarFunctions] [52001] IS NULL
  • [M2][Functionality][Expr-ScalarFunctions] [52000] Compare Op BETWEEN ...AND- NULL
  • [Velox2Substrait][Upstream] Upstream root Rel and Type Nullablity to velox repo
  • [Placeholder] Performance dashboard setting up
  • [M2][Functionality][DataTypes][Cider test development][01000] Primitive type including :Boolean, SMALLINT, INTEGER,BIGINT,REAL,DOUBLE
  • [M2][Functionality][DataTypes] [Cider test development][02000] Complex type: DATE
  • [M2][Functionality][JoinOp] [Cider test development][11001] Explicit InnerJoin-ON clause -equal
  • [M2][Functionality][JoinOp] [Cider test development][11004] Explicit InnerJoin-WHERE clause -equal
  • [M2][Functionality][JoinOp] [Cider test development][12001] Explicit Left(outer) Join -ON clause-equal
  • [M2][Functionality][JoinOp] [Cider test development][12009] Full (Outer) Join- USING clause- equal
  • [M2][Functionality][FilterProjectOp] [Cider test development][21000] Filter OP-Comparison Op
  • [M2][Functionality][FilterProjectOp] [Cider test development][21002] Filter Op-BETWEEN ... AND ...
  • [M2][Functionality][FilterProjectOp] [Cider test development][22004] Project Op- *
  • [M2][Functionality][FilterProjectOp] [Cider test development][21005] Filter Op- IN
  • [M2][Functionality][FilterProjectOp] [Cider test development][21011][53000] Filter-Expr-CASE ... WHEN ...
  • [M2][Functionality][AggOp] [Cider test development][33000] Agg Op-group by limits-on col-name
  • [M2][Functionality][AggOp] [Cider test development][33003] group by limits-on col- multi cols
  • [M2][Functionality][AggOp] [Cider test development][33002] group by limits-on col-with conditional expr
  • [M2][Functionality][AggOp] [Cider test development][33014] having-subqueries
  • [M2][Functionality][AggOp] [Cider test development][33013] having-on expr-with conditional expr
  • [M2][Functionality][Expr-AggFunctions] [Cider test development][41000] general Agg-max-min-sum on col
  • [M2][Functionality][Expr-AggFunctions] [Cider test development][41002] general agg-sum on expr-without conditional expr
  • [M2][Functionality][Expr-AggFunctions] [Cider test development][41006][41008] count(*) and count on col
  • [M2][Functionality][Expr-AggFunctions] [Cider test development][41003] avg on col
  • [M2][Functionality][Expr-AggFunctions] [Cider test development][41005] avg on expr without conditional expr
  • [M2][Functionality][Expr-AggFunctions] [Cider test development][41010] count(distinct)
  • [M2][Functionality][Expr-ScalarFunctions] [Cider test development][51000] Scalar-Func-logical op and some compare op
  • [M2][Functionality][Expr-ScalarFunctions] [Cider test development][55000] Mathematical Op
  • [M2][Functionality][Expr-ScalarFunctions] [Cider test development][52000] Compare Op BETWEEN ...AND- NULL
  • [M2][Functionality][Expr-ScalarFunctions] [Cider test development][52001] IS NULL
  • [M2][Functionality][Expr-ScalarFunctions] [Cider test development][52002] IS NOT NULL
  • [M2][Functionality][Expr-ScalarFunctions] [Cider test development][52005] [56001] [21007] SCALAR-Func-LIKE
  • [M2][Functionality][Expr-ScalarFunctions] [Cider test development][57008] Date -8 extract functions
  • [M2][Integration][benchmarkDashboardMaintenance] [Placeholder]
  • [M2][Upstream][Velox] [Placeholder]
  • [M3][Functionality][DataType][01002] STRING-VARCHAR.
  • [M2][Infrastructure] [AggOp][31000] distinct on col/ on expr(w/wo conditional).
  • [M2][Infrastructure] [JoinOp][11002] Explicit Inner Join- ON clause-nonEqual.
  • [M3][Functionality] [Expr-AggFunctions][41001] General Agg- sum - on expr with conditional expr.
  • [M2][Infrastructure] [AggOp][32000] order by on col.
  • [M2][Integration][DataTypes][Velox] [02000] Complex type: DATE
  • [M2][Integration][DataTypes][Velox] [Benchmark development against Velox][01000] Primitive type including :Boolean, SMALLINT, INTEGER,BIGINT,REAL,DOUBLE
  • [M2][Integration][DataTypes][Velox] [01000] Primitive type including :Boolean, SMALLINT, INTEGER,BIGINT,REAL,DOUBLE
  • [M2][Integration][JoinOp][Velox] [11001] Explicit InnerJoin-ON clause -equal
  • [M2][Integration][AggOp][Velox] [33000] Agg Op-group by limits-on col-name
  • [M2][Integration][AggOp][Velox] [33014] having-subqueries
  • [M2][Integration][AggOp][Velox] [33002] group by limits-on col-with conditional expr
  • [M2][Integration][AggOp][Velox] [33003] group by limits-on col- multi cols
  • [M2][Integration][AggOp][Velox] [33013] having-on expr-with conditional expr
  • [M2][Integration][Expr-AggFunctions][Velox] [41003] avg on col
  • [M2][Integration][Expr-AggFunctions][Velox] [41005] avg on expr without conditional expr
  • [M2][Integration][Expr-AggFunctions][Velox] [Benchmark development against Velox][41003] avg on col
  • [M2][Integration][Expr-AggFunctions][Velox] [41010] count(distinct)
  • [M2][Integration][Expr-AggFunctions][Velox] [Benchmark development against Velox][41000] general Agg-max-min-sum on col
  • [M2][Integration][Expr-AggFunctions][Velox] [41002] general agg-sum on expr-without conditional expr
  • [M2][Integration][Expr-AggFunctions][Velox] [41006][41008] count(*) and count on col
  • [M2][Integration][Expr-ScalarFunctions][Velox] [Code refactor]
  • [M2][Integration][Expr-ScalarFunctions][Velox] [52002] IS NOT NULL
  • [M2][Integration][Expr-ScalarFunctions][Velox] [51000] Scalar-Func-logical op and some compare op
  • [M2][Integration][Expr-ScalarFunctions][Velox] [55000] Mathematical Op
  • [M2][Integration][Expr-ScalarFunctions][Velox] [52000] Compare Op BETWEEN ...AND- NULL
  • [M2][Integration][Expr-ScalarFunctions][Velox] [57008] Date and Time -17 extract functions
  • [M2][Integration][Expr-ScalarFunctions][Velox] [52005] [56001] [21007] SCALAR-Func-LIKE
  • [M2][Integration][Expr-ScalarFunctions][Velox] [52001] IS NULL
  • [M2][Integration][FilterProjectOp][Velox] [Benchmark development against Velox][21011][53000] Filter-Expr-CASE ... WHEN ...
  • [M2][Integration][FilterProjectOp][Velox] [21011][53000] Filter-Expr-CASE ... WHEN ...
  • [M2][Integration][FilterProjectOp][Velox] [22004] Project Op- *
  • [M2][Integration][FilterProjectOp][Velox] [21002] Filter Op-BETWEEN ... AND ...
  • [M2][Integration][FilterProjectOp][Velox] [Benchmark development against Velox][21000] Filter OP-Comparison Op
  • [M2][Integration][FilterProjectOp][Velox] [21000] Filter OP-Comparison Op
  • add monitor and restart scripts for github workflow runners
  • [M2][Infra] [Infra4Product]Unified Memory Management
  • [M2][Infra] [Infra4Product]Bump up Substrait version to V0.7.0
  • Prestodb/Velox/Modular SQL integration
  • [M2] [Functionality][Expr-ScalarFunctions][57005] Date and Time op/functions-dateAdd
  • [M2] [Functionality][Expr-ScalarFunctions][Cider test development][57005] Date and Time op/functions-dateAdd
  • PlanTransformer: Add FilterPattern for Cider
  • Add CiderVeloxPluginCtx for velox-plugin
  • upgrade substrait in cider context
  • refine CiderBatchChecker
  • [M3][Functionality][DataTypes] [02002] [Cider test development] TIMESTAMP/INTERVAL DAY TO SECOND/INTERVAL YEAR TO MONTH
  • [M3][Functionality][DataTypes] [02002] TIMESTAMP/INTERVAL DAY TO SECOND/INTERVAL YEAR TO MONTH
  • [M3][Functionality][DataTypes] [01004] String:VARBINARY
  • [M3][Functionality][JoinOp] [11003] Explicit Inner Join:ON clause mixed
  • [M3][Functionality][JoinOp] [11003][Cider test development] Explicit Inner Join:ON clause mixed
  • [M3][Functionality][JoinOp] [11002] Explicit Inner Join:ON clause non-equal
  • [M3][Functionality][JoinOp] [12002] Explicit Left (Outer) Join: ON clause non-equal
  • [M3][Functionality][JoinOp] [12003] Explicit Left (Outer) Join:ON clause mixed
  • [M3][Functionality][JoinOp] [12002][Cider test development] Explicit Left (Outer) Join: ON clause non-equal
  • [M3][Functionality][JoinOp] [11005] Implicit Inner Join-EXISTS clause-equal
  • [M3][Functionality][JoinOp] [14000] Semi Join:implicit Semi Join IN clause
  • [M3][Functionality][JoinOp] [11002][Cider test development] Explicit Inner Join:ON clause non-equal
  • [M3][Functionality][FilterProjectOp] [21004][Cider test development] Filter Op:IS DISTINCT/IS NOT DISTINCT
  • [M3][Functionality][FilterProjectOp] [21004] Filter Op:IS DISTINCT/IS NOT DISTINCT
  • [M3][Functionality][FilterProjectOp] [22006]project OP:(scalar project)
  • [M3][Functionality][FilterProjectOp] [22005] project OP:(simple project)
  • [M3][Functionality][AggOp] [31000] distinct on col/ on expr(w/wo conditional).
  • [M3][Functionality][AggOp] [33007] GROUP BY DISTINCT GROUPING SETS(...)
  • [M3][Functionality][AggOp] [33006] GROUP BY ALL GROUPING SETS(...)
  • [M3][Functionality][Expr-AggFunctions] [41009] [Cider test development] count(1)
  • [M3][Functionality][Expr-AggFunctions] [41009] count(1)
  • [M3][Functionality][Expr-ScalarFunctions] [52007] Compare Ops & Functions ANY
  • [M3][Functionality][Expr-ScalarFunctions] [54000] [Cider test development]Conversion Functions Cast
  • [M3][Functionality][Expr-ScalarFunctions] [53001] [Cider test development] Conditional Expr IF ...THEN
  • [M3][Functionality][Expr-ScalarFunctions] [54000] Conversion Functions Cast
  • [M3][Functionality][Expr-ScalarFunctions] [56005] substr
  • [M3][Functionality][Expr-ScalarFunctions] [53001] Conditional Expr IF ...THEN
  • [M3][Functionality][Expr-ScalarFunctions] [52006] Compare Ops & Functions: All
  • [M3][Functionality][Expr-ScalarFunctions] [56005] [Cider test development] substr
  • [M3] [Code Refactor][Placeholder]Improvement
  • [M3][Functionality][Expr-ScalarFunctions] [Cider test development][53002] Conditional Expr: COALESCE
  • update all header
  • [VeloxToSubstrait] - add between function and avg function support
  • [M2][Infrastructure][GroupBy]Support Group-By Key with VarChar Type
  • Cider lib benchmark design(vs duckdb)
  • [Velox2Substrait] add scalar function: substring, in, like
  • [M2][Performance][Analysis_Optimization] [Placeholder] Cider Test Cases Query Analysis
  • [M2][Integration] VeloxToSubstrait - add substrait extension registry in Cider-velox for function validator and lookup
  • Group-by-key varchar support
  • output struct2row support in data convertor
  • Transform presto's special function to other formats in cider
  • Add missing function in V2S convertor :colcase
  • [M3][Functionality][AggOp] [33015] Support string as key/target in group by
  • Fix bugs by TPC-H in EndToEnd testing
  • [M2][Integration] remove hardcode function reference for upstream velox
  • Add support of cast ( string AS date)
  • In [varchar] support
  • substr() in [varchar] support
  • Upstream Velox-To-Substrait code to velox repo(ifThen/switch)
  • [PlaceHolder] support V2S feature gaps got from test bugs after enable more pattern
  • PlanTransformer: Change clone velox plan nodes to const_cast the node source
  • add cider feature support doc.

Feature

  • turn on all passed TPC-H queries
  • Data converter support CONSTANT encoding

Task

  • Plan transformation for Cider in Presto_CPP - Part I
  • Support build BDTK/velox with same gcc
  • BDTK dbengine api wrapper for velox
  • Substrait: TPC-H queries semantic coverage check
  • SQL Modularization create kernel
  • Analyzing function mapping mechanism in substrait
  • Velox plan translator for BDTK execution unit as framework "select a from b where c=1+2"
  • Adopt PlanBuilder for exe_unit generator test
  • support Q1 in Execution Unit generator
  • evaluate translate approach between velox join and BDTK join
  • Design doc for Function semantic consistency
  • Data preparation for TPC-H Q6
  • target_exprs can't be compiled because detailed expr info is lost
  • Plan translation design based on substrait
  • initial draft for overall Modular-SQL plan
  • BDTK UT failures fixes
  • Investigate how to register extension function to BDTK via ExtensionFunctionWhitelist
  • Velox level execution API investigation
  • Disable failed BDTK UTs
  • Enable 4 use cases for filter + project cases with UTs
  • Switch to use CiderCompileModule in substraitToEU test
  • refactor Substrait2ExecutionUnit Module
  • comment jira link when submit PR.
  • Allow codegen sum(bigint) with output type Double
  • Feature Coverage Test work flow definition
  • Enable cpplint in cider and velox plugin.
  • Update submodule substrait to tag v0.3.0(commit id: e4fdf87)
  • feature list collected based on presto
  • refine col_hints_ in CiderTableSchema
  • Upstream Velox-To-Substrait code to velox repo (aggregation)
  • Performance feature collections from velox
  • update github action's runners
  • update test json files in cider
  • Remove support of old substrait in Substrait2EU
  • Integration CICD maintenance - Enable UT End-to-end Test.
  • [M2] [Infra] Old data convertor support TPC-H
  • remove todo and fixme
  • pressure test for Cider library
  • TODO and FIXME resolve in CiderFilterTest and expression evaluation module
  • TODO and FIXME resolution in CiderBatchChecker and CiderGroupByTest
  • Identify feature gap between tpc-h planfragment and cider
  • Inner join with no criteria should not be offloaded
  • fix core dump caused by in[varchar] + binop
  • Deploy presto_cpp in distributed system.
  • remove CMAKE_CXX_EXTENSIONS settings
  • turn on multi-threading drivers
  • Rebase Velox WW43 branch
  • Preparation for release 0.9

Sub-task

  • Investigate memory ownership of BDTK arrow/duckdb arrayholder
  • investigate the gap of test framework between presto-java and presto-cpp
  • port the presto-java tpch test framework to presto-cpp
  • Build Velox Arrow BDTK data type mapping & test framework
  • Ramp up
  • Write Test Framework
  • Wrapper for UDFOutputString to Save Velox String function result
  • Design doc for data convert between velox and BDTK
  • Define class and API
  • APIs about convert from substrait to velox
  • APIs about convert from velox to substrait
  • unit test
  • Data Convertor Implement for fixed-length basic types
  • Add Unsupported Velox function to BDTK
  • complete and Pass UT of the API transformVFilter
  • distributeRel proposal
  • Timestamp round-trip convert support
  • complete and pass UT for the Aggregate Node transform
  • Investigate tpc-h presto plan fragments to understand the pattern
  • Support basic types data convert with arrow
  • Velox Hash Join Investigation
  • Port prototype code to velox-plugin
  • decouple BDTK header
  • CiderTableSchema/CiderBench API Update
  • CiderCompileModule Refactor
  • Test case coverage for Cider Codegen API
  • velox-plugin refactor based on codegen API
  • Test case for velox-plugin
  • Change project and filter Node Convertor to index based
  • Expr unexpected rewritten during workunit compilation
  • Group by integration to Velox with SUM and partial AVG
  • CiderAggHashTable bug fixing and interface adding
  • Implement CiderHashJoin (one-to-one)
  • CiderRuntimeModule Refactor
  • Velox plan to Substrait plan convertor integration test: project+filter
  • Some initial code for Cider join integration
  • One-To-Many perfect hash table impl.
  • velox-plugin refactor based on Cider Codegen API
  • Initial code allowing customized cross join bridge
  • Initial code to support merge join
  • real type support in expression translation
  • support modulus and not
  • Bump Velox function in BDTK
  • Fix core dump when running HashJoinTest and MergeJoinTest in pipeline for JoinBridgeBranch
  • Unify cross join, merge join and hash join plan node into CiderPlan without Substrait dependency
  • Unify cross join, merge join and hash join operator into CiderOperator without codegen lib dependency
  • Prepare JoinPrototype branch in VeloxPlugin
  • Velox JoinBridge branch code migration to VeloxPlugin
  • Move Substrait utils from Velox repo to VeloxPlugin
  • Use substrait plan for CiderCompileModule
  • support count(col), avg
  • Rebase onto latest Velox code and fix build issue
  • Enable data convertor UTs with json-defined substrait type and CiderTableSchema functions
  • Add getOutputCiderTableSchema in CiderCompileResult
  • use substrait plan in CiderPlanNode
  • collect utils functions in Generator to a header file
  • enable nullability setting for velox to substrait type convertor
  • Add UTs for customized join bridge in velox code for future upstream
  • CiderTableSchema generator for primitive types(output table schema)
  • CiderTableSchema generator for primitive types(input table schema)
  • E2E dev env setup: presto_cpp+velox
  • Add TPC-H Q6 test
  • Group-by agg integration to Velox-plugin with new API
  • set root_reference names
  • [velox2substrait2eu] integration test on agg
  • enable agg avg
  • E2E dev env setup: presto_cpp+velox+veloxplugin+BDTK
  • Decouple DataMgr from QueryEngine. Leave DataMgr only for UTs
  • Move Chunk, MemoryLevel, etc. from DataMgr to DataProvider
  • Fix compile method issue after merging Cider branch
  • Hotfix for POAE-1599
  • update dockerfile to build code in BDTK develop branch
  • Filter bug under nullable input investigation
  • E2E dev env setup: update setup scripts in BDTK
  • E2E dev env setup: add Dockerfile and DEVELOPER_GUIDE in velox-plugin
  • E2E dev env setup: forbid the update of submodule in presto
  • Enable CI for new develop branch
  • Convertor from substrait VirtualTable to velox ValuesNode
  • avg partial function support
  • struct type support in getOutputSchema of CiderBatch
  • compile option fix (DENABLE_JIT_DEBUG)
  • E2E dev env setup: remove unnecessary dependencies of BDTK
  • [Velox2Substrait] support count(1) + count(col)
  • add filterNode convertor from substrait to velox and support more expr PR
  • groupby performance analysis
  • fix bugs in CiderRuntimeModuleTest to pass UT
  • Rebase velox JoinPrototype with upstream main branch
  • Enable CI/CD in velox join branch
  • [Code refactor] Reconstruct package layout for substrait related code
  • [Code refactor] Velox plugin Join Branch to remove intermediate CiderPlanNode and Cider join related operator
  • Cider benchmark for eq, gt, lt
  • Collect all batches in build side and deliver to probe side
  • Rebase with upstream velox code to remove innersource submodule
  • Convert the velox batch to cider batch and deliver it to cider in probe side
  • invoke the runtime module api to get the join result from cider
  • support neq in Cider
  • Velox Plan Transformer framework: Complete the Velox Node Type for clone
  • Velox Plan Transformer framework: Improve Error Handling
  • Velox Plan Transformer framework: Test Util and More Test Cases for framework
  • add test case for nested expression
  • [bug fix] fix bugs for op "neq"
  • Add Isthmus framework in Cider repo
  • Add DuckDBQueryRunner in test framework
  • Add CiderBatchBuilder CiderBatchChecker in test framework
  • Add CiderQueryRunner in test framework
  • upgrade docker dev env
  • remove -masm=intel from CMAKE_CXX_FLAGS in presto
  • Pattern Match-Rewriter framework main part Implementation
  • Add reusable CiderBatch support for Cider
  • Add reuse CiderBatch support for Velox-Plugin
  • Add QueryDataGenerator for test framework
  • Add null data support for test framework
  • Modify code style and add guidance for ut
  • Make BDTK code style base on Google code style
  • CiderBatch refactor with self allocator
  • partial avg workflow test
  • Substrait expression maker API design
  • RelAlgExecutionUnit wrap for substrait expression
  • presto function registration in Cider
  • expression translation to Analyzer::FunctionOper for Presto function
  • partial avg support of Cider side
  • Development of substrait expression maker
  • Development of Expression evaluator
  • Generate Substrait plan from json file in Cider test framework
  • Add CiderBatch and test framework Date type support
  • Substrait2EU support Date conversion
  • duckdb insert null data
  • Judge isExtensionFunction based on function name and arg types
  • Add a CiderBatch util method for better debug.
  • remove debug print in uts
  • Support project+join+project+readrel plan fragment
  • CiderOperator refactor
  • change all license header to intel ASF
  • fix code-format issue after repo merge
  • add labels
  • update substrait 0.7.0
  • Inner join - one to one hash join support
  • folder placeholder for package refactor
  • Move googletest/duckdb etc. to submodule
  • Plan for NULL representation replacement
  • VeloxPlugin SDLE Task
  • [M2][Functionality][GroupBy][Bug Fix] Wrong integer width picking when using default range
  • [M2][Integration] Presto+ModularSQL Integration
  • Code refactor for CiderRuntime module
  • Data format migration branch setup
  • ColumnVar new data format support.
  • CiderAggHashTable row memory layout update
  • Cider aggregate functions rewrite to support null vector
  • GroupBy codegen procedure update to support new data format
  • enable Cpp lint check for whitespace
  • Cider string type Layout definition
  • cider string type feed input and fetching/parse results
  • Cider string related test framework enhancement
  • String related enhancement
  • Support additional filter in Join quals
  • CiderOperatorBenchmark code refactor
  • Add license and README
  • remove cider-velox proto
  • VarChar Query result fix.
  • remove resultset component
  • CiderBatch Refactor with ArrowSchema & ArrowArray.
  • ROW Type Support for partial avg.
  • Move Cider QueryEngine to exec
  • count(distinct) without groupby support
  • count(distinct) with groupby support
  • MIP: code refactor for Substrait2EU
  • Move runtime function to func
  • Code refactor to set cider exec/compile options with property file
  • New Data Format Integration & Test under Group-by with ColumnarVar.
  • Enable avg test support
  • Enable velox ci
  • Code refactor for Substrait2EU module RelNode refactor
  • Add simple function support after velox update (hard-coded for now)
  • Code refactor for Substrait2EU module Function Look up
  • BinOp new data format support
  • UOp new data format support
  • Add intel and BDTK copyright
  • update velox version
  • remove columnFetcher module
  • nongroupby agg output with null support
  • Improve Hasher to Convert VarChar & Uid to Each Other
  • Support VarChar Group Key in Codegen Procedure
  • Support VarChar Result Fetch in CiderRuntimeModule
  • RawDataConvertor VarChar Support: VeloxVector2CiderBatch
  • RawDataConvertor VarChar Support: CiderBatch2VeloxVector
  • Filter Codegen Procedure Update to Support Null Vector
  • Constant Expression new data format support
  • Project Result Fetching
  • Cider profiling framework
  • Cider benchmark framework
  • bypass Check in release mode
  • V2S join op convertor
  • Transformer support multi-source Cider plan
  • Package refactor
  • add vector<RowVectorPtr> -> CiderBatch converter
  • change namespace to cider and remove some unused files
  • Add BDTK readme
  • CiderBatch Refactor for Better Null Allocation
  • update BDTK readme
  • hotfix for substrait dependency
  • Enable labels for BDTK repo
  • build presto package
  • replace abort
  • Fix non-GroupBy avg with null value
  • Fix Float and GroupBy avg with null value
  • Remove TODO for project and non-groupby and throw exception for unsupported type
  • rebase cider-velox part on WW43
  • rebase cider part on WW43

Sub-Feature

  • upstream this pr into BDTK/BDTK
  • BDTK group by hash table
  • Test plan proposal
  • between/and support in substrait2EU
  • reduce core for build
  • AggregateRel support for Expression IR
  • Groupby support for Expression IR
  • Code refactor to enable hashjoinTest.bigintArrayForCider in JoinPrototype branch
  • Enable dependabot in VeloxPlugin for Velox submodule
  • repo merge
  • DataConvertor support for DATE type
  • [M2][BenchmarkDashboardMaintenance] Performance report&email automation
  • [M2][BenchmarkDashboardMaintenance] Presto + Cider Basic Tuning
  • [M2][BenchmarkDashboardMaintenance] Removing Docker Env
  • [M2][Integration][benchmarkDashboardMaintenance] Implemented CICD scripts
  • [M2][Integration][benchmarkDashboardMaintenance] Update presto-cpp
  • change build method and apply to ci
  • [M2][Integration][benchmarkDashboardMaintenance] Set up presto-java end to end environment.
  • [M2][Integration][benchmarkDashboardMaintenance] Presto integration debug env.
  • [M2][Integration] V2Substrait-Code Refactor use velox registry instead of substrait extension YAML files

Technical Task

  • remove unused component

Limitation

  • Parquet file format support is experimental
  • Hashing with a high-cardinality key is limited (currently, only keys with no more than 100,000 distinct values are supported)