-
-
Notifications
You must be signed in to change notification settings - Fork 0
GPORCA OSS 101
Presenter: Xin Zhang (xzhang@pivotal.io)
Event: GPDB Meetup
Date: April 2017
Data Team at Palo Alto
On Jan 2016, Open Source GPORCA
- Repository: http://github.com/greenplum-db/gporca
- Blog: http://engineering.pivotal.io/post/gporca-open-source/
WHAT is GPORCA architecture?
WHY using GPORCA (pros and cons)?
HOW to contribute to GPORCA?
- Add a feature
- Unit test it
- Contribute back to community
- Jul 2015: PQO: Pivotal Query Optimizer (Released in GPDB 4.3.5.0)
- Jun 2014: ORCA: SIGMOD 2014 Paper
-
GPOPT:
src/backend/gpopt,select gp_opt_version(); - GPORCA = Greenplum ORCA = ...
SELECT s.beer, s.price
FROM Bars b, Sells s
WHERE b.name = s.bar
AND b.city = 'San Francisco'Distribution:
- Bars is distributed randomly
- Sells is distributed by bar
Slice Architecture:
- Slice 1: Motion (Gather) on Master
- Slice 2: Project, HashJoin on Segments
- Slice 3: Filter, Scan on Segments
GPORCA can optimize all 111 TPC-DS Queries (According to SIGMOD 2014 Paper)
- ~60 Logical Operators
- ~50 Physical Operators
- ~40 Scalar Operators
- ~110 Transformation Rules
- Smarter partition elimination
- Multi-level partitioning support (4.3.6)
- Subquery unnesting
- Common table expressions (CTE)
- Join on castable data types
- Improved join ordering (better join order)
- Join-Aggregate reordering (push/pull AGG over join)
- Sort order optimization (avoid sort multiple times)
- Skew awareness (redistribution vs. broadcast)
Host System
┌─────────┬─────────┬─────────┐
│ Parser │ Catalog │Executor │
│ ↓ │ ↑ │ ↑ │
│ Q2DXL │MD Provider│DXL2Plan│
└─────────┴─────────┴─────────┘
│ │ │
↓ ↓ ↑
DXL DXL DXL
Query MD Plan
│ │ │
└─────────┼─────────┘
GPORCA
-
GPOS (Foundation Layer)
- File I/O
- Exception Handling
- Concurrency Control
- Memory Management
- Search Engine & Job Scheduler
- Memo Structure
-
Optimizer Components:
- Operators
- Xforms (Transformation Rules)
- Cost Model
- Property Enforcers
- Cardinality Model
- Metadata Cache
- [Step 0] Pre-Process: e.g., predicates pushdown
- [Step 1] Exploration: All equivalent logical plans
- [Step 2] Statistics Derivation: histograms
- [Step 3] Implementation: Logical to Physical
- [Step 4] MPP Optimization: Enforcing and Costing
Memo Structure:
- Group: Container of equivalent expressions
- Group Expression: Operator that has other groups as its children
Example:
Inner Join(T1.a = T2.b)
├── Get(T1)
└── Get(T2)
Memo Structure:
GROUP 0: [0: Inner Join [1,2]]
GROUP 1: [0: Get (T1) [ ]]
GROUP 2: [0: Get (T2) [ ]]
After applying join commutativity:
GROUP 0: [0: Inner Join [1,2], 1: Inner Join [2,1]]
GROUP 1: [0: Get (T1) [ ]]
GROUP 2: [0: Get (T2) [ ]]
Process:
- ORCA requests required statistics from back-end system
- MD Accessor retrieves statistics from catalog via MD Provider
- Statistics are cached in MD Cache
- Histograms are derived for join columns
Logical to Physical Transformation:
Inner Join (T1.a=T2.b) [1,2] → Inner Hash Join (T1.a=T2.b) [1,2]
├── GROUP 1 ├── GROUP 1
└── GROUP 2 └── GROUP 2
Three Sub-phases:
- Requirement: Distribution and Order requirements
- Costing: Cost calculation for different alternatives
- Enforcement: Property enforcing (Sort, Gather, Redistribute)
Example Enforcement:
Original Plan:
Inner Hash Join
├── Scan(T1)
└── Redistribute(T2.b)
└── Scan(T2)
After Enforcement:
GatherMerge(T1.a)
└── Sort(T1.a)
└── Inner Hash Join
├── Scan(T1)
└── Redistribute(T2.b)
└── Scan(T2)
Goal: Split an aggregate into a pair of local and global aggregate.
Schema:
CREATE TABLE foo (a int, b int, c int) distributed by (a);Query:
SELECT sum(c) FROM foo GROUP BY bStrategy:
- Do local aggregation on segments
- Then global aggregation on master
Before: After:
GpAgg (b) → GpAgg (b)
├── Sum(c) ├── Sum(c)
└── Get(foo) └── GpAgg (b)
├── Sum(c)
└── Get(foo)
File Locations:
-
Header:
~/orca/libgpopt/include/gpopt/xforms -
Source:
~/orca/libgpopt/src/xforms
Transformation Trigger Components:
- Pattern matching
- Pre-condition check
Pattern Definition:
GPOS_NEW(pmp) CExpression(
pmp,
// logical aggregate operator
GPOS_NEW(pmp) CLogicalGbAgg(pmp),
// relational child
GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternLeaf(pmp)),
// scalar project list
GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternTree(pmp))
);Pre-condition Check:
// Compatibility function for splitting aggregates
virtual BOOL FCompatible(CXform::EXformId exfid) {
return (CXform::ExfSplitGbAgg != exfid);
}Purpose: Do not fire this rule on a logical operator produced by the same rule (Avoid Infinite Recursion)
The Actual Transformation:
void Transform(
CXformContext *pxfctxt, // update
CXformResult *pxfres, // output
CExpression *pexpr // input
) const;Register Transformation:
void CXformFactory::Instantiate() {
...
Add(GPOS_NEW(m_pmp) CXformSplitGbAgg(m_pmp));
...
}.
├── cmake
├── concourse
├── data
├── libgpdbcost
├── libgpopt
├── libgpos
├── libnaucrates
├── patches
├── scripts
└── server
Purpose: Memory, task scheduler, exception handling, unit-test framework
libgpos
├── include
└── src
├── common
├── error
├── io
├── memory
├── net
├── string
├── sync
├── task
└── test
Purpose: DXL, metadata, statistics, traceflags
libnaucrates
├── include
│ └── naucrates
│ ├── base
│ ├── dxl
│ │ ├── operators
│ │ ├── parser
│ │ └── xml
│ ├── md
│ ├── statistics
│ └── traceflags
└── src
├── base
├── md
├── operators
├── parser
├── statistics
└── xml
Purpose: Engine, metadata cache, minidump, operators, memo, xform rules
libgpopt
├── include
│ └── gpopt
└── src
├── base
├── engine
├── eval
├── mdcache
├── metadata
├── minidump
├── operators
├── optimizer
├── search
├── translate
└── xforms
Purpose: Cost model implementation
libgpdbcost
├── CMakeLists.txt
├── include
│ └── gpdbcost
└── src
├── CCostModelGPDB.cpp
├── CCostModelGPDBLegacy.cpp
├── CCostModelParamsGPDB.cpp
├── CCostModelParamsGPDBLegacy.cpp
└── ICostModel.cpp
Purpose: Unit tests server
server
├── include
└── src
├── startup
└── unittest
├── dxl
│ ├── base
│ └── statistics
└── gpopt
├── base
├── cost
├── csq
├── engine
├── eval
├── mdcache
├── metadata
├── minidump
├── operators
├── search
├── translate
└── xforms
Purpose: All the test data
data
├── dxl
│ ├── cost
│ ├── csq_tests
│ ├── expressiontests
│ ├── indexjoin
│ ├── metadata
│ ├── minidump
│ │ ├── CArrayExpansionTest
│ │ ├── CJoinOrderDPTest
│ │ ├── CPhysicalParallelUnionAllTest
│ │ ├── CPruneColumnsTest
│ │ └── sql
│ ├── multilevel-partitioning
│ ├── parse_tests
│ ├── plstmt
│ ├── query
│ ├── search
│ ├── statistics
│ ├── tpcds
│ ├── tpcds-partitioned
│ ├── tpch
│ └── tpch-partitioned
- GP-XERCES: https://github.com/greenplum-db/gp-xerces
- CMake 3.0+
mkdir build
cd build
cmake ../
make && make installCI/CD: https://ci.orca.pivotalci.info/teams/main/pipelines/gporca
# run all unit tests
ctest
# run all unit tests in parallel with 7 threads
ctest -j7
# run only one unit test called CAggTest
./server/gporca_test -U CAggTestFollow instructions from: https://github.com/d/bug-free-fortnight
It's very useful to verify installcheck-good locally with latest GPORCA changes.
GPORCA relies on Traceflags to change runtime behavior: Traceflag.h
Exposed in GPDB as GUC (Grand Unified Configuration): guc_gp.c
-- turn on GPORCA
set optimizer=on;
-- print input query (GPORCA TF 101000)
set optimizer_print_query=on;Turn on the minidump:
set client_min_messages='log';
set optimizer=on;
set optimizer_enable_constant_expression_evaluation=off;
set optimizer_enumerate_plans=on;
set optimizer_minidump=always;Run a query: GPORCA creates a *.mdp file in the $MASTER_DATA_DIRECTORY/minidump
# run only one minidump directly
./server/gporca_test -d ../data/dxl/minidump/TVFRandom.mdpDebug the plans:
set client_min_messages='log';
set optimizer=on;
set optimizer_print_query=on; -- input query, and preprocessed query
set optimizer_print_plan=on; -- output final physical planTurn on the plan enumerations:
set client_min_messages='log';
set optimizer=on;
set optimizer_enumerate_plans=on;Pick a plan out of search space:
set optimizer=on;
set client_min_messages='log';
set optimizer_enumerate_plans=on;
set optimizer_plan_id=1;Debug optimizer stages:
set client_min_messages='log';
set optimizer=on;
set optimizer_print_optimization_stats=on;Debug the transformation rules details:
set client_min_messages='log';
set optimizer=on;
set optimizer_print_xform=on;set optimizer_print_memo_after_exploration=on;
set optimizer_print_memo_after_implementation=on;
set optimizer_print_memo_after_optimization=on;ROOT group is indicated as ROOT
select disable_xform('CXformJoinAssociativity');
select enable_xform('CXformJoinAssociativity');All the xform rules can be found from the class names under libgpopt/include/gpopt/xforms
CXformFactory::Instantiate lists all the activated xform rules (~130 rules)
// Entry point of optimizer
COptimizer::PdxlnOptimize
// DXL: Translate DXL into Query
CTranslatorDXLToExpr::PexprTranslateQuery
// Step 1: Pre-processor
CExpressionPreprocessor::PexprPreprocess
// Step 2-3-4: Optimization
COptimizer::PexprOptimize
// Individual rule transformation, all CXform* classes
CXformSplitGbAgg::Transform
// Enforceable Property
CEngine::FCheckEnfdProps
CPartitionPropagationSpec::AppendEnforcers
// DXL: Translate Plan back in DXL
CTranslatorExprToDXL::PdxlnTranslate- Fork GPORCA at: https://github.com/greenplum-db/gporca
- Pick an issue: https://github.com/greenplum-db/gporca/issues
- Send a Pull Request (PR)
Join GPORCA mailing list: gpdb-users+subscribe@greenplum.org
Contact: xzhang@pivotal.io
Q: What happen to SORT generated in Enforcement?
A: There is only ONE implementation of SORT, so, that's a physical operator and won't be pushed anywhere after enforcement.
Q: Why CXformResult has more than one alternatives?
A: Some rule (e.g. CXformExpandNAryJoinDP) can produce multiple choices after transformation
Q: How hard to make GPORCA adapt to a new host?
A: The hard part is on the MD translation. So far, GPORCA is very Postgres friendly.
Q: Why there is no separate SQL parser included in GPORCA?
A: GPORCA focused on relational algebra and let host handle the binding, view expansions, and permissions.
Q: How to add a new property like 'reliability' or 'dollar cost' to this optimizer?
A: That can be added to Property Enforcement as the Order/Distribution/Partition/Rewindability. For example, if people want to favor a more 'reliable' data source, they can add a CEnfdReliability class to cost that choice. It's an interesting combination of 'reliability' and 'dollar cost', usually, when it's more reliable is more expensive. It's an interesting balance to achieve.
Q: How long does the ctest run?
A: Around 5min on 2.8Ghz Intel i7. Running with ctest -j7 finished in < 2min.
Q: Is GPORCA multi-threaded?
A: It's multi-thread READY, but currently, we run with single thread. There are still few caveats (thread safe issues) to iron out before we can fully turn it on.
Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014
Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, Rhonda Baldwin
Optimization of Common Table Expressions in MPP Database Systems, VLDB 2015
Amr El-Helw, Venkatesh Raghavan, Mohamed A. Soliman, George C. Caragea, Zhongxian Gu, Michalis Petropoulos.
Optimizing Queries over Partitioned Tables in MPP Systems, SIGMOD 2014
Lyublena Antova, Amr El-Helw, Mohamed Soliman, Zhongxian Gu, Michalis Petropoulos, Florian Waas
Reversing Statistics for Scalable Test Databases Generation, DBTest 2013
Entong Shen, Lyublena Antova
Total Operator State Recall - Cost-Effective Reuse of Results in Greenplum Database, ICDE Workshops 2013
George C. Caragea, Carlos Garcia-Alvarado, Michalis Petropoulos, Florian M. Waas
Testing the Accuracy of Query Optimizers, DBTest 2012
Zhongxian Gu, Mohamed A. Soliman, Florian M. Waas
Automatic Capture of Minimal, Portable, and Executable Bug Repros using AMPERe, DBTest 2012
Lyublena Antova, Konstantinos Krikellas, Florian M. Waas
Automatic Data Placement in MPP Databases, ICDE Workshops 2012
Carlos Garcia-Alvarado, Venkatesh Raghavan, Sivaramakrishnan Narayanan, Florian M. Waas