GPORCA OSS 101

Presenter: Xin Zhang (xzhang@pivotal.io)
Event: GPDB Meetup
Date: April 2017

Who, Where, When

Data Team at Palo Alto

On Jan 2016, Open Source GPORCA

Repository: http://github.com/greenplum-db/gporca
Blog: http://engineering.pivotal.io/post/gporca-open-source/

What, Why, How

WHAT is GPORCA architecture?

WHY using GPORCA (pros and cons)?

HOW to contribute to GPORCA?

At the End of the Talk, You CAN

Add a feature
Unit test it
Contribute back to community

GPORCA Overview

Evolution Timeline

Jul 2015: PQO: Pivotal Query Optimizer (Released in GPDB 4.3.5.0)
Jun 2014: ORCA: SIGMOD 2014 Paper

In Source Code

GPOPT: src/backend/gpopt, select gp_opt_version();
GPORCA = Greenplum ORCA = ...

Query Plan Example

Example Query

SELECT s.beer, s.price
FROM Bars b, Sells s
WHERE b.name = s.bar
AND b.city = 'San Francisco'

Distribution:

Bars is distributed randomly
Sells is distributed by bar

Query Execution Plan Structure

Slice Architecture:

Slice 1: Motion (Gather) on Master
Slice 2: Project, HashJoin on Segments
Slice 3: Filter, Scan on Segments

Performance Capabilities

GPORCA can optimize all 111 TPC-DS Queries (According to SIGMOD 2014 Paper)

Extensive SQL Surface Coverage

~60 Logical Operators
~50 Physical Operators
~40 Scalar Operators
~110 Transformation Rules

Key Features vs. Planner

Smarter partition elimination
Multi-level partitioning support (4.3.6)
Subquery unnesting
Common table expressions (CTE)
Join on castable data types
Improved join ordering (better join order)
Join-Aggregate reordering (push/pull AGG over join)
Sort order optimization (avoid sort multiple times)
Skew awareness (redistribution vs. broadcast)

Architecture

DXL API and Multi-Host Support

Host System
┌─────────┬─────────┬─────────┐
│ Parser  │ Catalog │Executor │
│    ↓    │    ↑    │    ↑    │
│ Q2DXL   │MD Provider│DXL2Plan│
└─────────┴─────────┴─────────┘
     │         │         │
     ↓         ↓         ↑
   DXL       DXL       DXL
  Query       MD       Plan
     │         │         │
     └─────────┼─────────┘
              GPORCA

GPORCA Architecture Components

Kernel Level

GPOS (Foundation Layer)
- File I/O
- Exception Handling
- Concurrency Control
- Memory Management

Optimizer Tools Layer

Search Engine & Job Scheduler
Memo Structure
Optimizer Components:
- Operators
- Xforms (Transformation Rules)
- Cost Model
- Property Enforcers
- Cardinality Model
- Metadata Cache

Optimization Steps in Orca

[Step 0] Pre-Process: e.g., predicates pushdown
[Step 1] Exploration: All equivalent logical plans
[Step 2] Statistics Derivation: histograms
[Step 3] Implementation: Logical to Physical
[Step 4] MPP Optimization: Enforcing and Costing

Step 1 Exploration: Memoization

Memo Structure:

Group: Container of equivalent expressions
Group Expression: Operator that has other groups as its children

Example:

Inner Join(T1.a = T2.b)
├── Get(T1)
└── Get(T2)

Memo Structure:
GROUP 0: [0: Inner Join [1,2]]
GROUP 1: [0: Get (T1) [ ]]  
GROUP 2: [0: Get (T2) [ ]]

Step 1 Exploration: Transformation

After applying join commutativity:

GROUP 0: [0: Inner Join [1,2], 1: Inner Join [2,1]]
GROUP 1: [0: Get (T1) [ ]]
GROUP 2: [0: Get (T2) [ ]]

Step 2 Statistics Derivation

Process:

ORCA requests required statistics from back-end system
MD Accessor retrieves statistics from catalog via MD Provider
Statistics are cached in MD Cache
Histograms are derived for join columns

Step 3 Implementation

Logical to Physical Transformation:

Inner Join (T1.a=T2.b) [1,2]  →  Inner Hash Join (T1.a=T2.b) [1,2]
├── GROUP 1                       ├── GROUP 1  
└── GROUP 2                       └── GROUP 2

Step 4 MPP Optimization

Three Sub-phases:

Requirement: Distribution and Order requirements
Costing: Cost calculation for different alternatives
Enforcement: Property enforcing (Sort, Gather, Redistribute)

Example Enforcement:

Original Plan:
Inner Hash Join
├── Scan(T1)
└── Redistribute(T2.b)
    └── Scan(T2)

After Enforcement:
GatherMerge(T1.a)
└── Sort(T1.a)
    └── Inner Hash Join
        ├── Scan(T1)
        └── Redistribute(T2.b)
            └── Scan(T2)

Development

Walkthrough: Split Groupby Aggregate

Goal: Split an aggregate into a pair of local and global aggregate.

Schema:

CREATE TABLE foo (a int, b int, c int) distributed by (a);

Query:

SELECT sum(c) FROM foo GROUP BY b

Strategy:

Do local aggregation on segments
Then global aggregation on master

Transformation Visualization

Before:                    After:
GpAgg (b)          →      GpAgg (b)
├── Sum(c)                ├── Sum(c)
└── Get(foo)              └── GpAgg (b)
                              ├── Sum(c)  
                              └── Get(foo)

Implementation: CXformSplitGbAgg

File Locations:

Header: ~/orca/libgpopt/include/gpopt/xforms
Source: ~/orca/libgpopt/src/xforms

Transformation Trigger Components:

Pattern matching
Pre-condition check

Pattern Definition:

GPOS_NEW(pmp) CExpression(
    pmp,
    // logical aggregate operator
    GPOS_NEW(pmp) CLogicalGbAgg(pmp), 
    // relational child
    GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternLeaf(pmp)),
    // scalar project list
    GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternTree(pmp))
);

Pre-condition Check:

// Compatibility function for splitting aggregates
virtual BOOL FCompatible(CXform::EXformId exfid) {
    return (CXform::ExfSplitGbAgg != exfid);
}

Purpose: Do not fire this rule on a logical operator produced by the same rule (Avoid Infinite Recursion)

The Actual Transformation:

void Transform(
    CXformContext *pxfctxt, // update
    CXformResult *pxfres,   // output  
    CExpression *pexpr      // input
) const;

Register Transformation:

void CXformFactory::Instantiate() {
    ...
    Add(GPOS_NEW(m_pmp) CXformSplitGbAgg(m_pmp));
    ...
}

Code Base

Top Level Structure

.
├── cmake
├── concourse  
├── data
├── libgpdbcost
├── libgpopt
├── libgpos
├── libnaucrates
├── patches
├── scripts
└── server

libgpos: Foundation Layer

Purpose: Memory, task scheduler, exception handling, unit-test framework

libgpos
├── include
└── src
    ├── common
    ├── error
    ├── io
    ├── memory
    ├── net
    ├── string
    ├── sync
    ├── task
    └── test

libnaucrates: Data Exchange Layer

Purpose: DXL, metadata, statistics, traceflags

libnaucrates
├── include
│   └── naucrates
│       ├── base
│       ├── dxl
│       │   ├── operators
│       │   ├── parser
│       │   └── xml
│       ├── md
│       ├── statistics
│       └── traceflags
└── src
    ├── base
    ├── md
    ├── operators
    ├── parser
    ├── statistics
    └── xml

libgpopt: Optimizer Core

Purpose: Engine, metadata cache, minidump, operators, memo, xform rules

libgpopt
├── include
│   └── gpopt
└── src
    ├── base
    ├── engine
    ├── eval
    ├── mdcache
    ├── metadata
    ├── minidump
    ├── operators
    ├── optimizer
    ├── search
    ├── translate
    └── xforms

libgpdbcost: Cost Model

Purpose: Cost model implementation

libgpdbcost
├── CMakeLists.txt
├── include
│   └── gpdbcost
└── src
    ├── CCostModelGPDB.cpp
    ├── CCostModelGPDBLegacy.cpp
    ├── CCostModelParamsGPDB.cpp
    ├── CCostModelParamsGPDBLegacy.cpp
    └── ICostModel.cpp

server: Unit Tests

Purpose: Unit tests server

server
├── include
└── src
    ├── startup
    └── unittest
        ├── dxl
        │   ├── base
        │   └── statistics
        └── gpopt
            ├── base
            ├── cost
            ├── csq
            ├── engine
            ├── eval
            ├── mdcache
            ├── metadata
            ├── minidump
            ├── operators
            ├── search
            ├── translate
            └── xforms

data: Test Data

Purpose: All the test data

data
├── dxl
│   ├── cost
│   ├── csq_tests
│   ├── expressiontests
│   ├── indexjoin
│   ├── metadata
│   ├── minidump
│   │   ├── CArrayExpansionTest
│   │   ├── CJoinOrderDPTest
│   │   ├── CPhysicalParallelUnionAllTest
│   │   ├── CPruneColumnsTest
│   │   └── sql
│   ├── multilevel-partitioning
│   ├── parse_tests
│   ├── plstmt
│   ├── query
│   ├── search
│   ├── statistics
│   ├── tpcds
│   ├── tpcds-partitioned
│   ├── tpch
│   └── tpch-partitioned

Build

Dependencies

GP-XERCES: https://github.com/greenplum-db/gp-xerces
CMake 3.0+

CMake Build Process

mkdir build
cd build
cmake ../
make && make install

Concourse Pipeline

CI/CD: https://ci.orca.pivotalci.info/teams/main/pipelines/gporca

Test

Test GPORCA (<2min)

# run all unit tests
ctest

# run all unit tests in parallel with 7 threads  
ctest -j7

# run only one unit test called CAggTest
./server/gporca_test -U CAggTest

Test with GPDB OSS in Docker

Follow instructions from: https://github.com/d/bug-free-fortnight

It's very useful to verify installcheck-good locally with latest GPORCA changes.

Debug

GPORCA Traceflags and GPDB GUC

GPORCA relies on Traceflags to change runtime behavior: Traceflag.h
Exposed in GPDB as GUC (Grand Unified Configuration): guc_gp.c

-- turn on GPORCA
set optimizer=on;

-- print input query (GPORCA TF 101000)
set optimizer_print_query=on;

Minidump: DXL Document

Turn on the minidump:

set client_min_messages='log';
set optimizer=on;
set optimizer_enable_constant_expression_evaluation=off;
set optimizer_enumerate_plans=on;
set optimizer_minidump=always;

Run a query: GPORCA creates a *.mdp file in the $MASTER_DATA_DIRECTORY/minidump

# run only one minidump directly
./server/gporca_test -d ../data/dxl/minidump/TVFRandom.mdp

Input Query, Output Plan

Debug the plans:

set client_min_messages='log';
set optimizer=on;
set optimizer_print_query=on; -- input query, and preprocessed query
set optimizer_print_plan=on;  -- output final physical plan

Plan Enumeration

Turn on the plan enumerations:

set client_min_messages='log';
set optimizer=on;
set optimizer_enumerate_plans=on;

Pick a plan out of search space:

set optimizer=on;
set client_min_messages='log';
set optimizer_enumerate_plans=on;
set optimizer_plan_id=1;

Optimization Stats and Xform Rules

Debug optimizer stages:

set client_min_messages='log';
set optimizer=on;
set optimizer_print_optimization_stats=on;

Debug the transformation rules details:

set client_min_messages='log';
set optimizer=on;
set optimizer_print_xform=on;

MEMO Groups

set optimizer_print_memo_after_exploration=on;
set optimizer_print_memo_after_implementation=on;
set optimizer_print_memo_after_optimization=on;

ROOT group is indicated as ROOT

Way to Disable Xform Rules

select disable_xform('CXformJoinAssociativity');
select enable_xform('CXformJoinAssociativity');

All the xform rules can be found from the class names under libgpopt/include/gpopt/xforms

CXformFactory::Instantiate lists all the activated xform rules (~130 rules)

Useful Breakpoints

// Entry point of optimizer
COptimizer::PdxlnOptimize

// DXL: Translate DXL into Query
CTranslatorDXLToExpr::PexprTranslateQuery

// Step 1: Pre-processor
CExpressionPreprocessor::PexprPreprocess

// Step 2-3-4: Optimization
COptimizer::PexprOptimize

// Individual rule transformation, all CXform* classes
CXformSplitGbAgg::Transform

// Enforceable Property
CEngine::FCheckEnfdProps
CPartitionPropagationSpec::AppendEnforcers

// DXL: Translate Plan back in DXL
CTranslatorExprToDXL::PdxlnTranslate

Contribute

1-2-3

Fork GPORCA at: https://github.com/greenplum-db/gporca
Pick an issue: https://github.com/greenplum-db/gporca/issues
Send a Pull Request (PR)

Join GPORCA mailing list: gpdb-users+subscribe@greenplum.org
Contact: xzhang@pivotal.io

FAQ

Q: What happen to SORT generated in Enforcement?
A: There is only ONE implementation of SORT, so, that's a physical operator and won't be pushed anywhere after enforcement.

Q: Why CXformResult has more than one alternatives?
A: Some rule (e.g. CXformExpandNAryJoinDP) can produce multiple choices after transformation

Q: How hard to make GPORCA adapt to a new host?
A: The hard part is on the MD translation. So far, GPORCA is very Postgres friendly.

Q: Why there is no separate SQL parser included in GPORCA?
A: GPORCA focused on relational algebra and let host handle the binding, view expansions, and permissions.

Q: How to add a new property like 'reliability' or 'dollar cost' to this optimizer?
A: That can be added to Property Enforcement as the Order/Distribution/Partition/Rewindability. For example, if people want to favor a more 'reliable' data source, they can add a CEnfdReliability class to cost that choice. It's an interesting combination of 'reliability' and 'dollar cost', usually, when it's more reliable is more expensive. It's an interesting balance to achieve.

Q: How long does the ctest run?
A: Around 5min on 2.8Ghz Intel i7. Running with ctest -j7 finished in < 2min.

Q: Is GPORCA multi-threaded?
A: It's multi-thread READY, but currently, we run with single thread. There are still few caveats (thread safe issues) to iron out before we can fully turn it on.

Publications

Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014
Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, Rhonda Baldwin

Optimization of Common Table Expressions in MPP Database Systems, VLDB 2015
Amr El-Helw, Venkatesh Raghavan, Mohamed A. Soliman, George C. Caragea, Zhongxian Gu, Michalis Petropoulos.

Optimizing Queries over Partitioned Tables in MPP Systems, SIGMOD 2014
Lyublena Antova, Amr El-Helw, Mohamed Soliman, Zhongxian Gu, Michalis Petropoulos, Florian Waas

Reversing Statistics for Scalable Test Databases Generation, DBTest 2013
Entong Shen, Lyublena Antova

Total Operator State Recall - Cost-Effective Reuse of Results in Greenplum Database, ICDE Workshops 2013
George C. Caragea, Carlos Garcia-Alvarado, Michalis Petropoulos, Florian M. Waas

Testing the Accuracy of Query Optimizers, DBTest 2012
Zhongxian Gu, Mohamed A. Soliman, Florian M. Waas

Automatic Capture of Minimal, Portable, and Executable Bug Repros using AMPERe, DBTest 2012
Lyublena Antova, Konstantinos Krikellas, Florian M. Waas

Automatic Data Placement in MPP Databases, ICDE Workshops 2012
Carlos Garcia-Alvarado, Venkatesh Raghavan, Sivaramakrishnan Narayanan, Florian M. Waas

Uh oh!

GPORCA OSS 101

GPORCA OSS 101

Table of Contents

Who, Where, When

What, Why, How

At the End of the Talk, You CAN

GPORCA Overview

Evolution Timeline

In Source Code

Query Plan Example

Example Query

Query Execution Plan Structure

Performance Capabilities

Extensive SQL Surface Coverage

Key Features vs. Planner

Architecture

DXL API and Multi-Host Support

GPORCA Architecture Components

Kernel Level

Optimizer Tools Layer

Optimization Steps in Orca

Step 1 Exploration: Memoization

Step 1 Exploration: Transformation

Step 2 Statistics Derivation

Step 3 Implementation

Step 4 MPP Optimization

Development

Walkthrough: Split Groupby Aggregate

Transformation Visualization

Implementation: CXformSplitGbAgg

Code Base

Top Level Structure

libgpos: Foundation Layer

libnaucrates: Data Exchange Layer

libgpopt: Optimizer Core

libgpdbcost: Cost Model

server: Unit Tests

data: Test Data

Build

Dependencies

CMake Build Process

Concourse Pipeline

Test

Test GPORCA (<2min)

Test with GPDB OSS in Docker

Debug

GPORCA Traceflags and GPDB GUC

Minidump: DXL Document

Input Query, Output Plan

Plan Enumeration

Optimization Stats and Xform Rules

MEMO Groups

Way to Disable Xform Rules

Useful Breakpoints

Contribute

1-2-3

FAQ

Publications

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally