# Access Plan Comparison

This section is intended to establish a structured approach to access plan analysis. The inherit nature of an SQL (CBO) access plan is derived from an acyclic tree structure. Each node in the tree structure, denotes a data access operator, instructing the underlying engine how best to access data from the database. Representing access plan is particularly useful since this allows efficient traversal of the access plan, where in the most crucial data access operators span from the bottom up of the tree (SQL costs start from the children operators,  and culminate at the root operator). 

Past literature covers this modelling aspect to some degree:

***
Plan Selection based on Query Clustering - http://www.vldb.org/conf/2002/S06P02.pdf

Through a tool acroynmed as 'PLASTIC' (PLan Selection Through Incremental Clustering), optimizer access plan have been successfully modelled using acyclic tree. The authors, also go on to establish a similarity check technique, called SIMCHECK:

The SIMCHECK algorithm, whose pseudocode is shown in Figure 5, takes as input two query feature vectors and
outputs a boolean value indicating whether or not they are similar.   The algorithm operates  in  two  phases,  “Feature Vector Comparisons” and “Mapping Tables”.  In the first phase, the feature vectors are compared for equality on the number of tables, the sum of the table degrees, and the sum of  the  join  index  and  predicate  counts.   Only  if  there  is equality on all these structural features is the second phase invoked, otherwise the queries are deemed to be dis-similar. The  equality check is  done first  in  order to  identify dissimilar queries as early and as simply as possible.  For example, it is obvious that if the number of tables in the two
queries do not match, then their plans will also necessarily have to be different.  Such structural feature checks are used as an effective mechanism for stopping unproductive matching at an early stage.

<div style="width:image width px; font-size:80%; text-align:center;"><img src='Images/simcheck.png' alt="alternate text" width="width" height="height" style="padding-bottom:0.5em;" /><b>Simcheck Pseudocode</b></div>

In the Mapping Tables phase, we attempt to establish the closest possible one-to-one correspondence between the tables of the two queries. The tables are mapped to each other in order to check whether it is possible for the optimizer to use similar plans for accessing the mapped tables. The first step in this process is to determine the sets of compatible tables.  For every possible pair of compatible tables, SIMCHECK checks whether their original and (estimated) effective sizes are comparable through the use of a distance function.  If the outcome of the distance computations is less than a threshold value which is an algorithmic parameter, the queries are said to be similar. The notion of compatibility and the distance function are elucidated below.

__Table Compatibility__

We define two tables to be compatible if the degrees, join index counts and predicate counts are the same for both tables. The rationale for this notion of compatibility is explained below. Let  us  first  consider predicate counts. The predicate count for table in Figure 4(a) is  (2,1) since there are
two SARGable predicates and one non-SARGable predicate. Similarly, for table in Figure 4(b), the predicate count is (1, 2), and by our definition the tables are not compatible.  This makes intuitive sense when viewed in light
of the fact that if a predicate on a table is not SARGable, an optimizer cannot use an index to access that table. Thus, plans can change considerably even if the two queries differ on only a single table with respect to this criteria. A similar and stronger argument holds for join index counts. If indexes are available for a join predicate in one query and not in the other, it is very likely that the plans for the two queries will not match. This is because if both the attributes in a join predicate are indexed and the selectivities of the tables are high then it is possible to choose a plan involving an index join.  Similarly, if one of the attributes is indexed then the optimizer may choose to index on one table and fetch (table scan) on the other.
Note  that  even  if  the  join  index  counts  and  predicate counts for two queries match, the plans chosen by the optimizer may differ as there are other statistical factors such as the table sizes that affect plan choices. These factors are captured in the distance function discussed next.

__Query Distance Function__

After  compatible tables  are  identified,  SIMCHECK  tries to establish valid one-to-one mappings between the sets of compatible tables. These mappings are then compared using their original and estimated effective sizes, through a distance function dist(T1, T2), where T1 and T2 are the tables whose distance is to be computed.
***

In [15]:
# pandas
import pandas as pd
print('pandas: %s' % pd.__version__)
# numpy
import numpy as np
print('numpy: %s' % np.__version__)
# matplotlib
import matplotlib.pyplot as plt
# sklearn
import sklearn as sk
from sklearn import preprocessing
from sklearn.metrics.pairwise import euclidean_distances
#
# AnyTree
from anytree import Node, RenderTree, PostOrderIter

pandas: 0.23.4
numpy: 1.15.2


### Configuration Cell

Tweak parametric changes from this cell to influence outcome of experiment

In [16]:
# Experiment Config
tpcds='TPCDS1' # Schema upon which to operate test
test_split=.2
y_labels = ['COST',
            'CARDINALITY',
            'BYTES',
            'CPU_COST',
            'IO_COST',
            'TEMP_SPACE',
            'TIME']
black_list = ['TIMESTAMP',
              'SQL_ID',
              'OPERATION',
              'OPTIONS',
              'OBJECT_NAME',
              'OBJECT_OWNER',
              'PARTITION_STOP',
              'PARTITION_START'] # Columns which will be ignored during type conversion, and later used for aggregation
nrows = 10000

### Read data from file into pandas dataframes

In [17]:
# Root path
root_dir = 'C:/Users/gabriel.sammut/University/Data_ICS5200/Schedule/' + tpcds
# root_dir = 'D:/Projects/Datagenerated_ICS5200/Schedule/' + tpcds

rep_vsql_plan_path = root_dir + '/rep_vsql_plan.csv'
#rep_vsql_plan_path = root_dir + '/rep_vsql_plan.csv'

dtype={'COST':'int64',
       'CARDINALITY':'int64',
       'BYTES':'int64',
       'CPU_COST':'int64',
       'IO_COST':'int64',
       'TEMP_SPACE':'int64',
       'TIME':'int64',
       'OPERATION':'str',
       'OBJECT_NAME':'str'}
rep_vsql_plan_df = pd.read_csv(rep_vsql_plan_path, nrows=nrows, dtype=dtype)
print(rep_vsql_plan_df.head())
#
def prettify_header(headers):
    """
    Cleans header list from unwated character strings
    """
    header_list = []
    [header_list.append(header.replace("(","").replace(")","").replace("'","").replace(",","")) for header in headers]
    return header_list
#
rep_vsql_plan_df.columns = prettify_header(rep_vsql_plan_df.columns.values)
print('------------------------------------------')
print(rep_vsql_plan_df.columns)

    ('DBID',)    ('SQL_ID',)  ('PLAN_HASH_VALUE',)  ('ID',)    ('OPERATION',)  \
0  2634225673  dxv968j0352kb             103598129        0  SELECT STATEMENT   
1  2634225673  dxv968j0352kb             103598129        1              SORT   
2  2634225673  dxv968j0352kb             103598129        2    PX COORDINATOR   
3  2634225673  dxv968j0352kb             103598129        3           PX SEND   
4  2634225673  dxv968j0352kb             103598129        4              SORT   

  ('OPTIONS',) ('OBJECT_NODE',)  ('OBJECT#',) ('OBJECT_OWNER',)  \
0          NaN              NaN           NaN               NaN   
1     GROUP BY              NaN           NaN               NaN   
2          NaN              NaN           NaN               NaN   
3  QC (RANDOM)           :Q1001           NaN               SYS   
4     GROUP BY           :Q1001           NaN               NaN   

  ('OBJECT_NAME',)     ...     ('ACCESS_PREDICATES',) ('FILTER_PREDICATES',)  \
0              NaN     ...    

### Read outlier data from file into pandas dataframes and concatenate

In [18]:
#
# CSV Outlier Paths
outlier_hints_q5_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_5.csv'
outlier_hints_q10_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_10.csv'
outlier_hints_q14_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_14.csv'
outlier_hints_q18_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_18.csv'
outlier_hints_q22_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_22.csv'
outlier_hints_q27_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_27.csv'
outlier_hints_q35_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_35.csv'
outlier_hints_q36_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_36.csv'
outlier_hints_q51_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_51.csv'
outlier_hints_q67_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_67.csv'
outlier_hints_q70_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_70.csv'
outlier_hints_q77_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_77.csv'
outlier_hints_q80_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_80.csv'
outlier_hints_q86_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/hints/output/query_86.csv'
#
outlier_predicates_q5_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_5.csv'
outlier_predicates_q10_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_10.csv'
outlier_predicates_q14_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_14.csv'
outlier_predicates_q18_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_18.csv'
outlier_predicates_q22_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_22.csv'
outlier_predicates_q27_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_27.csv'
outlier_predicates_q35_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_35.csv'
outlier_predicates_q36_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_36.csv'
outlier_predicates_q51_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_51.csv'
outlier_predicates_q67_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_67.csv'
outlier_predicates_q70_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_70.csv'
outlier_predicates_q77_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_77.csv'
outlier_predicates_q80_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_80.csv'
outlier_predicates_q86_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/predicates/output/query_86.csv'
#
outlier_rownum_q5_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_5.csv'
outlier_rownum_q10_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_10.csv'
outlier_rownum_q14_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_14.csv'
outlier_rownum_q18_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_18.csv'
outlier_rownum_q22_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_22.csv'
outlier_rownum_q27_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_27.csv'
outlier_rownum_q35_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_35.csv'
outlier_rownum_q36_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_36.csv'
outlier_rownum_q51_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_51.csv'
outlier_rownum_q67_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_67.csv'
outlier_rownum_q70_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_70.csv'
outlier_rownum_q77_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_77.csv'
outlier_rownum_q80_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_80.csv'
outlier_rownum_q86_path = 'C:/Users/gabriel.sammut/University/ICS5200/src/sql/Runtime/TPC-DS/' + tpcds + '/Variants/rownum/output/query_86.csv'
#
# Read CSV Paths
outlier_hints_q5_df = pd.read_csv(outlier_hints_q5_path,dtype=str)
outlier_hints_q10_df = pd.read_csv(outlier_hints_q10_path,dtype=str)
outlier_hints_q14_df = pd.read_csv(outlier_hints_q14_path,dtype=str)
outlier_hints_q18_df = pd.read_csv(outlier_hints_q18_path,dtype=str)
outlier_hints_q22_df = pd.read_csv(outlier_hints_q22_path,dtype=str)
outlier_hints_q27_df = pd.read_csv(outlier_hints_q27_path,dtype=str)
outlier_hints_q35_df = pd.read_csv(outlier_hints_q35_path,dtype=str)
outlier_hints_q36_df = pd.read_csv(outlier_hints_q36_path,dtype=str)
outlier_hints_q51_df = pd.read_csv(outlier_hints_q51_path,dtype=str)
outlier_hints_q67_df = pd.read_csv(outlier_hints_q67_path,dtype=str)
outlier_hints_q70_df = pd.read_csv(outlier_hints_q70_path,dtype=str)
outlier_hints_q77_df = pd.read_csv(outlier_hints_q77_path,dtype=str)
outlier_hints_q80_df = pd.read_csv(outlier_hints_q80_path,dtype=str)
outlier_hints_q86_df = pd.read_csv(outlier_hints_q86_path,dtype=str)
#
outlier_predicates_q5_df = pd.read_csv(outlier_predicates_q5_path,dtype=str)
outlier_predicates_q10_df = pd.read_csv(outlier_predicates_q10_path,dtype=str)
outlier_predicates_q14_df = pd.read_csv(outlier_predicates_q14_path,dtype=str)
outlier_predicates_q18_df = pd.read_csv(outlier_predicates_q18_path,dtype=str)
outlier_predicates_q22_df = pd.read_csv(outlier_predicates_q22_path,dtype=str)
outlier_predicates_q27_df = pd.read_csv(outlier_predicates_q27_path,dtype=str)
outlier_predicates_q35_df = pd.read_csv(outlier_predicates_q35_path,dtype=str)
outlier_predicates_q36_df = pd.read_csv(outlier_predicates_q36_path,dtype=str)
outlier_predicates_q51_df = pd.read_csv(outlier_predicates_q51_path,dtype=str)
outlier_predicates_q67_df = pd.read_csv(outlier_predicates_q67_path,dtype=str)
outlier_predicates_q70_df = pd.read_csv(outlier_predicates_q70_path,dtype=str)
outlier_predicates_q77_df = pd.read_csv(outlier_predicates_q77_path,dtype=str)
outlier_predicates_q80_df = pd.read_csv(outlier_predicates_q80_path,dtype=str)
outlier_predicates_q86_df = pd.read_csv(outlier_predicates_q86_path,dtype=str)
#
outlier_rownum_q5_df = pd.read_csv(outlier_rownum_q5_path,dtype=str)
outlier_rownum_q10_df = pd.read_csv(outlier_rownum_q10_path,dtype=str)
outlier_rownum_q14_df = pd.read_csv(outlier_rownum_q14_path,dtype=str)
outlier_rownum_q18_df = pd.read_csv(outlier_rownum_q18_path,dtype=str)
outlier_rownum_q22_df = pd.read_csv(outlier_rownum_q22_path,dtype=str)
outlier_rownum_q27_df = pd.read_csv(outlier_rownum_q27_path,dtype=str)
outlier_rownum_q35_df = pd.read_csv(outlier_rownum_q35_path,dtype=str)
outlier_rownum_q36_df = pd.read_csv(outlier_rownum_q36_path,dtype=str)
outlier_rownum_q51_df = pd.read_csv(outlier_rownum_q51_path,dtype=str)
outlier_rownum_q67_df = pd.read_csv(outlier_rownum_q67_path,dtype=str)
outlier_rownum_q70_df = pd.read_csv(outlier_rownum_q70_path,dtype=str)
outlier_rownum_q77_df = pd.read_csv(outlier_rownum_q77_path,dtype=str)
outlier_rownum_q80_df = pd.read_csv(outlier_rownum_q80_path,dtype=str)
outlier_rownum_q86_df = pd.read_csv(outlier_rownum_q86_path,dtype=str)
#
# Merge dataframes into a single pandas matrix
df_outliers = pd.concat([outlier_hints_q5_df, outlier_hints_q10_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q14_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q18_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q22_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q27_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q35_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q36_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q51_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q67_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q70_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q77_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q80_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_hints_q86_df], sort=False)
#
df_outliers = pd.concat([df_outliers, outlier_predicates_q5_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q10_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q14_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q18_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q22_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q27_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q35_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q36_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q51_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q67_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q70_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q77_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q80_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_predicates_q86_df], sort=False)
#
df_outliers = pd.concat([df_outliers, outlier_rownum_q5_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q10_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q14_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q18_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q22_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q27_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q35_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q36_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q51_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q67_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q70_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q77_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q80_df], sort=False)
df_outliers = pd.concat([df_outliers, outlier_rownum_q86_df], sort=False)   
#
print(df_outliers.shape)
print(df_outliers.head())
print('------------------------------------------')
print(df_outliers.columns)

(1456, 35)
  PLAN_ID            TIMESTAMP REMARKS         OPERATION          OPTIONS  \
0   12354  11/20/2018 08:23:55     NaN  SELECT STATEMENT              NaN   
1   12354  11/20/2018 08:23:55     NaN             COUNT          STOPKEY   
2   12354  11/20/2018 08:23:55     NaN              VIEW              NaN   
3   12354  11/20/2018 08:23:55     NaN              SORT  GROUP BY ROLLUP   
4   12354  11/20/2018 08:23:55     NaN              VIEW              NaN   

  OBJECT_NODE OBJECT_OWNER OBJECT_NAME                OBJECT_ALIAS  \
0         NaN          NaN         NaN                         NaN   
1         NaN          NaN         NaN                         NaN   
2         NaN       TPCDS1         NaN  from$_subquery$_018@SEL$11   
3         NaN          NaN         NaN                         NaN   
4         NaN       TPCDS1         NaN                    X@SEL$12   

  OBJECT_INSTANCE     ...      \
0             NaN     ...       
1             NaN     ...       
2     

### Dealing with empty values

In [19]:
def get_na_columns(df, headers):
    """
    Return columns which consist of NAN values
    """
    na_list = []
    for head in headers:
        if df[head].isnull().values.any():
            na_list.append(head)
    return na_list
#
print('N/A Columns\n')
print('\nREP_VSQL_PLAN Features ' + str(len(rep_vsql_plan_df.columns)) + ': ' + str(get_na_columns(df=rep_vsql_plan_df,headers=rep_vsql_plan_df.columns)) + "\n")
print('\nDF_OUTLIERS Features ' + str(len(df_outliers.columns)) + ': ' + str(get_na_columns(df=df_outliers,headers=df_outliers.columns)) + "\n")
#
def fill_na(df):
    """
    Replaces NA columns with 0s
    """
    return df.fillna(0)
#
# Populating NaN values with amount '0'
df = fill_na(df=rep_vsql_plan_df)
df_outliers = fill_na(df=df_outliers)

N/A Columns


REP_VSQL_PLAN Features 39: ['OPTIONS', 'OBJECT_NODE', 'OBJECT#', 'OBJECT_OWNER', 'OBJECT_NAME', 'OBJECT_ALIAS', 'OBJECT_TYPE', 'OPTIMIZER', 'PARENT_ID', 'COST', 'CARDINALITY', 'OTHER_TAG', 'PARTITION_START', 'PARTITION_STOP', 'PARTITION_ID', 'OTHER', 'DISTRIBUTION', 'IO_COST', 'ACCESS_PREDICATES', 'FILTER_PREDICATES', 'PROJECTION', 'TIME', 'QBLOCK_NAME', 'REMARKS', 'OTHER_XML']


DF_OUTLIERS Features 35: ['REMARKS', 'OPTIONS', 'OBJECT_NODE', 'OBJECT_OWNER', 'OBJECT_NAME', 'OBJECT_ALIAS', 'OBJECT_INSTANCE', 'OBJECT_TYPE', 'OPTIMIZER', 'SEARCH_COLUMNS', 'PARENT_ID', 'COST', 'CARDINALITY', 'BYTES', 'OTHER_TAG', 'PARTITION_START', 'PARTITION_STOP', 'PARTITION_ID', 'OTHER', 'OTHER_XML', 'DISTRIBUTION', 'CPU_COST', 'IO_COST', 'TEMP_SPACE', 'ACCESS_PREDICATES', 'FILTER_PREDICATES', 'PROJECTION', 'TIME', 'QBLOCK_NAME']



### Type conversion

Each column is converted into a column of type values which are Integer64.

In [20]:
def handle_numeric_overflows(x):
    """
    Accepts a dataframe column, and 
    """
    try:
        #df = df.astype('int64')
        x1 = pd.DataFrame([x],dtype='int64')
    except ValueError:
        x = 9223372036854775807 # Max int size
    return x
#
for col in df.columns:
    try:
        if col in black_list:
            continue
        df[col] = df[col].apply(handle_numeric_overflows)
        df[col].astype('int64',inplace=True)
    except:
        df.drop(columns=col, inplace=True)
        print('Dropped column [' + col + ']')
#
print('-------------------------------------------------------------')
#
for col in df_outliers.columns:
    try:
        if col in black_list:
            continue
        df_outliers[col] = df_outliers[col].astype('int64')
    except OverflowError:
        #
        # Handles numeric overflow conversions by replacing such values with max value inside the dataset.
        df_outliers[col] = df_outliers[col].apply(handle_numeric_overflows)
        df_outliers[col] = df_outliers[col].astype('int64')
    except Exception as e:
        df_outliers.drop(columns=col, inplace=True)
        print('Dropped column [' + col + ']')
print(df.columns)
print(df_outliers.columns)

-------------------------------------------------------------
Dropped column [OBJECT_ALIAS]
Dropped column [OBJECT_TYPE]
Dropped column [OPTIMIZER]
Dropped column [OTHER_XML]
Dropped column [ACCESS_PREDICATES]
Dropped column [FILTER_PREDICATES]
Dropped column [PROJECTION]
Dropped column [QBLOCK_NAME]
Index(['DBID', 'SQL_ID', 'PLAN_HASH_VALUE', 'ID', 'OPERATION', 'OPTIONS',
       'OBJECT_NODE', 'OBJECT#', 'OBJECT_OWNER', 'OBJECT_NAME', 'OBJECT_ALIAS',
       'OBJECT_TYPE', 'OPTIMIZER', 'PARENT_ID', 'DEPTH', 'POSITION',
       'SEARCH_COLUMNS', 'COST', 'CARDINALITY', 'BYTES', 'OTHER_TAG',
       'PARTITION_START', 'PARTITION_STOP', 'PARTITION_ID', 'OTHER',
       'DISTRIBUTION', 'CPU_COST', 'IO_COST', 'TEMP_SPACE',
       'ACCESS_PREDICATES', 'FILTER_PREDICATES', 'PROJECTION', 'TIME',
       'QBLOCK_NAME', 'REMARKS', 'TIMESTAMP', 'OTHER_XML', 'CON_DBID',
       'CON_ID'],
      dtype='object')
Index(['PLAN_ID', 'TIMESTAMP', 'REMARKS', 'OPERATION', 'OPTIONS',
       'OBJECT_NODE', 'OBJEC

### Feature Selection

In this step, redundant features are dropped. Features are considered redundant if exhibit a standard devaition of 0 (meaning no change in value).

In [21]:
def drop_flatline_columns(df):
    columns = df.columns
    flatline_features = []
    for i in range(len(columns)):
        try:
            #
            if columns[i] in black_list:
                continue
            #
            std = df[columns[i]].std()
            if std == 0:
                flatline_features.append(columns[i])
        except:
            pass
    #
    #print('Features which are considered flatline:\n')
    #for col in flatline_features:
    #    print(col)
    print('\nShape before changes: [' + str(df.shape) + ']')
    df = df.drop(columns=flatline_features)
    print('Shape after changes: [' + str(df.shape) + ']')
    print('Dropped a total [' + str(len(flatline_features)) + ']')
    return df
#
df = drop_flatline_columns(df=df)
df_outliers = drop_flatline_columns(df=df_outliers)
#
print('\nAfter flatline column drop:')
print(df.shape)
print(df.columns)
#
print('--------------------------------------------------------')
print('\nAfter outlier flatline column drop:')
print(df_outliers.shape)
print(df_outliers.columns)


Shape before changes: [(10000, 39)]
Shape after changes: [(10000, 30)]
Dropped a total [9]

Shape before changes: [(1456, 27)]
Shape after changes: [(1456, 21)]
Dropped a total [6]

After flatline column drop:
(10000, 30)
Index(['SQL_ID', 'PLAN_HASH_VALUE', 'ID', 'OPERATION', 'OPTIONS',
       'OBJECT_NODE', 'OBJECT#', 'OBJECT_OWNER', 'OBJECT_NAME', 'OBJECT_ALIAS',
       'OBJECT_TYPE', 'OPTIMIZER', 'PARENT_ID', 'DEPTH', 'POSITION',
       'SEARCH_COLUMNS', 'COST', 'CARDINALITY', 'BYTES', 'OTHER_TAG',
       'PARTITION_START', 'PARTITION_STOP', 'DISTRIBUTION', 'CPU_COST',
       'IO_COST', 'TEMP_SPACE', 'TIME', 'QBLOCK_NAME', 'TIMESTAMP',
       'OTHER_XML'],
      dtype='object')
--------------------------------------------------------

After outlier flatline column drop:
(1456, 21)
Index(['PLAN_ID', 'TIMESTAMP', 'OPERATION', 'OPTIONS', 'OBJECT_OWNER',
       'OBJECT_NAME', 'OBJECT_INSTANCE', 'SEARCH_COLUMNS', 'ID', 'PARENT_ID',
       'DEPTH', 'POSITION', 'COST', 'CARDINALITY', 'BYT

### Scaling columns

This section attempts to process a number of data columns through a MinMax Scaler. This is done, to normalize data on a similar scaler, particularly before comparing column measurements using a euclidean based measure. The following columns will be targetted:

* CARDINALITY
* BYTES
* PARTITION_START
* PARTITION_STOP
* CPU_COST
* IO_COST
* TEMP_SPACE
* TIME

In [22]:
scaler = preprocessing.MinMaxScaler()
scaled_columns = ['CARDINALITY',
                'BYTES',
                'PARTITION_START',
                'PARTITION_STOP',
                'CPU_COST',
                'IO_COST',
                'TEMP_SPACE',
                'TIME']
print(df['PARTITION_START'].iloc[0])
df[scaled_columns] = scaler.fit_transform(df[scaled_columns])
print(df['PARTITION_START'].iloc[0])
print("Minimal Vector Points: " + str(scaler.data_min_))
print("Maximal Vector Points: " + str(scaler.data_max_))
#
print('\nAfter scaled column transformation:')
print(df.shape)
print(df.columns)
#
print('--------------------------------------------------------')
print('\nAfter outlier scaled column transformation:')
print(df_outliers.shape)
print(df_outliers.columns)

0.0
0.0
Minimal Vector Points: [0.000e+00 2.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 1.156e+06
 0.000e+00]
Maximal Vector Points: [2.85716704e+16 9.22337204e+18 0.00000000e+00 0.00000000e+00
 9.22337204e+18 7.78425143e+09 9.22337204e+18 1.58287675e+08]

After scaled column transformation:
(10000, 30)
Index(['SQL_ID', 'PLAN_HASH_VALUE', 'ID', 'OPERATION', 'OPTIONS',
       'OBJECT_NODE', 'OBJECT#', 'OBJECT_OWNER', 'OBJECT_NAME', 'OBJECT_ALIAS',
       'OBJECT_TYPE', 'OPTIMIZER', 'PARENT_ID', 'DEPTH', 'POSITION',
       'SEARCH_COLUMNS', 'COST', 'CARDINALITY', 'BYTES', 'OTHER_TAG',
       'PARTITION_START', 'PARTITION_STOP', 'DISTRIBUTION', 'CPU_COST',
       'IO_COST', 'TEMP_SPACE', 'TIME', 'QBLOCK_NAME', 'TIMESTAMP',
       'OTHER_XML'],
      dtype='object')
--------------------------------------------------------

After outlier scaled column transformation:
(1456, 21)
Index(['PLAN_ID', 'TIMESTAMP', 'OPERATION', 'OPTIONS', 'OBJECT_OWNER',
       'OBJECT_NAME', 'OBJECT_INSTANCE'

### Adding Grouping Column

An extra column is added to allow access plans to be isolated per instance

In [23]:
#
# Adds a columns per SQL_ID, PLAN_HASH_VALUE grouping, which can be used to group instances together
def add_grouping_column(df, column_identifier):
    """
    Receives a pandas dataframe, and adds a new column which allows dataframe to be aggregated per 
    SQL_ID, PLAN_HASH_VALUE combination.
    
    :param: df                - Pandas Dataframe
    :param: column_identifier - String denoting matrix column to group by
    
    :return: Pandas Dataframe, with added column    
    """
    print('Shape before transformation: ' + str(df.shape))
    new_grouping_col = []
    counter = 0
    last_sql_id = df[column_identifier].iloc(0) # Starts with first SQL_ID
    for index, row in df.iterrows():
        if column_identifier == 'SQL_ID':
            if last_sql_id != row.SQL_ID:
                last_sql_id = row.SQL_ID
                counter += 1
        elif column_identifier == 'PLAN_ID':
            if last_sql_id != row.PLAN_ID:
                last_sql_id = row.PLAN_ID
                counter += 1
        else:
            raise ValueError('Column does not exist!')
        new_grouping_col.append(counter)
    #
    # Append list as new column
    new_col = pd.Series(new_grouping_col)
    df['PLAN_INSTANCE'] = new_col.values
    print('Shape after transformation: ' + str(df.shape))
    return df
#
df = add_grouping_column(df=df,column_identifier='SQL_ID')
df_outliers = add_grouping_column(df=df_outliers,column_identifier='PLAN_ID')

Shape before transformation: (10000, 30)
Shape after transformation: (10000, 31)
Shape before transformation: (1456, 21)
Shape after transformation: (1456, 22)


### Tree Formatting

Constructs the tree plan structure

In [24]:
class PlanTreeModeller:
    """
    This class simulates an access plan in the form of a tree structure
    """
    #
    @staticmethod
    def __create_node(node_name, parent=None):
        """
        Builds a node which will be added to the tree. If the parent is 'None', it is assumed that this
        node will be used as the root/parent Node.
        
        :param: node_name - String specifying node name.
        :param: parent    - Parent node specifying parent node name.
        
        :return: anytree object
        """
        if node_name is None:
            raise ValueError('Node name was not specified!')
        #
        if parent is None:
            node = Node(node_name)
        else:
            node = Node(node_name, parent=parent)
        #
        return node
    #
    @staticmethod
    def build_tree(df):
        """
        This method receives a pandas dataframe, and converts it into a searchable python tree
        
        :param: df - Pandas Dataframe, pertaining to input access plan
        
        :return: Dictionary object, consisting of node objects (which are linked in a tree fashion)
        """
        parent_node = None
        node_dict = {}
        for index, row in df.iterrows():
            #
            # Build Node and add to parent
            row_id = int(row['ID'])
            parent_id = int(row['PARENT_ID'])
            #
            if row_id == 0:
                node = PlanTreeModeller.__create_node(node_name=row_id)
            else:
                parent_node = node_dict[parent_id]
                node = PlanTreeModeller.__create_node(node_name=row_id, parent=parent_node)
            node_dict[row_id] = node
        #
        return node_dict # Dictionary consisting of tree nodes
    #
    @staticmethod
    def __retrieve_plan_details(df, node_name):
        """
        Accepts a dataframe, and the node_name. Retrieves features pertaining to the row id in the access plan
        
        :param: df - Dataframe consisting of access plan features
        :param: id - String id denoting which row to retrieve from the parameter dataframe
        
        :return: Dictionary consisting of access plan attributes
        """
        operation = str(df[df['ID'] == node_name]['OPERATION'].iloc[0])
        options = str(df[df['ID'] == node_name]['OPTIONS'].iloc[0])
        object_name = str(df[df['ID'] == node_name]['OBJECT_NAME'].iloc[0])
        cardinality = int(df[df['ID'] == node_name]['CARDINALITY'].iloc[0])
        bytess = int(df[df['ID'] == node_name]['BYTES'].iloc[0])
        partition_delta = int(df[df['ID'] == node_name]['PARTITION_STOP'].iloc[0]) - int(df[df['ID'] == node_name]['PARTITION_START'].iloc[0])
        cpu_cost = int(df[df['ID'] == node_name]['CPU_COST'].iloc[0])
        io_cost = int(df[df['ID'] == node_name]['IO_COST'].iloc[0])
        temp_space = int(df[df['ID'] == node_name]['TEMP_SPACE'].iloc[0])
        time = int(df[df['ID'] == node_name]['TIME'].iloc[0]) 
        #
        return {'OPERATION':operation,
                'OPTIONS':options,
                'OBJECT_NAME':object_name,
                'CARDINALITY':cardinality,
                'BYTES':bytess,
                'PARTITION_DELTA':partition_delta,
                'CPU_COST':cpu_cost,
                'IO_COST':io_cost,
                'TEMP_SPACE':temp_space,
                'TIME':time}
    #
    @staticmethod
    def __tree_node_euclidean(tree_dict1, tree_dict2):
        """
        This method calculates the eucldiean distance between two vectors.
        
        :param: tree_dict1 - Dictionary denoting a single node within plan / tree 1
        :param: tree_dict2 - Dictionary denoting a single node within plan / tree 2
        
        :return: List denoting euclidean distance
        """
        tree_vector_1 = [tree_dict1['CARDINALITY'],
                         tree_dict1['BYTES'],
                         tree_dict1['PARTITION_DELTA'],
                         tree_dict1['CPU_COST'],
                         tree_dict1['IO_COST'],
                         tree_dict1['TEMP_SPACE'],
                         tree_dict1['TIME']]
        #
        tree_vector_2 = [tree_dict2['CARDINALITY'],
                         tree_dict2['BYTES'],
                         tree_dict2['PARTITION_DELTA'],
                         tree_dict2['CPU_COST'],
                         tree_dict2['IO_COST'],
                         tree_dict2['TEMP_SPACE'],
                         tree_dict2['TIME']]
        #
        euc_distance = euclidean_distances([tree_vector_1],[tree_vector_2])
        return euc_distance[0][0]
    #
    @staticmethod
    def render_tree(tree, df):
        """
        Renders Tree by printing to screen
        
        :param: tree - AnyTree object, representing tree modelled access plan
        :param: df   - Pandas dataframe representatnt of the access plan about to be rendered
        
        :return: None
        """
        for pre, fill, node in RenderTree(tree):
            #
            access_plan_dict = PlanTreeModeller.__retrieve_plan_details(df=df,
                                                                        node_name = node.name)
            #
            if access_plan_dict['OBJECT_NAME'] == '0':
                print("%s%s > %s" % (pre, node.name, access_plan_dict['OPERATION']))
            else:
                if access_plan_dict['OPTIONS'] == '0': 
                    print("%s%s > %s (%s)" % (pre, node.name, access_plan_dict['OPERATION'], access_plan_dict['OBJECT_NAME']))
                else:
                    print("%s%s > %s | %s (%s)" % (pre, node.name, access_plan_dict['OPERATION'], access_plan_dict['OPTIONS'], access_plan_dict['OBJECT_NAME']))
    #
    @staticmethod
    def __postorder(tree):
        """
        Accepts a tree, and iterates in post order fashion (left,right,root)
        
        :param: tree - Dictionary consisting of AnyTree Nodes
        
        :return: List consisting of tree traversal order
        """
        post_order_traversal = [node.name for node in PostOrderIter(tree[0])]
        return post_order_traversal
    # 
    @staticmethod
    def tree_compare(tree1, tree2, df1, df2):
        """
        Accepts two trees of type 'AnyTree', along with respective dataframe denoting each respective access
        path.
        
        :param: tree1 - Dictionary consisting of 'AnyTree' nodes, belonging to tree 1
        :param: tree2 - Dictionary consisting of 'AnyTree' nodes, belonging to tree 2
        :param: df1   - Pandas dataframe consisting of access plan instructions opted for by tree 1
        :param: df2   - Pandas dataframe consisting of access plan instructions opted for by tree 2
        
        :return: None
        """
        #
        # Retrieves traversal order for both trees
        post_order_traversal1 = PlanTreeModeller.__postorder(tree1)
        post_order_traversal2 = PlanTreeModeller.__postorder(tree2)
        #
        # Iterates over traversal order, until a change is encountered
        max_range = max(len(post_order_traversal1),len(post_order_traversal2))
        delta_flag = True
        euclidean_measure = []
        for i in range(0,max_range):
            #
            # This check avoids a list IndexError for scebarious when one plan is bigger than the others,
            # and consequently the number of node traversals is bigger than the other tree.
            if i >= len(post_order_traversal1) or i >= len(post_order_traversal2):
                break
            #
            id_1 = post_order_traversal1[i]
            id_2 = post_order_traversal2[i]
            #
            pd_tree1 = PlanTreeModeller.__retrieve_plan_details(df=df1, node_name=id_1)
            pd_tree2 = PlanTreeModeller.__retrieve_plan_details(df=df2, node_name=id_2)
            #
            if (pd_tree1['OPERATION'] != pd_tree2['OPERATION'] or pd_tree1['OBJECT_NAME'] != pd_tree2['OBJECT_NAME'] or pd_tree1['OPTIONS'] != pd_tree2['OPTIONS']) and delta_flag:
                print('Access Predicate Difference detected!')
                print('Tree 1 difference at node [' + str(id_1) + '] operator > ' + pd_tree1['OPERATION'] + '(' + pd_tree1['OPTIONS'] + ') on object [' + pd_tree1['OBJECT_NAME'] + ']')
                print('Tree 2 difference at node [' + str(id_2) + '] operator > ' + pd_tree2['OPERATION'] + '(' + pd_tree2['OPTIONS'] + ') on object [' + pd_tree2['OBJECT_NAME'] + ']')
                delta_flag = False
            #
            # Calculate Node Euclidean Measure
            euclidean_vector = PlanTreeModeller.__tree_node_euclidean(tree_dict1=pd_tree1,
                                                                      tree_dict2=pd_tree2)
            euclidean_measure.append(euclidean_vector)
        #
        if delta_flag:
            print('No plan differences detected.')
        #
        print('Total computed delta score [' + str(sum(euclidean_measure)) + ']')

### Captured REP_VSQL_PLANS plans

This section contains metrics pertaining to plans captured by the data capture tool

In [25]:
#
# Retrieve Unique set of PLAN_HASH_VALUES
np_sql_id, np_plan_hash_value, np_plan_instance = pd.unique(df['SQL_ID']),pd.unique(df['PLAN_HASH_VALUE']),pd.unique(df['PLAN_INSTANCE'])
print(np_sql_id)
print(type(np_sql_id))
print(np_plan_hash_value)
print(type(np_plan_hash_value))
print(np_plan_instance)
print(type(np_plan_instance))
print('-'*100)
#
# Iterate over each PLAN_HASH_VALUE, and retrieve PLAN subset                                                                                                                 
for plan_instance in np_plan_instance:
    #
    # Retrieve only a single instance of the plan (as annotated at beginning of experiment)
    df_temp_plan = df[df['PLAN_INSTANCE'] == plan_instance]
    #
    # This step ensures that only TPC-DS related queries are displayed
    tpc_check = df_temp_plan['OBJECT_OWNER'].tolist()
    if tpcds not in tpc_check:
        continue
    #
    # Discards plans with double entries - Due to the parallel nature of the throughput test for 
    # TPC-DS, multiple threads may execute the same query at the same time, resulting in sql access
    # plans with the same SQL_ID, same PLAN_HASH_VALUE, and same TIMESTAMP. Such occurances are skipped.
    df_temp_count = df_temp_plan[df_temp_plan['ID'] == 0]
    if df_temp_count.shape[0] != 1:
        continue
    #
    # Sorts by ID ascending - This clause may be redundant due to the natural order of the data capture tool
    df_temp_plan = df_temp_plan.sort_values(by='ID', ascending=True)
    #
    # Builds Tree
    tree = PlanTreeModeller.build_tree(df=df_temp_plan)
    #
    # Renders Tree
    print('SQL_ID [' + str(df_temp_plan['SQL_ID'].iloc[0]) + '] with PLAN_HASH_VALUE [' + str(df_temp_plan['PLAN_HASH_VALUE'].iloc[0]) + ']\n')
    PlanTreeModeller.render_tree(tree=tree[0], df=df_temp_plan) # Tree rendederer uses root node and traverses downwards
    print('-'*100)

['dxv968j0352kb' '0aq14dznn91rg' '4q1rzhn63sgpt' 'g0b4snpj74cv5'
 '03ggjrmy0wa1w' '93n8wp5a8xyxn' 'gdh0vpjkmbtjw' '1r7b985mxqj71'
 '1jhyrdp21f2q6' 'bwsf4tnh0gcgv' '29mjaymwt5p6d' 'aggcw7yk1a7s6'
 '8t26unxsrxj72' '71uursqtj1j2m' '59zh1b9759nf4' '0f60bzgt9127c'
 '9vmcsc3prvxpa' '76ds5wxsv7f5t' '20bqsr6btd9x9' '84ntdbh48ctu9'
 'fx7sjdj48pn6z' '2wuhkcaz4uhs5' '2pz0tqbv91m11' '1u97hwfu7dcmz'
 '39nyc1pykjg41' '9ffht8tuysgx9' '7fbzhzg6ysu25' '14f5ngrj3cc5h'
 '4u268zn6r57tm' '6zcux9jb78w36' '2hnpu9m861609' '33vw5865cwyyn'
 'gvcadr1hm4arv' '1pv23p59mjs0v' '7vtvbg7s3zcyp' '6fvfqaw68q59b'
 '06dymzb481vnd' '6u001adh62r0f' 'cqrk7q7f6nk3d' 'fc0va0vju750z'
 'a6g97rawd3ggv' 'd6vqfmt62hypx' 'c1y1djkqjcd88' 'd7w1dugmzb9n9'
 '87gtj5jaq4a3t' '85cmvvurya34f' 'gh5w0gcyfaujs' '0ga8vk4nftz45'
 '13a9r2xkx1bxb' 'cjq93m442uprp' '7w116jy6ysqpm' 'au8ztarrm6vvs'
 '5g88vmdgd99f7' 'gkjkxbzzptg00' '4cgbvpjc134nu' 'a6fy23us0jz84'
 '64wct1dn3b771' 'fs2q92rb37cb0' 'cw6vxf0kbz3v1' 'dyk4dprp70d74'
 'd6vwqbw6r2ffk' '7hys3h7

                    └── 6 > PX SEND | HASH (:TQ10000)
                        └── 7 > SORT
                            └── 8 > PX BLOCK
                                └── 9 > INDEX | FAST FULL SCAN (CS_SHIP_ADDR_SK_INDEX)
----------------------------------------------------------------------------------------------------
SQL_ID [29mjaymwt5p6d] with PLAN_HASH_VALUE [778470866]

0 > SELECT STATEMENT
└── 1 > SORT
    └── 2 > PX COORDINATOR
        └── 3 > PX SEND | QC (RANDOM) (:TQ10001)
            └── 4 > SORT
                └── 5 > PX RECEIVE
                    └── 6 > PX SEND | HASH (:TQ10000)
                        └── 7 > SORT
                            └── 8 > PX BLOCK
                                └── 9 > INDEX | FAST FULL SCAN (CS_SHIP_MODE_SK_INDEX)
----------------------------------------------------------------------------------------------------
SQL_ID [aggcw7yk1a7s6] with PLAN_HASH_VALUE [2032279852]

0 > SELECT STATEMENT
└── 1 > SORT
    └── 2 > PX COORDINATOR
      

                            └── 8 > PX BLOCK
                                └── 9 > INDEX | FAST FULL SCAN (WS_SHIP_ADDR_SK_INDEX)
----------------------------------------------------------------------------------------------------
SQL_ID [4u268zn6r57tm] with PLAN_HASH_VALUE [2658544436]

0 > SELECT STATEMENT
└── 1 > SORT
    └── 2 > PX COORDINATOR
        └── 3 > PX SEND | QC (RANDOM) (:TQ10001)
            └── 4 > SORT
                └── 5 > PX RECEIVE
                    └── 6 > PX SEND | HASH (:TQ10000)
                        └── 7 > SORT
                            └── 8 > PX BLOCK
                                └── 9 > INDEX | FAST FULL SCAN (WS_SHIP_CDEMO_SK_INDEX)
----------------------------------------------------------------------------------------------------
SQL_ID [6zcux9jb78w36] with PLAN_HASH_VALUE [3740804340]

0 > SELECT STATEMENT
└── 1 > SORT
    └── 2 > PX COORDINATOR
        └── 3 > PX SEND | QC (RANDOM) (:TQ10001)
            └── 4 > SORT
                └── 5

        └── 3 > PX SEND | QC (RANDOM) (:TQ10000)
            └── 4 > SORT
                └── 5 > OPTIMIZER STATISTICS GATHERING
                    └── 6 > PX BLOCK
                        └── 7 > TABLE ACCESS | FULL (CATALOG_SALES)
----------------------------------------------------------------------------------------------------
SQL_ID [7w116jy6ysqpm] with PLAN_HASH_VALUE [3395708625]

0 > SELECT STATEMENT
└── 1 > SORT
    └── 2 > PX COORDINATOR
        └── 3 > PX SEND | QC (RANDOM) (:TQ10001)
            └── 4 > SORT
                └── 5 > PX RECEIVE
                    └── 6 > PX SEND | HASH (:TQ10000)
                        └── 7 > SORT
                            └── 8 > PX BLOCK
                                └── 9 > INDEX | SAMPLE FAST FULL SCAN (CS_SHIP_DATE_SK_INDEX)
----------------------------------------------------------------------------------------------------
SQL_ID [au8ztarrm6vvs] with PLAN_HASH_VALUE [1397742550]

0 > SELECT STATEMENT
└── 1 > SORT
    └── 2 > PX

                └── 19 > HASH JOIN
                    ├── 20 > HASH JOIN
                    │   ├── 21 > HASH JOIN
                    │   │   ├── 22 > VIEW
                    │   │   │   └── 23 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D69F3_141942F5)
                    │   │   └── 24 > VIEW
                    │   │       └── 25 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D69F3_141942F5)
                    │   └── 26 > VIEW
                    │       └── 27 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D69F3_141942F5)
                    └── 28 > VIEW
                        └── 29 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D69F3_141942F5)
----------------------------------------------------------------------------------------------------
SQL_ID [4vcvqy7vvy5zm] with PLAN_HASH_VALUE [2045682895]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > WINDOW
                    └── 6 > SORT
                        └── 7 > HASH JOIN
            

    │       └── 4 > HASH JOIN
    │           ├── 5 > TABLE ACCESS | FULL (DATE_DIM)
    │           └── 6 > VIEW
    │               └── 7 > UNION-ALL
    │                   ├── 8 > TABLE ACCESS | FULL (WEB_SALES)
    │                   └── 9 > TABLE ACCESS | FULL (CATALOG_SALES)
    └── 10 > SORT
        └── 11 > HASH JOIN
            ├── 12 > TABLE ACCESS | FULL (DATE_DIM)
            └── 13 > HASH JOIN
                ├── 14 > HASH JOIN
                │   ├── 15 > TABLE ACCESS | FULL (DATE_DIM)
                │   └── 16 > VIEW
                │       └── 17 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D6A00_141942F5)
                └── 18 > VIEW
                    └── 19 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D6A00_141942F5)
----------------------------------------------------------------------------------------------------
SQL_ID [5nn31913tabu9] with PLAN_HASH_VALUE [1613917314]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
          

                        │   │       │       └── 61 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D6A2D_141942F5)
                        │   │       └── 62 > HASH JOIN
                        │   │           ├── 63 > TABLE ACCESS | FULL (ITEM)
                        │   │           └── 64 > HASH JOIN
                        │   │               ├── 65 > NESTED LOOPS
                        │   │               │   ├── 66 > NESTED LOOPS
                        │   │               │   │   ├── 67 > STATISTICS COLLECTOR
                        │   │               │   │   │   └── 68 > TABLE ACCESS | FULL (DATE_DIM)
                        │   │               │   │   └── 69 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                        │   │               │   └── 70 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                        │   │               └── 71 > TABLE ACCESS | FULL (STORE_SALES)
                        │   └── 72 > VIEW
                        │       └── 73 > TABLE ACCESS | FUL

        ├── 65 > NESTED LOOPS
        │   ├── 66 > NESTED LOOPS
        │   │   ├── 67 > NESTED LOOPS
        │   │   │   ├── 68 > HASH JOIN
        │   │   │   │   ├── 69 > TABLE ACCESS | FULL (CUSTOMER_ADDRESS)
        │   │   │   │   └── 70 > HASH JOIN
        │   │   │   │       ├── 71 > TABLE ACCESS | FULL (DATE_DIM)
        │   │   │   │       └── 72 > TABLE ACCESS | FULL (STORE_SALES)
        │   │   │   └── 73 > INDEX | UNIQUE SCAN (SYS_C0021425)
        │   │   └── 74 > INDEX | UNIQUE SCAN (SYS_C0021402)
        │   └── 75 > TABLE ACCESS | BY INDEX ROWID (CUSTOMER_DEMOGRAPHICS)
        └── 76 > NESTED LOOPS
            ├── 77 > NESTED LOOPS
            │   ├── 78 > NESTED LOOPS
            │   │   ├── 79 > HASH JOIN
            │   │   │   ├── 80 > TABLE ACCESS | FULL (DATE_DIM)
            │   │   │   └── 81 > HASH JOIN
            │   │   │       ├── 82 > TABLE ACCESS | FULL (CUSTOMER_ADDRESS)
            │   │   │       └── 83 > TABLE ACCESS | FULL (STORE_SALES)
           

        │   │   │       │   └── 10 > BUFFER
        │   │   │       │       └── 11 > TABLE ACCESS | FULL (CUSTOMER_ADDRESS)
        │   │   │       └── 12 > FILTER
        │   │   │           └── 13 > SORT
        │   │   │               └── 14 > TABLE ACCESS | FULL (STORE_SALES)
        │   │   └── 15 > INDEX | UNIQUE SCAN (SYS_C0021402)
        │   └── 16 > TABLE ACCESS | BY INDEX ROWID (CUSTOMER_DEMOGRAPHICS)
        ├── 17 > NESTED LOOPS
        │   ├── 18 > HASH JOIN
        │   │   ├── 19 > HASH JOIN
        │   │   │   ├── 20 > TABLE ACCESS | FULL (DATE_DIM)
        │   │   │   └── 21 > HASH JOIN
        │   │   │       ├── 22 > TABLE ACCESS | FULL (CUSTOMER_ADDRESS)
        │   │   │       └── 23 > TABLE ACCESS | FULL (STORE_SALES)
        │   │   └── 24 > TABLE ACCESS | FULL (CUSTOMER_DEMOGRAPHICS)
        │   └── 25 > INDEX | UNIQUE SCAN (SYS_C0021425)
        ├── 26 > NESTED LOOPS
        │   ├── 27 > NESTED LOOPS
        │   │   ├── 28 > NESTED LOOPS
        │   │   │   ├──

│   └── 22 > TABLE ACCESS | FULL (STORE_SALES)
├── 23 > SORT
│   └── 24 > TABLE ACCESS | FULL (STORE_SALES)
├── 25 > SORT
│   └── 26 > TABLE ACCESS | FULL (STORE_SALES)
├── 27 > SORT
│   └── 28 > TABLE ACCESS | FULL (STORE_SALES)
├── 29 > SORT
│   └── 30 > TABLE ACCESS | FULL (STORE_SALES)
└── 31 > INDEX | UNIQUE SCAN (SYS_C0021417)
----------------------------------------------------------------------------------------------------
SQL_ID [as8sqm1c84n07] with PLAN_HASH_VALUE [1728224257]

0 > SELECT STATEMENT
└── 1 > TEMP TABLE TRANSFORMATION
    ├── 2 > LOAD AS SELECT
    │   └── 3 > UNION-ALL
    │       ├── 4 > HASH
    │       │   └── 5 > HASH JOIN
    │       │       ├── 6 > TABLE ACCESS | FULL (DATE_DIM)
    │       │       └── 7 > HASH JOIN
    │       │           ├── 8 > TABLE ACCESS | FULL (CUSTOMER)
    │       │           └── 9 > TABLE ACCESS | FULL (STORE_SALES)
    │       └── 10 > HASH
    │           └── 11 > HASH JOIN
    │               ├── 12 > TABLE ACCESS | FULL (DA

    │       └── 14 > HASH
    │           └── 15 > HASH JOIN
    │               ├── 16 > VIEW (VW_GBC_20)
    │               │   └── 17 > HASH
    │               │       └── 18 > NESTED LOOPS
    │               │           ├── 19 > NESTED LOOPS
    │               │           │   ├── 20 > TABLE ACCESS | FULL (DATE_DIM)
    │               │           │   └── 21 > INDEX | RANGE SCAN (WS_SOLD_DATE_SK_INDEX)
    │               │           └── 22 > TABLE ACCESS | BY INDEX ROWID (WEB_SALES)
    │               └── 23 > TABLE ACCESS | FULL (CUSTOMER)
    └── 24 > COUNT
        └── 25 > VIEW
            └── 26 > SORT
                └── 27 > HASH JOIN
                    ├── 28 > HASH JOIN
                    │   ├── 29 > HASH JOIN
                    │   │   ├── 30 > VIEW
                    │   │   │   └── 31 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D69FF_141942F5)
                    │   │   └── 32 > VIEW
                    │   │       └── 33 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D69FF_1419

                    │   │           │   │   │   └── 14 > TABLE ACCESS | FULL (DATE_DIM)
                    │   │           │   │   └── 15 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                    │   │           │   └── 16 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                    │   │           └── 17 > TABLE ACCESS | FULL (STORE_SALES)
                    │   └── 18 > SORT
                    │       └── 19 > HASH JOIN
                    │           ├── 20 > TABLE ACCESS | FULL (CUSTOMER)
                    │           └── 21 > HASH JOIN
                    │               ├── 22 > NESTED LOOPS
                    │               │   ├── 23 > NESTED LOOPS
                    │               │   │   ├── 24 > STATISTICS COLLECTOR
                    │               │   │   │   └── 25 > TABLE ACCESS | FULL (DATE_DIM)
                    │               │   │   └── 26 > INDEX | RANGE SCAN (CS_SOLD_DATE_SK_INDEX)
                    │               │   └── 27 > TABLE ACC

                        │   │       ├── 59 > VIEW (VW_NSO_1)
                        │   │       │   └── 60 > VIEW
                        │   │       │       └── 61 > TABLE ACCESS | FULL (SYS_TEMP_0FD9D6A2D_141942F5)
                        │   │       └── 62 > HASH JOIN
                        │   │           ├── 63 > TABLE ACCESS | FULL (ITEM)
                        │   │           └── 64 > HASH JOIN
                        │   │               ├── 65 > NESTED LOOPS
                        │   │               │   ├── 66 > NESTED LOOPS
                        │   │               │   │   ├── 67 > STATISTICS COLLECTOR
                        │   │               │   │   │   └── 68 > TABLE ACCESS | FULL (DATE_DIM)
                        │   │               │   │   └── 69 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                        │   │               │   └── 70 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                        │   │               └── 71 > TABLE ACCESS | FULL 

            └── 4 > NESTED LOOPS
                ├── 5 > HASH JOIN
                │   ├── 6 > TABLE ACCESS | FULL (STORE)
                │   └── 7 > NESTED LOOPS
                │       ├── 8 > NESTED LOOPS
                │       │   ├── 9 > NESTED LOOPS
                │       │   │   ├── 10 > TABLE ACCESS | FULL (DATE_DIM)
                │       │   │   └── 11 > TABLE ACCESS | BY INDEX ROWID BATCHED (STORE_RETURNS)
                │       │   │       └── 12 > INDEX | RANGE SCAN (SR_RETURNED_DATE_SK_INDEX)
                │       │   └── 13 > INDEX | UNIQUE SCAN (SYS_C0021467)
                │       └── 14 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                └── 15 > INDEX | UNIQUE SCAN (SYS_C0021405)
----------------------------------------------------------------------------------------------------
SQL_ID [7m8xtjmn5zv0g] with PLAN_HASH_VALUE [3513624734]

0 > SELECT STATEMENT
└── 1 > SORT
    └── 2 > CONCATENATION
        ├── 3 > NESTED LOOPS
        │   ├── 4 > NESTED

└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > HASH JOIN
                ├── 5 > HASH JOIN
                │   ├── 6 > VIEW
                │   │   └── 7 > WINDOW
                │   │       └── 8 > FILTER
                │   │           ├── 9 > HASH
                │   │           │   └── 10 > TABLE ACCESS | FULL (STORE_SALES)
                │   │           └── 11 > SORT
                │   │               └── 12 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                │   │                   └── 13 > INDEX | RANGE SCAN (SS_STORE_SK_INDEX)
                │   └── 14 > HASH JOIN
                │       ├── 15 > VIEW
                │       │   └── 16 > WINDOW
                │       │       └── 17 > FILTER
                │       │           ├── 18 > HASH
                │       │           │   └── 19 > TABLE ACCESS | FULL (STORE_SALES)
                │       │           └── 20 > SORT
                │       │               └── 21 > TABLE ACCESS | BY IND

### Captured Outlier Plans

This section contains metrics pertaining to outlier plans. There are three categories of captured outliers denoted below, each assigned a total of 14 queries

* Hint Enhanced Queries
* Predicate Enhanced Queries
* Rownum Stopkey Enhanced Queries

In [26]:
#
# Retrieve Unique set of PLAN_HASH_VALUES
np_outlier_plan_id, np_outlier_plan_instance = pd.unique(df_outliers['PLAN_ID']), pd.unique(df_outliers['PLAN_INSTANCE'])
print(np_outlier_plan_id)
print(type(np_outlier_plan_id))
print(np_outlier_plan_instance)
print(type(np_outlier_plan_instance))
print('-'*100)
#
# Iterate over each PLAN_HASH_VALUE, and retrieve PLAN subset                                                                                                                 
for plan_instance in np_outlier_plan_instance:
    #
    # Retrieve only a single instance of the plan (as annotated at beginning of experiment)
    df_temp_plan = df_outliers[df_outliers['PLAN_INSTANCE'] == plan_instance]
    #
    # This step ensures that only TPC-DS related queries are displayed
    tpc_check = df_temp_plan['OBJECT_OWNER'].tolist()
    if tpcds not in tpc_check:
        continue
    #
    # Discards plans with double entries - Due to the parallel nature of the throughput test for 
    # TPC-DS, multiple threads may execute the same query at the same time, resulting in sql access
    # plans with the same SQL_ID, same PLAN_HASH_VALUE, and same TIMESTAMP. Such occurances are skipped.
    df_temp_count = df_temp_plan[df_temp_plan['ID'] == 0]
    if df_temp_count.shape[0] != 1:
        continue
    #
    # Sorts by ID ascending - This clause may be redundant due to the natural order of the data capture tool
    df_temp_plan = df_temp_plan.sort_values(by='ID', ascending=True)
    #
    # Builds Tree
    tree = PlanTreeModeller.build_tree(df=df_temp_plan)
    #
    # Renders Tree
    print('PLAN_ID [' + str(df_temp_plan['PLAN_ID'].iloc[0]) + ']\n')
    PlanTreeModeller.render_tree(tree=tree[0], df=df_temp_plan) # Tree rendederer uses root node and traverses downwards
    print('-'*100)

[12354 12355 12358 12359 12360 12362 12363 12364 12366 12367 12368 12369
 12370 12371 12372 12373 12374 12375 12376 12377 12378 12379 12380 12381
 12382 12383 12384 12385 12386 12387 12388 12389 12390 12391 12392 12393
 12394 12395 12396 12397 12398 12399]
<class 'numpy.ndarray'>
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42]
<class 'numpy.ndarray'>
----------------------------------------------------------------------------------------------------
PLAN_ID [12354]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 > HASH
                    │   └── 7 > NESTED LOOPS
                    │       ├── 8 > NESTED LOOPS
                    │       │   ├── 9 > HASH JOIN
                    │       │   │   ├── 10 > TABLE ACCESS | FULL (DATE_DIM)
                    │       │   │   └── 11 > VIEW
             

                        │   │               └── 69 > TABLE ACCESS | FULL (STORE_SALES)
                        │   └── 70 > VIEW
                        │       └── 71 > TABLE ACCESS | FULL (SYS_TEMP_0FD9F16E2_141942F5)
                        ├── 72 > FILTER
                        │   ├── 73 > HASH
                        │   │   └── 74 > HASH JOIN
                        │   │       ├── 75 > VIEW (VW_NSO_2)
                        │   │       │   └── 76 > VIEW
                        │   │       │       └── 77 > TABLE ACCESS | FULL (SYS_TEMP_0FD9F16E1_141942F5)
                        │   │       └── 78 > HASH JOIN
                        │   │           ├── 79 > TABLE ACCESS | FULL (ITEM)
                        │   │           └── 80 > HASH JOIN
                        │   │               ├── 81 > NESTED LOOPS
                        │   │               │   ├── 82 > NESTED LOOPS
                        │   │               │   │   ├── 83 > STATISTICS COLLECTOR
                     

                                        ├── 13 > TABLE ACCESS | FULL (DATE_DIM)
                                        └── 14 > TABLE ACCESS | FULL (STORE_SALES)
----------------------------------------------------------------------------------------------------
PLAN_ID [12368]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > WINDOW
                └── 5 > SORT
                    └── 6 > HASH JOIN
                        ├── 7 > VIEW (VW_NSO_1)
                        │   └── 8 > VIEW
                        │       └── 9 > WINDOW
                        │           └── 10 > HASH
                        │               └── 11 > HASH JOIN
                        │                   ├── 12 > TABLE ACCESS | FULL (STORE)
                        │                   └── 13 > NESTED LOOPS
                        │                       ├── 14 > NESTED LOOPS
                        │                       │   ├── 15 > TABLE ACCESS | FULL (DATE_DIM

                            │   └── 64 > TABLE ACCESS | BY INDEX ROWID (WEB_RETURNS)
                            │       └── 65 > INDEX | UNIQUE SCAN (SYS_C0021458)
                            └── 66 > TABLE ACCESS | FULL (WEB_RETURNS)
----------------------------------------------------------------------------------------------------
PLAN_ID [12371]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > WINDOW
                └── 5 > SORT
                    └── 6 > HASH JOIN
                        ├── 7 > TABLE ACCESS | FULL (ITEM)
                        └── 8 > HASH JOIN
                            ├── 9 > TABLE ACCESS | FULL (DATE_DIM)
                            └── 10 > TABLE ACCESS | FULL (WEB_SALES)
----------------------------------------------------------------------------------------------------
PLAN_ID [12372]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UN

                            │               └── 95 > TABLE ACCESS | FULL (SYS_TEMP_0FD9F16F8_141942F5)
                            └── 96 > VIEW
                                └── 97 > TABLE ACCESS | FULL (SYS_TEMP_0FD9F16F9_141942F5)
----------------------------------------------------------------------------------------------------
PLAN_ID [12375]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > SORT
                └── 5 > HASH JOIN
                    ├── 6 > TABLE ACCESS | FULL (ITEM)
                    └── 7 > HASH JOIN
                        ├── 8 > TABLE ACCESS | FULL (CUSTOMER_DEMOGRAPHICS)
                        └── 9 > HASH JOIN
                            ├── 10 > HASH JOIN
                            │   ├── 11 > TABLE ACCESS | FULL (CUSTOMER_ADDRESS)
                            │   └── 12 > HASH JOIN
                            │       ├── 13 > TABLE ACCESS | FULL (CUSTOMER)
                            │       └── 14 > NEST

                    │   │   └── 26 > HASH
                    │   │       └── 27 > HASH JOIN
                    │   │           ├── 28 > TABLE ACCESS | FULL (DATE_DIM)
                    │   │           └── 29 > TABLE ACCESS | FULL (CATALOG_SALES)
                    │   └── 30 > BUFFER
                    │       └── 31 > VIEW
                    │           └── 32 > HASH
                    │               └── 33 > NESTED LOOPS
                    │                   ├── 34 > NESTED LOOPS
                    │                   │   ├── 35 > TABLE ACCESS | FULL (DATE_DIM)
                    │                   │   └── 36 > INDEX | RANGE SCAN (CR_RETURNED_DATE_SK_INDEX)
                    │                   └── 37 > TABLE ACCESS | BY INDEX ROWID (CATALOG_RETURNS)
                    └── 38 > HASH JOIN
                        ├── 39 > VIEW
                        │   └── 40 > HASH
                        │       └── 41 > NESTED LOOPS
                        │           ├── 42 > NES

└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 > HASH
                    │   └── 7 > NESTED LOOPS
                    │       ├── 8 > NESTED LOOPS
                    │       │   ├── 9 > NESTED LOOPS
                    │       │   │   ├── 10 > TABLE ACCESS | FULL (DATE_DIM)
                    │       │   │   └── 11 > VIEW
                    │       │   │       └── 12 > UNION-ALL
                    │       │   │           ├── 13 > TABLE ACCESS | BY INDEX ROWID BATCHED (STORE_SALES)
                    │       │   │           │   └── 14 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                    │       │   │           └── 15 > TABLE ACCESS | BY INDEX ROWID BATCHED (STORE_RETURNS)
                    │       │   │               └── 16 > INDEX | RANGE SCAN (SR_RETURNED_DATE_SK_INDEX)
                    │       │   └── 17 > INDEX | UNIQUE SCAN (SYS_C0021425)
                    │       └── 

                            │   └── 94 > HASH JOIN
                            │       ├── 95 > VIEW (VW_NSO_3)
                            │       │   └── 96 > VIEW
                            │       │       └── 97 > TABLE ACCESS | FULL (SYS_TEMP_0FD9F170B_141942F5)
                            │       └── 98 > HASH JOIN
                            │           ├── 99 > TABLE ACCESS | FULL (ITEM)
                            │           └── 100 > HASH JOIN
                            │               ├── 101 > NESTED LOOPS
                            │               │   ├── 102 > NESTED LOOPS
                            │               │   │   ├── 103 > STATISTICS COLLECTOR
                            │               │   │   │   └── 104 > TABLE ACCESS | FULL (DATE_DIM)
                            │               │   │   └── 105 > INDEX | RANGE SCAN (WS_SOLD_DATE_SK_INDEX)
                            │               │   └── 106 > TABLE ACCESS | BY INDEX ROWID (WEB_SALES)
                 

            └── 4 > WINDOW
                └── 5 > SORT
                    └── 6 > HASH JOIN
                        ├── 7 > VIEW (VW_NSO_1)
                        │   └── 8 > VIEW
                        │       └── 9 > WINDOW
                        │           └── 10 > HASH
                        │               └── 11 > HASH JOIN
                        │                   ├── 12 > TABLE ACCESS | FULL (STORE)
                        │                   └── 13 > NESTED LOOPS
                        │                       ├── 14 > NESTED LOOPS
                        │                       │   ├── 15 > TABLE ACCESS | FULL (DATE_DIM)
                        │                       │   └── 16 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                        │                       └── 17 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                        └── 18 > HASH JOIN
                            ├── 19 > TABLE ACCESS | FULL (STORE)
                            └── 20 > NE

### Access Plan / Tree Comparison (SAME PLAN COMPARISON)

This section tests / evaluates different plans being compared to one another. Two separate tests are carried out, as follows:

* Comparing the exact same outlier plans with each other. This test verifies that no unneccessary flagging is carried out by the implementation.
* Comparing the inlier plans with the respective TPC-DS outlier plan. This test ensures that access plans are appropriately flagged where inconsistencies are encountered.

In [27]:
#
# Comparing same exact plans                                                                                                                
for plan_instance in np_outlier_plan_instance:
    #
    # Retrieve only a single instance of the plan (as annotated at beginning of experiment)
    df_temp_plan = df_outliers[df_outliers['PLAN_INSTANCE'] == plan_instance]
    #
    # This step ensures that only TPC-DS related queries are displayed
    tpc_check = df_temp_plan['OBJECT_OWNER'].tolist()
    if tpcds not in tpc_check:
        continue
    #
    # Discards plans with double entries - Due to the parallel nature of the throughput test for 
    # TPC-DS, multiple threads may execute the same query at the same time, resulting in sql access
    # plans with the same SQL_ID, same PLAN_HASH_VALUE, and same TIMESTAMP. Such occurances are skipped.
    df_temp_count = df_temp_plan[df_temp_plan['ID'] == 0]
    if df_temp_count.shape[0] != 1:
        continue
    #
    # Sorts by ID ascending - This clause may be redundant due to the natural order of the data capture tool
    df_temp_plan = df_temp_plan.sort_values(by='ID', ascending=True)
    #
    # Builds Tree
    tree = PlanTreeModeller.build_tree(df=df_temp_plan)
    #
    # Renders Trees
    print('Tree 1 with PLAN_ID [' + str(df_temp_plan['PLAN_ID'].iloc[0]) + ']\n')
    PlanTreeModeller.render_tree(tree=tree[0], df=df_temp_plan) # Tree rendederer uses root node and traverses downwards
    print('\nTree 2 with PLAN_ID [' + str(df_temp_plan['PLAN_ID'].iloc[0]) + ']\n')
    PlanTreeModeller.render_tree(tree=tree[0], df=df_temp_plan) # Tree rendederer uses root node and traverses downwards
    #
    # Compares both plans
    print('\n')
    PlanTreeModeller.tree_compare(tree1=tree, 
                                  tree2=tree, 
                                  df1=df_temp_plan, 
                                  df2=df_temp_plan)
    print('-'*100)

Tree 1 with PLAN_ID [12354]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 > HASH
                    │   └── 7 > NESTED LOOPS
                    │       ├── 8 > NESTED LOOPS
                    │       │   ├── 9 > HASH JOIN
                    │       │   │   ├── 10 > TABLE ACCESS | FULL (DATE_DIM)
                    │       │   │   └── 11 > VIEW
                    │       │   │       └── 12 > UNION-ALL
                    │       │   │           ├── 13 > TABLE ACCESS | FULL (STORE_SALES)
                    │       │   │           └── 14 > TABLE ACCESS | FULL (STORE_RETURNS)
                    │       │   └── 15 > INDEX | UNIQUE SCAN (SYS_C0021425)
                    │       └── 16 > TABLE ACCESS | BY INDEX ROWID (STORE)
                    ├── 17 > HASH
                    │   └── 18 > HASH JOIN
                    │       ├── 19 > NESTED LOOPS
                    │  

        └── 50 > VIEW
            └── 51 > SORT
                └── 52 > VIEW
                    └── 53 > UNION-ALL
                        ├── 54 > FILTER
                        │   ├── 55 > HASH
                        │   │   └── 56 > HASH JOIN
                        │   │       ├── 57 > VIEW (VW_NSO_1)
                        │   │       │   └── 58 > VIEW
                        │   │       │       └── 59 > TABLE ACCESS | FULL (SYS_TEMP_0FD9F16E1_141942F5)
                        │   │       └── 60 > HASH JOIN
                        │   │           ├── 61 > TABLE ACCESS | FULL (ITEM)
                        │   │           └── 62 > HASH JOIN
                        │   │               ├── 63 > NESTED LOOPS
                        │   │               │   ├── 64 > NESTED LOOPS
                        │   │               │   │   ├── 65 > STATISTICS COLLECTOR
                        │   │               │   │   │   └── 66 > TABLE ACCESS | FULL (DATE_DIM)
                        │   

No plan differences detected.
Total computed delta score [56.083261120685236]
----------------------------------------------------------------------------------------------------
Tree 1 with PLAN_ID [12359]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > SORT
                └── 5 > HASH JOIN
                    ├── 6 > TABLE ACCESS | FULL (ITEM)
                    └── 7 > HASH JOIN
                        ├── 8 > TABLE ACCESS | FULL (CUSTOMER_DEMOGRAPHICS)
                        └── 9 > HASH JOIN
                            ├── 10 > TABLE ACCESS | FULL (DATE_DIM)
                            └── 11 > HASH JOIN
                                ├── 12 > HASH JOIN
                                │   ├── 13 > HASH JOIN
                                │   │   ├── 14 > TABLE ACCESS | FULL (CUSTOMER_ADDRESS)
                                │   │   └── 15 > TABLE ACCESS | FULL (CUSTOMER)
                                │   └── 16 > INDEX | FAST FU

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > WINDOW
                └── 5 > SORT
                    └── 6 > HASH JOIN
                        ├── 7 > TABLE ACCESS | FULL (ITEM)
                        └── 8 > HASH JOIN
                            ├── 9 > TABLE ACCESS | FULL (DATE_DIM)
                            └── 10 > HASH JOIN
                                ├── 11 > TABLE ACCESS | FULL (STORE)
                                └── 12 > TABLE ACCESS | FULL (STORE_SALES)


No plan differences detected.
Total computed delta score [0.0]
----------------------------------------------------------------------------------------------------
Tree 1 with PLAN_ID [12366]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > WINDOW
                    └── 6 > VIEW (VW_FOJ_0)
                        └── 7 > HASH JOIN
                            ├── 8 > VIEW
                       

                        │           └── 45 > INDEX | UNIQUE SCAN (SYS_C0021442)
                        └── 46 > VIEW
                            └── 47 > HASH
                                └── 48 > NESTED LOOPS
                                    ├── 49 > NESTED LOOPS
                                    │   ├── 50 > TABLE ACCESS | FULL (DATE_DIM)
                                    │   └── 51 > TABLE ACCESS | BY INDEX ROWID BATCHED (WEB_RETURNS)
                                    │       └── 52 > INDEX | RANGE SCAN (WR_RETURNED_DATE_SK_INDEX)
                                    └── 53 > INDEX | UNIQUE SCAN (SYS_C0021442)


No plan differences detected.
Total computed delta score [37.65685424949238]
----------------------------------------------------------------------------------------------------
Tree 1 with PLAN_ID [12370]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 

            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 > HASH
                    │   └── 7 > NESTED LOOPS
                    │       ├── 8 > NESTED LOOPS
                    │       │   ├── 9 > NESTED LOOPS
                    │       │   │   ├── 10 > TABLE ACCESS | FULL (DATE_DIM)
                    │       │   │   └── 11 > VIEW
                    │       │   │       └── 12 > UNION-ALL
                    │       │   │           ├── 13 > TABLE ACCESS | BY INDEX ROWID BATCHED (STORE_SALES)
                    │       │   │           │   └── 14 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                    │       │   │           └── 15 > TABLE ACCESS | BY INDEX ROWID BATCHED (STORE_RETURNS)
                    │       │   │               └── 16 > INDEX | RANGE SCAN (SR_RETURNED_DATE_SK_INDEX)
                    │       │   └── 17 > INDEX | UNIQUE SCAN (SYS_C0021425)
                    │       └── 18 > TABLE ACCESS | BY INDEX ROWID (STORE)
         

No plan differences detected.
Total computed delta score [0.0]
----------------------------------------------------------------------------------------------------
Tree 1 with PLAN_ID [12374]

0 > SELECT STATEMENT
└── 1 > TEMP TABLE TRANSFORMATION
    ├── 2 > LOAD AS SELECT (SYS_TEMP_0FD9F16F8_141942F5)
    │   └── 3 > HASH JOIN
    │       ├── 4 > TABLE ACCESS | FULL (ITEM)
    │       └── 5 > VIEW
    │           └── 6 > INTERSECTION
    │               ├── 7 > INTERSECTION
    │               │   ├── 8 > SORT
    │               │   │   └── 9 > HASH JOIN
    │               │   │       ├── 10 > TABLE ACCESS | FULL (ITEM)
    │               │   │       └── 11 > HASH JOIN
    │               │   │           ├── 12 > TABLE ACCESS | FULL (DATE_DIM)
    │               │   │           └── 13 > TABLE ACCESS | FULL (STORE_SALES)
    │               │   └── 14 > SORT
    │               │       └── 15 > HASH JOIN
    │               │           ├── 16 > TABLE ACCESS | FULL (ITEM)
    │    

No plan differences detected.
Total computed delta score [215.84776310850236]
----------------------------------------------------------------------------------------------------
Tree 1 with PLAN_ID [12375]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > SORT
                └── 5 > HASH JOIN
                    ├── 6 > TABLE ACCESS | FULL (ITEM)
                    └── 7 > HASH JOIN
                        ├── 8 > TABLE ACCESS | FULL (CUSTOMER_DEMOGRAPHICS)
                        └── 9 > HASH JOIN
                            ├── 10 > HASH JOIN
                            │   ├── 11 > TABLE ACCESS | FULL (CUSTOMER_ADDRESS)
                            │   └── 12 > HASH JOIN
                            │       ├── 13 > TABLE ACCESS | FULL (CUSTOMER)
                            │       └── 14 > NESTED LOOPS
                            │           ├── 15 > NESTED LOOPS
                            │           │   ├── 16 > TABLE ACCESS | FULL (D

        └── 3 > SORT
            └── 4 > WINDOW
                └── 5 > SORT
                    └── 6 > HASH JOIN
                        ├── 7 > TABLE ACCESS | FULL (STORE)
                        └── 8 > HASH JOIN
                            ├── 9 > TABLE ACCESS | FULL (ITEM)
                            └── 10 > NESTED LOOPS
                                ├── 11 > NESTED LOOPS
                                │   ├── 12 > TABLE ACCESS | FULL (DATE_DIM)
                                │   └── 13 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                                └── 14 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)

Tree 2 with PLAN_ID [12379]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > WINDOW
                └── 5 > SORT
                    └── 6 > HASH JOIN
                        ├── 7 > TABLE ACCESS | FULL (STORE)
                        └── 8 > HASH JOIN
                            ├── 9 > TABLE ACCESS | FULL (ITEM)
     

No plan differences detected.
Total computed delta score [0.0]
----------------------------------------------------------------------------------------------------
Tree 1 with PLAN_ID [12383]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 > HASH JOIN
                    │   ├── 7 > VIEW
                    │   │   └── 8 > HASH
                    │   │       └── 9 > HASH JOIN
                    │   │           ├── 10 > INDEX | FULL SCAN (SYS_C0021425)
                    │   │           └── 11 > NESTED LOOPS
                    │   │               ├── 12 > NESTED LOOPS
                    │   │               │   ├── 13 > TABLE ACCESS | FULL (DATE_DIM)
                    │   │               │   └── 14 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                    │   │               └── 15 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                    │   └── 16 > VIEW


                    │       ├── 8 > NESTED LOOPS
                    │       │   ├── 9 > STATISTICS COLLECTOR
                    │       │   │   └── 10 > HASH JOIN
                    │       │   │       ├── 11 > NESTED LOOPS
                    │       │   │       │   ├── 12 > STATISTICS COLLECTOR
                    │       │   │       │   │   └── 13 > HASH JOIN
                    │       │   │       │   │       ├── 14 > HASH JOIN
                    │       │   │       │   │       │   ├── 15 > HASH JOIN
                    │       │   │       │   │       │   │   ├── 16 > NESTED LOOPS
                    │       │   │       │   │       │   │   │   ├── 17 > NESTED LOOPS
                    │       │   │       │   │       │   │   │   │   ├── 18 > STATISTICS COLLECTOR
                    │       │   │       │   │       │   │   │   │   │   └── 19 > TABLE ACCESS | FULL (ITEM)
                    │       │   │       │   │       │   │   │   │   └── 20 > INDEX | RANGE SCAN (SS_ITEM_SK_INDE

                    ├── 19 > HASH
                    │   └── 20 > HASH JOIN
                    │       ├── 21 > NESTED LOOPS
                    │       │   ├── 22 > TABLE ACCESS | FULL (DATE_DIM)
                    │       │   └── 23 > VIEW
                    │       │       └── 24 > UNION-ALL
                    │       │           ├── 25 > TABLE ACCESS | BY INDEX ROWID BATCHED (CATALOG_SALES)
                    │       │           │   └── 26 > INDEX | RANGE SCAN (CS_SOLD_DATE_SK_INDEX)
                    │       │           └── 27 > TABLE ACCESS | BY INDEX ROWID BATCHED (CATALOG_RETURNS)
                    │       │               └── 28 > INDEX | RANGE SCAN (CR_RETURNED_DATE_SK_INDEX)
                    │       └── 29 > TABLE ACCESS | FULL (CATALOG_PAGE)
                    └── 30 > HASH
                        └── 31 > NESTED LOOPS
                            ├── 32 > NESTED LOOPS
                            │   ├── 33 > NESTED LOOPS
                            │   │   ├── 

    │               │       └── 17 > HASH JOIN
    │               │           ├── 18 > TABLE ACCESS | FULL (ITEM)
    │               │           └── 19 > NESTED LOOPS
    │               │               ├── 20 > NESTED LOOPS
    │               │               │   ├── 21 > TABLE ACCESS | FULL (DATE_DIM)
    │               │               │   └── 22 > INDEX | RANGE SCAN (CS_SOLD_DATE_SK_INDEX)
    │               │               └── 23 > TABLE ACCESS | BY INDEX ROWID (CATALOG_SALES)
    │               └── 24 > SORT
    │                   └── 25 > HASH JOIN
    │                       ├── 26 > TABLE ACCESS | FULL (ITEM)
    │                       └── 27 > NESTED LOOPS
    │                           ├── 28 > NESTED LOOPS
    │                           │   ├── 29 > TABLE ACCESS | FULL (DATE_DIM)
    │                           │   └── 30 > INDEX | RANGE SCAN (WS_SOLD_DATE_SK_INDEX)
    │                           └── 31 > TABLE ACCESS | BY INDEX ROWID (WEB_SALES)
    ├── 32 > LOAD 

                └── 6 > HASH JOIN
                    ├── 7 > TABLE ACCESS | FULL (STORE)
                    └── 8 > HASH JOIN
                        ├── 9 > TABLE ACCESS | FULL (CUSTOMER_DEMOGRAPHICS)
                        └── 10 > NESTED LOOPS
                            ├── 11 > NESTED LOOPS
                            │   ├── 12 > TABLE ACCESS | FULL (DATE_DIM)
                            │   └── 13 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                            └── 14 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)

Tree 2 with PLAN_ID [12391]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > HASH JOIN
                ├── 5 > TABLE ACCESS | FULL (ITEM)
                └── 6 > HASH JOIN
                    ├── 7 > TABLE ACCESS | FULL (STORE)
                    └── 8 > HASH JOIN
                        ├── 9 > TABLE ACCESS | FULL (CUSTOMER_DEMOGRAPHICS)
                        └── 10 > NESTED LOOPS
                            ├

                                    └── 12 > HASH JOIN
                                        ├── 13 > NESTED LOOPS
                                        │   ├── 14 > NESTED LOOPS
                                        │   │   ├── 15 > STATISTICS COLLECTOR
                                        │   │   │   └── 16 > TABLE ACCESS | FULL (DATE_DIM)
                                        │   │   └── 17 > INDEX | RANGE SCAN (SS_SOLD_DATE_SK_INDEX)
                                        │   └── 18 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                                        └── 19 > TABLE ACCESS | FULL (STORE_SALES)

Tree 2 with PLAN_ID [12395]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > WINDOW
                    └── 6 > VIEW
                        └── 7 > SORT
                            └── 8 > HASH JOIN
                                ├── 9 > TABLE ACCESS | FULL (STORE)
                        

                            │       └── 67 > INDEX | UNIQUE SCAN (SYS_C0021458)
                            └── 68 > TABLE ACCESS | FULL (WEB_RETURNS)

Tree 2 with PLAN_ID [12398]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 > HASH
                    │   └── 7 > HASH JOIN
                    │       ├── 8 > NESTED LOOPS
                    │       │   ├── 9 > STATISTICS COLLECTOR
                    │       │   │   └── 10 > HASH JOIN
                    │       │   │       ├── 11 > TABLE ACCESS | FULL (STORE)
                    │       │   │       └── 12 > HASH JOIN
                    │       │   │           ├── 13 > TABLE ACCESS | FULL (PROMOTION)
                    │       │   │           └── 14 > HASH JOIN
                    │       │   │               ├── 15 > TABLE ACCESS | FULL (ITEM)
                    │       │   │               └── 16 > HASH JOIN
            

### Access Plan / Tree Comparison (DIFFERENT PLAN COMPARISON)

This section tests / evaluates different plans being compared to one another. Two separate tests are carried out, as follows:

* Comparing the exact same outlier plans with each other. This test verifies that no unneccessary flagging is carried out by the implementation.
* Comparing the inlier plans with the respective TPC-DS outlier plan. This test ensures that access plans are appropriately flagged where inconsistencies are encountered.

In [28]:
outlier_category_quantity = int(len(np_outlier_plan_instance) / 3)
for i in range(outlier_category_quantity):
    #
    # Isolate type 1 outliers
    df_temp_plan1 = df_outliers[df_outliers['PLAN_ID'] == np_outlier_plan_id[i]]
    #
    # Sorts by ID ascending for type 1 outliers - This clause may be redundant due to the natural order of 
    # the data capture tool
    df_temp_plan1 = df_temp_plan1.sort_values(by='ID', ascending=True)
    #
    # Builds Tree 1
    tree1 = PlanTreeModeller.build_tree(df=df_temp_plan1)
    #
    # Isolate type 2 outliers
    comparison_index = int(i + outlier_category_quantity)
    df_temp_plan2 = df_outliers[df_outliers['PLAN_ID'] == (np_outlier_plan_id[comparison_index])]
    #
    # Sorts by ID ascending for type 2 outliers - This clause may be redundant due to the natural order of 
    # the data capture tool
    df_temp_plan2 = df_temp_plan2.sort_values(by='ID', ascending=True)
    #
    # Builds Tree 2
    tree2 = PlanTreeModeller.build_tree(df=df_temp_plan2)
    #
    # Renders Trees
    print('Tree 1 with PLAN_ID [' + str(df_temp_plan1['PLAN_ID'].iloc[0]) + ']\n')
    PlanTreeModeller.render_tree(tree=tree1[0], df=df_temp_plan1) # Tree rendederer uses root node and traverses downwards
    print('\nTree 2 with PLAN_ID [' + str(df_temp_plan2['PLAN_ID'].iloc[0]) + ']\n')
    PlanTreeModeller.render_tree(tree=tree2[0], df=df_temp_plan2) # Tree rendederer uses root node and traverses downwards
    #
    # Compares both plans
    print('\n')
    PlanTreeModeller.tree_compare(tree1=tree1, 
                                  tree2=tree2, 
                                  df1=df_temp_plan1, 
                                  df2=df_temp_plan2)
    print('-'*100)
    print('\n\n\n')

Tree 1 with PLAN_ID [12354]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 > HASH
                    │   └── 7 > NESTED LOOPS
                    │       ├── 8 > NESTED LOOPS
                    │       │   ├── 9 > HASH JOIN
                    │       │   │   ├── 10 > TABLE ACCESS | FULL (DATE_DIM)
                    │       │   │   └── 11 > VIEW
                    │       │   │       └── 12 > UNION-ALL
                    │       │   │           ├── 13 > TABLE ACCESS | FULL (STORE_SALES)
                    │       │   │           └── 14 > TABLE ACCESS | FULL (STORE_RETURNS)
                    │       │   └── 15 > INDEX | UNIQUE SCAN (SYS_C0021425)
                    │       └── 16 > TABLE ACCESS | BY INDEX ROWID (STORE)
                    ├── 17 > HASH
                    │   └── 18 > HASH JOIN
                    │       ├── 19 > NESTED LOOPS
                    │  

                        │   │               │   └── 68 > TABLE ACCESS | BY INDEX ROWID (STORE_SALES)
                        │   │               └── 69 > TABLE ACCESS | FULL (STORE_SALES)
                        │   └── 70 > VIEW
                        │       └── 71 > TABLE ACCESS | FULL (SYS_TEMP_0FD9F16E2_141942F5)
                        ├── 72 > FILTER
                        │   ├── 73 > HASH
                        │   │   └── 74 > HASH JOIN
                        │   │       ├── 75 > VIEW (VW_NSO_2)
                        │   │       │   └── 76 > VIEW
                        │   │       │       └── 77 > TABLE ACCESS | FULL (SYS_TEMP_0FD9F16E1_141942F5)
                        │   │       └── 78 > HASH JOIN
                        │   │           ├── 79 > TABLE ACCESS | FULL (ITEM)
                        │   │           └── 80 > HASH JOIN
                        │   │               ├── 81 > NESTED LOOPS
                        │   │               │   ├── 82 > NESTED LOOPS
  

Total computed delta score [70530057848.27858]
----------------------------------------------------------------------------------------------------




Tree 1 with PLAN_ID [12359]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > SORT
                └── 5 > HASH JOIN
                    ├── 6 > TABLE ACCESS | FULL (ITEM)
                    └── 7 > HASH JOIN
                        ├── 8 > TABLE ACCESS | FULL (CUSTOMER_DEMOGRAPHICS)
                        └── 9 > HASH JOIN
                            ├── 10 > TABLE ACCESS | FULL (DATE_DIM)
                            └── 11 > HASH JOIN
                                ├── 12 > HASH JOIN
                                │   ├── 13 > HASH JOIN
                                │   │   ├── 14 > TABLE ACCESS | FULL (CUSTOMER_ADDRESS)
                                │   │   └── 15 > TABLE ACCESS | FULL (CUSTOMER)
                                │   └── 16 > INDEX | FAST FULL SCAN (SYS_C0021402)
    

Total computed delta score [27693654302.690872]
----------------------------------------------------------------------------------------------------




Tree 1 with PLAN_ID [12367]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > WINDOW
                    └── 6 > VIEW
                        └── 7 > SORT
                            └── 8 > HASH JOIN
                                ├── 9 > TABLE ACCESS | FULL (STORE)
                                └── 10 > HASH JOIN
                                    ├── 11 > TABLE ACCESS | FULL (ITEM)
                                    └── 12 > HASH JOIN
                                        ├── 13 > TABLE ACCESS | FULL (DATE_DIM)
                                        └── 14 > TABLE ACCESS | FULL (STORE_SALES)

Tree 2 with PLAN_ID [12381]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > WINDOW
     

Tree 1 with PLAN_ID [12370]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > VIEW
                └── 5 > UNION-ALL
                    ├── 6 > HASH
                    │   └── 7 > HASH JOIN
                    │       ├── 8 > NESTED LOOPS
                    │       │   ├── 9 > NESTED LOOPS
                    │       │   │   ├── 10 > HASH JOIN
                    │       │   │   │   ├── 11 > TABLE ACCESS | FULL (PROMOTION)
                    │       │   │   │   └── 12 > HASH JOIN
                    │       │   │   │       ├── 13 > TABLE ACCESS | FULL (ITEM)
                    │       │   │   │       └── 14 > HASH JOIN
                    │       │   │   │           ├── 15 > NESTED LOOPS
                    │       │   │   │           │   ├── 16 > NESTED LOOPS
                    │       │   │   │           │   │   ├── 17 > STATISTICS COLLECTOR
                    │       │   │   │           │   │   │   └── 18 > TABLE ACCESS | FULL (DATE

Total computed delta score [6002741430.18745]
----------------------------------------------------------------------------------------------------




Tree 1 with PLAN_ID [12371]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > WINDOW
                └── 5 > SORT
                    └── 6 > HASH JOIN
                        ├── 7 > TABLE ACCESS | FULL (ITEM)
                        └── 8 > HASH JOIN
                            ├── 9 > TABLE ACCESS | FULL (DATE_DIM)
                            └── 10 > TABLE ACCESS | FULL (WEB_SALES)

Tree 2 with PLAN_ID [12385]

0 > SELECT STATEMENT
└── 1 > COUNT
    └── 2 > VIEW
        └── 3 > SORT
            └── 4 > WINDOW
                └── 5 > SORT
                    └── 6 > HASH JOIN
                        ├── 7 > TABLE ACCESS | FULL (ITEM)
                        └── 8 > NESTED LOOPS
                            ├── 9 > NESTED LOOPS
                            │   ├── 10 > TABLE ACCESS | FULL (DATE