Skip to content
This repository has been archived by the owner on Mar 30, 2021. It is now read-only.

Sample Queries

hbutani edited this page Aug 4, 2016 · 11 revisions

These are against the denormalized(flattened) TPCH sales star schema as described in the Quick Start Guide page.

Example List:

  1. Basic Aggregation
  2. Basic Aggregation with SubQuery
  3. Interval and Dimension Filters
  4. Comparison Filter

Basic Aggregation

SQL

select l_returnflag, l_linestatus, count(*), 
       sum(l_extendedprice) as s, max(ps_supplycost) as m, avg(ps_availqty) as a,count(distinct o_orderkey)  
from orderLineItemPartSupplier 
group by l_returnflag, l_linestatus

Logical Plan

Aggregate [l_returnflag#69,l_linestatus#70], [l_returnflag#69,l_linestatus#70,COUNT(1) AS c2#109L,SUM(l_extendedprice#66) AS s#106,MAX(ps_supplycost#81) AS m#107,AVG(CAST(ps_availqty#80, LongType)) AS a#108,COUNT(DISTINCT o_orderkey#53) AS c6#110L]
 Project [l_extendedprice#66,o_orderkey#53,ps_supplycost#81,l_returnflag#69,l_linestatus#70,ps_availqty#80]
  Relation[o_orderkey#53,o_custkey#54,o_orderstatus#55,...

Physical Plan

Project [l_returnflag#69,l_linestatus#70,alias-1#112L AS c2#109L,alias-2#111 AS s#106,alias-3#115 AS m#107,alias-4#113 AS a#108,alias-7#114L AS c6#110L]
 PhysicalRDD [alias-2#111,alias-3#115,alias-7#114L,alias-4#113,l_returnflag#69,l_linestatus#70,alias-1#112L], DruidRDD[2] at RDD at DruidRDD.scala:16

Druid Query

{
    "jsonClass" : "GroupByQuerySpec",
    "queryType" : "groupBy",
    "dataSource" : "tpch",
    "dimensions" : [ {
      "jsonClass" : "DefaultDimensionSpec",
      "type" : "default",
      "dimension" : "l_returnflag",
      "outputName" : "l_returnflag"
    }, {
      "jsonClass" : "DefaultDimensionSpec",
      "type" : "default",
      "dimension" : "l_linestatus",
      "outputName" : "l_linestatus"
    } ],
    "granularity" : "all",
    "aggregations" : [ {
      "jsonClass" : "FunctionAggregationSpec",
      "type" : "count",
      "name" : "alias-1",
      "fieldName" : "count"
    }, {
      "jsonClass" : "FunctionAggregationSpec",
      "type" : "doubleSum",
      "name" : "alias-2",
      "fieldName" : "l_extendedprice"
    }, {
      "jsonClass" : "FunctionAggregationSpec",
      "type" : "doubleMax",
      "name" : "alias-3",
      "fieldName" : "ps_supplycost"
    }, {
      "jsonClass" : "FunctionAggregationSpec",
      "type" : "longSum",
      "name" : "alias-5",
      "fieldName" : "sum_ps_availqty"
    }, {
      "jsonClass" : "FunctionAggregationSpec",
      "type" : "count",
      "name" : "alias-6",
      "fieldName" : "count"
    }, {
      "jsonClass" : "CardinalityAggregationSpec",
      "type" : "cardinality",
      "name" : "alias-7",
      "fieldNames" : [ "o_orderkey" ],
      "byRow" : true
    } ],
    "postAggregations" : [ {
      "jsonClass" : "ArithmeticPostAggregationSpec",
      "type" : "arithmetic",
      "name" : "alias-4",
      "fn" : "/",
      "fields" : [ {
        "jsonClass" : "FieldAccessPostAggregationSpec",
        "type" : "fieldAccess",
        "fieldName" : "alias-5"
      }, {
        "jsonClass" : "FieldAccessPostAggregationSpec",
        "type" : "fieldAccess",
        "fieldName" : "alias-6"
      } ]
    } ],
    "intervals" : [ "1992-12-31T16:00:00.000-08:00/1997-12-31T16:00:00.000-08:00" ]
  }

Rewrites:

  • Grouping Expressions were translated to DefaultDimensionSpecs
  • Count(*) translated to FunctionAggregationSpec(type=count)
  • Sum/Min/Max translated to corresponding FunctionAggregationSpec
  • Avg translated to ArithmeticPostAggregationSpec and a pair of FunctionAggregationSpecs

Basic Aggregation on SubQuery

SQL

select f, s, count(*)  
from (select l_returnflag f, l_linestatus s from orderLineItemPartSupplier) t 
group by f, s

Logical Plan

Aggregate [f#122,s#123], [f#122,s#123,COUNT(1) AS c2#124L]
 Project [l_returnflag#69 AS f#122,l_linestatus#70 AS s#123]
  Relation[o_orderkey#53,o_custkey#54,....

Physical Plan

Project [f#122,s#123,alias-1#125L AS c2#124L]
 PhysicalRDD [f#122,s#123,alias-1#125L], DruidRDD[3] at RDD at DruidRDD.scala:16

Druid Query

{
    "jsonClass" : "GroupByQuerySpec",
    "queryType" : "groupBy",
    "dataSource" : "tpch",
    "dimensions" : [ {
      "jsonClass" : "DefaultDimensionSpec",
      "type" : "default",
      "dimension" : "l_returnflag",
      "outputName" : "f"
    }, {
      "jsonClass" : "DefaultDimensionSpec",
      "type" : "default",
      "dimension" : "l_linestatus",
      "outputName" : "s"
    } ],
    "granularity" : "all",
    "aggregations" : [ {
      "jsonClass" : "FunctionAggregationSpec",
      "type" : "count",
      "name" : "alias-1",
      "fieldName" : "count"
    } ],
    "intervals" : [ "1992-12-31T16:00:00.000-08:00/1997-12-31T16:00:00.000-08:00" ]
  }

Rewrites:

  • Similar to basicAgg query
  • Projection above PhysicalRDD Operator maps Druid output to original query Attributes. Details on mapping described here

Interval and Dimension Filters

SQL logic

val shipDtPredicate = dateTime('l_shipdate) <= (dateTime("1997-12-01") - 90.day)

    val df = sql(
      date"""
      select f, s, count(*) as count_order
      from
      (
         select l_returnflag as f, l_linestatus as s, l_shipdate, s_region, s_nation, c_nation
         from orderLineItemPartSupplier
      ) t
      where $shipDtPredicate and ((s_nation = 'FRANCE' and c_nation = 'GERMANY') or
                                  (c_nation = 'FRANCE' and s_nation = 'GERMANY')
                                 )
      group by f,s
      order by f,s
""")

SQL

      select f, s, count(*) as count_order
      from
      (
         select l_returnflag as f, l_linestatus as s, l_shipdate, s_region, s_nation, c_nation
         from orderLineItemPartSupplier
      ) t
      where dateIsBeforeOrEqual(dateTime(`l_shipdate`),dateMinus(dateTime("1997-12-01"),period("P90D"))) and ((s_nation = 'FRANCE' and c_nation = 'GERMANY') or
                                  (c_nation = 'FRANCE' and s_nation = 'GERMANY')
                                 )
      group by f,s
      order by f,s

Logical Plan

Sort [f#113 ASC,s#114 ASC], true
 Aggregate [f#113,s#114], [f#113,s#114,COUNT(1) AS count_order#106L]
  Project [l_returnflag#69 AS f#113,l_linestatus#70 AS s#114]
   Filter (scalaUDF(scalaUDF(l_shipdate#71),scalaUDF(scalaUDF(1997-12-01),scalaUDF(P90D))) && (((s_nation#88 = FRANCE) && (c_nation#104 = GERMANY)) || ((c_nation#104 = FRANCE) && (s_nation#88 = GERMANY))))
    Relation[o_orderkey#53,o_custkey#54,o_orderstatus#55,...

Physical Plan

Sort [f#113 ASC,s#114 ASC], true
 Project [f#113,s#114,alias-1#115L AS count_order#106L]
  PhysicalRDD [f#113,s#114,alias-1#115L], DruidRDD[2] at RDD at DruidRDD.scala:16

Druid Query

{
    "jsonClass" : "GroupByQuerySpec",
    "queryType" : "groupBy",
    "dataSource" : "tpch",
    "dimensions" : [ {
      "jsonClass" : "DefaultDimensionSpec",
      "type" : "default",
      "dimension" : "l_returnflag",
      "outputName" : "f"
    }, {
      "jsonClass" : "DefaultDimensionSpec",
      "type" : "default",
      "dimension" : "l_linestatus",
      "outputName" : "s"
    } ],
    "granularity" : "all",
    "filter" : {
      "jsonClass" : "LogicalFilterSpec",
      "type" : "or",
      "fields" : [ {
        "jsonClass" : "LogicalFilterSpec",
        "type" : "and",
        "fields" : [ {
          "jsonClass" : "SelectorFilterSpec",
          "type" : "selector",
          "dimension" : "s_nation",
          "value" : "FRANCE"
        }, {
          "jsonClass" : "SelectorFilterSpec",
          "type" : "selector",
          "dimension" : "c_nation",
          "value" : "GERMANY"
        } ]
      }, {
        "jsonClass" : "LogicalFilterSpec",
        "type" : "and",
        "fields" : [ {
          "jsonClass" : "SelectorFilterSpec",
          "type" : "selector",
          "dimension" : "c_nation",
          "value" : "FRANCE"
        }, {
          "jsonClass" : "SelectorFilterSpec",
          "type" : "selector",
          "dimension" : "s_nation",
          "value" : "GERMANY"
        } ]
      } ]
    },
    "aggregations" : [ {
      "jsonClass" : "FunctionAggregationSpec",
      "type" : "count",
      "name" : "alias-1",
      "fieldName" : "count"
    } ],
    "intervals" : [ "1992-12-31T16:00:00.000-08:00/1997-09-02T01:00:00.001-07:00" ]
  }

Rewrites:

  • The predicate on the timeDimension column l_shipdate is converted to a Interval: "1992-12-31T16:00:00.000-08:00/1997-09-02T01:00:00.001-07:00"
  • Predicates on s_nation and c_nation is translated to LogicalFilterSpec and SelectorFilterSpecs

Comparison Filter

SQL logic

val shipDtPredicate = dateTime('l_shipdate) <= (dateTime("1997-12-01") - 90.day)

    val df = sql(
      date"""
      select s_nation, count(*) as count_order
      from
      (
         select l_returnflag as f, l_linestatus as s, l_shipdate,
         s_region, s_nation, c_nation, p_type
         from orderLineItemPartSupplier
      ) t
      where $shipDtPredicate and s_nation >= 'FRANCE'
      group by s_nation
      order by s_nation
""")

SQL

      select s_nation, count(*) as count_order
      from
      (
         select l_returnflag as f, l_linestatus as s, l_shipdate,
         s_region, s_nation, c_nation, p_type
         from orderLineItemPartSupplier
      ) t
      where dateIsBeforeOrEqual(dateTime(`l_shipdate`),dateMinus(dateTime("1997-12-01"),period("P90D"))) and s_nation >= 'FRANCE'
      group by s_nation
      order by s_nation

Logical Plan

Sort [s_nation#88 ASC], true
 Aggregate [s_nation#88], [s_nation#88,COUNT(1) AS count_order#106L]
  Project [s_nation#88]
   Filter (scalaUDF(scalaUDF(l_shipdate#71),scalaUDF(scalaUDF(1997-12-01),scalaUDF(P90D))) && (s_nation#88 >= FRANCE))
    Relation[o_orderkey#53,o_custkey#54,o_orderstatus...

Physical Plan

Sort [s_nation#88 ASC], true
 Project [s_nation#88,alias-1#115L AS count_order#106L]
  PhysicalRDD [s_nation#88,alias-1#115L], DruidRDD[2] at RDD at DruidRDD.scala:16

Druid Query

{
    "jsonClass" : "GroupByQuerySpec",
    "queryType" : "groupBy",
    "dataSource" : "tpch",
    "dimensions" : [ {
      "jsonClass" : "DefaultDimensionSpec",
      "type" : "default",
      "dimension" : "s_nation",
      "outputName" : "s_nation"
    } ],
    "granularity" : "all",
    "filter" : {
      "jsonClass" : "JavascriptFilterSpec",
      "type" : "javascript",
      "dimension" : "s_nation",
      "function" : "function(x) { return(x >= 'FRANCE') }"
    },
    "aggregations" : [ {
      "jsonClass" : "FunctionAggregationSpec",
      "type" : "count",
      "name" : "alias-1",
      "fieldName" : "count"
    } ],
    "intervals" : [ "1992-12-31T16:00:00.000-08:00/1997-09-02T01:00:00.001-07:00" ]
  },
  "intervalSplits" : [ {
    "start" : 725846400000,
    "end" : 873187200001
  }

Rewrites:

  • The predicate s_nation >= 'FRANCE' is translated to a JavascriptFilterSpec
Clone this wiki locally