Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Feature request: support group by field, and filter with left join index's attribute #301

Open
seraphjiang opened this issue Nov 22, 2019 · 1 comment
Labels
enhancement New feature or request SQL

Comments

@seraphjiang
Copy link
Member

seraphjiang commented Nov 22, 2019

Problem

I want to be able to find list of issue, filter by attribute from another left join table.

I have two index, issue and issue_detail, they share the same key issueId. issue_detail has additional attribute - priority i'd like to use to filter out result.

see below example

SQL Query

select i.issueId from [issue] i 
  left join [issue_detail] d 
  on i.issueId = d.issueId 
where d.priority=2
group by i.issueId

OD-SQL Query

GET _opendistro/_sql?format=csv
{
  "query": "select i.issueId from [issue] i left join [issue_detail] d on i.issueId = d.issueId where d.priority=2 group by i.issueId"
}

Actual result:

i.issueId
00003
00002
00001
00004

Expect result:

i.issueId
00002
00004

Testing Data

POST _bulk
{ "index" : { "_index" : "issue", "_id" : "1" } }
{ "issueId" : "00001" }
{ "index" : { "_index" : "issue", "_id" : "2" } }
{ "issueId" : "00002" }
{ "index" : { "_index" : "issue", "_id" : "3" } }
{ "issueId" : "00003" }
{ "index" : { "_index" : "issue", "_id" : "4" } }
{ "issueId" : "00004" }


POST _bulk
{ "index" : { "_index" : "issue_detail", "_id" : "1" } }
{ "issueId":"00001", "priority" : 1 }
{ "index" : { "_index" : "issue_detail", "_id" : "2" } }
{ "issueId":"00002", "priority" : 2 }
{ "index" : { "_index" : "issue_detail", "_id" : "3" } }
{ "issueId":"00003", "priority" : 3 }
{ "index" : { "_index" : "issue_detail", "_id" : "4" } }
{ "issueId":"00004", "priority" : 2 }

@chloe-zh
Copy link
Member

Thanks for the report! I have reproduced this output and DSL was printed as follows:

{
  "Physical Plan" : {
    "Project [ columns=[i.issueId] ]" : {
      "Top [ count=200 ]" : {
        "BlockHashJoin[ conditions=( i.issueId = d.issueId ), type=LEFT_OUTER_JOIN, blockSize=[FixedBlockSize with size=10000] ]" : {
          "Scroll [ issue_detail as d, pageSize=10000 ]" : {
            "request" : {
              "size" : 200,
              "query" : {
                "bool" : {
                  "filter" : [
                    {
                      "bool" : {
                        "adjust_pure_negative" : true,
                        "must" : [
                          {
                            "term" : {
                              "priority" : {
                                "boost" : 1,
                                "value" : 2
                              }
                            }
                          }
                        ],
                        "boost" : 1
                      }
                    }
                  ],
                  "adjust_pure_negative" : true,
                  "boost" : 1
                }
              },
              "from" : 0
            }
          },
          "Scroll [ issue as i, pageSize=10000 ]" : {
            "request" : {
              "size" : 200,
              "from" : 0,
              "_source" : {
                "excludes" : [ ],
                "includes" : [
                  "issueId"
                ]
              }
            }
          },
          "useTermsFilterOptimization" : false
        }
      }
    }
  },
  "description" : "Hash Join algorithm builds hash table based on result of first query, and then probes hash table to find matched rows for each row returned by second query",
  "Logical Plan" : {
    "Project [ columns=[i.issueId] ]" : {
      "Top [ count=200 ]" : {
        "Join [ conditions=( i.issueId = d.issueId ) type=LEFT_OUTER_JOIN ]" : {
          "Group" : [
            {
              "Project [ columns=[i.issueId] ]" : {
                "TableScan" : {
                  "tableAlias" : "i",
                  "tableName" : "issue"
                }
              }
            },
            {
              "Project [ columns=[d.issueId] ]" : {
                "Filter [ conditions=[AND ( AND priority EQ 2 ) ] ]" : {
                  "TableScan" : {
                    "tableAlias" : "d",
                    "tableName" : "issue_detail"
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

As we can see in the physical plan, the filter was pushed into only the left table, which causes problem to the LEFT JOIN queries with filters. Currently we do not support filter and aggregate on the result of LEFT JOINs due to the limitation of ES. Related issues: #245 #236 #289 #124 #110

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request SQL
Projects
None yet
Development

No branches or pull requests

3 participants