Mv_expand and projections and filters #136232

@dnhatn

A user reported an issue where an ES|QL query takes several minutes to execute. I don't have the query, but I do have a task that includes a list of operators. I suspect the issue is related to mv_expand combined with projections.

from test | MV_EXPAND count_d | where count_d > 20

produces

ExchangeSinkExec[[color{f}#34, count{f}#35, count_d{r}#36, data{f}#37, data_d{f}#38, tag{f}#39, time{f}#40],false]
\_ProjectExec[[color{f}#34, count{f}#35, count_d{r}#36, data{f}#37, data_d{f}#38, tag{f}#39, time{f}#40]]
  \_FieldExtractExec[color{f}#34, count{f}#35, data{f}#37, data_d{f}#38, ..]<[],[]>
    \_LimitExec[1000[INTEGER],28]
      \_FilterExec[count_d{r}#36 > 20[INTEGER]]
        \_MvExpandExec[count_d{f}#41,count_d{r}#36]
          \_FieldExtractExec[count_d{f}#41]<[],[]>
            \_EsQueryExec[test], indexMode[standard], [_doc{f}#42], limit[], sort[] estimatedRowSize[144] queryBuilderAndTags [[QueryBuilderAndTags{queryBuilder=[null], tags=[]}]]

from test | KEEP * | MV_EXPAND count_d | where count_d > 20

produces

ExchangeSinkExec[[color{f}#14, count{f}#15, count_d{r}#16, data{f}#17, data_d{f}#18, tag{f}#19, time{f}#20],false]
\_ProjectExec[[color{f}#14, count{f}#15, count_d{r}#16, data{f}#17, data_d{f}#18, tag{f}#19, time{f}#20]]
  \_LimitExec[1000[INTEGER],140]
    \_FilterExec[count_d{r}#16 > 20[INTEGER]]
      \_MvExpandExec[count_d{f}#21,count_d{r}#16]
        \_ProjectExec[[color{f}#14, count{f}#15, count_d{f}#21, data{f}#17, data_d{f}#18, tag{f}#19, time{f}#20]]
          \_FieldExtractExec[color{f}#14, count{f}#15, count_d{f}#21, data{f}#17, ..]<[],[]>
            \_EsQueryExec[test], indexMode[standard], [_doc{f}#22], limit[], sort[] estimatedRowSize[144] queryBuilderAndTags [[QueryBuilderAndTags{queryBuilder=[null], tags=[]}]]

With KEEP * before mv_expand, all fields are eagerly loaded before the filter (the FieldExtractExec for every field sits below MvExpandExec and FilterExec). I believe both queries should produce the same physical plan, one that loads the remaining fields after the filter.
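To make the cost difference concrete, here is a rough toy model in Python that counts field loads for the two plan shapes. It is not Elasticsearch code: the function names, the sample documents, and the cost model (one "load" per field per row, every document scanned, no early termination from the limit) are all assumptions made for illustration.

```python
# Toy model only: counts field loads for the two plan shapes shown above.
# Nothing here is Elasticsearch code; every name is hypothetical.

def expand(docs, field):
    """Yield one (doc_id, value) row per value of a multivalued field."""
    for doc_id, doc in enumerate(docs):
        for value in doc[field]:
            yield doc_id, value


def late_plan_loads(docs, field, threshold, limit):
    """Load only `field`, expand, filter, limit, then load the other fields."""
    loads = len(docs)  # one load of `field` per document
    survivors = [d for d, v in expand(docs, field) if v > threshold][:limit]
    other_fields = len(docs[0]) - 1 if docs else 0
    # Remaining fields are loaded only for rows that passed the filter.
    return loads + len(survivors) * other_fields


def eager_plan_loads(docs, field, threshold, limit):
    """Load every field of every document before expanding and filtering."""
    # `threshold` and `limit` do not reduce the work: loading happens first.
    return sum(len(doc) for doc in docs)


# Hypothetical sample data: 100 documents with 4 fields each.
docs = [{"count_d": [i, i + 1], "color": "c", "tag": "t", "data": i}
        for i in range(100)]

late = late_plan_loads(docs, "count_d", 98, 1000)
eager = eager_plan_loads(docs, "count_d", 98, 1000)
```

Under these assumptions the gap grows with the number of fields and with how selective the filter is, which is consistent with the slowdown suspected here but does not prove it is the cause.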

I think we could push the filter (without single-value-match) to Lucene in this scenario, but that's a separate issue.
