Remove pre-filter phase in search #56016

Closed
easyice opened this issue Apr 30, 2020 · 2 comments
Labels
:Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@easyice
Contributor

easyice commented Apr 30, 2020

PR #25658 added a pre-filter phase to search. It is used to skip a shard when none of its documents fall within the requested date range. The phase sends a can_match RPC to the data nodes to find out whether each shard can match.

The pre-filter applies to range queries on date fields. It uses Lucene's PointValues.getMinPackedValue and PointValues.getMaxPackedValue to check whether the shard's values intersect the query range.

However, this is not necessary, because Lucene already skips any segment that lies OUTSIDE the query range (for any numeric field). This case is called CELL_OUTSIDE_QUERY. See here: Lucene uses the segment's MinPackedValue/MaxPackedValue to compute the Relation, and if the Relation is CELL_OUTSIDE_QUERY it does nothing, so the segment is skipped.
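To make the comparison concrete, here is a minimal sketch of that min/max check in Python. It is a simplification under stated assumptions: it works on plain integers in one dimension rather than Lucene's packed byte values, the function name `relate` is hypothetical, and only the `CELL_*` names come from Lucene's `Relation` enum.

```python
from enum import Enum


class Relation(Enum):
    # Names mirror Lucene's PointValues.Relation constants.
    CELL_OUTSIDE_QUERY = "outside"
    CELL_INSIDE_QUERY = "inside"
    CELL_CROSSES_QUERY = "crosses"


def relate(cell_min, cell_max, query_min, query_max):
    """Compare a segment's [cell_min, cell_max] with an inclusive query range.

    Hypothetical helper: plain ints stand in for Lucene's packed values.
    """
    # No overlap at all: the segment can be skipped without visiting documents.
    if cell_max < query_min or cell_min > query_max:
        return Relation.CELL_OUTSIDE_QUERY
    # The whole segment lies within the query range.
    if cell_min >= query_min and cell_max <= query_max:
        return Relation.CELL_INSIDE_QUERY
    # Partial overlap: individual values have to be checked.
    return Relation.CELL_CROSSES_QUERY


# Illustrative values only: a segment whose field spans 0..5000,
# queried with the range [-2, -1] used later in this issue.
print(relate(0, 5000, -2, -1))
```

With a segment range that does not overlap the query range, the function returns `CELL_OUTSIDE_QUERY`, which is the case where the segment is skipped.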

Therefore, the pre-filter cannot reduce the query time.

I tested this on my cluster: three nodes on three physical machines (34c, 196g).

Index data for test

I use filebeat indices containing nginx logs. The indices are listed below:

curl -s "localhost:9200/_cat/indices?pretty"|grep filebeat
green  open filebeat-7.4.2-2020.03.17-000068   wGACPeiCTcq5ZQZ8x-2KVA 5 1   82769921       0  100.4gb  50.2gb
green  open filebeat-7.4.2-2020.03.06-000059   KfPPN_0rSJ6oshutjAyHDg 5 1   80906072       0  100.9gb  50.4gb
green  open filebeat-7.4.2-2020.03.28-000076   soro-fURTLq3dFjJHdjhoA 5 1   82393361       0    100gb    50gb
green  open filebeat-7.4.2-2020.03.25-000073   M49qQDJKRjmwSxjtfNaSUQ 5 1   84743158       0    100gb    50gb
green  open filebeat-7.4.2-2020.03.20-000070   wvI7qHwOQBeqFeQTL8Tstw 5 1   84957634       0  100.2gb  50.1gb
green  open filebeat-7.4.2-2020.03.31-000080   UWbFXK3xRCeUTsU78RWMQw 5 1   82266053       0  100.1gb    50gb
green  open filebeat-7.4.2-2020.03.31-000081   epT7qYaQRCeRyW2cncWLGA 5 1   82956677       0  100.6gb  50.3gb
green  open filebeat-7.4.2-2020.03.09-000062   p0294PCmTCCCdy67xGKRIQ 5 1   82141141       0    100gb    50gb
green  open filebeat-7.4.2-2020.04.03-000085   hB36R_XhRJmtAV85g2t9rQ 5 1   84546938       0  100.8gb  50.4gb
green  open filebeat-7.4.2-2020.03.03-000056   aK9D91ZgSDSm42x-kmb_3A 5 1   84180704       0    101gb  50.5gb
green  open filebeat-7.4.2-2020.03.08-000061   CTGADML7Qqm0otDI_uiZMA 5 1   83529134       0  100.6gb  50.3gb
green  open filebeat-7.4.2-2020.04.02-000084   Md7rz9ZhT6CePzEw4kRucw 5 1   84989895       0  100.9gb  50.4gb
green  open filebeat-7.4.2-2020.04.02-000083   _g9-bVxtSVSgWJohP6GJPw 5 1   83201742       0    100gb    50gb
green  open filebeat-7.4.2-2020.04.05-000087   fgaHJ_79QTilBANgN0O0lw 5 1   85151909       0  100.4gb  50.2gb
green  open filebeat-7.4.2-2020.03.05-000058   tsupTkWkT4e-V0bW1MSQew 5 1   82591651       0  100.7gb  50.3gb
green  open filebeat-7.4.2-2020.03.27-000075   TCpnlHpfTVS7QT-pIT0JLQ 5 1   82757462       0  100.3gb  50.1gb
green  open filebeat-7.4.2-2020.03.22-000071   _E7BMpjHTaWhnjmIdBt_mA 5 1   84568381       0    100gb    50gb
green  open filebeat-7.4.2-2020.03.11-000064   3o5XdU7dSaiYjk4-ul6Eow 5 1   84889940       0  100.5gb  50.2gb
green  open filebeat-7.4.2-2020.03.13-000066   blh3f4SsRLGfC0o2DyY37Q 5 1   86548359       0  100.6gb  50.3gb
green  open filebeat-7.4.2-2020.03.02-000055   H82hNj3CSeaPdBJYahu-kA 5 1   83627197       0  100.4gb  50.2gb
green  open filebeat-7.4.2-2020.04.04-000086   MDHfXBKuQXWxQZE424lmaA 5 1   84946525       0  100.1gb    50gb
green  open filebeat-7.4.2-2020.03.07-000060   ImfjAWq1T2K6mwNOZyCH4w 5 1   81787735       0    100gb    50gb
yellow open filebeat-7.4.2-2020.04.07-000089   M0ziF2jLQae6528PTVE4Zg 5 1 4055505356       0    4.4tb   2.4tb
green  open filebeat-7.4.2-2020.03.18-000069   wSKM0Y-HRe6uqv59NBAC1Q 5 1   84133931       0    100gb    50gb
green  open filebeat-7.4.2-2020.03.29-000077   E040p77iQe2LbNgB_-r-pg 5 1   83258912       0  100.6gb  50.3gb
green  open filebeat-7.4.2-2020.03.24-000072   b5V9qrg9TJK39JCOjzhkCg 5 1   85055904       0  100.5gb  50.2gb
green  open filebeat-7.4.2-2020.03.30-000079   6plN3nLtSQyElDoQOhC-Vg 5 1   82709268       0  100.6gb  50.3gb
green  open filebeat-7.4.2-2020.03.30-000078   fIAMmc7lTV2FdqMPJaixHg 5 1   82681575       0  100.3gb  50.2gb
green  open filebeat-7.4.2-2020.04.06-000088   8AWMk0cLQiWiQiNYiu0KOg 5 1   85362231       0  100.5gb  50.2gb
green  open filebeat-7.4.2-2020.03.26-000074   HouNhgWLT1-r17U3DahJgA 5 1   84449316       0  100.5gb  50.3gb
green  open filebeat-7.4.2-2020.03.15-000067   aUwEt33BQG2kd4f7-V8lzA 5 1   85806838       0    101gb  50.4gb
green  open filebeat-7.4.2-2020.03.12-000065   m2l6FqAyQCyq_Baw27aL0A 5 1   85869226       0  100.3gb  50.1gb
green  open filebeat-7.4.2-2020.03.04-000057   sfmYBZX-SIeXDssuyI7WYQ 5 1   81905152       0  100.2gb  50.1gb
green  open filebeat-7.4.2-2020.04.01-000082   HqFiWgXZQ1qngfSpgySHBg 5 1   83104024       0  100.4gb  50.2gb
green  open filebeat-7.4.2-2020.03.01-000054   YKPl_f36Rh-ZDbD99DO6Nw 5 1   83255572       0    100gb    50gb
green  open filebeat-7.4.2-2020.03.10-000063   5p9Xbhc5TaamtXly1l_RIA 5 1   81494789       0    100gb    50gb

They contain 11227 segments:

curl  -s "localhost:9200/_cat/segments?pretty"|grep filebeat | wc -l
11227

The indices contain more than 10000 documents:

POST filebeat-7.4.2-*/_search?size=0
{
  "query": {
    "range": {
      "nginx.bytes.body_sent": {
        "gte": 0
      }
    }
  }
}


{
  "took" : 5491,
  "timed_out" : false,
  "_shards" : {
    "total" : 180,
    "successful" : 180,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Compare query results

Step 1: perform a range search on a date field. It matches nothing and took 31ms:

POST filebeat-7.4.2-*/_search
{
  "size":0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2021-01-01",
        "lte":"2022-01-01"
      }
    }
  }
}

{
  "took" : 31,
  "timed_out" : false,
  "_shards" : {
    "total" : 180,
    "successful" : 180,
    "skipped" : 179,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Step 2: add the ?pre_filter_shard_size=1000 parameter to skip the pre-filter phase. The response took 23ms:

{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 180,
    "successful" : 180,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    }
  }
}
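Why the parameter in step 2 skips the phase can be sketched as follows. This is a simplified model, not Elasticsearch's actual code: it assumes the pre-filter round trip only runs when the number of target shards exceeds `pre_filter_shard_size`, and that the default threshold is 128 as documented for 7.x. The real decision also depends on the query and search type. The helper names `runs_pre_filter` and `search_path` are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: default threshold documented for Elasticsearch 7.x.
DEFAULT_PRE_FILTER_SHARD_SIZE = 128


def runs_pre_filter(num_shards, pre_filter_shard_size=DEFAULT_PRE_FILTER_SHARD_SIZE):
    """Simplified rule: pre-filter only when the shard count exceeds the threshold."""
    return num_shards > pre_filter_shard_size


def search_path(index_pattern, **params):
    """Build the request path for a search, with optional query-string parameters."""
    query = urlencode(params)
    return f"/{index_pattern}/_search" + (f"?{query}" if query else "")


# 180 shards, as reported in the responses in this issue.
print(runs_pre_filter(180))        # default threshold
print(runs_pre_filter(180, 1000))  # threshold raised as in step 2
print(search_path("filebeat-7.4.2-*", size=0, pre_filter_shard_size=1000))
```

Under this model, 180 shards exceed the default threshold, which is consistent with `"skipped" : 179` in step 1, while a threshold of 1000 is consistent with `"skipped" : 0` in step 2.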

Step 3: perform a range search on a long field. It matches nothing and took 50ms:

POST filebeat-7.4.2-*/_search?size=0
{
  "query": {
    "range": {
      "nginx.bytes.body_sent": {
        "gte": -2,
        "lte":-1
      }
    }
  }
}

{
  "took" : 50,
  "timed_out" : false,
  "_shards" : {
    "total" : 180,
    "successful" : 180,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

The query times of these requests are similar. The pre-filter brings no improvement, so I think it can be removed from Elasticsearch.

@dnhatn dnhatn added the :Search/Search Search-related issues that do not fall into other categories label Apr 30, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Apr 30, 2020
@jimczi
Contributor

jimczi commented Apr 30, 2020

Speed is not the most important aspect of the pre-filter phase. For instance you're missing the fact that this phase is executed in the network thread directly and avoids the search thread pool queue.
We don't have any plans to remove this phase and are actively adding more features on top of it (#49092). So, since you mentioned that this phase doesn't slow down fast queries, I hope you don't mind if I close this issue. However I'd be happy to continue the discussion in the forum, which is better suited for this kind of conversation.

@jimczi jimczi closed this as completed Apr 30, 2020