match_bool_prefix does not work as expected #85381

Open
avengerandy opened this issue Mar 28, 2022 · 2 comments
Labels
>bug priority:normal A label for assessing bug priority to be used by ES engineers :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@avengerandy

Elasticsearch version (bin/elasticsearch --version):

7.5.1

Description of the problem including expected versus actual behavior:

match_bool_prefix does not work as expected.
According to the documentation, a match_bool_prefix query analyzes its input and constructs a bool query from the terms. Each term except the last is used in a term query. The last term is used in a prefix query.

match_bool_prefix = input search text -> analyzer -> boolQuery [TermQuery..., PrefixQuery]
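The documented expansion can be written out as a minimal Python sketch. It is my own illustration, not Elasticsearch code. It assumes the default OR operator, so every clause is a `should` clause. The input `terms` stands for whatever tokens the analyzer emitted, in order. The function name `bool_prefix_equivalent` is hypothetical.

```python
from typing import Any


def bool_prefix_equivalent(field: str, terms: list[str]) -> dict[str, Any]:
    """Build the bool query that match_bool_prefix is documented to produce.

    Every analyzed term except the last becomes a term query; the last one
    becomes a prefix query. Assumes the default OR operator (should clauses).
    """
    if not terms:
        # This sketch's own choice; it does not claim what Elasticsearch does.
        raise ValueError("analyzer produced no terms")
    clauses: list[dict[str, Any]] = [{"term": {field: t}} for t in terms[:-1]]
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


print(bool_prefix_equivalent("title", ["test", "doc"]))
# {'bool': {'should': [{'term': {'title': 'test'}}, {'prefix': {'title': 'doc'}}]}}
```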

However, when I use match_bool_prefix, there are two situations where it does not seem to behave as the documentation says.

Situation 1

I reproduced the problem as follows.

Create test_index with a title field:

PUT test_index
{
  "mappings": {
    "properties" : {
      "title" : {
        "type" : "text",
        "analyzer" : "standard"
      }
    }
  }
}

Index a document:

PUT test_index/_doc/1
{
  "title": "test docOne"
}

When I search with match_bool_prefix:

GET test_index/_search
{
  "query": {
    "match_bool_prefix": {
      "title": {
        "query": "doc"
      }
    }
  }
}

Document 1 matches:

"hits" : [
  {
    "_index" : "test_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "title" : "test docOne"
    }
  }
]

The Kibana Search Profiler shows that it uses a MultiTermQuery, which expands to "docone".
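The prefix expansion the profiler reports can be modeled roughly as a range scan over a sorted list of indexed terms. This Python sketch is a simplification: Lucene walks its own terms dictionary, not a Python list. The term list assumes the standard analyzer lowercased "test docOne" into "test" and "docone". `expand_prefix` is a hypothetical helper.

```python
import bisect


def expand_prefix(sorted_terms: list[str], prefix: str) -> list[str]:
    """Return every indexed term that starts with `prefix`.

    In a lexicographically sorted list, all terms sharing a prefix are
    contiguous and begin at bisect_left(prefix).
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        matches.append(term)
    return matches


# Assumed index terms for "test docOne" after the standard analyzer.
index_terms = sorted(["test", "docone"])
print(expand_prefix(index_terms, "doc"))  # ['docone']
```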

But when I search again with a trailing "*":

GET test_index/_search
{
  "query": {
    "match_bool_prefix": {
      "title": {
        "query": "doc*"
      }
    }
  }
}

Document 1 does not match:

"hits" : {
  "total" : {
    "value" : 0,
    "relation" : "eq"
  },
  "max_score" : null,
  "hits" : [ ]
}

The Kibana Search Profiler shows that it does not use a prefix query, only a simple TermQuery for "doc".

If the "*" is already removed by the standard analyzer, then these two searches should produce the same query. Why does the second search not use a prefix query?
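Here is a Python sketch of this reasoning, under a loud assumption. `approx_standard_analyze` is a hypothetical, rough stand-in for the standard analyzer: it splits on word characters and lowercases. The real standard tokenizer follows Unicode UAX #29 word-break rules. With that stand-in, "doc" and "doc*" analyze to the same single term, so the documented expansion yields the same bool query for both.

```python
import re


def approx_standard_analyze(text: str) -> list[str]:
    # Rough approximation only: word-character runs, lowercased.
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def documented_bool_prefix(field: str, text: str) -> dict:
    # Term queries for all but the last term, prefix query for the last,
    # combined as should clauses (default OR operator assumed).
    terms = approx_standard_analyze(text)
    if not terms:
        raise ValueError("analyzer produced no terms")
    clauses = [{"term": {field: t}} for t in terms[:-1]]
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


print(approx_standard_analyze("doc*"))  # ['doc']
print(documented_bool_prefix("title", "doc") == documented_bool_prefix("title", "doc*"))  # True
```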

Situation 2

After tokenization, every token has a type (e.g. ALPHANUM, word). When I use a third-party tokenizer that generates a type that does not exist in Elasticsearch, match_bool_prefix does not use a prefix query for the last term if that term has the unknown type.
Third-party tokenizer: https://github.com/KennFalcon/elasticsearch-analysis-hanlp

I reproduced the problem as follows.

Create test_index2 with a title field that uses the third-party tokenizer:

PUT test_index2
{
  "mappings": {
    "properties" : {
      "title" : {
        "type" : "text",
        "analyzer" : "hanlp_analyzer"
      }
    }
  },
  "settings": {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "hanlp_analyzer" : {
            "tokenizer" : "hanlp_tokenizer"
          }
        },
        "tokenizer": {
          "hanlp_tokenizer" : {
            "type": "hanlp"
          }
        }
      }
    }
  }
}

Index a document:

PUT test_index2/_doc/1
{
  "title": "test docOne"
}

Analyzer output for "test docOne" and "doc":

GET test_index2/_analyze
{
  "text": "test docOne",
  "analyzer": "hanlp_analyzer"
}
{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "nx",
      "position" : 0
    },
    {
      "token" : "docOne",
      "start_offset" : 5,
      "end_offset" : 11,
      "type" : "nx",
      "position" : 1
    }
  ]
}

GET test_index2/_analyze
{
  "text": "doc",
  "analyzer": "hanlp_analyzer"
}
{
  "tokens" : [
    {
      "token" : "doc",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "nx",
      "position" : 0
    }
  ]
}

The type "nx" does not exist in Elasticsearch, so match_bool_prefix does not use a prefix query for the last term of the search "doc":

GET test_index2/_search
{
  "query": {
    "match_bool_prefix": {
      "title": {
        "query": "doc"
      }
    }
  }
}

"hits" : {
  "total" : {
    "value" : 0,
    "relation" : "eq"
  },
  "max_score" : null,
  "hits" : [ ]
}

The Kibana Search Profiler shows that it does not use a prefix query, only a simple TermQuery for "doc".
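The documented description refers only to the terms, not to their token types. The Python sketch below therefore derives the expected query from the `_analyze` response for "doc" shown above and never reads the `"type"` field. It is an illustration under that reading of the documentation, not a description of Elasticsearch internals. `expected_query` is a hypothetical helper.

```python
import json

# The _analyze response for "doc" with hanlp_analyzer, copied from above.
ANALYZE_RESPONSE = """
{
  "tokens": [
    {"token": "doc", "start_offset": 0, "end_offset": 3, "type": "nx", "position": 0}
  ]
}
"""


def expected_query(field: str, analyze_json: str) -> dict:
    tokens = json.loads(analyze_json)["tokens"]
    # Order terms by position; the token "type" is deliberately ignored.
    terms = [t["token"] for t in sorted(tokens, key=lambda t: t["position"])]
    if not terms:
        raise ValueError("analyzer produced no terms")
    clauses = [{"term": {field: t}} for t in terms[:-1]]
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


print(expected_query("title", ANALYZE_RESPONSE))
# {'bool': {'should': [{'prefix': {'title': 'doc'}}]}}
```

Sending this explicit bool query to test_index2 and comparing it with the match_bool_prefix search could help separate the analyzer's output from match_bool_prefix's own handling. That comparison has not been run here.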

My question is: is there some filtering condition inside match_bool_prefix? Some condition seems to prevent a prefix query from being used when the token type is unknown or when the input contains "*". I have tried to find such a condition in the source code, but have not found it so far.

The problem only reproduces with match_bool_prefix; with match_phrase_prefix, both cases work as expected. Thanks.

@arteam arteam added needs:triage Requires assignment of a team area label :Search Relevance/Analysis How text is split into tokens labels Mar 29, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Mar 29, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@arteam arteam added the >bug label Mar 29, 2022
@DJRickyB DJRickyB removed the needs:triage Requires assignment of a team area label label Apr 5, 2022
@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@javanna javanna added the priority:normal A label for assessing bug priority to be used by ES engineers label Jul 18, 2024
Projects
None yet
Development

No branches or pull requests

6 participants