[BUG] match_phrase_prefix query on an inner object text field not working #9203

Open
Ei5enheim opened this issue Aug 9, 2023 · 1 comment
Labels: bug, Search:Relevance, Search, v2.16.0, v3.0.0

Comments

@Ei5enheim

Describe the bug
I have a document with an array field that stores multiple objects, each with a text attribute. When I execute a match_phrase_prefix query on that text field, I get no hits if the array contains multiple objects and the matching object is not the first one. When the first inner object's text field matches, the document is returned as a hit.

To Reproduce

Create an index with the sample mapping below:
{
  "mappings": {
    "properties": {
      "platform_name": {
        "type": "text",
        "index_prefixes": { }
      },
      "applications": {
        "properties": {
          "name": { "type": "text", "index_prefixes": { } },
          "app_version": { "type": "keyword" },
          "publisher": { "type": "text" }
        }
      }
    }
  }
}

Sample document

{
	"platform_name": "Red Hat",
	"applications": [
		{	"name": "red18", "version": "v2", "publisher": "red brown ltd"},
		{	"name": "bind-libs", "version": "v1", "publisher": "red brown ltd"},
		{	"name": "python-libs", "version": "v1", "publisher": "red brown ltd"},
		{	"name": "python1-libs", "version": "v1", "publisher": "red brown ltd"},
		{	"name": "python2-libs", "version": "v1", "publisher": "red brown ltd"},
		{	"name": "python3-libs", "version": "v1", "publisher": "red brown ltd"},
		{	"name": "python4-libs", "version": "v1", "publisher": "red brown ltd"},
		{	"name": "python5-libs", "version": "v1", "publisher": "red brown ltd"},
		{	"name": "python6-libs", "version": "v1", "publisher": "red brown ltd"},
		{	"name": "python7-libs", "version": "v1", "publisher": "red brown ltd"}		
	]
}

Insert the above doc into an index.

Run the queries below:

match_phrase query on applications.name field.

{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "match_phrase": {
                  "applications.name": "bind-libs"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

match_phrase_prefix query on applications.name field.

{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "match_phrase_prefix": {
                  "applications.name": "bind-lib"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Expected behavior
Both the match_phrase and match_phrase_prefix queries should return the document as a hit. However, only the match_phrase query returns it.

Plugins
None

Screenshots
None

Host/Environment (please complete the following information):

  • OS: macOS
  • Version
    • opensearch: latest (2.7)

@gaobinlong
Contributor

The steps to reproduce this bug can be simplified as follows:

  1. Create an index with a text field and index_prefixes:
PUT test11
{
  "mappings": {
    "properties": {
      "applications": {
        "properties": {
          "name": {
            "type": "text",
            "index_prefixes":{}
          }
        }
      }
    }
  }
}
  2. Add a document with multiple values in that text field:
POST test11/_doc/1?refresh
{
	"applications.name": ["a", "b-12"]
}
  3. Execute a match_phrase_prefix query:
GET test11/_search
{
  "profile": "true",
  "query": {
    "match_phrase_prefix": {
      "applications.name": "b-12"
    }
  }
}

After that, the search returns nothing. With profile set to true, you can also see that the query is rewritten to a SpanNearQuery:

"type": "SpanNearQuery",
"description": "spanNear([applications.name:b, mask(applications.name._index_prefix:12) as applications.name], 0, true)"

However, a match_phrase query works fine, and a match_phrase_prefix query on a text field that holds a single value rather than multiple values also works, so the behavior is inconsistent.

After digging into it, I found that this bug is caused by the position_increment_gap parameter. It adds a fake gap between the values of a multi-valued text field to prevent most phrase queries from matching across values, and its default value is 100. See the ES 7.10 documentation for more details: position-increment-gap.

The problem is that when index_prefixes is set, a new sub-field applications.name._index_prefix is created to speed up match_phrase_prefix queries, but the gap between the values of that sub-field is missing. When bulkScorer.score(leafCollector, liveDocs) runs and then calls twoPhase.matches() in Lucene, that method returns false: nextPosition() for the term 12 in the field applications.name._index_prefix should return 102, but it returns NO_MORE_POSITIONS instead.
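To make the position mismatch concrete, here is a rough Python model of it. This is only a sketch, not Lucene or OpenSearch code: the function names are made up for illustration, the tokenizer is a crude regex stand-in for the standard analyzer, the 2 to 5 character edge n-gram range is assumed to be the index_prefixes default, and the prefix sub-field is assumed to be indexed with no gap, as described above.

```python
import re


def positions(values, gap):
    """Toy model: assign token positions across the values of a text field.

    Each later value starts `gap` positions after the last token of the
    previous value, which mimics position_increment_gap.
    """
    out = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in re.findall(r"[a-z0-9]+", value.lower()):
            pos += 1
            out.append((token, pos))
    return out


def edge_ngrams(token, min_chars=2, max_chars=5):
    """Prefixes of a token that the toy prefix sub-field stores (assumed defaults)."""
    return [token[:n] for n in range(min_chars, min(len(token), max_chars) + 1)]


def phrase_prefix_matches(values, query, main_gap, prefix_gap=0):
    """Toy ordered, slop-0 span match: leading terms come from the main field,
    the last term comes from the prefix sub-field at the next position."""
    main = set(positions(values, main_gap))
    prefix = {
        (gram, pos)
        for token, pos in positions(values, prefix_gap)
        for gram in edge_ngrams(token)
    }
    *head, last = re.findall(r"[a-z0-9]+", query.lower())
    starts = {pos for _, pos in main} | {pos for _, pos in prefix}
    return any(
        all((term, start + i) in main for i, term in enumerate(head))
        and (last, start + len(head)) in prefix
        for start in starts
    )


print(positions(["a", "b-12"], gap=100))  # [('a', 0), ('b', 101), ('12', 102)]
print(positions(["a", "b-12"], gap=0))    # [('a', 0), ('b', 1), ('12', 2)]
print(phrase_prefix_matches(["a", "b-12"], "b-12", main_gap=100))  # False
print(phrase_prefix_matches(["a", "b-12"], "b-12", main_gap=0))    # True
print(phrase_prefix_matches(["b-12"], "b-12", main_gap=100))       # True
```

In this model the main field puts b at position 101, so the span query looks for the prefix term 12 at position 102, while the gap-less sub-field has it at position 2. With a single value, or with the gap set to 0, both fields agree on positions and the query matches.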

So there is a workaround for this bug: set position_increment_gap to 0 for the text field applications.name, and the query works. But I think we should fix the bug itself.

PUT test11
{
  "mappings": {
    "properties": {
      "applications": {
        "properties": {
          "name": {
            "type": "text",
            "position_increment_gap": 0,
            "index_prefixes":{}
          }
        }
      }
    }
  }
}
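If several fields are affected, the same workaround could be applied to a mapping body programmatically before creating the index. The helper below is a hypothetical sketch (the function name is not part of any OpenSearch client) that only edits a plain Python dict shaped like the mapping above; sending it to the cluster is left out.

```python
import copy


def zero_gap_for_prefixed_text_fields(mapping):
    """Return a copy of `mapping` where every text field that sets
    index_prefixes also gets position_increment_gap = 0.

    Hypothetical helper: it overwrites any gap already set on those fields.
    """
    patched = copy.deepcopy(mapping)

    def visit(properties):
        for field in properties.values():
            if field.get("type") == "text" and "index_prefixes" in field:
                field["position_increment_gap"] = 0
            # Recurse into object sub-properties and multi-fields.
            visit(field.get("properties", {}))
            visit(field.get("fields", {}))

    visit(patched.get("mappings", {}).get("properties", {}))
    return patched


original = {
    "mappings": {
        "properties": {
            "applications": {
                "properties": {
                    "name": {"type": "text", "index_prefixes": {}},
                    "publisher": {"type": "text"},
                }
            }
        }
    }
}

patched = zero_gap_for_prefixed_text_fields(original)
print(patched["mappings"]["properties"]["applications"]["properties"]["name"])
```

Note that with a gap of 0, phrase queries on that field can match across adjacent values of the array, which is exactly what the default gap is meant to prevent.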

@reta added the v3.0.0 and v2.16.0 labels on Jun 19, 2024