Unexpected behaviour with search_as_you_type field indexed with multiple values #64394

Open
serkanozer opened this issue Oct 30, 2020 · 3 comments
Labels
>bug :Search/Suggesters "Did you mean" and suggestions as you type Team:Search Meta label for search team

Comments

@serkanozer

serkanozer commented Oct 30, 2020

Elasticsearch version (bin/elasticsearch --version): 7.9.3

Steps to reproduce:

curl -X PUT "localhost:9200/test_index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "search_as_you_type_field": {
        "type": "search_as_you_type"
      }
    }
  }
}
'

curl -X PUT "localhost:9200/test_index/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "search_as_you_type_field": ["owl", "quick brown fox dog"]
}
'

curl -X PUT "localhost:9200/test_index/_doc/2?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "search_as_you_type_field": ["quick brown fox dog", "owl"]
}
'

curl -X GET "localhost:9200/test_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase_prefix": {
      "search_as_you_type_field": {"query": "quick brown fox d"}
    }
  }
}
'
This returns the second document but not the first.

A match_phrase_prefix query on a search_as_you_type field doesn't behave as expected here. In the example above, the first document is indexed with ["owl", "quick brown fox dog"].
Querying q, qu, qui, .. quick b.. , quick brown f.. works, but quick brown fox d, quick brown fox do, and quick brown fox dog don't. However, all the possible prefix queries (for "quick brown fox dog") work for document 2.

I'm not sure whether this is expected behavior, but it seems pretty strange and it is not documented anywhere.

Provide logs (if relevant):

@serkanozer serkanozer added >bug needs:triage Requires assignment of a team area label labels Oct 30, 2020
@martijnvg martijnvg added the :Search/Suggesters "Did you mean" and suggestions as you type label Oct 30, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Suggesters)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 30, 2020
@martijnvg martijnvg removed the needs:triage Requires assignment of a team area label label Oct 30, 2020
@jimczi
Contributor

jimczi commented Oct 30, 2020

@romseygeek can you take a look?

@mushao999
Contributor

Here is my analysis:

1. Query Parsing

A match_phrase_prefix query on a search_as_you_type field is parsed into a SpanNearQuery with several sub-clauses. The last clause is a FieldMaskingSpanQuery masked as the _3gram field (it actually wraps a SpanTermQuery on the _index_prefix field); the remaining clauses are SpanTermQuery clauses on the _3gram field. For example, the query

{
  "query": {
    "match_phrase_prefix": {
      "search_as_you_type_field": {
        "query": "quick brown fox dog c"
      }
    }
  }
}

will be parsed into: SpanNearQuery(SpanTermQuery:_3gram:quick brown fox + SpanTermQuery:_3gram:brown fox dog + FieldMaskingSpanQuery(SpanTermQuery:_index_prefix:fox dog c))
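
To make the clause layout above easier to see, here is a rough Python sketch of how the query text is split into 3-term shingles, with the last shingle going to the _index_prefix subfield. It is only an illustration, not Elasticsearch's actual parsing code: span_clauses is a made-up helper, it assumes plain whitespace tokenization, and it only covers queries with at least three terms.

```python
def span_clauses(query, n=3, field="search_as_you_type_field"):
    """Return (subfield, term) pairs in the order of the span clauses.

    Hypothetical helper for illustration only; real analysis and query
    building in Elasticsearch are more involved.
    """
    tokens = query.split()  # assumption: whitespace tokenization only
    if len(tokens) < n:
        raise ValueError("this sketch only covers queries with at least n terms")
    # Sliding window of n consecutive tokens.
    shingles = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    # All shingles but the last are complete terms on the n-gram subfield.
    clauses = [(f"{field}._{n}gram", s) for s in shingles[:-1]]
    # The last shingle ends in a partial word, so it targets the prefix subfield.
    clauses.append((f"{field}._index_prefix", shingles[-1]))
    return clauses


for subfield, term in span_clauses("quick brown fox dog c"):
    print(subfield, "->", term)
```

The key point is that the final clause reads from a different subfield than the clauses before it.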

2. Query Execution

  • When executing this query, Lucene accumulates matchWidth across each pair of adjacent sub-clauses and checks that it does not exceed the allowed slop:
      org.apache.lucene.search.spans.NearSpansOrdered.java
    
      ...
      matchWidth += (spans.startPosition() - prevSpans.endPosition());
      ... 
      
      ...
      if (stretchToOrder() && matchWidth <= allowedSlop) {
        return atFirstInCurrentDoc = true;
      }
      ...
    
  • For the SpanTermQuery clauses on the _3gram field, both prevSpans.endPosition() and spans.startPosition() use positions from the _3gram field, which is correct.
  • For the SpanTermQuery on the _index_prefix field, however, prevSpans.endPosition() uses a position from the _3gram field while spans.startPosition() uses a position from the _index_prefix field. matchWidth will be wrong whenever positions in the _3gram field are inconsistent with those in the _index_prefix field.
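
The width check described above can be sketched in Python as follows. This is a simplified model of the two Java snippets, not the real NearSpansOrdered logic (it ignores stretchToOrder and iteration over candidate spans). The function names and the positions in the usage example are invented for illustration.

```python
from typing import List, Tuple

# (start_position, end_position); for a single term, end = start + 1.
Span = Tuple[int, int]


def match_width(spans: List[Span]) -> int:
    """Sum the gaps between each clause's start and the previous clause's end."""
    width = 0
    for prev, cur in zip(spans, spans[1:]):
        width += cur[0] - prev[1]
    return width


def is_match(spans: List[Span], allowed_slop: int = 0) -> bool:
    """A candidate matches only if the accumulated width fits within the slop."""
    return match_width(spans) <= allowed_slop


# Made-up positions: three clauses at consecutive positions.
consistent = [(0, 1), (1, 2), (2, 3)]
# Same clauses, but the last one is reported one position later,
# as happens when it is read from a subfield with shifted positions.
shifted = [(0, 1), (1, 2), (3, 4)]

print(match_width(consistent), is_match(consistent))
print(match_width(shifted), is_match(shifted))
```

With a slop of 0, a single position of drift in the last clause is enough to reject the document.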

3. Inconsistent positions

If the search_as_you_type field is given multiple values, such as

{
  "search_as_you_type_field": [
    "owl",
    "quick brown fox dog"
  ]
}

and the query is

{
  "query": {
    "match_phrase_prefix": {
      "search_as_you_type_field": {
        "query": "quick brown fox d"
      }
    }
  }
}
  • The SpanTermQuery on the _index_prefix field is search_as_you_type_field._index_prefix: brown fox d
  • The preceding SpanTermQuery is search_as_you_type_field._3gram: quick brown fox
  • Using the term vectors API, we can find the following info:
test_index/_doc/1/_termvectors?fields=search_as_you_type_field._index_prefix

"quick brown fox": {
    "term_freq": 1,
    "tokens": [{
        "position": 1,
        "start_offset": 4,
        "end_offset": 19
    }]
}
"brown fox d": {
    "term_freq": 1,
    "tokens": [{
        "position": 2,
        "start_offset": 10,
        "end_offset": 23
    }]
}
   
test_index/_doc/1/_termvectors?fields=search_as_you_type_field._3gram   
    
"quick brown fox": {
    "term_freq": 1,
    "tokens": [{
        "position": 0,
        "start_offset": 0,
        "end_offset": 15
    }]
}
  • quick brown fox has a different position in the _index_prefix field (1) than in the _3gram field (0)
  • So matchWidth = 2 - (0 + 1) = 1 > allowedSlop (0), and the document will not show up in the query hits
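
As a quick cross-check of the numbers above, here is a small Python sketch that compares the positions reported for the same term in the two term-vector excerpts and then redoes the width arithmetic. The dictionaries contain only the positions quoted above; the helper names are made up for this example.

```python
# Positions copied from the two term-vector excerpts for doc 1.
index_prefix_tv = {
    "quick brown fox": {"tokens": [{"position": 1}]},
    "brown fox d": {"tokens": [{"position": 2}]},
}
three_gram_tv = {
    "quick brown fox": {"tokens": [{"position": 0}]},
}


def positions(term_vector):
    """Map each term to the list of positions it occurs at."""
    return {term: [tok["position"] for tok in info["tokens"]]
            for term, info in term_vector.items()}


def position_mismatches(tv_a, tv_b):
    """Return terms present in both term vectors whose positions differ."""
    pos_a, pos_b = positions(tv_a), positions(tv_b)
    shared = pos_a.keys() & pos_b.keys()
    return {term: (pos_a[term], pos_b[term])
            for term in shared if pos_a[term] != pos_b[term]}


print(position_mismatches(index_prefix_tv, three_gram_tv))

# End of the previous clause comes from _3gram, start of the last clause
# comes from _index_prefix, which is where the two subfields get mixed.
prev_end = positions(three_gram_tv)["quick brown fox"][0] + 1
start = positions(index_prefix_tv)["brown fox d"][0]
width = start - prev_end
print("matchWidth:", width, "fits slop 0:", width <= 0)
```

Running the same comparison on the term vectors of doc 2 would be a cheap way to confirm that the positions line up there.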

Any good ideas on how to fix this problem?
