Completion only retrieves one result when multiple documents share same output #4255

Closed
danilomr opened this Issue Nov 26, 2013 · 11 comments


danilomr commented Nov 26, 2013

When I create multiple documents that have the same output value in the completion field, a completion suggest request retrieves only one of them.
I guess this is a feature; however, since we can set different payloads on those documents, it would make sense to retrieve multiple suggestions with the same output.

My environment settings:
ElasticSearch version number: 1.0.0.Beta1
Lucene version: 4.5.1

In the following example, I create an index with two documents. Each document has a different value in the payload, but the same output and input.
When I perform a completion suggest request, only one document is retrieved.

Scripts:

Create and populate index

curl -XDELETE 'localhost:9200/notebookindex'

curl -XPUT localhost:9200/notebookindex

curl -XPUT localhost:9200/notebookindex/friend/_mapping -d '{
  "friend" : {
        "properties" : {
            "name" : { "type" : "string" },
            "suggestField" : { "type" : "completion", "payloads" : true }
        }
    }
}'

curl -XPUT 'localhost:9200/notebookindex/friend/1' -d '{
  "name": "james smith",
  "suggestField": {
    "input": ["james", "smith", "james smith"],
    "output": "james smith",
    "payload": {"id": "1", "phone": "555-55555"}
  }
}'

curl -XPUT 'localhost:9200/notebookindex/friend/2' -d '{
  "name": "james smith",
  "suggestField": {
    "input": ["james", "smith", "james smith"],
    "output": "james smith",
    "payload": {"id": "2", "phone": "444-44444"}
  }
}'

Search 1: look for friends starting with j

curl -XPOST 'localhost:9200/notebookindex/_suggest?pretty' -d '{
  "my-friends-suggest": {
    "text": "j",
    "completion": {
      "field": "suggestField"
    }
  }
}'

Only one James is found, even though there are two documents matching the suggestion.

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "my-friends-suggest" : [ {
    "text" : "j",
    "offset" : 0,
    "length" : 1,
    "options" : [ {
      "text" : "james smith",
      "score" : 1.0, "payload" : {"id":"2","phone":"444-44444"}
    } ]
  } ]
}

Removal and Search 2: remove the James (id 2) previously found and look for friends starting with j

curl -XDELETE 'localhost:9200/notebookindex/friend/2'

curl -XPOST 'localhost:9200/notebookindex/_suggest?pretty' -d '{
  "my-friends-suggest": {
    "text": "j",
    "completion": {
      "field": "suggestField"
    }
  }
}'

The other James is found.

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "my-friends-suggest" : [ {
    "text" : "j",
    "offset" : 0,
    "length" : 1,
    "options" : [ {
      "text" : "james smith",
      "score" : 1.0, "payload" : {"id":"1","phone":"555-55555"}
    } ]
  } ]
}
@s1monw

Contributor

s1monw commented Nov 26, 2013

I feel bad closing these issues, but this is expected and is a feature rather than a bug. We de-duplicate under the hood and return only one suggestion when both the output and the score are the same. Thanks for writing this up.
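
The de-duplication described in this comment can be modelled with a small in-memory sketch. This is only an illustration, not Elasticsearch's implementation: the function name `dedupe_by_output_and_score` is hypothetical, and which of the colliding entries survives is an assumption of the sketch (here, the first one seen).

```python
def dedupe_by_output_and_score(candidates):
    """Keep one candidate per (text, score) pair; the payload is ignored.

    Hypothetical model of the behaviour reported in this issue.
    """
    seen = set()
    kept = []
    for candidate in candidates:
        key = (candidate["text"], candidate["score"])
        if key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept


# The two documents from the report: same output and score, different payloads.
candidates = [
    {"text": "james smith", "score": 1.0,
     "payload": {"id": "1", "phone": "555-55555"}},
    {"text": "james smith", "score": 1.0,
     "payload": {"id": "2", "phone": "444-44444"}},
]

options = dedupe_by_output_and_score(candidates)
print(len(options))  # 1
```

Because the payload is not part of the key, the second "james smith" is dropped even though it refers to a different document.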

@s1monw s1monw closed this Nov 26, 2013

@mfn

Contributor

mfn commented Nov 26, 2013

Is there any chance to revisit this behavior? Just because the output is the same doesn't mean it's the same "thing"; e.g. the payload could be different.

A workaround, I guess, would be to add "something" unique to the output and put the actual desired output in the payload.

thanks
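
A minimal sketch of the workaround suggested in this comment, assuming the document id is an acceptable way to make each output unique. The helper `build_friend_doc` and the `display` payload key are hypothetical names chosen for illustration; the field names `name` and `suggestField` come from the original report.

```python
import json


def build_friend_doc(doc_id, name, phone):
    """Build an index body whose completion output is unique per document."""
    return {
        "name": name,
        "suggestField": {
            # Same inputs as in the report: each word plus the full name.
            "input": name.split() + [name],
            # Assumption: appending the id makes the output unique, so
            # entries with equal names are no longer collapsed into one.
            "output": f"{name} #{doc_id}",
            # The text meant for the user travels in the payload instead.
            "payload": {"id": doc_id, "phone": phone, "display": name},
        },
    }


docs = [
    build_friend_doc("1", "james smith", "555-55555"),
    build_friend_doc("2", "james smith", "444-44444"),
]
for doc in docs:
    print(json.dumps(doc))
```

Each printed body could be sent with the same `curl -XPUT` calls used in the report. The client would then render `payload.display` rather than the suggestion text.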

spinscale added a commit that referenced this issue Dec 5, 2013

[DOCS] Completion suggest: Clarify de-duplication, optimize/merge
This contribution is based on the feedback given in issue #4254 and
issue #4255, and should clear things up, when suggestions are being
removed and not displayed anymore after deletion of data.

@facundoolano

facundoolano commented Jan 22, 2014

I think the de-duplication should be optional. There are times when you can't construct the differentiated output at index time and instead have to do it based on the user input.

@s1monw

Contributor

s1monw commented Jan 23, 2014

I took a look at this yesterday and I think I can make this work. There is still some work left, since I think I ran into a bug in Lucene that I wanted to fix ages ago but haven't yet. Not sure when I will be able to get this done, but there is hope!

@ghost ghost assigned s1monw Jan 23, 2014

@s1monw s1monw reopened this Jan 23, 2014

s1monw added a commit to s1monw/elasticsearch that referenced this issue Mar 12, 2014

@s1monw s1monw added the adoptme label Jul 4, 2014

@clintongormley clintongormley removed the adoptme label Jul 4, 2014

@clintongormley clintongormley assigned areek and unassigned s1monw Jul 11, 2014

@clintongormley

Member

clintongormley commented Jul 11, 2014

@areek assigning this to you

@petard

petard commented Aug 12, 2014

Is de-duplication now optional? The docs say "The result is de-duplicated if several documents have the same output, i.e. only one is returned as part of the suggest result. This is optional." but they don't actually say how to turn it off.

@areek

Contributor

areek commented Aug 12, 2014

No, de-dup is not yet optional. I can see how the docs can be confusing on this (will fix); it's the output that is optional.

There has been some progress related to this issue (check out #7133). Once that change is committed (should be very soon), identical outputs will be stored appropriately, which will make it easier to support optional de-duplication in the completion and context suggesters.

@smithatlanta

smithatlanta commented Aug 27, 2014

I'm running into the same issue. I was using this to do quick searches for movie titles and was scratching my head when we only received one document back for "Shaft" and "Titanic". I guess I'll have to go back to using match. It's great that you will have the de-dup option pretty soon.
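
As a rough sketch of the fallback mentioned in this comment, a search request body using a `match` query could be built as below. The field name `title` and the helper `build_title_match_query` are assumptions for illustration; the comment does not say how the movie index is mapped.

```python
import json


def build_title_match_query(title, size=10):
    """Build a search body with a `match` query on a hypothetical `title` field."""
    return {
        "size": size,
        "query": {"match": {"title": title}},
    }


body = build_title_match_query("Titanic")
print(json.dumps(body, indent=2))
```

A search returns one hit per matching document, so two films that share a title would both come back, at the cost of giving up the completion suggester.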

@gacarrillor

gacarrillor commented Sep 5, 2014

I'm another ES user eager to use the de-duplication option. As @mfn mentioned, the payload is sufficient to disambiguate suggestions, like here (place names with their corresponding place types):

[Image: completion_suggester_same_output]

@kgujral

kgujral commented Oct 15, 2014

+1
Please make this feature optional and differentiate on the basis of both output and payload.
Thanks
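
The request above, differentiating on both output and payload, can be illustrated with an in-memory sketch. This is a model of the proposed behaviour only, not an Elasticsearch option; `dedupe_by_output_and_payload` is a hypothetical name.

```python
import json


def dedupe_by_output_and_payload(candidates):
    """Drop a candidate only if both its text and its payload were seen before."""
    seen = set()
    kept = []
    for candidate in candidates:
        # Serialize with sorted keys so equal payloads produce equal keys.
        payload_key = json.dumps(candidate.get("payload"), sort_keys=True)
        key = (candidate["text"], payload_key)
        if key not in seen:
            seen.add(key)
            kept.append(candidate)
    return kept


candidates = [
    {"text": "james smith", "payload": {"id": "1", "phone": "555-55555"}},
    {"text": "james smith", "payload": {"id": "2", "phone": "444-44444"}},
    # Exact duplicate of the first entry: same text and same payload.
    {"text": "james smith", "payload": {"phone": "555-55555", "id": "1"}},
]

options = dedupe_by_output_and_payload(candidates)
print([option["payload"]["id"] for option in options])  # ['1', '2']
```

With the payload in the key, both "james smith" entries survive and only the true duplicate is removed.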

@areek

Contributor

areek commented Dec 11, 2014

closing in favour of #8909

@areek areek closed this Dec 11, 2014

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

[DOCS] Completion suggest: Clarify de-duplication, optimize/merge
This contribution is based on the feedback given in issue elastic#4254 and
issue elastic#4255, and should clear things up, when suggestions are being
removed and not displayed anymore after deletion of data.