Deleted documents show up in completion suggester #117

Closed
johnywith1n opened this Issue Jul 7, 2014 · 17 comments

johnywith1n commented Jul 7, 2014

I have the following steps:

  1. create an index
  2. index 2 documents
  3. get suggestions
  4. delete a document
  5. get suggestions

When I do this with a set of curl commands, the 5th step doesn't return any suggestions as expected since I deleted the document corresponding to it. But when I do these steps with the library, a suggestion is returned for the deleted document.

Here's the project with the script and curl commands: https://github.com/johnywith1n/elasticsearch-problem

Member

spalger commented Jul 7, 2014

Hey @johnywith1n

Here are a few things that might help you:

  1. there is no reason to use q.promisifyAll. We will return a promise if you don't send a callback.
  2. you can pass refresh: true to most commands that modify documents (including delete and create).
  3. I'm not sure that sending index: '' is doing what you want. Try my other suggestions and let me know if you are still experiencing the issue.
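
A minimal sketch of points 1 and 2, assuming an elasticsearch.js client whose `delete` method accepts a `refresh` parameter as described above. The helper name `deleteWithRefresh` and the `test` index and type names are placeholders, not taken from the original script.

```javascript
// Hypothetical helper: delete one document and ask Elasticsearch to refresh
// as part of the same request, so the delete is visible to the next search.
// `client` is assumed to be an elasticsearch.js client instance.
function deleteWithRefresh(client, id) {
  // No callback is passed, so the client returns a promise (point 1).
  return client.delete({
    index: 'test', // placeholder index name
    type: 'test',  // placeholder type name
    id: id,
    refresh: true  // point 2: refresh as part of this request
  });
}
```

Calling `deleteWithRefresh(client, '1').then(...)` would then continue only after the delete has been acknowledged.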

johnywith1n commented Jul 7, 2014

I'm using promisifyAll since elasticsearch.js is on bluebird 1.x and not 2.x.

Adding refresh: true worked. But I'm curious, what's the difference between passing in refresh to the create and delete commands and client.indices.refresh?

Also, isn't setting refresh:true to each operation bad for performance? I thought ES automatically refreshes, but these deleted documents still show up in the suggestions after a day.

Member

spalger commented Jul 7, 2014

Yes, setting refresh:true for every operation would result in bad performance, but I was under the impression you were just trying to get this script working. Seeing deleted documents show up in suggestions after days implies that something else is happening entirely. (automatic refreshes happen every second by default)
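
On the performance point, one possible pattern is to issue a batch of deletes without `refresh` and then refresh the index once with `client.indices.refresh`. The sketch below assumes the elasticsearch.js client returns promises when no callback is given; `deleteManyThenRefresh` is a hypothetical helper name. As the rest of the thread shows, a refresh alone may not be enough for the completion suggester.

```javascript
// Hypothetical helper: delete several documents, then refresh the index once.
function deleteManyThenRefresh(client, index, type, ids) {
  // Start every delete without the per-request refresh flag.
  const deletes = ids.map(function (id) {
    return client.delete({ index: index, type: type, id: id });
  });
  return Promise.all(deletes).then(function () {
    // One explicit refresh for the whole batch instead of one per delete.
    return client.indices.refresh({ index: index });
  });
}
```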

I tried running your curl script and got:

{"acknowledged":true}
{"_index":"test","_type":"test","_id":"1","_version":1,"created":true}
{"_index":"test","_type":"test","_id":"2","_version":1,"created":true}
{"_shards":{"total":5,"successful":5,"failed":0}}
{"found":true,"_index":"test","_type":"test","_id":"1","_version":2}
{"_shards":{"total":5,"successful":5,"failed":0}}

I also get very similar output when I run the same commands in Marvel.

As for the promises bit: try passing defer: q.defer to the client constructor and it will use your version of bluebird.
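
As a rough illustration of the `defer` option: the client is expected to call the supplied function whenever it needs a deferred object exposing `promise`, `resolve` and `reject`. The sketch below builds such an object from the native `Promise` instead of bluebird; `createDefer` is a hypothetical name, and the commented-out constructor call assumes the `elasticsearch` npm package and a local node.

```javascript
// Returns a deferred object: a promise plus the functions that settle it.
function createDefer() {
  let resolve;
  let reject;
  const promise = new Promise(function (res, rej) {
    resolve = res;
    reject = rej;
  });
  return { promise: promise, resolve: resolve, reject: reject };
}

// Assumed usage (not executed here):
// const elasticsearch = require('elasticsearch');
// const client = new elasticsearch.Client({ host: 'localhost:9200', defer: createDefer });
```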

spalger closed this Jul 7, 2014

Member

spalger commented Jul 7, 2014

This is likely not an issue with the client itself, but I didn't mean to close the ticket.

spalger reopened this Jul 7, 2014


johnywith1n commented Jul 7, 2014

Yea, I think you're right about it not being the client.

I tried refreshing the index, but that didn't update the suggester. It looks like the completion suggester isn't updated with regard to deleted documents until a merge occurs. But a merge doesn't occur until there are enough deleted documents. So in the meantime, deleted documents will still be suggested.

Is the completion suggester supposed to only be updated by a merge and not just a refresh? I may have to switch to just a regular search if that's the case.

Member

spalger commented Jul 7, 2014

Found this in the docs: "The suggest data structure might not reflect deletes on documents immediately. You may need to do an Optimize for that. You can call optimize with the only_expunge_deletes=true to only cater for deletes or alternatively call a Merge operation."

Right at the bottom of this section
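
A minimal sketch of that optimize call from elasticsearch.js, assuming a client configured for an API version that still exposes `indices.optimize` and that spells the `only_expunge_deletes` parameter in camelCase. `expungeDeletes` is a hypothetical helper name.

```javascript
// Hypothetical helper: ask Elasticsearch to merge away segments' deleted
// documents so the suggest data structure can drop them as well.
function expungeDeletes(client, index) {
  return client.indices.optimize({
    index: index,
    onlyExpungeDeletes: true // assumed camelCase form of only_expunge_deletes
  });
}
```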

spalger closed this Jul 7, 2014

Contributor

missinglink commented Aug 28, 2014

FWIW I have the same issue and running _optimize doesn't fix the issue. This is clearly not a client bug.

I use the suggester a load and it only happens once in a blue moon. In this case I was running an import on 8 cores maxing out the CPU and ES got a bit confused; now those suggest records are stuck in there forever even after their documents have been deleted.


svola commented Jan 28, 2015

For me, calling optimize after deleting documents worked.
The completion suggester was updated then and only then.


ebuildy commented Feb 16, 2015

I am running into the same weird behavior: documents are deleted but still show up in the completion suggester. I tried _optimize, cache/_clear, etc., and they are still present.

How can I debug this? (I mean for instance how can I get the document ID the suggest search gives...)

Thanks,


svola commented Feb 17, 2015

You can add the id of the document to the payload-section at index-time:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html#indexing

For me it worked after calling optimize.... Good luck!
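
A sketch of the payload idea, assuming a completion field named `suggest` whose mapping has payloads enabled, and a suggest response keyed by the suggestion name with an `options` array per entry. `buildSuggestDoc` and `suggestedIds` are hypothetical helper names.

```javascript
// Build a document body whose completion field carries the document's own id
// as a payload, so each suggestion can be traced back to its source document.
function buildSuggestDoc(id, name) {
  return {
    name: name,
    suggest: {            // assumed name of the completion field
      input: [name],
      payload: { id: id } // returned alongside each suggestion option
    }
  };
}

// Collect the payload ids from a suggest response for one named suggestion.
function suggestedIds(response, suggestName) {
  const entries = response[suggestName] || [];
  const ids = [];
  entries.forEach(function (entry) {
    (entry.options || []).forEach(function (option) {
      if (option.payload) {
        ids.push(option.payload.id);
      }
    });
  });
  return ids;
}
```

Comparing the ids returned by `suggestedIds` with the ids that still exist in the index would show which suggestions belong to deleted documents.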


ebuildy commented Feb 18, 2015

I have a ton of documents; it works sometimes (even without optimize) but sometimes not. I believe the suggester uses an FST engine, so maybe orphans are created after a delete operation. I will investigate how ES deals with this and with replication.


micpalmia commented Feb 19, 2015

I'm having the same issue. Has anybody investigated and/or opened a ticket on the ES GitHub already?


mcanzerini commented Mar 24, 2015

I'm having the same issue too. Any suggestions?

Contributor

missinglink commented Mar 24, 2015

It's a known issue [1] but will likely not be fixed until the es@2.0 release [2].

It's a problem related to the FST data structures and certainly not a bug in elasticsearch-js.

[1] elastic/elasticsearch#7761
[2] elastic/elasticsearch#8909


hgw2101 commented Mar 2, 2016

Does calling optimize every time after deleting a document cause any performance concerns?

Member

spalger commented Mar 3, 2016

@hgw2101 yes, it does.


Kamapcuc commented Oct 25, 2016

The same bug occurs when you change the suggest context of a document. 😞
