Scanner Deletion Detection #175

Open
nsoft opened this issue Feb 25, 2023 · 0 comments

nsoft commented Feb 25, 2023

At the moment none of our scanners have the ability to detect if a previously indexed document has disappeared. IIRC the old version of File Scanner that was based on directory watches did have this, but it had to be scrapped due to issues with the JDK implementation (see #130).

Initial thoughts:

  1. Build up a data structure to hold the set of IDs seen during a scan (pick an efficient one).
  2. At the end of the scan, identify any previously known IDs that were not seen during this scan, and if the last status for that doc is not "delete" or "error", send a delete.
  3. Serialize & persist this structure at the end of each scan.
  4. When starting up a scanner, check for and load the serialized structure.
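
The four steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: `SeenIdTracker`, its methods, the one-ID-per-line text file, and the `lastStatus` map of plain strings are all hypothetical, and `HashSet` is a placeholder for whatever more efficient structure ends up being chosen.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not an existing class): tracks document IDs across scans. */
class SeenIdTracker {
    private final Path store;                             // persisted ID set (step 3)
    private final Set<String> previous;                   // IDs recorded by the prior scan
    private final Set<String> current = new HashSet<>();  // IDs seen in this scan (step 1)

    SeenIdTracker(Path store) {
        this.store = store;
        this.previous = load(store);                      // step 4: load on startup
    }

    private static Set<String> load(Path store) {
        if (!Files.exists(store)) {
            return new HashSet<>();                       // first run: nothing known yet
        }
        try {
            // Assumes one ID per line, so IDs must not contain line breaks.
            return new HashSet<>(Files.readAllLines(store, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 1: record an ID encountered during the scan. */
    void seen(String id) {
        current.add(id);
    }

    /**
     * Step 2: returns IDs known from the prior scan but absent from this one,
     * skipping any whose last status is "delete" or "error".
     * Step 3: persists the IDs seen in this scan.
     */
    Set<String> finishScan(Map<String, String> lastStatus) {
        Set<String> toDelete = new TreeSet<>(previous);
        toDelete.removeAll(current);
        toDelete.removeIf(id -> {
            String status = lastStatus.get(id);
            return "delete".equals(status) || "error".equals(status);
        });
        try {
            Files.write(store, new TreeSet<>(current), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The scan just finished becomes the baseline for the next one.
        previous.clear();
        previous.addAll(current);
        current.clear();
        return toDelete;
    }
}
```

A scanner would call `seen(id)` for each document it encounters and then send a delete for each ID returned by `finishScan(...)`. Note that this sketch forgets any ID not seen in the latest scan, including ones skipped because of an "error" status; whether those should be retained is an open design question.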

Also, make sure processor-related documentation/javadocs clearly mention that a document may represent a deletion (processors will often want to ignore these documents), and make sure each of our provided implementations has an option controlling whether it ignores deletes.
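
One possible shape for such an option, as a sketch only: `Doc`, `DeleteAwareProcessor`, and the `ignoreDeletes` flag are hypothetical names invented for illustration and do not correspond to the project's actual document or processor types.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for a document flowing through processors. */
final class Doc {
    final String id;
    final boolean deletion;   // true when this document represents a delete

    Doc(String id, boolean deletion) {
        this.id = id;
        this.deletion = deletion;
    }
}

/** Hypothetical wrapper that optionally skips work for deletion documents. */
class DeleteAwareProcessor {
    private final Consumer<Doc> delegate;   // the real processing logic
    private final boolean ignoreDeletes;    // the configurable option

    DeleteAwareProcessor(Consumer<Doc> delegate, boolean ignoreDeletes) {
        this.delegate = delegate;
        this.ignoreDeletes = ignoreDeletes;
    }

    /** Always returns the document so later steps can still act on the delete. */
    Doc process(Doc doc) {
        if (doc.deletion && ignoreDeletes) {
            return doc;                     // pass through untouched
        }
        delegate.accept(doc);
        return doc;
    }
}
```

Passing the deletion through rather than dropping it matters here: a processor that ignores deletes should still let the document reach whatever step finally issues the delete.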

@nsoft nsoft added this to the 1.1 milestone Feb 25, 2023