tasks: add margin time window for search to purge deleted documents #244

Merged
merged 1 commit into from
Aug 25, 2023

@zzacharo zzacharo commented Aug 24, 2023

Soft-deleted records are kept in OpenSearch for 60 seconds by default, in order to control concurrent operations. In our case, if a soft-deleted draft is cleaned up before OpenSearch has finally purged it (i.e. it was soft-deleted less than that time window ago), editing a new record results in a version conflict.

This PR adds a new kwarg that deducts the default deletion time window. It can be passed as a parameter if the corresponding setting has been configured with a different value.
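
The idea can be sketched as follows. This is a minimal illustration and not the code merged in this PR: the names `Draft`, `cleanup_cutoff` and `drafts_to_hard_delete`, as well as the in-memory list of drafts, are hypothetical. Only the `search_gc_deletes` kwarg and its 60-second default come from the PR.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Draft(NamedTuple):
    """Hypothetical stand-in for a soft-deleted draft."""

    id: str
    deleted_at: datetime  # when the draft was soft-deleted


def cleanup_cutoff(now, td, search_gc_deletes=60):
    """Return the latest soft-deletion time that is safe to hard-delete.

    The search cluster may still hold a deleted document for
    `search_gc_deletes` seconds, so that margin is added on top of `td`.
    """
    return now - td - timedelta(seconds=search_gc_deletes)


def drafts_to_hard_delete(drafts, now, td, search_gc_deletes=60):
    """Select only the drafts that were soft-deleted before the cutoff."""
    cutoff = cleanup_cutoff(now, td, search_gc_deletes)
    return [d for d in drafts if d.deleted_at < cutoff]


now = datetime(2023, 8, 24, 12, 0, 0)
drafts = [
    Draft("old", now - timedelta(minutes=10)),
    Draft("recent", now - timedelta(seconds=30)),
]
# Even with a very short `td`, the draft soft-deleted 30 seconds ago is
# skipped because it is still inside the search margin window.
selected = drafts_to_hard_delete(drafts, now, td=timedelta(seconds=0))
```

With `td` set to zero, only the draft soft-deleted ten minutes ago is selected. The more recent one stays until the margin has elapsed.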

Comment on lines 239 to 241
:param int search_gc_deletes: default time search is keeping deleted documents for
control of concurrent operations. For more information see:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html#delete-versioning
Suggested change
:param int search_gc_deletes: time in seconds, corresponding to the search cluster
setting `index.gc_deletes` (see
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html#delete-versioning),
defaults to 60 seconds. The search cluster caches deleted documents for `index.gc_deletes` seconds.
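
If `index.gc_deletes` has been configured with a different value, that value is typically written as a time value such as `60s` or `5m`, while `search_gc_deletes` expects whole seconds. A small conversion helper could look like the sketch below. The helper name `gc_deletes_to_seconds` is hypothetical and not part of this PR, and it assumes the setting is given as an integer followed by one of the standard time units.

```python
import re

# Time units used by Elasticsearch/OpenSearch time values, in nanoseconds.
# Integers are used throughout to avoid floating-point rounding.
_UNIT_NANOS = {
    "d": 86_400 * 10**9,
    "h": 3_600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "micros": 10**3,
    "nanos": 1,
}


def gc_deletes_to_seconds(value):
    """Convert a time value such as '60s' or '5m' to whole seconds.

    Sub-second remainders are rounded up, so the returned margin is
    never shorter than the configured window.
    """
    match = re.fullmatch(r"(\d+)(d|h|m|s|ms|micros|nanos)", value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    total_nanos = int(amount) * _UNIT_NANOS[unit]
    # Ceiling division on integers.
    return -(-total_nanos // 10**9)
```

The result can then be passed as `search_gc_deletes`, for example `gc_deletes_to_seconds("5m")` for a cluster configured with a five-minute window.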

Comment on lines 236 to 239
"""Clean up (hard delete) all the soft deleted drafts.

The drafts in the last timedelta span of time won't be deleted.
:param int search_gc_deletes: default time search is keeping deleted documents for

Suggested change
"""Clean up (hard delete) all the soft deleted drafts.
The soft-deleted drafts in the last timedelta span of time won't be deleted, including
`search_gc_deletes` seconds timedelta: this ensures that only drafts fully removed
from the search cluster can be hard-deleted (e.g. when `td` is very short), avoiding
search conflicts.
:param int search_gc_deletes: default time search is keeping deleted documents for

@zzacharo zzacharo merged commit 748aae1 into inveniosoftware:master Aug 25, 2023
6 checks passed