[Transform] add the ability to delete documents from the destination index #67916
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue on Feb 10, 2021:
elastic#67832) add a retention policy to transform to delete data that is considered outdated as part of a transform checkpoint. fixes elastic#67916
hendrikmuhs pushed a commit that referenced this issue on Feb 11, 2021.
Transform provides a persistent view of data, either by pivoting it or by providing the latest state. In "continuous mode", this view is kept up to date.
However, transform only keeps adding new data, while you might want to age out old data or remove data from the persistent view based on other criteria. This functionality is missing especially for `latest`: with `latest` you might want to delete entities that haven't been seen for a longer period. For example, if you transform host information, you might want to remove decommissioned hosts.
Overall integration
Retention will be part of the overall transform configuration.
Therefore, `retention_policy` will be available for both `pivot` and `latest`. Nesting the policy at an extra level gives us an extension point for later. The first `retention_policy` to be implemented is `time`.
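To illustrate the nesting, here is a sketch of a transform configuration written as a Python dict that mirrors the JSON request body. The index names, field names and values are hypothetical, and the `time.field` / `time.max_age` keys follow the description in this issue rather than a confirmed API. The helper `retention_policy_type` is likewise a hypothetical name, showing how the extra nesting level leaves room for other policy types later.

```python
import json
from typing import Optional

# Hypothetical transform configuration mirroring a JSON request body.
# Index and field names are illustrative only.
transform_config = {
    "source": {"index": "hosts-*"},
    "dest": {"index": "hosts-latest"},
    "latest": {"unique_key": ["host.name"], "sort": "@timestamp"},
    "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
    # retention_policy sits next to pivot/latest, so it applies to either
    # transform type; the extra "time" level is the extension point.
    "retention_policy": {
        "time": {
            "field": "@timestamp",  # likely the same field as used for sync
            "max_age": "30d",       # documents older than this are outdated
        }
    },
}


def retention_policy_type(config: dict) -> Optional[str]:
    """Return the name of the configured policy type, or None if absent."""
    policy = config.get("retention_policy")
    if not policy:
        return None
    # Exactly one policy type is expected under retention_policy.
    return next(iter(policy))


print(json.dumps(transform_config["retention_policy"], indent=2))
```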
Time based retention
This policy requires you to configure a timestamp field (likely the same field as used for `sync`) and a `max_age`. Data that is older than `max_age` is considered outdated and will be removed as part of checkpointing.
Retention integration into checkpoints
Retention will be implemented as the final step of checkpointing: when a checkpoint completes, data that should be deleted according to the policy is removed. Retention is calculated based on the checkpoint time.
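Put together, the time-based rule means the cutoff is the checkpoint time minus `max_age`, and documents whose timestamp is strictly older than the cutoff are outdated. This is a minimal Python sketch under assumptions: `max_age` is parsed only for a hypothetical subset of units (`s`, `m`, `h`, `d`), and the in-memory filter stands in for whatever delete request the transform would actually issue against the destination index.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical subset of supported time units for max_age.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_max_age(max_age: str) -> timedelta:
    """Parse strings like '30d' or '12h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", max_age)
    if match is None:
        raise ValueError(f"unsupported max_age: {max_age!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def retention_cutoff(checkpoint_time: datetime, max_age: str) -> datetime:
    # Retention is relative to the checkpoint time, not to "now".
    return checkpoint_time - parse_max_age(max_age)


def outdated(docs, field, cutoff):
    # Stand-in for a delete against the destination index:
    # select documents strictly older than the cutoff.
    return [doc for doc in docs if doc[field] < cutoff]


checkpoint_time = datetime(2021, 2, 10, 12, 0, tzinfo=timezone.utc)
cutoff = retention_cutoff(checkpoint_time, "30d")  # 2021-01-11 12:00 UTC
docs = [
    {"host": "a", "@timestamp": datetime(2021, 1, 1, tzinfo=timezone.utc)},
    {"host": "b", "@timestamp": datetime(2021, 2, 9, tzinfo=timezone.utc)},
]
to_delete = outdated(docs, "@timestamp", cutoff)  # only host "a"
```

Anchoring the cutoff to the checkpoint time rather than the wall clock keeps the result deterministic for a given checkpoint, regardless of how long the earlier checkpoint phases took.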
Retention policy updating
Updating the retention policy is supported by `_update`. If `_update` is called on a running transform, the update takes effect when the next checkpoint starts; the checkpoint that is currently running keeps using the current policy.
FYI: @elastic/ml-ui it would be good to support the retention policy in the update fly-out.
Retention policy stats
To measure the retention policy, we add two counters to `_stats`:
- `documents_deleted`: total number of documents deleted from the transform destination index by this transform.
- `delete_time_in_ms`: cumulative time spent deleting documents from the transform destination index by this transform.
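Both counters are cumulative over the transform's lifetime, so each retention run adds to them. This is a minimal Python sketch of that accounting; `RetentionStats`, `run_retention` and the `delete_fn` callback are hypothetical stand-ins for the real delete call and stats object.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetentionStats:
    documents_deleted: int = 0
    delete_time_in_ms: int = 0

    def record(self, deleted: int, elapsed_ms: int) -> None:
        # Accumulate across runs rather than overwriting.
        self.documents_deleted += deleted
        self.delete_time_in_ms += elapsed_ms


def run_retention(stats: RetentionStats, delete_fn: Callable[[], int]) -> int:
    """Run a (hypothetical) delete and record its count and duration."""
    start = time.monotonic()
    deleted = delete_fn()  # returns number of documents deleted
    elapsed_ms = int((time.monotonic() - start) * 1000)
    stats.record(deleted, elapsed_ms)
    return deleted
```

Using a monotonic clock avoids negative durations if the system clock is adjusted during a delete.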