[Transform] add the ability to delete documents from the destination index #67916

Closed
hendrikmuhs opened this issue Jan 25, 2021 · 2 comments · Fixed by #67832

Comments

@hendrikmuhs
Contributor

hendrikmuhs commented Jan 25, 2021

Transform provides a persistent view on data by pivoting it or by providing the latest state. In "continuous mode" this view is updated and kept up to date.

However, transform only ever adds data. You might want to age out old data or remove data from the persistent view based on other criteria. The gap is most visible for latest, where you might want to delete entities that have not been seen for a long time. For example, if you transform host information, you might want to remove decommissioned hosts.

Overall integration

Retention will be part of the overall transform configuration:

{
    "source": { ... },
    "dest": { ... },

    "pivot": { ... },
OR
    "latest": { ... },

    "retention_policy": {
        "name": { ... }
    }
}

Therefore, retention_policy will be available for both pivot and latest.

Nesting the policy one level down, under its type name, gives us an extension point for later. The first retention_policy to be implemented is time:

Time based retention

    "retention_policy": {
        "time": {
            "field": "@timestamp",
            "max_age": "30d"
        }
    }

This policy requires you to configure a timestamp field (likely the same field as used for sync) and a max_age. Data that is older than max_age is considered outdated and will be removed as part of checkpointing.
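As an illustration of how the pieces fit together, here is a minimal Python sketch that assembles a config body with a time-based retention policy. The helper name build_transform_config, the index names, and the latest settings are made up for the example; only the top-level keys and the retention_policy shape come from the proposal above.

```python
import json

def build_transform_config(source, dest, *, pivot=None, latest=None, retention_policy=None):
    """Assemble a transform config body with an optional retention policy."""
    # A transform uses either pivot or latest, never both.
    if (pivot is None) == (latest is None):
        raise ValueError("specify exactly one of 'pivot' or 'latest'")
    config = {"source": source, "dest": dest}
    if pivot is not None:
        config["pivot"] = pivot
    else:
        config["latest"] = latest
    if retention_policy is not None:
        # The policy is nested under its type name ("time" here).
        config["retention_policy"] = retention_policy
    return config

# Index names and the latest settings are hypothetical, for illustration only.
config = build_transform_config(
    source={"index": "host-metrics"},
    dest={"index": "host-latest"},
    latest={"unique_key": ["host.name"], "sort": "@timestamp"},
    retention_policy={"time": {"field": "@timestamp", "max_age": "30d"}},
)
print(json.dumps(config, indent=2))
```

Because retention_policy sits next to pivot/latest rather than inside them, the same policy object works for either kind of transform.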

Retention integration into checkpoints

Retention will be implemented as the last step of checkpointing. When a checkpoint completes, data that the policy marks as outdated is deleted from the destination index. Retention is calculated based on the checkpoint time.
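To make the cutoff calculation concrete, the following Python sketch derives the deletion boundary from the checkpoint time and max_age and expresses it as a range query. This is an assumption about how the rule could be expressed, not the actual implementation: the helpers parse_max_age and retention_query are hypothetical, only a few time units are handled, and "older than" is read as strictly less than the cutoff.

```python
import re
from datetime import datetime, timedelta, timezone

# Subset of time units handled by this sketch.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}

def parse_max_age(value: str) -> timedelta:
    """Parse a time value such as '30d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", value)
    if not match:
        raise ValueError(f"unsupported max_age: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})

def retention_query(field: str, max_age: str, checkpoint_time: datetime) -> dict:
    """Build a range query matching documents older than max_age at checkpoint time."""
    cutoff = checkpoint_time - parse_max_age(max_age)
    cutoff_ms = int(cutoff.timestamp() * 1000)
    # Documents with a timestamp strictly before the cutoff count as outdated.
    return {"range": {field: {"lt": cutoff_ms, "format": "epoch_millis"}}}

checkpoint = datetime(2021, 1, 31, tzinfo=timezone.utc)
query = retention_query("@timestamp", "30d", checkpoint)
print(query)
```

Using the checkpoint time rather than the wall clock means the cutoff does not drift while a checkpoint is running.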

Retention policy updating

Updating the retention policy is supported by _update. If _update is called on a running transform, the update takes effect when the next checkpoint starts. The checkpoint that is currently running keeps using the existing policy.
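The update semantics can be modelled roughly as follows. This is a toy Python model, not the transform code: the class TransformSketch and its methods are invented for the example, and the only behaviour it tries to capture is that a policy change is picked up at the next checkpoint boundary.

```python
class TransformSketch:
    """Toy model: a policy update only becomes active at the next checkpoint."""

    def __init__(self, retention_policy):
        self._pending_policy = retention_policy  # latest policy set via update
        self._active_policy = retention_policy   # policy of the running checkpoint

    def update(self, retention_policy):
        # Stored immediately, but not applied to the checkpoint in progress.
        self._pending_policy = retention_policy

    def start_checkpoint(self):
        # A new checkpoint picks up whatever policy is pending at this point.
        self._active_policy = self._pending_policy
        return self._active_policy

    @property
    def active_policy(self):
        return self._active_policy


transform = TransformSketch({"time": {"field": "@timestamp", "max_age": "30d"}})
transform.start_checkpoint()

# Update arrives while the checkpoint is still running.
transform.update({"time": {"field": "@timestamp", "max_age": "7d"}})
print(transform.active_policy["time"]["max_age"])       # still the old policy

print(transform.start_checkpoint()["time"]["max_age"])  # new policy from here on
```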

FYI: @elastic/ml-ui it would be good to support retention policy in the update fly-out

Retention policy stats

To measure the effect of the retention policy, we add two counters to _stats:

documents_deleted

Total number of documents deleted in the transform destination index by this transform.

delete_time_in_ms

Cumulative sum of time spent deleting documents in the transform destination index by this transform.
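A small Python sketch of how the two counters could accumulate across checkpoints. The field names match the proposed stats; the RetentionStats class, the timed_delete helper, and the delete callback are hypothetical stand-ins for the real delete step.

```python
import time
from dataclasses import dataclass

@dataclass
class RetentionStats:
    documents_deleted: int = 0   # total documents removed from the destination index
    delete_time_in_ms: int = 0   # cumulative time spent deleting

    def record(self, deleted: int, elapsed_ms: int) -> None:
        # Both counters only ever grow; they are sums over all checkpoints.
        self.documents_deleted += deleted
        self.delete_time_in_ms += elapsed_ms

def timed_delete(stats: RetentionStats, delete_fn) -> int:
    """Run delete_fn (returns the number of deleted documents) and update stats."""
    start = time.monotonic()
    deleted = delete_fn()
    elapsed_ms = int((time.monotonic() - start) * 1000)
    stats.record(deleted, elapsed_ms)
    return deleted

stats = RetentionStats()
timed_delete(stats, lambda: 120)  # pretend the first checkpoint removed 120 documents
timed_delete(stats, lambda: 30)   # and the next one removed 30
print(stats.documents_deleted)
```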

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@hendrikmuhs
Contributor Author

FYI @pzl

hendrikmuhs pushed a commit that referenced this issue Feb 8, 2021
#67832)

add a retention policy to transform to delete data that is considered outdated as part of a
transform checkpoint.

fixes #67916
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Feb 10, 2021
elastic#67832)

add a retention policy to transform to delete data that is considered outdated as part of a
transform checkpoint.

fixes elastic#67916
hendrikmuhs pushed a commit that referenced this issue Feb 11, 2021
…nsform (#67832) (#68814)

add a retention policy to transform to delete data that is considered outdated as part of a
transform checkpoint.

backport #67832
fixes #67916
easyice pushed a commit to easyice/elasticsearch that referenced this issue Mar 25, 2021
elastic#67832)

add a retention policy to transform to delete data that is considered outdated as part of a
transform checkpoint.

fixes elastic#67916