This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Job sweeper fails to remove deleted jobs #65

Closed
dbbaughe opened this issue Jul 2, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@dbbaughe
Contributor

dbbaughe commented Jul 2, 2020

Found from this forum discussion in ISM: https://discuss.opendistrocommunity.dev/t/ism-attempting-to-interact-with-an-obsolete-index/3224

Job scheduler keeps an in-memory map of the jobs that are scheduled to run. When a job document is created, updated, or deleted, this map is updated accordingly. In this specific case the delete somehow failed, which left a job executing every 2 hours even though its document no longer existed. Ideally the sweeper would catch this and resolve the failure, but the sweeper has a bug where it doesn't remove jobs that were deleted.

For reference:

The sweeper is a background process that sweeps the job indices for job documents and schedules, re-schedules, or de-schedules jobs accordingly. It runs on an interval defined by the sweep period. On every execution it sweeps all indices registered by plugins that extend job scheduler, and for each index it sweeps that index's shards. The sweepShard function is the one with the bug: it does not handle job documents that were deleted from the index.
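
To make the expected behavior concrete, here is a minimal sketch of a shard sweep that also de-schedules jobs whose documents are gone. It is not the plugin's actual code: the class `SweepSketch`, the `scheduled` map, and this simplified `sweepShard` signature are all hypothetical, and it assumes a single shard whose job documents are passed in as a map of job id to document version.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SweepSketch {
    // Hypothetical stand-in for the scheduler's in-memory map: job id -> document version.
    private final Map<String, Long> scheduled = new HashMap<>();

    /**
     * Reconciles the in-memory map with the job documents found in one shard.
     * Returns the ids of jobs that were de-scheduled because their document is gone.
     */
    public Set<String> sweepShard(Map<String, Long> docsInShard) {
        // Schedule new jobs and re-schedule jobs whose document version increased.
        for (Map.Entry<String, Long> doc : docsInShard.entrySet()) {
            Long known = scheduled.get(doc.getKey());
            if (known == null || known < doc.getValue()) {
                scheduled.put(doc.getKey(), doc.getValue());
            }
        }
        // The step this issue is about: anything still in memory but no longer
        // present in the index was deleted, so it must be de-scheduled.
        Set<String> removed = new HashSet<>(scheduled.keySet());
        removed.removeAll(docsInShard.keySet());
        scheduled.keySet().removeAll(removed);
        return removed;
    }

    /** Returns a copy of the ids currently held in the in-memory map. */
    public Set<String> scheduledJobIds() {
        return new HashSet<>(scheduled.keySet());
    }
}
```

With this shape, a delete that failed to update the map is corrected on the next sweep, because the stale id no longer appears among the shard's documents.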

@dbbaughe dbbaughe added the bug Something isn't working label Jul 2, 2020
@ftianli-amzn

ftianli-amzn commented Sep 29, 2020

Thanks @dbbaughe for the detailed and well-organized explanation.
From your description, I think the problem is caused here: when de-scheduling fails, there is no retry or other fallback to deal with the failure.

@dbbaughe
Contributor Author

Closing in favor of: opensearch-project/job-scheduler#96
