This repository has been archived by the owner on Jan 25, 2024. It is now read-only.

Schedulers documents are not cleared #136

Open
gmiejski opened this issue Dec 10, 2016 · 6 comments

Comments

@gmiejski

I've come across a bug (or a not-yet-implemented feature?) where scheduler documents are not cleared from Mongo, leading to new entries with each deployment. It seems trivial, but besides the growing collection it makes some things harder to do properly - like monitoring how many active schedulers there are.

I propose two solutions: either make other cluster instances scan the schedulers collection and periodically clean out old entries (or make it configurable), or simply change/add a field "lastCheckingDate" in the scheduler documents (with a Date type) - then one can add a TTL index in Mongo and everything would be fine.

Please tell me if I haven't noticed something important, or whether creating a TTL index for old, inactive cluster instances would break something I'm not aware of (a TTL index of, say, 1 hour would probably not break such things - still, I'm not sure about locks and that kind of stuff).
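
For illustration, a TTL index on such a field could be created with the MongoDB Java driver roughly like this (the field name lastCheckingDate and the one-hour expiry are just the values floated in this proposal, not anything the library defines today):

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import java.util.concurrent.TimeUnit;

// Sketch: let Mongo expire scheduler documents one hour after their last check-in.
// Assumes lastCheckingDate is stored as a BSON Date (TTL indexes only work on Date fields).
void createSchedulerTtlIndex(MongoCollection<Document> schedulers) {
    schedulers.createIndex(
            Indexes.ascending("lastCheckingDate"),
            new IndexOptions().expireAfter(1L, TimeUnit.HOURS));
}
```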

Please share your thoughts!

@michaelklishin
Owner

A field with last scheduler activity sounds good to me. When would it be set, however?

@gmiejski
Author

I was thinking that this lastCheckingDate could be derived from lastCheckinTime and simply set in SchedulerDao.createUpdateClause().
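
Roughly what I had in mind, as a sketch only - I'm assuming the update clause can be expressed with the driver's Updates helpers, which may not match the actual createUpdateClause() implementation:

```java
import com.mongodb.client.model.Updates;
import org.bson.conversions.Bson;

import java.util.Date;

// Sketch: keep storing the check-in time as a raw timestamp, and additionally
// store it as a BSON Date so a TTL index can expire stale scheduler documents.
Bson createUpdateClause(long lastCheckinTime, long checkinInterval) {
    return Updates.combine(
            Updates.set("lastCheckinTime", lastCheckinTime),
            Updates.set("checkinInterval", checkinInterval),
            Updates.set("lastCheckingDate", new Date(lastCheckinTime)));
}
```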

However, there are two points I have just found out:

  1. When you recover state, you always recover it by the same instanceId - which won't work with AUTO-generated instanceIds in CLUSTERED mode. We should also recover old clusterInstances and their locks, and clear those too.
  2. There seems to be some kind of bug, because I can see a growing number of trigger locks without corresponding triggers - have you come across such a thing?

But I have studied a bit how this is implemented in the original Quartz: it clears old records during check-in, when it also recovers old triggers.

Considering the clustered mode, the simplest solution no longer seems like the best option.
I would go with recovering old cluster states during check-in, as is done in Quartz's JobStoreSupport - what do you think? How about clearing all locks acquired by non-active instances, together with the _schedulers document of each instance that is no longer checking in?
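
A rough sketch of what that check-in cleanup might look like with the MongoDB Java driver (the collection layout and field names here are my assumptions based on this discussion, not necessarily the library's actual schema):

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

// Sketch: during check-in, remove schedulers that have not checked in for a
// while, along with any locks those defunct instances still hold.
void clearDefunctSchedulers(MongoCollection<Document> schedulers,
                            MongoCollection<Document> locks,
                            long checkinIntervalMillis) {
    long cutoff = System.currentTimeMillis() - 2 * checkinIntervalMillis;
    for (Document scheduler : schedulers.find(Filters.lt("lastCheckinTime", cutoff))) {
        String instanceId = scheduler.getString("instanceId");
        // Release the locks acquired by the defunct instance first...
        locks.deleteMany(Filters.eq("instanceId", instanceId));
        // ...then drop its document from the _schedulers collection.
        schedulers.deleteOne(Filters.eq("_id", scheduler.getObjectId("_id")));
    }
}
```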

@otlg

otlg commented Jan 26, 2021

Hi,

Seems the _schedulers collection is never cleaned up. Might be problematic in k8s for stateless services.
Any plan to solve it?

Thanks

@michaelklishin
Owner


This is open source software. You are welcome to contribute a fix.

@otlg

otlg commented Feb 1, 2021

If there is no plan and there is no other workaround, I can contribute a fix.
Seems strange that the issue has been open for 6 years and there is still no fix for it.

@gmiejski
Author

gmiejski commented Feb 1, 2021

As far as I remember, the issue seemed easy to fix, but during implementation I came across more complicated details, which I've described above. I'm not using this cool library anymore, so I won't be able to help, but I'm keeping my fingers crossed 🤞
