Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paging support for IScheduler #1273

Open
zihotki opened this issue Aug 12, 2021 · 13 comments
Open

Paging support for IScheduler #1273

zihotki opened this issue Aug 12, 2021 · 13 comments

Comments

@zihotki
Copy link
Contributor

zihotki commented Aug 12, 2021

Current interfaces to fetch jobs and triggers don't support paging. But in case if there are many jobs or triggers the methods take some time to return data (see maikebing/SilkierQuartz#57 for example). Also it's not feasible to use them for HTTP API for the same reason.

IScheduler should be extended to provide methods for fetching paged data.

I created PR #1272 as a draft of how the API should look like. I considered extending existing methods instead of creating overloads but in order to maintain Application Binary Interface compatibility overloads should be used. That's due to the way how default values for methods are used.

In short my suggestion is to provide additional parameters for methods returning collections:

  • long take
  • long skip

These parameters map nicely to database queries. And using these parameters it's possible to implement simple paging. Implementing paging in form of showing page numbers like [1] [2] **3** [4] [5] .. [10] will still be troublesome since that requires to know precise amount of items for the endpoint. I suggest to de-scope it and first implement the simplest paging described above.

@asulwer
Copy link

asulwer commented Aug 12, 2021

why not leave this up to the developer? if you have a large list then use Take & Skip to achieve paging. example: if the developer is using a DB and EF to store the data, using Take & Skip will create a query to retrieve just the data you want.

Take & Skip are not specific to EF and a DB, but to a collection of objects.

Jobs and Triggers are a collection of objects, using Take & Skip on the collection would generate the desired result, paging. why do you need to wrap Take & Skip into Quartz with a method that accomplishes the same thing? Take and Skip already exist, use it directly. this could avoid unnecessary complexity and obfuscation. is it necessary to create a method that wraps Take and Skip?

@zihotki
Copy link
Contributor Author

zihotki commented Aug 16, 2021

@asulwer in short, you're refering to service level paging while I suggest to implement support for database level paging.

The long version, the existing LINQ methods Take and Skip work on database level only when they are used on top of IQueryable (of course, if the database provider supports it). In case of Quartz, its methods return materialized collections. Implementing paging on top of them is suboptimal because you'll have to load ALL items into memory and only after that you can apply paging. It will work fine for tens or hundreds of items but not for thousands.

To fix that and get database level paging you either have to return IQueryable and use EF for loading data inside of Quartz, or implement additional parameters to do the paging. First option means huge rewrite, the second is a minor enhancement. The changes I propose are the keys to enable paging on the database level so that the method calls will not load ALL data but only a requested page of data.

@asulwer
Copy link

asulwer commented Aug 16, 2021

Quartz supports, directly, saving/loading from a db? if that were true then yes a rewrite 'might' be a large undertaking. assuming it loaded all of its data from the db leaving you stuck with client side manipulation. i do not see support for loading/saving from a db, did i miss something?

are you referring to ado.net job store? maybe a fork of that?

yes, loading any data from a db without support for retrieving a portion is 'costly'. especially the larger the data being retrieved
.

@zihotki
Copy link
Contributor Author

zihotki commented Aug 17, 2021

@asulwer I don't get your question, sorry. Let me elaborate the idea better. The Quartz's methods from IScheduler like:

  • GetJobGroupNames
  • GetTriggerGroupNames
  • GetPausedTriggerGroups
  • GetJobKeys
  • GetTriggersOfJob
  • GetTriggerKeys
  • GetCalendarNames

make direct calls to the store (a database, for example) using a store driver that was setup. IScheduler is a standard interface to interact with jobs and triggers. For example, it's used in this HTTP API implementation.

Some of forementioned methods from IScheduler could potentially return thousands items. And currently they all call the store and return materialized collection (a collection with instances of loaded objects), please see StdScheduler for details. That could take minutes and that's unacceptably long for an API which could be used by apps showing the data to users. In order to fix them they need to be extended with paging parameters. So that it would become possible to load only a page of data, for example - Job Keys.

Th associated PR contains overrides for the methods which could potentially return thousands items. If the change of the interface is approved then I'll proceed with implementation and there I will need to also extend IJobStore.

Extending only IJobStore or forking a Job Store I don't see as an option. That's because a configured instance is not available on IScheduler or any other public methods. You have to initialize one yourself using the same settings. It's not a viable solution and can't be used to implement an HTTP API plug-in like the API prototype I referred before. Therefore 'IScheduler` needs to be extended.

Another option could be implementing cross-database EF Core mapping on top of existing schema and use EF Core methods to fetch the data. I'm also currently exploring this possibility. That's why current PR is only for a small change of interface currently without implementation to use it for discussion.

@asulwer
Copy link

asulwer commented Aug 17, 2021

thank you for this detailed explanation. i wasn't aware, that Quartz directly, accessed the underlying job store. in this case a db or http api implementation.

@zihotki
Copy link
Contributor Author

zihotki commented Aug 27, 2021

@lahma Would you be able to provide a feedback about the interface updates so that I can proceed with the implementation?

@lahma
Copy link
Member

lahma commented Aug 27, 2021

@zihotki thanks for your patience. I'll try to sort my thoughts this weekend the latest. Changing this interface will mean it cannot land in 3.x, but I've been planning to shift main branch to more backwards-breaking development and branch out the 3.x maintenance as feature set seems quite stable and bug-free.

@lahma
Copy link
Member

lahma commented Aug 28, 2021

OK, good discussion happening here. My thoughts if we are going to implement this:

  • we are targeting Quartz 4.x with these, meaning it might take long to get an official release, but MyGet feed will allow usage
  • not sure about the parameter names, but we should have defaults, int take = int32.MaxValue, int skip = 0, not sure if anyone will hit int.MaxValue limit, you've used long which should be fine too
    • keeping old code working while still binary incompatible changes though
  • this needs to flow to job stores and delegates too of course

Background: Quartz.NET follows the Java project quite closely, to make fixes and features portable between the two. Java version has been hibernating for a long time, but there seems to be a recent uptick in activity.

I checked the linked issue and as I have not gone deep into the implementation, I'm just guessing here, but are the actual keys the problem or something else?

I would think that you can load/stream keys quite quickly from the persistent store (even thousands) as they are lightweight, just a pair of strings. But when you want to show trigger details there's no batch API for that, like give me triggers with these keys (1...50). My guess is that the UI is doing 50 GetTrigger(key) calls - and possibly even an RPC call with .NET Remoting.

Currently there's only GetTriggersOfJob() that allows retrieving more than one trigger. So is the actual need GetTriggers(collectionOfKeys)? Also loading of triggers/jobs also means loading the JobDataMap which is rarely of interest in the UI until you dig into the details (view) of single job/trigger. One could question whether it makes sense to return full ITrigger from collection returning methods and instead return a record TriggerHeader(string Name, string Group, string TriggerType, DateTimeOffset Start, DatetimeOffset End) or something similar.

@seangwright
Copy link

I'm currently looking for a good scheduling lib for an ASPNET Core app. Quartz seems to have features that align with my requirements... but one of those requirements is an administration UI.

Having read the issues about lacking paging, I probably wouldn't be able to use Quartz until v4.0, so this topic is definitely interesting to me.

  • not sure about the parameter names, but we should have defaults, int take = int32.MaxValue, int skip = 0,
    not sure if anyone will hit int.MaxValue limit, you've used long which should be fine too

This could come from configuration in ConfigureServices() and use Math.Clamp() to limit it. If no value is specified, a default would be used.

@lahma
Copy link
Member

lahma commented Aug 30, 2021

Ideally I would see a project like SilkierQuartz or other established solution to be integrated into Quartz, allowing original maintainer(s) to improve and main Quartz project to ensure integration works as expected. I think a stable HTTP API is a first step, and has to account for paging.

@seangwright
Copy link

Even if Quartz didn't have an HTTP API, downstream projects like SilkierQuartz could integrate successfully as long as the data retrieval/update APIs existed within Quartz.

It might be best if that HTTP API remains part of the downstream projects because the features it exposes are more a dependency of the UI than what data exists in Quartz' data persistence layer.

Using SilkierQuartz as a model of what types of data retrieval scenarios can be expected seems like a good starting point (eg Paging).

@zihotki
Copy link
Contributor Author

zihotki commented Sep 6, 2021

@lahma see my response below.

we are targeting Quartz 4.x with these

That's to be expected.

not sure about the parameter names, but we should have defaults, int take = int32.MaxValue, int skip = 0, not sure if anyone will hit int.MaxValue limit, you've used long which should be fine too

Makes sense.

this needs to flow to job stores and delegates too of course

That's also clear, the referenced PR is only to start discussion.

I'm just guessing here, but are the actual keys the problem or something else?

It's also a part of a problem. Constantly loading a few thousands of keys is a bad idea, for a page of 20 items that's at least 100 times the overhead. Loading of all keys is acceptable for a background service but not for a HTTP API. And if running on a cloud db, that may cost you a fortune. So I would prefer to adjust it as well.

Currently there's only GetTriggersOfJob() that allows retrieving more than one trigger. So is the actual need GetTriggers(collectionOfKeys)? Also loading of triggers/jobs also means loading the JobDataMap which is rarely of interest in the UI until you dig into the details (view) of single job/trigger. One could question whether it makes sense to return full ITrigger from collection returning methods and instead return a record TriggerHeader(string Name, string Group, string TriggerType, DateTimeOffset Start, DatetimeOffset End) or something similar.

That's a very valuable feedback! I'll take it into account if we decide to follow this route.

I'm going withing a few days to propose another way of implementing the desired functionality. I have a working prototype of EF.Core models and schema that can be easily integrated. The models and schema can be used both for database migrations and for GET endpoint. All modifications, of course, must go through the Quartz utils.

@lahma
Copy link
Member

lahma commented Sep 6, 2021

TBH I'm not super excited to introduce EF to the mix for data access. It has its merits and can make things easier, but the core library doesn't currently have any deps to "fat libraries".

Current data access implementation is quite low-level and even archaic, 1:1 ideology with Java version, and doesn't require dependencies. Of course good API allows you to switch to EF provider with extra package dependency.

The APi requires some overhaul to support document databases in efficient way anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants