
Proposal: make persistent EJB timers non-persistent or not rely on database #5345

Closed
poikilotherm opened this issue Nov 27, 2018 · 17 comments · Fixed by #7416

Comments

@poikilotherm
Contributor

poikilotherm commented Nov 27, 2018

This is related to #5292 and a story of that epic.
As DataverseTimerServiceBean.java has been mostly written and maintained by @landreev, this most certainly is for him...

Context

Currently, glassfish-setup.sh has to set up a database connection for the storage of persistent EJB timers.

Reusing the Dataverse database for EJB timers is fine, but it is hard to replicate when heading for containers. Container-based deployments are much easier with application-scoped JDBC connections, but those do not seem to be usable for EJB timers.

There have also been related issues in the past: #3672, #3789, and more.

Proposal

Just remove the persistent nature of timers.

  • They are created on each startup on the timer server (see the docs).
  • In a single-node setup, when the server is down, IMHO there is no need to keep the timers persisted in a database: no one will execute them anyway. Executing them after the server comes back up does not seem very important for the current use cases (HarvestingClient and export).
  • In a multi-node setup, when the timer server is down, you need to switch the timer-server role to another node so the timers get executed again. That results in broken persistent timers anyway, as they were created with the name of the dead timer server.

If persistence is ever needed in a multi-node setup, Payara ships Hazelcast, which supports storing timers as long as at least one cluster member is online...
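
For illustration, a minimal sketch of what a non-persistent, annotation-driven timer could look like (the class, method, and guard names here are hypothetical, not taken from the current code base):

```java
import java.util.logging.Logger;
import javax.ejb.Schedule;
import javax.ejb.Singleton;

@Singleton
public class MetadataExportScheduler {

    private static final Logger logger = Logger.getLogger(MetadataExportScheduler.class.getName());

    // persistent = false: the container keeps this timer in memory only,
    // so nothing is ever written to the EJB timer database table.
    @Schedule(hour = "2", minute = "0", persistent = false)
    public void nightlyExport() {
        // hypothetical guard: only the configured timer server runs the job
        if (!isTimerServer()) {
            return;
        }
        logger.info("Running nightly metadata export...");
        // ... delegate to the export service here ...
    }

    private boolean isTimerServer() {
        // placeholder for the existing "am I the timer server?" check
        return true;
    }
}
```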

Comments are very much appreciated. :-)

@poikilotherm
Contributor Author

Following @pdurbin's suggestion about DDD (doc-driven development), I request a review of my plans in #5371 by @landreev and potentially others.

Cheers 😄 🍺

@landreev
Contributor

landreev commented Dec 7, 2018

@poikilotherm
As a quick note, yes, this makes perfect sense.

At present, we are not benefiting from the fact that the timers are stored in the database, at all. It's kind of the other way around: you can say that we go to the extra trouble of making sure that our timers are NOT persisted. In a multi-node cluster, we specifically try to run all the timers on just one of the nodes (for various reasons); AND we clear and re-create all the timers on startup.

So yes, it will be cleaner not to bother saving them in the database in the first place.

(We actually tried something similar in our own production cluster at Harvard a while ago; I can't remember now why we chose not to make it the default solution in the distribution...)

I may have questions about some specifics of the PR (#5371) though; I'll be in touch.

@poikilotherm
Contributor Author

Hi @landreev, @pdurbin told me you guys talked about this and you seemed to like it. Should I start actual code work on this?

poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 17, 2018
Removes the timer handling code for the export of Datasets and OAISets from
DataverseTimerServiceBean and uses the non-persistent @Schedule annotation instead.
Checks for execution on timer server only are in place.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 17, 2018
Removes the timer handling code for the export of Datasets and OAISets from
DataverseTimerServiceBean and uses the non-persistent @Schedule annotation instead.
Checks for execution on timer server only are in place.
@poikilotherm
Contributor Author

Hey @landreev,

could you give my comments in 81fbffe a look and discuss them with others? I am uncertain whether this refactoring is too big a chunk to fit inside this PR. (Happy to do the necessary code work.)

And while you are in there, you could take a look at a7cc7b8 which puts the first (but not all) @Schedule annotations in place.

Thx!

@landreev
Contributor

@poikilotherm I'm looking into it now (was out on Friday).
Yes, I was confused/didn't get the part that the PR was documentation only and that the code was not yet written.


poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 18, 2018
* This removes the old timer handling completely. No persistent timers are present anymore.
* Removes timer handling from the Harvester Client Admin UI backing.
* This cannot be unit tested as stated in the Javadocs and accepted by @landreev.
* This might need refactoring if problems arise from the simple execution approach used for now.
  Prior to this commit, harvest clients ran on independent timers and thus on independent threads.
  Now they are executed sequentially, which might cause problems when harvests run long and
  more clients are due to run an hour later.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 18, 2018
@poikilotherm
Contributor Author

poikilotherm commented Dec 18, 2018

Hi @landreev, feel free to look into my recent additions to #5371 😄 I added the necessary bits to HarvesterServiceBean, added lots of docs, and removed the old timer setup.

I gave this a shot in a VM, harvesting from Harvard Dataverse. It works, but we might run into trouble with such large harvests. Nothing would really be "lost", but it would take a few days to get this much data into a new Dataverse installation. Should I preemptively add support for this (e.g. using JBatch or similar)?

@poikilotherm
Contributor Author

@djbrooke or @pdurbin could you please move this to code review again? @landreev needs to take a look 😉

@pdurbin
Member

pdurbin commented Jan 3, 2019

@poikilotherm hi! Two things:

@poikilotherm
Contributor Author

poikilotherm commented Jan 3, 2019

IMHO this is still a WIP as long as @landreev has not expressed his satisfaction with it. In particular, the possible design drawback mentioned above (large harvests delaying others) needs a decision from him... 😉

(Merging done.)

@landreev
Contributor

landreev commented Jan 4, 2019

The issue with long harvesting jobs - it's not really anything new being introduced in this PR, correct?
As long as the harvesting client is properly locked, preventing the service from attempting further harvests while one is in progress, it should be OK for the job to take a while, I think.

I'm currently looking at the overall code in the PR; it looks good overall, I just need to review more carefully the part where the scheduled client jobs are started.

@landreev
Contributor

landreev commented Jan 4, 2019

(I'm going to test the branch and run some harvests too)

@poikilotherm
Contributor Author

It is not really new, no. Before my code changes, harvests were executed in parallel, as each harvesting client had its own timer going off and triggering the harvest as configured.

The code changes lead to a serialisation of harvests, which might be suboptimal with bigger harvests in mind. The @Lock(WRITE) annotation should ensure that no other harvest is executed while a long-running harvest is in progress. However, this is currently not unit- or integration-testable and needs manual testing. 😞
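
To make the serialisation concrete, this is roughly the pattern in question (a sketch, not the exact code in the PR):

```java
import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Schedule;
import javax.ejb.Singleton;

@Singleton
public class ScheduledHarvestBean {

    // With a WRITE lock on a singleton, a second scheduled invocation blocks
    // (or times out) until the running harvest finishes, so harvests are
    // effectively serialized.
    @Lock(LockType.WRITE)
    @Schedule(minute = "0", hour = "*", persistent = false)
    public void runScheduledHarvests() {
        // iterate over due harvesting clients and harvest them one by one
    }
}
```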

"A while" is a very unsharp term. When I harvested Harvard Dataverse, many thousand datasets where collected and it took a few hours until I aborted unfinished. OAI seems pretty slow...

There are proper solutions for this, like starting a background batch job from the scheduled code, but that might be premature optimization.

@landreev
Contributor

landreev commented Jan 8, 2019

I thought about this and discussed it a bit with others, and I do think the serialization of harvests is a serious problem. I don't think we want to put all the harvests on hold for hours (or, potentially, days) while a large collection is being populated. And I'm not sure getting rid of the timer entries in the database is by itself worth a new limitation like this.

I was actually hoping that this new code would be starting the individual jobs asynchronously (you mentioned this as a solution in your last comment). So we would be back to running harvests in parallel... Why not just do that - just use a method with the @Asynchronous annotation? We will not be able to use the @Lock(WRITE) annotation then, but that just means we will have to rely on the current system of database locks on individual HarvestingClients. Seems like a practical enough solution, I think. (?)
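
In other words, something along these lines (a sketch assuming the existing per-client database locks stay in place; the names are illustrative, not the actual Dataverse API):

```java
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class AsyncHarvestLauncher {

    // Fire-and-forget: when called through the injected EJB proxy, this method
    // returns immediately and the harvest runs on a container-managed thread,
    // so individual harvests are parallel again. Overlapping runs of the same
    // client would be prevented by the existing database lock on the
    // harvesting client, not by @Lock.
    @Asynchronous
    public void harvestAsync(long harvestingClientId) {
        // look up the client, check/acquire its DB lock, then run the harvest
    }
}
```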

@poikilotherm
Contributor Author

poikilotherm commented Jan 16, 2019

Hey guys, I haven't forgotten this issue; I have been busy with other stuff. Sorry for the delay.

I will look into using proper JSR 352 (JBatch) handling for this. IMHO using @Asynchronous is not a good idea for a job that may be running for a week while harvesting. JBatch is already used for other batch jobs, so it is a dependency already.
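
(For reference, a rough JSR 352 sketch of what I have in mind; the job name and batchlet are made up for illustration:)

```java
// META-INF/batch-jobs/harvest-job.xml would define a single batchlet step
// referencing this class by its CDI name.
import javax.batch.api.AbstractBatchlet;
import javax.inject.Named;

@Named("harvestBatchlet")
public class HarvestBatchlet extends AbstractBatchlet {

    @Override
    public String process() throws Exception {
        // the long-running OAI harvest goes here; the batch runtime tracks
        // its status in the job repository, independently of the scheduling
        // thread that started it
        return "COMPLETED";
    }
}

// Started from the scheduled method, e.g.:
//   java.util.Properties props = new java.util.Properties();
//   props.setProperty("harvestingClientId", "42");
//   javax.batch.runtime.BatchRuntime.getJobOperator().start("harvest-job", props);
```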

I will keep setups with multiple Glassfish servers in mind, as pointed out by @pdurbin on IRC. Using ShedLock might be a valid option.

This will take a while, as I am on vacation for the two weeks ahead.

@landreev
Contributor

Hi Oliver, there's definitely no rush working on this issue. We are fixing something that's not really broken, after all...
JBatch would indeed make sense to use as an alternative for large harvests that can take days. But how would that work in practice? I mean, you don't know how large a harvest is going to be until you actually run it... We could of course make it so that if a scheduled harvest (i.e., a harvest that runs on a timer) finds more than N new datasets, it gives up and exits with "this OAI server has too much content; run the harvest as a batch job" somehow.
Or did you have something else in mind?

@poikilotherm poikilotherm removed this from Backlog in Forschungszentrum Jülich Jun 27, 2019
@poikilotherm poikilotherm changed the title Proposal: make persistent EJB timers non-persistent Proposal: make persistent EJB timers non-persistent or not rely on database Nov 10, 2020
@poikilotherm
Contributor Author

poikilotherm commented Nov 10, 2020

With the move to Payara done since 5.0, we could stop relying on a PostgreSQL connection for EJB timers in one of two ways (this is a showstopper for #6819 and #7048).

  1. Move to non-persistent timers

    • Maybe use MicroProfile Config to make it configurable when they should go off (or a similar mechanism via a database setting; MPConfig preferred). See the sketch after this list.
    • As outlined above, we could use ShedLock to coordinate a fleet and avoid harvester jobs piling up.
      This might be easier than switching to the Hazelcast Domain Data Grid, as we can provide the locking database table on our own via Flyway.
      Benefits: a) no hassle with an app-server-level DB connection, b) more control over execution, maybe in parallel, c) compatible with Payara Micro and other smaller app servers.
    • I don't think switching to Java Concurrency (ManagedScheduledExecutorService) is worth it. See here.
  2. Stay with persistent timers, but use Hazelcast Domain Data Grid for storing them.

    • This could still involve MicroProfile Config for configuring them.
    • Execution is coordinated between the different nodes.
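
A rough sketch of option 1, combining a programmatic non-persistent timer with MicroProfile Config (the property name and class are made up for illustration):

```java
import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import javax.ejb.ScheduleExpression;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.ejb.Timeout;
import javax.ejb.Timer;
import javax.ejb.TimerConfig;
import javax.ejb.TimerService;
import javax.inject.Inject;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@Startup
@Singleton
public class ConfigurableTimerBean {

    @Resource
    TimerService timerService;

    // e.g. dataverse.timers.export.hour=2 in any MicroProfile Config source
    @Inject
    @ConfigProperty(name = "dataverse.timers.export.hour", defaultValue = "2")
    String exportHour;

    @PostConstruct
    public void createTimers() {
        ScheduleExpression schedule = new ScheduleExpression().hour(exportHour).minute("0");
        // second argument false = non-persistent, so no database is involved
        timerService.createCalendarTimer(schedule, new TimerConfig("metadata-export", false));
    }

    @Timeout
    public void onTimeout(Timer timer) {
        // run the export (on a multi-node setup, guard this with ShedLock or
        // an equivalent coordination mechanism)
    }
}
```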

Maybe other components like File PIDs, Draft PIDs, etc. could benefit from this, too.

See also this Google Doc

poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 17, 2020
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 17, 2020
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 17, 2020
@poikilotherm poikilotherm added this to the 5.3 milestone Nov 17, 2020
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 18, 2020
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 18, 2020
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Community Dev 💻❤️ to Done 🚀 Nov 24, 2020
@poikilotherm poikilotherm removed this from Proposals in Forschungszentrum Jülich Mar 11, 2024