
Proposal: make persistent EJB timers non-persistent or not rely on database #5345

Closed
poikilotherm opened this issue Nov 27, 2018 · 17 comments · Fixed by #7416

Comments

@poikilotherm
Contributor

poikilotherm commented Nov 27, 2018

This is related to #5292 and a story of that epic.
As DataverseTimerServiceBean.java has been mostly written and maintained by @landreev, this most certainly is for him...

Context

Currently, glassfish-setup.sh has to set up a database connection for the storage of persistent EJB timers.

Reusing the Dataverse database for EJB timers is fine, but it is hard to replicate when heading for containers. Container-based deployments are much easier with application-scoped JDBC connections, but those do not seem to be usable for EJB timers.

There have also been related issues in the past: #3672, #3789, and more.

Proposal

Just remove the persistent nature of timers.

  • They are created on each startup on the timer server (see the docs).
  • In a single-node setup, when the server is down, IMHO there is no need to keep the timers persisted in a database: no one will execute them anyway. Executing them after the server comes back up does not seem very important for the current use cases (HarvestingClient and export).
  • In a multi-node setup, when the timer server is down, you need to switch the timer-server role to another node so the timers get executed again. That results in broken persistent timers anyway, as they were created with the name of the dead timer server.

If persistence is ever needed in a multi-node setup, Payara ships Hazelcast, which supports storing timers as long as at least one cluster member is online...
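
For illustration, a minimal sketch of what a non-persistent, annotation-driven timer could look like (the class, method, and guard names here are hypothetical, not taken from the current code base):

```java
import java.util.logging.Logger;
import javax.ejb.Schedule;
import javax.ejb.Singleton;

@Singleton
public class MetadataExportScheduler {

    private static final Logger logger = Logger.getLogger(MetadataExportScheduler.class.getName());

    // persistent = false: the container keeps this timer in memory only,
    // so nothing is ever written to the EJB timer database table.
    @Schedule(hour = "2", minute = "0", persistent = false)
    public void nightlyExport() {
        // hypothetical guard: only the configured timer server runs the job
        if (!isTimerServer()) {
            return;
        }
        logger.info("Running nightly metadata export...");
        // ... delegate to the export service here ...
    }

    private boolean isTimerServer() {
        // placeholder for the existing "am I the timer server?" check
        return true;
    }
}
```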

Comments are very much appreciated. :-)

@poikilotherm
Contributor Author

Following @pdurbin's suggestion about DDD (doc-driven development), I request a review of my plans in #5371 by @landreev and potentially others.

Cheers 😄 🍺

@landreev
Contributor

landreev commented Dec 7, 2018

@poikilotherm
As a quick note, yes, this makes perfect sense.

At present, we are not benefiting from the fact that the timers are stored in the database, at all. It's kind of the other way around: you can say that we go to the extra trouble of making sure that our timers are NOT persisted. In a multi-node cluster, we specifically try to run all the timers on just one of the nodes (for various reasons); AND we clear and re-create all the timers on startup.

So yes, it will be cleaner not to bother saving them in the database in the first place.

(We actually tried something similar in our own production cluster at Harvard a while ago; I can't remember now why we chose not to make it the default solution in the distribution...)

I may have questions about some specifics of the PR (#5371) though; I'll be in touch.

@poikilotherm
Contributor Author

Hi @landreev, @pdurbin told me you guys talked about this and you seemed to like it. Should I start actual code work on this?

poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 17, 2018
Removes the timer handling code for the export of Datasets and OAISets from
DataverseTimerServiceBean and uses the non-persistent @Schedule annotation instead.
Checks for execution on timer server only are in place.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 17, 2018
Removes the timer handling code for the export of Datasets and OAISets from
DataverseTimerServiceBean and uses the non-persistent @Schedule annotation instead.
Checks for execution on timer server only are in place.
@poikilotherm
Contributor Author

Hey @landreev,

could you give my comments in 81fbffe a look and discuss them with others? I am uncertain whether this refactoring is too big a chunk to fit inside this PR. (Happy to do the necessary code work.)

And while you are in there, you could take a look at a7cc7b8 which puts the first (but not all) @Schedule annotations in place.

Thx!

@landreev
Contributor

@poikilotherm I'm looking into it now (was out on Friday).
Yes, I was confused/didn't get the part that the PR was documentation only and that the code was not yet written.


poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 18, 2018
* This removes the old timer handling completely. No persistent timers are present anymore.
* Removes timer handling from the Harvester Client Admin UI backing.
* This cannot be unit tested as stated in the Javadocs and accepted by @landreev.
* This might need refactoring if problems arise from the simple execution approach used for now.
  Prior to this commit, harvest clients ran on independent timers and thus on independent threads.
  Now they are executed sequentially, which might cause problems when harvests run long and
  more clients are due to run an hour later.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 18, 2018
@poikilotherm
Contributor Author

poikilotherm commented Dec 18, 2018

Hi @landreev, feel free to look into my recent additions to #5371 😄 I added the necessary bits to HarvesterServiceBean, added lots of docs, and removed the old timer setup.

I gave this a shot in a VM, harvesting from Harvard Dataverse. It works, but we might run into trouble with such large harvests. Nothing would really be "lost", but it would take a few days to get this much data into a new Dataverse installation. Should I preemptively add support for this (e.g. using JBatch or similar)?

@poikilotherm
Contributor Author

@djbrooke or @pdurbin could you please move this to code review again? @landreev needs to take a look 😉

@pdurbin
Member

pdurbin commented Jan 3, 2019

@poikilotherm hi! Two things:

@poikilotherm
Contributor Author

poikilotherm commented Jan 3, 2019

IMHO this is still a WIP as long as @landreev has not expressed his satisfaction with it. In particular, the possible design drawback mentioned above (large harvests delaying others) needs a decision from him... 😉

(Merging done.)

@landreev
Contributor

landreev commented Jan 4, 2019

The issue with long harvesting jobs - it's not really anything new being introduced in this PR, correct?
As long as the harvesting client is properly locked, preventing the service from attempting further harvests while one is in progress, it should be OK for the job to take a while, I think.

I'm currently looking at the overall code in the PR; it looks good overall, I just need to review more carefully the part where the scheduled client jobs are started.

@landreev
Contributor

landreev commented Jan 4, 2019

(I'm going to test the branch and run some harvests too)

@poikilotherm
Contributor Author

It is not really new, no. Before my code changes, harvests were executed in parallel, as each harvesting client had its own timer going off and triggering the harvest as configured.

The code changes lead to a serialisation of harvests, which might be suboptimal with bigger harvests in mind. The @Lock(WRITE) annotation should ensure that no other harvest is executed while a long-running harvest is in progress. However, this is currently not unit- or integration-testable and needs manual testing. 😞
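
To make the serialisation concrete, this is roughly the pattern in question (a sketch, not the exact code in the PR):

```java
import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Schedule;
import javax.ejb.Singleton;

@Singleton
public class ScheduledHarvestBean {

    // With a WRITE lock on a singleton, a second scheduled invocation blocks
    // (or times out) until the running harvest finishes, so harvests are
    // effectively serialized.
    @Lock(LockType.WRITE)
    @Schedule(minute = "0", hour = "*", persistent = false)
    public void runScheduledHarvests() {
        // iterate over due harvesting clients and harvest them one by one
    }
}
```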

"A while" is a very unsharp term. When I harvested Harvard Dataverse, many thousand datasets where collected and it took a few hours until I aborted unfinished. OAI seems pretty slow...

There are proper solutions for this, like starting a background batch job from the scheduled code, but that might be premature optimization.

@landreev
Contributor

landreev commented Jan 8, 2019

I thought about this and discussed it a bit with others, and I do think the serialization of harvests is a serious problem. I don't think we want to put all the harvests on hold for hours (or, potentially, days) while a large collection is being populated. And I'm not sure getting rid of the timer entries in the database is by itself worth a new limitation like this.

I was actually hoping that this new code would be starting the individual jobs asynchronously (you mentioned this as a solution in your last comment). So we would be back to running harvests in parallel... Why not just do that - just use a method with the @Asynchronous annotation? We will not be able to use the @Lock(WRITE) annotation then, but that just means we will have to rely on the current system of database locks on individual HarvestingClients. Seems like a practical enough solution, I think. (?)
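
In other words, something along these lines (a sketch assuming the existing per-client database locks stay in place; the names are illustrative, not the actual Dataverse API):

```java
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class AsyncHarvestLauncher {

    // Fire-and-forget: when called through the injected EJB proxy, this method
    // returns immediately and the harvest runs on a container-managed thread,
    // so individual harvests are parallel again. Overlapping runs of the same
    // client would be prevented by the existing database lock on the
    // harvesting client, not by @Lock.
    @Asynchronous
    public void harvestAsync(long harvestingClientId) {
        // look up the client, check/acquire its DB lock, then run the harvest
    }
}
```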

@poikilotherm
Contributor Author

poikilotherm commented Jan 16, 2019

Hey guys, I haven't forgotten this issue; I have been busy with other stuff. Sorry for the delay.

I will look into using proper JSR 352 (JBatch) handling for this. IMHO using @Asynchronous is not a good idea for a job that may be running for a week while harvesting. JBatch is already used for other batch jobs, so it is a dependency already.
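
(For reference, a rough JSR 352 sketch of what I have in mind; the job name and batchlet are made up for illustration:)

```java
// META-INF/batch-jobs/harvest-job.xml would define a single batchlet step
// referencing this class by its CDI name.
import javax.batch.api.AbstractBatchlet;
import javax.inject.Named;

@Named("harvestBatchlet")
public class HarvestBatchlet extends AbstractBatchlet {

    @Override
    public String process() throws Exception {
        // the long-running OAI harvest goes here; the batch runtime tracks
        // its status in the job repository, independently of the scheduling
        // thread that started it
        return "COMPLETED";
    }
}

// Started from the scheduled method, e.g.:
//   java.util.Properties props = new java.util.Properties();
//   props.setProperty("harvestingClientId", "42");
//   javax.batch.runtime.BatchRuntime.getJobOperator().start("harvest-job", props);
```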

I will keep setups with multiple Glassfish servers in mind, as pointed out by @pdurbin on IRC. Using ShedLock might be a valid option.

This will take a while, as I am on vacation for the two weeks ahead.

@landreev
Contributor

Hi Oliver, there's definitely no rush working on this issue. We are fixing something that's not really broken, after all...
JBatch would indeed make sense to use as an alternative for large harvests that can take days. But how would that work in practice? I mean, you don't know how large a harvest is going to be until you actually run it... We could of course make it so that if a scheduled harvest (i.e., a harvest that runs on a timer) finds more than N new datasets, it gives up and exits with "this OAI server has too much content; run the harvest as a batch job" somehow.
Or did you have something else in mind?

@poikilotherm poikilotherm removed this from Backlog in Forschungszentrum Jülich Jun 27, 2019
@poikilotherm poikilotherm changed the title Proposal: make persistent EJB timers non-persistent Proposal: make persistent EJB timers non-persistent or not rely on database Nov 10, 2020
@poikilotherm
Contributor Author

poikilotherm commented Nov 10, 2020

With the move to Payara done since 5.0, we could stop relying on a PostgreSQL connection for EJB timers in one of two ways (this is a showstopper for #6819 and #7048).

  1. Move to non-persistent timers

    • Maybe use MicroProfile Config to make it configurable when they should go off (or a similar mechanism via a database setting; MPConfig preferred). See the sketch after this list.
    • As outlined above, we could use ShedLock to coordinate a fleet and avoid harvester jobs piling up.
      This might be easier than switching to the Hazelcast Domain Data Grid, as we can provide the locking database table on our own via Flyway.
      Benefits: a) no hassle with an app-server-level DB connection, b) more control over execution, maybe in parallel, c) compatible with Payara Micro and other smaller app servers.
    • I don't think switching to Java Concurrency (ManagedScheduledExecutorService) is worth it. See here.
  2. Stay with persistent timers, but use Hazelcast Domain Data Grid for storing them.

    • This could still involve MicroProfile Config for configuring them.
    • Execution is coordinated between the different nodes.
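
A rough sketch of option 1, combining a programmatic non-persistent timer with MicroProfile Config (the property name and class are made up for illustration):

```java
import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import javax.ejb.ScheduleExpression;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.ejb.Timeout;
import javax.ejb.Timer;
import javax.ejb.TimerConfig;
import javax.ejb.TimerService;
import javax.inject.Inject;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@Startup
@Singleton
public class ConfigurableTimerBean {

    @Resource
    TimerService timerService;

    // e.g. dataverse.timers.export.hour=2 in any MicroProfile Config source
    @Inject
    @ConfigProperty(name = "dataverse.timers.export.hour", defaultValue = "2")
    String exportHour;

    @PostConstruct
    public void createTimers() {
        ScheduleExpression schedule = new ScheduleExpression().hour(exportHour).minute("0");
        // second argument false = non-persistent, so no database is involved
        timerService.createCalendarTimer(schedule, new TimerConfig("metadata-export", false));
    }

    @Timeout
    public void onTimeout(Timer timer) {
        // run the export (on a multi-node setup, guard this with ShedLock or
        // an equivalent coordination mechanism)
    }
}
```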

Maybe other components like File PIDs, Draft PIDs, etc. could benefit from this, too.

See also this Google Doc

poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 17, 2020
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 17, 2020
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 17, 2020
@poikilotherm poikilotherm added this to the 5.3 milestone Nov 17, 2020
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 18, 2020
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Nov 18, 2020
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Community Dev 💻❤️ to Done 🚀 Nov 24, 2020
@poikilotherm poikilotherm removed this from Proposals in Forschungszentrum Jülich Mar 11, 2024