New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SchedulerFactoryBean's setOverwriteExistingJobs does not reliably work in a cluster [SPR-14745] #19311
Comments
Juergen Hoeller commented Isn't this an issue with Quartz's trigger persistence mechanism itself? Is there anything specific to Spring's Quartz integration here? Otherwise you'd have to report this to the Quartz project... |
Alexandru-Constantin Bledea commented I'm actually not sure to be honest but I'm pretty sure that you're right now that I think about it. The issue seemed to arise from the fact that overwriteExistingJobs is set to true. However, i just had another look into it that in turn eventually delegates to the scheduler's "replaceTrigger" which doesn't seem to do the whole replacing thing transactionally, there's a small window where the database is in an inconsistent state because of it. I will try to see if i can get into this scenario without spring at all (under normal circumstances as defined by their docs), but it definitely looks plausible. If i get there I will raise the issue on the quartz side. (Thanks for the quick response.) |
Alexandru-Constantin Bledea commented I've tinkered with vanilla quartz a bit and I can't get to a scenario where I would reschedule a job in the way that it's done with SchedulerFactoryBean. Here's what i came up with package com.apixandru.casestudy;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;
import org.quartz.impl.jdbcjobstore.JobStoreTX;
import java.util.Properties;
import static java.util.Collections.singleton;
import static org.quartz.JobBuilder.newJob;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;
import static org.quartz.impl.StdSchedulerFactory.PROP_JOB_STORE_CLASS;
public class QuartzTest {
public static void main(String[] args) throws SchedulerException, InterruptedException {
Properties props = new Properties();
props.put("org.quartz.scheduler.instanceName", "QuartzTest Scheduler");
props.put("org.quartz.scheduler.instanceId", "AUTO");
props.put("org.quartz.jobStore.isClustered", "true");
props.put("org.quartz.threadPool.threadCount", "2");
props.put("org.quartz.dataSource.NAME.URL", "jdbc:oracle:thin:@localhost:49161:xe");
props.put("org.quartz.dataSource.NAME.user", "system");
props.put("org.quartz.dataSource.NAME.password", "oracle");
props.put("org.quartz.dataSource.NAME.driver", "oracle.jdbc.OracleDriver");
props.put("org.quartz.jobStore.dataSource", "NAME");
props.put(PROP_JOB_STORE_CLASS, JobStoreTX.class.getName());
Scheduler scheduler = new StdSchedulerFactory(props).getScheduler();
scheduler.start();
JobDetail job = newJob(HelloJob.class)
.withIdentity("QuartzTest Job")
.build();
Trigger trigger = newTrigger()
.withIdentity("QuartzTest Trigger")
.startNow()
.withSchedule(simpleSchedule()
.withIntervalInSeconds(20)
.repeatForever())
.build();
scheduler.scheduleJob(job, singleton(trigger), true);
}
} This code never throws that exception because it manages the whole trigger replacing internally and it does that by updating the existing trigger rather than deleting it and reinserting it. The rescheduleJob that is used in spring seems to do all that managing on the spring side (to figure out that the trigger already exists and so on). |
Juergen Hoeller commented Alright, we'll try to reimplement our logic there to leave the entire replacement step up to Quartz itself... |
Alexandru-Constantin Bledea commented I think that the fact that the job has to be durable in the spring-quartz integration is probably exactly because of this issue. Since the trigger is deleted in replaceTrigger, the job would have been without any trigger and it would have been deleted which meant that replace trigger would probably have failed. I'm facing another issue currently because of this, the jobs are still in the database even though they are no longer in the code (i'm pretty sure it's because they are durable) .If i just change the job detail factory bean's name, the trigger is not currently being replaced and linked to the new job. It still remains on the old job even though a new job is created in the database. |
Alexandru-Constantin Bledea commented Hi Juergen Hoeller, I see that this was pushed back to 5.x. I made some (very) minor changes that allows me to work around the issue here: #1198 Regards, |
Alexandru-Constantin Bledea commented I believe that allowing SchedulerFactoryBean to be subclassed and giving the option to override the behavior in the subclass is a good approach because there are other things that I expected to happen that aren't currently happening and I cannot insert hooks for my logic anywhere in the SchedulerFactoryBean. Here are a couple of tests of what I would expect to happen but aren't currently. Also, this pull request doesn't fix it either (however, it does come close if you set the durability flag to false - which wasn't possible with the old logic, and you set remove unused triggers to true), I suppose that remove unused triggers should come before rescheduling them to remove the old link between the old job and the new one |
Alexandru-Constantin Bledea commented Just found out something really interesting. The problem is not the method that we are calling. The replaceJob does actually do this atomically and inside a transaction. The thing is that spring doesn't honor the transaction apparently. I'm assuming that if it doesn't honor the transaction here, it probably doesn't honor it in other places as well which is not that great. |
Alexandru-Constantin Bledea commented Ok, so the reason why it is not respecting the transaction is that the spring provided job store's transaction management is container managed. The problem with that is that we do not provide a transaction so it's done non transactionally. (Actually, the whole code that registers everything is executed in the afterPropertiesSet method outside of a transaction.) Just to check if this is correct, I manually inserted a transaction where i knew it fails and it no longer failed. (the trigger was no longer removed eagerly, but after the commit) // Register Triggers.
if (this.triggers != null) {
for (final Trigger trigger : this.triggers) {
TransactionTemplate tmpl = new TransactionTemplate(txManager);
tmpl.execute(new TransactionCallbackWithoutResult() {
@Override
protected void doInTransactionWithoutResult(TransactionStatus status) {
try {
addTriggerToScheduler(trigger);
} catch (SchedulerException e) {
e.printStackTrace();
}
}});
} |
Juergen Hoeller commented If such wrapping in a transaction helped, wouldn't it also work to set the transaction manager through |
Alexandru-Constantin Bledea commented I can confirm that setting the TransactionManager solves the concurrency issues without any additional changes. |
Bulk closing outdated, unresolved issues. Please, reopen if still relevant. |
Alexandru-Constantin Bledea opened SPR-14745 and commented
This happens when I want to deploy a spring boot application on a cluster.
What I suspect the problem to be is SchedulerFactoryBean's setOverwriteExistingJobs not offering enough protection.
One node will be initializing the scheduler and it will decide to replace the trigger (breakpoint org.quartz.impl.jdbcjobstore.SimpleTriggerPersistenceDelegate#deleteExtendedTriggerProperties )
Right after it executes this method, the trigger won't be in the database any longer so when another node in the cluster will try to read it (org.quartz.impl.jdbcjobstore.JobStoreSupport#retrieveTrigger) it will fail with the exception below. Because of this exception, the whole application will fail to start (not just the scheduler).
I attached the logs (The exception can be found on the Server-1 node after the 4th restart)
For the whole project that demonstrates this issue go to https://github.com/apixandru/case-study/tree/master/spring-boot-quartz
The way that we configure the scheduler is here
This happens a lot on our work servers but it's a lot harder to get locally (possibly due to the fact that the actual servers are dedicated and have a lot more power than my local machine?)
To get the bug on any machine, start one server in debug mode and put a breakpoint on SimpleTriggerPersistenceDelegate.deleteExtendedTriggerProperties and just after it executes, start the second server and you will get this exception
Anyway, I managed to get this error locally as well after about 40 redeploys to my local clustered weblogic server.
Affects: 4.3.3
Reference URL: http://stackoverflow.com/questions/39673572/spring-quartz-scheduler-race-condition
Attachments:
Issue Links:
1 votes, 3 watchers
The text was updated successfully, but these errors were encountered: