SchedulerFactoryBean's setOverwriteExistingJobs does not reliably work in a cluster [SPR-14745] #19311

spring-projects-issues · 2016-09-24T07:38:38Z

Alexandru-Constantin Bledea opened SPR-14745 and commented

This happens when I want to deploy a spring boot application on a cluster.

What I suspect the problem to be is SchedulerFactoryBean's setOverwriteExistingJobs not offering enough protection.

One node will be initializing the scheduler and it will decide to replace the trigger (breakpoint org.quartz.impl.jdbcjobstore.SimpleTriggerPersistenceDelegate#deleteExtendedTriggerProperties )

Right after it executes this method, the trigger won't be in the database any longer so when another node in the cluster will try to read it (org.quartz.impl.jdbcjobstore.JobStoreSupport#retrieveTrigger) it will fail with the exception below. Because of this exception, the whole application will fail to start (not just the scheduler).

Caused by: org.quartz.JobPersistenceException: Couldn't retrieve trigger: No record found for selection of Trigger with key:

I attached the logs (The exception can be found on the Server-1 node after the 4th restart)

For the whole project that demonstrates this issue go to https://github.com/apixandru/case-study/tree/master/spring-boot-quartz

The way that we configure the scheduler is here

@Bean
JobDetailFactoryBean jobFactoryBean() {
    JobDetailFactoryBean bean = new JobDetailFactoryBean();
    bean.setDurability(true);
    bean.setName("Sampler");
    bean.setJobClass(SampleJob.class);
    return bean;
}

@Bean
SimpleTriggerFactoryBean triggerFactoryBean(JobDetailFactoryBean jobFactoryBean) {
    SimpleTriggerFactoryBean bean = new SimpleTriggerFactoryBean();
    bean.setName("Sampler Trigger");
    bean.setRepeatInterval(20_000);
    bean.setJobDetail(jobFactoryBean.getObject());
    return bean;
}

@Bean
SchedulerFactoryBean schedulerFactoryBean(SimpleTriggerFactoryBean triggerFactoryBean, DataSource dataSource, Dependency dependency) {
    Properties props = new Properties();
    props.put("org.quartz.scheduler.instanceId", "AUTO");
    props.put("org.quartz.jobStore.isClustered", "true");

    SchedulerFactoryBean bean = new SchedulerFactoryBean();
    bean.setTriggers(triggerFactoryBean.getObject());
    bean.setSchedulerName("Demo Scheduler");
    bean.setSchedulerContextAsMap(Collections.singletonMap("dependency", dependency));
    bean.setOverwriteExistingJobs(true);
    bean.setDataSource(dataSource);
    bean.setQuartzProperties(props);

    return bean;
}

This happens a lot on our work servers but it's a lot harder to get locally (possibly due to the fact that the actual servers are dedicated and have a lot more power than my local machine?)

To get the bug on any machine, start one server in debug mode and put a breakpoint on SimpleTriggerPersistenceDelegate.deleteExtendedTriggerProperties and just after it executes, start the second server and you will get this exception

Anyway, I managed to get this error locally as well after about 40 redeploys to my local clustered weblogic server.

Affects: 4.3.3

Reference URL: http://stackoverflow.com/questions/39673572/spring-quartz-scheduler-race-condition

Attachments:

myserver.log (38.69 kB)
myserver.stdout (9.96 kB)
Server-0.log (28.62 kB)
Server-0.stdout (50.53 kB)
Server-1.log (39.90 kB)
Server-1.stdout (58.06 kB)

Issue Links:

Wait for Quartz jobs to finish before continuing shutdown of singleton beans [SPR-14806] #19372 Wait for Quartz jobs to finish before continuing shutdown of singleton beans

1 votes, 3 watchers

spring-projects-issues · 2016-09-24T18:52:22Z

Juergen Hoeller commented

Isn't this an issue with Quartz's trigger persistence mechanism itself? Is there anything specific to Spring's Quartz integration here? Otherwise you'd have to report this to the Quartz project...

spring-projects-issues · 2016-09-24T21:14:10Z

Alexandru-Constantin Bledea commented

I'm actually not sure to be honest but I'm pretty sure that you're right now that I think about it.

The issue seemed to arise from the fact that overwriteExistingJobs is set to true. However, i just had another look into it that in turn eventually delegates to the scheduler's "replaceTrigger" which doesn't seem to do the whole replacing thing transactionally, there's a small window where the database is in an inconsistent state because of it.

I will try to see if i can get into this scenario without spring at all (under normal circumstances as defined by their docs), but it definitely looks plausible. If i get there I will raise the issue on the quartz side. (Thanks for the quick response.)

spring-projects-issues · 2016-09-25T13:58:02Z

Alexandru-Constantin Bledea commented

I've tinkered with vanilla quartz a bit and I can't get to a scenario where I would reschedule a job in the way that it's done with SchedulerFactoryBean.

Here's what i came up with

package com.apixandru.casestudy;

import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;
import org.quartz.impl.jdbcjobstore.JobStoreTX;

import java.util.Properties;

import static java.util.Collections.singleton;
import static org.quartz.JobBuilder.newJob;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;
import static org.quartz.impl.StdSchedulerFactory.PROP_JOB_STORE_CLASS;

public class QuartzTest {

    public static void main(String[] args) throws SchedulerException, InterruptedException {

        Properties props = new Properties();
        props.put("org.quartz.scheduler.instanceName", "QuartzTest Scheduler");
        props.put("org.quartz.scheduler.instanceId", "AUTO");

        props.put("org.quartz.jobStore.isClustered", "true");
        props.put("org.quartz.threadPool.threadCount", "2");

        props.put("org.quartz.dataSource.NAME.URL", "jdbc:oracle:thin:@localhost:49161:xe");
        props.put("org.quartz.dataSource.NAME.user", "system");
        props.put("org.quartz.dataSource.NAME.password", "oracle");
        props.put("org.quartz.dataSource.NAME.driver", "oracle.jdbc.OracleDriver");
        props.put("org.quartz.jobStore.dataSource", "NAME");

        props.put(PROP_JOB_STORE_CLASS, JobStoreTX.class.getName());

        Scheduler scheduler = new StdSchedulerFactory(props).getScheduler();

        scheduler.start();


        JobDetail job = newJob(HelloJob.class)
                .withIdentity("QuartzTest Job")
                .build();

        Trigger trigger = newTrigger()
                .withIdentity("QuartzTest Trigger")
                .startNow()
                .withSchedule(simpleSchedule()
                        .withIntervalInSeconds(20)
                        .repeatForever())
                .build();

        scheduler.scheduleJob(job, singleton(trigger), true);

    }
}

This code never throws that exception because it manages the whole trigger replacing internally and it does that by updating the existing trigger rather than deleting it and reinserting it.

The rescheduleJob that is used in spring seems to do all that managing on the spring side (to figure out that the trigger already exists and so on).

spring-projects-issues · 2016-09-25T19:10:38Z

Juergen Hoeller commented

Alright, we'll try to reimplement our logic there to leave the entire replacement step up to Quartz itself...

spring-projects-issues · 2016-10-03T07:15:08Z

Alexandru-Constantin Bledea commented

I think that the fact that the job has to be durable in the spring-quartz integration is probably exactly because of this issue.

Since the trigger is deleted in replaceTrigger, the job would have been without any trigger and it would have been deleted which meant that replace trigger would probably have failed.

I'm facing another issue currently because of this, the jobs are still in the database even though they are no longer in the code (i'm pretty sure it's because they are durable) .If i just change the job detail factory bean's name, the trigger is not currently being replaced and linked to the new job. It still remains on the old job even though a new job is created in the database.

spring-projects-issues · 2016-10-07T19:13:52Z

Alexandru-Constantin Bledea commented

Hi Juergen Hoeller,

I see that this was pushed back to 5.x.
I understand that it's not on the priority list but maybe some small changes that would allow us to subclass SchedulerFactoryBean and change the default behavior would let us work around the issue until the official fix.

I made some (very) minor changes that allows me to work around the issue here: #1198
Would it be possible to have this in 4.3.x?

Regards,
Alexandru

spring-projects-issues · 2016-10-16T12:26:54Z

Alexandru-Constantin Bledea commented

I believe that allowing SchedulerFactoryBean to be subclassed and giving the option to override the behavior in the subclass is a good approach because there are other things that I expected to happen that aren't currently happening and I cannot insert hooks for my logic anywhere in the SchedulerFactoryBean.

Here are a couple of tests of what I would expect to happen but aren't currently.
https://github.com/apixandru/case-study/tree/master/spring-boot-quartz-expectations

Also, this pull request doesn't fix it either (however, it does come close if you set the durability flag to false - which wasn't possible with the old logic, and you set remove unused triggers to true), I suppose that remove unused triggers should come before rescheduling them to remove the old link between the old job and the new one
#1200

spring-projects-issues · 2016-10-29T06:06:18Z

Alexandru-Constantin Bledea commented

Just found out something really interesting. The problem is not the method that we are calling. The replaceJob does actually do this atomically and inside a transaction. The thing is that spring doesn't honor the transaction apparently. I'm assuming that if it doesn't honor the transaction here, it probably doesn't honor it in other places as well which is not that great.

quartz-scheduler/quartz#57

spring-projects-issues · 2016-10-29T07:22:25Z

Alexandru-Constantin Bledea commented

Ok, so the reason why it is not respecting the transaction is that the spring provided job store's transaction management is container managed. The problem with that is that we do not provide a transaction so it's done non transactionally. (Actually, the whole code that registers everything is executed in the afterPropertiesSet method outside of a transaction.)

Just to check if this is correct, I manually inserted a transaction where i knew it fails and it no longer failed. (the trigger was no longer removed eagerly, but after the commit)

// Register Triggers.
if (this.triggers != null) {
  for (final Trigger trigger : this.triggers) {
    TransactionTemplate tmpl = new TransactionTemplate(txManager);
    tmpl.execute(new TransactionCallbackWithoutResult() {
      @Override
      protected void doInTransactionWithoutResult(TransactionStatus status) {
        try {
            addTriggerToScheduler(trigger);
        } catch (SchedulerException e) {
            e.printStackTrace();
        }
    }});
  }

spring-projects-issues · 2016-11-04T10:23:36Z

Juergen Hoeller commented

If such wrapping in a transaction helped, wouldn't it also work to set the transaction manager through SchedulerFactoryBean's setTransactionManager method? That would wrap the entire registerJobsAndTriggers procedure in one large transaction... Or do you specifically need an individual transaction per trigger here?

spring-projects-issues · 2016-11-23T07:15:21Z

Alexandru-Constantin Bledea commented

I can confirm that setting the TransactionManager solves the concurrency issues without any additional changes.

spring-projects-issues · 2019-01-12T02:24:33Z

Bulk closing outdated, unresolved issues. Please, reopen if still relevant.

spring-projects-issues added status: waiting-for-triage An issue we've not yet triaged or decided on type: enhancement A general enhancement in: core Issues in core modules (aop, beans, core, context, expression) and removed type: enhancement A general enhancement labels Jan 11, 2019

spring-projects-issues mentioned this issue Jan 11, 2019

Wait for Quartz jobs to finish before continuing shutdown of singleton beans [SPR-14806] #19372

Closed

rstoyanchev added status: bulk-closed An outdated, unresolved issue that's closed in bulk as part of a cleaning process and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Jan 11, 2019

spring-projects-issues closed this as completed Jan 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SchedulerFactoryBean's setOverwriteExistingJobs does not reliably work in a cluster [SPR-14745] #19311

SchedulerFactoryBean's setOverwriteExistingJobs does not reliably work in a cluster [SPR-14745] #19311

spring-projects-issues commented Sep 24, 2016 •

edited

spring-projects-issues commented Sep 24, 2016

spring-projects-issues commented Sep 24, 2016

spring-projects-issues commented Sep 25, 2016

spring-projects-issues commented Sep 25, 2016

spring-projects-issues commented Oct 3, 2016

spring-projects-issues commented Oct 7, 2016

spring-projects-issues commented Oct 16, 2016

spring-projects-issues commented Oct 29, 2016

spring-projects-issues commented Oct 29, 2016

spring-projects-issues commented Nov 4, 2016

spring-projects-issues commented Nov 23, 2016

spring-projects-issues commented Jan 12, 2019

SchedulerFactoryBean's setOverwriteExistingJobs does not reliably work in a cluster [SPR-14745] #19311

SchedulerFactoryBean's setOverwriteExistingJobs does not reliably work in a cluster [SPR-14745] #19311

Comments

spring-projects-issues commented Sep 24, 2016 • edited

spring-projects-issues commented Sep 24, 2016

spring-projects-issues commented Sep 24, 2016

spring-projects-issues commented Sep 25, 2016

spring-projects-issues commented Sep 25, 2016

spring-projects-issues commented Oct 3, 2016

spring-projects-issues commented Oct 7, 2016

spring-projects-issues commented Oct 16, 2016

spring-projects-issues commented Oct 29, 2016

spring-projects-issues commented Oct 29, 2016

spring-projects-issues commented Nov 4, 2016

spring-projects-issues commented Nov 23, 2016

spring-projects-issues commented Jan 12, 2019

spring-projects-issues commented Sep 24, 2016 •

edited