
Recover from crash [BATCH-2505] #1099

Closed
spring-issuemaster opened this issue May 2, 2016 · 1 comment

Comments


spring-issuemaster commented May 2, 2016

Alexander Hagenhoff opened BATCH-2505 and commented

Regarding the documentation http://docs.spring.io/spring-batch/reference/htmlsingle/#d5e1320

"If the process died ("kill -9" or server failure) the job is, of course, not running, but the JobRepository has no way of knowing because no-one told it before the process died."

I am trying to find and restart the stale job executions using:

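Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(jobName);
...
jobExecution.setStatus(BatchStatus.FAILED);  // mark the stale execution as failed
jobExecution.setEndTime(new Date());
jobRepository.update(jobExecution);          // persist the new status
jobOperator.restart(jobExecution.getId());   // start a new execution of the same job instance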
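The steps above can be sketched as a self-contained model. This is a minimal illustration, not Spring Batch code: `Execution`, `InMemoryRepository` and `CrashRecovery` are hypothetical in-memory stand-ins for `JobExecution`, `JobRepository`/`JobExplorer` and `JobOperator`, and "restart" is modeled simply as creating a new execution for the same job instance.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified statuses (Spring Batch's BatchStatus has more values).
enum Status { STARTED, FAILED, COMPLETED }

// Hypothetical stand-in for a JobExecution row.
class Execution {
    final long id;
    final long instanceId;
    final String jobName;
    Status status;
    Instant endTime;

    Execution(long id, long instanceId, String jobName, Status status) {
        this.id = id;
        this.instanceId = instanceId;
        this.jobName = jobName;
        this.status = status;
    }
}

// Hypothetical in-memory stand-in for the job repository / explorer.
class InMemoryRepository {
    private final List<Execution> executions = new ArrayList<>();
    private long nextId = 1;

    Execution create(long instanceId, String jobName) {
        Execution e = new Execution(nextId++, instanceId, jobName, Status.STARTED);
        executions.add(e);
        return e;
    }

    // Same idea as findRunningJobExecutions: everything still marked STARTED.
    List<Execution> findRunning(String jobName) {
        return executions.stream()
                .filter(e -> e.jobName.equals(jobName) && e.status == Status.STARTED)
                .collect(Collectors.toList());
    }
}

class CrashRecovery {
    // Marks each stale STARTED execution as FAILED with an end time
    // (objects are in memory, so mutating them is the "update" step),
    // then starts a new execution of the same job instance.
    static List<Execution> recover(InMemoryRepository repo, String jobName) {
        List<Execution> restarted = new ArrayList<>();
        for (Execution stale : repo.findRunning(jobName)) {
            stale.status = Status.FAILED;
            stale.endTime = Instant.now();
            restarted.add(repo.create(stale.instanceId, jobName));
        }
        return restarted;
    }
}
```

Note that `findRunning` returns a fresh list, so creating new executions inside the loop does not disturb the iteration.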

But this is very inconvenient:
1) I have to do this before any other (new) jobs can be started.
2) I have to handle multiple running server instances, so findRunningJobExecutions will not do the trick.

Other questions on this topic can be found at https://jira.spring.io/browse/BATCH-2433?jql=project%20%3D%20BATCH%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC and in "Spring Batch after JVM crash".

I would love to see a way to register a "start up clean jobs listener". Even that would not solve the problem in a multi-server environment, because Spring Batch cannot tell whether a JobExecution marked STARTED is still running on another instance.
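One possible shape for such a startup cleanup, sketched under a loud assumption: each execution records the node that launched it (a hypothetical `ownerNode` field that Spring Batch does not provide). A node that has just booted cannot be running anything yet, so it can safely fail its own STARTED executions while leaving other nodes' executions alone. `OwnedExecution` and `StartupCleanup` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified statuses.
enum RunState { STARTED, FAILED, COMPLETED }

// Hypothetical execution record that also stores the launching node.
// Spring Batch has no such owner field; it is an assumption of this sketch.
class OwnedExecution {
    final long id;
    final String ownerNode;
    RunState state;

    OwnedExecution(long id, String ownerNode, RunState state) {
        this.id = id;
        this.ownerNode = ownerNode;
        this.state = state;
    }
}

class StartupCleanup {
    // Run once when a node boots, before it launches any new job.
    // Every STARTED execution owned by this node must be a leftover from a
    // crash; executions owned by other nodes are untouched because they may
    // still be running. Returns how many executions were marked FAILED.
    static int failOrphans(List<OwnedExecution> all, String thisNode) {
        int cleaned = 0;
        for (OwnedExecution e : all) {
            if (e.state == RunState.STARTED && e.ownerNode.equals(thisNode)) {
                e.state = RunState.FAILED;
                cleaned++;
            }
        }
        return cleaned;
    }
}
```

This relies on stable node ids: if a crashed node never comes back under the same id, its executions stay STARTED and still need the manual handling discussed in this issue.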


Affects: 3.0.7

Reference URL: http://docs.spring.io/spring-batch/reference/htmlsingle/#d5e1320


spring-issuemaster commented Oct 18, 2019

Mahmoud Ben Hassine commented

  1. I have to do this before other (new) jobs could be started

Yes. As mentioned in the section of the docs you linked, "it's a business decision and there is no way to automate it", so you need to mark the job execution as FAILED or ABANDONED manually.
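A small model of that manual decision. It assumes the rule described in the Spring Batch reference docs that a FAILED execution can be restarted (if the job is restartable) while an ABANDONED one cannot. `Outcome` and `ManualResolution` are hypothetical names, and the boolean input stands in for the business decision that cannot be automated.

```java
// Hypothetical: the two statuses an operator can choose for a stale execution.
enum Outcome { FAILED, ABANDONED }

class ManualResolution {
    // The operator's business decision: resume the work later (FAILED)
    // or give up on it (ABANDONED).
    static Outcome resolve(boolean resumeWork) {
        return resumeWork ? Outcome.FAILED : Outcome.ABANDONED;
    }

    // Models the documented rule: FAILED may be restarted, ABANDONED may not.
    static void restart(Outcome last) {
        if (last == Outcome.ABANDONED) {
            throw new IllegalStateException("abandoned executions cannot be restarted");
        }
        // A real implementation would launch a new execution of the same
        // job instance here; this sketch only models the decision.
    }
}
```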

  2. I have to handle multiple instances of running servers so findRunningJobExecutions will not do the trick

The job repository acts as a central guard in a clustered environment, preventing the same job instance from being launched on different servers. So JobExplorer#findRunningJobExecutions should work fine in such an environment.
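A simplified model of that guard, assuming all servers share one repository (in Spring Batch, the shared database; here a `synchronized` in-memory map, with `SharedRepository` a hypothetical name). A launch is rejected while the instance's latest execution is STARTED, no matter which server asks. The same rule explains the original problem: a STARTED record left behind by a crash blocks relaunch until someone marks it FAILED.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job repository shared by every server.
// The synchronized methods play the role of the shared database.
class SharedRepository {
    enum State { STARTED, FAILED, COMPLETED }

    // Latest execution state per job instance id.
    private final Map<Long, State> latest = new HashMap<>();

    // Called by any server that wants to launch (or restart) an instance.
    synchronized boolean tryLaunch(long instanceId) {
        State current = latest.get(instanceId);
        if (current == State.STARTED) {
            return false; // running elsewhere, or left STARTED by a crash
        }
        if (current == State.COMPLETED) {
            return false; // instance already finished successfully
        }
        latest.put(instanceId, State.STARTED);
        return true;
    }

    synchronized void finish(long instanceId, State outcome) {
        latest.put(instanceId, outcome);
    }
}
```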
