JSR-352 Job Operator - Can Neither Abandon nor Stop job post crash [BATCH-2433] #1169

Open
spring-projects-issues opened this issue Sep 10, 2015 · 1 comment

@spring-projects-issues spring-projects-issues commented Sep 10, 2015

Brian Rogers opened BATCH-2433 and commented

The logic for abandon and stop does not cover all failure cases. My guess is that this is because stop checks the JSR batch status, while abandon relies on Spring's JobExecution.isRunning method.

When calling abandon I receive:

javax.batch.operations.JobExecutionIsRunningException: Unable to abandon a job that is currently running

When calling stop on the same job I get the seemingly contrary message:

javax.batch.operations.JobExecutionNotRunningException: JobExecution must be running so that it can be stopped: JobExecution: id=448, version=2, startTime=2015-09-09 11:26:13.543, endTime=null, lastUpdated=2015-09-09 14:26:42.003, status=STOPPING, exitStatus=exitCode=UNKNOWN;exitDescription=; <redacted>
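
The two messages stop looking contradictory if the two operations test different things. Below is a minimal, self-contained sketch of that situation. It does not use Spring Batch. The `Status` enum, the `Execution` class, and both precondition methods are hypothetical stand-ins. It assumes that abandon treats an execution as running whenever no end time was recorded, and that stop accepts only STARTING or STARTED, which is consistent with the guess and the log line above but is not confirmed here.

```java
import java.util.Date;

public class StuckExecutionDemo {
    // Hypothetical stand-in for the batch status values mentioned in the report.
    enum Status { STARTING, STARTED, STOPPING, STOPPED, ABANDONED }

    // Hypothetical stand-in for a persisted job execution record.
    static final class Execution {
        final Status status;
        final Date endTime; // null if the JVM died before an end time was written

        Execution(Status status, Date endTime) {
            this.status = status;
            this.endTime = endTime;
        }
    }

    // Assumed abandon precondition: "running" means no end time was recorded.
    static boolean abandonAllowed(Execution e) {
        return e.endTime != null;
    }

    // Assumed stop precondition: only STARTING or STARTED counts as running.
    static boolean stopAllowed(Execution e) {
        return e.status == Status.STARTING || e.status == Status.STARTED;
    }

    public static void main(String[] args) {
        // Mirrors the reported record: status=STOPPING, endTime=null.
        Execution stuck = new Execution(Status.STOPPING, null);
        System.out.println("abandon allowed: " + abandonAllowed(stuck)); // false
        System.out.println("stop allowed: " + stopAllowed(stuck));       // false
    }
}
```

Under these assumptions, a record left in STOPPING with no end time fails both checks, so neither operation can move it to a terminal state.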

I'm having a difficult time reproducing the "STOPPING" state, though I imagine the JVM was shut down while my batch job, which has a long-running partitioned chunk step, was terminating. I also have some other jobs in the "STARTED" state that are clearly not running, as the JVM has been recycled multiple times since.

Basically there needs to be a reliable way of cleaning up these post-crash jobs through the JSR JobOperator interface. It may be as simple as updating the checks in STOP and ABANDON but I don't have a full appreciation of what's going on behind the scenes in all possible scenarios.

The other thing to note is that JobOperator.getRunningExecutions returns these "STOPPING" instances. That is probably not incorrect behavior, just another data point.


Affects: 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5

Reference URL: https://gist.github.com/xerofun/ac8c19ef814253efedf5#file-gistfile1-java

@spring-projects-issues spring-projects-issues commented Sep 18, 2015

Brian Rogers commented

Here's a first cut at a fix. I'm not entirely sure what implications using the internal jobRegistry has for the overall architecture, but it was the first strategy that came to mind after surveying the code. I would love to get some feedback on this.

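// Refuse to abandon only when the registry still knows this execution and it reports itself as running.
if (jobRegistry.exists(jobExecutionId) && jobExecution.isRunning()) {
    throw new JobExecutionIsRunningException("Unable to abandon a job that is currently running");
}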

https://gist.github.com/xerofun/5f2c4d89aedaddceeb8a#file-jsrjoboperator-java
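
To make the effect of that condition concrete, here is a small standalone model. It is a sketch only: `AbandonGuardDemo`, `register`, and `abandonRejected` are hypothetical names, and the in-memory `Set` merely stands in for a registry that is assumed to hold only executions started by the current JVM and to be empty after a restart.

```java
import java.util.HashSet;
import java.util.Set;

public class AbandonGuardDemo {
    // Hypothetical stand-in for a registry of executions live in this JVM.
    private final Set<Long> liveExecutionIds = new HashSet<>();

    void register(long executionId) {
        liveExecutionIds.add(executionId);
    }

    // Mirrors the proposed condition: reject only if the execution is known
    // to this JVM AND its persisted record still looks like it is running.
    boolean abandonRejected(long executionId, boolean looksRunning) {
        return liveExecutionIds.contains(executionId) && looksRunning;
    }

    public static void main(String[] args) {
        AbandonGuardDemo guard = new AbandonGuardDemo();
        guard.register(7L); // an execution actually started by this JVM

        // Live execution: still protected from abandon.
        System.out.println(guard.abandonRejected(7L, true));   // true
        // Orphan left behind by a crashed JVM: not registered, so abandon proceeds.
        System.out.println(guard.abandonRejected(448L, true)); // false
    }
}
```

One caveat under these assumptions: if several JVMs share one job repository, an execution that is genuinely running in another JVM would also be missing from the local registry, so this check alone would let it be abandoned.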
