Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Existing failed job is restarted, even if new job contains different job parameters [BATCH-2711] #882

Closed
spring-issuemaster opened this issue Apr 17, 2018 · 19 comments

Comments

@spring-issuemaster
Copy link
Collaborator

@spring-issuemaster spring-issuemaster commented Apr 17, 2018

Tonie opened BATCH-2711 and commented

Created a Spring Batch job as described in the documentation 'Getting Started - Creating a Batch Service' (https://spring.io/guides/gs/batch-processing/) in order to show the issue. SQLServer is used as database for the meta-data tables as well as the business data table.

The job reads a .csv file with with 6 records and the ItemProcess throws an exception on the 4th record. The job then stops and the meta-data tables are updated correctly. Job parameters used for the job is: parm=0 fileName=error.csv

After this a new job is started with the same job name, but different parameters: parm=1 fileName=new.csv. What happens is that Spring Batch restart the failed job with the errors again. Even if the new job contains different parameters.

This worked perfectly as expected in version 3.0.8, but the issue occurred when upgraded to version 4.0.1.

Logs from 3.0.8 shows the following:

main] o.s.b.a.b.JobLauncherCommandLineRunner   : Running default command line with: [param=1, inputFile=sample-data-1.csv]
main] o.s.b.c.r.s.JobRepositoryFactoryBean     : No database type set, using meta data indicating: SQLSERVER
main] o.s.b.c.l.support.SimpleJobLauncher      : No TaskExecutor has been set, defaulting to synchronous executor.
main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importUserJob]] launched with the following parameters: [inputFile=sample-data-1.csv, run.id=1, param=1]

Logs from 4.0.1 shows the following:

main] o.s.b.c.r.s.JobRepositoryFactoryBean     : No database type set, using meta data indicating: SQLSERVER
main] o.s.b.c.l.support.SimpleJobLauncher      : No TaskExecutor has been set, defaulting to synchronous executor.
main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
main] o.s.j.e.a.AnnotationMBeanExporter        : Bean with name 'dataSource' has been autodetected for JMX exposure
main] o.s.j.e.a.AnnotationMBeanExporter        : Located MBean 'dataSource': registering with JMX server as MBean [com.zaxxer.hikari:name=dataSource,type=HikariDataSource]
main] hello.Application                        : Started Application in 2.157 seconds (JVM running for 2.592)
main] o.s.b.a.b.JobLauncherCommandLineRunner   : Running default command line with: [param=1, inputFile=sample-data-1.csv]
main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importUserJob]] launched with the following parameters: [{inputFile=sample-data-error.csv, param=0, run.id=1}]

Notice the difference in parameters used in SimpleJobLauncher, even if JobLauncherCommandLineRunner in both cases show the same correct parameters.


Affects: 4.0.1

Issue Links:

  • BATCH-2741 Job Parameter Overwrite Issue
    ("is duplicated by")

Referenced from: commits ebf156d, 5328fb2

Backported to: 4.0.2

4 votes, 8 watchers

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Apr 19, 2018

Tonie commented

I traced this back to Spring boot.
In version 2.0.0.M5 it worked as expected.
In version 2.0.0.M6 the issue started for the first time.

Version 2.0.0.M5 used the getNextJobParameters method in the JobLauncherCommandLineRunner class.
This method gets the previous job's job parameters and the add the current job's parameters to the map.
This means if the parameter values are the same, the current job's parameters are used.

Version 2.0.0.M6 changed JobLauncherCommandLineRunner to use the getNextJobParameters method in the JobParametersBuilder class.
This method always uses the previous job's job parameters.

Now, the question is: Which one of these are the correct design and how do you achieve the same result as it was in version 2.0.0.M5 in the later versions?

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Apr 19, 2018

Michael Minella commented

Can you please provide the command you're using to launch your job? The difference you're describing was just moving the code from one place to another. The code itself should be the same.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Apr 19, 2018

Tonie commented

I am using the following commands to start the job:
First job that causes the intended error:

java -jar myjar.jar parm=0 fileName=error.csv

Second job that restart the first job instead of starting a new job instance:

java -jar myjar.jar parm=1 fileName=new.csv

The code was moved from one place to another, but it also changed.

Code in version 2.0.0.M5:

JobLauncherCommandLineRunner execute method:

JobParameters nextParameters = getNextJobParameters(job, jobParameters)
JobExecution execution = this.jobLauncher.run(job, nextParameters);

JobLauncherCommandLineRunner getNextJobParameters method:

   else if (isStoppedOrFailed(previousExecution) && job.isRestartable()) {
	// Retry a failed or stopped execution
	parameters = previousExecution.getJobParameters();
	// Non-identifying additional parameters can be removed to a retry
	removeNonIdentifying(additionals);
   }
   else if (incrementer != null) {
	// New instance so increment the parameters if we can
	parameters = incrementer.getNext(previousExecution.getJobParameters());
   }
}
return merge(parameters, additionals);

JobLauncherCommandLineRunner merge method:

private JobParameters merge(JobParameters parameters,
   Map<String, JobParameter> additionals) {
   Map<String, JobParameter> merged = new HashMap<String, JobParameter>();
   merged.putAll(parameters.getParameters());
   merged.putAll(additionals);
   return new JobParameters(merged);
}

Code in version 2.0.0.M6:

JobLauncherCommandLineRunner execute method:

JobParameters nextParameters = new JobParametersBuilder(jobParameters,
this.jobExplorer).getNextJobParameters(job).toJobParameters();
JobExecution execution = this.jobLauncher.run(job, nextParameters);

JobParametersBuilder getNextJobParameters method:

	else if (isStoppedOrFailed(previousExecution) && job.isRestartable()) {
		// Retry a failed or stopped execution
		nextParameters = previousExecution.getJobParameters();
		// Non-identifying additional parameters can be removed to a retry
		removeNonIdentifying(this.parameterMap);
	}
	else if (incrementer != null) {
		// New instance so increment the parameters if we can
		nextParameters = incrementer.getNext(previousExecution.getJobParameters());
	}
}

this.parameterMap = addJobParameters(nextParameters).toJobParameters().getParameters();

JobParametersBuilder addJobParameters method:

public JobParametersBuilder addJobParameters(JobParameters jobParameters) {
	Assert.notNull(jobParameters, "jobParameters must not be null");
	this.parameterMap.putAll(jobParameters.getParameters());
	return this;
}

As you can see, the old code will use the new job parameters (as it is different from the previous job), while the new code will use the previous job's job parameters. I debugged this to confirm it is indeed happening like this.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Apr 21, 2018

Tonie commented

I changed the addJobParameters method of the JobParametersBuilder class in the latest version of Spring Boot 2.0.1 with Spring batch core 4.0.1 as follow:
 

public JobParametersBuilder addJobParameters(JobParameters jobParameters) { 
   Assert.notNull(jobParameters, "jobParameters must not be null");

   Map<String, JobParameter> currentParameterMap = new LinkedHashMap<>(); 
   currentParameterMap.putAll(this.parameterMap); 
   this.parameterMap.clear(); 
   this.parameterMap.putAll(jobParameters.getParameters()); 
   this.parameterMap.putAll(currentParameterMap);

   return this;
}

 
This reflects the behaviour in version 2.0.0.M5. I tested this and it solves the problem.

I first ran a job as follow:

java -jar myjar.jar parm=0 fileName=error.csv

This resulted in an error (as intended) and the job in the Spring meta-data tables showed a status of FAILED.

I then ran a second job as follow:

java -jar myjar.jar parm=1 fileName=new.csv

This started the second job with a new job instance and it completed successfully.

I then fixed the data file with the error for the first job and ran it again with:

java -jar myjar.jar parm=0 fileName=error.csv

This restarted the first job and it completed successfully.

With this change the behaviour is the same as in version 2.0.0.M5.
Without this change, I can never run a second job until I fixed the first job and it completed with a successful status.

Please advise with next steps on this.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Apr 23, 2018

Tonie commented

While "playing" with different scenarios in order to understand what is happening with this issue, I also found the following 2 side effects of this issue:

  • RunIdIncrementer is not working, even if you do specify it on the job definition.
  • The Job isRestartable() function is not working if set to true.

I have decided not to use the default Spring Boot logic, but to write my own custom job launcher and job parameters processing logic for now, but are willing to help to resolve this issue if more information is required.   

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Apr 26, 2018

Michael Minella commented

I'm looking at this and while the original move was just to simplify the boot code and provide this for any user, I can't help but feel that this is wrong anyways.  This code assumes the run after a failed JobExecution is a restart.  Period.  That may not be the case.  Also, I'm not clear that the JobExecution would end up the way you want.  The code assumes a consistent set of identifying parameters are always passed in.  The CommandLineJobRunner has an explicit flag for a restart scenario.  While I'm not proposing that, I do think that we should dig deeper to determine if a restart makes sense.  

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Apr 27, 2018

Tonie commented

I totally agree with your overall analysis and statement: 

I can't help but feel that this is wrong anyways.

If I look at the documentation for Spring batch, I see the following playing a role during the job start logic:

  • The job parameters
  • The restart indicator
  • The incrementer

I feel that Spring Boot should implement the use case as follow:

If the new job contains exactly the same job parameters as a previous failed job:

  • Re-use the previously failed job instance
  • Treat the new job as a restart

If the new job contains different job parameters from a previous failed job:

  • Treat it as a new job instance
  • It is not a restart

If the new job contains the same job parameters as a previous failed job, but also specifies an incrementer:

  • Treat it as a new job instance
  • It is not a restart

If the new job contains the same job parameters as a previous failed job, but also contains an isRestartable indicator set to false:

  • Treat it as a new job instance
  • It is not a restart

 Does this make sense?

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Aug 31, 2018

Minhyeok Jeong commented

Michael Minella,

The root cause of this issue is in change of merge method.

In previous version, 3.0.9.RELEASE, existing parameters and new parameters are merged like below:

  • existing: foo=1
  • new: foo=20, bar=1
  • merged: foo=20, bar=1

In current version, 4.0.1.RELEASE:

  • existing: foo=1
  • new: foo=20, bar=1
  • merged: foo=1, bar=1

It happens when the previous job is either failed or stopped.

 

But was this change intended? Isn't it misbehavior?

I think the addition of a new public method-addJobParameters(JobParameters)- to the builder was a good thing. But it needs more test case to verify if the builder still keeps previous behavior without any side effect.

I made a PR for this issue: #630

Please review the new test case in the PR:

@Test
public void testAddingExistingJobParametersIfAbsent() {
   JobParameters params1 = new JobParametersBuilder()
         .addString("foo", "bar")
         .addString("bar", "baz")
         .toJobParameters();

   JobParameters params2 = new JobParametersBuilder()
         .addString("foo", "baz")
         .toJobParameters();

   JobParameters finalParams = new JobParametersBuilder()
         .addString("baz", "quix")
         .addJobParametersIfAbsent(params1)
         .addJobParametersIfAbsent(params2)
         .toJobParameters();

   assertEquals("bar", finalParams.getString("foo"));
   assertEquals("baz", finalParams.getString("bar"));
   assertEquals("quix", finalParams.getString("baz"));
} 

 

I hope this issue will be resolved as soon as possible.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 1, 2018

Aritz Bastida commented

Hello! Does the PR really solve the issue? In my opinion, it just hides it a bit deeper.

Note that, when the last execution fails, the non-identifying parameters are removed from the input map (probably to relaunch the previous execution as-it-was). Now the resulting map will contain a mix of externally provided parameters (identifying ones) and previous execution parameters (non-identifying ones).
 

			else if (isStoppedOrFailed(previousExecution) && job.isRestartable()) {
				// Retry a failed or stopped execution
				nextParameters = previousExecution.getJobParameters();
				// Non-identifying additional parameters can be removed to a retry
				removeNonIdentifying(this.parameterMap);
			}
			else if (incrementer != null) {
				// New instance so increment the parameters if we can
				nextParameters = incrementer.getNext(previousExecution.getJobParameters());
			}
		}

		this.parameterMap = addJobParametersIfAbsent(nextParameters)
								.toJobParameters()
								.getParameters();

On the other hand, there might as well be a design problem in JobLauncherCommandLineRunner (spring-boot-autoconfigure), which systematically obtains the next parameters, no matter what.

As reported in this issue, these two classes (JobParameterBuilder and JobLauncherCommandLineRunner) in tandem make it impossible to launch a new instance without first completing the previous job execution. For the same reasons, a previous failed execution can't be restarted either.

protected void execute(Job job, JobParameters jobParameters) {
     JobParameters nextParameters = new JobParametersBuilder(jobParameters,
               this.jobExplorer).getNextJobParameters(job).toJobParameters();
     JobExecution execution = this.jobLauncher.run(job, nextParameters);
}

Compare this to the old semantics from Spring Batch 3 + Spring Batch Admin. Back then, the externally provided job parameters were used to find a matching job instance. If it existed and it was restartable, the call to getNext() was omitted. Otherwise, the incrementer was called, but using the externally provided job parameters as input.

In Spring Batch 4, an empty JobParameters instance will be passed to the first execution, and the previous execution's parameters otherwise. The job parameters specified at launch-time are not propagated as getNextParameters() arguments.

Somehow, the semantics have changed: For better? Is it intentional?

public JobExecution launch(final String jobName, JobParameters jobParameters) {

             final Job job = this.jobLocator.getJob(jobName);

                final JobExecution lastJobExecution = this.jobRepository.getLastJobExecution(jobName, jobParameters);
                boolean restart = false;
                if (lastJobExecution != null) {
                    final BatchStatus status = lastJobExecution.getStatus();
                    if (status.isUnsuccessful() && status != BatchStatus.ABANDONED) {
                        restart = true;
        
                    }
                }
                if (job.getJobParametersIncrementer() != null && !restart) {
                    jobParameters = job.getJobParametersIncrementer().getNext(jobParameters);        
                }
		JobExecution jobExecution = jobLauncher.run(job, jobParameters);

To summarize, I'd say that the job launcher should allow the following use cases:

  • New job instance. The incrementer would be called in this case, but should it be based on the provided parameters (SB Admin) or the last instance (SB4)? I'd vote for the latter.
  • Restart any failed previous execution. In this case, the job would be launched with the provided parameters (including the non-identifying ones).
Use case SB3 SB4 SB4 + PR
New instance OK   KO   KO if the last execution failed and non-identifying parameters exist; OK otherwise)
Restart last failed execution OK OK KO (non-identifying parameters are lost)
Restart previous failed executions OK KO KO (the incrementer shold not apply in this case)
@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 1, 2018

Aritz Bastida commented

In order to satisfy the use cases I mentioned above, I've revised the code for both JobLauncherCommandLineRunner and JobLauncherCommandLineRunner, and implement my own "custom" versions.

Now, you can do any of the following:

  • Restart a new execution of a previously failed or stopped job instance. The new execution will always have the same identifying parameters, but different non-identifying parameters can be used per execution.
  • Launch a new job instance

The call to getNextParameters() will only be made in the latter case, which is the only situation where it makes sense. The incrementer is based on the latest job execution to calculate the next parameters (similar to a database sequence). Note that the parameters modified in this method override the provided ones (typically via command line arguments). The addJobParametersIfAbsent() method is no longer necessary, because this was only necessary for a restart.

Any attempt to launch a previously completed job instance or a non-restartable one is detected by JobLauncher (with JobInstanceAlreadyCompleteException and JobRestartException, respectively), so it need not be checked in JobLauncherCommandLineRunner or JobParametersBuilder.

All in all, I think that the separation of concerns are clearer with this design, although, if accepted, the change would impact at least three projects (spring-batch, spring-boot and spring-cloud-task). Feedback is welcome!

By the way, I'm assuming that JobLauncherCommandLineRunner is the preferred option instead of the older CommandLineJobRunner. Am I right?

 

 JobLauncherCommandLineRunner:

protected void execute(Job job, JobParameters jobParameters)
  throws JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException, JobParametersNotFoundException {
		
  JobExecution matchingExecution = jobRepository.getLastJobExecution(job.getName(), jobParameters);
  if (matchingExecution == null) {  //New job instance, so get next parameters
    jobParameters = new JobParametersBuilder(jobParameters, this.jobExplorer).getNextJobParameters(job).toJobParameters();			
  }		

  JobExecution execution = this.jobLauncher.run(job, jobParameters);
  if (this.publisher != null) {
    this.publisher.publishEvent(new JobExecutionEvent(execution));
  }
}	

JobParametersBuilder:

public CustomJobParametersBuilder getNextJobParameters(Job job) {
		
  if (this.jobExplorer == null) {
    throw new IllegalStateException("A JobExplorer is required to get next job parameters");
  }

  JobParametersIncrementer incrementer = job.getJobParametersIncrementer();		
  if (incrementer == null) {
    return this;
  }
		
  JobExecution prevExec = getPreviousJobExecution(job);		
  JobParameters prevParams = (prevExec != null) ? prevExec.getJobParameters() : new JobParameters();
  JobParameters nextParams = incrementer.getNext(prevParams);
		
  this.parameterMap = addJobParameters(nextParams).toJobParameters().getParameters();		
  return this;
}
	
	
private JobExecution getPreviousJobExecution(Job job) {
		
  List<JobInstance> lastInstances = this.jobExplorer.getJobInstances(job.getName(), 0, 1);		
  if (lastInstances.isEmpty()) {
    return null;
  }
		
  List<JobExecution> previousExecutions = this.jobExplorer.getJobExecutions(lastInstances.get(0));
  if (previousExecutions.isEmpty()) {
    return null;
  }
		
  return previousExecutions.get(0); 		
}

 

 

 

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 4, 2018

Minhyeok Jeong commented

Aritz Bastida, Thanks for your long explanation. I've been watching your lots of editing.

As you mentioned, this PR does not care about non-identifying parameters. But I agree that lost of non-identifying parameters is another problem.

Although you suggest some customized codes for JobLauncherCommandLineRunner and JobParametersBuilder in different point of view, I don't think it is such a complicated issue.

If the problem is the lost of non-identifying parameter then we can simply fix it with just removing removeNonIdentifying(this.parameterMap); line in JobParametersBuilder#getNextJobParameters.

public JobParametersBuilder getNextJobParameters(Job job) {
    if (lastInstances.isEmpty()) {
        ...
    }
    else {
        ...
        else if (isStoppedOrFailed(previousExecution) && job.isRestartable()) {
            // Retry a failed or stopped execution
            nextParameters = previousExecution.getJobParameters();
            ////////// DELETE THE LINE BELOW
            // Non-identifying additional parameters can be removed to a retry
            removeNonIdentifying(this.parameterMap);
            //////////
        }
    }
    ...
}

Then you can restart a failed job with new non-identifying parameters. But I'm not sure whether this is intentional behavior, since, at least, Spring team intentionally added this method (removeNonIdentifying()).

Documentation about non-identifying parameters are not enough in official site, so I want them to add more explanations about non-identifying parameters and its behavior, especially when the batch restarts. 

Another thing that I have different thought from you is that the modification of JobLauncherCommandLineRunner which is a part of Spring Boot. One of the goals of Spring Boot is to help run Spring Batch conveniently, but not get involved in its behavior. If JobLauncherCommandLineRunner determines whether to retrieve previous JobParameters or not, Spring Boot is concerned too much about the behavior(or design) of Spring Batch. Therefore I think the role of JobLauncherCommandLineRunner is enough as it stands now.

Anyway, Mahmoud Ben Hassine has read your comments and he will take a look at it. Hope the issue gets resolved soon.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 4, 2018

Aritz Bastida commented

Hi Minhyeok Jeong! Thanks for your feedback. But, I'm sorry, I still think that the PR tries to "hide" the design problem with getNextJobParameters(), and the fix you are suggesting is an indication why.

The incrementer should (only) be called to calculate the "next" parameters in new job instances, as stated in the Spring Batch documentation (JobParameterIncrementer). Now, let's take a job with the same incrementer (i.e. auto-generated "run.id" parameter) as in the SB example, and suppose the following two situations:

  • The last execution failed and you want to run a new instance: It won't work, because the job incrementer will not be called, and the previous job parameters will be reused.
  • The last execution completed and you want to restart a previous failed instance. In this case, it will work, yes, but in a quite convolute way. We will call the incrementer to calculate a "next" instance but then try to revert it giving precedence to externally provided parameters. This is like calling SEQUENCE.nextval for a database update.

Actually, there could be quite another side effect. In the case "run.id" was provided as command line parameters, these would take precedence from the ones calculated by the incrementer, something we probably don't want.

I take the point that Spring Boot should not be to concerned too much about Spring Batch internals, but, well, a bean called JobLauncherCommandLineRunner must at least know how to launch a job, and, as it currently stands, that includes calculating next parameters. The problem is it always calls getNextJobParameters() no matter whether it's a restart or not, even if only needed for new instances (as stated in the SB ref guide). It is also quite strange that the next parameters sometimes are previous parameters (i.e. when the last execution failed).

To conclude, the code to find out whether it's a restart or not should be outside getNextJobParameters(), either in JobLauncherCommandLineRunner or, if we want to keep it internal to Spring Batch, within the JobLauncher itself.  Something like:

JobExecution execution = this.jobLauncher.run(job, jobParameters, true);

...the third parameter being a boolean called "useIncrementer".

Internally, that would use the job repository to find out whether it's a restart or not, and call JobParametersBuilder.getNextJobParameters() only when necessary. In fact, SimpleJobLauncher already does the first thing, so the change would be simple enough.

If you'd like to, I can contribute with a PR to prove my point. :)

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 10, 2018

Minhyeok Jeong commented

Aritz Bastida, It look like we have a different point of view on this issue.
The name of getNextJobParameters() is really similar to `JobParametersIncrementer#getNext(JobParameters). getNext() does not always return the next value, but it also returns the initial value when previous parameter does not exist. The word 'next' does not mean 'just next', but it also implies do some other things in sort of exceptional case. In a similar way getNextJobParameters() does not return just incremented values, but it returns the parameters for current trial. It could be either a really (incremented) next parameter or the one from previous failed job. In this sense, its Javadoc says \'Initializes the JobParameters based on the state of the Job.'`

You may still argue about its name-next-, but I think it is reasonble, as you stated JobParametersIncrementer, getNextJobParameters() has a similar name. And you know the purpose of getNext(), I believe you can also accept getNextJobParameters(). (But who knows? If Spring Team would change its name or design.)

Anyway, I respect your opinion and it is good to propose your solution in this place. However, since we think in a different way, it would be better to create another PR and leave the link in this jira ticket. In your explanation, you would have to PR on both spring-batch and spring-boot project.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 10, 2018

Aritz Bastida commented

Hi Minhyeok Jeong. That's right, it seems we have a different opinion about what "next" means in Spring Batch. Historically, when using launchers like CommandLineJobRunner or JobService, the incrementer's getNext() method was only invoked for new instances, never for a restart. And there was a clear distinction: for example, CommandLineJobRunner has a explicit command line argument "-next". It seems that in Spring Boot this meaning has somehow changed.

Anyway, we are deviating from the main subject. As I pointed out in my last email, there is an additional problem with the fix you are suggesting for getNextJobParameters(), namely: If the last execution failed and you want to run a new instance, then the incrementer will never be called. Conversely, if you want to restart a previously failed execution but the last execution completed, the incrementer will be called (even if not necessary).

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 10, 2018

Aritz Bastida commented

BTW, I will be glad to share a PR, when I come back from holidays in 3 weeks ;)

As discussed, this issue can be solved in many ways, but somehow the first think upon a launch request must be to distinguish between restart and new instance, and for that we should use some DAO method, such as jobRepository.getLastJobExecution().

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 10, 2018

Mahmoud Ben Hassine commented

Thank you all for this interesting analysis and constructive discussion!

I'm currently working on this issue (I wrote a test suite here) and testing both PRs currently open for this problem (#625 and #630).

I will share more details soon about how we would approach this issue.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 11, 2018

Mahmoud Ben Hassine commented

There are a lot of details discussed here so I'll try to summarize things. But first, let me give some important facts that might influence the solution to this issue:

  • Fact 1: The JobParametersIncrementer is used only by the CommandLineJobRunner (when using "-next" option) and the JobOperator#startNextInstance. Both usages are explicit about starting the next instance and not re-starting a previous failed execution. The incrementer is not used by the JobLauncher, so if the user wants to start the next instance of a job with the JobLauncher, he has to manually increment the parameters and pass them to the job launcher or let someone else do it (like it was the case for example with Spring Boot, but this is not the case anymore since the code moved to Spring Batch).
  • Fact 2: The logic of getting the parameters for the next instance is duplicated in 3 places: SimpleJobOperator, JobParametersBuilder and CommandLineJobRunner. This code is a good candidate to be factored out in a separate class (as mentioned here and here).
  • Fact 3: Spring Boot systematically calls getNextJobParameters regardless if it is a restart or not. And the code that has been moved from Boot to Batch may call the incrementer while it shouldn't.

That said, there are two things that need to be addressed:

  • A bug: Running a job with a different set of identifying parameters should not reuse the parameters of the last failed execution
  • A design improvement: Who is responsible of incrementing the job parameters and where to put this logic (in the JobParametersBuilder or JobLauncher or somewhere else)?

In regards to the PRs:

  1. PR #630

The method addJobParametersIfAbsent is used in getNextJobParameters so it is supposed to be used in a "start next instance scenario" (equivalent to CommandLineJobRunner with "-next"). This method is not correct because it says in the Javadoc: "If the map previously contained a mapping for the key, the new value is ignored." while the new value should override the existing one (See Javadoc of CommandLineJobRunner: "If the -next option is used the parameters on the command line (if any) are appended to those retrieved from the incrementer, overriding any with the same key."

  1. PR #625

This PR basically reverts this commit and fixes the issue, but it changes an assertion on an existing test on which I don't agree. I will continue the review of this PR on github.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Oct 24, 2018

Mahmoud Ben Hassine commented

After further investigation, this issue should be fixed in both Spring Batch and Spring Boot. The JobParametersBuilder#getNextJobParameters method (initially moved from boot) was dealing with both restartability and getting the parameters of the next instance. This made sense in boot but now that it moved to batch, it became inconsistent with the behaviour of CommandLineJobRunner and JobOperator. I explained in more details why the fix should be done on both sides here and opened a PR for each project.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Nov 6, 2018

Aritz Bastida commented

Hi Mahmoud Ben Hassine!! Thanks for considering all the feedback. Glad to know the issue has already been resolved. 

By the way, although not mentioned in your last comment, I see that this change affected also TaskJobLauncherCommandLineRunner in spring-cloud-task project, just for the record. :)

 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.