Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JdbcJobInstanceDao.getJobInstances() does not optimize at the database level [BATCH-1784] #1805

Closed
spring-projects-issues opened this issue Aug 22, 2011 · 4 comments

Comments

@spring-projects-issues
Copy link
Collaborator

@spring-projects-issues spring-projects-issues commented Aug 22, 2011

Clement Pang opened BATCH-1784 and commented

Our deployment of Spring Batch is seeing monotonically increasing row reads and we were able to trace the issue down to JdbcJobInstanceDao.getJobInstances() loading all job instances when the caller is requesting for the last one. Essentially the class is doing the (start, count) operations in Java whereas it would have been much more efficient to do the filtering at the JDBC-level (by passing that information to the database driver). We have jobs that have thousands of instances and just to run the next execution, the system would query for all past instances of it just to grab the very last one in order to increment the parameters.


Affects: 2.1.8

Issue Links:

  • BATCH-2389 Improve select queries in JdbcJobInstanceDao by introducing limit clause
    ("is duplicated by")

Referenced from: pull request #719, and commits 7c1fbed

Backported to: 4.2.0.M3

5 votes, 3 watchers

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Aug 22, 2011

Clement Pang commented

The offending query in query logs:

"SELECT JOB_INSTANCE_ID, JOB_NAME from BATCH_JOB_INSTANCE where JOB_NAME = '<job_name>' order by JOB_INSTANCE_ID desc"

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Oct 17, 2011

Dave Syer commented

Thanks for the analysis. It might be time to to some optimization here. Feel free to send a pull request.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Jun 2, 2012

Shaun McBride commented

We're having a similar with job executions with large numbers of step executions created by partition handlers (things grind to a halt after about 2000 steps). We've gotten around this a number of ways: restarting a new instance after a step limit is reached, handling a single batch (typically a file) per job instance, or just processing multiple batches in a single step. The first approach is a bit of a hack and still causes a degredation in DB performance, the second approach means we need to purge BATCH_JOB_INSTANCE more than we'd like, and the third approach results in pretty ugly Java code.

@spring-projects-issues
Copy link
Collaborator Author

@spring-projects-issues spring-projects-issues commented Jun 20, 2019

Mahmoud Ben Hassine commented

Hi Clement Pang

Can you please provide more details (which DB server you use, how many instances you have, how do you start the next job instance) ? Can you confirm the issue is still happening with v4.x?

We have jobs that have thousands of instances and just to run the next execution, the system would query for all past instances of it just to grab the very last one in order to increment the parameters.

What is the concern exactly? Is it the query that starts to become slow after a given number of instances or is it the memory consumption that bothers you? I tried to reproduce the issue (with h2 and mysql, see this repo for detailed steps) with 100_000 instances and I don't see any performance problem in regards to the response time when starting the next instance (with both JobOperator#startNextInstance and CommandLineJobRunner).

FTR, there is an ongoing effort to optimize the aforementioned query (See here) in order to do the filtering in the database and grab only the last instance, but I want to make sure to address the right problem before applying any change.

Kind regards,
Mahmoud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants