Optimized JdbcPagingItemReader configuration for process indicator pattern [BATCH-2317] #1284

Open
spring-issuemaster opened this issue Nov 3, 2014 · 3 comments

@spring-issuemaster spring-issuemaster commented Nov 3, 2014

Jimmy Praet opened BATCH-2317 and commented

For a new project I'm currently evaluating the performance of using JdbcPagingItemReader versus JdbcCursorItemReader.

The application will make use of a process indicator column in the input table, so saveState="false" will be configured.

In my tests, the JdbcCursorItemReader is much faster (5x) than the JdbcPagingItemReader, but this is mostly because the JdbcCursorItemReader issues a simple "SELECT ... FROM ... WHERE ... AND processed = 0".

The JdbcPagingItemReader, however, issues "SELECT ... FROM ... WHERE ... AND processed = 0 AND ... > ... ORDER BY ... ASC FETCH FIRST 1000 ROWS ONLY".

When working with a process indicator column, these "... > ..." and "ORDER BY ... ASC" clauses are actually not required.

After doing some local hacking to the JdbcPagingItemReader and PagingQueryProvider to remove the sort key condition and order by clause from the query, the performance is comparable to that of JdbcCursorItemReader for this scenario.
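To illustrate why the sort key condition can be dropped in this scenario, here is a minimal in-memory sketch. It is not Spring Batch code: the `Row` type, the `fetchPage` and `processAll` helpers, and the table contents are all hypothetical stand-ins. The point it tries to show is that when every row of a page is flagged as processed before the next page is read, re-running the same "first page" query keeps returning fresh rows until none are left.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for an input table with a process indicator column.
// All names here are hypothetical; nothing below is part of Spring Batch.
class ProcessIndicatorPagingSketch {

    // Hypothetical row type: an id plus the "processed" flag.
    static final class Row {
        final int id;
        boolean processed;

        Row(int id) {
            this.id = id;
        }
    }

    // Builds a table of unprocessed rows with ids 0..size-1.
    static List<Row> newTable(int size) {
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            table.add(new Row(i));
        }
        return table;
    }

    // Mimics "... WHERE processed = 0 FETCH FIRST pageSize ROWS ONLY":
    // no sort key condition and no ORDER BY.
    static List<Row> fetchPage(List<Row> table, int pageSize) {
        List<Row> page = new ArrayList<>();
        for (Row row : table) {
            if (!row.processed) {
                page.add(row);
                if (page.size() == pageSize) {
                    break;
                }
            }
        }
        return page;
    }

    // Re-runs the same query until it comes back empty, flagging each row
    // once it has been handled. Returns the number of rows handled.
    static int processAll(List<Row> table, int pageSize) {
        int handled = 0;
        List<Row> page = fetchPage(table, pageSize);
        while (!page.isEmpty()) {
            for (Row row : page) {
                row.processed = true; // stands in for the writer updating the flag
                handled++;
            }
            page = fetchPage(table, pageSize);
        }
        return handled;
    }
}
```

Note that the loop only terminates because each page is fully flagged before the next read; if a row were skipped without its flag being updated, the same query would return it again.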

The process indicator column is a pattern that is being promoted in the spring batch reference manual, so I think it would be nice to have both standard JDBC reader implementations support this pattern in a performant way.


No further details from BATCH-2317

@spring-issuemaster spring-issuemaster commented Nov 3, 2014

Michael Minella commented

A couple initial thoughts:

  1. Thanks for doing this type of testing!
  2. So are you proposing that the sortKey and orderBy only be required when saveState is set to true and if it's false, allowing a "buyer beware" mode?

@spring-issuemaster spring-issuemaster commented Nov 5, 2014

Jimmy Praet commented

Yes, that was the approach I had in mind as well. PR: #352

I also added a question to the PR regarding the PagingQueryProvider.generateJumpToItemQuery method that could potentially be removed.

@spring-issuemaster spring-issuemaster commented Feb 13, 2015

Jimmy Praet commented

When working with a process indicator column, these "... > ..." and "ORDER BY ... ASC" clauses are actually not required.

While this is true, even when using the process indicator pattern, there may be cases where you still want to do the ORDER BY. That is because the record processing order can significantly impact performance as well.

So, I think a more correct approach is, when you use the process indicator pattern:

  • saveState should be false
  • sortKeys should be considered optional
  • PagingQueryProvider.generateFirstPageQuery should be used for the first and all remaining pages; this gets rid of the unnecessary "... > ..." filtering

But that approach would need a new dedicated 'processIndicator'=true configuration flag on the JdbcPagingItemReader.
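The three points above could be sketched roughly as follows. This is plain Java for illustration only, not the actual PagingQueryProvider implementation: the class, the method names, the `processIndicator` parameter, and the example table and column names are all assumptions. With the flag set, the first-page query is reused for every page and ORDER BY is emitted only if sort keys were supplied; without it, a sort key stays mandatory and later pages get the "sort key > last value" filter.

```java
import java.util.List;

// Hypothetical query selection for a 'processIndicator' mode.
// Not Spring Batch code; names and SQL shape are illustrative assumptions.
class ProcessIndicatorQuerySketch {

    // First-page query; ORDER BY is added only when sort keys are given.
    static String firstPageQuery(String selectFrom, String where,
                                 List<String> sortKeys, int pageSize) {
        StringBuilder sql = new StringBuilder(selectFrom)
                .append(" WHERE ").append(where);
        if (!sortKeys.isEmpty()) {
            sql.append(" ORDER BY ")
               .append(String.join(" ASC, ", sortKeys))
               .append(" ASC");
        }
        return sql.append(" FETCH FIRST ").append(pageSize)
                  .append(" ROWS ONLY").toString();
    }

    // Classic paging for later pages: filter on the last seen sort key value
    // (single sort key only, to keep the sketch short).
    static String remainingPagesQuery(String selectFrom, String where,
                                      String sortKey, int pageSize) {
        return selectFrom + " WHERE " + where
                + " AND " + sortKey + " > ?"
                + " ORDER BY " + sortKey + " ASC"
                + " FETCH FIRST " + pageSize + " ROWS ONLY";
    }

    // Picks the query for a given zero-based page index.
    static String queryForPage(boolean processIndicator, int pageIndex,
                               String selectFrom, String where,
                               List<String> sortKeys, int pageSize) {
        if (processIndicator) {
            // Sort keys are optional; the same query serves every page.
            return firstPageQuery(selectFrom, where, sortKeys, pageSize);
        }
        if (sortKeys.size() != 1) {
            throw new IllegalArgumentException(
                    "this sketch needs exactly one sort key without processIndicator");
        }
        return pageIndex == 0
                ? firstPageQuery(selectFrom, where, sortKeys, pageSize)
                : remainingPagesQuery(selectFrom, where, sortKeys.get(0), pageSize);
    }
}
```

For example, `queryForPage(true, 3, "SELECT id FROM input", "processed = 0", Collections.emptyList(), 1000)` would produce a query with neither the key filter nor an ORDER BY, while passing a sort key list in the same mode keeps the ordering but still drops the filter.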
