Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OraclePagingQueryProvider reads incorrect number of rows based on Sort Key and PageSize [BATCH-1845] #1744

Closed
spring-projects-issues opened this issue Mar 16, 2012 · 5 comments
Labels
in: core status: declined type: bug

Comments

@spring-projects-issues
Copy link
Collaborator

spring-projects-issues commented Mar 16, 2012

David Leo King opened BATCH-1845 and commented

I listed this as critical because it causes a loss of data.

The situation that I am runnning into occurs when the sort key value has less rows than the page size.
The number of rows I obtain are significantly less as the page size for the job decreases.

I have a Table called CITY with columns CTY_NME and ST_CDE (Agency Code is just a random value).
Using the query "SELECT c.* FROM <Schema Name>.CITY c WHERE c.CTY_NME LIKE 'M%' AND c.ST_CDE LIKE 'C%' ORDER BY <Sort Key> ASC;" there are a total of 3 states (CA, CO, CT) and each CTY_NME and CTY_CDE are unique for all records.
For ST_CDE = CA there are 94 rows
For ST_CDE = CO there are 28 rows
For ST_CDE = CT there are 20 rows
For at total of 142 rows.

Now if I have the sort key be CTY_NME, there is a total of 142 rows returned no matter what the page size is (for this case I tested 93 and 29).

When I use ST_CDE as the Sort Key (having only 3 values) the number of rows I get are variable.
Page Size = 93 | 94 | 28 | 29
Row Count = 141 | 142 | 76 | 58

For page size 93 and 94, there are two pages. One where it picks up CA with 93 of its rows (missing the final CA row) and then CO and CT together (28 + 20 = 48 < 93). For a page size of 94, all 94 of CA rows are printed and so are CO (28) and CT (20). Again, it took two pages.

For page size of 28, the first 28 rows of CA are given, all of the 28 rows of CO are given with the 20 rows of CT is also given for a total of 28+28+20 = 76 (three pages total).
For page size of 29, the first 29 rows of CA are given, all of the 28 rows of CO are given with the first of the rows for CT given for a total of 29+28+1 = 58. (three pages total)

At this time I am unsure if this is at the OracleQueryProvider level or much deeper. I couldn't find where the error would occur.

I also tested the Oracle JDBC by itself and also obtained the correct number of rows.

I've attached several documents of my personal troubleshooting. The names that start with ST_CDE are results from a sort key of ST_CDE and the CTY_NME, CTY_NME. The numbers are the page size used. JDBCExperiment was the class I used to test the driver. Logs contain the values printed to the console.


Affects: 2.1.8

Attachments:

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Mar 29, 2012

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Mar 29, 2012

David Leo King commented

Well don't I have egg on my face. I guess my project won't be able to use paging at all then. Thanks for looking that up for me, I appreciate it.

But still, it seems like whatever query I would want to use on a database, I would want it to return as if the query wasn't broken into pages. In this case it makes paging a whole lot worse off, specifically because more than one sort key cannot be provided. There are a lot of tables I've seen or do not have control over that have multiple primary keys and makes using paging almost impossible.

To solve the unqiue sort key issue would also be to solve the multiple sort key issue.

I am also willing to guess that sorting usually isn't a concern for batch processing which is why it is so restricted in paging because it isn't of a big concern.

When I have some time, I will look into find a way for multiple sort keys to work in paging. In the mean time it looks like the only option is the Jdbc Cursor Item Reader.

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Apr 25, 2012

Morten Andersen-Gott commented

@David, can't you use the ROWID as the sortkey on oracle? that is if your requirements don't mandate a special ordering on columns with some sort of business logic? (eg. do you need to process the states in decending order or is the order irrelevant?)

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Apr 25, 2012

David Leo King commented

The above example was simple compared to some of the queries we are working. But it gets the point across where I may not be able to have any unique keys across multiple joins unless I use multiple rows as sort keys (City and State). In the project I have, the schema is very normalized so that things like city and state are located in different tables.

And for the record, ordering does matter. I have a unique pdf report writer that separates records onto different pages based on the current and previous items.

I've been busy with the actually building of the base designs and haven't had time to address this issue. I still plan on coming back to it once there is some free time though as deadlines approach it seems time gets away from me.

@spring-projects-issues
Copy link
Collaborator Author

spring-projects-issues commented Mar 18, 2014

Michael Minella commented

While the initial issue here works as designed, the other issue brought up in the comments was addressed by BATCH-1749

@spring-projects-issues spring-projects-issues added type: bug status: declined in: core labels Dec 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: core status: declined type: bug
Projects
None yet
Development

No branches or pull requests

1 participant