
Support PK Chunking #169

Closed
mattandneil opened this issue Nov 19, 2018 · 2 comments

mattandneil commented Nov 19, 2018

Large extracts (10M–100M rows) are problematic due to query timeouts. PK Chunking resolves this by instructing the Bulk API to split a query into several parts, each covering at most 250,000 rows, e.g.:

WHERE Id >= '00vD000003Xwm4z' and Id < '00vD000003YaNYE' ...
WHERE Id >= '00vD000003YqVqq' and Id < '00vD000003YwGXr' ...
WHERE Id >= '00vD000003YyaIS' and Id < '00vD000003ZDtyQ' ...
WHERE Id >= '00vD000003bv6oI' and Id < '00vD000003bxls6' ...

The server performs all of this splitting automatically when the PK chunking header is present, but the Data Loader client needs additional logic to download and recombine the multiple batch results.
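For reference, a minimal sketch of what that client-side logic might look like, assuming the Salesforce WSC `BulkConnection` API. The chunk size, output file name, and polling interval are illustrative, not taken from Data Loader, and error handling plus CSV header de-duplication are omitted:

```java
import com.sforce.async.BatchInfo;
import com.sforce.async.BatchStateEnum;
import com.sforce.async.BulkConnection;
import com.sforce.async.ContentType;
import com.sforce.async.JobInfo;
import com.sforce.async.OperationEnum;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class PkChunkedExtract {

    static void extract(BulkConnection bulk, String object, String soql) throws Exception {
        // Ask the server to split the query; the chunk size here is illustrative.
        bulk.addHeader("Sforce-Enable-PKChunking", "chunkSize=250000");

        JobInfo job = new JobInfo();
        job.setObject(object);
        job.setOperation(OperationEnum.query);
        job.setContentType(ContentType.CSV);
        job = bulk.createJob(job);
        bulk.createBatchFromStream(job,
                new ByteArrayInputStream(soql.getBytes(StandardCharsets.UTF_8)));

        Set<String> downloaded = new HashSet<>();
        try (OutputStream out = Files.newOutputStream(Paths.get("extract.csv"))) {
            boolean allDone = false;
            while (!allDone) {
                allDone = true;
                for (BatchInfo b : bulk.getBatchInfoList(job.getId()).getBatchInfo()) {
                    // PK chunking leaves the original batch in the NotProcessed state;
                    // the server-created batches hold the actual results.
                    if (b.getState() == BatchStateEnum.NotProcessed || downloaded.contains(b.getId())) {
                        continue;
                    }
                    if (b.getState() == BatchStateEnum.Failed) {
                        throw new IllegalStateException(b.getStateMessage());
                    }
                    if (b.getState() != BatchStateEnum.Completed) {
                        allDone = false;  // still queued or in progress, poll again later
                        continue;
                    }
                    for (String resultId : bulk.getQueryResultList(job.getId(), b.getId()).getResult()) {
                        try (InputStream in = bulk.getQueryResultStream(job.getId(), b.getId(), resultId)) {
                            in.transferTo(out);  // naive recombination: append each chunk's CSV
                        }
                    }
                    downloaded.add(b.getId());
                }
                if (!allDone) {
                    Thread.sleep(10_000);  // crude polling interval
                }
            }
        }
        bulk.closeJob(job.getId());
    }
}
```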

What's the easiest way to implement this? Maybe subclassing BulkQueryVisitor?

[Screenshot: pk-chunking-ui]

xbiansf commented Nov 19, 2018

This is a tough problem; how to split the data really depends on the actual query results. You could subclass the query visitor with your own strategy that suits your business query, and then aggregate the query results as well.
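To illustrate the aggregation step: each chunk's CSV result repeats the header row, so a simple merge has to keep only the first one. A minimal, self-contained sketch (the class name and file handling are my own illustration, not Data Loader code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ChunkResultMerger {

    /** Concatenates per-chunk CSV results into one file, keeping only the first header row. */
    public static void merge(List<Path> chunkFiles, Path output) throws IOException {
        boolean headerWritten = false;
        try (Writer out = Files.newBufferedWriter(output)) {
            for (Path chunk : chunkFiles) {
                try (BufferedReader in = Files.newBufferedReader(chunk)) {
                    String header = in.readLine();   // every chunk repeats the CSV header
                    if (header == null) {
                        continue;                    // empty chunk, nothing to copy
                    }
                    if (!headerWritten) {
                        out.write(header);
                        out.write(System.lineSeparator());
                        headerWritten = true;
                    }
                    // Remaining lines are copied verbatim, so quoted fields survive unchanged.
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.write(line);
                        out.write(System.lineSeparator());
                    }
                }
            }
        }
    }
}
```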

mattandneil (Author) commented

Duplicate of #138, see implementation in 23f4582 and c697fc8
