
Support PK Chunking #169

Closed
mattandneil opened this issue Nov 19, 2018 · 2 comments

mattandneil commented Nov 19, 2018

Large extracts (10M–100M rows) are problematic due to query timeouts. PK Chunking resolves this by instructing the Bulk API to split a query into several parts, each covering at most 250,000 rows, e.g.:

WHERE Id >= '00vD000003Xwm4z' and Id < '00vD000003YaNYE' ...
WHERE Id >= '00vD000003YqVqq' and Id < '00vD000003YwGXr' ...
WHERE Id >= '00vD000003YyaIS' and Id < '00vD000003ZDtyQ' ...
WHERE Id >= '00vD000003bv6oI' and Id < '00vD000003bxls6' ...

The server performs all of this splitting automatically when the PK chunking header is present, but the Data Loader client needs additional logic to download and recombine the multiple batch results.
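For reference, a minimal sketch of what that client-side logic might look like, assuming the Salesforce WSC `BulkConnection` API. The chunk size, output file name, and polling interval are illustrative, not taken from Data Loader, and error handling plus CSV header de-duplication are omitted:

```java
import com.sforce.async.BatchInfo;
import com.sforce.async.BatchStateEnum;
import com.sforce.async.BulkConnection;
import com.sforce.async.ContentType;
import com.sforce.async.JobInfo;
import com.sforce.async.OperationEnum;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class PkChunkedExtract {

    static void extract(BulkConnection bulk, String object, String soql) throws Exception {
        // Ask the server to split the query; the chunk size here is illustrative.
        bulk.addHeader("Sforce-Enable-PKChunking", "chunkSize=250000");

        JobInfo job = new JobInfo();
        job.setObject(object);
        job.setOperation(OperationEnum.query);
        job.setContentType(ContentType.CSV);
        job = bulk.createJob(job);
        bulk.createBatchFromStream(job,
                new ByteArrayInputStream(soql.getBytes(StandardCharsets.UTF_8)));

        Set<String> downloaded = new HashSet<>();
        try (OutputStream out = Files.newOutputStream(Paths.get("extract.csv"))) {
            boolean allDone = false;
            while (!allDone) {
                allDone = true;
                for (BatchInfo b : bulk.getBatchInfoList(job.getId()).getBatchInfo()) {
                    // PK chunking leaves the original batch in the NotProcessed state;
                    // the server-created batches hold the actual results.
                    if (b.getState() == BatchStateEnum.NotProcessed || downloaded.contains(b.getId())) {
                        continue;
                    }
                    if (b.getState() == BatchStateEnum.Failed) {
                        throw new IllegalStateException(b.getStateMessage());
                    }
                    if (b.getState() != BatchStateEnum.Completed) {
                        allDone = false;  // still queued or in progress, poll again later
                        continue;
                    }
                    for (String resultId : bulk.getQueryResultList(job.getId(), b.getId()).getResult()) {
                        try (InputStream in = bulk.getQueryResultStream(job.getId(), b.getId(), resultId)) {
                            in.transferTo(out);  // naive recombination: append each chunk's CSV
                        }
                    }
                    downloaded.add(b.getId());
                }
                if (!allDone) {
                    Thread.sleep(10_000);  // crude polling interval
                }
            }
        }
        bulk.closeJob(job.getId());
    }
}
```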

What's the easiest way to implement this? Maybe subclassing BulkQueryVisitor?

[Screenshot: pk-chunking-ui]

xbiansf commented Nov 19, 2018

This is a tough problem; how to split the data really depends on the actual query results. You could subclass the query visitor with your own strategy that suits your business query, and then aggregate the query results as well.
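To illustrate the aggregation step: each chunk's CSV result repeats the header row, so a simple merge has to keep only the first one. A minimal, self-contained sketch (the class name and file handling are my own illustration, not Data Loader code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ChunkResultMerger {

    /** Concatenates per-chunk CSV results into one file, keeping only the first header row. */
    public static void merge(List<Path> chunkFiles, Path output) throws IOException {
        boolean headerWritten = false;
        try (Writer out = Files.newBufferedWriter(output)) {
            for (Path chunk : chunkFiles) {
                try (BufferedReader in = Files.newBufferedReader(chunk)) {
                    String header = in.readLine();   // every chunk repeats the CSV header
                    if (header == null) {
                        continue;                    // empty chunk, nothing to copy
                    }
                    if (!headerWritten) {
                        out.write(header);
                        out.write(System.lineSeparator());
                        headerWritten = true;
                    }
                    // Remaining lines are copied verbatim, so quoted fields survive unchanged.
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.write(line);
                        out.write(System.lineSeparator());
                    }
                }
            }
        }
    }
}
```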

mattandneil (Author) commented

Duplicate of #138, see implementation in 23f4582 and c697fc8
