merge_insert spill configuration easily too small if server has many cores

When we create plans for merge_insert, we run a DataFusion plan to do the join. We enable spilling on that plan because the joined stream must be sorted by row address, and if there is a lot of new data, the sort input can be large.

However, when we run this plan we do not put any cap on the target number of partitions. The default is the number of CPUs available on the machine.

The memory limit is then divided across all reservations, and each reservation is repeated once per partition. In other words, if there are 64 cores on the system, the plan may end up with 128, or maybe even 192, different reservations. Each reservation is then very small, and the operation fails with a resources exhausted error.
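To make the arithmetic concrete, here is a rough sketch of how small each reservation gets under an even split. The 100 MiB budget and the function name are assumptions chosen for illustration, not values taken from the code; the counts of 2 and 3 reservations per partition correspond to the 128 and 192 totals mentioned above.

```rust
/// Bytes available to each reservation when `pool_bytes` is split evenly
/// across every reservation in every partition (a simplified model).
fn per_reservation_bytes(pool_bytes: u64, partitions: u64, reservations_per_partition: u64) -> u64 {
    // Guard against a zero divisor so the sketch never panics.
    let reservations = (partitions * reservations_per_partition).max(1);
    pool_bytes / reservations
}

fn main() {
    // Hypothetical spill budget; the real configured value may differ.
    let pool_bytes: u64 = 100 * 1024 * 1024;
    let partitions: u64 = 64; // one partition per core on a 64-core server

    for per_partition in [2u64, 3] {
        let share = per_reservation_bytes(pool_bytes, partitions, per_partition);
        println!(
            "{} reservations -> {} KiB each",
            partitions * per_partition,
            share / 1024
        );
    }
}
```

Under these assumptions each reservation gets well under 1 MiB, which is likely smaller than a sort needs to buffer even a few batches.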

We could use the greedy memory pool instead of the fair memory pool, so that all of the reservations share the same limit. However, this could still fail if the work is spread evenly across all cores and every partition needs memory at the same time.
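The difference between the two pools can be sketched with a toy model. This is not DataFusion's implementation: it assumes each partition has a fixed minimum amount of memory it needs to make progress, that the fair pool gives every consumer an equal share, and that the greedy pool only checks the total. All names and numbers are hypothetical.

```rust
/// Toy fair pool: each consumer may use at most `limit / n` bytes.
fn fair_pool_succeeds(limit: u64, requests: &[u64]) -> bool {
    if requests.is_empty() {
        return true;
    }
    let share = limit / requests.len() as u64;
    requests.iter().all(|&r| r <= share)
}

/// Toy greedy pool: all consumers draw from one shared limit,
/// so only the sum of the requests matters.
fn greedy_pool_succeeds(limit: u64, requests: &[u64]) -> bool {
    requests.iter().sum::<u64>() <= limit
}

fn main() {
    let limit = 1024;

    // Skewed load: one busy partition, three nearly idle ones.
    let skewed = [600, 10, 10, 10];
    println!("skewed: fair={} greedy={}",
        fair_pool_succeeds(limit, &skewed),
        greedy_pool_succeeds(limit, &skewed));

    // Even load: every partition needs the same, and the total exceeds the limit.
    let even = [300, 300, 300, 300];
    println!("even:   fair={} greedy={}",
        fair_pool_succeeds(limit, &even),
        greedy_pool_succeeds(limit, &even));
}
```

In this model the greedy pool rescues the skewed case but not the even one, which is why switching pools alone may not be enough.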

Instead, it might be a good idea to put a cap on the target number of partitions. We don't really need much parallelism for this sort operation (this is independent of scan parallelism), so a good cap might be, for example, 8 target partitions. This could be done by adding a new field (target threads or something like that) to LanceExecutionOptions and setting it appropriately in merge_insert.
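A minimal sketch of the capping logic follows. The struct and field names are hypothetical stand-ins, not the real LanceExecutionOptions definition; in practice the resulting value would presumably be passed to DataFusion via SessionConfig::with_target_partitions.

```rust
use std::thread;

/// Hypothetical stand-in for the execution options; `target_partitions`
/// is an illustrative field name, not an existing one.
#[derive(Debug, Default, Clone)]
struct ExecutionOptions {
    /// Upper bound on plan partitions; `None` keeps the per-core default.
    target_partitions: Option<usize>,
}

/// Resolve the partition count: use the cap if set, never exceed the
/// available cores, and never drop below one partition.
fn effective_target_partitions(options: &ExecutionOptions, available_cores: usize) -> usize {
    options
        .target_partitions
        .map_or(available_cores, |cap| cap.min(available_cores))
        .max(1)
}

fn main() {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    // merge_insert would set a small cap, since the sort needs little parallelism.
    let merge_insert_options = ExecutionOptions { target_partitions: Some(8) };

    println!(
        "cores = {}, merge_insert target partitions = {}",
        cores,
        effective_target_partitions(&merge_insert_options, cores)
    );
}
```

With a cap of 8, a 64-core server would split the memory limit across far fewer reservations, while small machines keep their existing behavior.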

merge_insert spill configuration easily too small if server has many cores #3601
