Allow for configuring concurrency and batch size for all sinks in self-hosted #1088

@acco

Description

All sinks have the following tunable config settings:

  • concurrency
  • batch_size
  • batch wait (timeout)

We specify defaults for all sinks here:
https://github.com/sequinstream/sequin/blob/main/lib/sequin/runtime/sink_pipeline.ex#L205-L237
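For reference, here is a minimal Python sketch that gathers the settings above into one config record. It is illustrative only. Sequin's pipelines are written in Elixir. The field names are hypothetical, and they split "concurrency" into processor and batcher concurrency, as the proposal below does. Treating the batch wait as milliseconds is an assumption, and the numbers are placeholders, not the real defaults in sink_pipeline.ex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Tunable settings shared by every sink pipeline (hypothetical names)."""

    processor_concurrency: int  # parallel workers for CPU-bound prep (e.g. JSON encoding)
    batcher_concurrency: int    # parallel workers doing I/O writes to the sink
    batch_size: int             # max messages per batch
    batch_wait_ms: int          # max time to wait before flushing a partial batch (assumed ms)

    def __post_init__(self) -> None:
        # Reject values that would stall or break the pipeline.
        for name in ("processor_concurrency", "batcher_concurrency", "batch_size"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")
        if self.batch_wait_ms < 0:
            raise ValueError("batch_wait_ms must be >= 0")


# Placeholder values only; the real defaults live in sink_pipeline.ex.
DEFAULT_CONFIG = PipelineConfig(
    processor_concurrency=10,
    batcher_concurrency=5,
    batch_size=10,
    batch_wait_ms=50,
)
```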

Processors do CPU-intensive work, like preparing JSON. Batchers group one or more messages from the processors into batches and do the I/O work of writing them to sink destinations.
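To make the division of labor concrete, here is a small, single-threaded Python simulation of the two stages. It is a sketch, not Sequin's implementation. `process` stands in for the CPU-bound processor step. `batch` shows how batch_size and the batch wait decide when a batch is flushed. A real batcher runs a timer that flushes a partial batch on its own, but this simulation only checks the timeout when the next message arrives and at the end of the stream. The actual I/O write to the sink is left out.

```python
import json
from typing import Any, Iterable


def process(message: dict[str, Any]) -> str:
    """Processor step: CPU-bound work such as encoding a message as JSON."""
    return json.dumps(message, sort_keys=True)


def batch(
    events: Iterable[tuple[int, str]], batch_size: int, batch_wait_ms: int
) -> list[list[str]]:
    """Batcher step: group processed messages into batches.

    events are (arrival_time_ms, payload) pairs in arrival order. A batch is
    flushed when it reaches batch_size messages, or when the next message
    arrives more than batch_wait_ms after the batch was opened.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    opened_at = 0
    for arrived_at, payload in events:
        if current and arrived_at - opened_at > batch_wait_ms:
            batches.append(current)  # batch wait exceeded: flush the partial batch
            current = []
        if not current:
            opened_at = arrived_at  # this message opens a new batch
        current.append(payload)
        if len(current) == batch_size:
            batches.append(current)  # size limit reached
            current = []
    if current:
        batches.append(current)  # flush whatever is left at end of stream
    return batches
```

With batch_size=2 and a 50 ms wait, messages arriving at 0, 10, 20, and 200 ms end up in three batches: the first two fill a batch, the third is flushed alone when the fourth arrives too late to join it, and the fourth is flushed at the end.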

Sink modules can override these settings, like this:
https://github.com/sequinstream/sequin/blob/main/lib/sequin/runtime/kafka_pipeline.ex#L30-L38

Note the hack there to make Kafka super concurrent!
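The current pattern, where defaults are shared and each sink module hardcodes its own overrides, looks roughly like this Python sketch. `SinkPipeline`, `HighThroughputSink`, `DEFAULTS`, and the numbers are hypothetical and do not reflect the Kafka values. The key point is that the overrides live in code, so users cannot change them without a new release. That is what the proposal below addresses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    processor_concurrency: int
    batcher_concurrency: int
    batch_size: int
    batch_wait_ms: int


# Hypothetical shared defaults (placeholders, not Sequin's real values).
DEFAULTS = PipelineConfig(
    processor_concurrency=10, batcher_concurrency=5, batch_size=10, batch_wait_ms=50
)


class SinkPipeline:
    """Base pipeline: every sink starts from the shared defaults."""

    def config(self) -> PipelineConfig:
        return DEFAULTS


class HighThroughputSink(SinkPipeline):
    """Hypothetical sink that overrides only the setting it cares about."""

    def config(self) -> PipelineConfig:
        # replace() copies the frozen dataclass with one field changed.
        return replace(super().config(), batcher_concurrency=50)
```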

What we should do instead:

  • sink_consumers already has a batch_size column/setting
  • we should add processor_concurrency, batcher_concurrency, and batch_wait columns to sink_consumers
    • we can set what we think are reasonable defaults for each one (like this)
  • in self-hosted Sequin, all of these should be configurable. @acco can determine the copy, as well as where each field ought to live.
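One way the pipeline could consume these columns is a resolver that prefers the value stored on the sink_consumers row and falls back to the sink's default when a column is NULL. Rows created before the migration would then keep their current behavior. This is a hypothetical Python sketch. `resolve_config`, `PipelineConfig`, and `COLUMN_TO_FIELD` are made-up names, and treating batch_wait as milliseconds is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PipelineConfig:
    processor_concurrency: int
    batcher_concurrency: int
    batch_size: int
    batch_wait_ms: int


# Hypothetical mapping from sink_consumers column name to config field.
COLUMN_TO_FIELD = {
    "processor_concurrency": "processor_concurrency",
    "batcher_concurrency": "batcher_concurrency",
    "batch_size": "batch_size",
    "batch_wait": "batch_wait_ms",  # assumes batch_wait is stored in milliseconds
}


def resolve_config(row: Mapping[str, Any], defaults: PipelineConfig) -> PipelineConfig:
    """Build the effective config for one sink consumer.

    A NULL (None) or missing column falls back to the sink's default.
    """
    values: dict[str, int] = {}
    for column, field in COLUMN_TO_FIELD.items():
        value: Optional[int] = row.get(column)
        values[field] = getattr(defaults, field) if value is None else value
    return PipelineConfig(**values)
```

Because the fallback is per sink, a sink with a hardcoded override today could pass that override in as its `defaults`, and a user's column value would still take precedence.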

We'll tackle cloud separately, so for now these fields should only be shown in self-hosted Sequin.
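A minimal Python sketch of the self-hosted-only rule follows. It gates both what the form renders and what an update may set, so a hidden field cannot be changed through the API either. `Deployment`, `NEW_TUNING_FIELDS`, `visible_tuning_fields`, and `check_update` are hypothetical names. The sketch also assumes only the three new columns are gated, leaving batch_size's current cloud behavior untouched.

```python
from enum import Enum
from typing import Any, Mapping


class Deployment(Enum):
    SELF_HOSTED = "self_hosted"
    CLOUD = "cloud"


# Hypothetical: only the newly added columns are gated here.
NEW_TUNING_FIELDS = ("processor_concurrency", "batcher_concurrency", "batch_wait")


def visible_tuning_fields(deployment: Deployment) -> tuple[str, ...]:
    """Fields the sink form should render; cloud shows none of them for now."""
    return NEW_TUNING_FIELDS if deployment is Deployment.SELF_HOSTED else ()


def check_update(deployment: Deployment, params: Mapping[str, Any]) -> None:
    """Reject tuning params that the current deployment does not expose."""
    allowed = visible_tuning_fields(deployment)
    hidden = [f for f in NEW_TUNING_FIELDS if f in params and f not in allowed]
    if hidden:
        raise PermissionError(
            f"not configurable on {deployment.value}: {', '.join(hidden)}"
        )
```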