
Proposal: exposing the option to parallelize actions/triggers #120

Open
hannah-bulmer opened this issue Nov 19, 2019 · 4 comments

Comments

@hannah-bulmer

By exposing functionality that allows multiple messages to run through actions in parallel (when it makes sense, for example, in lookup actions), we can increase the time efficiency of performing these actions.

To do this, we want to expose the RABBITMQ_PREFETCH_SAILOR env variable, which controls how many messages are read from the queue at once. This variable currently defaults to 1. It does not make sense for it to be set as an env variable, since the appropriate value depends on the individual action/trigger rather than on the component as a whole.

We could implement the following changes to allow parallelizing actions run on our platform:

  • read an optional parallelize: bool field from component.json for every action/trigger
  • if this field exists, read a number from component.json representing how many messages should be read in parallel at once. If not set, this number should default to a value greater than 1 (2?). It should also be configurable by the user as a config field, since the appropriate value depends on the amount of computing power available
  • RABBITMQ_PREFETCH_SAILOR should then be set to this value so that that many messages are processed at a time. It should be deprecated as an env variable, as it does not make sense in that context
  • documentation should be added to explain what actions should and should not be run in parallel
  • investigation should be done to determine the capacities of running messages in parallel and how to calculate them
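A rough sketch of what the proposed fields might look like in component.json. The field names `parallelize` and `parallelism`, and the action shown, are illustrative only; none of this exists in the current descriptor schema:

```json
{
  "actions": {
    "lookupObject": {
      "title": "Lookup Object",
      "main": "./lib/actions/lookupObject.js",
      "parallelize": true,
      "parallelism": 2
    }
  }
}
```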
@jhorbulyk

  • read an optional parallelize: bool field from component.json for every action/trigger
  • if this field exists, read a number from component.json representing how many messages should be read in parallel at once. If not set, this number should default to a value greater than 1 (2?). It should also be configurable by the user as a config field, since the appropriate value depends on the amount of computing power available

Or just have a single number that defaults to 1 if not present.

@zubairov
Contributor

It's a good idea to optimize the processing; however, the drawbacks of this approach are:

  1. Increased memory usage - we need to keep multiple (potentially large) messages in memory
  2. Decreased reliability in case of container failures - instead of potentially re-processing a single message (e.g. when the container fails before the RabbitMQ ack), we will have to re-process potentially multiple messages

Taking these drawbacks into account, I don't think the component developer can make this decision.

@jhorbulyk

It potentially also makes sense to allow this value to be dynamically configured at runtime based on the specific cfg. Perhaps add a way to set this in either the init() or startup() functions?
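A minimal sketch of the idea, assuming the prefetch count could be resolved from the step's cfg inside an init() hook. Note that `prefetchFromConfig`, the `parallelism` cfg key, and the returned `{ prefetch }` shape are all hypothetical; the sailor has no such API today:

```javascript
// Hypothetical: derive the prefetch count from the step configuration
// instead of a component-wide env variable.
function prefetchFromConfig(cfg, fallback = 1) {
  const n = Number(cfg && cfg.parallelism);
  // Ignore missing, non-numeric, or non-positive values.
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// A hypothetical init() hook that would hand the value to the sailor
// before it starts consuming from the queue.
async function init(cfg) {
  const prefetch = prefetchFromConfig(cfg, 1);
  return { prefetch };
}
```

Resolving the value this way would keep the safe default of 1 whenever the user has not explicitly opted in.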

@jhorbulyk

Increase memory usage - we need to keep multiple (potentially large) messages in memory

Currently, our platform doesn't really support messages above ~3.5 MB in size. So even if we fetched 10 messages in parallel, at most ~35 MB of RAM would be consumed, which is still less than the default 256 MB allocated to a component.
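A back-of-envelope check of that bound. The ~3.5 MB message cap and 256 MB container limit are taken from the comment above; the prefetch count of 10 is illustrative:

```javascript
// Worst-case memory held by in-flight messages at a given prefetch count.
const maxMessageMb = 3.5; // approximate platform cap per message
const prefetch = 10;      // illustrative prefetch count
const containerMb = 256;  // default memory allocated to a component

const worstCaseMb = maxMessageMb * prefetch; // 35 MB
console.log(worstCaseMb < containerMb);      // true: well under the default limit
```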

Decreased reliability in case of container failures - instead of potentially re-processing a single message (e.g. when container failed before rabbitmq ack.) we will have to re-process potentially multiple messages

I don't understand why the costs of doing this would be a significant concern.
