
Proposal: exposing the option to parallelize actions/triggers #120

Open
hannah-bulmer opened this issue Nov 19, 2019 · 4 comments

Comments

@hannah-bulmer

By exposing functionality that allows multiple messages to run through actions in parallel (when it makes sense, for example, in lookup actions), we can increase the time efficiency of performing these actions.

To do this, we want to expose the RABBITMQ_PREFETCH_SAILOR env variable, which controls how many messages are read from the queue at once. This variable currently defaults to 1. It does not make sense for it to be set as an env variable, since the appropriate value depends on the individual action/trigger rather than on the component as a whole.

We could implement the following changes to allow parallelizing actions run on our platform:

  • read an optional parallelize: bool field from component.json for every action/trigger
  • if this field exists, read a number from component.json representing how many messages should be read in parallel at once. If not set, this number should default to a value greater than 1 (2?). It should also be configurable by the user as a config field, since the appropriate value depends on the amount of computing power available
  • RABBITMQ_PREFETCH_SAILOR should then be set to this value so that that many messages are processed at a time. It should be deprecated as an env variable, as it does not make sense in that context
  • documentation should be added to explain what actions should and should not be run in parallel
  • investigation should be done to determine the capacities of running messages in parallel and how to calculate them
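A rough sketch of what the proposed fields might look like in component.json. The field names `parallelize` and `parallelism`, and the action shown, are illustrative only; none of this exists in the current descriptor schema:

```json
{
  "actions": {
    "lookupObject": {
      "title": "Lookup Object",
      "main": "./lib/actions/lookupObject.js",
      "parallelize": true,
      "parallelism": 2
    }
  }
}
```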
@jhorbulyk

  • read an optional parallelize: bool field from component.json for every action/trigger
  • if this field exists, read a number from component.json representing how many messages should be read in parallel at once. If not set, this number should default to a value greater than 1 (2?). It should also be configurable by the user as a config field, since the appropriate value depends on the amount of computing power available

Or just have a single number that defaults to 1 if not present.

@zubairov
Contributor

It's a good idea to optimize the processing; however, the drawbacks of this approach are:

  1. Increased memory usage - we need to keep multiple (potentially large) messages in memory
  2. Decreased reliability in case of container failures - instead of potentially re-processing a single message (e.g. when the container fails before the RabbitMQ ack), we will have to re-process potentially multiple messages

Taking these drawbacks into account, I don't think the component developer can make this decision.

@jhorbulyk

It potentially also makes sense to allow this value to be dynamically configured at runtime based on the specific cfg. Perhaps add a way to set this in either the init() or startup() functions?
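A minimal sketch of the idea, assuming the prefetch count could be resolved from the step's cfg inside an init() hook. Note that `prefetchFromConfig`, the `parallelism` cfg key, and the returned `{ prefetch }` shape are all hypothetical; the sailor has no such API today:

```javascript
// Hypothetical: derive the prefetch count from the step configuration
// instead of a component-wide env variable.
function prefetchFromConfig(cfg, fallback = 1) {
  const n = Number(cfg && cfg.parallelism);
  // Ignore missing, non-numeric, or non-positive values.
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// A hypothetical init() hook that would hand the value to the sailor
// before it starts consuming from the queue.
async function init(cfg) {
  const prefetch = prefetchFromConfig(cfg, 1);
  return { prefetch };
}
```

Resolving the value this way would keep the safe default of 1 whenever the user has not explicitly opted in.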

@jhorbulyk

Increase memory usage - we need to keep multiple (potentially large) messages in memory

Currently, our platform doesn't really support messages above ~3.5 MB in size. So even if we fetched 10 messages in parallel, at most ~35 MB of RAM would be consumed, which is still less than the default 256 MB allocated to a component.
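A back-of-envelope check of that bound. The ~3.5 MB message cap and 256 MB container limit are taken from the comment above; the prefetch count of 10 is illustrative:

```javascript
// Worst-case memory held by in-flight messages at a given prefetch count.
const maxMessageMb = 3.5; // approximate platform cap per message
const prefetch = 10;      // illustrative prefetch count
const containerMb = 256;  // default memory allocated to a component

const worstCaseMb = maxMessageMb * prefetch; // 35 MB
console.log(worstCaseMb < containerMb);      // true: well under the default limit
```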

Decreased reliability in case of container failures - instead of potentially re-processing a single message (e.g. when container failed before rabbitmq ack.) we will have to re-process potentially multiple messages

I don't understand why the costs of doing this would be a significant concern.
