feat: Automatically resize blocks if they get too small #270

untitaker · 2023-06-29T06:48:52Z

This is a naive implementation that resizes input/output blocks if they
become too small.

Blocks are reallocated once the overflowing batch is successfully
drained.

This will make initial throughput of a consumer slower, but over time
hopefully all blocks eventually reach a state where they won't overflow
anymore.

Fix #248
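As a rough illustration of the behaviour described above, here is a minimal simulation of how block sizes might converge over time. It is a sketch, not the code in this PR: `simulate_resizes`, the batch sizes, and the starting size are all made up for the example, and it assumes the block is doubled once after each overflowing batch has been drained.

```python
from typing import List


def simulate_resizes(batch_sizes: List[int], initial_block_size: int) -> List[int]:
    """Return the block size in effect after each batch has been drained.

    Hypothetical model: a batch larger than the block overflows, is drained
    with the old block, and only then is a block of twice the size allocated.
    """
    block_size = initial_block_size
    history = []
    for batch_size in batch_sizes:
        if batch_size > block_size:
            # Overflow: reallocate after the drain, so the next batch
            # sees a larger block.
            block_size *= 2
        history.append(block_size)
    return history


# The first overflows are paid early; later batches of the same size fit.
history = simulate_resizes([10_000, 40_000, 40_000, 40_000, 40_000], 16 * 1024)
print(history)
```

In this model the overflows cluster at the start of the run, which matches the expectation that initial throughput suffers while the steady state does not.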


lynnagara · 2023-06-30T16:20:52Z

arroyo/processing/strategies/run_task_with_multiprocessing.py

+        resize_input_blocks: bool = False,
+        resize_output_blocks: bool = False,
+        max_input_block_size: Optional[int] = None,
+        max_output_block_size: Optional[int] = None,


With so many extra parameters, it seems like it would be harder rather than easier to figure out what to pass here.

Yeah, I just want to make this opt-in for now. I think long-term we should give input_block_size a default and enable auto-resizing by default. Then we still have a lot of parameters, but people don't need to set any of them.

Right now they need to configure input_block_size, and we can't give them a default.

If the user knows what the max is supposed to be, what's the rationale for them to start with a smaller number at all, rather than just passing the max into input_block_size and output_block_size?

I expect most users to not configure a max at all. I just added that because people were concerned about unbounded memory usage, but I would also be fine with not having a max parameter at all. I also believe passing 1GB or 500MB to max_output_block_size would be reasonable, but I don't think such a value would be reasonable for output_block_size.
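To make the role of the optional maximum concrete, here is a small sketch of how a growth step could be capped. `next_block_size` is a hypothetical helper, not a function in Arroyo, and the doubling factor is taken from the diff under review.

```python
from typing import Optional


def next_block_size(current_size: int, max_block_size: Optional[int] = None) -> int:
    """Return the size to allocate after an overflow (hypothetical helper).

    With no maximum configured the block simply doubles. With a maximum,
    growth stops at the cap, and the block is never shrunk below its
    current size.
    """
    doubled = current_size * 2
    if max_block_size is None:
        return doubled
    return max(current_size, min(doubled, max_block_size))


one_gb = 1024 * 1024 * 1024
print(next_block_size(16 * 1024))                 # unbounded: doubles
print(next_block_size(768 * 1024 * 1024, one_gb))  # capped at the maximum
```

With a cap of this kind, a block that has reached the maximum keeps overflowing and splitting batches instead of growing further, which is the trade-off against unbounded memory usage.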

lynnagara · 2023-06-30T16:25:16Z

arroyo/processing/strategies/run_task_with_multiprocessing.py

+                "arroyo.strategies.run_task_with_multiprocessing.batch.input.resize"
+            )
+            new_input_block = self.__shared_memory_manager.SharedMemory(
+                old_input_block.size * 2


This seems like it could trigger quite a few resizes depending on the values passed. Is there another way to calculate the sharedmemory size based on prior seen message sizes?

I think that's a good idea. We can keep track of the batch size and use it as input for reallocating. I'll have to see how to do that. I don't think I can use our existing batch size metrics because they don't emit the right value when batches are split up due to input/output overflow.
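For a sense of scale, the sketch below counts how many doublings separate a small block from a large requirement, and contrasts that with sizing directly from an observed batch. Both helpers are hypothetical and only illustrate the two policies being discussed; the 16 KiB and 10 MiB figures are example values.

```python
def doublings_needed(initial_size: int, required_size: int) -> int:
    """Count how many times a block must double to reach required_size."""
    count = 0
    size = initial_size
    while size < required_size:
        size *= 2
        count += 1
    return count


def size_from_observed_batch(current_size: int, observed_batch_bytes: int) -> int:
    """Alternative policy: grow at least by doubling, but jump straight to
    the observed batch size when that is larger."""
    return max(current_size * 2, observed_batch_bytes)


initial = 16 * 1024
required = 10 * 1024 * 1024
print(doublings_needed(initial, required))
print(size_from_observed_batch(initial, required))
```

Because doubling is geometric, the number of resizes grows only logarithmically with the gap between the starting size and the required size, whereas the observed-size policy would get there in a single step at the cost of tracking batch sizes.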

fpacifici · 2023-06-30T18:47:18Z

arroyo/processing/strategies/run_task_with_multiprocessing.py

+            self.__metrics.increment(
+                "arroyo.strategies.run_task_with_multiprocessing.batch.input.resize"
+            )
+            new_input_block = self.__shared_memory_manager.SharedMemory(


Knowing that memory settings for Kubernetes are static (we cannot resize the memory assigned to a pod) and that you cannot exceed the memory allocated (OOMKill), do we have a reason to ever resize our input/output blocks instead of just taking all the available memory at the start? It seems it would be much easier to just statically create memory blocks that use all the available memory of the pod and not try to change anything.

> do we have a reason to ever resize our input/output blocks instead of just taking all the available memory at the start

There's still a ton of stuff outside of arroyo that takes memory, but even if not, I don't think it's a good idea to consume significantly more memory than needed. When we deploy a new service we configure k8s limits based on projected memory usage, and later optionally adjust based on average memory usage. If arroyo defaults to consuming all pod memory, we lose insight into how much memory we actually need and how much cost we could save.

> but even if not, I don't think it's a good idea to consume significantly more memory than needed

Why? That memory is already allocated and not usable by others anyway.
If you are concerned about not having visibility into the actual usage, why not have specific metrics for that? It seems like an easier, safer system with fewer moving parts and fewer failure modes.

The goal is to eliminate tuning parameters the user has to tweak to get optimal consumer performance. If we manage to do that, we can start thinking about removing some of those options (as they would no longer be required and would have optimal defaults), which also removes moving parts in that sense. Specifically, I would be aiming for these defaults for all consumers:

- auto_resize=True for both input and output blocks
- max_input_block_size/max_output_block_size at either None or 1GB each, or some limit that only a really misbehaving consumer would hit
- input_block_size = output_block_size = 10MB so we can be somewhat sure the block can hold a single message

An engineer of the product team should not have to think about how much memory their pod is going to consume and tune Arroyo parameters based off of it.

I don't think this is possible at all with a static approach, because it requires the author of the consumer to think about how much memory their pod has (unclear, gets adjusted by ops), how much their regular code consumes per-process (entirely unclear, especially in a shared codebase like sentry where tons of random stuff gets imported at every CLI invocation) and then think about how much of the remainder can be allocated to input/output blocks.

If you are suggesting a static approach that is also zero-config, I don't know how that would work. Does it mean that arroyo determines free memory and allocates it evenly divided for input/output blocks? And is it evenly, or do input blocks get more than output blocks? And what does it do on a dev machine where there's no k8s request/limit per-consumer?

> If you are concerned about not having visibility into the actual usage, why not have specific metrics for that?

I think this is possible but it feels like Java/node heap tuning parameters and I would like to avoid that sort of experience as well.

untitaker · 2023-08-31T11:17:51Z

After talking to lyn, we decided that it makes more sense to expose input_block_size=None (as default) instead of adding additional flags. The initial block sizes are hardcoded at 16kB in those cases.
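One way to picture the folded-in flag is the sketch below. `resolve_block_size` and `DEFAULT_INITIAL_BLOCK_SIZE` are hypothetical names, not identifiers from this PR. It also assumes that 16kB means 16 * 1024 bytes and that an explicitly configured size stays fixed, neither of which is stated here.

```python
from typing import Optional, Tuple

# Assumption: "16kB" is read as 16 * 1024 bytes.
DEFAULT_INITIAL_BLOCK_SIZE = 16 * 1024


def resolve_block_size(configured: Optional[int]) -> Tuple[int, bool]:
    """Return (initial_size, auto_resize) for a configured block size.

    Hypothetical helper: None means "start small and grow on overflow".
    An explicit size is assumed to be kept as-is.
    """
    if configured is None:
        return DEFAULT_INITIAL_BLOCK_SIZE, True
    return configured, False


print(resolve_block_size(None))
print(resolve_block_size(10 * 1024 * 1024))
```

The appeal of this shape is that a single optional parameter replaces the separate resize flag: callers who pass nothing get auto-resizing, and callers who already tuned a size keep their existing behaviour.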

There was also an idea to use the message size to guide by how much the input block should be reallocated. I tried this and found it was quite hard to implement, and I don't think it would significantly impact startup performance.

untitaker requested a review from a team as a code owner June 29, 2023 06:48

untitaker mentioned this pull request Jun 29, 2023

Arroyo provides little insight into what is actually being stored in input/output buffers #271

Open

untitaker added 4 commits June 29, 2023 08:55

try with larger input block size, because linux

be26498

bump it further

88263e3

bump another thing

3c73f4d

ho boy

54f1c6c

lynnagara reviewed Jun 30, 2023

fpacifici reviewed Jun 30, 2023

fold resize flag into input_block_size

5cbe431

untitaker requested a review from lynnagara August 31, 2023 11:17

lynnagara approved these changes Aug 31, 2023

untitaker merged commit dd81bc9 into main Aug 31, 2023
8 checks passed

untitaker deleted the feat/multiprocessing-block-resize branch August 31, 2023 14:22
