numa container in a "blackhole" accumulator devours memory
Describe the bug
(numaflow-python: v0.13.0)
I made a blackhole accumulator based on the example streamsorter. It takes two input streams like the original streamsorter but emits no output stream.
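For readers who have not opened the branch, the idea of a handler that consumes everything and emits nothing can be sketched roughly as below. This is a stand-in, not the code on the branch: it assumes the accumulator handler receives an async stream of datums plus an output sink, and it uses a plain async generator and `asyncio.Queue` instead of the numaflow-python types. The names `fake_datums`, `blackhole_handler`, and `demo` are hypothetical.

```python
import asyncio
from typing import AsyncIterator


async def fake_datums(n: int, size: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for the merged input streams of the vertex."""
    for _ in range(n):
        yield bytes(size)


async def blackhole_handler(datums: AsyncIterator[bytes], output: asyncio.Queue) -> int:
    """Consume every datum and emit nothing; return how many were dropped."""
    dropped = 0
    async for _ in datums:
        dropped += 1  # the payload is discarded; nothing is put on `output`
    return dropped


async def demo() -> tuple[int, int]:
    """Run the handler over five 1 KiB datums and report (dropped, emitted)."""
    output: asyncio.Queue = asyncio.Queue()
    dropped = await blackhole_handler(fake_datums(5, 1024), output)
    return dropped, output.qsize()
```

Running `asyncio.run(demo())` consumes every input while the output sink stays empty, which is the behavior described above.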
I found that my blackhole accumulator devoured memory. Grafana showed that the memory consumption of the numa container in the accumulator vertex never reached a ceiling and kept increasing for at least 2 hours.
This behavior does not occur with the original streamsorter. I built it myself, ran the example pipeline with it, and found that the memory consumption reached a ceiling.
I want to know whether this is an actual issue or a mistake in how I use the accumulator. Is there any constraint such as "every input datum to an accumulator must be output" or "the number of output data from an accumulator must equal the number of input data"? I think this matters when we make a "multiplexer" accumulator, which drops one of the two input data, or a "cross-join" accumulator, which combines two different input data into one output datum.
numaproj/numaflow#3262 may be a related issue.
To Reproduce
- Build the image (original streamsorter or modified blackhole) and push it to a registry.
- Deploy the example pipeline with the built image.
- Send 2 MB data chunks to the two HTTP sources repeatedly.
- Watch the memory consumption of the accumulator vertex.
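The sending step could be driven by something like the following Python sketch. It is not the shell script from the branch. The endpoint URLs are hypothetical placeholders, the 2 MiB reading of "2MB" and the one-second interval are assumptions, and a real HTTP source may need a different scheme, port, path, or TLS settings.

```python
import itertools
import time
import urllib.request

CHUNK_SIZE = 2 * 1024 * 1024  # assumption: "2MB" taken as 2 MiB

# Hypothetical endpoints; replace with the real addresses of the two HTTP sources.
SOURCE_URLS = [
    "http://localhost:8081/vertices/in-a",
    "http://localhost:8082/vertices/in-b",
]


def make_chunk(size: int = CHUNK_SIZE) -> bytes:
    """Build a dummy payload of the requested size."""
    return b"x" * size


def build_request(url: str, payload: bytes) -> urllib.request.Request:
    """Wrap the payload in a POST request for one source."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def send_forever(urls=SOURCE_URLS, interval_sec: float = 1.0) -> None:
    """Alternate between the sources, posting one chunk per iteration."""
    payload = make_chunk()
    for url in itertools.cycle(urls):
        with urllib.request.urlopen(build_request(url, payload), timeout=30) as resp:
            resp.read()  # drain the response before sending the next chunk
        time.sleep(interval_sec)
```

Calling `send_forever()` starts the loop; stop it with Ctrl-C once enough memory samples have been collected.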
I have a branch, blackhole-v0.13.0, for reproduction. Please build the original streamsorter image from tmenjo@cd06382 and the blackhole image from tmenjo@00cd516, fixing the build configuration if needed. The branch also contains a simple shell script for reproduction.
Expected behavior
The memory consumption of the accumulator vertex reaches a ceiling.
Screenshots
blackhole
The memory consumption of the numa container in the blackhole accumulator vertex continued to increase for at least 2 hours.
streamsorter
The memory consumption reached a ceiling within 20 minutes.
Environment
- Kubernetes: v1.35.2
- Numaflow: v1.7.1
- Numaflow-Python: v0.13.0
Message from the maintainers:
Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.