Arroyo provides little insight into what is actually being stored in input/output buffers #271

Open
untitaker opened this issue Jun 29, 2023 · 0 comments
Labels
bug Something isn't working

While working on #270 I noticed that with certain transformation functions, the returned bytestring did not get stored in the output buffer at all. It's not clear to me why this happens.

I noticed a pattern: returning a plain b"x" * 20 did not store anything in the output buffer, but wrapping it in a KafkaPayload, as the tests in #270 do, stored the bytestring out of band.

This behavior could explain some mysterious performance regressions we had in the past (output buffer not being used). Hopefully the new output buffer metrics will provide insight into this. If that metric is much lower than input batch size in bytes, we have a problem in that area.

We also don't know how much data is being transmitted in-band via pool.submit vs what is being sent over the buffer. Right now we only emit metrics for the size of the out-of-band buffer, not whatever else we pickle.
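
To make the in-band vs. out-of-band split concrete, here is a minimal sketch using only the standard library's pickle protocol 5. It assumes Arroyo's buffers are filled through the same `buffer_callback` mechanism, which this issue does not confirm, and `measure` is a hypothetical helper, not an Arroyo API. With plain `pickle`, a bare `bytes` object is always serialized in-band; only data exposed through `pickle.PickleBuffer` reaches the callback.

```python
import pickle
from typing import Any, List, Tuple


def measure(obj: Any) -> Tuple[int, int]:
    """Return (in_band_bytes, out_of_band_bytes) for pickling obj.

    Hypothetical helper for illustration; not part of Arroyo.
    """
    buffers: List[pickle.PickleBuffer] = []
    # list.append returns None, which tells pickle to keep each buffer
    # out of band instead of copying it into the pickle stream.
    in_band = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    out_of_band = sum(memoryview(buf).nbytes for buf in buffers)
    return len(in_band), out_of_band


data = b"x" * 20

# Plain bytes: the payload is copied into the pickle stream itself.
plain_in_band, plain_out_of_band = measure(data)

# Same bytes wrapped in PickleBuffer: the payload goes to the callback.
wrapped_in_band, wrapped_out_of_band = measure(pickle.PickleBuffer(data))

print("plain:  ", plain_in_band, plain_out_of_band)
print("wrapped:", wrapped_in_band, wrapped_out_of_band)
```

If Arroyo does rely on this mechanism, emitting the first number alongside the existing out-of-band metric would show how much data travels through pool.submit rather than through the buffer.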

untitaker added the bug label Jun 29, 2023