SPU stuck with large consumer and producer #2116
This happens when there is no more disk space:
Produce works fine.
But consume from tail doesn't work.
Memory is stable after reaching ~900M at one point:
Restarting the SPU fixed the issue with tail:
In another run, the SPU gets stuck with half of the storage occupied:
Looks like the disconnect is due to a failure in the SEND_FILE call (a sketch of this failure mode follows below):
There are two problems when this happens:
In https://github.com/infinyon/fluvio/blob/master/crates/fluvio-spu/src/services/public/stream_fetch.rs
Also, the SPU only occupies around 32M of physical memory.
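For illustration, here is a minimal sketch of a zero-copy send-file loop and how a failed call surfaces as a dropped connection. This is not Fluvio's actual code: the function name is hypothetical, and it uses raw libc::sendfile, which is Linux-specific.

```rust
use std::fs::File;
use std::io;
use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

// Hypothetical helper: stream `len` bytes of `file` starting at `offset`
// straight to the socket with zero-copy sendfile(2) (Linux only).
fn send_file_range(sock: &TcpStream, file: &File, mut offset: i64, len: usize) -> io::Result<()> {
    let mut remaining = len;
    while remaining > 0 {
        // sendfile may transfer fewer bytes than requested, so loop.
        let sent = unsafe {
            libc::sendfile(sock.as_raw_fd(), file.as_raw_fd(), &mut offset, remaining)
        };
        if sent < 0 {
            // The failure mode described above: the error propagates up
            // and the server ends up dropping the consumer connection.
            return Err(io::Error::last_os_error());
        }
        if sent == 0 {
            break; // reached EOF on the source file
        }
        remaining -= sent as usize;
    }
    Ok(())
}
```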
Attempting to consume with the tail option results in:
This occurs on both OSX (M1 MacBook Pro) and Linux.
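For context, a tail consume with the fluvio Rust client looks roughly like this. This is a sketch: the topic name is a placeholder and the exact client API may differ between versions.

```rust
use fluvio::{Fluvio, Offset};
use futures_util::StreamExt;

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    let fluvio = Fluvio::connect().await?;
    // "longevity" is a placeholder topic name; read partition 0.
    let consumer = fluvio.partition_consumer("longevity", 0).await?;
    // Offset::from_end(1) starts at the last record, i.e. the
    // "consume with tail" operation that hangs in this issue.
    let mut stream = consumer.stream(Offset::from_end(1)).await?;
    while let Some(Ok(record)) = stream.next().await {
        println!("{}", String::from_utf8_lossy(record.value()));
    }
    Ok(())
}
```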
Was able to reproduce the issue by just running a large number of producers, as suspected (a load-generation sketch follows below).
Consuming from the start works:
But after a certain period, consuming with tail either hangs (the SPU doesn't respond) or returns this error message:
When it hangs, disconnecting the client causes the following error in the SPU:
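A load generator along these lines reproduces the "large number of producers" scenario. Again a sketch: the topic name and record size are arbitrary, and running many instances of this loop in parallel approximates the reported load.

```rust
use fluvio::{Fluvio, RecordKey};

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    let fluvio = Fluvio::connect().await?;
    let producer = fluvio.topic_producer("longevity").await?;
    // 16 KiB payloads fill segments quickly; the size is arbitrary.
    let payload = vec![b'x'; 16 * 1024];
    loop {
        producer.send(RecordKey::NULL, payload.clone()).await?;
    }
}
```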
The underlying issue is that in a high-load situation, a "batch" is overwritten by the next batch, as here:
Here, batch
The write here is designed for a general-purpose file system; the Tokio file implementation suffers from the same problem. In our case, we open the file using:
With std file system writes, the problem seems to be gone. The test has now run 10+ hours without interruption across multiple runs. A sketch of the std-write approach follows below.
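As an illustration of why the std write avoids the overwrite (a sketch, not the actual Fluvio storage code): a blocking write_all has no await point, so an append can never be cancelled halfway through a batch and then raced by the next batch.

```rust
use std::fs::OpenOptions;
use std::io::Write;

// Hypothetical append helper for a single segment file.
fn append_batch(path: &str, batch: &[u8]) -> std::io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // write_all loops internally until every byte hits the file; unlike
    // an async write, there is no await point where the task could be
    // cancelled mid-batch, leaving a partial record for the next batch
    // to overwrite.
    file.write_all(batch)?;
    // Flush to the device before advancing in-memory offsets.
    file.sync_data()?;
    Ok(())
}
```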
Ran a long-running longevity test:
After a while, the test failed. It seems that the SPU is stuck. Consume is stuck:
Produce seems to be working:
And consuming from and producing to a different topic works.
Logs: