segfault in the latest version of livepeer_bench #2211
Couldn't reproduce it on a 2-GPU server, though I had to reduce the number of sessions to 40 because there isn't enough VRAM for 70x4. I've noticed that an OOM error doesn't always produce the correct message; it may just fail anywhere in the CUDA code. Maybe the reason this test no longer passes is that some new feature slightly increased VRAM usage?
Should we rename this issue to |
We can reproduce the error using the command:
The error code and message can vary depending on where the out-of-memory condition occurs. Here is an example of OOM in rescaling:
@cyberj0g Is a fix needed in the code, or do we handle this with proper configuration of the session limit?
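To make the "configure a session limit" option concrete, here is a minimal Go sketch of a non-blocking session cap built on a buffered channel. `SessionLimiter`, `NewSessionLimiter`, and `ErrSessionLimit` are hypothetical names made up for illustration; this is not how go-livepeer or livepeer_bench implements its limit. The cap of 2 is a placeholder, and in practice it would have to be chosen so that the sessions fit in available VRAM.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSessionLimit is returned when every session slot is already taken.
// Hypothetical error value, not part of go-livepeer.
var ErrSessionLimit = errors.New("session limit reached")

// SessionLimiter caps the number of concurrent transcoding sessions.
// Hypothetical helper for illustration only.
type SessionLimiter struct {
	slots chan struct{} // one buffered element per active session
}

// NewSessionLimiter creates a limiter that admits at most max sessions.
func NewSessionLimiter(max int) *SessionLimiter {
	return &SessionLimiter{slots: make(chan struct{}, max)}
}

// Acquire reserves a slot without blocking. It returns ErrSessionLimit
// instead of letting an extra session start and run the GPU out of memory.
func (l *SessionLimiter) Acquire() error {
	select {
	case l.slots <- struct{}{}:
		return nil
	default:
		return ErrSessionLimit
	}
}

// Release frees a slot; it is a no-op if no slot is held.
func (l *SessionLimiter) Release() {
	select {
	case <-l.slots:
	default:
	}
}

func main() {
	limiter := NewSessionLimiter(2) // placeholder cap
	for i := 1; i <= 3; i++ {
		if err := limiter.Acquire(); err != nil {
			fmt.Printf("session %d rejected: %v\n", i, err)
			continue
		}
		fmt.Printf("session %d admitted\n", i)
	}
}
```

Rejecting up front gives a clear, consistent error, whereas an out-of-memory failure can surface anywhere in the CUDA code with a misleading message.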
Seems like there are two things worth looking into here:
For 1:
From glancing at the "Transcoder" notes for releases going back to v0.5.17, I noticed that v0.5.19 enabled B-frames in encoded outputs. Perhaps that could increase VRAM usage? We could check whether the livepeer_bench run passes with https://github.com/livepeer/go-livepeer/releases/tag/v0.5.17 and fails with https://github.com/livepeer/go-livepeer/releases/tag/v0.5.19
@AlexKordic great analysis. On 2: yes, I linked it dynamically, you can get static build by calling |
@cyberj0g thanks for confirmation. I was in doubt if 2 installed |
Closing because it doesn't seem like the issue was reproducible. Feel free to re-open if this issue re-emerges.
This seems to show up when there are a certain number of concurrent sessions. This is on v0.5.26:
That same command works successfully with livepeer_bench v0.5.17. Test case is reproducible, happy to bisect and find the source of the issue if that's helpful!
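For reference, bisecting across releases could look roughly like the Go sketch below. It assumes the releases are ordered oldest to newest and that once a release fails, every later one fails too. `firstFailing` and the `fails` callback are hypothetical; in a real run the callback would execute the benchmark against that release. The version list and the stubbed result are placeholders, not measured findings.

```go
package main

import (
	"fmt"
	"sort"
)

// firstFailing returns the earliest version for which fails reports true.
// It assumes versions are ordered oldest to newest and that failures are
// monotonic: once a version fails, all later versions fail as well.
func firstFailing(versions []string, fails func(string) bool) (string, bool) {
	i := sort.Search(len(versions), func(i int) bool {
		return fails(versions[i])
	})
	if i == len(versions) {
		return "", false // no failing version in the range
	}
	return versions[i], true
}

func main() {
	// Illustrative list bounded by the known-good and known-bad releases
	// mentioned in this thread.
	versions := []string{
		"v0.5.17", "v0.5.18", "v0.5.19", "v0.5.20", "v0.5.21",
		"v0.5.22", "v0.5.23", "v0.5.24", "v0.5.25", "v0.5.26",
	}

	// Placeholder predicate: pretends the regression landed in v0.5.19,
	// following the B-frames hypothesis. This is NOT a measured result;
	// a real predicate would run the benchmark for each version tested.
	fails := func(v string) bool {
		for i, name := range versions {
			if name == v {
				return i >= 2
			}
		}
		return false
	}

	if v, ok := firstFailing(versions, fails); ok {
		fmt.Println("first failing version:", v)
	} else {
		fmt.Println("no failing version found")
	}
}
```

With ten releases this needs at most four benchmark runs rather than ten, which matters when each run ties up the GPUs.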