Lorax Hanging in production #149
Hey @karlbernard2, thanks for reporting. It sounds like a deadlock is occurring here that may be triggered under very specific conditions (requests coming in at just the wrong time). Can you share any additional details about your setup (args to …)? One thing that stands out from the logs you provided is that the adapter … I'll try and take a closer look, but if there's anything you can provide to help me repro, that would be helpful.
The fact that the … Something you could try:

If you're able to run that on one of the hung pods, that would be very helpful for debugging the error.
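The specific command did not survive the page scrape. For context, a common way to capture a backtrace from a hung LoRAX pod is `py-spy`, which can dump both Python and native stacks from a running process without stopping it; the pod name below is a placeholder.

```shell
# Install py-spy inside the hung pod (pod name is a placeholder).
kubectl exec -it <lorax-pod-name> -- pip install py-spy

# Dump the stack of the server process (PID 1 in the container),
# including native frames, to see where it is blocked.
kubectl exec -it <lorax-pod-name> -- py-spy dump --pid 1 --native
```

The `--native` flag is what lets you see hangs inside compiled CUDA kernels rather than just Python frames.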
Thanks for the detailed instructions, I'll try to do that. Here's how I launched the container:
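(The reporter's actual launch command did not survive the scrape. For context, a typical LoRAX launch per the project README looks like the following; the model ID and volume path are illustrative, not the reporter's values.)

```shell
# Illustrative LoRAX launch, not the reporter's exact command.
model=mistralai/Mistral-7B-Instruct-v0.1

docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/predibase/lorax:latest \
  --model-id $model
```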
@tgaddair My first attempt to replicate didn't have the same issue (although earlier today I got it all the time, so I will try more). However, since you mentioned offloading that shouldn't happen, you might find these logs strange:

A screenshot might be easier to read. We are only dealing with 4 adapters.
Thanks for the backtrace @karlbernard2, this is very helpful! It definitely looks like the hanging is occurring in the SGMV kernel. In the short term, you can try disabling SGMV with an environment variable: I'll see if I can repro this behavior with the adapters you're using here.
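(The exact environment variable name is elided in the comment above; `DISABLE_SGMV` below is a placeholder, not a confirmed LoRAX flag — check the LoRAX docs for the real name.)

```shell
# Sketch: pass the SGMV-disabling variable into the container via -e.
# DISABLE_SGMV is a placeholder name; the model ID is illustrative.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -e DISABLE_SGMV=true \
  ghcr.io/predibase/lorax:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.1
```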
Hey @karlbernard2, update on this: I tried running some stress tests today with a variety of request patterns to try and replicate your setup, but was unable to trigger the hanging behavior. Can you share a few more details about your environment:
Thanks.
We’re running H100 on NebiusAI Kubernetes. I’ll have to get back to you on Tuesday with info on drivers. |
Hey @karlbernard2, update: I managed to track down the root cause of the deadlock, and it has been fixed in #156.
System Info
ghcr.io/predibase/lorax:latest
Running within Kubernetes on H100
Information
Tasks
Reproduction
When we put the instance into production, once it receives simultaneous requests for different adapters, it just hangs.
/generate and /health will stop answering,
but /info and /docs will continue to be available.
There is no error displayed in the logs.
I'm not sure of the best way to diagnose the issue, but it looks to me like it's having trouble fetching multiple adapters in parallel and processing requests queued at the same time.
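A minimal sketch of the traffic pattern described above: concurrent `/generate` requests spread across different adapters. This assumes LoRAX's documented request shape (an `adapter_id` inside `parameters`); the endpoint URL and adapter names are placeholders, not the reporter's actual values.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://localhost:8080"  # placeholder endpoint
ADAPTERS = ["adapter-a", "adapter-b", "adapter-c", "adapter-d"]  # placeholders


def make_payload(adapter_id: str, prompt: str) -> dict:
    """Build a /generate request body targeting one adapter."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 32, "adapter_id": adapter_id},
    }


def generate(adapter_id: str) -> str:
    """POST one generation request and return the generated text."""
    req = urllib.request.Request(
        f"{BASE_URL}/generate",
        data=json.dumps(make_payload(adapter_id, "Hello")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["generated_text"]


if __name__ == "__main__":
    # Fire simultaneous requests across different adapters -- the
    # pattern reported to trigger the hang.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for text in pool.map(generate, ADAPTERS * 4):
            print(text[:60])
```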
Expected behavior
It should handle live requests for multiple adapters.