Lingo should retry on proxy failure #48
Comments
This can easily create exponential load. Can you share some scenarios that make sense for you?
The scenarios I encountered so far were:
The main use case would be providers that run Lingo as a managed service, or internally for their own end-users, and need to minimize the number of errors returned to those end-users.
It is possible we could do this at the ingress layer into the cluster as well. The biggest source of 503s has been misconfigured termination grace periods on model backends, which can take longer than the 30s default to finish processing all of their pending requests. This should be mostly solvable by making sure the knobs are set correctly: max-in-flight and a generous termination grace period.
There are many reasons a backend model server can fail to serve a request. If Lingo adds retries on failure, it could improve the overall reliability of the system.