Lingo should retry on proxy failure #48

Closed
nstogner opened this issue Jan 8, 2024 · 3 comments · Fixed by #61

Comments

Contributor

nstogner commented Jan 8, 2024

There are many reasons a backend model server can fail to serve a request. If Lingo retried on failure, it could improve the overall reliability of the system.

Contributor

alpe commented Jan 8, 2024

This can easily create exponential load. Can you share some scenarios that make sense for you?

Contributor

samos123 commented Jan 8, 2024

The scenarios I have encountered so far are:

  • Backend going away and proxy still routing requests to a backend that is no longer there
  • Client not handling a retry and missing responses. Arguably the user should retry but sadly we have no control over the user.

The main use case would be for providers that run Lingo as a managed service, or internally for their own end-users, and need to minimize the number of errors returned to those end-users.

@nstogner
Contributor Author

It is possible we could do this at the ingress layer into the cluster as well.

The biggest source of 503s has been misconfigured termination grace periods on model backends, which can take longer than the 30s default to process all of their pending requests. This should be mostly solvable by making sure max-in-flight is set correctly and the termination grace period is generous enough.

nstogner added a commit that referenced this issue Feb 15, 2024
Allow for retrying failed requests to backends. Default to 1 retry per
lingo-request.

Fixes #48 

Builds on work done by @alpe in #64 and #51.

Co-authored-by: Alex Peters <alpe@users.noreply.github.com>