502/503 During deploys and/or pod termination #814
Comments
Hi,
The best way to work around this for now is to use a NodePort service with "mode instance" for your ingress. (You can create a separate NodePort service alongside your headless service.) A more robust way might be to support this with ReadinessGates and dynamic admission controllers; I haven't given it deeper thought yet, but I'll do some prototyping to see whether it works 😄
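To make the suggested workaround concrete, here is a minimal sketch of an instance-mode NodePort Service. It assumes the `alb.ingress.kubernetes.io/target-type: instance` annotation is honored when set on the Service (it can also be set on the Ingress); the name, labels, and ports are placeholders:

```yaml
# Sketch: instance-mode routing. The ALB targets the nodes' NodePorts instead
# of pod IPs, so kube-proxy reroutes traffic when a backing pod terminates.
apiVersion: v1
kind: Service
metadata:
  name: my-app-nodeport                      # hypothetical name
  annotations:
    alb.ingress.kubernetes.io/target-type: instance
spec:
  type: NodePort
  selector:
    app: my-app                              # hypothetical label
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
```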
Hi @M00nF1sh, thanks for the response. That would work; however, it gets us back to the exact problem I'm trying to solve. We have a large number of instances across various node groups, which quickly balloons the number of instances attached to the target group. The pods we'd like to direct traffic to belong to a small instance group, so this would work if we could select those EC2 instances (k8s nodes) directly. Is there a way to filter or limit which cluster nodes get attached (via Kubernetes node label, EC2 tag, or otherwise)?
@justinwalz
@M00nF1sh That would work; we can add a node label to exclude a fleet of instances from specific service ALBs. Would it be possible to also have the inverse, maybe
@justinwalz It's possible (and makes sense to me) to have the inverse, but I tend not to add it since it's not in k8s core. By only having
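The specific label keys in this exchange were lost in the archived text. For orientation only: core Kubernetes at the time had an alpha node-exclusion label, `alpha.service-controller.kubernetes.io/exclude-balancer` (later replaced by `node.kubernetes.io/exclude-from-external-load-balancers`); whether this controller honors it is an assumption here. A sketch of a node carrying such a label (equivalently applied with `kubectl label nodes <node-name> <key>=true`):

```yaml
# Sketch only: the label key the controller actually honors is an assumption.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-0-1.ec2.internal             # hypothetical node name
  labels:
    alpha.service-controller.kubernetes.io/exclude-balancer: "true"
```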
Got it - no problem. Thanks for the help on this!
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
I've faced a similar issue running
I am facing a similar issue. My Kubernetes services scale up when the number of requests per second reaches a certain value, but I get random 502 errors during peak times, even when all the containers are healthy and not restarting.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
We're getting this on every single deploy. What workarounds are available?

```yaml
apiVersion: v1
kind: Service
metadata:
  name: fortio
  annotations:
    alb.ingress.kubernetes.io/healthcheck-path: /
spec:
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: NodePort
  selector:
    app: fortio
```
@douglaz See this thread, which covers the same issue with a couple of solutions: #1064. tl;dr:
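The tl;dr of that thread was lost in the archived text, but one widely used mitigation for deploy-time 502s is a preStop sleep that keeps the old pod serving while the ALB finishes deregistering it. A minimal sketch, with hypothetical names and a sleep that should be tuned to at least the target group's deregistration delay:

```yaml
# Sketch: delay container shutdown so the ALB can mark the target draining
# and in-flight requests can complete before the pod receives SIGTERM.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                               # hypothetical
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 60      # must exceed the preStop sleep
      containers:
        - name: app
          image: example/app:latest          # hypothetical image
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "30"]     # assumed deregistration window
```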
@jorihardman Could you try whether the pod readiness gates feature I added solves the problem? You would need to build a custom Docker image from master, since it's not released yet:
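For readers landing here later, a sketch of what consuming the readiness-gate feature could look like: the pod declares a readiness gate, and the controller sets the condition only after the ALB reports the target healthy, so rollouts wait before terminating old pods. The condition-type format below (`target-health.alb.ingress.kubernetes.io/<ingress>_<service>_<port>`) is an assumption based on the controller's readiness-gate docs; all names are placeholders:

```yaml
# Sketch: pod readiness gated on ALB target health.
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod                           # hypothetical
  labels:
    app: my-app
spec:
  readinessGates:
    # Condition type format is an assumption: <ingress>_<service>_<port>
    - conditionType: target-health.alb.ingress.kubernetes.io/my-ingress_my-service_80
  containers:
    - name: app
      image: example/app:latest              # hypothetical image
```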
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen |
@shyr: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@shyr
Hi! First of all, I appreciate the community and all their work on this project; it is very helpful and a good solution for routing directly to pods from an ALB.

However, during testing I've noticed intermittent 502/503s during deploys of our StatefulSet. My current hypothesis is that during a deploy, the StatefulSet controller kills a pod in need of updates, and there is latency between this happening and the ALB ingress controller updating the ALB target to `draining`. During this delay, requests are sent to the terminating pod and return 502 (our nginx sidecar) and/or 503 (AWS ALB). Has anyone else seen this problem, and potentially found a solution for it? Ideally we'd remove the pod from the ALB target group before killing the pod, if this is in fact what is happening.
I have the following `Service` and `Ingress`: