
Degraded status when starting an OCP private cluster deployed on AWS #467

Closed
htkmts opened this issue Sep 25, 2020 · 5 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


htkmts commented Sep 25, 2020

When starting an OCP 4.3 private cluster deployed on AWS, the cluster ingress operator stays in a "degraded" state.
(By "private cluster", I mean the OCP cluster cannot access the internet.)

The operator appears to be trying to reach "https://tagging.us-east-1.amazonaws.com", and this seems to be what causes the problem.

Q1. Is there any workaround for this?
Q2. Is it MANDATORY for the operator to be able to access the internet? (That would make it impossible for any OpenShift cluster to be truly private...)

Thanks.

oc get dnsrecords -n openshift-ingress-operator -o yaml
  - message: |-
      The DNS provider failed to ensure the record: failed to find hosted zone for record: failed to get tagged resources: RequestError: send request failed
      caused by: Post https://tagging.us-east-1.amazonaws.com/: dial tcp 52.94.224.124:443: i/o timeout
    reason: ProviderError
    status: "True"
    type: Failed

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.25    True        False         59d     Error while reconciling 4.3.25: the cluster operator ingress is degraded
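
One possible mitigation for Q1 (a sketch only, assuming an internal egress proxy reachable from the cluster nodes exists; the hostname and port below are placeholders) is to route the operator's AWS API calls through OpenShift's cluster-wide proxy resource (proxies.config.openshift.io/cluster):

oc patch proxy/cluster --type=merge -p '{
  "spec": {
    "httpProxy":  "http://proxy.internal.example.com:3128",
    "httpsProxy": "http://proxy.internal.example.com:3128",
    "noProxy":    ".cluster.local,.svc,169.254.169.254"
  }
}'

With this in place, the ingress operator's requests to tagging.us-east-1.amazonaws.com would egress through the proxy instead of needing a direct Internet path, so the cluster subnets themselves can stay private.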

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label on Jun 2, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 2, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot closed this as completed on Aug 1, 2021

openshift-ci bot commented Aug 1, 2021

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rarguello

I have the same issue; this is the message in the installation log:

2021-09-22T15:21:08.439Z ERROR operator.init.controller controller/controller.go:218 Reconciler error
{"controller": "dns_controller", "name": "default-wildcard", "namespace": "openshift-ingress-operator",
 "error": "failed to create DNS provider: failed to create AWS DNS manager: failed to validate aws provider service endpoints: [
   failed to list route53 hosted zones: RequestError: send request failed
   caused by: Get "https://route53.amazonaws.com/2013-04-01/hostedzone?maxitems=1": dial tcp 52.46.154.111:443: i/o timeout,
   failed to get group tagging resources: RequestError: send request failed
   caused by: Post "https://tagging.us-east-1.amazonaws.com/": dial tcp 52.94.233.76:443: i/o timeout]"}

I'm trying to do an air-gapped IPI installation on AWS, using OKD 4.7.0-0.okd-2021-09-19-013247.

The machine that runs openshift-install needs Internet access, but all the instances in the VPC are on a private subnet without a NAT gateway, so they have no Internet access at all. I'm using an internal server as a registry mirror. I have configured EC2, S3, and ELB VPC endpoints; the S3 endpoint is a Gateway endpoint, and the other two are Interface endpoints.
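
For reference, endpoints like the ones described above can be created along these lines (a sketch only; the VPC, subnet, security-group, and route-table IDs are placeholders, and the service names are for us-east-1):

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.ec2 \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.elasticloadbalancing \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0123456789abcdef0

Note that --private-dns-enabled is what makes the default AWS hostnames (e.g. ec2.us-east-1.amazonaws.com) resolve to the endpoint's private IPs from inside the VPC, which is what the operators rely on.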

I don't think you can have a VPC endpoint for "tagging", so that's what's failing at the end of the installation:

Post "https://tagging.us-east-1.amazonaws.com/": dial tcp 52.94.233.76:443: i/o timeout

Any ideas for a workaround?
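
One way to check whether AWS exposes an interface endpoint service for the tagging API in a given region (rather than assuming it doesn't) is to list the available endpoint services; the JMESPath filter below simply greps the service names for "tagging":

aws ec2 describe-vpc-endpoint-services \
  --region us-east-1 \
  --query 'ServiceNames[?contains(@, `tagging`)]'

If nothing comes back, the remaining options are along the lines of the proxy sketch earlier in this thread, or a firewall/NAT exception scoped to tagging.us-east-1.amazonaws.com only.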
